MotherDuck Retail Analytics: Edge POS to Serverless Data Warehouse
Turn raw point-of-sale data from 50+ retail locations into query-ready Parquet on S3 — automatically indexed by MotherDuck's DuckLake for instant serverless analytics.
The Problem
Retail chains with dozens (or hundreds) of store locations generate massive volumes of POS transaction data:
- Every terminal, every store, every second — basket-level transaction data with SKUs, quantities, prices, payment methods
- Raw data is messy — different POS systems, inconsistent schemas, missing fields, local timestamps
- Central analytics needs clean data — data science and merchandising teams want to query across all stores instantly
- Traditional ETL is slow — nightly batch jobs mean yesterday's data, not today's insights
- The edge gap: There's no lightweight way to transform and batch POS data at each store before shipping it to the warehouse
The result: Terabytes of raw JSON/CSV landing in S3 that require heavy ETL before anyone can query them.
The Solution
Expanso Edge at every store, MotherDuck DuckLake in the cloud. Expanso handles the messy edge — parsing, enriching, batching, and encoding to Parquet. MotherDuck handles the analytics — auto-indexing Parquet from S3 via DuckLake for instant serverless queries.
Where Expanso Fits In Your Stack
Your analytics backend stays exactly as-is — MotherDuck, DuckDB, your BI tools. Expanso handles what happens at the edge before data reaches S3:
| Layer | What | How Expanso Helps |
|---|---|---|
| Collect | POS transactions from terminals (JSON, CSV, or API) | Single lightweight agent per store — no heavy ETL infrastructure at the edge |
| Enrich | Add store metadata, region, format, calculated fields | Lookups and calculations happen at the store, not in the warehouse |
| Validate | Schema enforcement, null checks, type coercion | Bad records caught at source — only clean data leaves the store |
| Batch | Micro-batch transactions (1000 rows or 10s windows) | Optimal Parquet file sizes for DuckLake indexing |
| Encode | Compress to Parquet with Zstandard compression | 10-50x smaller than raw JSON, columnar for fast analytics |
| Ship | Write to S3 with Hive-style partitioning | DuckLake auto-discovers new partitions instantly |
Store POS → Expanso Edge (enrich, validate, batch, Parquet) → S3 → DuckLake → MotherDuck SQL
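The edge stages in the flow above can be sketched in plain Python. This is a minimal illustration, not Expanso's actual configuration: the `STORE_METADATA` lookup, field names, and validation rules are assumptions based on the sample schema later in this guide.

```python
# Minimal sketch of the per-store edge stages: enrich -> validate -> batch.
# STORE_METADATA and the record fields are illustrative assumptions.
STORE_METADATA = {
    12: {"store_region": "NW", "store_format": "flagship"},
}

def enrich(txn: dict) -> dict:
    """Attach store metadata at the edge so the warehouse never joins for it."""
    meta = STORE_METADATA.get(txn["store_id"], {})
    return {**txn, **meta}

def validate(txn: dict) -> bool:
    """Reject obviously bad records before they leave the store."""
    return txn.get("total_amount", 0) >= 0 and txn.get("item_count", 0) > 0

def micro_batch(txns, max_rows=1000):
    """Group clean, enriched transactions into fixed-size batches
    ready for Parquet encoding."""
    batch = []
    for txn in txns:
        enriched = enrich(txn)
        if not validate(enriched):
            continue  # dropped at source; only clean data leaves the store
        batch.append(enriched)
        if len(batch) >= max_rows:
            yield batch
            batch = []
    if batch:
        yield batch
```

The point of the sketch: enrichment and validation cost almost nothing at the edge, and the warehouse only ever sees clean, joined records.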
Key Benefits
- Real-time-ish analytics: Micro-batched data lands in S3 every 10 seconds, DuckLake indexes it automatically
- 90%+ compression: Raw JSON transactions → Zstandard-compressed Parquet (columnar, typed, compact)
- Zero ETL in the warehouse: Data arrives schema-validated, enriched, and query-ready
- Edge resilience: Local buffering at each store handles network outages — no data loss
- Serverless scale: MotherDuck auto-scales queries across all stores, no cluster management
- Bring your own account: Works with any MotherDuck account and S3 bucket
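The "1000 rows or 10 s windows" policy mentioned above is a size-or-time trigger: flush whichever threshold is hit first. A minimal sketch of that logic (the thresholds and class shape are assumptions, not Expanso defaults):

```python
import time

class MicroBatcher:
    """Flush a batch when it reaches max_rows OR max_age_s seconds,
    whichever comes first. Thresholds here are illustrative."""

    def __init__(self, max_rows=1000, max_age_s=10.0):
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self._rows = []
        self._opened_at = None

    def add(self, row):
        """Add a row; return the full batch if the size threshold is hit."""
        if self._opened_at is None:
            self._opened_at = time.monotonic()
        self._rows.append(row)
        if len(self._rows) >= self.max_rows:
            return self._flush()
        return None

    def poll(self):
        """Call periodically: flush if the open batch has aged out."""
        if self._rows and time.monotonic() - self._opened_at >= self.max_age_s:
            return self._flush()
        return None

    def _flush(self):
        batch, self._rows, self._opened_at = self._rows, [], None
        return batch
```

The time trigger is what keeps quiet stores (few transactions) from holding data back; the size trigger keeps busy stores producing well-sized Parquet files.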
What You'll Build
This guide walks through creating a production-ready retail POS pipeline that:
- Generates realistic POS transactions with basket items, payment methods, and store metadata
- Enriches with store information (region, format, square footage) and calculated fields
- Validates schema integrity and flags anomalies (negative totals, empty baskets)
- Batches into optimal micro-batches for Parquet encoding
- Encodes to compressed, partitioned Parquet files
- Ships to S3 with Hive-style partitioning for DuckLake auto-indexing
- Queries in MotherDuck with instant SQL across all stores
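The first step, generating realistic transactions, might look like the sketch below. Field names follow the sample schema in the next section; the SKU format, categories, and flat 8% tax rate are illustrative assumptions.

```python
import random
import time
import uuid

def make_transaction(store_id: int, terminal_id: int) -> dict:
    """Build one synthetic POS transaction matching the sample schema.
    SKUs, categories, and the 8% tax rate are illustrative assumptions."""
    items = [
        {"sku": f"SKU-{random.randint(1000, 9999)}",
         "qty": random.randint(1, 3),
         "unit_price": round(random.uniform(1.0, 50.0), 2),
         "category": random.choice(["grocery", "apparel", "home"])}
        for _ in range(random.randint(1, 5))
    ]
    subtotal = round(sum(i["qty"] * i["unit_price"] for i in items), 2)
    tax = round(subtotal * 0.08, 2)  # assumed flat tax rate for the demo
    return {
        "txn_id": str(uuid.uuid4()),
        "store_id": store_id,
        "terminal_id": terminal_id,
        "timestamp": time.time_ns() // 1000,  # Unix microseconds
        "type": random.choice(["sale", "sale", "sale", "return"]),
        "payment_method": random.choice(["card", "cash", "mobile", "gift_card"]),
        "items": items,
        "item_count": sum(i["qty"] for i in items),
        "subtotal": subtotal,
        "tax_amount": tax,
        "total_amount": round(subtotal + tax, 2),
    }
```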
Sample Transaction Schema
| Field | Type | Description |
|---|---|---|
| txn_id | STRING | UUID for each transaction |
| store_id | INT32 | Store identifier (1-50) |
| terminal_id | INT32 | POS terminal within store (1-10) |
| timestamp | INT64 | Unix microseconds |
| type | STRING | sale, return, or exchange |
| payment_method | STRING | card, cash, mobile, gift_card |
| items | JSON | Array of basket items (SKU, qty, unit_price, category) |
| item_count | INT32 | Total items in basket |
| subtotal | DOUBLE | Pre-tax total |
| tax_amount | DOUBLE | Calculated tax |
| total_amount | DOUBLE | Final transaction amount |
| store_region | STRING | Enriched: NW, SW, NE, SE, MW |
| store_format | STRING | Enriched: flagship, standard, express, outlet |
| employee_id | STRING | Cashier/associate ID |
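For a quick edge-side conformance check, the schema above can be expressed as a field-to-type map. The mapping from Parquet types to Python types (INT32/INT64 → `int`, DOUBLE → `float`, JSON → `list`) is an illustrative simplification.

```python
# Sample schema as Python types, for a cheap edge-side conformance check.
# Parquet-to-Python type mapping here is an illustrative simplification.
SCHEMA = {
    "txn_id": str, "store_id": int, "terminal_id": int, "timestamp": int,
    "type": str, "payment_method": str, "items": list, "item_count": int,
    "subtotal": float, "tax_amount": float, "total_amount": float,
    "store_region": str, "store_format": str, "employee_id": str,
}

def conforms(txn: dict) -> bool:
    """True iff every schema field is present with the expected Python type."""
    return all(isinstance(txn.get(field), t) for field, t in SCHEMA.items())
```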
Prerequisites
- Expanso Edge installed (installation guide)
- A MotherDuck account (free tier works)
- An S3-compatible bucket (AWS S3, MinIO, or Cloudflare R2)
- AWS credentials configured for S3 writes
Get Started
Choose your path:
Step-by-Step Tutorial
Build the pipeline incrementally:
- Generate POS Transactions
- Enrich with Store Metadata
- Validate and Flag Anomalies
- Batch and Encode to Parquet
- Ship to S3 with Partitioning
- Connect DuckLake and Query
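The shipping step writes each Parquet file under a Hive-style prefix so DuckLake can discover and prune partitions. A sketch of one way to build that prefix (the `dt`/`hour`/`store_id` layout and the `pos` prefix are illustrative choices, not a required layout):

```python
from datetime import datetime, timezone

def partition_key(txn: dict, prefix: str = "pos") -> str:
    """Build a Hive-style S3 object prefix from a transaction's
    microsecond timestamp and store_id. Layout is illustrative."""
    ts = datetime.fromtimestamp(txn["timestamp"] / 1_000_000, tz=timezone.utc)
    return (f"{prefix}/dt={ts:%Y-%m-%d}/hour={ts:%H}/"
            f"store_id={txn['store_id']}/")
```

Partitioning on date and store keeps common queries (one day, one region's stores) from scanning the whole bucket.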
Complete Pipeline
Download the production-ready solution with one-click copy