
MotherDuck Retail Analytics: Edge POS to Serverless Data Warehouse

Turn raw point-of-sale data from 50+ retail locations into query-ready Parquet on S3 — automatically indexed by MotherDuck's DuckLake for instant serverless analytics.

The Problem

Retail chains with dozens (or hundreds) of store locations generate massive volumes of POS transaction data:

  • Every terminal, every store, every second — basket-level transaction data with SKUs, quantities, prices, payment methods
  • Raw data is messy — different POS systems, inconsistent schemas, missing fields, local timestamps
  • Central analytics needs clean data — data science and merchandising teams want to query across all stores instantly
  • Traditional ETL is slow — nightly batch jobs mean yesterday's data, not today's insights
  • The edge gap: There's no lightweight way to transform and batch POS data at each store before shipping it to the warehouse

The result: terabytes of raw JSON/CSV landing in S3 that require heavy ETL before anyone can query them.

The Solution

Expanso Edge at every store, MotherDuck DuckLake in the cloud. Expanso handles the messy edge — parsing, enriching, batching, and encoding to Parquet. MotherDuck handles the analytics — auto-indexing Parquet from S3 via DuckLake for instant serverless queries.

Store #1 POS / Store #2 POS / … / Store #N POS
        ↓
Expanso Edge (per store): parse → enrich → validate → batch → Parquet encode
        ↓
📦 S3 (partitioned Parquet): s3://retail-data/store=42/date=2026-02-11/batch_001.parquet
        ↓
🦆 MotherDuck DuckLake: auto-indexes new Parquet files → instant SQL queries
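
The partition layout in the diagram is derived from each transaction's store ID and timestamp. Here is a minimal sketch in Python: the key layout matches the example path above, while the function name and batch-numbering scheme are illustrative (in production, Expanso Edge computes this for you):

```python
from datetime import datetime, timezone

def partition_key(store_id: int, ts_us: int, batch_seq: int) -> str:
    """Build a Hive-style S3 object key (store=.../date=.../batch_NNN.parquet)
    so DuckLake can discover new partitions by prefix."""
    day = datetime.fromtimestamp(ts_us / 1_000_000, tz=timezone.utc).date()
    return f"store={store_id}/date={day.isoformat()}/batch_{batch_seq:03d}.parquet"

# A transaction timestamped 2026-02-11 (Unix microseconds) at store 42:
key = partition_key(42, 1_770_800_000_000_000, 1)
# → "store=42/date=2026-02-11/batch_001.parquet"
```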

Where Expanso Fits In Your Stack

Your analytics backend stays exactly as-is — MotherDuck, DuckDB, your BI tools. Expanso handles what happens at the edge before data reaches S3:

| Layer | What | How Expanso Helps |
| --- | --- | --- |
| Collect | POS transactions from terminals (JSON, CSV, or API) | Single lightweight agent per store — no heavy ETL infrastructure at the edge |
| Enrich | Add store metadata, region, format, calculated fields | Lookups and calculations happen at the store, not in the warehouse |
| Validate | Schema enforcement, null checks, type coercion | Bad records caught at source — only clean data leaves the store |
| Batch | Micro-batch transactions (1000 rows or 10s windows) | Optimal Parquet file sizes for DuckLake indexing |
| Encode | Encode to Parquet with Zstandard compression | 10-50x smaller than raw JSON, columnar for fast analytics |
| Ship | Write to S3 with Hive-style partitioning | DuckLake auto-discovers new partitions instantly |
Store POS → Expanso Edge (enrich, validate, batch, Parquet) → S3 → DuckLake → MotherDuck SQL
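
The Enrich and Validate layers can be sketched in a few lines of Python. This is a simplified model, not Expanso Edge's actual implementation; the store-metadata lookup table and flag names are illustrative:

```python
# Hypothetical per-store metadata; a real deployment would ship this
# lookup with the edge agent rather than hard-coding it.
STORE_META = {42: {"store_region": "NW", "store_format": "flagship"}}

def enrich(txn: dict) -> dict:
    """Add store metadata and calculated fields at the store, not the warehouse."""
    out = {**txn, **STORE_META.get(txn["store_id"], {})}
    out["item_count"] = sum(i["qty"] for i in txn["items"])
    return out

def validate(txn: dict) -> tuple[bool, list[str]]:
    """Flag anomalies (negative totals, empty baskets, missing fields)
    instead of silently dropping records."""
    flags = []
    if txn["total_amount"] < 0 and txn["type"] != "return":
        flags.append("negative_total")
    if not txn["items"]:
        flags.append("empty_basket")
    required = ("txn_id", "store_id", "timestamp", "total_amount")
    flags += [f"missing:{k}" for k in required if txn.get(k) is None]
    return (not flags, flags)
```

Only records that pass validation leave the store; flagged records can be quarantined locally for inspection.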

Key Benefits

  • Real-time-ish analytics: Micro-batched data lands in S3 every 10 seconds, and DuckLake indexes it automatically
  • 90%+ compression: Raw JSON transactions → Zstandard-compressed Parquet (columnar, typed, compact)
  • Zero ETL in the warehouse: Data arrives schema-validated, enriched, and query-ready
  • Edge resilience: Local buffering at each store handles network outages — no data loss
  • Serverless scale: MotherDuck auto-scales queries across all stores, no cluster management
  • Bring your own account: Works with any MotherDuck account and S3 bucket
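
The micro-batching policy behind these benefits ("1000 rows or 10s windows", whichever comes first) can be sketched as follows. The class name and thresholds are illustrative; Expanso Edge handles flushing and local buffering for you in production:

```python
import time

class MicroBatcher:
    """Accumulate rows and flush when either the row limit or the
    time window is hit, keeping Parquet files at a DuckLake-friendly size."""

    def __init__(self, max_rows=1000, max_age_s=10.0, clock=time.monotonic):
        self.max_rows, self.max_age_s, self.clock = max_rows, max_age_s, clock
        self.rows, self.opened_at = [], None

    def add(self, row):
        """Add a row; return the completed batch if a limit was hit, else None."""
        if self.opened_at is None:
            self.opened_at = self.clock()
        self.rows.append(row)
        if (len(self.rows) >= self.max_rows
                or self.clock() - self.opened_at >= self.max_age_s):
            return self.flush()
        return None

    def flush(self):
        """Return the current batch and start a new window."""
        batch, self.rows, self.opened_at = self.rows, [], None
        return batch
```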

What You'll Build

This guide walks through creating a production-ready retail POS pipeline that:

  1. Generates realistic POS transactions with basket items, payment methods, and store metadata
  2. Enriches with store information (region, format, square footage) and calculated fields
  3. Validates schema integrity and flags anomalies (negative totals, empty baskets)
  4. Batches into optimal micro-batches for Parquet encoding
  5. Encodes to compressed, partitioned Parquet files
  6. Ships to S3 with Hive-style partitioning for DuckLake auto-indexing
  7. Queries in MotherDuck with instant SQL across all stores
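
Step 1 — generating realistic transactions — can be sketched in plain Python. Field names follow the sample schema in the next section; the category list, weights, and tax rate are illustrative:

```python
import random
import time
import uuid

CATEGORIES = ["grocery", "apparel", "electronics", "home"]  # illustrative

def make_txn(store_id: int, tax_rate: float = 0.08) -> dict:
    """Generate one synthetic basket-level POS transaction."""
    items = [
        {
            "sku": f"SKU-{random.randint(1000, 9999)}",
            "qty": random.randint(1, 5),
            "unit_price": round(random.uniform(0.99, 199.99), 2),
            "category": random.choice(CATEGORIES),
        }
        for _ in range(random.randint(1, 8))
    ]
    subtotal = round(sum(i["qty"] * i["unit_price"] for i in items), 2)
    tax = round(subtotal * tax_rate, 2)
    return {
        "txn_id": str(uuid.uuid4()),
        "store_id": store_id,
        "terminal_id": random.randint(1, 10),
        "timestamp": time.time_ns() // 1_000,  # Unix microseconds
        "type": random.choices(["sale", "return", "exchange"],
                               weights=[0.90, 0.07, 0.03])[0],
        "payment_method": random.choice(["card", "cash", "mobile", "gift_card"]),
        "items": items,
        "item_count": sum(i["qty"] for i in items),
        "subtotal": subtotal,
        "tax_amount": tax,
        "total_amount": round(subtotal + tax, 2),
        "employee_id": f"EMP-{random.randint(100, 999)}",
    }
```

Note that `store_region` and `store_format` are absent here: those fields are added by the enrichment step, not the generator.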

Sample Transaction Schema

| Field | Type | Description |
| --- | --- | --- |
| txn_id | STRING | UUID for each transaction |
| store_id | INT32 | Store identifier (1-50) |
| terminal_id | INT32 | POS terminal within store (1-10) |
| timestamp | INT64 | Unix microseconds |
| type | STRING | sale, return, or exchange |
| payment_method | STRING | card, cash, mobile, gift_card |
| items | JSON | Array of basket items (SKU, qty, unit_price, category) |
| item_count | INT32 | Total items in basket |
| subtotal | DOUBLE | Pre-tax total |
| tax_amount | DOUBLE | Calculated tax |
| total_amount | DOUBLE | Final transaction amount |
| store_region | STRING | Enriched: NW, SW, NE, SE, MW |
| store_format | STRING | Enriched: flagship, standard, express, outlet |
| employee_id | STRING | Cashier/associate ID |
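
For schema validation at the edge, the table above maps naturally onto Python types (the physical-to-Parquet type mapping happens at encode time). A minimal sketch; `POS_SCHEMA` and `conforms` are illustrative names, not part of Expanso's API:

```python
# Python-side view of the transaction schema above.
POS_SCHEMA = {
    "txn_id": str,          # STRING
    "store_id": int,        # INT32
    "terminal_id": int,     # INT32
    "timestamp": int,       # INT64, Unix microseconds
    "type": str,            # STRING
    "payment_method": str,  # STRING
    "items": list,          # JSON array of basket items
    "item_count": int,      # INT32
    "subtotal": float,      # DOUBLE
    "tax_amount": float,    # DOUBLE
    "total_amount": float,  # DOUBLE
    "store_region": str,    # STRING, enriched
    "store_format": str,    # STRING, enriched
    "employee_id": str,     # STRING
}

def conforms(txn: dict) -> bool:
    """True if every schema field is present with the expected Python type."""
    return all(isinstance(txn.get(k), t) for k, t in POS_SCHEMA.items())
```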

Prerequisites

  • Expanso Edge installed (installation guide)
  • A MotherDuck account (free tier works)
  • An S3-compatible bucket (AWS S3, MinIO, or Cloudflare R2)
  • AWS credentials configured for S3 writes

Get Started

Choose your path:

Step-by-Step Tutorial

Build the pipeline incrementally:

  1. Generate POS Transactions
  2. Enrich with Store Metadata
  3. Validate and Flag Anomalies
  4. Batch and Encode to Parquet
  5. Ship to S3 with Partitioning
  6. Connect DuckLake and Query
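
To preview what step 6 looks like, here is the kind of query you can run once the Parquet files land in S3. DuckLake normally exposes the data as a catalog table; the variant below reads the partitioned files directly with DuckDB's `read_parquet`, and the bucket name matches the example path earlier in this page:

```sql
-- Run in the MotherDuck UI or any DuckDB client attached to MotherDuck.
-- Daily revenue by region and payment method across all stores.
SELECT
    store_region,
    payment_method,
    count(*)          AS txns,
    sum(total_amount) AS revenue
FROM read_parquet('s3://retail-data/store=*/date=*/*.parquet',
                  hive_partitioning = true)
WHERE "date" = '2026-02-11'
GROUP BY ALL
ORDER BY revenue DESC;
```

The `hive_partitioning` option surfaces the `store` and `date` path segments as queryable columns, so partition filters prune files before any data is read.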

Complete Pipeline

Download the complete, production-ready pipeline, ready to copy and deploy.