
MotherDuck Retail Analytics: Edge POS to Serverless Data Warehouse

Turn raw point-of-sale data from 50+ retail locations into query-ready Parquet on S3 — automatically indexed by MotherDuck's DuckLake for instant serverless analytics.

The Problem

Retail chains with dozens (or hundreds) of store locations generate massive volumes of POS transaction data:

  • Every terminal, every store, every second — basket-level transaction data with SKUs, quantities, prices, payment methods
  • Raw data is messy — different POS systems, inconsistent schemas, missing fields, local timestamps
  • Central analytics needs clean data — data science and merchandising teams want to query across all stores instantly
  • Traditional ETL is slow — nightly batch jobs mean yesterday's data, not today's insights
  • The edge gap: There's no lightweight way to transform and batch POS data at each store before shipping it to the warehouse

The result: terabytes of raw JSON/CSV landing in S3 that require heavy ETL before anyone can query them.

The Solution

Expanso Edge at every store, MotherDuck DuckLake in the cloud. Expanso handles the messy edge — parsing, enriching, batching, and encoding to Parquet. MotherDuck handles the analytics — auto-indexing Parquet from S3 via DuckLake for instant serverless queries.

Store #1 POS / Store #2 POS / … / Store #N POS
        ↓
Expanso Edge (per store): parse → enrich → validate → batch → Parquet encode
        ↓
📦 S3 (partitioned Parquet): s3://retail-data/store=42/date=2026-02-11/batch_001.parquet
        ↓
🦆 MotherDuck DuckLake: auto-indexes new Parquet files → instant SQL queries
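
The partition layout in the diagram is derived from each transaction's store ID and timestamp. Here is a minimal sketch in Python: the key layout matches the example path above, while the function name and batch-numbering scheme are illustrative (in production, Expanso Edge computes this for you):

```python
from datetime import datetime, timezone

def partition_key(store_id: int, ts_us: int, batch_seq: int) -> str:
    """Build a Hive-style S3 object key (store=.../date=.../batch_NNN.parquet)
    so DuckLake can discover new partitions by prefix."""
    day = datetime.fromtimestamp(ts_us / 1_000_000, tz=timezone.utc).date()
    return f"store={store_id}/date={day.isoformat()}/batch_{batch_seq:03d}.parquet"

# A transaction timestamped 2026-02-11 (Unix microseconds) at store 42:
key = partition_key(42, 1_770_800_000_000_000, 1)
# → "store=42/date=2026-02-11/batch_001.parquet"
```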

Where Expanso Fits In Your Stack

Your analytics backend stays exactly as-is — MotherDuck, DuckDB, your BI tools. Expanso handles what happens at the edge before data reaches S3:

| Layer | What | How Expanso Helps |
| --- | --- | --- |
| Collect | POS transactions from terminals (JSON, CSV, or API) | Single lightweight agent per store — no heavy ETL infrastructure at the edge |
| Enrich | Add store metadata, region, format, calculated fields | Lookups and calculations happen at the store, not in the warehouse |
| Validate | Schema enforcement, null checks, type coercion | Bad records caught at source — only clean data leaves the store |
| Batch | Micro-batch transactions (1000 rows or 10s windows) | Optimal Parquet file sizes for DuckLake indexing |
| Encode | Encode to Parquet with Zstandard compression | 10-50x smaller than raw JSON, columnar for fast analytics |
| Ship | Write to S3 with Hive-style partitioning | DuckLake auto-discovers new partitions instantly |
Store POS → Expanso Edge (enrich, validate, batch, Parquet) → S3 → DuckLake → MotherDuck SQL
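
The Enrich and Validate layers can be sketched in a few lines of Python. This is a simplified model, not Expanso Edge's actual implementation; the store-metadata lookup table and flag names are illustrative:

```python
# Hypothetical per-store metadata; a real deployment would ship this
# lookup with the edge agent rather than hard-coding it.
STORE_META = {42: {"store_region": "NW", "store_format": "flagship"}}

def enrich(txn: dict) -> dict:
    """Add store metadata and calculated fields at the store, not the warehouse."""
    out = {**txn, **STORE_META.get(txn["store_id"], {})}
    out["item_count"] = sum(i["qty"] for i in txn["items"])
    return out

def validate(txn: dict) -> tuple[bool, list[str]]:
    """Flag anomalies (negative totals, empty baskets, missing fields)
    instead of silently dropping records."""
    flags = []
    if txn["total_amount"] < 0 and txn["type"] != "return":
        flags.append("negative_total")
    if not txn["items"]:
        flags.append("empty_basket")
    required = ("txn_id", "store_id", "timestamp", "total_amount")
    flags += [f"missing:{k}" for k in required if txn.get(k) is None]
    return (not flags, flags)
```

Only records that pass validation leave the store; flagged records can be quarantined locally for inspection.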

Key Benefits

  • Real-time-ish analytics: Micro-batched data lands in S3 every 10 seconds, and DuckLake indexes it automatically
  • 90%+ compression: Raw JSON transactions → Zstandard-compressed Parquet (columnar, typed, compact)
  • Zero ETL in the warehouse: Data arrives schema-validated, enriched, and query-ready
  • Edge resilience: Local buffering at each store handles network outages — no data loss
  • Serverless scale: MotherDuck auto-scales queries across all stores, no cluster management
  • Bring your own account: Works with any MotherDuck account and S3 bucket
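
The micro-batching policy behind these benefits ("1000 rows or 10s windows", whichever comes first) can be sketched as follows. The class name and thresholds are illustrative; Expanso Edge handles flushing and local buffering for you in production:

```python
import time

class MicroBatcher:
    """Accumulate rows and flush when either the row limit or the
    time window is hit, keeping Parquet files at a DuckLake-friendly size."""

    def __init__(self, max_rows=1000, max_age_s=10.0, clock=time.monotonic):
        self.max_rows, self.max_age_s, self.clock = max_rows, max_age_s, clock
        self.rows, self.opened_at = [], None

    def add(self, row):
        """Add a row; return the completed batch if a limit was hit, else None."""
        if self.opened_at is None:
            self.opened_at = self.clock()
        self.rows.append(row)
        if (len(self.rows) >= self.max_rows
                or self.clock() - self.opened_at >= self.max_age_s):
            return self.flush()
        return None

    def flush(self):
        """Return the current batch and start a new window."""
        batch, self.rows, self.opened_at = self.rows, [], None
        return batch
```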

What You'll Build

This guide walks through creating a production-ready retail POS pipeline that:

  1. Generates realistic POS transactions with basket items, payment methods, and store metadata
  2. Enriches with store information (region, format, square footage) and calculated fields
  3. Validates schema integrity and flags anomalies (negative totals, empty baskets)
  4. Batches into optimal micro-batches for Parquet encoding
  5. Encodes to compressed, partitioned Parquet files
  6. Ships to S3 with Hive-style partitioning for DuckLake auto-indexing
  7. Queries in MotherDuck with instant SQL across all stores
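
Step 1 — generating realistic transactions — can be sketched in plain Python. Field names follow the sample schema in the next section; the category list, weights, and tax rate are illustrative:

```python
import random
import time
import uuid

CATEGORIES = ["grocery", "apparel", "electronics", "home"]  # illustrative

def make_txn(store_id: int, tax_rate: float = 0.08) -> dict:
    """Generate one synthetic basket-level POS transaction."""
    items = [
        {
            "sku": f"SKU-{random.randint(1000, 9999)}",
            "qty": random.randint(1, 5),
            "unit_price": round(random.uniform(0.99, 199.99), 2),
            "category": random.choice(CATEGORIES),
        }
        for _ in range(random.randint(1, 8))
    ]
    subtotal = round(sum(i["qty"] * i["unit_price"] for i in items), 2)
    tax = round(subtotal * tax_rate, 2)
    return {
        "txn_id": str(uuid.uuid4()),
        "store_id": store_id,
        "terminal_id": random.randint(1, 10),
        "timestamp": time.time_ns() // 1_000,  # Unix microseconds
        "type": random.choices(["sale", "return", "exchange"],
                               weights=[0.90, 0.07, 0.03])[0],
        "payment_method": random.choice(["card", "cash", "mobile", "gift_card"]),
        "items": items,
        "item_count": sum(i["qty"] for i in items),
        "subtotal": subtotal,
        "tax_amount": tax,
        "total_amount": round(subtotal + tax, 2),
        "employee_id": f"EMP-{random.randint(100, 999)}",
    }
```

Note that `store_region` and `store_format` are absent here: those fields are added by the enrichment step, not the generator.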

Sample Transaction Schema

| Field | Type | Description |
| --- | --- | --- |
| txn_id | STRING | UUID for each transaction |
| store_id | INT32 | Store identifier (1-50) |
| terminal_id | INT32 | POS terminal within store (1-10) |
| timestamp | INT64 | Unix microseconds |
| type | STRING | sale, return, or exchange |
| payment_method | STRING | card, cash, mobile, gift_card |
| items | JSON | Array of basket items (SKU, qty, unit_price, category) |
| item_count | INT32 | Total items in basket |
| subtotal | DOUBLE | Pre-tax total |
| tax_amount | DOUBLE | Calculated tax |
| total_amount | DOUBLE | Final transaction amount |
| store_region | STRING | Enriched: NW, SW, NE, SE, MW |
| store_format | STRING | Enriched: flagship, standard, express, outlet |
| employee_id | STRING | Cashier/associate ID |
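
For schema validation at the edge, the table above maps naturally onto Python types (the physical-to-Parquet type mapping happens at encode time). A minimal sketch; `POS_SCHEMA` and `conforms` are illustrative names, not part of Expanso's API:

```python
# Python-side view of the transaction schema above.
POS_SCHEMA = {
    "txn_id": str,          # STRING
    "store_id": int,        # INT32
    "terminal_id": int,     # INT32
    "timestamp": int,       # INT64, Unix microseconds
    "type": str,            # STRING
    "payment_method": str,  # STRING
    "items": list,          # JSON array of basket items
    "item_count": int,      # INT32
    "subtotal": float,      # DOUBLE
    "tax_amount": float,    # DOUBLE
    "total_amount": float,  # DOUBLE
    "store_region": str,    # STRING, enriched
    "store_format": str,    # STRING, enriched
    "employee_id": str,     # STRING
}

def conforms(txn: dict) -> bool:
    """True if every schema field is present with the expected Python type."""
    return all(isinstance(txn.get(k), t) for k, t in POS_SCHEMA.items())
```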

Prerequisites

  • Expanso Edge installed (installation guide)
  • A MotherDuck account (free tier works)
  • An S3-compatible bucket (AWS S3, MinIO, or Cloudflare R2)
  • AWS credentials configured for S3 writes

Get Started

Choose your path:

Step-by-Step Tutorial

Build the pipeline incrementally:

  1. Generate POS Transactions
  2. Enrich with Store Metadata
  3. Validate and Flag Anomalies
  4. Batch and Encode to Parquet
  5. Ship to S3 with Partitioning
  6. Connect DuckLake and Query
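
To preview what step 6 looks like, here is the kind of query you can run once the Parquet files land in S3. DuckLake normally exposes the data as a catalog table; the variant below reads the partitioned files directly with DuckDB's `read_parquet`, and the bucket name matches the example path earlier in this page:

```sql
-- Run in the MotherDuck UI or any DuckDB client attached to MotherDuck.
-- Daily revenue by region and payment method across all stores.
SELECT
    store_region,
    payment_method,
    count(*)          AS txns,
    sum(total_amount) AS revenue
FROM read_parquet('s3://retail-data/store=*/date=*/*.parquet',
                  hive_partitioning = true)
WHERE "date" = '2026-02-11'
GROUP BY ALL
ORDER BY revenue DESC;
```

The `hive_partitioning` option surfaces the `store` and `date` path segments as queryable columns, so partition filters prune files before any data is read.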

Complete Pipeline

Download the complete, production-ready pipeline, ready to copy and deploy.