
Nightly Database Backup

Simple, reliable replication of database tables to cloud cold storage for disaster recovery.

The Problem

Your organization needs:

  • Reliable backups of critical database tables
  • Cloud storage for DR (not on-premises tape)
  • Cost efficiency - cold storage classes for rarely-accessed backups
  • Verifiable recovery - checksums to confirm data integrity

The Solution: 4 Backup Steps

This pipeline provides production-ready database backups:

1. Extract Multiple Tables → Sequence Input

  • Orders (incremental - last 24 hours)
  • Inventory (full - complete table)
  • Order items (incremental - last 24 hours)
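
The incremental/full split above comes down to two query shapes. A minimal runnable sketch, using an in-memory SQLite database as a stand-in for PostgreSQL; the `updated_at` column name is an assumption about your schema:

```python
import sqlite3
from datetime import datetime, timedelta

# Stand-in source database with a tiny sample of rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
conn.execute("CREATE TABLE inventory (sku TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2024-01-14T12:00:00"), (2, "2024-01-10T09:00:00")])
conn.execute("INSERT INTO inventory VALUES ('sku-1', 5)")

# Incremental extract: only rows touched in the last 24 hours.
now = datetime(2024, 1, 15)
cutoff = (now - timedelta(hours=24)).isoformat()
incremental = conn.execute(
    "SELECT * FROM orders WHERE updated_at >= ?", (cutoff,)).fetchall()

# Full extract: the complete table, every run.
full = conn.execute("SELECT * FROM inventory").fetchall()
```

Only the order row from the last 24 hours lands in the incremental set, while inventory is copied in full each night.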

2. Add Backup Metadata → Recovery Context

  • Backup date and timestamp
  • Source host and database
  • Pipeline version
  • Processing node ID
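
This guide does not show the exact field names Expanso Edge attaches, so the keys below are illustrative. A sketch of enriching each record with recovery context:

```python
import socket
from datetime import datetime, timezone

def add_backup_metadata(row: dict, pipeline_version: str = "1.0.0") -> dict:
    """Attach recovery context to a record (illustrative key names)."""
    now = datetime.now(timezone.utc)
    return {
        **row,
        "backup_date": now.strftime("%Y-%m-%d"),  # date partition key
        "backup_ts": int(now.timestamp()),        # exact backup timestamp
        "source_host": socket.gethostname(),      # host that produced the row
        "source_db": "production",                # assumed database label
        "pipeline_version": pipeline_version,
        "node_id": "edge-node-01",                # assumed processing node ID
    }

record = add_backup_metadata({"order_id": 42})
```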

3. Calculate Checksums → Integrity Verification

  • MD5 hash of row data
  • Verify integrity during recovery
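
A minimal sketch of the checksum step. Serializing with `sort_keys` gives a canonical form, so the same row hashes to the same digest at backup time and at restore time:

```python
import hashlib
import json

def row_checksum(row: dict) -> str:
    """MD5 hash of a row's canonical JSON serialization."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

row = {"order_id": 42, "total": 99.50}
digest = row_checksum(row)

# On restore: recompute and compare.
assert row_checksum(row) == digest                  # intact row verifies
assert row_checksum({**row, "total": 0}) != digest  # corruption is detected
```

MD5 here is for integrity checking, not security; swap in `hashlib.sha256` if policy requires.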

4. Route to Storage → Table-Specific Paths

  • Organized by table and date
  • Compressed Parquet format
  • Cloud Nearline/Coldline storage class
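
The routing itself is pipeline configuration, but the path scheme can be sketched as a small helper. Bucket name and the `<table>-<unix_ts>.parquet` naming follow the storage layout shown in this guide:

```python
def backup_path(bucket: str, table: str, backup_date: str, ts: int) -> str:
    """Build a table-specific, date-partitioned object path."""
    return f"gs://{bucket}/backups/{table}/{backup_date}/{table}-{ts}.parquet"

path = backup_path("backup-bucket", "orders", "2024-01-15", 1705363200)
# gs://backup-bucket/backups/orders/2024-01-15/orders-1705363200.parquet
```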

Data Flow

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ PostgreSQL β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ orders │──┼──┐
β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ β”‚ inventory │──┼──┼────▢│ Expanso Edge β”‚
β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ β”‚ β”‚ - Add metadata β”‚
β”‚ β”‚order_itemsβ”‚β”€β”€β”Όβ”€β”€β”˜ β”‚ - Calculate checksumβ”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ - Route by table β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β–Ό β–Ό β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ /orders/ β”‚ β”‚/inventoryβ”‚ β”‚ /items/ β”‚
β”‚ 2024-01-15β”‚ β”‚ 2024-01-15β”‚ β”‚ 2024-01-15β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Cloud Storage (Nearline)

Storage Layout

gs://backup-bucket/
├── backups/
│   ├── orders/
│   │   ├── 2024-01-15/
│   │   │   └── orders-1705363200.parquet
│   │   └── 2024-01-16/
│   │       └── orders-1705449600.parquet
│   ├── inventory/
│   │   └── 2024-01-15/
│   │       └── inventory-full.parquet
│   └── order_items/
│       └── 2024-01-15/
│           └── items-1705363200.parquet

Why Parquet + Nearline?

| Choice          | Benefit                              |
| --------------- | ------------------------------------ |
| Parquet         | 70-90% compression, columnar queries |
| Nearline        | $0.01/GB/month (vs $0.02 Standard)   |
| Date partitions | Point-in-time recovery               |
| Checksums       | Verify integrity on restore          |
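
To make the cost claim concrete, a back-of-envelope calculation using the figures above, with the compression ratio assumed at roughly the midpoint of the 70-90% range:

```python
# Rough monthly storage cost for 100 GB of raw table data, using the
# table's figures: ~80% compression, $0.01/GB/mo Nearline vs $0.02 Standard.
raw_gb = 100
compressed_gb = raw_gb * (1 - 0.80)   # ~20 GB of Parquet after compression
nearline_cost = compressed_gb * 0.01  # ~$0.20/month on Nearline
standard_cost = compressed_gb * 0.02  # ~$0.40/month on Standard
```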

What You'll Learn

By the end of this guide, you'll be able to:

βœ… Extract from multiple tables in a single pipeline
βœ… Add backup metadata for recovery context
βœ… Calculate row checksums for integrity verification
βœ… Route to table-specific paths with date partitions
βœ… Schedule nightly runs with cron

Get Started

Option 1: Step-by-Step Tutorial

Build the pipeline incrementally:

  1. Setup Guide - Prerequisites and environment
  2. Step 1: Extract Tables - Multi-table sequence
  3. Step 2: Add Metadata - Backup context
  4. Step 3: Calculate Checksum - Integrity
  5. Step 4: Route to Storage - Table paths

Option 2: Jump to Complete Pipeline

Download the production-ready configuration:

β†’ Get Complete Pipeline

Who This Guide Is For

  • DBAs implementing backup strategies
  • Platform Engineers building DR infrastructure
  • DevOps Teams automating data protection
  • Compliance Teams meeting backup requirements

Prerequisites

  • PostgreSQL (or MySQL/MSSQL) database
  • GCP project with Cloud Storage (or AWS S3)
  • Expanso Edge installed with database network access

Time to Complete

  • Step-by-Step Tutorial: 30-40 minutes
  • Quick Deploy: 5 minutes