# Nightly Database Backup
Simple, reliable replication of database tables to cloud cold storage for disaster recovery.
## The Problem
Your organization needs:
- Reliable backups of critical database tables
- Cloud storage for DR (not on-premises tape)
- Cost efficiency - cold storage classes for rarely-accessed backups
- Verifiable recovery - checksums to confirm data integrity
## The Solution: 4 Backup Steps
This pipeline provides production-ready database backups:
### 1. Extract Multiple Tables → Sequence Input
- Orders (incremental - last 24 hours)
- Inventory (full - complete table)
- Order items (incremental - last 24 hours)
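The full-vs-incremental split above can be sketched as query templates. This is a minimal illustration; the table names match the guide, but the timestamp column (`updated_at`) and the exact SQL the pipeline issues are assumptions.

```python
# Sketch of per-table extraction queries. The `updated_at` column and the
# exact SQL shape are illustrative assumptions, not the pipeline's config.

def extraction_query(table: str, incremental: bool) -> str:
    """Build a full or incremental (last 24 hours) SELECT for one table."""
    if incremental:
        # Incremental: only rows touched since the previous nightly run.
        return (f"SELECT * FROM {table} "
                f"WHERE updated_at >= NOW() - INTERVAL '24 hours'")
    # Full: the complete table, used for the inventory snapshot.
    return f"SELECT * FROM {table}"

queries = {
    "orders": extraction_query("orders", incremental=True),
    "inventory": extraction_query("inventory", incremental=False),
    "order_items": extraction_query("order_items", incremental=True),
}
```

Incremental extracts keep nightly volume proportional to the day's changes, while the full inventory snapshot guarantees a complete restore point for data without reliable change timestamps.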
### 2. Add Backup Metadata → Recovery Context
- Backup date and timestamp
- Source host and database
- Pipeline version
- Processing node ID
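The recovery-context fields above could look roughly like this. The field names and structure are illustrative assumptions, not the pipeline's actual schema.

```python
from datetime import datetime, timezone

# Illustrative sketch of the recovery-context metadata attached to each
# backup. Field names are assumptions, not the pipeline's actual schema.
def backup_metadata(source_host: str, database: str,
                    pipeline_version: str, node_id: str) -> dict:
    now = datetime.now(timezone.utc)
    return {
        "backup_date": now.strftime("%Y-%m-%d"),  # drives date partitioning
        "backup_timestamp": now.isoformat(),      # exact extraction time
        "source_host": source_host,               # where the data came from
        "source_database": database,
        "pipeline_version": pipeline_version,     # reproduce the exact run
        "node_id": node_id,                       # which edge node processed it
    }

meta = backup_metadata("db1.internal", "shop", "1.0.0", "edge-node-a")
```

During recovery, this context tells you exactly which host, database, and pipeline build produced a given backup file.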
### 3. Calculate Checksums → Integrity Verification
- MD5 hash of row data
- Verify integrity during recovery
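A minimal sketch of row checksumming: serialize each row deterministically, then hash it. The exact serialization the pipeline uses is an assumption; what matters is that the same row always produces the same digest.

```python
import hashlib
import json

# Sketch of a per-row MD5 checksum. The serialization format is an
# illustrative assumption; the key property is determinism.
def row_checksum(row: dict) -> str:
    # sort_keys makes the JSON stable so key order never changes the hash
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

checksum = row_checksum({"order_id": 42, "total": "19.99"})
```

On restore, recompute the checksum for each row and compare it to the stored value; any mismatch flags corruption before the data re-enters production.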
### 4. Route to Storage → Table-Specific Paths
- Organized by table and date
- Compressed Parquet format
- Cloud Nearline/Coldline storage class
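The routing step amounts to a path template keyed by table and date. A sketch of that template follows; the function name and the full-snapshot naming convention are assumptions made to mirror the storage layout in this guide.

```python
# Sketch of table-specific path routing. The naming convention
# (timestamped incrementals, "-full" snapshots) is an assumption
# mirroring the storage layout shown in this guide.
def backup_path(bucket: str, table: str, date: str, ts: int,
                full: bool = False) -> str:
    """Build gs://<bucket>/backups/<table>/<date>/<file>.parquet."""
    # Full snapshots get a stable name; incrementals are timestamped.
    name = f"{table}-full" if full else f"{table}-{ts}"
    return f"gs://{bucket}/backups/{table}/{date}/{name}.parquet"

path = backup_path("backup-bucket", "orders", "2024-01-15", 1705363200)
```

Date-partitioned paths make point-in-time recovery a matter of listing one prefix rather than scanning the whole bucket.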
## Data Flow
```text
┌─────────────────┐
│   PostgreSQL    │
│ ┌─────────────┐ │
│ │   orders    │─┼─┐
│ ├─────────────┤ │ │    ┌──────────────────────┐
│ │  inventory  │─┼─┼───►│ Expanso Edge         │
│ ├─────────────┤ │ │    │ - Add metadata       │
│ │ order_items │─┼─┘    │ - Calculate checksum │
│ └─────────────┘ │      │ - Route by table     │
└─────────────────┘      └──────────┬───────────┘
                                    │
                 ┌──────────────────┼──────────────────┐
                 ▼                  ▼                  ▼
          ┌────────────┐    ┌────────────┐    ┌────────────┐
          │  /orders/  │    │ /inventory/│    │  /items/   │
          │ 2024-01-15 │    │ 2024-01-15 │    │ 2024-01-15 │
          └────────────┘    └────────────┘    └────────────┘
                    Cloud Storage (Nearline)
```
## Storage Layout
```text
gs://backup-bucket/
└── backups/
    ├── orders/
    │   ├── 2024-01-15/
    │   │   └── orders-1705363200.parquet
    │   └── 2024-01-16/
    │       └── orders-1705449600.parquet
    ├── inventory/
    │   └── 2024-01-15/
    │       └── inventory-full.parquet
    └── order_items/
        └── 2024-01-15/
            └── items-1705363200.parquet
```
## Why Parquet + Nearline?
| Choice | Benefit |
|---|---|
| Parquet | 70-90% compression, columnar queries |
| Nearline | $0.01/GB/month (vs $0.02 Standard) |
| Date partitions | Point-in-time recovery |
| Checksums | Verify integrity on restore |
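To make the cost claim concrete, here is rough arithmetic using the per-GB rates from the table above. The backup size and the 80% compression ratio are illustrative assumptions.

```python
# Rough monthly cost comparison using the per-GB rates from the table above.
# Backup size and compression ratio are illustrative assumptions.
raw_gb = 500                    # uncompressed nightly backup set
compressed_gb = raw_gb * 0.2    # ~80% compression (midpoint of 70-90%)

standard_cost = compressed_gb * 0.02   # $/month at $0.02/GB (Standard)
nearline_cost = compressed_gb * 0.01   # $/month at $0.01/GB (Nearline)

print(f"Standard: ${standard_cost:.2f}/mo, Nearline: ${nearline_cost:.2f}/mo")
```

Parquet compression and the cheaper storage class compound: in this sketch the combination cuts the raw Standard-class cost by an order of magnitude.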
## What You'll Learn
By the end of this guide, you'll be able to:
- ✅ Extract from multiple tables in a single pipeline
- ✅ Add backup metadata for recovery context
- ✅ Calculate row checksums for integrity verification
- ✅ Route to table-specific paths with date partitions
- ✅ Schedule nightly runs with cron
## Get Started
### Option 1: Step-by-Step Tutorial (Recommended)
Build the pipeline incrementally:
- Setup Guide - Prerequisites and environment
- Step 1: Extract Tables - Multi-table sequence
- Step 2: Add Metadata - Backup context
- Step 3: Calculate Checksum - Integrity
- Step 4: Route to Storage - Table paths
### Option 2: Jump to Complete Pipeline
Download the production-ready configuration:
## Who This Guide Is For
- DBAs implementing backup strategies
- Platform Engineers building DR infrastructure
- DevOps Teams automating data protection
- Compliance Teams meeting backup requirements
## Prerequisites
- PostgreSQL (or MySQL/MSSQL) database
- GCP project with Cloud Storage (or AWS S3)
- Expanso Edge installed with database network access
## Time to Complete
- Step-by-Step Tutorial: 30-40 minutes
- Quick Deploy: 5 minutes