# Troubleshooting

## Quick Diagnosis
```bash
# Check pipeline status
expanso-cli status

# View recent errors
expanso-cli job logs --tail 50 2>&1 | grep -i error

# Test database connectivity
psql "postgres://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:5432/${DB_NAME}" \
  -c "SELECT 1;"

# Test GCS connectivity
gsutil ls "gs://${GCS_BACKUP_BUCKET}/"
```
## Database Issues

### Connection refused

**Symptoms:**

```
Error: dial tcp: connection refused
```
**Fixes:**

1. Check network connectivity:

   ```bash
   telnet ${DB_HOST} 5432
   ```

2. Verify that firewall rules allow the edge node to reach the database.

3. Check that the database is running:

   ```bash
   pg_isready -h ${DB_HOST}
   ```
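If `telnet` isn't installed on the edge node, the same reachability check can be sketched with a short stdlib-only Python probe (the host name below is a hypothetical placeholder for your `DB_HOST`):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS failures.
        return False

# Probe the database port (replace the host with your DB_HOST).
if tcp_reachable("db.example.internal", 5432):
    print("database port reachable")
else:
    print("connection refused or timed out; check firewall rules")
```

A `False` result distinguishes a network/firewall problem from a credentials problem: if the port is reachable but `psql` still fails, look at authentication instead.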
### Permission denied on table

**Symptoms:**

```
Error: permission denied for table orders
```

**Fix:**

```sql
-- Grant SELECT to the backup user
GRANT SELECT ON orders, inventory, order_items TO backup_reader;

-- Or grant on all tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO backup_reader;
```
### No rows returned (empty backup)

**Symptoms:** Backup completes, but the output files are tiny or empty.

**Cause:** The incremental filter is too restrictive.

**Fix:**

```yaml
# Check your WHERE clause
where: "updated_at >= CURRENT_DATE - INTERVAL '1 day'"
```

```bash
# Debug: count the rows that match the filter
psql -c "SELECT COUNT(*) FROM orders WHERE updated_at >= CURRENT_DATE - INTERVAL '1 day';"
```
## Storage Issues

### GCS permission denied

**Symptoms:**

```
Error: googleapi: Error 403: Access denied
```
**Fixes:**

1. Check authentication:

   ```bash
   gcloud auth list
   gcloud auth application-default login
   ```

2. Verify bucket permissions:

   ```bash
   gsutil iam get gs://${GCS_BACKUP_BUCKET}
   ```

3. Confirm the required role is granted: `roles/storage.objectCreator`.
### Bucket doesn't exist

**Symptoms:**

```
Error: bucket does not exist
```

**Fix:**

```bash
# Create the bucket
gsutil mb -c NEARLINE -l US gs://${GCS_BACKUP_BUCKET}

# Or check for a typo in the name
echo "$GCS_BACKUP_BUCKET"
```
### Parquet encoding failed

**Symptoms:**

```
Error: failed to encode parquet
```

**Cause:** Data type incompatibility.

**Fixes:**

- Check for NULL values in non-nullable fields
- Ensure types are consistent across rows
- Fall back to JSON output:

  ```yaml
  # Temporarily disable Parquet
  output:
    gcp_cloud_storage:
      path: "...data.json"
      content_type: application/json
      # Remove the parquet_encoding section
  ```
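Before re-enabling Parquet, a quick pre-flight scan over a sample of rows can spot fields whose types drift between rows (a minimal stdlib-only sketch; the field names below are hypothetical):

```python
from collections import defaultdict

def inconsistent_fields(rows):
    """Return {field: set of Python type names} for fields seen with more
    than one type across rows. None values are ignored, since those are a
    nullability question rather than a type-drift one."""
    seen = defaultdict(set)
    for row in rows:
        for field, value in row.items():
            if value is not None:
                seen[field].add(type(value).__name__)
    return {f: types for f, types in seen.items() if len(types) > 1}

rows = [
    {"order_id": 1, "total": 19.99},
    {"order_id": "2", "total": 24.50},  # order_id drifted to a string
]
print(inconsistent_fields(rows))  # e.g. {'order_id': {'int', 'str'}}
```

Any field reported here is a likely candidate for the encoder failure; cast it to a single type in the pipeline (or in the source query) before switching back from JSON to Parquet.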
## Performance Issues

### Backup taking too long

**Diagnosis:**

```bash
expanso-cli node list
```
**Fixes:**

1. Increase the batch size:

   ```yaml
   batching:
     count: 50000  # Increase from 10000
     period: 120s
   ```

2. Add parallelism:

   ```yaml
   pipeline:
     threads: 4
   ```

3. Optimize the database query:

   ```sql
   -- Add an index on the incremental column
   CREATE INDEX idx_orders_updated_at ON orders(updated_at);
   ```
### Running out of memory

**Symptoms:**

```
Error: out of memory
```

**Fixes:**

1. Reduce the batch size:

   ```yaml
   batching:
     count: 1000  # Smaller batches
   ```

2. Set a memory limit:

   ```bash
   expanso-edge run --config backup.yaml --memory-limit 512MB
   ```
## Checksum Issues

### Checksums don't match on restore

**Cause:** Field order changed between serializations, or floating-point values were rendered with different precision.

**Fix:** Ensure consistent JSON serialization:

```
# Sort keys for consistent ordering
root._checksum = $data_fields.format_json({"sort_keys": true}).hash("md5")
```
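The idea behind sorted-key serialization can be sketched in Python (a minimal illustration with made-up record fields; the pipeline itself computes the hash in its mapping language):

```python
import hashlib
import json

def checksum(record: dict) -> str:
    """MD5 over a canonical JSON form: keys sorted, compact separators."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

# Same fields in a different insertion order still hash identically...
a = {"order_id": 42, "total": 19.99, "status": "shipped"}
b = {"status": "shipped", "order_id": 42, "total": 19.99}
assert checksum(a) == checksum(b)

# ...while any change in a value produces a different checksum.
c = {"order_id": 42, "total": 20.00, "status": "shipped"}
assert checksum(a) != checksum(c)
```

Without `sort_keys`, two serializations of the same record can differ byte-for-byte, so the hashes diverge even though the data is identical; that is exactly the mismatch seen on restore.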
## Recovery Issues

### Can't read Parquet file

**Fix:** Install pyarrow (or fastparquet):

```bash
pip install pyarrow pandas

# Then read the file
python -c "import pandas; print(pandas.read_parquet('file.parquet').head())"
```
### Restore to wrong table

**Prevention:** Always restore to a staging table first:

```sql
-- Restore to a staging table
CREATE TABLE orders_restore_2024_01_15 AS SELECT * FROM ...;

-- Verify row counts match the backup metadata
SELECT COUNT(*) FROM orders_restore_2024_01_15;

-- Then swap if needed
ALTER TABLE orders RENAME TO orders_old;
ALTER TABLE orders_restore_2024_01_15 RENAME TO orders;
```
## Still Stuck?

1. Enable debug logging:

   ```yaml
   logger:
     level: DEBUG
   ```

2. Test a single table:

   ```bash
   # Isolate the problem
   expanso-edge run --config backup.yaml \
     --set 'input.sequence.inputs=[input.sequence.inputs[0]]'
   ```

3. Check the Complete Pipeline for reference.