# Troubleshooting

## Quick Diagnosis
```bash
# Check pipeline status
expanso-cli status

# View recent errors
expanso-cli job logs --tail 50 2>&1 | grep -i error

# Test database connectivity
psql "postgres://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:5432/${DB_NAME}" \
  -c "SELECT 1;"

# Test GCS connectivity
gsutil ls "gs://${GCS_BACKUP_BUCKET}/"
```
## Database Issues

### Connection refused

**Symptoms:**

```
Error: dial tcp: connection refused
```
**Fixes:**

1. Check network connectivity:

   ```bash
   telnet ${DB_HOST} 5432
   ```

2. Verify that firewall rules allow the edge node to reach the database.

3. Check that the database is running:

   ```bash
   pg_isready -h ${DB_HOST}
   ```
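If `telnet` isn't installed on the edge node, the same reachability check can be sketched with a short stdlib-only Python probe (the host name below is a hypothetical placeholder for your `DB_HOST`):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS failures.
        return False

# Probe the database port (replace the host with your DB_HOST).
if tcp_reachable("db.example.internal", 5432):
    print("database port reachable")
else:
    print("connection refused or timed out; check firewall rules")
```

A `False` result distinguishes a network/firewall problem from a credentials problem: if the port is reachable but `psql` still fails, look at authentication instead.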
### Permission denied on table

**Symptoms:**

```
Error: permission denied for table orders
```

**Fix:**

```sql
-- Grant SELECT to the backup user
GRANT SELECT ON orders, inventory, order_items TO backup_reader;

-- Or grant on all tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO backup_reader;
```
### No rows returned (empty backup)

**Symptoms:** Backup completes, but the output files are tiny or empty.

**Cause:** The incremental filter is too restrictive.

**Fix:**

```yaml
# Check your WHERE clause
where: "updated_at >= CURRENT_DATE - INTERVAL '1 day'"
```

```bash
# Debug: count the rows that match the filter
psql -c "SELECT COUNT(*) FROM orders WHERE updated_at >= CURRENT_DATE - INTERVAL '1 day';"
```
## Storage Issues

### GCS permission denied

**Symptoms:**

```
Error: googleapi: Error 403: Access denied
```
**Fixes:**

1. Check authentication:

   ```bash
   gcloud auth list
   gcloud auth application-default login
   ```

2. Verify bucket permissions:

   ```bash
   gsutil iam get gs://${GCS_BACKUP_BUCKET}
   ```

3. Confirm the required role is granted: `roles/storage.objectCreator`.
### Bucket doesn't exist

**Symptoms:**

```
Error: bucket does not exist
```

**Fix:**

```bash
# Create the bucket
gsutil mb -c NEARLINE -l US gs://${GCS_BACKUP_BUCKET}

# Or check for a typo in the name
echo "$GCS_BACKUP_BUCKET"
```
### Parquet encoding failed

**Symptoms:**

```
Error: failed to encode parquet
```

**Cause:** Data type incompatibility.

**Fixes:**

- Check for NULL values in non-nullable fields
- Ensure types are consistent across rows
- Fall back to JSON output:

  ```yaml
  # Temporarily disable Parquet
  output:
    gcp_cloud_storage:
      path: "...data.json"
      content_type: application/json
      # Remove the parquet_encoding section
  ```
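Before re-enabling Parquet, a quick pre-flight scan over a sample of rows can spot fields whose types drift between rows (a minimal stdlib-only sketch; the field names below are hypothetical):

```python
from collections import defaultdict

def inconsistent_fields(rows):
    """Return {field: set of Python type names} for fields seen with more
    than one type across rows. None values are ignored, since those are a
    nullability question rather than a type-drift one."""
    seen = defaultdict(set)
    for row in rows:
        for field, value in row.items():
            if value is not None:
                seen[field].add(type(value).__name__)
    return {f: types for f, types in seen.items() if len(types) > 1}

rows = [
    {"order_id": 1, "total": 19.99},
    {"order_id": "2", "total": 24.50},  # order_id drifted to a string
]
print(inconsistent_fields(rows))  # e.g. {'order_id': {'int', 'str'}}
```

Any field reported here is a likely candidate for the encoder failure; cast it to a single type in the pipeline (or in the source query) before switching back from JSON to Parquet.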
## Performance Issues

### Backup taking too long

**Diagnosis:**

```bash
expanso-cli node list
```
**Fixes:**

1. Increase the batch size:

   ```yaml
   batching:
     count: 50000  # Increase from 10000
     period: 120s
   ```

2. Add parallelism:

   ```yaml
   pipeline:
     threads: 4
   ```

3. Optimize the database query:

   ```sql
   -- Add an index on the incremental column
   CREATE INDEX idx_orders_updated_at ON orders(updated_at);
   ```
### Running out of memory

**Symptoms:**

```
Error: out of memory
```

**Fixes:**

1. Reduce the batch size:

   ```yaml
   batching:
     count: 1000  # Smaller batches
   ```

2. Set a memory limit:

   ```bash
   expanso-edge run --config backup.yaml --memory-limit 512MB
   ```
## Checksum Issues

### Checksums don't match on restore

**Cause:** Field order changed between serializations, or floating-point values were rendered with different precision.

**Fix:** Ensure consistent JSON serialization:

```
# Sort keys for consistent ordering
root._checksum = $data_fields.format_json({"sort_keys": true}).hash("md5")
```
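The idea behind sorted-key serialization can be sketched in Python (a minimal illustration with made-up record fields; the pipeline itself computes the hash in its mapping language):

```python
import hashlib
import json

def checksum(record: dict) -> str:
    """MD5 over a canonical JSON form: keys sorted, compact separators."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

# Same fields in a different insertion order still hash identically...
a = {"order_id": 42, "total": 19.99, "status": "shipped"}
b = {"status": "shipped", "order_id": 42, "total": 19.99}
assert checksum(a) == checksum(b)

# ...while any change in a value produces a different checksum.
c = {"order_id": 42, "total": 20.00, "status": "shipped"}
assert checksum(a) != checksum(c)
```

Without `sort_keys`, two serializations of the same record can differ byte-for-byte, so the hashes diverge even though the data is identical; that is exactly the mismatch seen on restore.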
## Recovery Issues

### Can't read Parquet file

**Fix:** Install pyarrow (or fastparquet):

```bash
pip install pyarrow pandas

# Then read the file
python -c "import pandas; print(pandas.read_parquet('file.parquet').head())"
```
### Restore to wrong table

**Prevention:** Always restore to a staging table first:

```sql
-- Restore to a staging table
CREATE TABLE orders_restore_2024_01_15 AS SELECT * FROM ...;

-- Verify row counts match the backup metadata
SELECT COUNT(*) FROM orders_restore_2024_01_15;

-- Then swap if needed
ALTER TABLE orders RENAME TO orders_old;
ALTER TABLE orders_restore_2024_01_15 RENAME TO orders;
```
## Still Stuck?

1. Enable debug logging:

   ```yaml
   logger:
     level: DEBUG
   ```

2. Test a single table:

   ```bash
   # Isolate the problem
   expanso-edge run --config backup.yaml \
     --set 'input.sequence.inputs=[input.sequence.inputs[0]]'
   ```

3. Check the Complete Pipeline for reference.