Troubleshooting
Quick Diagnosis
# Check pipeline status
expanso-cli status
# View recent errors
expanso-cli job logs --tail 50 2>&1 | grep -i error
# Test EU database connectivity
psql "postgres://${DB_USER}:${DB_PASSWORD}@${EU_DB_HOST}:5432/transactions_eu" \
-c "SELECT COUNT(*) FROM customer_transactions LIMIT 1;"
# Test GCP connectivity
bq query "SELECT 1"
gsutil ls gs://${EU_ARCHIVE_BUCKET}
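The commands above depend on several environment variables being exported. As a rough preflight sketch, the hypothetical helper below (`check_required_env` is not part of `expanso-cli`) reports which of the variables named in this guide are missing. It prints names only, so secret values never reach the terminal.

```shell
# Report which required variables are missing from the environment.
# Prints variable names only, never their values.
check_required_env() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is also all a child
    # process such as the pipeline would see.
    if [ -z "$(printenv "$name")" ]; then
      echo "MISSING: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Variable names taken from the commands in this guide.
check_required_env DB_USER DB_PASSWORD EU_DB_HOST EU_ARCHIVE_BUCKET ANONYMIZATION_SALT \
  || echo "Export the variables listed above before running the pipeline."
```

A variable that is set in the shell but not exported is reported as missing here, which matches what a child process would observe.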
Validation Failures
GDPR VIOLATION: PII fields still present
Symptoms:
Error: GDPR VIOLATION: PII fields still present: customer_email, iban
Cause: The anonymization step didn't run, or the field names it removes don't match the field names in the incoming data.
Fix:
- Check field name case sensitivity:
  # The database may return column names in a different case
  root = root.without("customer_email") # Ensure exact match
  root = root.without("CUSTOMER_EMAIL") # Try uppercase if needed
- Verify processor order:
  pipeline:
    processors:
      - mapping: # Step 4: Hash identifiers MUST run
      - mapping: # Step 6: Validate MUST come after
- Debug by printing intermediate state:
  - mapping: |
      root = this
      root._debug_keys = this.keys() # See what fields exist
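Because a case mismatch is a common cause here, it can help to scan a sample record outside the pipeline. The sketch below is a hypothetical helper (`find_pii_keys` is an illustrative name) that looks for the two field names from the error message in any letter case. It is a plain text match on JSON keys rather than a real JSON parser, so treat it as a quick check only.

```shell
# Print every known PII key found in the JSON on stdin, ignoring case.
# The key list is taken from the example error message; extend it as needed.
find_pii_keys() {
  grep -oiE '"(customer_email|iban)"[[:space:]]*:' \
    | sed -e 's/^"//' -e 's/"[[:space:]]*:$//'
}

# Example: an uppercase column name that an exact-match without() would miss.
echo '{"CUSTOMER_EMAIL":"user@example.com","amount":12.5}' | find_pii_keys
# prints: CUSTOMER_EMAIL
```

Empty output means none of the listed keys were found in the record.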
Age bucket returning "unknown"
Symptoms:
{"customer_age_bucket": "unknown"}
Cause: The DOB field is null, empty, or in a format the parser doesn't expect.
Fix:
# Handle multiple date formats; fall back to null when none of them match.
# A null or empty customer_dob fails every parse and also ends up as null.
let dob = this.customer_dob.ts_parse("2006-01-02").catch(
  this.customer_dob.ts_parse("02/01/2006").catch(
    this.customer_dob.ts_parse("January 2, 2006").catch(null)
  )
)
root.customer_age_bucket = if $dob == null {
  "unknown"
} else {
  # ... age calculation
}
Hash Consistency Issues
Anonymized IDs different across runs
Symptoms: Same customer ID produces different hashes in different runs.
Cause: Salt not set or varying between executions.
Fix:
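- Verify salt is set (without printing its value):
  [ -n "${ANONYMIZATION_SALT:-}" ] && echo "ANONYMIZATION_SALT is set" || echo "ANONYMIZATION_SALT: NOT SET"
- Persist salt properly:
  # Store in secrets manager, not env file
  vault kv put secret/gdpr-pipeline salt="${ANONYMIZATION_SALT}"
- Verify in pipeline:
  - mapping: |
      root = this
      # env() yields null for an unset variable, so default to "" before comparing
      let salt = env("ANONYMIZATION_SALT").or("")
      root._debug_salt_set = $salt != ""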
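To see why a changing salt breaks consistency, the following sketch hashes the same customer ID with two salts. It assumes SHA-256 over the string `<salt>:<id>`, computed with `sha256sum`. The pipeline's actual hash construction may differ, and `hash_id` is a hypothetical helper used only for illustration.

```shell
# Hypothetical helper: SHA-256 over "<salt>:<id>". This is an assumption
# for illustration; the real pipeline may combine salt and ID differently.
hash_id() {
  printf '%s:%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

a="$(hash_id "salt-one" "cust-123")"
b="$(hash_id "salt-one" "cust-123")"
c="$(hash_id "salt-two" "cust-123")"

if [ "$a" = "$b" ]; then echo "same salt: hashes match"; fi
if [ "$a" != "$c" ]; then echo "different salt: hashes differ"; fi
```

The same reasoning applies to an empty salt: a run where the variable is unset hashes against `""` and produces IDs that cannot be joined with earlier runs.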
IP subnet extraction failing
Symptoms:
{"ip_subnet": "unknown"}
Cause: IPv6 address or malformed IP.
Fix:
root.ip_subnet = if this.ip_address == null || this.ip_address == "" {
  # Check for missing values first; calling contains() on null raises an error
  "not_provided"
} else if this.ip_address.contains(":") {
  # IPv6 (assumes the address is written in full, without "::" compression)
  this.ip_address.split(":").slice(0, 4).join(":") + "::/64"
} else if this.ip_address.contains(".") {
  # IPv4
  this.ip_address.split(".").slice(0, 2).join(".") + ".0.0/16"
} else {
  "invalid_format"
}
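When investigating which addresses fall through to "unknown", it can be useful to reproduce the IPv4 branch outside the pipeline. The sketch below is a hypothetical shell helper (`ipv4_subnet16`) that applies the same /16 truncation. Unlike the mapping above, it also validates that the input has four octets in the 0-255 range, so it is stricter than the pipeline logic.

```shell
# Truncate a dotted-quad IPv4 address to its /16 network, or print
# "invalid_format" and return 1 when the input is not a valid IPv4 address.
ipv4_subnet16() {
  ip="$1"
  # Require exactly four dot-separated groups of 1-3 digits.
  if ! printf '%s\n' "$ip" | grep -qE '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "invalid_format"
    return 1
  fi
  # Split on dots into the positional parameters $1..$4.
  old_ifs="$IFS"; IFS=.; set -- $ip; IFS="$old_ifs"
  for octet in "$@"; do
    if [ "$octet" -gt 255 ]; then
      echo "invalid_format"
      return 1
    fi
  done
  echo "$1.$2.0.0/16"
}

ipv4_subnet16 192.168.1.10   # prints 192.168.0.0/16
```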
Destination Issues
BigQuery authentication failed
Symptoms:
Error: google: could not find default credentials
Fix:
# Set up credentials
gcloud auth application-default login
# Or use service account
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Verify
gcloud auth list
EU archive bucket not in EU region
Symptoms: Compliance risk, with data leaving the EU before anonymization.
Fix:
# Check bucket location
gsutil ls -L -b gs://${EU_ARCHIVE_BUCKET} | grep "Location"
# If not EU, create new bucket
gsutil mb -l EU gs://new-eu-archive-bucket
# Update config
export EU_ARCHIVE_BUCKET=new-eu-archive-bucket
Audit log directory not writable
Symptoms:
Error: open /var/log/expanso/gdpr-audit-2024-01-15.jsonl: permission denied
Fix:
# Create directory with correct permissions
sudo mkdir -p /var/log/expanso
sudo chown "$(id -un):$(id -gn)" /var/log/expanso
# Or change log path in config
file:
path: "/tmp/expanso/gdpr-audit-${!timestamp_format(\"2006-01-02\")}.jsonl"
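Before restarting the pipeline, a quick check can confirm that the audit directory exists and is writable by the current user. This is a minimal sketch; `check_log_dir` is a hypothetical helper, and the path is the one from the error message above.

```shell
# Report whether a directory exists and is writable by the current user.
check_log_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir"
    return 1
  fi
  echo "ok: $dir"
}

check_log_dir /var/log/expanso \
  || echo "Fix the directory or point the file output at a different path."
```

Run it as the same user that runs the pipeline; a check performed as root reports every existing directory as writable.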
Performance Issues
Pipeline processing slowly
Diagnosis:
expanso-cli node list
Common fixes:
- Increase batch size:
  output:
    gcp_bigquery:
      batching:
        count: 1000 # Increase from 500
        period: 60s
- Add parallel processing:
  pipeline:
    threads: 4
- Optimize database query:
  input:
    sql_select:
      # Add index on transaction_timestamp
      where: "transaction_timestamp >= NOW() - INTERVAL '1 hour'"
Compliance Verification
How to prove anonymization is complete?
Generate compliance report:
-- Verify no PII in global dataset
SELECT
COUNT(*) as total_records,
COUNTIF(_gdpr_compliance.anonymization_verified = true) as verified,
COUNTIF(_gdpr_compliance.anonymization_verified != true OR _gdpr_compliance.anonymization_verified IS NULL) as unverified
FROM `global_analytics.anonymized_transactions`
WHERE DATE(transaction_hour) = CURRENT_DATE();
Export audit trail:
# Export today's audit log
gsutil cp /var/log/expanso/gdpr-audit-$(date +%Y-%m-%d).jsonl \
gs://${EU_ARCHIVE_BUCKET}/audit/$(date +%Y/%m/%d)/
Still Stuck?
- Enable debug logging:
  logger:
    level: DEBUG
    format: json
- Test each step individually:
  echo '{"customer_email":"user@example.com"}' | \
    expanso-edge run --config step-4-hash.yaml
- Check the Complete Pipeline for reference.
- Review GDPR guidance: ICO Guide to Anonymisation.