# Troubleshooting

## Quick Diagnosis

```bash
# Check pipeline status
expanso-cli status

# View recent errors
expanso-cli job logs --tail 50 2>&1 | grep -i error

# Test EU database connectivity
psql "postgres://${DB_USER}:${DB_PASSWORD}@${EU_DB_HOST}:5432/transactions_eu" \
  -c "SELECT COUNT(*) FROM customer_transactions;"

# Test GCP connectivity
bq query "SELECT 1"
gsutil ls gs://${EU_ARCHIVE_BUCKET}
```

## Validation Failures

### GDPR VIOLATION: PII fields still present

Symptoms:

```text
Error: GDPR VIOLATION: PII fields still present: customer_email, iban
```

Cause: The anonymization step didn't run, or field names don't match.

Fix:

1. Check field name case sensitivity:

   ```
   # Database may return different case
   root = root.without("customer_email") # Ensure exact match
   root = root.without("CUSTOMER_EMAIL") # Try uppercase if needed
   ```

2. Verify processor order:

   ```yaml
   pipeline:
     processors:
       - mapping: # Step 4: Hash identifiers MUST run
       - mapping: # Step 6: Validate MUST come after
   ```

3. Debug by printing intermediate state:

   ```yaml
   - mapping: |
       root = this
       root._debug_keys = this.keys() # See what fields exist
   ```
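If you are unsure which casing the source database emits, a quick standalone check can flag residual PII keys regardless of case. This is a Python sketch run outside the pipeline, and the PII field list here is an illustrative assumption, not the pipeline's actual schema:

```python
# Flag residual PII keys in a record, ignoring case.
# PII_FIELDS is illustrative; substitute your own field list.
PII_FIELDS = {"customer_email", "iban", "customer_dob", "ip_address"}

def residual_pii(record: dict) -> list[str]:
    """Return any keys matching a known PII field name, case-insensitively."""
    return sorted(k for k in record if k.lower() in PII_FIELDS)

# A case mismatch that would slip past an exact-match .without() is still caught:
print(residual_pii({"CUSTOMER_EMAIL": "[email protected]", "amount": 42.0}))
```

Running this against a sample of pipeline output quickly tells you whether the violation is a casing problem or a genuinely missing anonymization step.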

### Age bucket returning "unknown"

Symptoms:

```json
{"customer_age_bucket": "unknown"}
```

Cause: The DOB field is null, empty, or in an unexpected format.

Fix:

```
# Handle multiple date formats
root.customer_age_bucket = if this.customer_dob == null || this.customer_dob == "" {
  "unknown"
} else {
  # Try parsing with fallback formats
  let dob = this.customer_dob.ts_parse("2006-01-02").catch(
    this.customer_dob.ts_parse("02/01/2006").catch(
      this.customer_dob.ts_parse("January 2, 2006")
    )
  )
  # ... age calculation
}
```
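The elided age calculation can be sketched as follows. This is a standalone Python sketch, and the bucket boundaries are assumptions; match them to whatever buckets your schema actually defines:

```python
from datetime import date

def age_bucket(dob: date, as_of: date) -> str:
    """Map a date of birth to a coarse age bucket (boundaries are illustrative)."""
    # Subtract one if the birthday hasn't occurred yet this year.
    age = as_of.year - dob.year - ((as_of.month, as_of.day) < (dob.month, dob.day))
    if age < 18:
        return "under-18"
    for lo, hi in [(18, 24), (25, 34), (35, 44), (45, 54), (55, 64)]:
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    return "65-plus"

print(age_bucket(date(1990, 5, 1), as_of=date(2024, 1, 1)))  # age 33 -> "25-34"
```

Passing `as_of` explicitly (rather than reading the clock) keeps the bucketing reproducible across re-runs of the same batch.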

## Hash Consistency Issues

### Anonymized IDs different across runs

Symptoms: The same customer ID produces different hashes in different runs.

Cause: Salt not set, or varying between executions.

Fix:

1. Verify the salt is set:

   ```bash
   echo "ANONYMIZATION_SALT: ${ANONYMIZATION_SALT:-NOT SET}"
   ```

2. Persist the salt properly:

   ```bash
   # Store in a secrets manager, not an env file
   vault kv put secret/gdpr-pipeline salt="${ANONYMIZATION_SALT}"
   ```

3. Verify in the pipeline:

   ```yaml
   - mapping: |
       let salt = env("ANONYMIZATION_SALT")
       root._debug_salt_set = $salt != ""
   ```
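To see why a stable salt matters, here is a minimal Python sketch of salted SHA-256 pseudonymization. The salt-plus-ID concatenation scheme is an assumption for illustration, not necessarily the exact scheme the pipeline uses:

```python
import hashlib

def pseudonymize(customer_id: str, salt: str) -> str:
    """Deterministic salted SHA-256 pseudonym (illustrative scheme)."""
    return hashlib.sha256((salt + customer_id).encode("utf-8")).hexdigest()

a = pseudonymize("cust-12345", salt="fixed-salt")
b = pseudonymize("cust-12345", salt="fixed-salt")
c = pseudonymize("cust-12345", salt="other-salt")
print(a == b)  # True: same salt, same pseudonym every run
print(a == c)  # False: a changed salt silently breaks cross-run joins
```

The second comparison is the failure mode described above: nothing errors, but the same customer can no longer be joined across batches.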

### IP subnet extraction failing

Symptoms:

```json
{"ip_subnet": "unknown"}
```

Cause: IPv6 address or malformed IP.

Fix (check for null/empty first, so string methods are never called on a missing value):

```
root.ip_subnet = if this.ip_address == null || this.ip_address == "" {
  "not_provided"
} else if this.ip_address.contains(":") {
  # IPv6: keep the first four groups (/64)
  this.ip_address.split(":").slice(0, 4).join(":") + "::/64"
} else if this.ip_address.contains(".") {
  # IPv4: keep the first two octets (/16)
  this.ip_address.split(".").slice(0, 2).join(".") + ".0.0/16"
} else {
  "invalid_format"
}
```

## Destination Issues

### BigQuery authentication failed

Symptoms:

```text
Error: google: could not find default credentials
```

Fix:

```bash
# Set up credentials
gcloud auth application-default login

# Or use a service account
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Verify
gcloud auth list
```

### EU archive bucket not in EU region

Symptoms: Compliance risk; data is leaving the EU before anonymization.

Fix:

```bash
# Check bucket location
gsutil ls -L -b gs://${EU_ARCHIVE_BUCKET} | grep "Location"

# If not EU, create a new bucket
gsutil mb -l EU gs://new-eu-archive-bucket

# Update config
export EU_ARCHIVE_BUCKET=new-eu-archive-bucket
```

### Audit log directory not writable

Symptoms:

```text
Error: open /var/log/expanso/gdpr-audit-2024-01-15.jsonl: permission denied
```

Fix:

```bash
# Create the directory with correct permissions
sudo mkdir -p /var/log/expanso
sudo chown $(whoami):$(whoami) /var/log/expanso
```

Or change the log path in the config:

```yaml
file:
  path: "/tmp/expanso/gdpr-audit-${!timestamp_format(\"2006-01-02\")}.jsonl"
```

## Performance Issues

### Pipeline processing slowly

Diagnosis:

```bash
expanso-cli node list
```

Common fixes:

1. Increase batch size:

   ```yaml
   output:
     gcp_bigquery:
       batching:
         count: 1000 # Increase from 500
         period: 60s
   ```

2. Add parallel processing:

   ```yaml
   pipeline:
     threads: 4
   ```

3. Optimize the database query:

   ```yaml
   input:
     sql_select:
       # Add an index on transaction_timestamp
       where: "transaction_timestamp >= NOW() - INTERVAL '1 hour'"
   ```

## Compliance Verification

### How to prove anonymization is complete?

Generate a compliance report:

```sql
-- Verify no PII in the global dataset
SELECT
  COUNT(*) AS total_records,
  COUNTIF(_gdpr_compliance.anonymization_verified = TRUE) AS verified,
  COUNTIF(_gdpr_compliance.anonymization_verified != TRUE
          OR _gdpr_compliance.anonymization_verified IS NULL) AS unverified
FROM `global_analytics.anonymized_transactions`
WHERE DATE(transaction_hour) = CURRENT_DATE();
```
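The report only proves compliance if every record is verified. A small Python sketch of the acceptance check you might apply to the query result (the dict keys mirror the SQL aliases above; how you fetch the row is up to you):

```python
def compliance_ok(report: dict) -> bool:
    """Accept a day's export only if every record passed verification."""
    total = report["total_records"]
    verified = report["verified"]
    unverified = report["unverified"]
    # The two buckets must partition the dataset, and none may be unverified.
    return verified + unverified == total and unverified == 0

print(compliance_ok({"total_records": 100, "verified": 100, "unverified": 0}))  # True
print(compliance_ok({"total_records": 100, "verified": 99, "unverified": 1}))   # False
```

A single unverified record should fail the whole day's export; partial compliance is still a GDPR violation.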

Export the audit trail:

```bash
# Export today's audit log
gsutil cp /var/log/expanso/gdpr-audit-$(date +%Y-%m-%d).jsonl \
  gs://${EU_ARCHIVE_BUCKET}/audit/$(date +%Y/%m/%d)/
```

## Still Stuck?

1. Enable debug logging:

   ```yaml
   logger:
     level: DEBUG
     format: json
   ```

2. Test each step individually:

   ```bash
   echo '{"customer_email":"[email protected]"}' | \
     expanso-edge run --config step-4-hash.yaml
   ```

3. Check the Complete Pipeline for reference.

4. Review GDPR guidance: ICO Guide to Anonymisation.