
Troubleshooting

Quick Diagnosis

# Check container status
docker ps | grep transform

# Check recent logs
docker logs transform-formats --tail 50 2>&1 | grep -i error

# Test format transformation
curl -X POST http://localhost:8080/data \
-H "Content-Type: application/json" \
-d '{"sensor_id": "temp-001", "temperature": 72.5}'
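For a scripted check, for example as a CI step, the same request can be sent with Python's standard library. This is a minimal sketch. It assumes the endpoint and payload from the curl command above, and it treats any 2xx response as success, which is also an assumption.

```python
import json
import sys
import urllib.error
import urllib.request

# Endpoint and payload copied from the curl smoke test.
URL = "http://localhost:8080/data"
PAYLOAD = {"sensor_id": "temp-001", "temperature": 72.5}


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Build the same JSON POST request that the curl command sends."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(url: str = URL, payload: dict = PAYLOAD, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code.

    Connection failures (container down, wrong port) propagate as URLError.
    """
    req = build_request(url, payload)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; report their code instead.
        return err.code


if __name__ == "__main__":
    status = smoke_test()
    print(f"HTTP {status}")
    # Exit non-zero on anything other than 2xx so a CI job fails visibly.
    sys.exit(0 if 200 <= status < 300 else 1)
```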

Common Issues

Avro schema validation fails

Cause: Field values don't match the types declared in the Avro schema (for example, a number that arrives as a string).

# Check schema errors
docker logs transform-formats --tail 20 2>&1 | grep -i schema

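Fix: Coerce types before encoding. Start the mapping with root = this. A mapping builds a new document, so any field you don't assign (such as sensor_id) is otherwise dropped:

- mapping: |
    root = this
    root.temperature = this.temperature.number()
    root.timestamp = this.timestamp.string()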
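To see which records would fail before they reach the encoder, the coercion step can be reproduced outside the pipeline. This is a sketch under simplifying assumptions. The SCHEMA dict is hypothetical and flattened to field-name-to-type pairs (a real Avro schema is a JSON record with a "fields" list). Only a few primitive types are handled, and lossy or null conversions are rejected instead of guessed.

```python
# Hypothetical, flattened schema for illustration: field name -> Avro primitive type.
SCHEMA = {"sensor_id": "string", "temperature": "double", "timestamp": "string"}


def _to_int(value):
    # Refuse lossy float -> int conversions rather than silently truncating.
    if isinstance(value, float) and not value.is_integer():
        raise ValueError(f"{value!r} is not an integer")
    return int(value)


# Converters for a subset of Avro primitive types (no unions, no logical types).
CASTERS = {
    "string": str,
    "double": float,
    "float": float,
    "int": _to_int,
    "long": _to_int,
}


def coerce(record: dict, schema: dict) -> dict:
    """Return a new dict with each schema field converted to its declared type.

    Fields not listed in the schema are left out, as an Avro record would omit them.
    """
    out = {}
    for field, avro_type in schema.items():
        if field not in record:
            raise KeyError(f"missing required field {field!r}")
        value = record[field]
        if value is None:
            # str(None) would quietly produce "None"; nulls need a union type instead.
            raise ValueError(f"field {field!r} is null")
        try:
            out[field] = CASTERS[avro_type](value)
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"field {field!r}: cannot convert {value!r} to {avro_type}"
            ) from exc
    return out


if __name__ == "__main__":
    raw = {"sensor_id": "temp-001", "temperature": "72.5", "timestamp": 1700000000}
    print(coerce(raw, SCHEMA))
```

Raising on bad values keeps the failure close to the offending field, which is usually easier to debug than a generic schema error from the encoder.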

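Parquet output is corrupt

Cause: Records in a batch don't share one schema, or a file was written from an incomplete batch.

Fix: Keep the schema consistent across records, and batch so that each file is written from a complete batch. A batch is flushed as soon as any one of these limits is reached. Note that byte_size is a number of bytes:

batching:
  count: 1000
  period: 60s
  byte_size: 10485760 # 10 MiB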
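A truncated Parquet file, such as one whose writer stopped before finishing, is missing its footer. Under the Parquet file format, a complete file begins with the 4-byte magic PAR1 and ends with a 4-byte little-endian footer length followed by PAR1 again. The sketch below checks only that framing. It catches truncation but not schema mismatches; for those, open the file with a full Parquet reader such as pyarrow.

```python
import os
import sys

MAGIC = b"PAR1"


def looks_like_complete_parquet(f) -> bool:
    """Check Parquet framing on a seekable binary file object.

    Layout: b"PAR1" ... <footer> <4-byte little-endian footer length> b"PAR1"
    """
    f.seek(0, os.SEEK_END)
    size = f.tell()
    # Smallest possible frame: header magic + footer length + trailing magic.
    if size < 12:
        return False
    f.seek(0)
    head = f.read(4)
    f.seek(size - 8)
    tail = f.read(8)
    footer_len = int.from_bytes(tail[:4], "little")
    # The footer must fit between the header magic and the length field.
    return head == MAGIC and tail[4:] == MAGIC and footer_len <= size - 12


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            verdict = "ok" if looks_like_complete_parquet(fh) else "truncated or corrupt"
        print(f"{path}: {verdict}")
```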

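Auto-detection picks the wrong format

Cause: The input is ambiguous, so the detector has no reliable marker to go on.

Fix: Set an explicit format hint. Keep the raw payload with root = content(), because a mapping replaces the message and binary Avro can't be parsed as a structured document. Store the hint in metadata, where later steps can read it as @input_format:

- mapping: |
    root = content()
    meta input_format = if content().has_prefix("{") {
      "json"
    } else if content().has_prefix("Obj") {
      "avro"
    } else {
      "unknown"
    }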
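The same prefix idea can be used offline to check what a sample file actually contains. This is a sketch for three formats only. The Avro (object container file) and Parquet magic bytes come from their specifications. Treating a payload whose first non-whitespace byte is { or [ as JSON is a heuristic. Bare Avro datums without a container header carry no magic and come back as "unknown".

```python
import sys

# Leading magic bytes defined by each format's specification.
AVRO_OCF_MAGIC = b"Obj\x01"  # Avro object container file: 'O' 'b' 'j' + version 1
PARQUET_MAGIC = b"PAR1"      # Parquet file header


def detect_format(data: bytes) -> str:
    """Guess a payload's encoding from its first bytes."""
    if data.startswith(PARQUET_MAGIC):
        return "parquet"
    if data.startswith(AVRO_OCF_MAGIC):
        return "avro"
    # Heuristic, not a spec rule: JSON objects and arrays start with { or [
    # once leading whitespace is skipped.
    if data.lstrip()[:1] in (b"{", b"["):
        return "json"
    return "unknown"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            # Only the leading bytes matter; 4 KB leaves room for whitespace before JSON.
            print(f"{path}: {detect_format(f.read(4096))}")
```

Checking the binary magics before the JSON heuristic means an exact, spec-defined match always wins over a guess.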

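Large files cause memory issues

Cause: The entire file is loaded into memory at once.

Confirm by checking the container's memory usage:

docker stats transform-formats --no-stream

Fix: Process in smaller batches with a byte limit (byte_size is a number of bytes):

batching:
  byte_size: 5242880 # 5 MiB
  count: 500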
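The batching limits are easier to reason about with a small model. This sketch assumes a batch is flushed as soon as any one limit (count, byte_size, or period) is reached, which is the usual batching-policy behavior. The Batcher class is hypothetical and is not the pipeline's implementation.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Sketch of a batching policy: flush when ANY limit is reached.

    count: max messages per batch
    byte_size: flush once buffered bytes reach this many
    period_s: flush once this many seconds have passed since the batch started
    Simplification: the period is only checked when a message arrives; a real
    implementation would also flush on a timer.
    """

    def __init__(self, count: int, byte_size: int, period_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.count = count
        self.byte_size = byte_size
        self.period_s = period_s
        self._clock = clock
        self._buf: List[bytes] = []
        self._bytes = 0
        self._started: Optional[float] = None

    def add(self, msg: bytes) -> Optional[List[bytes]]:
        """Buffer msg; return the completed batch if a limit was hit, else None."""
        if self._started is None:
            self._started = self._clock()
        self._buf.append(msg)
        self._bytes += len(msg)
        elapsed = self._clock() - self._started
        if (len(self._buf) >= self.count
                or self._bytes >= self.byte_size
                or elapsed >= self.period_s):
            return self.flush()
        return None

    def flush(self) -> List[bytes]:
        """Return whatever is buffered (possibly a partial batch) and reset."""
        batch = self._buf
        self._buf, self._bytes, self._started = [], 0, None
        return batch


if __name__ == "__main__":
    # Limits mirroring the config above: 5 MiB or 500 messages.
    b = Batcher(count=500, byte_size=5 * 1024 * 1024, period_s=60)
    chunk = b"x" * (1024 * 1024)  # one 1 MiB message
    for i in range(12):
        batch = b.add(chunk)
        if batch:
            print(f"flushed {len(batch)} messages after message {i + 1}")
    print(f"{len(b.flush())} messages left for the final flush")
```

With 1 MiB messages, the byte limit decides rather than the count. The run flushes two batches of 5 and leaves 2 for the final flush. In this model the buffer can overshoot byte_size by up to one message, and a single message larger than byte_size is still held whole. Batching therefore bounds memory only when individual messages are small.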

Still stuck?

  1. Add debug logging: logger: {level: DEBUG}
  2. Compare your configuration against the reference in Complete Pipeline
  3. Review Aggregate Time Windows for batch-processing patterns