Troubleshooting
Quick Diagnosis
# Check container status
docker ps | grep transform
# Check recent logs
docker logs transform-formats --tail 50 2>&1 | grep -i error
# Test format transformation
curl -X POST http://localhost:8080/data \
  -H "Content-Type: application/json" \
  -d '{"sensor_id": "temp-001", "temperature": 72.5}'
Common Issues
Avro schema validation fails
Cause: A field's value type doesn't match the type the Avro schema declares for it (for example, a numeric field arriving as a string)
# Check schema errors
docker logs transform-formats --tail 20 2>&1 | grep -i schema
Fix: Coerce types before encoding:
- mapping: |
    root.temperature = this.temperature.number()
    root.timestamp = this.timestamp.string()
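Before adding the coercion, it can help to confirm that a sample payload really carries the wrong type. The following is a rough sketch, not a JSON parser: it only pattern-matches the `temperature` field from the test payload above, and the function name `temperature_type` is hypothetical.

```shell
# Heuristic check: is "temperature" a bare JSON number or a quoted string?
# Assumes a flat, single-line payload like the one used in Quick Diagnosis.
temperature_type() {
  payload="$1"
  if printf '%s' "$payload" | grep -Eq '"temperature"[[:space:]]*:[[:space:]]*-?[0-9]'; then
    echo "number"
  elif printf '%s' "$payload" | grep -Eq '"temperature"[[:space:]]*:[[:space:]]*"'; then
    echo "string"
  else
    echo "missing-or-other"
  fi
}

temperature_type '{"sensor_id": "temp-001", "temperature": "72.5"}'
```

A result of `string` for a field the schema declares as numeric suggests the `.number()` coercion is the right fix.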
Parquet output corrupt
Cause: Schema mismatch or incomplete batches
Fix: Ensure consistent schema and proper batching:
batching:
  count: 1000
  period: 60s
  byte_size: 10000000  # ~10 MB, expressed in bytes
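One quick way to tell a truncated file from a schema problem: a complete Parquet file begins and ends with the 4-byte magic `PAR1`. The sketch below checks only those magic bytes, so passing it is necessary but not sufficient for a valid file. The function name `check_parquet_magic` is hypothetical.

```shell
# Prints "ok" if the file starts and ends with the Parquet magic "PAR1".
# A file whose write was cut short will usually be missing the trailing magic.
check_parquet_magic() {
  file="$1"
  [ -f "$file" ] || { echo "no-such-file"; return 2; }
  # Strip NUL bytes so binary content does not trigger shell warnings.
  head_magic=$(head -c 4 "$file" | tr -d '\0')
  tail_magic=$(tail -c 4 "$file" | tr -d '\0')
  if [ "$head_magic" = "PAR1" ] && [ "$tail_magic" = "PAR1" ]; then
    echo "ok"
  else
    echo "truncated-or-not-parquet"
    return 1
  fi
}
```

If the leading magic is present but the trailing one is not, the batch was most likely interrupted before the footer was written.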
Auto-detection picking wrong format
Cause: Ambiguous input data
Fix: Add explicit format hints:
- mapping: |
    root.input_format = if content().has_prefix("{") {
      "json"
    } else if content().has_prefix("Obj") {
      "avro"
    } else {
      "unknown"
    }
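To see which branch a given sample file would hit, the same prefix checks can be run from the shell. This is a sketch that mirrors the mapping above rather than the pipeline's actual detection logic, and `detect_format` is a hypothetical helper name.

```shell
# Mirrors the prefix rules in the mapping: "{" means JSON, "Obj" means Avro.
detect_format() {
  # Read only the first 3 bytes; drop NULs so the variable stays clean.
  prefix=$(head -c 3 "$1" | tr -d '\0')
  case "$prefix" in
    "{"*) echo "json" ;;
    Obj)  echo "avro" ;;
    *)    echo "unknown" ;;
  esac
}
```

Note that a JSON payload with leading whitespace, or one that is a top-level array, reports `unknown` under these rules, which is one common source of ambiguous detection.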
Large files causing memory issues
Cause: Entire file loaded into memory
# Check container memory usage
docker stats transform-formats --no-stream
Fix: Process in smaller batches, add byte limits:
batching:
  byte_size: 5000000  # ~5 MB, expressed in bytes
  count: 500
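If a single input file is itself too large, splitting it before it reaches the pipeline is another option. The sketch below assumes newline-delimited records (one record per line), which may not hold for your data. The helper name `split_input` and the `chunk_` prefix are hypothetical.

```shell
# Splits a newline-delimited file into chunks of N lines (default 500)
# and prints how many chunk files were written.
split_input() {
  src="$1"
  out_dir="$2"
  lines="${3:-500}"
  mkdir -p "$out_dir"
  split -l "$lines" "$src" "$out_dir/chunk_"
  ls "$out_dir" | wc -l | tr -d ' '
}
```

For example, `split_input big.jsonl ./chunks 500` writes `chunks/chunk_aa`, `chunks/chunk_ab`, and so on, each small enough to process without loading the whole file.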
Still stuck?
- Add debug logging:
  logger: {level: DEBUG}
- Check the Complete Pipeline for reference config
- Review Aggregate Time Windows for batch processing