# Troubleshooting

## Quick Diagnosis
```bash
# Check container status
docker ps | grep deduplicate

# Check recent logs
docker logs deduplicate-events --tail 50 2>&1 | grep -i error

# Test deduplication
curl -X POST http://localhost:8080/events \
  -H "Content-Type: application/json" \
  -d '{"event_id": "test-001", "data": "hello"}'

# Send same event again (should be deduplicated)
curl -X POST http://localhost:8080/events \
  -H "Content-Type: application/json" \
  -d '{"event_id": "test-001", "data": "hello"}'
```
## Common Issues

### Duplicates not being caught

**Cause:** Wrong field used for the deduplication key.

**Fix:** Check that the key field matches your data:
```yaml
dedup:
  key: ${!json("event_id")}  # Must match actual field name
  cache: memory
```
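A common variant of this problem is an ID that exists but is nested, so the dot path in the key has to include the parent object. A hypothetical example, assuming events shaped like `{"meta": {"event_id": "..."}}`:

```yaml
dedup:
  key: ${!json("meta.event_id")}  # dot path into the nested object
  cache: memory
```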
### All events being dropped

**Cause:** Dedup key is empty or the same for all events.

```bash
# Check if event_id is present
docker logs deduplicate-events --tail 20 2>&1 | grep event_id
```

**Fix:** Verify events have unique IDs, and add a fallback:
```yaml
- mapping: |
    root = this
    # Fall back to a hash of the message content when event_id is missing
    root.dedup_key = this.event_id.or(content().hash("xxhash64").encode("hex"))
```
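With that mapping in place, the dedup step should key on the field the mapping just set; a sketch continuing the same processor list:

```yaml
- dedup:
    key: ${!json("dedup_key")}  # the field populated by the mapping above
    cache: memory
```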
### Memory growing indefinitely

**Cause:** Cache not expiring old entries.

```bash
docker stats deduplicate-events --no-stream
```

**Fix:** Add TTL and size limits to the cache:
```yaml
dedup:
  key: ${!json("event_id")}
  cache:
    memory:
      ttl: 1h
      cap: 1000000
```
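If the pipeline runs on Benthos/Redpanda Connect, the TTL is normally set on a named cache resource that the dedup processor references by label rather than inline; a minimal sketch under that assumption (the label and values are illustrative):

```yaml
cache_resources:
  - label: dedup_cache         # referenced from dedup via `cache: dedup_cache`
    memory:
      default_ttl: 1h          # entries expire an hour after being written
      compaction_interval: 60s # how often expired entries are swept out
```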
### Duplicates appearing after restart

**Cause:** Using an in-memory cache, so state is lost on restart.

**Fix:** Use Redis or another persistent cache:
```yaml
cache:
  redis:
    url: redis://localhost:6379
    ttl: 1h
```
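On Benthos/Redpanda Connect this would again be a named cache resource, this time backed by Redis; a sketch under that assumption (label, prefix, and TTL are illustrative):

```yaml
cache_resources:
  - label: dedup_cache          # referenced from dedup via `cache: dedup_cache`
    redis:
      url: redis://localhost:6379
      prefix: dedup             # namespaces the dedup keys in Redis
      default_ttl: 1h           # Redis expires the keys server-side
```

Because every pipeline replica talks to the same Redis keyspace, this also catches duplicates across multiple instances, not just across restarts.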
## Still stuck?

- Add debug logging with `logger: {level: DEBUG}` (see the sketch after this list)
- Check the Complete Pipeline for a reference config
- Review Normalize Timestamps for time-based dedup
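A minimal logger block for the first bullet, assuming a Benthos/Redpanda Connect-style config (the `format` value is illustrative):

```yaml
logger:
  level: DEBUG   # verbose output while troubleshooting; drop back to INFO afterwards
  format: logfmt # or json, if the logs go to a collector
```

Then re-run the `docker logs` commands from Quick Diagnosis to see what the dedup stage is doing.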