# Troubleshooting

## Quick Diagnosis

```bash
# Check container status
docker ps | grep deduplicate

# Check recent logs
docker logs deduplicate-events --tail 50 2>&1 | grep -i error

# Test deduplication
curl -X POST http://localhost:8080/events \
  -H "Content-Type: application/json" \
  -d '{"event_id": "test-001", "data": "hello"}'

# Send the same event again (should be deduplicated)
curl -X POST http://localhost:8080/events \
  -H "Content-Type: application/json" \
  -d '{"event_id": "test-001", "data": "hello"}'
```

## Common Issues

### Duplicates not being caught

Cause: The wrong field is used as the deduplication key.

Fix: Check that the key field matches your data:

```yaml
dedup:
  key: ${!json("event_id")} # Must match the actual field name
  cache: memory
```
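To see why a mismatched key field breaks deduplication, here is a minimal sketch of key-based dedup in Python (illustrative only, not the service's implementation): when the configured field is missing, every event resolves to the same empty key, so nothing is matched against its true duplicate.

```python
def dedup(events, key_field):
    """Pass through events whose key has not been seen before."""
    seen = set()
    passed = []
    for event in events:
        key = event.get(key_field)  # None when the field is missing
        if key in seen:
            continue  # duplicate key: drop the event
        seen.add(key)
        passed.append(event)
    return passed

events = [
    {"event_id": "a", "data": 1},
    {"event_id": "a", "data": 1},  # true duplicate
    {"event_id": "b", "data": 2},
]

# Correct key field: the duplicate is dropped.
print(len(dedup(events, "event_id")))  # prints 2
# Wrong key field ("id" doesn't exist): every event keys to None,
# so only the first event survives.
print(len(dedup(events, "id")))  # prints 1
```

The wrong-field case also explains the next issue: a key that is empty for every event makes everything after the first event look like a duplicate.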

### All events being dropped

Cause: The dedup key is empty or identical for every event.

```bash
# Check if event_id is present
docker logs deduplicate-events --tail 20 2>&1 | grep event_id
```

Fix: Verify that events have unique IDs, and add a fallback key:

```yaml
- mapping: |
    # Fall back to a hash of the raw message when event_id is missing
    root.dedup_key = this.event_id.or(content().hash("xxhash64").encode("hex"))
```
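The same fallback can be sketched in Python for clarity (sha256 stands in for xxhash64 here, since xxhash is not in the standard library; the mapping above is the actual fix):

```python
import hashlib
import json

def dedup_key(event: dict) -> str:
    """Return event_id if present, else a deterministic content hash."""
    if event.get("event_id"):
        return event["event_id"]
    # Canonical JSON so the same payload always hashes the same way.
    payload = json.dumps(event, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

print(dedup_key({"event_id": "test-001", "data": "hello"}))  # prints test-001
print(dedup_key({"data": "hello"}))  # deterministic content hash
```

The content hash keeps deduplication working for events without IDs, at the cost of treating byte-identical retries as duplicates rather than distinct events.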

### Memory growing indefinitely

Cause: The cache is not expiring old entries. Check memory usage:

```bash
docker stats deduplicate-events --no-stream
```

Fix: Add a TTL and a size cap to the cache:

```yaml
dedup:
  key: ${!json("event_id")}
  cache:
    memory:
      ttl: 1h
      cap: 1000000
```
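What `ttl` and `cap` buy you can be seen in a small in-memory sketch (an illustration of the bounded-cache idea, not the service's actual cache): entries expire after the TTL, and the oldest entries are evicted once the cap is exceeded, so memory stays bounded.

```python
import time
from collections import OrderedDict

class DedupCache:
    """Dedup cache bounded by both a TTL and a maximum entry count."""

    def __init__(self, ttl_seconds: float = 3600, cap: int = 1_000_000):
        self.ttl = ttl_seconds
        self.cap = cap
        self.entries = OrderedDict()  # key -> expiry timestamp

    def seen(self, key, now=None) -> bool:
        """Return True if key was already seen and has not expired."""
        now = time.time() if now is None else now
        expiry = self.entries.get(key)
        if expiry is not None and expiry > now:
            return True  # unexpired duplicate
        self.entries[key] = now + self.ttl
        self.entries.move_to_end(key)
        while len(self.entries) > self.cap:
            self.entries.popitem(last=False)  # evict oldest beyond cap
        return False

cache = DedupCache(ttl_seconds=10, cap=2)
print(cache.seen("k", now=0))   # prints False (first sighting)
print(cache.seen("k", now=5))   # prints True  (within TTL)
print(cache.seen("k", now=20))  # prints False (TTL expired, re-admitted)
```

Note the trade-off: once the cap evicts an entry (or its TTL lapses), a late duplicate will slip through, so size the cap and TTL to cover your realistic duplicate window.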

### Duplicates appearing after restart

Cause: An in-memory cache is being used, so dedup state is lost on restart.

Fix: Use Redis or another persistent cache:

```yaml
cache:
  redis:
    url: redis://localhost:6379
    ttl: 1h
```
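The Redis pattern reduces to the "set this key only if it doesn't exist, with an expiry" idiom (`SET key value NX EX ttl`). A tiny in-memory stand-in shows the semantics without needing a Redis server (`FakeRedis` and `set_nx_ex` are hypothetical names for illustration):

```python
import time

class FakeRedis:
    """Stand-in demonstrating the SET NX EX idiom used for dedup."""

    def __init__(self):
        self.store = {}  # key -> expiry timestamp

    def set_nx_ex(self, key, ttl, now=None) -> bool:
        """Atomically claim key with a TTL; False means it already exists."""
        now = time.time() if now is None else now
        expiry = self.store.get(key)
        if expiry is not None and expiry > now:
            return False  # key exists: duplicate, drop the event
        self.store[key] = now + ttl
        return True  # first sighting: process the event

r = FakeRedis()
print(r.set_nx_ex("event:test-001", ttl=3600))  # prints True  -> process
print(r.set_nx_ex("event:test-001", ttl=3600))  # prints False -> duplicate
```

With a real client (e.g. redis-py) the equivalent is roughly `r.set(f"event:{event_id}", 1, nx=True, ex=3600)`, which succeeds only for the first writer; because the state lives in Redis, it survives restarts of the dedup service.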

## Still stuck?

  1. Add debug logging: `logger: {level: DEBUG}`
  2. Check the Complete Pipeline for a reference configuration
  3. Review Normalize Timestamps for time-based deduplication