Technique 2: Hashing

Hashing is a one-way transformation, well suited to fields where you need to check for uniqueness or track activity without knowing the original value. A common use case is pseudonymizing IP addresses to support GDPR compliance.
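A caveat on "one-way": the IPv4 space holds only about 4.3 billion addresses, so an unkeyed hash of an IP can be reversed by hashing every candidate address until one matches. The Python sketch below illustrates this. The `10.0.0.0/24` search range and the helper names are hypothetical, standing in for a full scan. The same search fails against an HMAC keyed with a secret the attacker does not hold, which is why this technique uses a salt.

```python
import hashlib
import hmac
import ipaddress

def unsalted_hash(ip: str) -> str:
    # Plain SHA-256 of the IP string, hex-encoded (64 characters).
    return hashlib.sha256(ip.encode()).hexdigest()

def keyed_hash(ip: str, key: bytes) -> str:
    # HMAC-SHA256 keyed with a secret salt, hex-encoded (64 characters).
    return hmac.new(key, ip.encode(), hashlib.sha256).hexdigest()

def brute_force(target: str, network: str = "10.0.0.0/24"):
    # Hypothetical attacker: hash every address in a candidate range
    # with the unkeyed function and look for a match.
    for addr in ipaddress.ip_network(network):
        if unsalted_hash(str(addr)) == target:
            return str(addr)
    return None

if __name__ == "__main__":
    secret = b"hypothetical-secret-salt"
    print(brute_force(unsalted_hash("10.0.0.42")))       # 10.0.0.42
    print(brute_force(keyed_hash("10.0.0.42", secret)))  # None
```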

The Goal

You will use the .hash() method to replace the plaintext ip_address field with a keyed (salted) ip_hash, and then delete the original field.
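The salt only protects the hashes if it is secret and hard to guess. If the setup guide leaves the value of IP_SALT up to you, one option is 32 random bytes from a cryptographically secure generator. This is an assumption, not something the guide prescribes. A minimal Python sketch:

```python
import secrets

def generate_salt(num_bytes: int = 32) -> str:
    # Random bytes from the OS CSPRNG, rendered as hex (2 characters per byte).
    return secrets.token_hex(num_bytes)

if __name__ == "__main__":
    # Print a line you can paste into your shell before starting the pipeline.
    print(f"export IP_SALT={generate_salt()}")
```

Keep the value stable once chosen. Changing the salt changes every hash, so visitors hashed before and after the change can no longer be matched.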

Implementation

  1. Start with the Previous Pipeline: Copy the delete-payment.yaml from Step 1 to a new file named hash-ip.yaml.

    cp delete-payment.yaml hash-ip.yaml

    Note: Remember to set the IP_SALT environment variable as described in the setup guide.

  2. Add the Hashing Logic: Open hash-ip.yaml and add the hashing logic to the bottom of the existing mapping processor.

    Add this to the 'mapping' processor in hash-ip.yaml
    # --- Logic from Step 1 (Deletion) ---
    root.payment_method = this.payment_method.without("full_number", "expiry")

    # --- START: New additions for Hashing ---

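    # Keyed hash of the IP address, using the secret salt as the HMAC key.
    # hash() applies the key only for hmac_* algorithms and returns raw bytes,
    # so encode as hex to get a 64-character string.
    root.ip_hash = this.ip_address.hash("hmac_sha256", env("IP_SALT")).encode("hex")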

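    # Delete the original IP address field. Assigning root = this.without(...)
    # here would replace the whole output document and discard the fields set above.
    root.ip_address = deleted()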

    # --- END: New additions ---
  3. Deploy and Test: Deploy the updated hash-ip.yaml pipeline, then send the sample event.

    # Send the sample event data
    curl -X POST http://localhost:8080/events/ingest \
    -H "Content-Type: application/json" \
    -d @"$HOME/expanso-remove-pii/sample-event.json"
  4. Verify: Check your logs. The ip_address field will be gone, replaced by a 64-character hex ip_hash. Sending the same event again (with the same IP_SALT) produces the exact same hash, so you can count unique visitors or track user sessions without storing raw IP addresses.
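To predict what the pipeline should emit without deploying it, the two mapping steps can be mirrored in plain Python. This is an illustrative sketch, not Expanso code. It assumes the hash is an HMAC-SHA256 keyed with IP_SALT and hex-encoded. The events are hypothetical apart from the ip_address and payment_method fields the mapping touches. It also shows why unique counting works: two events from the same IP yield the same ip_hash.

```python
import copy
import hashlib
import hmac
import os
from collections import Counter

def scrub_event(event: dict, salt: str) -> dict:
    """Offline mirror of the mapping: drop card details, add ip_hash, remove ip_address."""
    out = copy.deepcopy(event)  # leave the caller's event untouched
    # Step 1 logic: strip the sensitive payment sub-fields if present.
    payment = out.get("payment_method")
    if isinstance(payment, dict):
        payment.pop("full_number", None)
        payment.pop("expiry", None)
    # Step 2 logic: HMAC-SHA256 of the IP keyed with the salt, hex-encoded (64 chars).
    ip = out.pop("ip_address", None)
    if ip is not None:
        out["ip_hash"] = hmac.new(salt.encode(), ip.encode(), hashlib.sha256).hexdigest()
    return out

def count_unique_visitors(events: list) -> Counter:
    """Count events per ip_hash; the number of keys is the number of unique visitors."""
    return Counter(e["ip_hash"] for e in events if "ip_hash" in e)

if __name__ == "__main__":
    salt = os.environ.get("IP_SALT", "example-salt")  # fallback is for illustration only
    # Hypothetical events; the real sample-event.json may contain other fields.
    raw = [
        {"event_id": "a", "ip_address": "203.0.113.7",
         "payment_method": {"type": "card", "full_number": "4111111111111111", "expiry": "12/30"}},
        {"event_id": "b", "ip_address": "198.51.100.23"},
        {"event_id": "c", "ip_address": "203.0.113.7"},
    ]
    scrubbed = [scrub_event(e, salt) for e in raw]
    print(scrubbed[0])
    counts = count_unique_visitors(scrubbed)
    print(len(counts))  # 2 unique visitors across 3 events
```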

You have now applied the hashing pattern to pseudonymize IP addresses, supporting GDPR compliance while preserving the data's value for abuse detection and session tracking.