
Technique 4: Pseudonymization

Pseudonymization is the process of replacing a direct identifier (like a name) with a consistent but meaningless ID. It is a powerful technique for tracking a user's activity across sessions without storing their real-world identity.

The Goal

You will use a salted hash of the user_name to create a stable user_id, and then delete the original name. Because the hash is deterministic, the same user_name always produces the same user_id, as long as the salt does not change.
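The same derivation can be sketched outside the pipeline, for example to cross-check an ID by hand. This is a minimal Python sketch, assuming the salt is applied as an HMAC-SHA256 key and the digest is hex-encoded before truncation; the function name `pseudonymize` and the sample values are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(user_name: str, salt: str, length: int = 12) -> str:
    """Derive a stable pseudonymous ID from a name and a secret salt.

    The name is hashed with HMAC-SHA256 keyed by the salt, hex-encoded,
    truncated to `length` characters and prefixed with "user_".
    """
    digest = hmac.new(salt.encode("utf-8"), user_name.encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:length]
```

Calling `pseudonymize("Jane Doe", "example-salt")` twice returns the same string, while changing either the name or the salt yields a different one. Truncating to 12 hex characters keeps 48 bits of the digest; with one million distinct names, the birthday approximation n²/2^49 puts the chance of any two colliding at roughly 0.2%, so keep more characters if your user base is much larger.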

Implementation

  1. Start with the Previous Pipeline: Copy the hash-email.yaml from Step 3 to a new file named pseudonymize-user.yaml.

    cp hash-email.yaml pseudonymize-user.yaml

    Note: Remember to set the USER_SALT environment variable as described in the setup guide.
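If you need to pick a value for USER_SALT yourself, a long random string is a safe choice. A minimal sketch, assuming Python 3.6+ for the `secrets` module; `generate_salt` is a hypothetical helper and the 32-byte length is an arbitrary choice, not a requirement of this guide.

```python
import secrets


def generate_salt(num_bytes: int = 32) -> str:
    """Return a random salt as a hex string suitable for an environment variable."""
    # secrets draws from the operating system's CSPRNG, unlike the random module.
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Prints a line that can be pasted into a shell before starting the pipeline.
    print(f"export USER_SALT={generate_salt()}")
```

Keep the salt secret and keep it stable: every user_id is derived from it, so changing it changes every ID.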

  2. Add the Pseudonymization Logic: Open pseudonymize-user.yaml and add the logic for user_name to the bottom of the existing mapping processor.

    Add this to the 'mapping' processor in pseudonymize-user.yaml:

    # --- Logic from previous steps ---
    # (The existing logic for payment, IP, and email remains here)

    # --- START: New additions for Pseudonymization ---

    # Create a consistent, pseudonymous user ID from the name.
    # "hmac_sha256" keys the hash with USER_SALT (plain "sha256" ignores the key argument).
    # hash() returns raw bytes, so encode them as hex, then keep the first 12 characters.
    root.user_id = "user_" + this.user_name.hash("hmac_sha256", env("USER_SALT")).encode("hex").slice(0, 12)

    # Delete the original user name field, keeping user_id and the earlier steps' changes
    root.user_name = deleted()

    # --- END: New additions ---
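The effect of this step on a single event can be sketched in Python on a plain dictionary: user_id is added, user_name is removed, and every other field passes through unchanged. This is an illustration under assumptions (HMAC-SHA256 keyed with the salt, hex encoding, a hypothetical `event_type` field, a hypothetical `pseudonymize_event` helper), not Expanso's implementation.

```python
import hashlib
import hmac
import json
import os


def pseudonymize_event(event: dict, salt: str) -> dict:
    """Mirror the mapping step: add user_id, drop user_name, keep other fields."""
    out = dict(event)  # shallow copy so the caller's event is left untouched
    name = out.pop("user_name")  # raises KeyError if the field is missing
    digest = hmac.new(salt.encode("utf-8"), name.encode("utf-8"), hashlib.sha256)
    out["user_id"] = "user_" + digest.hexdigest()[:12]
    return out


if __name__ == "__main__":
    # Hypothetical event; the real sample-event.json may contain other fields.
    sample = {"user_name": "Jane Doe", "event_type": "purchase"}
    salt = os.environ.get("USER_SALT", "example-salt")
    print(json.dumps(pseudonymize_event(sample, salt), indent=2))
```

Working on a copy of the event mirrors the intent of the mapping: the new field is added alongside whatever the earlier steps produced, instead of rebuilding the event from the original input.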
  3. Deploy and Test: Deploy pseudonymize-user.yaml the same way as the previous pipeline, then send the sample event:

    # Send the sample event data ($HOME is used because the shell does not expand ~ after @)
    curl -X POST http://localhost:8080/events/ingest \
      -H "Content-Type: application/json" \
      -d @"$HOME/expanso-remove-pii/sample-event.json"
  4. Verify: Check your logs. The user_name field is gone, replaced by a user_id made of user_ followed by 12 random-looking characters. Send the same event again with the same USER_SALT and you get exactly the same user_id; that repeatability is what makes this technique work.
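Comparing IDs by eye works for one or two events; a small script makes the check repeatable. A minimal sketch, assuming the pipeline logs each output event as one JSON object per line and that the ID is hex-encoded; `check_pseudonymized` is a hypothetical helper, not part of Expanso.

```python
import json
import re
from typing import Iterable

# Assumption: the mapping hex-encodes the hash, so an ID is "user_" + 12 hex digits.
USER_ID_PATTERN = re.compile(r"user_[0-9a-f]{12}")


def check_pseudonymized(log_lines: Iterable[str]) -> str:
    """Check JSON events copied from the logs and return their shared user_id.

    Raises ValueError if an event still has user_name, has a malformed
    user_id, or if the events do not all share the same user_id.
    """
    events = [json.loads(line) for line in log_lines if line.strip()]
    if not events:
        raise ValueError("no events to check")
    for event in events:
        if "user_name" in event:
            raise ValueError("user_name is still present")
        if not USER_ID_PATTERN.fullmatch(str(event.get("user_id", ""))):
            raise ValueError(f"unexpected user_id: {event.get('user_id')!r}")
    ids = {event["user_id"] for event in events}
    if len(ids) != 1:
        raise ValueError(f"expected one user_id, got {sorted(ids)}")
    return ids.pop()
```

Paste the output lines from both runs into a list and pass it in; the function returns the shared user_id, or raises ValueError naming the first problem it finds.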