Technique 4: Pseudonymization
Pseudonymization is the process of replacing a direct identifier (like a name) with a consistent but meaningless ID. This is a powerful technique for tracking user activity across sessions without storing their real-world identity.
The Goal
You will use a salted hash of the user_name to create a stable user_id, and then delete the original name. This ensures the same user always gets the same ID.
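To make the goal concrete, here is a minimal Python sketch of the same idea, separate from the pipeline itself. The function name `pseudonymize`, the sample events, and the choice to prepend the salt to the name before hashing are assumptions made for illustration, not part of the tutorial's pipeline.

```python
import hashlib


def pseudonymize(event: dict, salt: str) -> dict:
    """Return a copy of `event` with user_name replaced by a stable user_id."""
    # Assumption: the salt is prepended to the name before hashing.
    salted = (salt + event["user_name"]).encode("utf-8")
    digest = hashlib.sha256(salted).hexdigest()
    # Copy every field except the direct identifier.
    out = {key: value for key, value in event.items() if key != "user_name"}
    # Keep only the first 12 hex characters for a shorter ID.
    out["user_id"] = "user_" + digest[:12]
    return out


# Hypothetical events from the same user in two different sessions.
first = pseudonymize({"user_name": "Ada Lovelace", "action": "login"}, salt="example-salt")
second = pseudonymize({"user_name": "Ada Lovelace", "action": "logout"}, salt="example-salt")
# The same name hashed with a different salt.
other_salt = pseudonymize({"user_name": "Ada Lovelace", "action": "login"}, salt="different-salt")
```

With the same salt, `first` and `second` carry the same `user_id` and neither contains `user_name`. Changing the salt changes every ID, which is why the salt has to stay fixed and secret for as long as the IDs need to remain linkable.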
Implementation
- Start with the Previous Pipeline: Copy hash-email.yaml from Step 3 to a new file named pseudonymize-user.yaml.
cp hash-email.yaml pseudonymize-user.yaml
Note: Remember to set the USER_SALT environment variable as described in the setup guide.
- Add the Pseudonymization Logic: Open pseudonymize-user.yaml and add the logic for user_name to the bottom of the existing mapping processor.
Add this to your 'mapping' processor in pseudonymize-user.yaml:
# --- Logic from previous steps ---
# (The existing logic for payment, IP, and email remains here)
# --- START: New additions for Pseudonymization ---
# Create a consistent, pseudonymous user ID from the name.
# We hash the name and take the first 12 characters for a shorter ID.
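# Assumes Benthos-style Bloblang, where hash() uses its second argument only as the key
# for the hmac_* algorithms. The salt is therefore prepended explicitly, and the raw
# digest is hex-encoded so it can be sliced as text.
root.user_id = "user_" + (env("USER_SALT") + this.user_name).hash("sha256").encode("hex").slice(0, 12)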
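# Delete the original user name field. Assigning deleted() removes only this field;
# reassigning root from this would discard user_id and the changes from earlier steps.
root.user_name = deleted()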
# --- END: New additions ---
- Deploy and Test:
# Send the sample event data
# ($HOME is used because the shell does not expand ~ after the @ prefix)
curl -X POST http://localhost:8080/events/ingest \
  -H "Content-Type: application/json" \
  -d @"$HOME/expanso-remove-pii/sample-event.json"
- Verify: Check your logs. The user_name field will be gone, replaced by a user_id that starts with user_ followed by 12 random-looking characters. If you send the same event again, you will get the exact same user_id. That stability is what lets you track activity across sessions without storing the name.
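The verification step can also be automated. The sketch below assumes the pipeline's output can be captured as one JSON object per line; the helper name `check_pseudonymized` and the sample lines are hypothetical placeholders, not real pipeline output.

```python
import json


def check_pseudonymized(log_lines):
    """Return the single user_id shared by all records, or raise ValueError."""
    ids = set()
    for line in log_lines:
        record = json.loads(line)
        if "user_name" in record:
            raise ValueError("user_name is still present in the output")
        user_id = record.get("user_id", "")
        # Expect the "user_" prefix followed by exactly 12 characters.
        if not user_id.startswith("user_") or len(user_id) != len("user_") + 12:
            raise ValueError(f"unexpected user_id format: {user_id!r}")
        ids.add(user_id)
    if len(ids) != 1:
        raise ValueError(f"expected one stable user_id, found {len(ids)}")
    return ids.pop()


# Placeholder lines standing in for two runs of the same sample event.
sample_lines = [
    '{"user_id": "user_0123456789ab", "event": "purchase"}',
    '{"user_id": "user_0123456789ab", "event": "purchase"}',
]
stable_id = check_pseudonymized(sample_lines)
```

If the two captured records disagree on `user_id`, the most likely cause is that USER_SALT changed between runs.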