Technique 3: Hashing with Metadata Extraction
This technique extends hashing. In addition to replacing a field with its hash, you extract and keep a non-sensitive piece of metadata from it before deleting the original. It suits fields like email addresses, where the full address is sensitive but the domain is useful for analytics.
The Goal
You will hash the full email address for user counting, extract the email_domain for organizational analytics, and delete the original email.
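As a rough illustration of the goal, the same transformation can be sketched outside the pipeline in Python. The function name `pseudonymize_email`, the sample event, and the use of HMAC-SHA256 to combine the salt with the address are assumptions made for this sketch; the pipeline's own digest may be computed differently, so the hash values are not expected to match.

```python
import hashlib
import hmac


def pseudonymize_email(event: dict, salt: str) -> dict:
    """Return a copy of event with email replaced by email_hash and email_domain."""
    out = dict(event)               # leave the caller's event untouched
    email = out.pop("email", None)  # remove the raw address from the copy
    if email is None:
        return out                  # nothing to hash or extract

    # Keyed SHA-256 (HMAC) is an assumption of this sketch, not the pipeline's method.
    out["email_hash"] = hmac.new(
        salt.encode(), email.encode(), hashlib.sha256
    ).hexdigest()

    # Treat everything after the last "@" as the domain; None when there is no "@".
    _, sep, domain = email.rpartition("@")
    out["email_domain"] = domain if sep else None
    return out


if __name__ == "__main__":
    sample = {"user": "u-1", "email": "alice@example.com"}  # hypothetical event
    print(pseudonymize_email(sample, salt="demo-salt"))
```

The sketch also shows one edge case worth deciding on in the real mapping: what `email_domain` should be when the address contains no "@".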
Implementation
-
Start with the Previous Pipeline: Copy hash-ip.yaml from Step 2 to a new file named hash-email.yaml.
    cp hash-ip.yaml hash-email.yaml
Note: Remember to set the EMAIL_SALT environment variable as described in the setup guide.
-
Add the Email Hashing Logic: Open hash-email.yaml and add the email logic to the bottom of the existing mapping processor.
Add this to your 'mapping' processor in hash-email.yaml:
    # --- Logic from previous steps ---
    # (The existing logic for payment deletion and IP hashing remains here)
    # --- START: New additions for Email Hashing ---
    # Hash the full email address for unique user tracking
    root.email_hash = this.email.hash("sha256", env("EMAIL_SALT"))
    # Extract the domain for organizational analytics
    root.email_domain = this.email.split("@").index(1)
    # Delete only the original email field, keeping every field assigned above
    root.email = deleted()
    # --- END: New additions ---
-
Deploy and Test:
    # Send the sample event data
    # ($HOME is used because the shell does not expand ~ after @)
    curl -X POST http://localhost:8080/events/ingest \
      -H "Content-Type: application/json" \
      -d @"$HOME/expanso-remove-pii/sample-event.json"
-
Verify: Check your logs. The email field will be gone, replaced by email_hash and email_domain. This lets you count unique users with COUNT(DISTINCT email_hash) and analyze user organizations with GROUP BY email_domain without keeping raw email addresses downstream, which supports GDPR data minimization.
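The two queries mentioned in the Verify step can be approximated in a few lines of Python, which may help when checking output by hand before a warehouse is involved. The records below are hypothetical stand-ins for processed events, and the helper names are chosen for this sketch only.

```python
from collections import Counter

# Hypothetical processed events, shaped like the pipeline output described above.
events = [
    {"email_hash": "a1", "email_domain": "example.com"},
    {"email_hash": "a1", "email_domain": "example.com"},
    {"email_hash": "b2", "email_domain": "example.org"},
]


def unique_users(records):
    """Rough equivalent of COUNT(DISTINCT email_hash)."""
    return len({r["email_hash"] for r in records})


def events_per_domain(records):
    """Rough equivalent of COUNT(*) ... GROUP BY email_domain."""
    return Counter(r["email_domain"] for r in records)


if __name__ == "__main__":
    print(unique_users(events))       # 2
    print(events_per_domain(events))
```

Because the same address always produces the same hash for a given salt, repeated events from one user collapse into a single entry in the distinct count.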