Technique 5: Generalization
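The final technique is generalization, where you reduce the precision of data to a level that is still useful for analytics but no longer identifies an individual. This pattern suits location data well: region-level fields usually carry the analytic value, while exact coordinates can pinpoint a person's home or workplace.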
The Goal
You will remove the precise latitude and longitude fields from the location object, while preserving the city, state, and country for regional analytics.
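As a rough illustration of the intended result, the Python sketch below mirrors the transformation outside the pipeline. It is not part of the pipeline configuration: the function name generalize_location and the sample values are hypothetical, and only the field names (latitude, longitude, city, state, country) come from this guide.

```python
from copy import deepcopy

# Fields this guide treats as too precise to keep.
PRECISE_FIELDS = ("latitude", "longitude")


def generalize_location(event: dict) -> dict:
    """Return a copy of the event whose location has no precise coordinates."""
    result = deepcopy(event)
    location = result.get("location")
    # Events without a location object pass through unchanged.
    if isinstance(location, dict):
        for field in PRECISE_FIELDS:
            location.pop(field, None)
    return result


# Hypothetical sample event; the values are illustrative only.
sample = {
    "location": {
        "latitude": 37.7749,
        "longitude": -122.4194,
        "city": "San Francisco",
        "state": "CA",
        "country": "US",
    }
}

print(generalize_location(sample)["location"])
```

The sketch copies the event before editing it, so the original record is left untouched, and it leaves events without a location object as they are.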
Implementation
- Start with the Previous Pipeline: Copy pseudonymize-user.yaml from Step 4 to a new file named generalize-location.yaml.
    cp pseudonymize-user.yaml generalize-location.yaml
- Add the Generalization Logic: Open generalize-location.yaml and add the location logic to the bottom of the existing mapping processor.
  Add this to your 'mapping' processor in generalize-location.yaml:
    # --- Logic from previous steps ---
    # (The existing logic for all other PII fields remains here)
    # --- START: New additions for Generalization ---
    # Remove the precise GPS coordinates to protect user privacy,
    # but keep the city, state, and country for regional analytics.
    root.location = this.location.without("latitude", "longitude")
    # --- END: New additions ---
- Deploy and Test:
    # Send the sample event data.
    # $HOME is used because the shell does not expand ~ after the @ prefix.
    curl -X POST http://localhost:8080/events/ingest \
      -H "Content-Type: application/json" \
      -d @"$HOME/expanso-remove-pii/sample-event.json"
- Verify: Check your logs. The location object will no longer contain latitude or longitude, but the other location fields remain.
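The verification step can also be scripted. The sketch below assumes that each log line is a single JSON object for one processed event, which this guide does not confirm. The helper name leaked_location_fields and the sample line are hypothetical.

```python
import json

# Fields that should be absent after generalization.
PRECISE_FIELDS = ("latitude", "longitude")


def leaked_location_fields(log_line: str) -> list:
    """Return the precise location fields still present in one JSON log line."""
    event = json.loads(log_line)
    location = event.get("location")
    # No location object means there is nothing that could leak.
    if not isinstance(location, dict):
        return []
    return [field for field in PRECISE_FIELDS if field in location]


# Hypothetical log line; substitute a real line from your pipeline output.
line = '{"location": {"city": "San Francisco", "state": "CA", "country": "US"}}'
leaked = leaked_location_fields(line)
print("OK" if not leaked else f"Still present: {leaked}")
```

An empty result means neither coordinate survived. Any field names returned indicate that the mapping did not run or was added to the wrong processor.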
You have now built a comprehensive PII removal pipeline that applies different techniques—deletion, hashing, pseudonymization, and generalization—to different fields based on their specific privacy and analytics requirements.