Technique 5: Generalization

The final technique is generalization: you reduce the precision of the data to a level that is still useful for analytics but no longer identifies an individual. It suits location data well, because exact coordinates can point to a single address while city-level data is usually enough for regional reporting.

The Goal

You will remove the precise latitude and longitude fields from the location object, while preserving the city, state, and country for regional analytics.

Implementation

  1. Start with the Previous Pipeline: Copy the pseudonymize-user.yaml from Step 4 to a new file named generalize-location.yaml.

    cp pseudonymize-user.yaml generalize-location.yaml
  2. Add the Generalization Logic: Open generalize-location.yaml and add the location logic to the bottom of the existing mapping processor.

    # Add this to your 'mapping' processor in generalize-location.yaml
    # --- Logic from previous steps ---
    # (The existing logic for all other PII fields remains here)

    # --- START: New additions for Generalization ---

    # Remove the precise GPS coordinates to protect user privacy,
    # but keep the city, state, and country for regional analytics.
    root.location = this.location.without("latitude", "longitude")

    # --- END: New additions ---
  3. Deploy and Test:

    # Send the sample event data.
    # $HOME is used instead of ~ because the shell does not expand ~ after '@'.
    curl -X POST http://localhost:8080/events/ingest \
    -H "Content-Type: application/json" \
    -d @"$HOME/expanso-remove-pii/sample-event.json"
  4. Verify: Check your pipeline logs. The location object in the processed event should no longer contain latitude or longitude, while city, state, and country remain.
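
The verification step can also be scripted. The following is a minimal sketch, assuming you can copy one processed event out of your logs as a single JSON document. The function name check_generalized and the sample event are hypothetical and not part of the pipeline. The check is a plain text match on key names rather than a real JSON parse.

```shell
# Hypothetical helper: reads one processed event (JSON) on stdin and fails
# if a "latitude" or "longitude" key is still present anywhere in it.
# This is a crude text match on key names, not a JSON parser.
check_generalized() {
  if grep -Eq '"(latitude|longitude)"[[:space:]]*:'; then
    echo "FAIL: precise coordinates still present" >&2
    return 1
  fi
  echo "OK: no latitude/longitude keys found"
}

# Example with a made-up event; replace it with an event from your own logs.
echo '{"location":{"city":"Austin","state":"TX","country":"US"}}' | check_generalized
```

Because the function returns a non-zero status on failure, it can be dropped into a smoke test that runs after each deployment.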

You have now built a PII removal pipeline that applies different techniques (deletion, hashing, pseudonymization, and generalization) to different fields, based on each field's privacy and analytics requirements.