Redact Sensitive Data

Compliance (GDPR, HIPAA, PCI-DSS) requires strict control over Personally Identifiable Information (PII). In this step, we'll implement processors to detect and redact sensitive data.

Goal

  • Hash identifiers such as IP addresses and email addresses (pseudonymization)
  • Mask credit card numbers
  • Remove unnecessary sensitive fields

Configuration

1. Field-Level Redaction

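We check for known sensitive fields (email, ip_address, credit_card) and apply a transformation to each. The mapping here hashes email and ip_address; credit_card masking appears in the complete configuration.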

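    - mapping: |
        root = this

        # Hash email (SHA-256, hex-encoded), then drop the raw value.
        # Assigning deleted() removes the field without resetting root,
        # so email_hash is preserved.
        if this.exists("email") {
          root.email_hash = this.email.hash("sha256").encode("hex")
          root.email = deleted()
        }

        # Hash IP with a salt read from the IP_SALT environment variable
        if this.exists("ip_address") {
          root.ip_hash = (this.ip_address + env("IP_SALT")).hash("sha256").encode("hex")
          root.ip_address = deleted()
        }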
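
The third goal, removing unnecessary sensitive fields, can be handled in the same style. The following is a minimal sketch, assuming hypothetical field names (password, ssn, session_token) that are not part of the original schema; substitute whatever fields your own logs carry.

```yaml
- mapping: |
    # Start from the input document minus fields that should never be stored.
    # The field names below are hypothetical placeholders.
    root = this.without("password", "ssn", "session_token")
```

Because this statement replaces root wholesale, it should come before any other root.* assignments in the same mapping, or live in its own mapping step as shown.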

2. Pattern-Based Masking

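Sometimes PII is embedded in the free-text message string. Regex replacement handles this case, but it runs against every message and can be expensive at high volume. Note that this pattern masks any run of 13 to 16 digits, so long numeric IDs may be masked as well.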

    # Mask credit cards in message
    - mapping: |
        root = this
        root.message = this.message.re_replace_all(
          "\\b(?:\\d[ -]*?){13,16}\\b",
          "****-****-****-****"
        )
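
If the last four digits need to stay visible in the message (as they do for the credit_card field in the complete configuration), a capture group can carry them into the replacement. This is a sketch under that assumption, not part of the original pipeline: it splits the same 13 to 16 digit pattern into 9 to 12 masked digits plus a captured final four.

```yaml
# Variant: mask card numbers in message but keep the last four digits
- mapping: |
    root = this
    # (\d{4}) captures the final four digits; $1 re-inserts them.
    root.message = this.message.re_replace_all(
      "\\b(?:\\d[ -]*?){9,12}(\\d{4})\\b",
      "****-****-****-$1"
    )
```

With this variant, a message such as "card 4111 1111 1111 1111 declined" would be expected to become "card ****-****-****-1111 declined".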

Complete Step 5 Configuration

production-pipeline-step-5.yaml
    input:
      http_server:
        address: "0.0.0.0:8080"
        path: /logs/ingest
        rate_limit: "1000/1s"
        auth:
          type: header
          header: "X-API-Key"
          required_value: "${LOG_API_KEY}"

    pipeline:
      processors:
        # Parse the raw payload as JSON; wrap non-JSON input in a message field
        - mapping: |
            root = content().parse_json().catch({"message": content().string()})

        # Redaction
        - mapping: |
            root = this

            # 1. Email Redaction
            if this.exists("email") {
              root.email_hash = this.email.hash("sha256").encode("hex")
              root.email = deleted()
            }

            # 2. Credit Card Masking (Field)
            if this.exists("credit_card") {
              root.credit_card = "****-****-****-" + this.credit_card.slice(-4)
            }

            # 3. Message Cleaning
            # Guarded so that a missing or non-string message does not fail
            # the whole mapping and let unredacted fields through.
            if this.message.type() == "string" {
              root.message = this.message.re_replace_all("\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b", "[EMAIL]")
            }

    output:
      stdout: {}

Deployment & Verification

  1. Test Redaction:

    curl -X POST http://localhost:8080/logs/ingest \
      -H "X-API-Key: $LOG_API_KEY" \
      -d '{
        "message": "Contact user@example.com",
        "email": "user@example.com",
        "credit_card": "1234-5678-9012-3456"
      }'

    Result:
    • email field removed, email_hash present.
    • credit_card masked to ****-****-****-3456.
    • message changed to Contact [EMAIL].

Troubleshooting

Issue                              Solution
Regex too slow                     Use field-level checks before pattern matching
PII still visible                  Verify field names match your schema exactly
Hashes reversible by brute force   Add a secret salt via environment variable
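
For the "Regex too slow" case, one way to apply a field-level check is to run a cheap type and substring test first and only invoke the regex when it passes. The following is a sketch of that idea for the email pattern, assuming that a literal "@" is a good enough pre-filter for your traffic.

```yaml
- mapping: |
    root = this
    # Skip messages that are missing or not strings.
    if this.message.type() == "string" {
      # contains() is a plain substring test; the regex only runs
      # when the message could plausibly hold an email address.
      root.message = if this.message.contains("@") {
        this.message.re_replace_all("\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b", "[EMAIL]")
      } else {
        this.message
      }
    }
```

Messages without an "@" pass through untouched, so the regex cost is only paid on the subset of logs that might contain an address.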

Next Steps

Our logs are clean, valid, and safe. Finally, let's route them to the right storage systems.

👉 Step 6: Fan-Out Destinations