Redact Sensitive Data
Compliance regimes such as GDPR, HIPAA, and PCI-DSS require strict control over Personally Identifiable Information (PII). In this step, we'll add processors that detect and redact sensitive data before it leaves the pipeline.
Goal
- Hash identifiers such as IP addresses and email addresses (pseudonymization)
- Mask credit card numbers
- Remove unnecessary sensitive fields
Configuration
1. Field-Level Redaction
We check for known sensitive fields (email, ip_address, credit_card) and apply a transformation to each one that is present.
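- mapping: |
    root = this
    # Hash Email (SHA256, hex-encoded so the digest is a readable string)
    if this.exists("email") {
      root.email_hash = this.email.hash("sha256").encode("hex")
      # Delete only the raw field; reassigning root = this.without("email")
      # here would throw away the email_hash set on the previous line.
      root.email = deleted()
    }
    # Hash IP with Salt
    if this.exists("ip_address") {
      root.ip_hash = (this.ip_address + env("IP_SALT")).hash("sha256").encode("hex")
      root.ip_address = deleted()
    }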
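To sanity-check the hashes this kind of mapping emits, the same computation can be reproduced offline. The following is a minimal Python sketch, not part of the pipeline: it assumes the digests are hex-encoded SHA-256 and that the salt is appended to the IP address before hashing. The function names, the sample event, and the salt value are hypothetical.

```python
import hashlib


def pseudonymize(value: str, salt: str = "") -> str:
    """Return the hex SHA-256 digest of value with the salt appended."""
    return hashlib.sha256((value + salt).encode("utf-8")).hexdigest()


def redact_fields(event: dict, ip_salt: str) -> dict:
    """Mimic the field-level redaction: hash known fields, drop the raw values."""
    out = dict(event)  # shallow copy so the caller's event is left untouched
    if "email" in out:
        out["email_hash"] = pseudonymize(out.pop("email"))
    if "ip_address" in out:
        out["ip_hash"] = pseudonymize(out.pop("ip_address"), ip_salt)
    return out


# Hypothetical sample event and salt, for illustration only.
event = {"email": "user@example.com", "ip_address": "203.0.113.7", "status": 200}
redacted = redact_fields(event, ip_salt="hypothetical-salt")
print(redacted)
```

The salt matters most for the IP address: IPv4 has only about 4.3 billion possible values, so an unsalted hash could be reversed by hashing every address and comparing digests.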
2. Pattern-Based Masking
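Sometimes PII is embedded in the free-text message string rather than in a dedicated field. We can mask it with a regex replacement (caution: running a regex over every message can be expensive at high volume).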
# Mask Credit Cards in message
- mapping: |
    root = this
    root.message = this.message.re_replace_all(
      "\\b(?:\\d[ -]*?){13,16}\\b",
      "****-****-****-****"
    )
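Before deploying a pattern like this, it can help to try it against sample messages. The sketch below applies the same expression with Python's `re` module, as a rough stand-in for the pipeline's regex engine (the two engines are assumed to agree on this simple pattern). The helper name `mask_cards` and the sample strings are hypothetical.

```python
import re

# Same expression as the mapping: 13 to 16 digits, optionally separated
# by spaces or hyphens, bounded by word boundaries.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]*?){13,16}\b")
MASK = "****-****-****-****"


def mask_cards(message: str) -> str:
    """Replace every card-like digit run in message with a fixed mask."""
    return CARD_PATTERN.sub(MASK, message)


samples = [
    "card 4111 1111 1111 1111 ok",     # spaced card number: masked
    "id 123456789012",                 # only 12 digits: left alone
    "order 1234567890123456 shipped",  # 16-digit order number: also masked
]
for s in samples:
    print(mask_cards(s))
```

The last sample shows the trade-off: the pattern cannot tell a card number from any other 13 to 16 digit run, so long numeric identifiers in messages will be masked as well.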
Complete Step 5 Configuration
production-pipeline-step-5.yaml
input:
  http_server:
    address: "0.0.0.0:8080"
    path: /logs/ingest
    rate_limit: "1000/1s"
    auth:
      type: header
      header: "X-API-Key"
      required_value: "${LOG_API_KEY}"
pipeline:
  processors:
    # Parse the raw request body; fall back to wrapping it as a plain message
    - mapping: |
        root = content().parse_json().catch({"message": content().string()})
    # Redaction
    - mapping: |
        root = this
        # 1. Email Redaction
        if this.exists("email") {
          root.email_hash = this.email.hash("sha256").encode("hex")
          # Delete only the raw field so email_hash is preserved
          root.email = deleted()
        }
        # 2. Credit Card Masking (Field): keep only the last four digits
        if this.exists("credit_card") {
          root.credit_card = "****-****-****-" + this.credit_card.slice(-4)
        }
        # 3. Message Cleaning (skipped when the event has no message field)
        if this.exists("message") {
          root.message = this.message.re_replace_all("\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b", "[EMAIL]")
        }
output:
  stdout: {}
Deployment & Verification
- Test Redaction:
curl -X POST http://localhost:8080/logs/ingest \
  -H "X-API-Key: $LOG_API_KEY" \
  -d '{
    "message": "Contact user@example.com",
    "email": "user@example.com",
    "credit_card": "1234-5678-9012-3456"
  }'
Result:
- email field removed, email_hash present.
- credit_card masked to ****-****-****-3456.
- message changed to Contact [EMAIL].
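To compare the pipeline's output against what the Step 5 redaction rules should produce, the rules can be mirrored in a few lines of Python. This is a sketch under stated assumptions, not the pipeline's implementation: it assumes a hex-encoded SHA-256 digest for the email hash, and the function name `expected_output` and the placeholder address `user@example.com` are hypothetical.

```python
import hashlib
import re

# Same email expression as the message-cleaning rule in the Step 5 mapping.
EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def expected_output(event: dict) -> dict:
    """Apply the three Step 5 redaction rules to a parsed event."""
    out = dict(event)
    # 1. Replace the raw email with its hash.
    if "email" in out:
        digest = hashlib.sha256(out.pop("email").encode("utf-8")).hexdigest()
        out["email_hash"] = digest
    # 2. Keep only the last four characters of the card number.
    if "credit_card" in out:
        out["credit_card"] = "****-****-****-" + out["credit_card"][-4:]
    # 3. Scrub email addresses embedded in the message text.
    if "message" in out:
        out["message"] = EMAIL_PATTERN.sub("[EMAIL]", out["message"])
    return out


# Hypothetical payload with a placeholder address.
sample = {
    "message": "Contact user@example.com",
    "email": "user@example.com",
    "credit_card": "1234-5678-9012-3456",
}
result = expected_output(sample)
print(result)
```

If the `email_hash` printed here differs from the one in the pipeline's stdout for the same payload, the encoding of the digest is the first thing to check.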
Troubleshooting
| Issue | Solution |
|---|---|
| Regex too slow | Use field-level checks before pattern matching |
| PII still visible | Verify field names match your schema exactly |
| Hashes reversible by brute force (e.g., IPv4 addresses) | Add a secret salt via environment variable |
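One way to read the first row is to run a cheap check before the expensive regex, so that most messages never reach the pattern matcher. The Python sketch below illustrates the idea for the email pattern; the helper name `clean_message` is hypothetical, and whether the guard pays off depends on how many messages actually contain an `@`.

```python
import re

EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def clean_message(message: str) -> str:
    """Scrub emails, skipping the regex when the message cannot contain one."""
    # Substring search is much cheaper than a regex scan, and an email
    # address always contains "@", so this guard never skips a real match.
    if "@" not in message:
        return message
    return EMAIL_PATTERN.sub("[EMAIL]", message)


print(clean_message("disk usage at 91 percent"))
print(clean_message("Contact user@example.com"))
```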
Next Steps
Our logs are clean, valid, and safe. Finally, let's route them to the right storage systems.