Setup Environment for PII Removal

Before building the PII removal pipeline, you'll set environment variables for the cryptographic salts and the analytics endpoint, then download sample data for testing.

Prerequisites

This example requires the following services to be running:

Before you begin, please ensure these services are set up and running according to their respective guides. Additionally, ensure you have completed the Local Development Setup guide for general environment configuration.

Step 1: Generate Cryptographic Salts

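Salts are secret values mixed into the hash input so that precomputed lookup tables (rainbow tables) cannot be used to recover the original values. Each type of PII should have its own salt, so that the same value appearing in two different fields produces two unrelated hashes.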
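As a rough illustration of how a salt changes the resulting hash, the sketch below applies a keyed hash (HMAC-SHA256 via `openssl dgst`) to an example IP address. The helper name `hash_pii`, the `DEMO_SALT_*` variables, and the choice of HMAC are assumptions made for this illustration; the pipeline built later in this tutorial may combine salt and value differently.

```shell
# Hypothetical helper (not part of the tutorial's pipeline):
# keyed SHA-256 hash of a value, using the salt as the HMAC key.
hash_pii() {
  # $1 = salt, $2 = value to pseudonymize
  # The hex digest is the last field of openssl's output line.
  printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# Throwaway salts for the demonstration only
DEMO_SALT_A=$(openssl rand -hex 32)
DEMO_SALT_B=$(openssl rand -hex 32)

h1=$(hash_pii "$DEMO_SALT_A" "192.168.1.100")
h2=$(hash_pii "$DEMO_SALT_A" "192.168.1.100")
h3=$(hash_pii "$DEMO_SALT_B" "192.168.1.100")

echo "salt A:          $h1"
echo "salt A (again):  $h2"   # identical to the line above
echo "salt B:          $h3"   # unrelated to the two lines above
```

The same salt always yields the same hash, which keeps hashed values usable for joins and counts, while a different salt yields an unrelated hash. Note that `-hmac` passes the key on the command line, where other local users may see it in the process list, so this form is suited to experiments rather than production use.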

# Generate three secure random salts (32 bytes each)
export IP_SALT=$(openssl rand -hex 32)
export EMAIL_SALT=$(openssl rand -hex 32)
export USER_SALT=$(openssl rand -hex 32)

# Verify salts were generated
echo "IP_SALT length: ${#IP_SALT}" # Should print 64
echo "EMAIL_SALT length: ${#EMAIL_SALT}" # Should print 64
echo "USER_SALT length: ${#USER_SALT}" # Should print 64

Protect Your Salts

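These salts are cryptographic secrets. An attacker who obtains them can recover the original values by hashing candidate inputs (for example, every IPv4 address or a list of known email addresses) with the salt and comparing the results against the stored hashes.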

Production best practices:

  • Store salts in a secret management system (HashiCorp Vault, AWS Secrets Manager, etc.)
  • Never commit salts to version control
  • Rotate salts periodically (every 90 days recommended)
  • Use different salts per environment (dev/staging/prod)

Optional: Persist Salts

To avoid regenerating salts every session:

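# Save to a file in your home directory (keep it out of version control!)
cat > ~/.pii-pipeline-secrets <<EOF
export IP_SALT="$IP_SALT"
export EMAIL_SALT="$EMAIL_SALT"
export USER_SALT="$USER_SALT"
EOF

# Restrict the file to your own user, since it holds secrets
chmod 600 ~/.pii-pipeline-secrets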

# Load in future sessions
source ~/.pii-pipeline-secrets
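
After loading the file in a new session, it can be useful to confirm that all three salts are present and well-formed before running anything that depends on them. The following is a minimal sketch; the function name `check_salts` is hypothetical, and the 64-character lowercase hex format is assumed from the `openssl rand -hex 32` commands in Step 1.

```shell
# Hypothetical helper: verify that each salt is set and looks like
# the output of `openssl rand -hex 32` (64 lowercase hex characters).
check_salts() {
  status=0
  for name in IP_SALT EMAIL_SALT USER_SALT; do
    # Read the variable whose name is stored in $name
    eval "value=\${$name:-}"
    case "$value" in
      "")
        echo "$name is not set" >&2
        status=1 ;;
      *[!0-9a-f]*)
        echo "$name contains non-hex characters" >&2
        status=1 ;;
      *)
        if [ "${#value}" -ne 64 ]; then
          echo "$name has length ${#value}, expected 64" >&2
          status=1
        fi ;;
    esac
  done
  return "$status"
}

# Usage: stop early if any salt is missing or malformed
#   check_salts || exit 1
```

The function reports every problem it finds rather than stopping at the first one, so a single run shows which variables need attention.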

Step 2: Set Analytics Endpoint

Configure where processed (PII-free) events will be sent:

# For this tutorial, use a local file sink
export ANALYTICS_ENDPOINT="file:///var/log/expanso/pii-removed.jsonl"

# In production, use your analytics service:
# export ANALYTICS_ENDPOINT="http://analytics-service:8080/events"
# export ANALYTICS_ENDPOINT="https://your-datadog-endpoint.com"
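
A local file sink can only be written if its parent directory exists and is writable. Whether the pipeline creates that directory itself is not stated here, so the sketch below assumes it does not and derives the directory from a `file://` endpoint. The helper name `endpoint_dir` is hypothetical.

```shell
# Hypothetical helper: print the parent directory of a file:// endpoint.
# Returns non-zero for any other scheme (http://, https://, ...).
endpoint_dir() {
  case "$1" in
    file://*) dirname "${1#file://}" ;;
    *) return 1 ;;
  esac
}

# For the tutorial endpoint this prints /var/log/expanso
endpoint_dir "file:///var/log/expanso/pii-removed.jsonl"

# Usage (creating directories under /var/log usually requires sudo):
#   mkdir -p "$(endpoint_dir "$ANALYTICS_ENDPOINT")"
```

If you would rather avoid elevated permissions, pointing `ANALYTICS_ENDPOINT` at a path under your home directory is a simple alternative for local testing.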

Step 3: Download Sample Data

Download the sample purchase event we'll use for testing:

# Create a working directory
mkdir -p ~/pii-pipeline-tutorial
cd ~/pii-pipeline-tutorial

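# Download sample data
# -f: fail on HTTP errors instead of saving an error page
# -sS: hide the progress meter but still show errors
# -L: follow redirects
curl -fsSL -o sample-data.json \
  https://examples.expanso.io/files/data-security/remove-pii/sample-data.json

# Verify the download by pretty-printing it
jq . sample-data.json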
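
If you want a scripted check rather than a visual one, the sketch below tests that the downloaded file is non-empty and parses as JSON. The function name `is_valid_json` is hypothetical; it relies only on `jq`, which this step already uses.

```shell
# Hypothetical helper: succeed only if the file is non-empty and parses as JSON.
is_valid_json() {
  # jq's `empty` filter parses the input and prints nothing;
  # jq exits non-zero if the file is missing or is not valid JSON.
  # The -s test rejects empty files, which jq would otherwise accept.
  [ -s "$1" ] && jq empty "$1" >/dev/null 2>&1
}

if is_valid_json sample-data.json; then
  echo "sample-data.json looks like valid JSON"
else
  echo "sample-data.json is missing, empty, or not valid JSON" >&2
fi
```

This catches the common case where a failed download leaves behind an empty file or an HTML error page.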

Next Steps

Environment configured! Now build the pipeline step-by-step:

Or jump to a specific step: