
Step 3: Add S3 for Long-Term Archive

Now you will add a third destination to your fan-out pipeline: Amazon S3. S3 is a common choice for cost-effective, long-term archival of raw event data.

The Goal

You will modify your fan-out.yaml file to send messages to a local file, a Kafka topic, and an S3 bucket, all at the same time.

Implementation

  1. Start with the Previous Pipeline: Open the fan-out.yaml file you created in the previous step. It should look like this:

    fan-out.yaml
    output:
      broker:
        pattern: fan_out
        outputs:
          - file:
              path: /tmp/events.jsonl
              codec: lines
          - kafka:
              addresses: [ ${KAFKA_BROKERS} ]
              topic: "sensor-events"
              key: ${!this.sensor_id}
  2. Add the S3 Output: Append an aws_s3 block to the outputs array.

    Modify the output in fan-out.yaml
    output:
      broker:
        pattern: fan_out
        outputs:
          - file:
              path: /tmp/events.jsonl
              codec: lines
          - kafka:
              addresses: [ ${KAFKA_BROKERS} ]
              topic: "sensor-events"
              key: ${!this.sensor_id}

          # Add this block for S3
          - aws_s3:
              bucket: ${S3_BUCKET} # From your setup step
              region: ${AWS_REGION}
              path: "events/${!timestamp_unix_date()}/${!uuid_v4()}.json"
              batching:
                count: 10
                period: 10s

    Note that we've added a small batching configuration. S3 works best with fewer, larger objects rather than many small ones, so batching matters even in a simple example. Here a batch is flushed after 10 messages or 10 seconds, whichever comes first; see the sketch after this list for one way to merge each batch into a single object.

  3. Configure AWS Credentials: Ensure the pipeline's environment has AWS credentials. For local development, the simplest approach is to export the following environment variables (a quick way to verify them with the AWS CLI is shown after this list):

    export AWS_ACCESS_KEY_ID="YOUR_KEY_ID"
    export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
    export AWS_REGION="us-east-1"
    export S3_BUCKET="your-unique-bucket-name"
  4. Deploy and Test:

    # Send a test event
    curl -X POST http://localhost:8080/events \
    -H "Content-Type: application/json" \
    -d '{"sensor_id": "sensor-456", "message": "This goes to three places"}'
  5. Verify:

    • Check the local file /tmp/events.jsonl.
    • Check the sensor-events Kafka topic.
    • Check your S3 bucket. After the batch period (10 seconds), you should see a new object under the events/ prefix (the CLI commands after this list show one way to check).
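
One refinement to the batching shown in step 2: with only count and period set, some runtimes still upload each message in a batch as its own object. The sketch below assumes a Benthos/Redpanda Connect-style runtime, whose archive processor can run inside the batching block; that processor is not introduced in this tutorial, so treat the field names as an assumption. It merges each batch into a single newline-delimited object before upload:

    # Sketch only: assumes a Benthos/Redpanda Connect-style archive processor
    - aws_s3:
        bucket: ${S3_BUCKET}
        region: ${AWS_REGION}
        path: "events/${!timestamp_unix_date()}/${!uuid_v4()}.jsonl"
        batching:
          count: 10
          period: 10s
          processors:
            # Merge each batch into one object, one JSON document per line
            - archive:
                format: lines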

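If you have the AWS CLI installed, you can also sanity-check the AWS side of the pipeline. These commands use only standard AWS CLI subcommands and the environment variables exported in step 3:

    # Confirm the exported credentials resolve to a valid identity
    aws sts get-caller-identity

    # List the archived objects once the 10s batch period has elapsed
    aws s3 ls "s3://${S3_BUCKET}/events/" --recursive
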
You have now built a pipeline that concurrently routes data to three different types of systems: a local file, a message queue, and object storage.