Step 5: Export to S3
The final step of this tutorial is to send your enriched, restructured, and batched logs to a production-grade storage system: Amazon S3.
The Goal
You will replace the local file output in your pipeline with an aws_s3 output, configuring it to send your batched logs to an S3 bucket.
The aws_s3 Output
The aws_s3 output is very similar to the file output, but it requires a few extra fields: the bucket name, the region, and AWS credentials.
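Because the output depends on those three inputs, it can help to check them before running the pipeline. The following is a minimal sketch, not part of the tutorial's pipeline: the function names missing_vars and preflight are hypothetical, and it assumes the AWS CLI is installed and that the bucket name and region live in the S3_BUCKET_NAME and AWS_REGION environment variables used later in this step.

```shell
# Print the name of every listed variable that is unset or empty.
missing_vars() {
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Hypothetical pre-flight check; returns non-zero on the first failure.
preflight() {
  missing="$(missing_vars S3_BUCKET_NAME AWS_REGION)"
  if [ -n "$missing" ]; then
    echo "Missing environment variables: $missing" >&2
    return 1
  fi
  # Assumption: the profile is "default" unless AWS_PROFILE says otherwise.
  profile="${AWS_PROFILE:-default}"
  # Confirm the CLI can resolve credentials for that profile.
  aws sts get-caller-identity --profile "$profile" >/dev/null || return 1
  # Confirm the bucket exists and is reachable with those credentials.
  aws s3api head-bucket --bucket "$S3_BUCKET_NAME" --region "$AWS_REGION" \
    --profile "$profile" || return 1
  echo "Pre-flight OK"
}
```

Once the functions are defined in your shell, running preflight prints "Pre-flight OK" only if both variables are set, the credentials resolve, and the bucket is reachable.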
Implementation
-
Start with the Previous Pipeline: Copy batched-output.yaml from Step 4 to a new file named export-s3.yaml.
cp batched-output.yaml export-s3.yaml
-
Replace the Output with S3: Open export-s3.yaml and replace the entire output section with the aws_s3 block below.
# Replace the 'output' section in export-s3.yaml
output:
  aws_s3:
    bucket: ${S3_BUCKET_NAME} # From your setup step
    region: ${AWS_REGION}
    # This path creates a simple date-based folder structure
    path: "logs/${!timestamp_unix_date()}/${!uuid_v4()}.jsonl"
    # We can re-use the same batching policy from the previous step
    batching:
      count: 10
      period: 5s
    # You must configure your credentials. Using a profile is a common method.
    credentials:
      profile: "default" # Or your named AWS profile
-
Configure AWS Credentials: Ensure your environment is configured with AWS credentials. The pipeline above is configured to use the default profile. Make sure it's configured or change it to your named profile.
# Check your default profile
aws configure list
# Don't forget to set the bucket name and region environment variables
export S3_BUCKET_NAME="your-unique-bucket-name"
export AWS_REGION="us-east-1"
-
Deploy and Test: Run the pipeline after configuring your AWS credentials.
-
Verify: After the pipeline has been running for a short time, check your S3 bucket. You should see new files appearing, organized by date. Each file should contain a batch of up to 10 log messages; a batch is flushed early, with fewer messages, if the 5s period elapses first.
aws s3 ls s3://${S3_BUCKET_NAME}/logs/ --recursive
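To go one step beyond listing, you can inspect a single uploaded file and count its lines. This is a rough sketch under stated assumptions: the function names newest_key and count_newest_batch are hypothetical, each log message is assumed to occupy one line of the .jsonl file, and object keys are assumed to contain no spaces (true for the logs/<date>/<uuid>.jsonl layout above).

```shell
# Extract the object key from the newest line of `aws s3 ls --recursive` output.
# Each listing line has the form: <date> <time> <size> <key>
newest_key() {
  # Lines start with date and time, so a plain sort puts the newest last.
  sort | tail -n 1 | awk '{print $4}'
}

# Stream the newest object under logs/ to stdout and count its lines.
count_newest_batch() {
  key="$(aws s3 ls "s3://${S3_BUCKET_NAME}/logs/" --recursive | newest_key)"
  if [ -z "$key" ]; then
    echo "No objects found under logs/" >&2
    return 1
  fi
  echo "Inspecting s3://${S3_BUCKET_NAME}/${key}"
  # "-" as the destination makes `aws s3 cp` write the object to stdout.
  # wc -l counts newline characters, so a file without a trailing newline
  # reports one line fewer than it has messages.
  aws s3 cp "s3://${S3_BUCKET_NAME}/${key}" - | wc -l
}
```

Running count_newest_batch after the pipeline has written a few files prints the inspected key followed by the line count, which you can compare against the batching settings.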
You have now built a complete log enrichment and export pipeline that generates data, adds lineage, restructures it, and efficiently exports it to cloud storage.