Advanced Enrichment & Export Patterns

Once you have mastered the basic "enrich, restructure, batch, export" pattern, you can implement more sophisticated, production-grade features.

Pattern 1: Hive-Style S3 Partitioning

For large-scale analytics, storing files under a simple date prefix isn't enough. Hive-style partitioning is a path naming convention (key=value) that big data query engines like AWS Athena, Presto, and Spark understand automatically, allowing them to prune partitions and scan only the data a query actually needs, which saves both time and money.

S3 Output with Hive-Style Partitioning
output:
  aws_s3:
    bucket: ${S3_BUCKET_NAME}
    # This path creates partitions that analytics engines can use for pruning
    path: 'logs/year=${!timestamp("2006")}/month=${!timestamp("01")}/day=${!timestamp("02")}/hour=${!timestamp("15")}/${!uuid_v4()}.jsonl.gz'
    # ... other config

A query with a predicate like WHERE year=2025 AND month=10 will now scan only the objects under that partition's prefix, ignoring all other data.
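
Depending on your Benthos/Redpanda Connect version, the timestamp() interpolation function may be deprecated or unavailable. The sketch below builds the same partitioned path with the now() function and the ts_format() method instead, assuming both are available in your version:

output:
  aws_s3:
    bucket: ${S3_BUCKET_NAME}
    # Same Hive-style layout, expressed with now() + ts_format() instead of timestamp()
    path: 'logs/year=${! now().ts_format("2006") }/month=${! now().ts_format("01") }/day=${! now().ts_format("02") }/hour=${! now().ts_format("15") }/${! uuid_v4() }.jsonl.gz'
    # ... other config

The partition layout follows the same key=value scheme as above, so queries against it don't change.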

Pattern 2: Compression

To save on storage costs and improve query performance, you should compress each batch before sending it to S3. This can be done with processors inside the batching block: an archive processor first joins the batched messages into a single newline-delimited document, and a compress processor then gzips it so that exactly one compressed object is written per batch.

S3 Output with Gzip Compression
output:
  aws_s3:
    bucket: ${S3_BUCKET_NAME}
    path: "logs/.../${!uuid_v4()}.jsonl.gz" # Note the .gz extension
    # Let S3 know the content is compressed
    content_encoding: gzip
    batching:
      count: 100
      period: 60s
      # These processors run on the whole batch *before* it gets sent
      processors:
        # Join the batched messages into a single newline-delimited (JSONL) document
        - archive:
            format: lines
        # Gzip that document so one compressed object is written per batch
        - compress:
            algorithm: gzip

Pattern 3: Multi-Destination Routing

You may want to send different types of logs to different places. For example, ERROR logs might go to a high-priority S3 bucket for immediate alerting, while INFO logs go to a standard bucket for archival.

Multi-Destination S3 Export
output:
  broker:
    pattern: fan_out
    outputs:
      # --- Output 1: ERROR logs to a priority bucket ---
      - processors:
          # Drop every message that isn't an ERROR before this output writes
          - mapping: 'root = if this.event.level != "ERROR" { deleted() }'
        aws_s3:
          bucket: "my-priority-error-logs"
          path: 'errors/${!timestamp("2006-01-02")}/${!uuid_v4()}.jsonl.gz'
          # ... use a smaller, faster batching policy here

      # --- Output 2: All other logs to a standard bucket ---
      - processors:
          # Drop ERROR messages, keeping everything else
          - mapping: 'root = if this.event.level == "ERROR" { deleted() }'
        aws_s3:
          bucket: "my-standard-logs"
          path: 'logs/${!timestamp("2006-01-02")}/${!uuid_v4()}.jsonl.gz'
          # ... use a larger, more cost-effective batching policy here

This pattern uses a broker to fan out every message to both outputs, and each output runs its own mapping processor to drop the log levels it doesn't want before writing the remainder to S3.
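
If you would rather route each message exactly once instead of fanning it out and filtering per output, a switch output with check expressions is an alternative. Here is a minimal sketch, assuming the same event.level field and the hypothetical bucket names used above:

output:
  switch:
    cases:
      # First matching case wins: ERROR logs go to the priority bucket
      - check: this.event.level == "ERROR"
        output:
          aws_s3:
            bucket: "my-priority-error-logs"
            path: 'errors/${!timestamp("2006-01-02")}/${!uuid_v4()}.jsonl.gz'
      # Fallback case with no check: everything else goes to the standard bucket
      - output:
          aws_s3:
            bucket: "my-standard-logs"
            path: 'logs/${!timestamp("2006-01-02")}/${!uuid_v4()}.jsonl.gz'

The trade-off is that switch evaluates each message once and sends it to a single destination, whereas fan_out duplicates the stream and relies on each output to discard what it doesn't need.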