Advanced Format Transformation Patterns

Once you have mastered the basic conversions between formats, you can implement more sophisticated, production-grade patterns.

Pattern 1: Schema Evolution

Your data schemas will change over time. Schema-based formats like Avro and Protobuf have built-in support for schema evolution, allowing you to add or remove fields without breaking downstream consumers, as long as you follow certain rules (e.g., new fields must have default values).

Example: Adding an optional humidity field to an Avro schema.

v1 schema:

{"type": "record", "name": "Reading", "fields": [{"name": "temperature", "type": "double"}]}

v2 schema (backward-compatible; note that the new field declares a default value):

{
  "type": "record",
  "name": "Reading",
  "fields": [
    {"name": "temperature", "type": "double"},
    {"name": "humidity", "type": ["null", "double"], "default": null}
  ]
}

A consumer using the v2 schema can still read old v1 data: the missing humidity field is filled in from its default (null). This is backward compatibility. Conversely, a consumer still on the v1 schema can read new v2 data: it simply ignores the extra humidity field (forward compatibility).
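To see this resolution in action outside the pipeline, here is a minimal sketch using Python's fastavro library (an assumption; any Avro implementation that performs reader/writer schema resolution behaves the same way). It writes a record with the v1 schema and reads it back with the v2 schema:

import io
import fastavro

v1 = fastavro.parse_schema({
    "type": "record", "name": "Reading",
    "fields": [{"name": "temperature", "type": "double"}],
})
v2 = fastavro.parse_schema({
    "type": "record", "name": "Reading",
    "fields": [
        {"name": "temperature", "type": "double"},
        {"name": "humidity", "type": ["null", "double"], "default": None},
    ],
})

# Write a record with the old (v1) writer schema...
buf = io.BytesIO()
fastavro.schemaless_writer(buf, v1, {"temperature": 21.5})
buf.seek(0)

# ...and read it back with the new (v2) reader schema: the missing
# humidity field is filled in from its declared default.
record = fastavro.schemaless_reader(buf, v1, v2)
print(record)  # {'temperature': 21.5, 'humidity': None}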

Managing schema versions in production typically involves a Schema Registry, a central service that stores and versions your schemas. Processors like to_avro and from_avro can integrate with a schema registry to handle these conversions automatically.
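As an illustration, the following is a hedged sketch of checking compatibility and registering the v2 schema against a Confluent-compatible Schema Registry over its REST API; the registry URL and subject name are assumptions made for the example:

import json
import requests

REGISTRY = "http://localhost:8081"   # assumption: registry address
SUBJECT = "sensor-readings-value"    # assumption: illustrative subject name

v2_schema = {
    "type": "record",
    "name": "Reading",
    "fields": [
        {"name": "temperature", "type": "double"},
        {"name": "humidity", "type": ["null", "double"], "default": None},
    ],
}

headers = {"Content-Type": "application/vnd.schemaregistry.v1+json"}
payload = json.dumps({"schema": json.dumps(v2_schema)})

# Ask the registry whether v2 is compatible with the latest registered version.
check = requests.post(
    f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
    headers=headers, data=payload,
)
print(check.json())  # e.g. {"is_compatible": true}

# If compatible, register it as the next version of the subject.
register = requests.post(
    f"{REGISTRY}/subjects/{SUBJECT}/versions",
    headers=headers, data=payload,
)
print(register.json())  # e.g. {"id": 42}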

Pattern 2: Multi-Format Output

Sometimes you need to convert a single input into multiple different output formats simultaneously. This is a common pattern for data distribution.

Use Case: An incoming JSON event needs to be sent to a Kafka topic as Avro (for streaming) and archived to S3 as Parquet (for analytics).

output:
  broker:
    pattern: fan_out
    outputs:
      # --- Output 1: Convert to Avro and send to Kafka ---
      - processors:
          - to_avro:
              schema_path: "file://./sensor.avsc"
        kafka:
          addresses: [ ${KAFKA_BROKERS} ]
          topic: "sensor-readings-avro"

      # --- Output 2: Convert to Parquet and send to S3 ---
      - processors:
          - to_parquet:
              schema_path: "file://./sensor.avsc"
              default_compression: snappy
        aws_s3:
          bucket: "my-analytics-bucket"
          path: "readings/${!timestamp_unix_date()}/${!uuid_v4()}.parquet"

This pattern uses a broker output to fan the message out. Each branch of the fan-out has its own processors block to perform the format-specific conversion before sending the data to its final destination.
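To confirm the analytics branch produced what you expect, you can inspect one of the archived files locally. The sketch below assumes a Parquet object has already been downloaded from the bucket (the path is hypothetical) and uses pyarrow, one common way to read Parquet in Python:

import pyarrow.parquet as pq

# Hypothetical local copy of one object from my-analytics-bucket.
table = pq.read_table("readings/example.parquet")

print(table.schema)           # column names and types derived from sensor.avsc
print(table.num_rows)         # number of archived readings in this file
print(table.to_pylist()[:5])  # first few records as Python dicts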