Step 3: Restructure the Log Format

Once you've added metadata, your log is a flat list of fields in which the original event data and the processing metadata sit side by side. A good practice is to restructure the log into a nested format that separates the two. Data analysts and downstream systems can then tell at a glance which fields came from the source and which were added by the pipeline.

The Goal

You will restructure your flat log into a nested object with two main sections: event (the original data) and metadata (the lineage information you've added).

Input Message (Flat Structure):

{
  "id": "abc-123",
  "timestamp": "...",
  "level": "INFO",
  "service": "auth-service",
  "lineage_node_id": "local-dev-machine",
  "lineage_pipeline": "enrich-export-tutorial"
}

Desired Output (Structured):

{
  "event": {
    "id": "abc-123",
    "timestamp": "...",
    "level": "INFO",
    "service": "auth-service"
  },
  "metadata": {
    "node_id": "local-dev-machine",
    "pipeline": "enrich-export-tutorial"
  }
}

Implementation

  1. Start with the Previous Pipeline: Copy the add-lineage.yaml from Step 2 to a new file named restructure-log.yaml.

    cp add-lineage.yaml restructure-log.yaml
  2. Add the Restructuring Logic: Open restructure-log.yaml and add a new mapping processor to the end of the processors list in the pipeline section.

    # Add this to the 'processors' array in restructure-log.yaml
    - mapping: |
        # Create the new, structured root object.
        # The lineage paths below assume Step 2 nested its fields under `lineage`;
        # adjust them if your pipeline emits different field names.
        root = {
          "event": {
            "id": this.id,
            "timestamp": this.timestamp,
            "level": this.level,
            "service": this.service,
            "message": this.message,
            "user_id": this.user_id,
            "request_id": this.request_id
          },
          "metadata": {
            "node_id": this.lineage.processing_node_id,
            "pipeline_name": this.lineage.pipeline_name,
            "processed_at": this.lineage.processed_at
          }
        }
  3. Deploy and Test: Run the pipeline and watch the output.

  4. Verify: Inspect the logs from your pipeline. Each log message should now have the two-part structure, with the original fields under event and the lineage fields under metadata, which makes it much easier to read and query.
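
The mapping in step 2 names every event field explicitly, so any field it does not list is dropped from the output. One possible alternative, sketched below, builds event from everything except the lineage object. This is only a sketch: it assumes the mapping language is Bloblang (which the mapping processor syntax suggests), that its without method is available in your runtime, and that the lineage fields live under a top-level lineage key as the paths in step 2 imply.

```yaml
# Hypothetical variant of the step 2 processor; not part of the original tutorial.
- mapping: |
    # Assumption: lineage fields are nested under `lineage`.
    # `event` keeps every original field except that nested object.
    root = {
      "event": this.without("lineage"),
      "metadata": {
        "node_id": this.lineage.processing_node_id,
        "pipeline_name": this.lineage.pipeline_name,
        "processed_at": this.lineage.processed_at
      }
    }
```

The trade-off is that the explicit list in step 2 acts as an allowlist, while this variant passes through any new field the source starts sending. Choose whichever behavior suits your downstream consumers.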

You have now learned how to restructure your data so that it is better organized and easier to query for analytics.