Step 3: Restructure the Log Format
Once you've added metadata, your log becomes a single object that mixes the original event fields with processing details. A good practice is to restructure the log into a cleaner format that separates the original event data from the processing metadata. This makes the log much easier for data analysts and downstream systems to consume.
The Goal
You will restructure your log into a nested object with two top-level sections: event (the original data) and metadata (the lineage information you added in Step 2).
Input Message (after Step 2):
{
  "id": "abc-123",
  "timestamp": "...",
  "level": "INFO",
  "service": "auth-service",
  "lineage": {
    "processing_node_id": "local-dev-machine",
    "pipeline_name": "enrich-export-tutorial",
    "processed_at": "..."
  }
}
Desired Output (Structured):
{
  "event": {
    "id": "abc-123",
    "timestamp": "...",
    "level": "INFO",
    "service": "auth-service"
  },
  "metadata": {
    "node_id": "local-dev-machine",
    "pipeline_name": "enrich-export-tutorial",
    "processed_at": "..."
  }
}
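Before wiring this into the pipeline, it can help to see the transformation in isolation. The sketch below expresses the same reshaping in plain Python; restructure is a hypothetical helper written for this tutorial, not part of any pipeline tool, and the lineage field names are assumed to match what Step 2 produced.

```python
# Sketch: the same event/metadata restructuring in plain Python.
# "restructure" is a hypothetical helper for illustration only.

def restructure(log: dict) -> dict:
    lineage = log.get("lineage", {})
    return {
        # Copy the original event fields under "event".
        "event": {
            key: log.get(key)
            for key in ("id", "timestamp", "level", "service",
                        "message", "user_id", "request_id")
        },
        # Move the lineage information under "metadata".
        "metadata": {
            "node_id": lineage.get("processing_node_id"),
            "pipeline_name": lineage.get("pipeline_name"),
            "processed_at": lineage.get("processed_at"),
        },
    }

flat = {
    "id": "abc-123",
    "level": "INFO",
    "service": "auth-service",
    "lineage": {"processing_node_id": "local-dev-machine",
                "pipeline_name": "enrich-export-tutorial"},
}
structured = restructure(flat)
print(structured["event"]["id"])          # abc-123
print(structured["metadata"]["node_id"])  # local-dev-machine
```

Note that fields absent from the input (such as message here) simply come through as null values, so the output shape stays predictable.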
Implementation
- Start with the Previous Pipeline: Copy the add-lineage.yaml from Step 2 to a new file named restructure-log.yaml:

  cp add-lineage.yaml restructure-log.yaml
- Add the Restructuring Logic: Open restructure-log.yaml and add a new mapping processor to the end of the pipeline section, appending it to the processors array:

  - mapping: |
      # Create the new, structured root object
      root = {
        "event": {
          "id": this.id,
          "timestamp": this.timestamp,
          "level": this.level,
          "service": this.service,
          "message": this.message,
          "user_id": this.user_id,
          "request_id": this.request_id
        },
        "metadata": {
          "node_id": this.lineage.processing_node_id,
          "pipeline_name": this.lineage.pipeline_name,
          "processed_at": this.lineage.processed_at
        }
      }
- Deploy and Test: Run the pipeline and watch the output.
- Verify: Watch the logs from your pipeline. Each log message now has the clean, two-part structure, making it much easier to read and query.
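If you prefer an automated check over eyeballing the output, a few lines of Python can assert the two-part shape. This is a sketch: the hard-coded line below stands in for one line of your pipeline's real output.

```python
import json

# Sketch: validate that a structured log line has the expected shape.
# "line" is a sample standing in for one line of pipeline output.
line = ('{"event": {"id": "abc-123", "level": "INFO"}, '
        '"metadata": {"node_id": "local-dev-machine", '
        '"pipeline_name": "enrich-export-tutorial"}}')

record = json.loads(line)

# Exactly two top-level sections, with the keys we expect inside each.
assert set(record) == {"event", "metadata"}, "expected exactly event + metadata"
assert "id" in record["event"]
assert "node_id" in record["metadata"]
print("log line is well-structured")
```

Running this against a malformed line raises an AssertionError, which makes it easy to drop into a smoke test.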
You have now learned how to transform your data's structure, making it more organized and better suited for analytics.