
Step 2: Parse CSV Data

Another common log format is CSV (Comma-Separated Values). This step teaches you how to parse a raw log line that is a CSV string into a structured JSON object.

The Goal

You will transform a raw message like this, where the raw_log field is a single CSV string:

{
  "raw_log": "2025-10-20T14:23:45Z,temperature,temp-sensor-01,35.5,celsius"
}

Into a structured object with named fields:

{
  "timestamp": "2025-10-20T14:23:45Z",
  "metric_name": "temperature",
  "sensor_id": "temp-sensor-01",
  "value": "35.5",
  "unit": "celsius"
}

The csv Processor

This transformation is done with the csv processor, where you define the names of the columns in the order they appear.
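To make the column-to-value pairing concrete, here is a minimal Python sketch of the same idea. It is an illustration only, not the csv processor's implementation; the function name parse_csv_line and the strict field-count check are assumptions made for this example.

```python
import csv
import io

# Column names in the order they appear in the CSV line (from this step).
COLUMNS = ["timestamp", "metric_name", "sensor_id", "value", "unit"]


def parse_csv_line(line: str, columns: list[str]) -> dict[str, str]:
    """Parse one CSV line and pair each value with its column name.

    Hypothetical helper for illustration; the real processor may treat
    missing or extra fields differently.
    """
    row = next(csv.reader(io.StringIO(line)))
    if len(row) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(row)}")
    return dict(zip(columns, row))


record = parse_csv_line(
    "2025-10-20T14:23:45Z,temperature,temp-sensor-01,35.5,celsius",
    COLUMNS,
)
print(record)
```

The order of the column list matters: the first name is paired with the first value, the second with the second, and so on. Every value stays a string at this stage.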

Implementation

  1. Create the Parsing Pipeline: Copy the following configuration into a file named csv-parser.yaml.

    csv-parser.yaml
    name: csv-log-parser
    description: A pipeline that parses a string field containing CSV data.

    config:
      input:
        generate:
          interval: 1s
          mapping: |
            root.raw_log = "2025-10-20T14:23:45Z,temperature,temp-sensor-01,35.5,celsius"

      pipeline:
        processors:
          # This processor parses the 'raw_log' field.
          # Because no separate output field is specified, the result
          # becomes the new root of the message.
          - csv:
              target_field: root.raw_log
              columns:
                - timestamp
                - metric_name
                - sensor_id
                - value
                - unit
              lazy_quotes: true # Tolerates stray or unescaped quote characters

      output:
        stdout:
          codec: lines
  2. Deploy and Observe: Deploy the pipeline and watch its logs. The generate input emits one message per second, each with a single raw_log field containing a CSV string. The output is the structured JSON parsed from that string.

Verification

The output will be a stream of structured JSON objects.

Example Output:

{"metric_name":"temperature","sensor_id":"temp-sensor-01","timestamp":"2025-10-20T14:23:45Z","unit":"celsius","value":"35.5"}

You have successfully parsed a CSV log. Notice that all the values are still strings. Subsequent processing steps would be needed to convert value to a number and timestamp to a time object.
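As a rough illustration of what those later conversions involve, the following Python sketch turns value into a float and timestamp into a timezone-aware datetime. It is not pipeline configuration, and the helper name convert_types is hypothetical; in the pipeline itself this would be done by a later processing step.

```python
from datetime import datetime, timezone


def convert_types(record: dict) -> dict:
    """Return a copy of the parsed record with typed value and timestamp.

    Hypothetical helper for illustration only.
    """
    converted = dict(record)
    converted["value"] = float(record["value"])
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    converted["timestamp"] = datetime.fromisoformat(
        record["timestamp"].replace("Z", "+00:00")
    )
    return converted


parsed = {
    "timestamp": "2025-10-20T14:23:45Z",
    "metric_name": "temperature",
    "sensor_id": "temp-sensor-01",
    "value": "35.5",
    "unit": "celsius",
}
typed = convert_types(parsed)
print(typed["value"] + 1.0)  # arithmetic now works on the numeric value
print(typed["timestamp"].astimezone(timezone.utc).isoformat())
```

Once the values are typed, numeric comparisons and time-based filtering behave as expected, which they would not on the raw strings.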