Setup Environment for Format Transformation
Before building the complete format transformation solution, you'll start a schema registry, define the Avro and Protobuf schemas, and create sample data for testing.
Prerequisites
This example requires the following services to be running:
Before you begin, please ensure these services are set up and running according to their respective guides. Additionally, ensure you have completed the Local Development Setup guide for general environment configuration.
Step 1: Configure Environment Variables
Set up the environment variables needed for multi-format processing:
# Format transformation configuration
export SCHEMA_REGISTRY_URL="http://localhost:8081"
export CLOUD_STORAGE_BUCKET="your-format-bucket"
export CLOUD_REGION="us-east-1"
# Verify environment setup
echo "Schema Registry: $SCHEMA_REGISTRY_URL"
echo "Storage Bucket: $CLOUD_STORAGE_BUCKET"
echo "Region: $CLOUD_REGION"
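The `echo` lines above print an empty value without complaint when a variable is missing. A minimal sketch of a stricter check, assuming a POSIX shell; the `require_env` helper is hypothetical and not part of any tool used in this guide:

```shell
# require_env NAME...: report every listed variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise. (Hypothetical helper.)
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect variable lookup that also works in plain POSIX sh
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if require_env SCHEMA_REGISTRY_URL CLOUD_STORAGE_BUCKET CLOUD_REGION; then
  echo "Environment looks complete"
fi
```

Running the check at the top of later scripts makes a forgotten `export` fail early instead of surfacing as a confusing connection error.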
Step 2: Start Schema Registry
Format transformation requires a schema registry to manage Avro schemas and ensure compatibility:
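# Pull and start Confluent Schema Registry.
# Note: inside the container, localhost refers to the container itself, so
# localhost:9092 only reaches Kafka if the container shares the host network.
# Adjust the bootstrap address to wherever your Kafka broker is reachable.
docker run -d \
  --name format-schema-registry \
  -p 8081:8081 \
  -e SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS=localhost:9092 \
  -e SCHEMA_REGISTRY_HOST_NAME=schema-registry \
  -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 \
  confluentinc/cp-schema-registry:latest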
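# Wait for startup: poll the registry for up to about 60 seconds
for attempt in $(seq 1 30); do
  if curl -sf http://localhost:8081/subjects > /dev/null; then
    echo "Schema registry is running"
    break
  fi
  echo "Schema registry not ready, waiting..."
  sleep 2
done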
Step 3: Create Sample Format Definitions
Create the schema definitions that will guide our format transformations:
Avro Schema
Create sensor-data.avsc:
{
  "type": "record",
  "name": "SensorData",
  "namespace": "com.example.formats",
  "fields": [
    {
      "name": "sensor_id",
      "type": "string"
    },
    {
      "name": "location",
      "type": "string"
    },
    {
      "name": "temperature",
      "type": "double"
    },
    {
      "name": "humidity",
      "type": "double"
    },
    {
      "name": "timestamp",
      "type": "long"
    },
    {
      "name": "metadata",
      "type": {
        "type": "record",
        "name": "Metadata",
        "fields": [
          {
            "name": "device_type",
            "type": "string"
          },
          {
            "name": "firmware_version",
            "type": "string"
          }
        ]
      }
    }
  ]
}
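Before registering the schema, it can help to confirm that the file parses and contains the fields you expect. A rough sketch, assuming `jq` is installed (this guide does not list it as a prerequisite); the `list_avro_fields` helper is hypothetical:

```shell
# list_avro_fields FILE: print "name<TAB>type" for each top-level field of an
# Avro record schema. Nested record types are shown by their record name.
# Fails with a non-zero exit code if FILE is not valid JSON.
list_avro_fields() {
  jq -r '.fields[] | "\(.name)\t\(.type | if type == "object" then .name else . end)"' "$1"
}

# Usage:
#   list_avro_fields sensor-data.avsc
```

This only checks JSON syntax and lists fields; it does not validate the file against the Avro specification.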
Protobuf Schema
Create sensor-data.proto:
syntax = "proto3";

package com.example.formats;

import "google/protobuf/timestamp.proto";

message SensorData {
  string sensor_id = 1;
  string location = 2;
  double temperature = 3;
  double humidity = 4;
  google.protobuf.Timestamp timestamp = 5;
  Metadata metadata = 6;
}

message Metadata {
  string device_type = 1;
  string firmware_version = 2;
}
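Note that the two schemas model time differently: the Avro schema stores `timestamp` as a `long`, while the Protobuf message uses `google.protobuf.Timestamp`, which carries separate `seconds` and `nanos` fields. A small sketch of the mapping between them, assuming the Avro `long` holds milliseconds since the Unix epoch (this guide does not state the unit); the function name is hypothetical:

```shell
# millis_to_timestamp MILLIS: print "seconds nanos" as they would be set on a
# google.protobuf.Timestamp. Assumes MILLIS is a non-negative epoch value in
# milliseconds.
millis_to_timestamp() {
  ms=$1
  seconds=$((ms / 1000))
  nanos=$(( (ms % 1000) * 1000000 ))
  echo "$seconds $nanos"
}

millis_to_timestamp 1760970225123
# prints: 1760970225 123000000
```

If the pipeline uses a different unit for the `long` (seconds or microseconds, for example), the divisor and multiplier change accordingly.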
Step 4: Register Schemas
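Register the Avro schema with the schema registry. The registry's REST API expects the schema as an escaped string inside a JSON object of the form {"schema": "..."}, so posting the raw .avsc file directly is rejected. The command below uses jq (assumed to be installed) to build that wrapper:
# Wrap the schema file in the {"schema": "..."} envelope and register it
jq -Rs '{schema: .}' sensor-data.avsc | \
  curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data @- \
    http://localhost:8081/subjects/sensor-data-value/versions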
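To confirm the registration, the registry's `GET /subjects/{subject}/versions/latest` endpoint returns the most recent schema stored for a subject. A sketch, assuming `SCHEMA_REGISTRY_URL` from Step 1 is set (it falls back to `http://localhost:8081` otherwise); both helper names are hypothetical:

```shell
# latest_schema_url SUBJECT: build the URL of the latest version for SUBJECT.
latest_schema_url() {
  echo "${SCHEMA_REGISTRY_URL:-http://localhost:8081}/subjects/$1/versions/latest"
}

# show_latest_schema SUBJECT: fetch the latest registered version. Exits
# non-zero if the subject is unknown or the registry is unreachable.
show_latest_schema() {
  curl -sf "$(latest_schema_url "$1")"
}

# Usage (requires the registry from Step 2 to be running):
#   show_latest_schema sensor-data-value
```

A successful response includes the subject name, the version number, the schema ID, and the schema itself as an escaped string.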
Step 5: Create Sample Data Files
Create sample data in different formats for testing:
JSON Sample Data
Create sample-sensor-data.json:
[
  {
    "sensor_id": "temp_42",
    "location": "warehouse_north",
    "temperature": 72.5,
    "humidity": 45.2,
    "timestamp": "2025-10-20T14:23:45.123Z",
    "metadata": {
      "device_type": "DHT22",
      "firmware_version": "1.2.3"
    }
  }
]
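The sample's `timestamp` is an ISO 8601 string, whereas the Avro schema from Step 3 declares that field as a `long`, so the value has to be converted before the record can be encoded as Avro. A sketch of that conversion, assuming GNU `date` (the `-d` option and the `%3N` format are GNU extensions and are not available in the BSD `date` shipped with macOS) and assuming the `long` is meant to hold epoch milliseconds; the function name is hypothetical:

```shell
# iso_to_millis ISO_STRING: convert an ISO 8601 UTC timestamp to milliseconds
# since the Unix epoch. Requires GNU date.
iso_to_millis() {
  date -u -d "$1" +%s%3N
}

iso_to_millis "2025-10-20T14:23:45.123Z"
# prints: 1760970225123
```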
Next Steps
Your format transformation environment is now ready! Continue with: