
Edge Processing for Splunk Environments

The Blind Spot: Data Splunk Has Never Seen

Splunk is a global brand with enormous reach. But every Splunk deployment is gated by a critical problem: Splunk only sees data that someone already decided to send it.

Across every enterprise, there are thousands of endpoints generating data that never touches Splunk. Not because it isn't valuable — because there was never a practical way to collect it, structure it, and make it usable:

  • A wind farm with 200 turbines generating vibration and thermal data? That stays in a local historian.
  • A hospital with 10,000 medical devices producing telemetry? That lives on the device.
  • A retail chain with 3,000 stores running point-of-sale systems? Local logs, never aggregated.
  • A manufacturing floor with hundreds of PLCs and SCADA systems? The OT team guards that data like a dragon sitting on gold.

None of this is in Splunk today. Not because Splunk can't handle it — but because there was never an intelligent way to get it there.

That's the opportunity. Not "do the same thing for less," but "bring Splunk into places it's never been."

Four Wins, One Pipeline

Expanso Edge sits between your data sources and Splunk. It unlocks four things simultaneously:

Win #1: New Data Sources That Don't Exist in Splunk Today

The Universal Forwarder is brilliant at what it does — collect logs from servers and ship them. But the world has moved way beyond servers.

The edge is exploding. Gartner predicted that 75% of enterprise data would be created and processed outside traditional data centers by 2025. That's data from:

  • Industrial IoT: Vibration sensors, temperature probes, pressure gauges, flow meters — each generating structured telemetry that's perfect for Splunk analytics but trapped on local PLCs
  • Connected infrastructure: Smart grid sensors, traffic systems, building management systems — each producing event streams that security and operations teams would love to correlate in ES
  • Medical devices: Patient monitors, infusion pumps, imaging systems — each generating data that compliance teams need centralized audit trails for
  • Retail and logistics: POS systems, RFID readers, fleet GPS, cold chain monitors — operational data that's been invisible to enterprise analytics

The problem isn't that this data is unstructured. It's that it's uncollected. There's no Universal Forwarder for a wind turbine. There's no TA for a PLC.

Expanso Edge solves this by sitting at the edge — on a gateway, an industrial PC, a Kubernetes node at a remote site — and doing the work of collecting, structuring, and preparing this data for Splunk:

  • Protocols Splunk doesn't natively speak: MQTT, OPC-UA, Modbus, AMQP, industrial historians
  • Proprietary telemetry → CIM-compliant events: raw sensor readings become Splunk-ready data
  • Smart filtering at source: a 200-turbine wind farm doesn't ship raw vibration data at 1 kHz; it sends anomaly summaries
  • OT/IT bridge: the OT team keeps control of their network; Expanso filters and forwards only what's needed
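As a concrete sketch, a pipeline like this could sit on a site gateway, subscribe to sensor telemetry, and forward only anomalies to Splunk. This is illustrative Benthos-style YAML (Expanso's Bloblang support suggests a similar shape); the broker address, topic layout, threshold, and endpoint are all hypothetical, not Expanso's documented syntax:

```yaml
# Illustrative only: collect turbine vibration telemetry over MQTT,
# drop readings below an anomaly threshold at the source, and forward
# the rest to Splunk HEC. All names and values are hypothetical.
input:
  mqtt:
    urls: ["tcp://site-gateway.local:1883"]
    topics: ["turbines/+/vibration"]

pipeline:
  processors:
    - mapping: |
        root = this.parse_json()
        # Keep only anomalous readings; normal telemetry stays local
        root = if this.amplitude_mm_s < 7.1 { deleted() }

output:
  http_client:
    url: "https://splunk.example.com:8088/services/collector/event"
    verb: POST
    headers:
      Authorization: "Splunk ${SPLUNK_HEC_TOKEN}"
```

The key design point is the `deleted()` branch: raw 1 kHz vibration data never leaves the site, while every anomaly still reaches Splunk in near real time.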

This isn't taking away anything Splunk already ingests. It's opening up entirely new categories of data that were previously invisible. For every customer, that's a net-new data source — which means net-new ingest, net-new TAs, net-new dashboards, net-new ES correlation rules. The whole ecosystem grows.

Win #2: Making Splunk the AI/ML Platform of Record

Cisco's Data Fabric vision points toward Splunk as the intelligence layer for the entire enterprise — not just the IT operations team. But AI and ML models are only as good as their training data.

Right now, the data that reaches Splunk arrives in whatever format the source decided to emit. Debug logs mixed with security events. Unstructured text blobs from legacy apps. Timestamps in fourteen different formats. PII scattered everywhere. No semantic context, no labeling, no lineage.

This is the gap between "data in Splunk" and "data that's actually useful for AI."

Expanso Edge transforms raw data into AI-ready data before it arrives:

  • Semantic enrichment at the source: every event gets context before it moves (what system generated it, what business function it supports, what data classification it carries, what regulatory jurisdiction it falls under). An ML model training on this data doesn't need months of data engineering; it arrives ready.
  • Schema normalization across thousands of sources: a customer with 247 different log sources has 247 different ways of expressing "a user logged in." Expanso normalizes all of them into a consistent schema before they hit the index, so Cisco's Time Series Foundation Model (TSFM) can work across the entire dataset without fighting format inconsistencies.
  • Continuous data quality validation: enforce schemas, validate field types, check for anomalies in the data itself. Events that fail validation get flagged, routed to dead-letter queues, or corrected. The data that arrives in Splunk is trustworthy, which is the prerequisite for any serious AI/ML deployment.
  • Training data extraction: the upstream layer can fork a stream. One copy goes to Splunk for real-time analytics; another copy, enriched, labeled, structured, and PII-stripped, goes to a data lake in Parquet or Delta Lake format, partitioned and ready for model training. Splunk becomes both the real-time analytics layer and the source of truth for ML training data. That's the Data Fabric story made concrete.
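The normalization and the fork can live in one pipeline. The sketch below uses Benthos-style YAML as an assumed approximation of Expanso's syntax; the source field names, bucket, and endpoints are hypothetical, and a Parquet encoding step could replace the plain JSON output:

```yaml
# Illustrative only: coalesce several source-specific login formats
# into one schema, then fan out: a real-time copy to Splunk HEC and a
# PII-stripped copy to S3 for model training.
pipeline:
  processors:
    - mapping: |
        # The | operator falls through to the first field that exists
        root.action    = "login"
        root.user      = this.user | this.userName | this.uid
        root.src_ip    = this.src | this.client_ip | this.remoteAddr
        root.timestamp = (this.time | this.ts | this._time).string()

output:
  broker:
    pattern: fan_out
    outputs:
      - http_client:
          url: "https://splunk.example.com:8088/services/collector/event"
          headers:
            Authorization: "Splunk ${SPLUNK_HEC_TOKEN}"
      - aws_s3:
          bucket: "ml-training-data"
          path: 'logins/${! timestamp_unix() }.json'
        processors:
          - mapping: |
              # Strip direct identifiers before the training copy lands
              root = this
              root.user = this.user.hash("sha256").encode("hex")
```

Note that the PII stripping is attached only to the S3 branch: the Splunk copy keeps the real user field for investigations, while the training copy carries only a stable pseudonymous hash.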

The pitch to a customer isn't "Splunk costs less." It's "Splunk is now your AI/ML data backbone. Every event arrives enriched, normalized, and validated. Your data science team can train models on operational data that was previously inaccessible or unusable."

Win #3: Federated Data Architecture — Extended to the Actual Edge

The Cisco Data Fabric already federates across S3, Snowflake, Azure, and Iceberg. The Machine Data Lake gives customers a cost-effective place to land data within the Splunk ecosystem. Edge Processor handles filtering and routing within the Splunk pipeline.

The natural next step: extend federation all the way to the endpoint.

Not everything needs to be ingested. Not everything needs to be in S3. Some data is most valuable in situ — sitting on the device that generated it, queryable on demand.

Think about what this means for an analyst running Federated Search:

| federated source=edge_fleet host="factory-floor-*" metric="vibration" earliest=-7d
| join type=inner
[search index=iot_alerts severity=critical host="factory-floor-*"]
| timechart span=1h avg(vibration_amplitude) by host

The analyst just queried vibration data from 200 factory floor sensors without that data ever being ingested. It stayed on the sensors. The upstream processing layer at each site handled the query, filtered it, and returned only the results. Splunk ran the analytics. The Machine Data Lake didn't need to store it. The ingest pipeline didn't need to carry it.

This is the Cisco Data Fabric's "bring analytics to data" principle taken to its extreme: analytics at the farthest edge, with Splunk as the unified query layer.

For customers, this unlocks a completely new tier of data:

| Data Tier | Where It Lives    | How Splunk Accesses It     |
|-----------|-------------------|----------------------------|
| Hot       | Splunk indexes    | Direct search (SPL)        |
| Warm      | Machine Data Lake | Promoted search            |
| Cold      | S3 / data lake    | Federated Search (FS-S3)   |
| Edge      | On the endpoint   | Federated Edge Query (NEW) |

That fourth tier doesn't exist today. It's a new capability. It's a new SKU opportunity. And it positions Splunk as the only platform that can query from the index all the way down to the actual device.

Win #4: Better Existing Data

For data you're already sending to Splunk, Expanso makes it arrive cleaner, faster, and cheaper:

  • Faster searches: pre-parsed fields mean less search-time extraction; cleaner data means more relevant results
  • Simpler management: one pipeline config replaces inputs.conf + props.conf + transforms.conf across every forwarder
  • Better data quality: normalize timestamps, enrich with metadata, deduplicate at the edge
  • Smarter routing: security events to security, metrics to metrics, compliance data tagged and masked automatically
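Routing by content is a single switch at the output stage. Again a Benthos-style sketch with assumed syntax; the check expressions, sourcetype value, and token names are hypothetical:

```yaml
# Illustrative only: route events by content at the edge. Security
# events go to a security HEC token, metrics to the metrics endpoint,
# everything else to the default index.
output:
  switch:
    cases:
      - check: this.sourcetype == "linux_secure"
        output:
          http_client:
            url: "https://splunk.example.com:8088/services/collector/event"
            headers:
              Authorization: "Splunk ${SEC_HEC_TOKEN}"
      - check: this.metric_name != null
        output:
          http_client:
            url: "https://splunk.example.com:8088/services/collector"
            headers:
              Authorization: "Splunk ${METRICS_HEC_TOKEN}"
      - output:
          http_client:
            url: "https://splunk.example.com:8088/services/collector/event"
            headers:
              Authorization: "Splunk ${SPLUNK_HEC_TOKEN}"
```

The final case has no check, so it acts as the catch-all: nothing is silently dropped, which matters when the routed streams feed compliance destinations.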

Architecture: Expanso + Splunk Integration

Traditional Splunk:

  Log Sources → Heavy Forwarder → Splunk Indexers → Slow Searches
  (raw, unparsed data; 100% noise + signal shipped; search-time field extraction)

With Expanso Edge:

  Log Sources → Expanso Edge → Splunk HEC → Fast Searches
                (parse, enrich, dedupe, route; pre-parsed fields)
                      ↓
                S3 / Archive (full-fidelity backup)

What changes:

  • Search performance — pre-extracted fields, no more search-time rex commands
  • Data quality — normalized timestamps, deduplicated events, enriched metadata
  • Simpler ops — one pipeline YAML replaces configs spread across every forwarder
  • Multi-destination — same data to Splunk + S3 + metrics systems, parsed once

If You Know Splunk, You Already Know Expanso

Expanso uses familiar Splunk concepts — but simpler to manage:

| Splunk Concept               | Expanso Equivalent             | What's Better                                      |
|------------------------------|--------------------------------|----------------------------------------------------|
| inputs.conf                  | Pipeline input section         | Real-time parsing, multi-format support            |
| props.conf / transforms.conf | Pipeline processors + Bloblang | One file, not three; side-by-side SPL translations |
| outputs.conf                 | Pipeline output section        | Multi-destination routing built in                 |
| Heavy Forwarder fleet        | Expanso Edge nodes             | Cloud-managed, no config drift                     |
| SPL field extraction         | Bloblang mapping               | Runs at ingest, not search time                    |
| Deployment Server            | Expanso Cloud Console          | GitOps-driven, instant rollouts                    |
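To make the mapping tangible, here is how a familiar inputs.conf monitor stanza might translate. The stanza is standard Splunk; the pipeline below is an assumed Benthos-style equivalent, not Expanso's documented schema:

```yaml
# Splunk inputs.conf stanza this pipeline stands in for:
#
#   [monitor:///var/log/app/*.log]
#   sourcetype = app:logs
#   index = main
#
input:
  file:
    paths: ["/var/log/app/*.log"]

pipeline:
  processors:
    - mapping: |
        # Tag the event the way props.conf / outputs.conf would,
        # as metadata the HEC output can pick up downstream
        meta sourcetype = "app:logs"
        meta index = "main"
```

One file declares the input, the tagging, and (with an output section added) the destination, which is the "one file, not three" claim in the table above.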

What You'll Build

In this tutorial, you'll create a complete Splunk integration that:

  1. Collects logs like inputs.conf — file monitoring, multiline parsing
  2. Parses data like props.conf — field extraction, normalization at ingest time
  3. Filters noise before indexing — so your indexes only contain data people actually query
  4. Routes to Splunk HEC — proper index/sourcetype tagging, pre-parsed fields
  5. Advanced patterns — multi-destination, compliance masking, metrics extraction

Prerequisites

  • Expanso Edge installed (installation guide)
  • Splunk instance with HEC token configured
  • Basic familiarity with Splunk configuration files

Get Started

Choose your path:

Interactive Explorer

See each Splunk integration technique with side-by-side transformations

Step-by-Step Tutorial

Build the pipeline incrementally:

  1. Collect Like inputs.conf
  2. Parse Like props.conf
  3. Filter Before Indexing
  4. Route to Splunk HEC
  5. Advanced Splunk Patterns

Complete Pipeline

Download the production-ready solution


Expanso makes your Splunk investment go further — by bringing it into places it's never been, and making everything it already does faster and cleaner.