
Edge Processing for Splunk Environments

The Blind Spot: Data Splunk Has Never Seen

Splunk is a global brand with enormous reach. But every Splunk deployment is gated by a critical problem: Splunk only sees data that someone already decided to send it.

Across every enterprise, there are thousands of endpoints generating data that never touches Splunk. Not because it isn't valuable — because there was never a practical way to collect it, structure it, and make it usable:

  • A wind farm with 200 turbines generating vibration and thermal data? That stays in a local historian.
  • A hospital with 10,000 medical devices producing telemetry? That lives on the device.
  • A retail chain with 3,000 stores running point-of-sale systems? Local logs, never aggregated.
  • A manufacturing floor with hundreds of PLCs and SCADA systems? The OT team guards that data like a dragon sitting on gold.

None of this is in Splunk today. Not because Splunk can't handle it — but because there was never an intelligent way to get it there.

That's the opportunity. Not "do the same thing for less," but "bring Splunk into places it's never been."

Four Wins, One Pipeline

Expanso Edge sits between your data sources and Splunk. It unlocks four things simultaneously:

Win #1: New Data Sources That Don't Exist in Splunk Today

The Universal Forwarder is brilliant at what it does — collect logs from servers and ship them. But the world has moved way beyond servers.

The edge is exploding. Gartner predicted that 75% of enterprise data would be created and processed outside traditional data centers by 2025. That's data from:

  • Industrial IoT: Vibration sensors, temperature probes, pressure gauges, flow meters — each generating structured telemetry that's perfect for Splunk analytics but trapped on local PLCs
  • Connected infrastructure: Smart grid sensors, traffic systems, building management systems — each producing event streams that security and operations teams would love to correlate in ES
  • Medical devices: Patient monitors, infusion pumps, imaging systems — each generating data that compliance teams need centralized audit trails for
  • Retail and logistics: POS systems, RFID readers, fleet GPS, cold chain monitors — operational data that's been invisible to enterprise analytics

The problem isn't that this data is unstructured. It's that it's uncollected. There's no Universal Forwarder for a wind turbine. There's no TA for a PLC.

Expanso Edge solves this by sitting at the edge — on a gateway, an industrial PC, a Kubernetes node at a remote site — and doing the work of collecting, structuring, and preparing this data for Splunk:

  • Protocols Splunk doesn't natively speak: MQTT, OPC-UA, Modbus, AMQP, industrial historians
  • Proprietary telemetry → CIM-compliant events: raw sensor readings become Splunk-ready data
  • Smart filtering at source: a 200-turbine wind farm doesn't ship raw vibration data at 1 kHz; it sends anomaly summaries
  • OT/IT bridge: the OT team keeps control of their network; Expanso filters and forwards only what's needed
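As a concrete sketch, a pipeline like this could sit on a site gateway, subscribe to sensor telemetry, and forward only anomalies to Splunk. This is illustrative Benthos-style YAML (Expanso's Bloblang support suggests a similar shape); the broker address, topic layout, threshold, and endpoint are all hypothetical, not Expanso's documented syntax:

```yaml
# Illustrative only: collect turbine vibration telemetry over MQTT,
# drop readings below an anomaly threshold at the source, and forward
# the rest to Splunk HEC. All names and values are hypothetical.
input:
  mqtt:
    urls: ["tcp://site-gateway.local:1883"]
    topics: ["turbines/+/vibration"]

pipeline:
  processors:
    - mapping: |
        root = this.parse_json()
        # Keep only anomalous readings; normal telemetry stays local
        root = if this.amplitude_mm_s < 7.1 { deleted() }

output:
  http_client:
    url: "https://splunk.example.com:8088/services/collector/event"
    verb: POST
    headers:
      Authorization: "Splunk ${SPLUNK_HEC_TOKEN}"
```

The key design point is the `deleted()` branch: raw 1 kHz vibration data never leaves the site, while every anomaly still reaches Splunk in near real time.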

This isn't taking away anything Splunk already ingests. It's opening up entirely new categories of data that were previously invisible. For every customer, that's a net-new data source — which means net-new ingest, net-new TAs, net-new dashboards, net-new ES correlation rules. The whole ecosystem grows.

Win #2: Making Splunk the AI/ML Platform of Record

Cisco's Data Fabric vision points toward Splunk as the intelligence layer for the entire enterprise — not just the IT operations team. But AI and ML models are only as good as their training data.

Right now, the data that reaches Splunk arrives in whatever format the source decided to emit. Debug logs mixed with security events. Unstructured text blobs from legacy apps. Timestamps in fourteen different formats. PII scattered everywhere. No semantic context, no labeling, no lineage.

This is the gap between "data in Splunk" and "data that's actually useful for AI."

Expanso Edge transforms raw data into AI-ready data before it arrives:

  • Semantic enrichment at the source: every event gets context before it moves (what system generated it, what business function it supports, what data classification it carries, what regulatory jurisdiction it falls under). An ML model training on this data doesn't need months of data engineering; it arrives ready.
  • Schema normalization across thousands of sources: a customer with 247 different log sources has 247 different ways of expressing "a user logged in." Expanso normalizes all of them into a consistent schema before they hit the index, so Cisco's Time Series Foundation Model (TSFM) can work across the entire dataset without fighting format inconsistencies.
  • Continuous data quality validation: enforce schemas, validate field types, check for anomalies in the data itself. Events that fail validation get flagged, routed to dead-letter queues, or corrected. The data that arrives in Splunk is trustworthy, which is the prerequisite for any serious AI/ML deployment.
  • Training data extraction: the upstream layer can fork a stream. One copy goes to Splunk for real-time analytics; another copy, enriched, labeled, structured, and PII-stripped, goes to a data lake in Parquet or Delta Lake format, partitioned and ready for model training. Splunk becomes both the real-time analytics layer and the source of truth for ML training data. That's the Data Fabric story made concrete.
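The normalization and the fork can live in one pipeline. The sketch below uses Benthos-style YAML as an assumed approximation of Expanso's syntax; the source field names, bucket, and endpoints are hypothetical, and a Parquet encoding step could replace the plain JSON output:

```yaml
# Illustrative only: coalesce several source-specific login formats
# into one schema, then fan out: a real-time copy to Splunk HEC and a
# PII-stripped copy to S3 for model training.
pipeline:
  processors:
    - mapping: |
        # The | operator falls through to the first field that exists
        root.action    = "login"
        root.user      = this.user | this.userName | this.uid
        root.src_ip    = this.src | this.client_ip | this.remoteAddr
        root.timestamp = (this.time | this.ts | this._time).string()

output:
  broker:
    pattern: fan_out
    outputs:
      - http_client:
          url: "https://splunk.example.com:8088/services/collector/event"
          headers:
            Authorization: "Splunk ${SPLUNK_HEC_TOKEN}"
      - aws_s3:
          bucket: "ml-training-data"
          path: 'logins/${! timestamp_unix() }.json'
        processors:
          - mapping: |
              # Strip direct identifiers before the training copy lands
              root = this
              root.user = this.user.hash("sha256").encode("hex")
```

Note that the PII stripping is attached only to the S3 branch: the Splunk copy keeps the real user field for investigations, while the training copy carries only a stable pseudonymous hash.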

The pitch to a customer isn't "Splunk costs less." It's "Splunk is now your AI/ML data backbone. Every event arrives enriched, normalized, and validated. Your data science team can train models on operational data that was previously inaccessible or unusable."

Win #3: Federated Data Architecture — Extended to the Actual Edge

The Cisco Data Fabric already federates across S3, Snowflake, Azure, and Iceberg. The Machine Data Lake gives customers a cost-effective place to land data within the Splunk ecosystem. Edge Processor handles filtering and routing within the Splunk pipeline.

The natural next step: extend federation all the way to the endpoint.

Not everything needs to be ingested. Not everything needs to be in S3. Some data is most valuable in situ — sitting on the device that generated it, queryable on demand.

Think about what this means for an analyst running Federated Search:

| federated source=edge_fleet host="factory-floor-*" metric="vibration" earliest=-7d
| join type=inner
[search index=iot_alerts severity=critical host="factory-floor-*"]
| timechart span=1h avg(vibration_amplitude) by host

The analyst just queried vibration data from 200 factory floor sensors without that data ever being ingested. It stayed on the sensors. The upstream processing layer at each site handled the query, filtered it, and returned only the results. Splunk ran the analytics. The Machine Data Lake didn't need to store it. The ingest pipeline didn't need to carry it.

This is the Cisco Data Fabric's "bring analytics to data" principle taken to its extreme: analytics at the farthest edge, with Splunk as the unified query layer.

For customers, this unlocks a completely new tier of data:

| Data Tier | Where It Lives    | How Splunk Accesses It     |
|-----------|-------------------|----------------------------|
| Hot       | Splunk indexes    | Direct search (SPL)        |
| Warm      | Machine Data Lake | Promoted search            |
| Cold      | S3 / data lake    | Federated Search (FS-S3)   |
| Edge      | On the endpoint   | Federated Edge Query (NEW) |

That fourth tier doesn't exist today. It's a new capability. It's a new SKU opportunity. And it positions Splunk as the only platform that can query from the index all the way down to the actual device.

Win #4: Better Existing Data

For data you're already sending to Splunk, Expanso makes it arrive cleaner, faster, and cheaper:

  • Faster searches: pre-parsed fields mean less search-time extraction; cleaner data means more relevant results
  • Simpler management: one pipeline config replaces inputs.conf + props.conf + transforms.conf across every forwarder
  • Better data quality: normalize timestamps, enrich with metadata, deduplicate at the edge
  • Smarter routing: security events to security, metrics to metrics, compliance data tagged and masked automatically
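Routing by content is a single switch at the output stage. Again a Benthos-style sketch with assumed syntax; the check expressions, sourcetype value, and token names are hypothetical:

```yaml
# Illustrative only: route events by content at the edge. Security
# events go to a security HEC token, metrics to the metrics endpoint,
# everything else to the default index.
output:
  switch:
    cases:
      - check: this.sourcetype == "linux_secure"
        output:
          http_client:
            url: "https://splunk.example.com:8088/services/collector/event"
            headers:
              Authorization: "Splunk ${SEC_HEC_TOKEN}"
      - check: this.metric_name != null
        output:
          http_client:
            url: "https://splunk.example.com:8088/services/collector"
            headers:
              Authorization: "Splunk ${METRICS_HEC_TOKEN}"
      - output:
          http_client:
            url: "https://splunk.example.com:8088/services/collector/event"
            headers:
              Authorization: "Splunk ${SPLUNK_HEC_TOKEN}"
```

The final case has no check, so it acts as the catch-all: nothing is silently dropped, which matters when the routed streams feed compliance destinations.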

Architecture: Expanso + Splunk Integration

Traditional Splunk:

  Log Sources → Heavy Forwarder → Splunk Indexers → Slow Searches
  (raw, unparsed data; 100% noise + signal shipped; search-time field extraction)

With Expanso Edge:

  Log Sources → Expanso Edge → Splunk HEC → Fast Searches
                (parse, enrich, dedupe, route; pre-parsed fields)
                      ↓
                S3 / Archive (full-fidelity backup)

What changes:

  • Search performance — pre-extracted fields, no more search-time rex commands
  • Data quality — normalized timestamps, deduplicated events, enriched metadata
  • Simpler ops — one pipeline YAML replaces configs spread across every forwarder
  • Multi-destination — same data to Splunk + S3 + metrics systems, parsed once

If You Know Splunk, You Already Know Expanso

Expanso uses familiar Splunk concepts — but simpler to manage:

| Splunk Concept               | Expanso Equivalent             | What's Better                                      |
|------------------------------|--------------------------------|----------------------------------------------------|
| inputs.conf                  | Pipeline input section         | Real-time parsing, multi-format support            |
| props.conf / transforms.conf | Pipeline processors + Bloblang | One file, not three; side-by-side SPL translations |
| outputs.conf                 | Pipeline output section        | Multi-destination routing built in                 |
| Heavy Forwarder fleet        | Expanso Edge nodes             | Cloud-managed, no config drift                     |
| SPL field extraction         | Bloblang mapping               | Runs at ingest, not search time                    |
| Deployment Server            | Expanso Cloud Console          | GitOps-driven, instant rollouts                    |
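To make the mapping tangible, here is how a familiar inputs.conf monitor stanza might translate. The stanza is standard Splunk; the pipeline below is an assumed Benthos-style equivalent, not Expanso's documented schema:

```yaml
# Splunk inputs.conf stanza this pipeline stands in for:
#
#   [monitor:///var/log/app/*.log]
#   sourcetype = app:logs
#   index = main
#
input:
  file:
    paths: ["/var/log/app/*.log"]

pipeline:
  processors:
    - mapping: |
        # Tag the event the way props.conf / outputs.conf would,
        # as metadata the HEC output can pick up downstream
        meta sourcetype = "app:logs"
        meta index = "main"
```

One file declares the input, the tagging, and (with an output section added) the destination, which is the "one file, not three" claim in the table above.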

What You'll Build

In this tutorial, you'll create a complete Splunk integration that:

  1. Collects logs like inputs.conf — file monitoring, multiline parsing
  2. Parses data like props.conf — field extraction, normalization at ingest time
  3. Filters noise before indexing — so your indexes only contain data people actually query
  4. Routes to Splunk HEC — proper index/sourcetype tagging, pre-parsed fields
  5. Advanced patterns — multi-destination, compliance masking, metrics extraction

Prerequisites

  • Expanso Edge installed (installation guide)
  • Splunk instance with HEC token configured
  • Basic familiarity with Splunk configuration files

Get Started

Choose your path:

Interactive Explorer

See each Splunk integration technique with side-by-side transformations

Step-by-Step Tutorial

Build the pipeline incrementally:

  1. Collect Like inputs.conf
  2. Parse Like props.conf
  3. Filter Before Indexing
  4. Route to Splunk HEC
  5. Advanced Splunk Patterns

Complete Pipeline

Download the production-ready solution


Expanso makes your Splunk investment go further — by bringing it into places it's never been, and making everything it already does faster and cleaner.