Edge Processing for Splunk Environments
The Blind Spot: Data Splunk Has Never Seen
Splunk is a global brand with enormous reach. But every Splunk deployment shares a critical limitation: Splunk only sees data that someone already decided to send it.
Across every enterprise, there are thousands of endpoints generating data that never touches Splunk. Not because it isn't valuable — because there was never a practical way to collect it, structure it, and make it usable:
- A wind farm with 200 turbines generating vibration and thermal data? That stays in a local historian.
- A hospital with 10,000 medical devices producing telemetry? That lives on the device.
- A retail chain with 3,000 stores running point-of-sale systems? Local logs, never aggregated.
- A manufacturing floor with hundreds of PLCs and SCADA systems? The OT team guards that data like a dragon sitting on gold.
None of this is in Splunk today. Not because Splunk can't handle it — but because there was never an intelligent way to get it there.
That's the opportunity. It's not "do the same thing for less." It's "bring Splunk into places it's never been."
Four Wins, One Pipeline
Expanso Edge sits between your data sources and Splunk. It unlocks four things simultaneously:
Win #1: New Data Sources That Don't Exist in Splunk Today
The Universal Forwarder is brilliant at what it does — collect logs from servers and ship them. But the world has moved way beyond servers.
The edge is exploding. Gartner predicted that by 2025, 75% of enterprise data would be created and processed outside traditional data centers. That's data from:
- Industrial IoT: Vibration sensors, temperature probes, pressure gauges, flow meters — each generating structured telemetry that's perfect for Splunk analytics but trapped on local PLCs
- Connected infrastructure: Smart grid sensors, traffic systems, building management systems — each producing event streams that security and operations teams would love to correlate in ES
- Medical devices: Patient monitors, infusion pumps, imaging systems — each generating data that compliance teams need centralized audit trails for
- Retail and logistics: POS systems, RFID readers, fleet GPS, cold chain monitors — operational data that's been invisible to enterprise analytics
The problem isn't that this data is unstructured. It's that it's uncollected. There's no Universal Forwarder for a wind turbine. There's no TA for a PLC.
Expanso Edge solves this by sitting at the edge — on a gateway, an industrial PC, a Kubernetes node at a remote site — and doing the work of collecting, structuring, and preparing this data for Splunk:
✅ Protocols Splunk doesn't natively speak — MQTT, OPC-UA, Modbus, AMQP, industrial historians
✅ Proprietary telemetry → CIM-compliant events — raw sensor readings become Splunk-ready data
✅ Smart filtering at source — a 200-turbine wind farm doesn't ship raw vibration data at 1kHz. It sends anomaly summaries
✅ OT/IT bridge — the OT team keeps control of their network; Expanso filters and forwards only what's needed
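The pattern above can be sketched as a single edge pipeline: subscribe to an industrial protocol, summarize at the source, and ship only what matters to HEC. This is illustrative only — Expanso Edge pipelines use Bloblang for mappings, but the config keys, broker address, topic layout, field names, and the vibration threshold here are all assumptions, not documented syntax:

```yaml
# Sketch of an edge pipeline: MQTT in, anomaly filtering, Splunk HEC out.
# All names and thresholds below are invented for illustration.
input:
  mqtt:
    urls: ["tcp://plc-gateway.local:1883"]      # assumed OT gateway address
    topics: ["turbines/+/vibration"]

pipeline:
  processors:
    - mapping: |
        # Drop readings inside the normal band; forward anomalies only
        root = if this.amplitude_mm_s > 7.1 { this } else { deleted() }
    - mapping: |
        # Shape the surviving events for Splunk HEC
        root.event      = this
        root.sourcetype = "iot:turbine:vibration"
        root.index      = "industrial_iot"

output:
  http_client:
    url: https://splunk.example.com:8088/services/collector/event
    headers:
      Authorization: "Splunk ${SPLUNK_HEC_TOKEN}"
```

The filtering step is the point: the 1kHz raw stream never leaves the site, only the anomalies do.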
This isn't taking away anything Splunk already ingests. It's opening up entirely new categories of data that were previously invisible. For every customer, that's a net-new data source — which means net-new ingest, net-new TAs, net-new dashboards, net-new ES correlation rules. The whole ecosystem grows.
Win #2: Making Splunk the AI/ML Platform of Record
Cisco's Data Fabric vision points toward Splunk as the intelligence layer for the entire enterprise — not just the IT operations team. But AI and ML models are only as good as their training data.
Right now, the data that reaches Splunk arrives in whatever format the source decided to emit. Debug logs mixed with security events. Unstructured text blobs from legacy apps. Timestamps in fourteen different formats. PII scattered everywhere. No semantic context, no labeling, no lineage.
This is the gap between "data in Splunk" and "data that's actually useful for AI."
Expanso Edge transforms raw data into AI-ready data before it arrives:
✅ Semantic enrichment at the source — every event gets context before it moves: what system generated it, what business function it supports, what data classification it carries, what regulatory jurisdiction it falls under. An ML model training on this data doesn't need months of data engineering — it arrives ready
✅ Schema normalization across thousands of sources — a customer with 247 different log sources has 247 different ways of expressing "a user logged in." Expanso normalizes all of them into a consistent schema before they hit the index. Cisco's TSFM (Time Series Foundation Model) can work across the entire dataset without fighting format inconsistencies
✅ Continuous data quality validation — enforce schemas, validate field types, check for anomalies in the data itself. Events that fail validation get flagged, routed to dead-letter queues, or corrected. The data that arrives in Splunk is trustworthy — the prerequisite for any serious AI/ML deployment
✅ Training data extraction — here's what makes this concrete: the upstream layer can fork a stream. One copy goes to Splunk for real-time analytics. Another copy — enriched, labeled, structured, PII-stripped — goes to a data lake in Parquet or Delta Lake format, partitioned and ready for model training. Splunk becomes the real-time analytics layer and the source of truth for ML training data. That's the Data Fabric story made concrete.
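The normalize-enrich-fork flow above can be sketched as one pipeline with a fan-out output. Again illustrative only: the field names, bucket name, and config keys are assumptions, and the Parquet/Delta encoding and partitioning steps are elided from the sketch:

```yaml
# Sketch: normalize at the edge, then fork to Splunk and a data lake.
pipeline:
  processors:
    - mapping: |
        # Normalize "a user logged in" into one schema, whatever the source called it
        root.user   = this.(user | username | uid)
        root.action = "login"
        root.ts     = this.timestamp.or(now())
        # Enrich with the context an ML model needs
        root.site           = env("SITE_ID")
        root.classification = "operational"
        # Strip PII before anything leaves the site
        root.email = deleted()

output:
  broker:
    pattern: fan_out
    outputs:
      # Real-time copy: Splunk HEC
      - http_client:
          url: https://splunk.example.com:8088/services/collector/event
          headers:
            Authorization: "Splunk ${SPLUNK_HEC_TOKEN}"
      # Training copy: object storage for the data lake
      # (Parquet encoding and partitioning omitted from this sketch)
      - aws_s3:
          bucket: ml-training-data
          path: 'events/dt=${! timestamp_unix() }.jsonl'
```

The same event is parsed and enriched once, then delivered twice — which is what makes Splunk and the training lake agree on what the data means.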
The pitch to a customer isn't "Splunk costs less." It's "Splunk is now your AI/ML data backbone. Every event arrives enriched, normalized, and validated. Your data science team can train models on operational data that was previously inaccessible or unusable."
Win #3: Federated Data Architecture — Extended to the Actual Edge
The Cisco Data Fabric already federates across S3, Snowflake, Azure, and Iceberg. The Machine Data Lake gives customers a cost-effective place to land data within the Splunk ecosystem. Edge Processor handles filtering and routing within the Splunk pipeline.
The natural next step: extend federation all the way to the endpoint.
Not everything needs to be ingested. Not everything needs to be in S3. Some data is most valuable in situ — sitting on the device that generated it, queryable on demand.
Think about what this means for an analyst running Federated Search:
```spl
| federated source=edge_fleet host="factory-floor-*" metric="vibration" earliest=-7d
| join type=inner
    [search index=iot_alerts severity=critical host="factory-floor-*"]
| timechart span=1h avg(vibration_amplitude) by host
```
The analyst just queried vibration data from 200 factory floor sensors without that data ever being ingested. It stayed on the sensors. The upstream processing layer at each site handled the query, filtered it, and returned only the results. Splunk ran the analytics. The Machine Data Lake didn't need to store it. The ingest pipeline didn't need to carry it.
This is the Cisco Data Fabric's "bring analytics to data" principle taken to its extreme: analytics at the farthest edge, with Splunk as the unified query layer.
For customers, this unlocks a completely new tier of data:
| Data Tier | Where It Lives | How Splunk Accesses It |
|---|---|---|
| Hot | Splunk Indexes | Direct search (SPL) |
| Warm | Machine Data Lake | Promoted search |
| Cold | S3 / Data Lake | Federated Search (FS-S3) |
| Edge | On the endpoint | Federated Edge Query (NEW) |
That fourth tier doesn't exist today. It's a new capability. It's a new SKU opportunity. And it positions Splunk as the only platform that can query from the index all the way down to the actual device.
Win #4: Better Existing Data
For data you're already sending to Splunk, Expanso makes it arrive cleaner, faster, and cheaper:
✅ Faster searches — pre-parsed fields mean less search-time extraction, cleaner data means more relevant results
✅ Simpler management — one pipeline config replaces inputs.conf + props.conf + transforms.conf across every forwarder
✅ Better data quality — normalize timestamps, enrich with metadata, deduplicate at the edge
✅ Smarter routing — security events to security, metrics to metrics, compliance data tagged and masked automatically
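The routing and masking wins above can be sketched together: mask sensitive fields once at the edge, then route by event type. A sketch under assumptions — the sourcetype values, HEC endpoints, and config keys are invented for illustration:

```yaml
# Sketch: compliance masking plus content-based routing at the edge.
pipeline:
  processors:
    - mapping: |
        root = this
        # Mask card-number-like digit runs before anything leaves the site
        root.message = this.message.re_replace_all("\\b\\d{13,16}\\b", "[MASKED]")

output:
  switch:
    cases:
      # Security events go to the security team's HEC token/index
      - check: this.sourcetype == "auth" || this.sourcetype == "firewall"
        output:
          http_client:
            url: https://splunk.example.com:8088/services/collector/event
            headers:
              Authorization: "Splunk ${SECURITY_HEC_TOKEN}"
      # Everything else goes to the operations index
      - output:
          http_client:
            url: https://splunk.example.com:8088/services/collector/event
            headers:
              Authorization: "Splunk ${OPS_HEC_TOKEN}"
```

In Splunk terms, this is the job of transforms.conf routing stanzas — expressed as one readable file instead of forwarder-by-forwarder config.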
Architecture: Expanso + Splunk Integration
What changes:
- Search performance — pre-extracted fields, no more search-time `rex` commands
- Data quality — normalized timestamps, deduplicated events, enriched metadata
- Simpler ops — one pipeline YAML replaces configs spread across every forwarder
- Multi-destination — same data to Splunk + S3 + metrics systems, parsed once
If You Know Splunk, You Already Know Expanso
Expanso uses familiar Splunk concepts — but simpler to manage:
| Splunk Concept | Expanso Equivalent | What's Better |
|---|---|---|
| inputs.conf | Pipeline input section | Real-time parsing, multiformat support |
| props.conf / transforms.conf | Pipeline processors + Bloblang | One file, not three. Side-by-side SPL translations |
| outputs.conf | Pipeline output section | Multi-destination routing built in |
| Heavy Forwarder fleet | Expanso Edge nodes | Cloud-managed, no config drift |
| SPL field extraction | Bloblang mapping | Runs at ingest, not search time |
| Deployment Server | Expanso Cloud Console | GitOps-driven, instant rollouts |
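One concrete way to read the "SPL field extraction → Bloblang mapping" row is the same extraction written both ways. The log format, field names, and regex below are invented for illustration, and the exact Expanso config schema is an assumption:

```yaml
# At search time, every single search pays for this:
#   ... | rex field=_raw "user=(?<user>\S+) status=(?<status>\d+)"
#
# At ingest time, the pipeline pays for it once:
pipeline:
  processors:
    - mapping: |
        # Raw line assumed to look like: "user=alice status=200 ..."
        let caps = content().string().re_find_object("user=(?P<user>\\S+) status=(?P<status>\\d+)")
        root.raw    = content().string()
        root.user   = $caps.user
        root.status = $caps.status.number()
```

By the time the event reaches Splunk, user and status are already indexed fields — no search-time extraction, no props.conf stanza to maintain.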
What You'll Build
In this tutorial, you'll create a complete Splunk integration that:
- Collects logs like inputs.conf — file monitoring, multiline parsing
- Parses data like props.conf — field extraction, normalization at ingest time
- Filters noise before indexing — so your indexes only contain data people actually query
- Routes to Splunk HEC — proper index/sourcetype tagging, pre-parsed fields
- Advanced patterns — multi-destination, compliance masking, metrics extraction
Prerequisites
- Expanso Edge installed (installation guide)
- Splunk instance with HEC token configured
- Basic familiarity with Splunk configuration files
Get Started
Choose your path:
Interactive Explorer
See each Splunk integration technique with side-by-side transformations
Step-by-Step Tutorial
Build the pipeline incrementally:
- Collect Like inputs.conf
- Parse Like props.conf
- Filter Before Indexing
- Route to Splunk HEC
- Advanced Splunk Patterns
Complete Pipeline
Download the production-ready solution
Expanso makes your Splunk investment go further — by bringing it into places it's never been, and making everything it already does faster and cleaner.