How the Edge Processor solution works
The Edge Processor solution combines Splunk-managed cloud services, on-premises data processing software, and the Search Processing Language, version 2 (SPL2) to support data processing at the edge of your network. The Edge Processor solution consists of the following main components:
Component | Description | Usage |
---|---|---|
Edge Processor | A data processing engine that allocates resources for processing and routing data | You install Edge Processors on machines in your local network. Edge Processors provide an on-premises data plane that lets you reduce and sanitize your data before sending it outside of your network. |
Edge Processor service | A cloud service that provides a centralized console for managing Edge Processors | Splunk hosts the Edge Processor service as part of Splunk Cloud Platform. The Edge Processor service provides a cloud control plane that lets you deploy configurations, monitor the status of your Edge Processors, and gain visibility into the amount of data that is moving through your network. |
Pipeline | A set of data processing instructions written in SPL2, which is the data search and preparation language used by Splunk software | In the Edge Processor service, you create pipelines to specify what data to process, how to process it, and what destination to send the processed data to. Then, you apply pipelines to your Edge Processors to configure them to start processing data according to those instructions. |
By using the Edge Processor solution, you can process data in your own local network while also managing and monitoring your data ingest ecosystem from a Splunk-managed cloud service.
The following diagram provides an overview of these elements:
- The components that make up the Edge Processor solution, and whether each component is hosted in the Splunk cloud environment or your local environment. See the System architecture section on this page for more information.
- The path your data takes as it moves from source to destination through an Edge Processor. See the Data pathway section on this page for more information.
System architecture
The primary components of the Edge Processor solution include the Edge Processor service, Edge Processors, and SPL2 pipelines that support data processing.
Edge Processor service
The Edge Processor service is a cloud service hosted by Splunk. It is part of the data management experience, which is a set of services that fulfill a variety of data ingest and processing use cases.
You can use the Edge Processor service to do the following:
- Configure and install Edge Processors on your local environment for on-location data processing.
- Create and apply SPL2 pipelines that determine how each Edge Processor processes and routes the data that it receives.
- Define source types to identify the kind of data that you want to process and determine how Edge Processors break and merge that data into distinct events.
- Create connections to the destinations that you want your Edge Processors to send processed data to.
You access the Edge Processor service by logging in to your tenant in the Splunk cloud environment. Your tenant is connected with your Splunk Cloud Platform deployment, and uses it as an identity provider for managing user accounts and logins. To log in and access the Edge Processor service, use the same username and password as you would when logging in to your Splunk Cloud Platform deployment.
The connection between the tenant and the Splunk Cloud Platform deployment also allows the Edge Processor solution to use the deployment as a storage location for the logs and metrics that are generated by Edge Processors. The Edge Processor service retrieves these logs and metrics from the deployment and displays them in the user interface of the service.
These Edge Processor logs and metrics only contain information pertaining to the operational status of a given Edge Processor. They do not contain any of the actual data that you are ingesting and processing through Edge Processors. See the Edge Processors section that follows for more details.
Edge Processors
An Edge Processor is a data processing engine that allocates resources for processing and routing data. You can install an Edge Processor on a single server node in your network or on a cluster of multiple server nodes. Multi-instance Edge Processors provide more powerful data processing capabilities than single-instance Edge Processors. Be aware that multiple Edge Processor instances cannot run on the same machine, so you must install each instance on a different machine.
Each Edge Processor instance is associated with a supervisor, which contacts the cloud service at regular intervals to check for system updates, provide telemetry data, and confirm that the instance is still connected to the service. When you use the Edge Processor service to change your Edge Processor configurations or pipeline definitions, or when Splunk releases new features or bug fixes for Edge Processors, the supervisor detects these changes and automatically updates the instance as needed.
The supervisor sends the following information from the Edge Processor instance to the Edge Processor service in the cloud:
- Configuration information. This includes details such as the following:
  - The list of applied pipelines
  - The datasets that represent the selected data sources and destinations
  - The names of the Splunk indexes that the Edge Processor sends internal logs and metrics to
  - The version of the Edge Processor software that the instance is running
- Heartbeats that indicate the status of the Edge Processor instance and confirm if the instance is still connected to the service. These heartbeats include information such as the following:
  - Whether the instance is running or stopped
  - How much CPU and memory the instance is consuming
  - The version of the Edge Processor software that the instance is running
As an Edge Processor works to process data, it generates logs and metrics containing operational information such as the amount of data that was processed and any events, warnings, or errors that have occurred. The Edge Processor sends these logs and metrics to the Splunk Cloud Platform deployment that is connected to the tenant.
The information that an Edge Processor instance and its supervisor send to the cloud does not contain any of the actual data that is being ingested and processed. The data that you send through an Edge Processor only gets transmitted to the destinations that you choose in the Edge Processor configuration settings and the applied pipelines.
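The supervisor's check-in behavior can be pictured with a minimal Python sketch. This is an illustration only, not the actual supervisor implementation or wire format: the `InstanceState` class, the `build_heartbeat` and `pending_changes` functions, the field names, and the version strings are all hypothetical. The sketch shows a heartbeat that carries only status fields, and a comparison between the instance's current state and the state that the service wants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceState:
    """Local state of one Edge Processor instance (hypothetical shape)."""
    running: bool
    cpu_percent: float
    memory_mb: float
    software_version: str
    applied_pipelines: tuple


def build_heartbeat(state: InstanceState) -> dict:
    """Collect operational status only; no ingested event data is included."""
    return {
        "running": state.running,
        "cpu_percent": state.cpu_percent,
        "memory_mb": state.memory_mb,
        "software_version": state.software_version,
    }


def pending_changes(state: InstanceState, desired_version: str,
                    desired_pipelines: list) -> list:
    """Compare local state with what the service wants and list the updates."""
    changes = []
    if state.software_version != desired_version:
        changes.append("upgrade_software")
    if set(state.applied_pipelines) != set(desired_pipelines):
        changes.append("update_pipelines")
    return changes


# Hypothetical instance that is one software release and one pipeline behind.
state = InstanceState(True, 12.5, 256.0, "1.0.0", ("syslog_to_index",))
heartbeat = build_heartbeat(state)
changes = pending_changes(state, "1.1.0", ["syslog_to_index", "windows_events"])
```

In this sketch, `changes` lists both a software upgrade and a pipeline update, and `heartbeat` contains nothing beyond the status fields described above.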
Pipelines
A pipeline is a set of data processing instructions written in SPL2. When you create a pipeline, you write a specialized SPL2 statement that specifies which data to process, how to process it, and where to send the results. For example, you can create a pipeline that filters for syslog events and sends them to a dedicated index in Splunk Cloud Platform. When you apply a pipeline to an Edge Processor, the Edge Processor uses those instructions to process all the data that it receives from data sources such as Splunk forwarders, HTTP clients, and logging agents.
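The three parts of a pipeline (which data to process, how to process it, and where to send the results) can be modeled with a short Python sketch. This is a conceptual model only and is not SPL2 syntax: the `Pipeline` class, the event field names, the index name `syslog_events`, and the destination name `splunk_cloud` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Pipeline:
    """Conceptual model of a pipeline; real pipelines are written in SPL2."""
    name: str
    selects: Callable[[dict], bool]    # which data to process
    transform: Callable[[dict], dict]  # how to process it
    destination: str                   # where to send the results

    def run(self, events: Iterable[dict]) -> Iterator[tuple]:
        """Yield (destination, processed_event) for each selected event."""
        for event in events:
            if self.selects(event):
                yield self.destination, self.transform(event)


# Mirrors the example in the text: select syslog events and tag them
# with a dedicated index (names are hypothetical).
syslog_pipeline = Pipeline(
    name="syslog_to_index",
    selects=lambda e: e.get("sourcetype") == "syslog",
    transform=lambda e: {**e, "index": "syslog_events"},
    destination="splunk_cloud",
)

received = [
    {"sourcetype": "syslog", "_raw": "sshd: session opened"},
    {"sourcetype": "access_log", "_raw": "GET /index.html 200"},
]
routed = list(syslog_pipeline.run(received))
```

Here only the syslog event is selected, so `routed` holds a single entry bound for the hypothetical `splunk_cloud` destination. What happens to the event that was not selected depends on the Edge Processor configuration, as described in the Data pathway section.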
The Edge Processor solution supports a subset of SPL2 commands and functions. Pipelines can include only the commands and functions that are part of the EdgeProcessor profile. For information about the specific SPL2 commands and functions that you can use to write pipelines for Edge Processors, see Edge Processor pipeline syntax. For a summary of how the EdgeProcessor profile supports different commands and functions compared to other SPL2 profiles, see the following pages in the SPL2 Search Reference:
- Compatibility Quick Reference for SPL2 commands
- Compatibility Quick Reference for SPL2 evaluation functions
Data pathway
Data moves through the Edge Processor solution as follows:
- A tool, machine, or piece of software in your network generates data such as event logs or traces.
- An agent, such as a Splunk forwarder, receives the data and then sends it to an Edge Processor. Alternatively, the device or software that generated the data can send it to an Edge Processor without using an agent.
- The Edge Processor filters and transforms the data, and then sends the resulting processed data to a destination, such as an indexer that stores the data in a Splunk index.
By default, Edge Processors route processed data to destinations based on the pipelines that you applied. If there are no applicable pipelines, then unprocessed data is either dropped or routed to the default destination specified in the configuration settings of the Edge Processor. For more information about how data moves through an Edge Processor, see Partitions.
If you don't specify a default destination, then Edge Processors drop unprocessed data.
As the Edge Processor receives and processes data, it measures metrics indicating the volume of data that was received, processed, and sent to a destination. These metrics are stored in the _metrics index of the Splunk Cloud Platform deployment that is connected to your tenant. The Edge Processor service surfaces the metrics in the dashboard, providing detailed overviews of the amount of data that is moving through the system.
Partitions
Each Edge Processor instance merges the received data into an internal dataset before processing and routing that data. A partition is a subset of data that is selected for processing by a pipeline. Each pipeline that you apply to an Edge Processor creates a partition from the internal dataset. For information about how to specify a partition when creating a pipeline, see Create pipeline for Edge Processors.
The partitions that you create and the configuration of your Edge Processor determine how the Edge Processor routes the received data and whether any data is dropped:
- The data that the Edge Processor receives is defined as processed or unprocessed based on whether there is at least one partition for that data. For example, if your Edge Processor receives Windows event logs and Linux audit logs, but you only applied a pipeline for Windows event logs, then those Windows event logs are selected in a partition and considered to be processed while the Linux audit logs are considered to be unprocessed.
- Each pipeline creates a partition of the incoming data based on specified conditions, and only processes data that meets those conditions. Any data that does not meet those conditions is considered to be unprocessed.
- If you configure your pipeline to filter the processed data, the data that is filtered out gets dropped.
- If you configure your Edge Processor to have a default destination, then the unprocessed data goes to that default destination.
- If you do not set a default destination, then the unprocessed data is dropped.
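The routing rules above can be summarized as a small decision function. The following Python sketch is an illustrative model under stated assumptions, not the Edge Processor's actual implementation: the `Pipeline` class, the `route` function, the source type values, the `severity` field, and the destination names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    """Hypothetical model: a partition condition, a filter, and a destination."""
    partition: Callable[[dict], bool]  # selects the data this pipeline processes
    keep: Callable[[dict], bool]       # filter applied to the selected data
    destination: str


def route(event: dict, pipelines: list,
          default_destination: Optional[str] = None) -> list:
    """Return the destinations for one event; an empty list means dropped."""
    matching = [p for p in pipelines if p.partition(event)]
    if not matching:
        # Unprocessed data: default destination if one is set, else dropped.
        return [default_destination] if default_destination is not None else []
    # Processed data: anything a pipeline filters out is dropped.
    return [p.destination for p in matching if p.keep(event)]


# One pipeline for Windows event logs that filters out informational events.
windows_pipeline = Pipeline(
    partition=lambda e: e.get("sourcetype") == "windows_event_log",
    keep=lambda e: e.get("severity") != "info",
    destination="windows_index",
)

windows_error = {"sourcetype": "windows_event_log", "severity": "error"}
windows_info = {"sourcetype": "windows_event_log", "severity": "info"}
linux_audit = {"sourcetype": "linux_audit"}
```

With this model, `route(windows_error, [windows_pipeline], "archive")` returns the pipeline's destination, `route(windows_info, [windows_pipeline], "archive")` returns an empty list because filtered data is dropped rather than sent to the default destination, and `route(linux_audit, [windows_pipeline])` returns an empty list because no default destination is set.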
The following is a diagram of the Edge Processor data pathway.
See also
For information about how to set up and use specific components of the Edge Processor solution, see the following resources.
This documentation applies to the following versions of Splunk Cloud Platform™: 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release), 9.3.2408