Splunk® Data Stream Processor

Use the Data Stream Processor



On April 3, 2023, Splunk Data Stream Processor reached its end of sale, and will reach its end of life on February 28, 2025. If you are an existing DSP customer, please reach out to your account team for more information.

All DSP releases prior to DSP 1.4.0 use Gravity, a Kubernetes orchestrator for which end-of-life has been announced. We replaced Gravity with an alternative component in DSP 1.4.0. Therefore, after July 1, 2023 we will no longer provide support for versions of DSP prior to DSP 1.4.0. We advise all of our customers to upgrade to DSP 1.4.0 to continue receiving full product support from Splunk.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Data Stream Processor terminology

The Splunk Data Stream Processor (DSP) uses the following terminology to refer to concepts and features.

Canvas View

The Canvas View is the UI that displays when you open a data pipeline for viewing or editing. You can view a graphical representation of the pipeline, build your pipeline using GUI elements, or enable the SPL View. See Navigating the Data Stream Processor for more information. See also SPL View.

Cluster

The Data Stream Processor is installed and deployed into a Kubernetes cluster. In Kubernetes, a cluster is a group of master and worker nodes that share resources to run containerized tasks. After you install and deploy the Data Stream Processor onto a cluster, the cluster intelligently handles distributing work to the individual nodes for you. If any nodes are added or removed, the cluster redistributes the work as necessary. Search for "Kubernetes Components" in the Kubernetes documentation for more information.

See node.

Collect service

The Collect service is a scalable data collection service that pulls large volumes of data from external data sources and sends it into your data pipeline. The Collect service can be integrated into the Data Stream Processor through pull-based connectors.

Connection

A connection is a configuration of a connector. Connections contain the identification details and credentials that the Data Stream Processor uses to access data sources or data destinations. You must create a connection to get data from a source into a pipeline or send data from a pipeline to a destination. You can reuse a connection across multiple data pipelines. See the Connect to Data Sources and Destinations with the Data Stream Processor manual for more information.

Any credentials that you specify in a connection are uploaded and secured by HTTPS, encrypted, and stored in a secrets manager.

Connector

A connector is a part of the Data Stream Processor that connects a data pipeline with a data source or destination. See the Connect to Data Sources and Destinations with the Data Stream Processor manual for more information.

Data pipeline

A data pipeline is a series of functions that define the flow of data from a data source to a data destination. All the data that the Data Stream Processor handles flows through a data pipeline. Each pipeline starts with a source function that reads data from a data source, and terminates with a sink function that sends data to a data destination. Pipelines can include optional streaming functions that transform the data while it's in transit. See the Building a pipeline chapter for more information.
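
For illustration only, a pipeline can be expressed as a single SPL2 statement that chains a source function, optional streaming functions, and a sink function. In the following sketch, the sink function name, connection ID, and index values are placeholders; check the Function Reference manual for the exact function names and signatures that your DSP release supports.

    | from splunk_firehose()
    | eval body = ucast(body, "string", null)
    | into splunk_enterprise_indexes("my_connection_id", "main", "main");

Here, from reads records from a data source, eval transforms each record while the data is in transit, and into sends the resulting records to the configured destination.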

DSP event

A DSP event is a type of record that is formatted according to a specific schema, as defined in Event schema. Typically, an event contains the data from a distinct occurrence of an incident or activity. For example, a sales transaction might generate an event containing a transaction ID, date of purchase, and price value.
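
For example, assuming the standard event schema fields such as kind, body, source, source_type, host, and timestamp, a pipeline fragment might keep only event-kind records and normalize the body to a string. This sketch is illustrative; the authoritative field names and types are listed in the Event schema documentation.

    | from splunk_firehose()
    | where kind = "event"
    | eval body = ucast(body, "string", null)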

DSP metric

A DSP metric is a type of record that is formatted according to a specific schema, as defined in Metrics schema. Typically, a metric contains quantitative data from a specific point in time. For example, a metric might capture the amount of CPU usage on a server at 11:00 AM on January 5, 2021.

Forwarder service

The Forwarder service is a data collection method that ingests data from Splunk universal or heavy forwarders into a data pipeline. See the Splunk forwarders chapter in the Connect to Data Sources and Destinations with the Data Stream Processor manual for more information.

Function

A function is the basic building block of a data pipeline. Each function performs a specific action for the pipeline, such as reading data from a particular data source or changing the format of the data at a certain point in the pipeline. Use functions to define the flow of data and manipulate the streaming data as it flows through your pipeline.

There are two types of functions: streaming functions, which include source and sink functions, and scalar functions. For a full list of available functions, see the Function Reference manual.

HTTP Event Collector

The HTTP Event Collector (DSP HEC) is a data collection method that supports the Splunk HTTP Event Collector (Splunk HEC) endpoints. You can use DSP HEC with your existing Splunk HEC workflow to ingest data from HTTP clients into the Data Stream Processor. See the HTTP clients chapter in the Connect to Data Sources and Destinations with the Data Stream Processor manual for more information.

Ingest service

The Ingest service is a data collection method that supports JSON-formatted data sent from the Splunk Cloud Services CLI. See About the Ingest service for more information.

Node

A node is the smallest unit of computing hardware in Kubernetes. It represents a single machine in a cluster and can be either a physical machine or a virtual machine hosted on a cloud provider like Google Cloud Platform. During installation, the Data Stream Processor environment is created by joining multiple nodes to form a cluster. When you join nodes to form a cluster, the nodes are assigned one of the following profiles depending on the install flavor that you selected.

Using the ha<x> install flavor
  • When you install using the ha<x> flavor, the first x nodes that join the cluster are assigned the master profile and become master nodes. For example, if you install using the ha3 flavor and you have 5 nodes, your cluster will have 3 master nodes and 2 worker nodes. A master node controls the state of the cluster. For example, it determines which applications are running at any given time, and it coordinates processes such as implementing updates, scheduling and scaling applications, and maintaining the cluster's state.
  • Any additional nodes that join after the first x are assigned the worker profile and become worker nodes. A worker node performs the tasks assigned by the master nodes. If you do not have any worker nodes, the master nodes act as both master and worker.
Using the hacp<x> install flavor
For information about the node profiles associated with this flavor, see the control plane and data plane node profiles.

Regardless of the installation flavor, nodes all operate as part of one cluster. Search for "Kubernetes Components" in the Kubernetes documentation for more information.

See cluster.

Parsed data

Parsed data is data that has been processed into discrete events. When you stream parsed data through your pipeline, each record contains the information for one specific event. See also Unparsed data.

Pipeline activation

Activate a pipeline to start processing and sending data to a chosen destination. See Using activation checkpoints to activate your pipeline for more information.

Pipeline validation

Validate a pipeline to check the configuration of each function in your pipeline. The Data Stream Processor performs pipeline validation when you start a preview session, activate a pipeline, build a pipeline from an SPL2 statement, or explicitly click the Validate option.

Record

A record is any single piece of data in a data stream. Data flows through your pipeline as records.

Scalar function

A scalar function is a function that operates in the context of the streaming function that it is called in. Unlike streaming functions, scalar functions are not full nodes in a pipeline. Instead, they are SPL2 expressions that you can specify as part of the configuration of a streaming function. You can use scalar functions to perform tasks such as addition and subtraction, comparison operations, and data type conversions. For a full list of available scalar functions, see DSP functions by category.
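
For example, the following eval streaming function calls scalar operations to compute new fields for each record. The input fields unit_price and quantity are hypothetical, and the exact set of available scalar functions and operators is listed in DSP functions by category.

    | eval total_price = unit_price * quantity,
           is_bulk_order = quantity > 100,
           body = ucast(body, "string", null)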

Sink function

A sink function is a type of streaming function that sends data from a pipeline to a data destination. All completed data pipelines end with one or more sink functions. Each sink function supports a specific type of data destination. For example, you would use the Send to Amazon S3 sink function to send data to an Amazon S3 bucket.
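
Sink functions follow the into keyword at the end of an SPL2 statement. The following sketch is illustrative only: the s3 function name and its arguments stand in for the actual Send to Amazon S3 signature, which is documented in the Function Reference manual.

    | from splunk_firehose()
    | into s3("my_s3_connection_id", "my-example-bucket");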

Source function

A source function is a type of streaming function that gets data from a data source into a pipeline. All pipelines start with one or more source functions. Each source function supports a specific type of data source. For example, you would use the Forwarder Service source function to get data from a universal forwarder.
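
Source functions follow the from keyword at the start of an SPL2 statement. As a sketch, a pipeline that reads from the Forwarder service might begin like this; the forwarders function name and its argument are illustrative, so check the Function Reference manual for the exact signature.

    | from forwarders("forwarders:all")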

SPL View

The SPL View displays the underlying Search Processing Language (SPL2) statement that defines your pipeline, and lets you build your pipeline by typing SPL2 expressions. You can enable or disable the SPL View from the Canvas View. See also Canvas View.

Splunk Cloud Services CLI

A command-line tool for making API calls to the Data Stream Processor. You can use the Splunk Cloud Services CLI to interact with DSP APIs such as the Ingest service. See Get started with the Splunk Cloud Services CLI in the Install and administer the Data Stream Processor manual and Use the Ingest service to send test events to your pipeline for more information.

Splunk DSP Firehose

The Splunk DSP Firehose is a continuous flow of data from the Forwarder service, the Ingest service, the HTTP Event Collector (DSP HEC), and Syslog servers. The Splunk DSP Firehose function reads the data coming through the Splunk DSP Firehose and makes this data available to your pipeline. This allows you to use a single function to ingest your data instead of using a different function for each data source.
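
The following minimal, illustrative SPL2 sketch reads from the Splunk DSP Firehose and then filters the combined stream down to syslog data; the where clause and field value are examples only.

    | from splunk_firehose()
    | where source_type = "syslog"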

Streaming function

A streaming function is a function that operates on a data stream. These functions are visible as pipeline nodes in the UI. Data streams from one streaming function to the next, and gets processed and transformed along the way. For a full list of available streaming functions, see the Function Reference manual. See also Source function and Sink function.

Template

A template is a copy of a partially or fully configured data pipeline that can be reused as the starting point for another pipeline. See Create a template for a DSP pipeline for information about creating templates.

Unparsed data

Unparsed data is data that hasn't been processed into discrete events. When you stream unparsed data through your pipeline, the Data Stream Processor might divide a single event across multiple records, or group multiple events into a single record. You can process unparsed data into parsed data using the Apply Line Break function. See also Parsed data.

If your data is ingested from a universal forwarder, it is unparsed.

Last modified on 21 October, 2021

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5, 1.3.0, 1.3.1

