Data Stream Processor terminology

The Splunk Data Stream Processor (DSP) uses the following terminology to refer to concepts and features.

Canvas

The Canvas view is the Data Stream Processor UI for building pipelines.

Collect service

The Collect service is a scalable data collection service that pulls large volumes of data from external data sources and sends it into your data pipeline. The Collect service can be integrated into the Splunk Data Stream Processor (DSP) through pull-based connectors.

Connection

The configuration of a connector is called a connection. Connections include the credentials that connectors require to connect to their endpoints. These credentials are uploaded over HTTPS, encrypted, and stored in a secrets manager. You must configure a connection before using the associated connector. You can reuse a connection across multiple data pipelines.

Connector

Connectors connect a data pipeline with an external data source or destination, such as AWS Kinesis. There are two types of connectors: batch connectors and streaming connectors. Batch connectors run as scheduled jobs, whereas streaming connectors push data as it arrives. In the Splunk DSP Getting Data In manual, see Get data in with the Collect Service for information on batch connectors, and see the streaming connector topics for information on streaming connectors.

Data pipeline

A data pipeline is a series of functions that define the flow of data from a data source to a data destination and can include optional transformations that you want to perform. Data sources and data destinations are special functions that start and terminate a pipeline and are referred to as source and sink functions. A data pipeline is, at a minimum, a source and a sink. All data flows through a data pipeline.
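The following Python sketch makes the source-to-sink flow concrete. It models a pipeline as a source function, an optional chain of transformations, and a sink function. The `Pipeline` class, the `uppercase_body` transformation, and the dict-based records are hypothetical illustrations. They are not part of the DSP API and do not describe how DSP executes pipelines internally.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List

# Hypothetical model: a record is a plain dict. This is not the DSP API.
Record = dict
Transform = Callable[[Iterator[Record]], Iterator[Record]]


@dataclass
class Pipeline:
    source: Callable[[], Iterable[Record]]  # source function: starts the pipeline
    sink: Callable[[Record], None]          # sink function: terminates the pipeline
    transforms: List[Transform] = field(default_factory=list)  # optional steps

    def run(self) -> None:
        stream: Iterator[Record] = iter(self.source())
        # Each optional transformation wraps the stream produced by the previous one.
        for transform in self.transforms:
            stream = transform(stream)
        # Every record that comes out of the transformations reaches the sink.
        for record in stream:
            self.sink(record)


def uppercase_body(stream: Iterator[Record]) -> Iterator[Record]:
    # Hypothetical transformation: uppercase the "body" field of each record.
    for record in stream:
        yield {**record, "body": record["body"].upper()}


# The minimal pipeline: only a source and a sink.
minimal_out: List[Record] = []
Pipeline(source=lambda: [{"body": "a"}, {"body": "b"}], sink=minimal_out.append).run()

# The same pipeline with one optional transformation in between.
transformed_out: List[Record] = []
Pipeline(
    source=lambda: [{"body": "a"}, {"body": "b"}],
    sink=transformed_out.append,
    transforms=[uppercase_body],
).run()
```

The minimal pipeline passes records through unchanged. The second pipeline shows where optional transformations sit between the source and the sink.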

DSP event

A special record that has a specific, predefined schema. Data sent from the Ingest, Forwarder, and Collect services is in this schema format. All DSP events are records, but not all records are DSP events.

DSP metric event

A special record that has a specific, predefined metrics schema. All DSP metric events are records, but not all records are DSP metric events.
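The following Python sketch illustrates how records, DSP events, and DSP metric events relate. Every input is a record, and only records that match the expected schema count as events or metric events. Two parts of the sketch are assumptions made for illustration: the field names in `ASSUMED_SCHEMA_FIELDS`, and the use of a `kind` field to tell events from metric events. Check the DSP event and metric schema topics for the authoritative definitions.

```python
from typing import Any, Dict

# Assumed field names shared by the DSP event and metric event schemas.
# Verify them against the schema topics in the DSP documentation.
ASSUMED_SCHEMA_FIELDS = frozenset(
    {"body", "attributes", "kind", "id", "host", "source", "source_type", "timestamp", "nanos"}
)


def classify(record: Dict[str, Any]) -> str:
    """Return 'event', 'metric', or 'record' for a record modeled as a dict."""
    if ASSUMED_SCHEMA_FIELDS.issubset(record):
        # Assumption: the "kind" field distinguishes events from metric events.
        if record["kind"] == "event":
            return "event"
        if record["kind"] == "metric":
            return "metric"
    # Anything else is still a record, just not a DSP event or metric event.
    return "record"


# Hypothetical example values for the shared fields.
shared = {"attributes": {}, "id": "1", "host": "web-01", "source": "app",
          "source_type": "json", "timestamp": 1700000000000, "nanos": 0}
examples = {
    "event_example": {**shared, "kind": "event", "body": "user logged in"},
    # Hypothetical metric payload shape, for illustration only.
    "metric_example": {**shared, "kind": "metric", "body": [{"name": "cpu", "value": 0.5}]},
    # A record with an arbitrary schema, such as data read through a connector.
    "arbitrary_record": {"partitionKey": "p1", "data": "raw bytes"},
}
labels = {name: classify(rec) for name, rec in examples.items()}
```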

Forwarder Service

The Forwarder Service is a data ingestion method that forwards data from Splunk universal or heavy forwarders into a data pipeline.

Function

Functions are the basic building block of a pipeline. Use functions to interact with your streaming data as it comes into your pipeline. There are two types of functions: streaming functions and scalar functions. For a full list of available functions, see the Splunk DSP Function Reference manual.

HTTP Event Collector

The DSP HTTP Event Collector (DSP HEC) is a data ingestion method that supports the Splunk HTTP Event Collector (HEC) endpoints. You can use DSP HEC with the Read from Splunk Firehose data source function and your existing Splunk HEC workflow to ingest data into DSP.
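DSP HEC supports the Splunk HEC endpoints, so a client can build a standard HEC request. The following Python sketch uses only the standard library. It builds a POST request with the `Authorization: Splunk <token>` header and a JSON body containing `event`, `sourcetype`, and `host`, but does not send it. The host name and token are placeholders. The `/services/collector/event` path is assumed to match the standard Splunk HEC event endpoint. Substitute the values for your own DSP deployment.

```python
import json
import urllib.request

# Placeholders (hypothetical): replace with your DSP HEC endpoint and HEC token.
HEC_URL = "https://dsp.example.com/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def build_hec_request(event: dict, sourcetype: str, host: str) -> urllib.request.Request:
    # Standard HEC event envelope: the payload goes in "event", metadata alongside it.
    payload = {"event": event, "sourcetype": sourcetype, "host": host}
    return urllib.request.Request(
        HEC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {HEC_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_hec_request({"message": "hello"}, sourcetype="my:app", host="web-01")
# To actually send it: urllib.request.urlopen(req)  (not executed in this sketch)
```

Because the request format is the standard HEC one, an existing HEC client can usually be pointed at DSP HEC by changing only the URL and token.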

Ingest Service

The Ingest Service is a data ingestion method that allows you to send formatted JSON events or metrics using the command line.

Pipeline activation

Activate a pipeline to start processing and sending data to a chosen destination.

Pipeline validation

Validate a pipeline to check the configuration of each function in your pipeline. Clicking Start Preview, Activate, or Validate in the Data Stream Processor UI performs validation.
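The checks that validation performs are specific to each function. The general idea can still be sketched in Python: confirm that the pipeline starts with a source function and ends with a sink function, and that each function has its required settings. The `FunctionNode` type, its `role` values, and the setting names are hypothetical. This sketch does not reproduce DSP's actual validation rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FunctionNode:
    name: str
    role: str  # hypothetical roles: "source", "sink", or "transform"
    required: List[str] = field(default_factory=list)  # required setting names
    config: Dict[str, str] = field(default_factory=dict)  # configured settings


def validate(functions: List[FunctionNode]) -> List[str]:
    """Return a list of problems; an empty list means the pipeline passed."""
    errors: List[str] = []
    # A data pipeline is, at a minimum, a source and a sink.
    if len(functions) < 2:
        errors.append("a pipeline needs at least a source and a sink")
    else:
        if functions[0].role != "source":
            errors.append("the first function must be a source function")
        if functions[-1].role != "sink":
            errors.append("the last function must be a sink function")
    # Check the configuration of each function in the pipeline.
    for fn in functions:
        for key in fn.required:
            if not fn.config.get(key):
                errors.append(f"{fn.name}: missing required setting '{key}'")
    return errors


pipeline = [
    FunctionNode("read_source", "source", ["connection"], {"connection": "my-conn"}),
    FunctionNode("write_sink", "sink", ["index"], {}),
]
problems = validate(pipeline)
```

Here `problems` contains a single entry reporting the missing `index` setting on the sink.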

Record

Data flows through your pipeline as records. Any single piece of data in a data stream is a record. DSP events and DSP metric events are records, but records in general can have any schema. Records include data from data sources that don't use the event or metrics schema, such as data that comes through a DSP connector like AWS Kinesis. Sending records to a Splunk Enterprise index might require additional processing in your pipeline. See the Unique pipeline requirements for specific data sources chapter or About sending data to Splunk Enterprise in this manual.

Scalar function

Scalar functions are functions that operate in the context of the streaming functions they are called in. Unlike streaming functions, scalar functions are not full nodes in a pipeline. You can use scalar functions to do things like addition and subtraction, perform comparison operations, convert between data types, or other similar tasks. For a full list of available scalar functions, see the DSP Function Reference.
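The following Python sketch shows the difference in scope between the two kinds of function. `add` and `to_int` are scalar functions that work on individual values. `eval_field` is a streaming function that receives the data stream and calls scalar functions on each record. All of these names are hypothetical and only loosely modeled on DSP functions. See the DSP Function Reference for the real functions and their signatures.

```python
from typing import Any, Callable, Dict, Iterator

Record = Dict[str, Any]


# Scalar functions (hypothetical): operate on individual values, not on the stream.
def add(a: float, b: float) -> float:
    return a + b


def to_int(value: str) -> int:
    return int(value)


# Streaming function (hypothetical): consumes the stream and calls a scalar
# expression on each record it receives, adding the result as a new field.
def eval_field(
    stream: Iterator[Record], target: str, expr: Callable[[Record], Any]
) -> Iterator[Record]:
    for record in stream:
        yield {**record, target: expr(record)}


records = iter([{"bytes_in": "10", "bytes_out": 5}, {"bytes_in": "7", "bytes_out": 3}])
# The scalar functions only run in the context of the streaming function.
result = list(
    eval_field(records, "total", lambda r: add(to_int(r["bytes_in"]), r["bytes_out"]))
)
```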

SCloud

A command-line tool that makes API calls to the Data Stream Processor. You can use SCloud to interact with DSP APIs such as the Ingest REST API.

Sink function

A special type of streaming function that represents your data destination. A sink function is the last function that you see in a completed data pipeline.

Source function

A special type of streaming function that represents your data source. A source function is the first function that you see in a completed data pipeline.

Splunk Firehose

The Splunk Firehose is a steady stream of data from the Forwarder, Collect, Ingest, DSP HEC, and Syslog (through SC4S) API services. The Read from Splunk Firehose function reads the data coming through the Splunk Firehose and makes this data available to your pipeline. This allows you to use a single function to ingest your data instead of using a different function for each data source.

All data received by Splunk Firehose is stored for 24 hours. After 24 hours the oldest data is deleted. See Data retention policies for more information.
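The retention rule can be modeled as a time-ordered buffer that discards the oldest data once it is older than 24 hours. The `RetentionBuffer` class below is a hypothetical Python sketch of that rule, and it assumes that data arrives in timestamp order. It does not reflect how the Splunk Firehose actually stores data.

```python
from collections import deque
from typing import Any, Deque, List, Tuple

RETENTION_SECONDS = 24 * 60 * 60  # 24 hours


class RetentionBuffer:
    """Hypothetical model of time-based retention, not the Firehose implementation."""

    def __init__(self, retention_seconds: int = RETENTION_SECONDS) -> None:
        self.retention_seconds = retention_seconds
        # (arrival time in seconds, data), oldest first; assumes in-order arrival.
        self._items: Deque[Tuple[float, Any]] = deque()

    def append(self, arrived_at: float, data: Any) -> None:
        self._items.append((arrived_at, data))

    def expire(self, now: float) -> None:
        # Delete from the oldest end while the oldest item is past the retention period.
        while self._items and now - self._items[0][0] > self.retention_seconds:
            self._items.popleft()

    def contents(self) -> List[Any]:
        return [data for _, data in self._items]


buf = RetentionBuffer()
buf.append(0, "first")
buf.append(3600, "second")
buf.expire(now=RETENTION_SECONDS + 1)  # "first" is now older than 24 hours
remaining = buf.contents()
```

After the call to `expire`, only `"second"` remains, because it arrived one hour later and is still within the 24-hour window.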

Streams JSON

An abstract representation of data pipelines in JSON. For troubleshooting purposes, a Splunk representative may ask for the Streams JSON associated with your pipeline. See Troubleshoot the Data Stream Processor for instructions on how to get the full Streams JSON for your pipeline.
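The idea of representing a pipeline as JSON can be sketched as a list of function nodes plus the edges that connect them. The `nodes` and `edges` layout and the `op`, `source`, and `target` keys below are hypothetical and simplified. They are not the actual Streams JSON schema; retrieve the real Streams JSON from DSP as described in the troubleshooting topic.

```python
import json
from typing import List


def pipeline_to_json(function_names: List[str]) -> str:
    """Serialize a linear pipeline as a hypothetical nodes-and-edges JSON document."""
    # One node per function, in pipeline order (source first, sink last).
    nodes = [{"id": f"node{i}", "op": name} for i, name in enumerate(function_names)]
    # Each edge connects a function to the next one in the pipeline.
    edges = [
        {"source": nodes[i]["id"], "target": nodes[i + 1]["id"]}
        for i in range(len(nodes) - 1)
    ]
    return json.dumps({"nodes": nodes, "edges": edges}, indent=2)


# Hypothetical function names, for illustration only.
doc = json.loads(pipeline_to_json(["source_fn", "eval_fn", "sink_fn"]))
```

A graph of nodes and edges, rather than a flat list, is a common way to describe pipelines because it can also express branches.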

Streaming function

Streaming functions are functions that operate on a data stream and are the functions that are visible in the Data Stream Processor UI. Data streams from one streaming function to the next and is processed and transformed along the way. Data sources and destinations are also streaming functions and are referred to as source and sink functions, respectively. For a full list of available streaming functions, see the DSP Function Reference.

Template

Templates are partially or fully configured pipelines for specific use cases that can be saved for reuse. See Save a pipeline as a template for information on how to create templates.
