Splunk® Data Stream Processor

Use the Data Stream Processor




Data Stream Processor terminology

The Splunk Data Stream Processor (DSP) uses the following terminology to refer to concepts and features.

Canvas

The Canvas view is the Data Stream Processor UI for building pipelines.

Collect service

The Collect service is a scalable data collection service that pulls large volumes of data from external data sources and sends it into your data pipeline. The Collect service can be integrated into the Splunk Data Stream Processor (DSP) through pull-based connectors.

Connection

The configuration of a connector is called a connection. Connections include the credentials that connectors require to connect to their endpoints. These credentials are uploaded over HTTPS, encrypted, and stored in a secrets manager. You must configure a connection before using the associated connector. You can reuse a connection across multiple data pipelines.

Connector

Connectors connect a data pipeline with an external data source, such as AWS Kinesis. There are two types of connectors: pull-based connectors and push-based connectors. Pull-based connectors run on scheduled jobs, whereas push-based connectors receive data ad hoc as it is pushed to them. In the Splunk DSP Getting Data In manual, see Get data in with the Collect Service for more information on pull-based connectors and Get data in with a DSP push-based connector for more information on push-based connectors.

Data pipeline

A data pipeline is a series of functions that define the flow of data from a data source to a data destination and can include optional transformations that you want to perform. Data sources and data destinations are special functions that start and terminate a pipeline and are referred to as source and sink functions. A data pipeline is, at a minimum, a source and a sink. All data flows through a data pipeline.
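As a rough mental model only, the following sketch represents a pipeline as a source, an optional chain of transformations, and a sink. This is an illustrative analogy in Python, not DSP code; the function names and record shapes are assumptions made for the example.

```python
# Conceptual sketch of a data pipeline: source -> transformations -> sink.
# This is an analogy only, not Splunk DSP code or its API.

def source():
    """Source function: yields raw records into the pipeline."""
    for i in range(3):
        yield {"body": f"event {i}", "host": "web-01"}

def add_sourcetype(records):
    """Optional transformation: enriches each record as it flows through."""
    for record in records:
        record["sourcetype"] = "access_combined"
        yield record

def sink(records):
    """Sink function: terminates the pipeline at a destination (here, stdout)."""
    for record in records:
        print(record)

# A minimal pipeline is just a source wired to a sink;
# transformations are optional stages in between.
sink(add_sourcetype(source()))
```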

DSP event

A special record that has a specific schema, defined here. Data sent from the Ingest, Forwarders, and Collect services is in this schema format. All DSP events are records, but not all records are DSP events.
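As a hedged illustration, a DSP event might look like the following when rendered as a Python dictionary. The exact field names and types are defined by the event schema in the DSP documentation, so treat the fields shown here as approximations rather than the authoritative schema.

```python
# Approximate shape of a DSP event, shown as a Python dict.
# Field names are illustrative; verify exact names and types
# (for example, the spelling of the sourcetype field) against
# the event schema reference.
dsp_event = {
    "body": "GET /index.html 200",       # the event payload
    "source": "/var/log/access.log",     # where the data came from
    "source_type": "access_combined",    # exact key name may differ by schema version
    "host": "web-01",                    # originating host
    "timestamp": 1585363200000,          # epoch time in milliseconds
    "nanos": 0,                          # sub-millisecond precision, if any
    "id": None,                          # assigned if not provided
    "kind": "event",                     # distinguishes events from metric events
    "attributes": {"index": "main"},     # arbitrary key-value metadata
}
```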

DSP metric event

A special record that has a specific schema, defined here. All DSP metric events are records.
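The sketch below shows one plausible shape for a DSP metric event as a Python dictionary, with the body carrying a list of measurements. As with the event example above, the field names are assumptions to be checked against the metric event schema reference.

```python
# Approximate shape of a DSP metric event, shown as a Python dict.
# In the metric schema the body carries one or more measurements;
# field names here are illustrative only.
dsp_metric_event = {
    "body": [
        {
            "name": "cpu.util",           # metric name
            "value": 45.0,                # numeric measurement
            "unit": "percent",            # unit of measure
            "dimensions": {"core": "0"},  # metric dimensions
        }
    ],
    "host": "web-01",
    "source": "collectd",
    "timestamp": 1585363200000,            # epoch time in milliseconds
    "kind": "metric",                      # marks the record as a metric event
    "attributes": {},
}
```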

Forwarder Service

The Forwarder Service is a data ingestion method that forwards data from Splunk universal or heavy forwarders into a data pipeline.

Function

Functions are the basic building blocks of a pipeline. Use functions to interact with your streaming data as it comes into your pipeline. There are two types of functions: streaming functions and scalar functions. For a full list of available functions, see the Splunk DSP Function Reference manual.

Ingest Service

The Ingest Service is a data ingestion method that allows you to send formatted JSON events or metrics using the command line.
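To make the idea concrete, here is a minimal sketch of posting one JSON event to the Ingest REST API with Python's requests library. The host, port, endpoint path, and bearer token are placeholders and assumptions; in practice you obtain the token and the exact URL as described in the Ingest service and SCloud documentation.

```python
# Minimal sketch: send one event to the DSP Ingest REST API.
# The URL path, port, and token below are placeholders/assumptions;
# consult the Ingest REST API documentation for the real endpoint
# and use scloud to obtain a valid token.
import requests

DSP_HOST = "dsp.example.com"             # hypothetical DSP host
TOKEN = "<bearer-token-from-scloud>"     # placeholder credential

event = {
    "body": "GET /index.html 200",
    "source": "/var/log/access.log",
    "host": "web-01",
    "attributes": {"index": "main"},
}

response = requests.post(
    f"https://{DSP_HOST}:31000/default/ingest/v1beta2/events",  # assumed path
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    json=[event],  # the endpoint is assumed to accept a JSON array of events
)
response.raise_for_status()
```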

Pipeline activation

Activate a pipeline to start processing and sending data to a chosen destination.

Pipeline validation

Validate a pipeline to check the configuration of each function in your pipeline. Clicking Start Preview, Activate, or Validate in the Data Stream Processor UI performs validation.

Record

Data flows through your pipeline as records. Any single piece of data in a data stream is a record. DSP events and DSP metric events are also records, although records can have any arbitrary schema. Records include data from data sources that don't follow the event or metrics schema, such as data that comes through a DSP connector like AWS Kinesis. Sending records to a Splunk Enterprise index may also require additional processing in your pipeline. See the Pipeline requirements for specific data sources in DSP chapter or About sending data to Splunk Enterprise in this manual.
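For example, a record read from a pull-based connector such as AWS Kinesis might arrive with an arbitrary schema like the hypothetical one below, which is neither a DSP event nor a DSP metric event and typically needs further processing before it can be indexed. The field names shown are illustrative assumptions, not the connector's documented output.

```python
# A record with an arbitrary schema, as it might arrive from a
# connector such as AWS Kinesis. Hypothetical example: it has no
# "body", "timestamp", or other DSP event fields, so a pipeline
# would normally reshape it before sending it to a Splunk
# Enterprise index.
kinesis_record = {
    "key": "shardId-000000000000",
    "value": b"eyJldmVudCI6ICJsb2dpbiJ9",  # raw bytes, often base64-encoded JSON
    "stream": "my-stream",
    "approxArrivalTimestamp": 1585363200000,
}
```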

Scalar function

Scalar functions are functions that operate in the context of the streaming functions they are called in. You can use scalar functions to do things like add and subtract values, perform comparison operations, convert between data types, and perform other similar tasks. Unlike streaming functions, scalar functions are not full nodes in a pipeline. For a full list of available scalar functions, see the Function Reference.
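As a rough analogy only, not DSP code, the sketch below shows the relationship: the scalar function computes one value per record, while the streaming function it is called from is the stage that actually appears in the pipeline.

```python
# Conceptual analogy in Python: a scalar function computes a value
# within the context of a streaming function; only the streaming
# function is a node in the pipeline.

def to_upper(value: str) -> str:
    """Scalar: operates on a single value and returns a single value."""
    return value.upper()

def eval_stage(records):
    """Streaming: operates on the stream, calling the scalar per record."""
    for record in records:
        record["body"] = to_upper(record["body"])
        yield record

# The streaming stage is wired into the pipeline; the scalar function
# is only ever invoked from inside it.
for out in eval_stage(iter([{"body": "hello"}, {"body": "world"}])):
    print(out)
```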

SCloud

A command-line tool that makes API calls to the Data Stream Processor. You can use scloud to interact with DSP APIs such as the Ingest REST API.

Sink function

A special type of streaming function that represents your data destination. A sink function is the last function that you see in a completed data pipeline.

Source function

A special type of streaming function that represents your data source. A source function is the first function that you see in a completed data pipeline.

Splunk Firehose

A source function that contains the data sent from all data sources that use the Data Pipeline Event or Data Pipeline Metric Event schema. These data sources are all of the DSP API services: Ingest, Forwarders, and Collect.

Streams DSL

A domain-specific language used to specify the arguments of a function and express partial expressions within functions.

Streams JSON

An abstract representation of data pipelines in JSON. A Splunk representative may ask for the Streams JSON associated with your pipeline. See Troubleshoot the Data Stream Processor for instructions on how to get the full Streams JSON for your pipeline.

Streaming function

Streaming functions are functions that operate on a data stream and are visible in the Data Stream Processor UI. Data streams from one streaming function to the next and is processed and transformed along the way. Data sources and destinations are also streaming functions and are referred to as source and sink functions. For a full list of available streaming functions, see the DSP Function Reference.

Template

Templates are partially or fully configured pipelines for specific use cases that you can save and reuse. See Save a pipeline as a template for information on how to create templates.
