Splunk® Data Stream Processor

Function Reference


Source functions (Data Sources)

The following source functions are available for your pipeline.

Amazon CloudWatch Metrics Connector

Get data from Amazon CloudWatch Metrics. You must create a connection to use this source function. See Use the Amazon CloudWatch Metrics connector with Splunk DSP.

API function name: read_from_aws_cloudwatch_metrics
Function Output:
This function outputs data pipeline metric events in the schema shown here.
Arguments:

  • connection_id: The ID of your Amazon CloudWatch Metrics connection.
  • initial_position (Optional): The position in the data stream where you want to start reading data. Defaults to LATEST.
    • LATEST: Start reading data from the latest position on the data stream.
    • TRIM_HORIZON: Start reading data from the very beginning of the data stream.

SPL2 Pipeline Builder example:

| from read_from_aws_cloudwatch_metrics("my-connection-id", "TRIM_HORIZON") | ...;
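Because initial_position is optional and defaults to LATEST, you can omit it to start reading from the latest position on the stream:

| from read_from_aws_cloudwatch_metrics("my-connection-id") | ...;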

Amazon Metadata Connector

Get data from the resources and infrastructure in Amazon Web Services (AWS). You must create a connection to use this source function. See Use the AWS Metadata Connector with Splunk DSP.

API function name: read_from_aws_metadata
Function Output:
This function outputs data pipeline events in the schema shown here.
Arguments:

  • connection_id: The ID of your Amazon Metadata connection.
  • initial_position (Optional): The position in the data stream where you want to start reading data. Defaults to LATEST.
    • LATEST: Start reading data from the latest position on the data stream.
    • TRIM_HORIZON: Start reading data from the very beginning of the data stream.

SPL2 Pipeline Builder example:

| from read_from_aws_metadata("my-connection-id", "TRIM_HORIZON") | ...;

Amazon S3 Connector

Get data from Amazon S3. You must create a connection to use this source function. See Use the Amazon S3 Connector.

API function name: read_from_aws_s3
Function Output:
This function outputs data pipeline events in the schema shown here.
Arguments:

  • connection_id: The ID of your Amazon S3 connection.
  • initial_position (Optional): The position in the data stream where you want to start reading data. Defaults to LATEST.
    • LATEST: Start reading data from the latest position on the data stream.
    • TRIM_HORIZON: Start reading data from the very beginning of the data stream.

SPL2 Pipeline Builder example:

| from read_from_aws_s3("my-connection-id", "TRIM_HORIZON") | ...;

Azure Event Hubs Using SAS Key

Get data from an Azure Event Hubs namespace. You must create a connection to use this source function. See Create a connection for the DSP Azure Event Hubs Connector with an SAS key. For information on how to create a pipeline using Azure Event Hubs as your source, see Deserialize and send Azure Event Hubs data from a DSP pipeline.

API function name: read_event_hubs
Function Output:
This function outputs records with the following schema:

{
partitionKey: <partitionKey> as a string
body: <body> in bytes
partitionId: <partitionId> as a string
offset: <offset> as a string
sequenceNumber: <sequenceNumber> as a long
enqueuedTime: <enqueuedTime> as a long
properties: <properties> as a map<string, string>
}

Arguments:

  • connection_id: The ID of your Azure Event Hubs connection.
  • event_hub_name: The name of the Event Hub entity to subscribe to.
  • consumer_group_name: The name of a consumer group. This must match the consumer group name as defined in Azure Event Hubs. If the consumer group does not exist, the pipeline will fail. Consumer groups are limited to 5 concurrent readers. To avoid reaching this limit, create a new, dedicated consumer group for each pipeline.
  • starting_position: The position in the data stream where you want to start reading data. Set this argument to one of the following values:
    • LATEST: Start reading data from the latest position on the data stream.
    • EARLIEST: Start reading data from the very beginning of the data stream.

SPL2 Pipeline Builder example:

| from read_event_hubs("my-connection-id", "my-event-hub-name", "my-consumer-group", "latest") | ...;
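The body field arrives as bytes. If your Event Hubs payloads are JSON, one possible next step is to deserialize the body downstream, for example with the deserialize_json_object scalar function. This is a sketch; adjust the deserialization to your actual data format, and see Deserialize and send Azure Event Hubs data from a DSP pipeline for the full workflow.

| from read_event_hubs("my-connection-id", "my-event-hub-name", "my-consumer-group", "latest") | eval body=deserialize_json_object(body) | ...;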

Azure Monitor Metrics Connector

Get data from Microsoft Azure Monitor. You must create a connection to use this source function. See Use the Azure Monitor Metrics Connector with Splunk DSP.

API function name: read_from_azure_monitor_metrics
Function Output:
This function outputs data pipeline metric events in the schema shown here.
Arguments:

  • connection_id: The ID of your Azure Monitor Metrics connection.
  • initial_position (Optional): The position in the data stream where you want to start reading data. Specify one of the following values:
    • LATEST (Default): Start reading data from the latest position on the data stream.
    • TRIM_HORIZON: Start reading data from the very beginning of the data stream.

SPL2 Pipeline Builder example:

| from read_from_azure_monitor_metrics("my-connection-id", "TRIM_HORIZON") | ...;

Google Cloud Monitoring Metrics Connector

Get data from Google Cloud Monitoring. You must create a connection to use this source function. See Use the Google Cloud Monitoring Metrics Connector with Splunk DSP.

API function name: read_from_gcp_monitoring_metrics
Function Output:
This function outputs data pipeline metric events in the schema shown here.
Arguments:

  • connection_id: The ID of your Google Cloud Monitoring Metrics connection.
  • initial_position (Optional): The position in the data stream where you want to start reading data. Defaults to LATEST.
    • LATEST: Start reading data from the latest position on the data stream.
    • TRIM_HORIZON: Start reading data from the very beginning of the data stream.

SPL2 Pipeline Builder example:

| from read_from_gcp_monitoring_metrics("my-connection-id", "TRIM_HORIZON") | ...;

Microsoft 365 Connector

Get data from the Office 365 Management Activity API. You must create a connection to use this source function. See Use the Microsoft 365 Connector with Splunk DSP.

API function name: read_from_microsoft_365
Function Output:
This function outputs data pipeline events in the schema shown here.
Arguments:

  • connection_id: The ID of your Microsoft 365 connection.
  • initial_position (Optional): The position in the data stream where you want to start reading data. Defaults to LATEST.
    • LATEST: Start reading data from the latest position on the data stream.
    • TRIM_HORIZON: Start reading data from the very beginning of the data stream.

SPL2 Pipeline Builder example:

| from read_from_microsoft_365("my-connection-id", "TRIM_HORIZON") | ...;

Read from Amazon Kinesis Stream

Get data from AWS Kinesis using static credentials. You must create a connection to use this source function. See Kinesis Static Connector. To deserialize and preview your data, see Deserialize and preview data from Kinesis.

API function name: read_kinesis
Function Output:
This function outputs records with the following schema:

{
key: <key> as a string
value: <value> in bytes
stream: <stream> as a string 
shard: <shard> as a string
sequence: <sequence> as a string
approxArrivalTimestamp: <approxArrivalTimestamp> as a long
accountId: <accountId> as a string
region: <region> as a string
}

Arguments:

  • connection_id: The ID of your Amazon Kinesis connection.
  • stream_name: The name of the stream.
  • initial_position (Optional): The position in the data stream where you want to start reading data. Defaults to LATEST.
    • LATEST: Start reading data from the latest position on the data stream.
    • TRIM_HORIZON: Start reading data from the very beginning of the data stream.

SPL2 Pipeline Builder example:

| from read_kinesis("my-connection-id", "my-stream-name", "TRIM_HORIZON") | ...;
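The value field contains the record payload as bytes. Assuming your Kinesis records carry JSON, the following is a sketch of deserializing them in the pipeline; see Deserialize and preview data from Kinesis for details.

| from read_kinesis("my-connection-id", "my-stream-name", "TRIM_HORIZON") | eval value=deserialize_json_object(value) | ...;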

Read from Kafka

Get data from an Apache or Confluent Kafka topic using a Kafka connection. See Create a Kafka pipeline.

This function is available as two different connectors: one that connects to SSL-enabled servers, and another that connects to servers without SSL authentication. See Create a connection for the DSP Kafka SSL Connector and Create a connection for the DSP Apache Kafka Connector without authentication.

API function name: read_kafka
Function Output:
This function outputs records with the following schema:

{
key: <key> in bytes
value: <value> in bytes
topic: <topic> as a string
partition: <partition> as an integer
offset: <offset> as a long
}

Arguments:

  • connection_id: The ID of your Kafka connection.
  • topic: The name of the Kafka topic.
  • consumer_properties (Optional): The Kafka consumer properties that you want to apply when reading data from the topic. Specify each property using the format "<name>": "<value>", and separate properties with a comma (,). Enclose the entire argument in braces ({ }). Defaults to { }.

SPL2 Pipeline Builder example:

| from read_kafka("my-connection-id", "my-topic", {"property1": "value1", "property2": "value2"}) | ...;
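The key and value fields arrive as bytes. Assuming the topic carries JSON payloads, the following sketch deserializes the value field; the optional consumer_properties argument is omitted here, so it defaults to { }.

| from read_kafka("my-connection-id", "my-topic") | eval value=deserialize_json_object(value) | ...;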

Read from Splunk Firehose

Reads data sent from the Forwarder, Collect, Ingest, DSP HEC, and Syslog (through SC4S) API services. Events from this function have the data pipeline event schema or metrics schema.

API function name: read_splunk_firehose
Function Output:
This function outputs data pipeline events in the schema shown here for events or here for metrics.
Arguments:

  • initial_position (Optional): The position in the data stream where you want to start reading data. Defaults to LATEST.
    • LATEST: Start reading data from the latest position on the data stream.
    • TRIM_HORIZON: Start reading data from the very beginning of the data stream.

SPL2 Pipeline Builder example:

| from read_splunk_firehose("TRIM_HORIZON") | ...;
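Because read_splunk_firehose reads from all of the ingest services, you can narrow the stream with a downstream where function. For example, the following sketch keeps only events with a hypothetical source type of cisco:asa:

| from read_splunk_firehose() | where source_type="cisco:asa" | ...;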

Receive from Ingest REST API

Get data from the Ingest REST API. This source function filters the data stream so that only data sent through the Ingest REST API service is ingested. For information on how to send data to DSP using the Ingest REST API, see Format and send events to a DSP data pipeline using the Ingest REST API.

API function name: receive_from_ingest_rest_api
Function Output:
This function outputs data pipeline events in the schema shown here for events or here for metrics.

Arguments:

  • connection_id: Set this to rest_api:all.
  • initial_position (Optional): The position in the data stream where you want to start reading data. Defaults to LATEST.
    • LATEST: Start reading data from the latest position on the data stream.
    • TRIM_HORIZON: Start reading data from the very beginning of the data stream.

SPL2 Pipeline Builder example:

| from receive_from_ingest_rest_api("rest_api:all", "TRIM_HORIZON") | ...;

Receive from Splunk Forwarders

Get data from the Splunk Forwarders Service. This source function filters the data stream so that only data sent through the Splunk Forwarders Service is ingested. See also Create a Splunk Universal Forwarder pipeline.

API function name: receive_from_forwarders
Function Output:
This function outputs data pipeline events in the schema shown here for events or here for metrics.
Arguments:

  • connection_id: Set this to forwarders:all.

SPL2 Pipeline Builder example:

| from receive_from_forwarders("forwarders:all") | ...;