Connecting Amazon Kinesis Data Streams to your DSP pipeline as a data source

When creating a data pipeline in the Splunk Data Stream Processor (DSP), you can connect to Amazon Kinesis Data Streams and use it as a data source. You can get data from a Kinesis data stream into a pipeline, transform the data as needed, and then send the transformed data from the pipeline to a destination of your choosing.

To connect to Kinesis as a data source, you must complete the following tasks:

Create a connection that allows DSP to access your Kinesis data. See Create a DSP connection to Amazon Kinesis Data Streams.
Create a pipeline that starts with the Amazon Kinesis Data Stream source function. See the Building a pipeline chapter in the Use the Data Stream Processor manual for instructions on how to build a data pipeline.
Configure the Amazon Kinesis Data Stream source function to use your Kinesis connection. See Get data from Amazon Kinesis Data Stream in the Function Reference manual.
(Optional) To verify that you've configured the connection and source function correctly, start a pipeline preview and confirm that your Kinesis data appears in the Preview Results pane as expected. Notice that the payloads of your Kinesis records are stored in a field named value, which is a bytes field.

Amazon Kinesis Data Streams always encodes data using Base64 before transporting it. DSP automatically decodes the incoming data from Kinesis, so you don't need to include pipeline functions for decoding the data.
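To illustrate why no decoding step is needed in the pipeline, the following Python sketch mimics the Base64 round trip described above. It is an analogy only, not DSP or Kinesis code: the function names encode_for_transport and decode_on_arrival and the sample payload are hypothetical, chosen for illustration.

```python
import base64


def encode_for_transport(payload: bytes) -> str:
    """Base64-encode a raw payload, standing in for the encoding applied in transit."""
    return base64.b64encode(payload).decode("ascii")


def decode_on_arrival(encoded: str) -> bytes:
    """Reverse the Base64 encoding, standing in for the automatic decoding on ingest."""
    return base64.b64decode(encoded)


# Hypothetical record payload, as raw bytes.
raw = b'{"level": "info", "msg": "started"}'

wire = encode_for_transport(raw)    # Base64 text form used in transit
restored = decode_on_arrival(wire)  # original bytes, as they would appear in the value field

print(restored == raw)
```

Because the decoding is already done for you, the value field in the pipeline holds the original payload bytes rather than Base64 text.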

(Optional) Convert the value field from bytes to a more commonly supported data type such as string. This conversion makes the field compatible with a wider range of streaming functions. To convert your data, start by adding an Eval function to your pipeline. Place this function either immediately after the Amazon Kinesis Data Stream source function or after the Where function if you're using one to filter the incoming data. Then, configure the Eval function to use the appropriate conversion scalar function.
The specific scalar function that you need to use varies depending on the format of your Kinesis payload. In most cases, you can use one of the following expressions in your Eval function:
- To convert the Kinesis payload from bytes to a string: value=tostring(value)
- To convert the Kinesis payload from bytes to a map of key-value pairs: value=deserialize_json_object(value)
See Eval and Conversion in the Function Reference manual for more information about these functions.
If you're planning to send the Kinesis data to a Splunk index, make sure to format the records so that they can be indexed meaningfully. See Formatting data from Amazon Kinesis Data Streams for indexing in the Splunk platform.
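The difference between the two conversions above can be sketched in Python. This is a rough analogy under stated assumptions, not the DSP implementation of tostring or deserialize_json_object: it assumes the payload is UTF-8 encoded, and the helper names bytes_to_string and bytes_to_map and the sample record are hypothetical.

```python
import json


def bytes_to_string(value: bytes) -> str:
    """Rough analogue of a bytes-to-string conversion; assumes UTF-8 text."""
    return value.decode("utf-8")


def bytes_to_map(value: bytes) -> dict:
    """Rough analogue of a bytes-to-map conversion; assumes a UTF-8 JSON object."""
    parsed = json.loads(value.decode("utf-8"))
    if not isinstance(parsed, dict):
        # A JSON array or scalar is not a map of key-value pairs.
        raise ValueError("payload is not a JSON object")
    return parsed


# Hypothetical record with its payload stored as bytes in a field named value.
record = {"value": b'{"host": "web-01", "status": 200}'}

as_string = bytes_to_string(record["value"])  # the JSON text as one string
as_map = bytes_to_map(record["value"])        # individual keys become addressable

print(as_string)
print(as_map["status"])
```

A string conversion suits plain-text payloads, while a map conversion suits JSON payloads whose individual keys you want to reference in later functions.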

When you activate the pipeline, the Amazon Kinesis Data Stream source function starts collecting data from Kinesis.
