Create a pipeline with multiple data sources

When creating a data pipeline, you can connect multiple data sources to the pipeline. For example, you can create a single pipeline that gets data from a Splunk forwarder, an Amazon S3 bucket, and Microsoft Azure Event Hubs concurrently. You can apply transformations to the data from all three data sources as the data passes through the pipeline, and then send the transformed data from the pipeline to a destination of your choosing.

In most cases, you can create a pipeline with multiple data sources by using the Splunk DSP Firehose source function. See the Data sources supported by Splunk DSP Firehose topic in the Connect to Data Sources and Destinations with DSP manual.

However, if you want to use multiple data sources that are not supported by the Splunk DSP Firehose function, or if you want to apply specific transformations to the data streams before combining them, do the following tasks:

1. From the Build Pipeline page, select a data source.
2. (Optional) From the Canvas view of your pipeline, click the + icon and add any desired transformation functions to the pipeline.
3. After you have added all the desired transformation functions to your pipeline, click the + icon and add a Union function to your pipeline.
4. Click the + icon on the immediate left of the Union function, and then add a second source function to your pipeline. Repeat this step to union more data sources, if desired.
5. (Optional) All of your data streams must have the same schema before they can be unioned. If your data streams don't have the same schema, use the select streaming function to match your schemas.
6. After unioning your data streams, continue building your pipeline by clicking the + icon to the immediate right of the Union function.
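
The schema requirement in the steps above can be illustrated outside of the product. The following is a minimal Python sketch, not DSP code: the `select` and `union` helpers, the record layout, and the sample values are all hypothetical, and the check compares field names only, whereas a real schema also includes field types.

```python
from typing import Dict, Iterable, List, Sequence

Record = Dict[str, object]


def select(records: Iterable[Record], fields: Sequence[str]) -> List[Record]:
    """Project every record onto the same set of fields (hypothetical helper).

    Fields that a record does not have are filled with None.
    """
    return [{name: rec.get(name) for name in fields} for rec in records]


def union(*streams: List[Record]) -> List[Record]:
    """Concatenate streams, refusing to merge if their field names differ."""
    schemas = {tuple(sorted(rec)) for stream in streams for rec in stream}
    if len(schemas) > 1:
        raise ValueError(f"schemas differ: {sorted(schemas)}")
    return [rec for stream in streams for rec in stream]


# Two hypothetical streams whose field names do not match.
stream_a = [{"body": "login ok", "host": "web-1", "source_type": "syslog"}]
stream_b = [{"body": "disk full", "host": "db-1"}]

# Unioning them directly fails because the schemas differ.
try:
    union(stream_a, stream_b)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

# Selecting the same fields from both streams makes the union possible.
common = ["body", "host"]
merged = union(select(stream_a, common), select(stream_b, common))
```

In this sketch, the projection plays the role that the select streaming function plays in the pipeline: it is applied to each stream before the streams reach the union.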

Create a pipeline with two data sources: Kafka and Splunk DSP Firehose

In this example, create a pipeline with two data sources, Kafka and Splunk DSP Firehose, and union the two data streams by normalizing them to fit the expected Kafka schema.

The following screenshot shows two data streams from two different data sources being unioned together into one data stream in a pipeline.

Prerequisites

A Kafka connection

Steps

1. From the Build Pipeline page, select the Splunk DSP Firehose data source.
2. From the Canvas view of your pipeline, add a Union function to your pipeline.
3. Click the + icon on the immediate left of the Union function, and then select the Kafka source function.
4. With the Kafka source function selected, on the View Configurations tab, provide your connection ID and topic name.
5. Normalize the schemas to match. Hover over the circle between the Splunk DSP Firehose and Union functions, click the + icon, and add an Eval function.
6. In the Eval function, type the following SPL2. This SPL2 adds the value, topic, and key fields that the default Kafka schema expects.
   ```
   value=to_bytes(cast(body, "string")),
   topic=source_type,
   key=to_bytes(time())
   ```
7. Hover over the circle between the Eval and Union functions, click the + icon, and add a Fields function.
8. To modify the records from the Splunk DSP Firehose so that the schema matches the Kafka record schema, drop all the fields from these records except for the value, topic, and key fields. In the fields_list parameter of the Fields function, do the following:
   1. Type value.
   2. Click + Add, and then type topic.
   3. Click + Add, and then type key.
9. Normalize the other data stream. Hover over the circle between the Kafka and Union functions, click the + icon, and add another Fields function.
10. In the fields_list parameter of this Fields function, do the following:
    1. Type value.
    2. Click + Add, and then type topic.
    3. Click + Add, and then type key.
11. Validate your pipeline.

You now have a pipeline that reads from two data sources, Kafka and Splunk DSP Firehose, and merges the data from both sources into one data stream.
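
To make the effect of the Eval and Fields steps concrete, the following Python sketch mimics them on in-memory records. It is an illustration only and does not use any DSP API: the helper names, the sample records, and their extra fields (such as host, partition, and offset) are hypothetical, and the byte encoding chosen for the key is an assumption that may differ from what `to_bytes(time())` produces in the product.

```python
from typing import Dict, Sequence

KAFKA_FIELDS = ("value", "topic", "key")


def eval_to_kafka(record: Dict[str, object], now_ms: int) -> Dict[str, object]:
    """Mimic the Eval step: add value, topic, and key next to the existing fields."""
    out = dict(record)
    out["value"] = str(record["body"]).encode("utf-8")  # value=to_bytes(cast(body, "string"))
    out["topic"] = record["source_type"]                # topic=source_type
    out["key"] = str(now_ms).encode("utf-8")            # key=to_bytes(time()); encoding is assumed
    return out


def keep_fields(record: Dict[str, object],
                fields: Sequence[str] = KAFKA_FIELDS) -> Dict[str, object]:
    """Mimic the Fields step: drop everything except the listed fields."""
    return {name: record[name] for name in fields}


# Hypothetical sample records, one per source.
firehose_event = {"body": "login ok", "source_type": "syslog", "host": "web-1"}
kafka_record = {"value": b"payment received", "topic": "orders", "key": b"42",
                "partition": 0, "offset": 17}

# Eval then Fields on the Firehose branch; Fields only on the Kafka branch.
normalized_firehose = keep_fields(eval_to_kafka(firehose_event, now_ms=1700000000000))
normalized_kafka = keep_fields(kafka_record)

# Both branches now carry the same three fields, so they can be merged.
merged = [normalized_firehose, normalized_kafka]
```

The current time is passed in as `now_ms` rather than read from the clock so that the sketch gives the same result on every run.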
