Splunk® Data Stream Processor

Splunk Data Stream Processor Tutorial



DSP 1.2.0 is impacted by the CVE-2021-44228 and CVE-2021-45046 security vulnerabilities from Apache Log4j. To fix these vulnerabilities, you must upgrade to DSP 1.2.4. See Upgrade the Splunk Data Stream Processor to 1.2.4 for upgrade instructions.

On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Tutorial

The Splunk Data Stream Processor (DSP) is a stream processing service that processes data as it moves from a source to a destination through a data pipeline. If you are new to DSP, use this tutorial to get familiar with the capabilities of DSP.

In this tutorial, you will do the following:

  • Create a data pipeline in the DSP user interface (UI).
  • Send data to your pipeline using the Ingest service. To do that, you will use the Splunk Cloud Services CLI (command-line interface) tool to make API calls to the Ingest service.
  • Transform the data in your pipeline by extracting interesting data points and redacting confidential information.
  • Send the transformed data to a Splunk Enterprise index.

Before you begin

To complete this tutorial, you need to have the following:

  • A licensed instance of the Splunk Data Stream Processor (DSP). See the DSP installation documentation.
  • The URL for accessing the user interface (UI) of your DSP instance. The URL is https://<IP_Address>:30000, where <IP_Address> is the IP address associated with your DSP instance. You can also confirm the URL by opening a terminal window in the extracted DSP installer directory and running the sudo ./print-login command. The hostname field shows the URL.
  • Credentials for logging in to your DSP instance. At the end of the DSP installation process, the installer shows the credentials for your DSP administrator account. If you need to retrieve that information again, open a terminal window in the extracted DSP installer directory and run the sudo ./print-login command.
  • The Splunk Cloud Services CLI tool, configured to send data to the DSP instance. See Get started with the Splunk Cloud Services CLI for more details. This tutorial uses Splunk Cloud Services CLI 4.0.0, which is the version used when you run the base ./scloud command in DSP 1.2.0.
  • The demodata.txt tutorial data. The /examples folder in the extracted DSP installer directory contains a copy of this file. You can also download a copy of the file here.
  • Access to a Splunk Enterprise instance that has the HTTP Event Collector (HEC) enabled. You'll also need to know the HEC endpoint URL and HEC token associated with the instance. Ask your Splunk administrator for assistance, and see Set up and use HTTP Event Collector in Splunk Web in the Splunk Enterprise Getting Data In manual.
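
For example, if you need to recover both the UI URL and the administrator credentials, you can run the following commands on the machine where you extracted the DSP installer. The directory path shown here is a placeholder; substitute the path to your own extracted installer directory.

    # Change to the directory where you extracted the DSP installer
    # (replace this placeholder path with your own).
    cd /path/to/extracted-dsp-installer-directory

    # Print the DSP login details, including the hostname (UI URL)
    # and the administrator credentials.
    sudo ./print-login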

What's in the tutorial data

The demodata.txt tutorial data contains purchase history for the fictitious Buttercup Games store.

The tutorial data looks like this:

A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,-73.966843,40.756741,4539385381557252
1E65B7E2D1297CF3B2CA87888C05FE43,F9ABCCCC4483152C248634ADE2435CF0,Game Card,16.5,2018-01-13 04:26:00,2018-01-13 04:46:00,-73.956451,40.771442
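
Each line is one comma-separated purchase record. As a reference point, here is the first sample record broken out column by column. The descriptions are inferred from the sample values, and the names in parentheses match the regular expression capture groups used in the field extraction step later in this tutorial:

    A5D125F5550BE7822FC6EE156E37733A    transaction ID (tid)
    08DB3F9FCF01530D6F7E70EB88C3AE5B    customer ID (cid)
    Credit Card                         payment type (Type)
    14                                  purchase amount (Amount)
    2018-01-13 04:37:00                 start date and time (sdate, stime)
    2018-01-13 04:47:00                 end date and time (edate, etime)
    -73.966843                          longitude (Longitude)
    40.756741                           latitude (Latitude)
    4539385381557252                    credit card number (Card); not present in every record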

Log in to the DSP UI

Start by navigating to the DSP UI and logging in to it.

  1. In a browser, navigate to the URL for the DSP UI.
  2. Log in with your DSP username and password.
  3. Click Use this tenant beside the name of your tenant.

The DSP UI shows the home page.

Create a pipeline using the Canvas Builder in the DSP UI

Create a pipeline that receives data from the Ingest service and sends the data to your default Splunk Enterprise instance.

In DSP, you can choose to create pipelines using either the Canvas Builder or the SPL2 (Search Processing Language 2) Builder. The Canvas Builder provides graphical user interface elements for building a pipeline, while the SPL2 Builder accepts SPL2 statements that define the functions and configurations in a pipeline.

For this tutorial, use the Canvas Builder to create and configure your pipeline.
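
Although this tutorial uses the Canvas Builder, it can help to know what the SPL2 Builder expects. As a rough, illustrative sketch only: the Splunk DSP Firehose source and the first Eval transformation that you add later in this tutorial would look something like the following in SPL2. The splunk_firehose() function name is an assumption here; verify it against the DSP Function Reference for your release, and note that a complete pipeline statement must also end with a sink function, which in this tutorial corresponds to the Send to a Splunk Index with Batching function.

    | from splunk_firehose()
    | eval source_type="purchases"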

  1. Click Build Pipeline, then select the Splunk DSP Firehose data source.
  2. Confirm that the builder toggle switch is set to Canvas.
  3. On the pipeline canvas, click the + icon beside the Splunk DSP Firehose function, and select Send to a Splunk Index with Batching from the function picker.
  4. Create a connection to your Splunk Enterprise instance so that the Send to a Splunk Index with Batching function can send data to an index on that instance. On the View Configurations tab, from the connection_id drop-down list, select Create New Connection.
    1. In the Connection Name field, enter a name for your connection.
    2. In the Splunk HEC endpoint URLs field, enter the HEC endpoint URL associated with your Splunk Enterprise instance.
    3. In the Splunk HEC endpoint token field, enter the HEC endpoint token associated with your Splunk Enterprise instance.
    4. Click Save.
  5. Back on the View Configurations tab, finish configuring the Send to a Splunk Index with Batching function.
    1. In the index field, enter null.
    2. In the default_index field, enter "main" (including the quotation marks).
  6. Click the More Options button located beside the Activate Pipeline button, and select Validate to check that all functions in your pipeline are configured correctly.
  7. To save your pipeline, click Save.
  8. Give your pipeline a name and a description, and make sure that the Save As drop-down list is set to Pipeline. Then, click Save again.

You now have a basic pipeline that can read data from Splunk DSP API services such as the Ingest service and then send the data to an index on your Splunk Enterprise instance.

Send data to your pipeline using the Ingest service

Now that you have a pipeline to send data to, let's send some data to it!

Use the Ingest service to send data from the demodata.txt file into your pipeline. To achieve this, you use the Splunk Cloud Services CLI to make API calls to the Ingest service.

Because this pipeline is not activated yet, it is not actually sending data through to your Splunk Enterprise instance at this time. You'll activate the pipeline and send the data to its destination later in this tutorial.

  1. Open a terminal window and navigate to the extracted DSP installer directory. This directory is also the working directory for the Splunk Cloud Services CLI tool.
  2. Log in to the Splunk Cloud Services CLI using the following command. When prompted to provide your username and password, use the credentials from your DSP account.
    ./scloud login
    

    The Splunk Cloud Services CLI doesn't return your login metadata or access token. If you want to see your access token, you must log in using the verbose flag: ./scloud login --verbose.

  3. In the DSP UI, with your pipeline open in the Canvas Builder, select the Splunk DSP Firehose function and then click Start Preview to begin a preview session.
  4. In the terminal window, send the sample data to your pipeline by running the following command.
    cat examples/demodata.txt | while read line; do echo $line | ./scloud ingest post-events --host Buttercup --source syslog --sourcetype Unknown; done
    

    It can take up to a minute to send the entire file.

  5. In the DSP UI, view the Preview Results tab. Confirm that your data events are flowing through the pipeline and appearing in the preview.

You've now confirmed that your pipeline is ingesting data successfully.
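
If you want to spot-check your Splunk Cloud Services CLI setup in a later session without replaying the entire file, you can send a single test event using the same flags that the tutorial commands use. The event text shown here is arbitrary; any string works as the event body.

    echo "single test event from the DSP tutorial" | ./scloud ingest post-events --host Buttercup --source syslog --sourcetype Unknown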

Transform your data

Let's add transformations to your data before sending it to a Splunk Enterprise index.

In this section, you change the value of a field to provide more meaningful information, extract interesting nested fields into top-level fields, and redact credit card information from the data before sending it to a Splunk Enterprise index.

  1. Change the source_type of your data events from Unknown to purchases.
    1. In the DSP UI, on the pipeline canvas, click the + icon between the Splunk DSP Firehose and Send to a Splunk Index with Batching functions, and select Eval from the function picker.
    2. On the View Configurations tab, enter the following SPL2 expression in the function field:
      source_type="purchases"
      
    3. Click the More Options button located beside the Activate Pipeline button, and select Validate to check that all functions in your pipeline are configured properly.
  2. To see how this function transforms your data, send your sample data to your pipeline again and preview the data that comes out from the Eval function.
    1. On the pipeline canvas, select the Eval function in your pipeline and then click Start Preview to restart your preview session.
    2. Wait a few seconds before running the following command in the Splunk Cloud Services CLI. Because your pipeline isn't activated, resending your sample data doesn't result in data duplication.
      cat examples/demodata.txt | while read line; do echo $line | ./scloud ingest post-events --host Buttercup --source syslog --sourcetype Unknown; done
      
    3. Back in DSP, view the data on the Preview Results tab and confirm that the value of the source_type field is now purchases.
  3. In the data preview, you'll notice there are several interesting fields in the body field, including the type of card used, the purchase amount, and the sale date. Extract some of these nested fields into the attributes field.
    1. On the pipeline canvas, select the Eval function.
    2. On the View Configurations tab, click +Add and then enter the following SPL2 expression in the newly added field. This SPL2 expression casts the body data to the string data type, extracts key-value pairs from those strings using regular expressions, and then inserts the extracted key-value pairs into the attributes field. The body data must be cast to a different data type first because by default it is considered to be a union of all DSP data types, and the extract_regex function only accepts string data as its input.
      attributes=extract_regex(cast(body, "string"), /(?<tid>[A-Z0-9]+?),(?<cid>[A-Z0-9]+?),(?<Type>[\w]+\s\w+),(?<Amount>[\S]+),(?<sdate>[\S]+)\s(?<stime>[\S]+),(?<edate>[\S]+)\s(?<etime>[\S]+?),(?<Longitude>[\S]+?),(?<Latitude>[\S]+?),(?<Card>[\d]*)/)
      
  4. Now that you've extracted some of your nested fields into the attributes field, take it one step further and promote these attributes as top-level fields in your data.
    1. On the pipeline canvas, click the + icon between the Eval and Send to a Splunk Index with Batching functions, and then select Eval from the function picker.
    2. On the View Configurations tab, enter the following SPL2 expression in the function field for the newly added Eval function. These expressions turn the key-value pairs in the attributes field into top-level fields so that you can easily see the fields that you've extracted.
      Transaction_ID=map_get(attributes, "tid"),
      Customer_ID=map_get(attributes, "cid"),
      Type=map_get(attributes, "Type"),
      Amount=map_get(attributes, "Amount"),
      Start_Date=map_get(attributes, "sdate"),
      Start_Time=map_get(attributes, "stime"),
      End_Date=map_get(attributes, "edate"),
      End_Time=map_get(attributes, "etime"),
      Longitude=map_get(attributes, "Longitude"),
      Latitude=map_get(attributes, "Latitude"),
      Credit_Card=map_get(attributes, "Card")
      
  5. Notice that your data contains the credit card number used to make a purchase. Redact that information before sending it to your index.
    1. On the pipeline canvas, click the + icon between the second Eval function and the Send to a Splunk Index with Batching function, and then select Eval from the function picker.
    2. On the View Configurations tab, enter the following SPL2 expression in the function field for the newly added Eval function. This SPL2 expression uses a regular expression pattern to detect credit card numbers and replace the numbers with <redacted>.
      Credit_Card=replace(cast(Credit_Card, "string"), /\d{15,16}/, "<redacted>")
      
  6. The original body of the event also contains the sensitive credit card information, so let's redact that information from the body field as well.
    1. On the pipeline canvas, select the Eval function that you added during the previous step.
    2. On the View Configurations tab, click +Add and then enter the following SPL2 expression in the newly added field. This SPL2 expression casts the body data to the string data type, detects credit card numbers in those strings using a regular expression pattern, and then replaces the numbers with <redacted>. The body data must be cast to a different data type first because by default it is considered to be a union of all DSP data types, and the replace function only accepts string data as its input.
      body=replace(cast(body,"string"),/[1-5][0-9]{15}/,"<redacted>")
      
  7. The attributes field also contains sensitive credit card information, so let's also redact the information there.
    1. On the pipeline canvas, select the Eval function that you modified during the previous step.
    2. On the View Configurations tab, click +Add and then enter the following SPL2 expression in the newly added field. This SPL2 expression does the following:
      • Gets the Card value from the attributes map, and then casts that value to the string data type.
      • Detects credit card numbers in the Card value using a regular expression pattern, and then replaces the numbers with <redacted>.
      • Converts the Card value from the string data type back to the map data type.

      The Card value must be converted between the string and map data types because attributes contains map data, and the replace function only accepts string data as its input.

      attributes=map_set(attributes, "Card", replace(ucast(map_get(attributes, "Card"), "string", null), /[1-5][0-9]{15}/, "<redacted>"))
      
  8. Now that you've constructed a full pipeline, preview your data again to see what your transformed data looks like.
    1. Click the More Options button located beside the Activate Pipeline button, and select Validate to check that all functions in your pipeline are configured properly.
    2. Select the last Eval function in the pipeline, and then click Start Preview to restart your preview session.
    3. Wait a few seconds, and then run this command again in the Splunk Cloud Services CLI. Because your pipeline isn't activated yet, resending your sample data will not result in data duplication.
      cat examples/demodata.txt | while read line; do echo $line | ./scloud ingest post-events --host Buttercup --source syslog --sourcetype Unknown; done
      
    4. Back in DSP, view the data on the Preview Results tab and confirm the following:
      • The value in the source_type field is purchases.
      • The attributes field contains information such as the type of card used, the purchase amount, and the sale date.
      • The body and attributes fields do not show any credit card numbers.
  9. To save these transformations to your pipeline, click Save.

After following these steps, you have a pipeline that starts with the Splunk DSP Firehose source function, runs your data through the three Eval functions that you added, and ends with the Send to a Splunk Index with Batching function.
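
As a point of reference, after these transformations the first sample event should look roughly like the following on the Preview Results tab. The values are derived from the sample data and the extraction expressions above; the exact preview layout differs.

    source_type    = purchases
    body           = A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,-73.966843,40.756741,<redacted>
    Transaction_ID = A5D125F5550BE7822FC6EE156E37733A
    Customer_ID    = 08DB3F9FCF01530D6F7E70EB88C3AE5B
    Type           = Credit Card
    Amount         = 14
    Start_Date     = 2018-01-13
    Start_Time     = 04:37:00
    End_Date       = 2018-01-13
    End_Time       = 04:47:00
    Longitude      = -73.966843
    Latitude       = 40.756741
    Credit_Card    = <redacted>
    attributes     = { ..., Card: <redacted>, ... }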

Send your transformed data to Splunk Enterprise

Now that you have a completed pipeline that extracts useful information and redacts sensitive information, activate the pipeline and send the transformed data to your default Splunk Enterprise instance.

  1. In DSP, click Activate Pipeline > Activate. Do not check Skip Restore State or Allow Non-Restored State. Neither of these options is valid when you activate your pipeline for the first time. Activating your pipeline also checks that your pipeline is valid. If you are unable to activate your pipeline, check whether you've configured your functions correctly.
  2. Wait a few seconds after activating your pipeline, and then send the sample data to your activated pipeline by running the following command in the Splunk Cloud Services CLI.
    cat examples/demodata.txt | while read line; do echo $line | ./scloud ingest post-events --host Buttercup --source syslog --sourcetype Unknown; done
    
  3. To confirm that DSP is sending your transformed data to your Splunk Enterprise instance, open the Search & Reporting app in your Splunk Enterprise instance and search for your data. Use the following search criteria:

    index="main" host="Buttercup" | table *

You've now successfully sent transformed data to Splunk Enterprise through DSP!

Additional resources

For detailed information about the DSP features and workflows that you used in this tutorial, see the rest of the Splunk Data Stream Processor documentation.
