Splunk® Data Stream Processor

Tutorial

Getting Started

The Splunk Data Stream Processor (DSP) is a stream processing service that collects, transforms, and delivers data to Splunk software. If you are new to DSP, use this tutorial to get familiar with its capabilities.

In this tutorial, you will use the Splunk Cloud CLI to send your data to the Ingest service, transform it in a Data Stream Processor pipeline, and then send it to your preconfigured Splunk Enterprise index.

Before you begin

You'll need the following to complete this tutorial: access to the Data Stream Processor UI, the Splunk Cloud CLI (scloud), the tutorial data file (demodata.txt), and a preconfigured Splunk Enterprise instance to receive your data.

What's in the tutorial data

The tutorial data file (demodata.txt) contains purchase history for the fictitious Buttercup Games store.

The tutorial data looks like this:

A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,-73.966843,40.756741,4539385381557252
1E65B7E2D1297CF3B2CA87888C05FE43,F9ABCCCC4483152C248634ADE2435CF0,Game Card,16.5,2018-01-13 04:26:00,2018-01-13 04:46:00,-73.956451,40.771442
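
Each record is one comma-separated line. For orientation, here is a minimal Python sketch (not part of the tutorial steps; the field labels are informal names that mirror the fields you extract later in this tutorial) that labels the fields of the first record:

    # Illustrative only: label the comma-separated fields of one tutorial record.
    record = ("A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,"
              "Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,"
              "-73.966843,40.756741,4539385381557252")
    labels = ["transaction_id", "customer_id", "type", "amount",
              "start_time", "end_time", "longitude", "latitude", "credit_card"]
    for label, value in zip(labels, record.split(",")):
        print(f"{label}: {value}")
    # Some records, such as Game Card purchases, omit the trailing credit card number.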

Create a Data Stream Processor pipeline using the UI

First, use the DSP UI to create a Data Stream Processor pipeline that sends data to a preconfigured Splunk Enterprise instance.

  1. Click Build Pipeline, then select the Splunk Firehose to Splunk Index template.
  2. Select Standard Editor.
  3. Click on the ellipsis (...) in the top-right corner, and select Update Pipeline Metadata.
  4. Give your pipeline a name and a description and then click Update.
  5. Validate your pipeline.
  6. Save your pipeline.

You now have a basic pipeline that reads all incoming data and sends it to the main index of the preconfigured Splunk Enterprise instance.

Send data with the Ingest service using the Splunk Cloud CLI

Now that you have a pipeline to send data to, let's send some data to it!

  1. In the pipeline that you just saved in the previous step, click Start Preview to begin a preview session.
  2. Open a command prompt and navigate to a working directory.
  3. Log in to the Splunk Cloud CLI with ./scloud login. The password is the same one that you use to log in to the Data Stream Processor. If you need your credentials again, run sudo ./print-login to reprint your username and password.
    Your access token and other metadata are returned. The access token does expire, so you may need to log in periodically to refresh it. For one way to check whether your token is still valid, see the sketch after this procedure.
        Password:
        {
            "token_type": "Bearer",
            "access_token": "eyJraWQiOiJuRGNXNi1WWVJUZWh0QXdZbExwRTBZWm1wTlltMWo2a3JBeXlMSVpZT0pVIiwiYWxnIjoiUlMyNTYifQ.eyJ2ZXIiOjEsImp0aSI6IkFULkU0aXI5a1RuRmtsaGVjc1lBcHZzeHNzRmJvaVVOU0dPaU8xRGFVZldSOXcuaFJkNGFKd3RobWV5MXo5LzBuMGUxTG5SanBXZGdSd0I2OHhmMytqQVpFYz0iLCJpc3MiOiJodHRwczovL3NwbHVuay1jaWFtLm9rdGEuY29tL29hdXRoMi9hdXMxcmFyajZ0UVBKZkpsejJwNyIsImF1ZCI6ImFwaTovL3NjcC1kZWZhdWx0IiwiaWF0IjoxNTQ5MDY5MDY5LCJleHAiOjE1NDkxMTIyNjksImNpZCI6IjBvYTIzNDliMTVWYk1waFFvMnA3IiwidWlkIjoiMDB1MWluY25qb1hvWGExanMycDciLCJzY3AiOlsib2ZmbGluZV9hY2Nlc3MiLCJvcGVuaWQiLCJlbWFpbCIsInByb2ZpbGUiXSwic3ViIjoiYXBydW5lZGFAc3BsdW5rLmNvbSJ9.N4-ZTM_fhh0BLMh4EQs2UkuEub7OImZlYpPXMDv0E9PauYyE3eDSPmWa9eSeHEyCfI1RMb4RPhYvs5i7QMFHEgdUjegyP2qybFu3MNjSVuA6sTZNIjejyvgFTHD_Ifr9_o0ttCcp3kU5y664xlJUzxlqZDuBugXuErZaZ49r-y1AIipORvHR9VTdsIUSEVIyuD8FdMelVgXhz0zfW3leHq0QzavbUj5FOO8OPr0-rVX7Rur7YcGBTq2QQgJPHNLRjrN8lNpJMVGWRcTHgR4yihVH8SNEBErkeUyJdmg28EkoXeyp6lncpfjSADCghJet4Iu3vUgsMgqJeTCHQJIJZA",
            "expires_in": 43199,
            "scope": "openid",
            "id_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJSZWRhY3RlZCIsIm5hbWUiOiJSZWRhY3RlZCIsImlhdCI6MTUxNjIzOTAyMn0.IXPMwzXRZ1JSHwuS1CR5l7Z0JMWvQ6Dj0xe8Z6ZFxHs",
            "StartTime": 1571796311
        }
    
  4. Send the sample data to your data pipeline by running the following command.
    cat demodata.txt | while read -r line; do echo "$line" | ./scloud ingest post-events -host Buttercup -source syslog -sourcetype Unknown; done
    
  5. Navigate back to your saved pipeline and click the Preview Results tab. Verify that your events are flowing through your pipeline.
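
Your session eventually expires, so if scloud commands later in this tutorial start failing with authentication errors, log in again. As a rough illustration (assuming you saved the JSON login response shown in step 3 to a file named login.json, a name used only for this sketch), you can check whether the token is still valid:

    # Illustrative only: check whether the scloud access token has expired,
    # assuming the login response JSON was saved to login.json (hypothetical file).
    import json, time

    with open("login.json") as f:
        login = json.load(f)

    # Assumes StartTime is the Unix time the token was issued and expires_in is
    # its lifetime in seconds, as suggested by the response shown above.
    expires_at = login["StartTime"] + login["expires_in"]
    remaining = int(expires_at - time.time())
    print("Token expired; run ./scloud login again." if remaining <= 0
          else f"Token valid for roughly {remaining} more seconds.")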

You now have a basic pipeline that receives data from the Ingest service and sends it to your default Splunk Enterprise instance. In the following section, you'll apply transformations to that data before it gets indexed.
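
Each line that you sent is wrapped in an event before it reaches your pipeline. As a rough sketch (illustrative values only; the field names are the ones that the Eval expressions in the next section reference), one event read by Read from Splunk Firehose looks roughly like this:

    # Rough shape of one ingested event. Values are placeholders for illustration;
    # only the field names are taken from the Eval expressions used later.
    event = {
        "body": "A5D125F5550BE7822FC6EE156E37733A,...,4539385381557252",  # the raw line you sent
        "host": "Buttercup",           # from the -host flag
        "source": "syslog",            # from the -source flag
        "source_type": "Unknown",      # from the -sourcetype flag
        "timestamp": 1571796311000,    # placeholder event time
        "nanos": 0,                    # placeholder
        "id": "<generated event id>",  # placeholder
        "kind": "event",               # placeholder
        "attributes": {},              # populated by the extraction you add next
    }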

Transform your data

Now that you've verified that your pipeline is ingesting your data successfully, let's add transformations to your data before sending it to a Splunk index.

In this section, you reorder your top-level fields, extract interesting nested fields into top-level fields, and redact credit card information from the data before sending it to a Splunk index.

  1. Click the + icon between the Read from Splunk Firehose and Write to Index functions, and select Eval from the function picker.
  2. Enter the following Streams DSL in the text box. This Streams DSL updates the order of your top-level fields, and changes your sourcetype from Unknown to purchases.
    as(get("body"), "body");
    as(get("host"), "host");
    as(get("id"), "id");
    as(get("kind"), "kind");
    as(get("nanos"), "nanos");
    as(get("source"), "source");
    as(get("timestamp"), "timestamp");
    as(literal("purchases"), "source_type");
    
  3. (Optional) Validate your pipeline to check that all functions in your pipeline are configured properly.
  4. (Optional) To see how this function transforms your data, send your sample data to your pipeline once again. Click Start Preview to restart your preview session, wait a few seconds, and then re-enter the Splunk Cloud CLI command. Because your pipeline isn't activated, re-sending your sample data will not result in data duplication.
    cat demodata.txt | while read -r line; do echo "$line" | ./scloud ingest post-events -host Buttercup -source syslog -sourcetype Unknown; done
    
  5. If you preview the data, you'll notice several interesting fields nested in the body field, including the type of card used, the purchase amount, and the sale date, among others. The following DSL extracts some of those nested fields and populates the attributes field with them. Enter the following Streams DSL in the same Eval function as in step 2. This DSL takes the contents of body and uses the extract_regex function to extract key-value pairs with a regular expression. Because extract_regex operates on strings, you also need the cast scalar function to cast the contents of body to a string; the extracted key-value pairs are returned as a map and assigned to the attributes field. (To try this pattern outside of DSP, see the Python sketch after this procedure.)
    as(extract_regex(cast(get("body"), "string"), /(?<tid>[A-Z0-9]+?),(?<cid>[A-Z0-9]+?),(?<Type>[\w]+\s\w+),(?<Amount>[\S]+),(?<sdate>[\S]+)\s(?<stime>[\S]+),(?<edate>[\S]+)\s(?<etime>[\S]+?),(?<Longitude>[\S]+?),(?<Latitude>[\S]+?),(?<Card>[\d]*)/), "attributes");
    
  6. Now that you've extracted some of your nested fields into attributes, take it one step further and promote these attributes to top-level fields in your data. Click the + icon between the Eval and Write to Index functions and add a new Eval function.
  7. Enter the following Streams DSL in the new Eval function. This Streams DSL turns the key-value pairs in the attributes field into top-level fields so that you can easily see the fields that you've extracted.
    as(map-get(get("attributes"), "tid"), "Transaction_ID");
    as(map-get(get("attributes"), "cid"), "Customer_ID");
    as(map-get(get("attributes"), "Type"), "Type");
    as(map-get(get("attributes"), "Amount"), "Amount");
    as(map-get(get("attributes"), "sdate"), "Start_Date");
    as(map-get(get("attributes"), "stime"), "Start_Time");
    as(map-get(get("attributes"), "edate"), "End_Date");
    as(map-get(get("attributes"), "etime"), "End_Time");
    as(map-get(get("attributes"), "Longitude"), "Longitude");
    as(map-get(get("attributes"), "Latitude"), "Latitude");
    as(map-get(get("attributes"), "Card"), "Credit_Card");
    
  8. Notice that your data contains the credit card number used to make a purchase. You can redact that information before sending it to your index. Add a new Eval function, and enter the following Streams DSL to redact any credit card information from your data. This Streams DSL uses a regular expression to detect credit card numbers and replaces any numbers found with <redacted>. (This pattern is also included in the Python sketch after this procedure.)
    as(replace(cast(get("Credit_Card"), "string"),/\b\d{15,16}\b/, "<redacted>"), "Credit_Card");
    
  9. Because the original body of your event had sensitive credit card information, remove the body field so that the original event does not get indexed. Click on the + icon between the Eval and Write to Index functions and add a new Normalize function.
  10. Find the body field under the Mapping columns, and click Delete to remove it from your output so that the sensitive credit card information in the original event is not indexed.
  11. Now that you've constructed a full pipeline, preview your data again to see what your data looks like at this point. Click Start Preview to restart your preview session, wait a few seconds, and then re-enter the command. Because your pipeline isn't activated yet, re-sending your sample data will not result in data duplication.
    cat demodata.txt | while read -r line; do echo "$line" | ./scloud ingest post-events -host Buttercup -source syslog -sourcetype Unknown; done
    
  12. Click the ellipsis (...) in the top-right corner, and then select Save As.
  13. Give your pipeline a name and a description, and then save it.
  14. Activate your pipeline. Do not check Skip Restore State or Allow Non Restored State. Neither of these options is valid when you activate your pipeline for the first time. Activating your pipeline also checks that your pipeline is valid. If you are unable to activate your pipeline, check whether you've configured your functions correctly.
  15. Wait a few seconds after activating your pipeline, and then send your Buttercup data to your activated pipeline.
    cat demodata.txt | while read -r line; do echo "$line" | ./scloud ingest post-events -host Buttercup -source syslog -sourcetype Unknown; done
    
  16. Open your Splunk Enterprise instance and search for your data.

    index="main"

 

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.0.0

