
Getting Started
The Splunk Data Stream Processor (DSP) is the primary interface for ingesting data into Splunk software. If you are new to DSP, use this tutorial to get familiar with its capabilities.
In this tutorial, you will use the Splunk Cloud CLI to ingest your data with the Ingest service, send it to the Data Stream Processor for transformation, and then send it to your preconfigured Splunk Enterprise index.
Before you begin
You'll need the following to complete this tutorial.
- A licensed instance of the Splunk Data Stream Processor.
- The SCloud tool, preconfigured to send data to an on-premises DSP instance.
- The tutorial data.
- A default Splunk Enterprise environment set up by your Splunk administrator.
What's in the tutorial data
This tutorial data contains purchase history for the fictitious Buttercup Games store.
The tutorial data looks like this:
A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,-73.966843,40.756741,4539385381557252
1E65B7E2D1297CF3B2CA87888C05FE43,F9ABCCCC4483152C248634ADE2435CF0,Game Card,16.5,2018-01-13 04:26:00,2018-01-13 04:46:00,-73.956451,40.771442
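Each line is one comma-separated purchase record. The sample above carries no header row; the column meanings used below (transaction ID, customer ID, card type, amount, start and end timestamps, longitude, latitude, and card number) are inferred from the field-extraction step later in this tutorial. Here is a minimal Python sketch that labels one sample record, purely to make the layout explicit:

# Label the columns of one tutorial record. The names are taken from the
# capture groups and field names used later in this tutorial; they are not
# part of the raw data itself.
COLUMNS = [
    "Transaction_ID", "Customer_ID", "Type", "Amount",
    "Start_Timestamp", "End_Timestamp", "Longitude", "Latitude", "Credit_Card",
]

sample = ("A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,"
          "Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,"
          "-73.966843,40.756741,4539385381557252")

# The values contain no embedded commas, so a plain split is enough here.
for name, value in zip(COLUMNS, sample.split(",")):
    print(f"{name}: {value}")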
Create a Data Stream Processor pipeline using the UI
First, create a Data Stream Processor pipeline using the DSP UI that sends data to a preconfigured Splunk Enterprise instance.
- Click Build Pipeline, then select the Splunk Firehose to Splunk Index template.
- Select Standard Editor.
- Click on the ellipsis (...) in the top-right corner, and select Update Pipeline Metadata.
- Give your pipeline a name and a description and then click Update.
- Validate your pipeline.
- Save your pipeline.
You now have a basic pipeline that reads all data and sends your data to the preconfigured Splunk Enterprise instance's main index.
Send data with the Ingest Service using Splunk Cloud CLI
Now that you have a pipeline to send data to, let's send some data to it!
- In the pipeline that you just saved in the previous step, click Start Preview to begin a preview session.
- Open a command prompt and navigate to a working directory.
- Log in to the Splunk Cloud CLI with the following command. The password is the same one that you use to log in to the Data Stream Processor. Run sudo ./print-login if you need to re-print your username and password.
./scloud login
- Your access token and other metadata are returned. Your access token does expire, so you may need to log in periodically to refresh it. The output looks like the following (a sketch for checking when the token expires appears after these steps):
Password:
{
    "token_type": "Bearer",
    "access_token": "eyJraWQiOiJuRGNXNi1WWVJUZWh0QXdZbExwRTBZWm1wTlltMWo2a3JBeXlMSVpZT0pVIiwiYWxnIjoiUlMyNTYifQ.eyJ2ZXIiOjEsImp0aSI6IkFULkU0aXI5a1RuRmtsaGVjc1lBcHZzeHNzRmJvaVVOU0dPaU8xRGFVZldSOXcuaFJkNGFKd3RobWV5MXo5LzBuMGUxTG5SanBXZGdSd0I2OHhmMytqQVpFYz0iLCJpc3MiOiJodHRwczovL3NwbHVuay1jaWFtLm9rdGEuY29tL29hdXRoMi9hdXMxcmFyajZ0UVBKZkpsejJwNyIsImF1ZCI6ImFwaTovL3NjcC1kZWZhdWx0IiwiaWF0IjoxNTQ5MDY5MDY5LCJleHAiOjE1NDkxMTIyNjksImNpZCI6IjBvYTIzNDliMTVWYk1waFFvMnA3IiwidWlkIjoiMDB1MWluY25qb1hvWGExanMycDciLCJzY3AiOlsib2ZmbGluZV9hY2Nlc3MiLCJvcGVuaWQiLCJlbWFpbCIsInByb2ZpbGUiXSwic3ViIjoiYXBydW5lZGFAc3BsdW5rLmNvbSJ9.N4-ZTM_fhh0BLMh4EQs2UkuEub7OImZlYpPXMDv0E9PauYyE3eDSPmWa9eSeHEyCfI1RMb4RPhYvs5i7QMFHEgdUjegyP2qybFu3MNjSVuA6sTZNIjejyvgFTHD_Ifr9_o0ttCcp3kU5y664xlJUzxlqZDuBugXuErZaZ49r-y1AIipORvHR9VTdsIUSEVIyuD8FdMelVgXhz0zfW3leHq0QzavbUj5FOO8OPr0-rVX7Rur7YcGBTq2QQgJPHNLRjrN8lNpJMVGWRcTHgR4yihVH8SNEBErkeUyJdmg28EkoXeyp6lncpfjSADCghJet4Iu3vUgsMgqJeTCHQJIJZA",
    "expires_in": 43199,
    "scope": "openid",
    "id_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJSZWRhY3RlZCIsIm5hbWUiOiJSZWRhY3RlZCIsImlhdCI6MTUxNjIzOTAyMn0.IXPMwzXRZ1JSHwuS1CR5l7Z0JMWvQ6Dj0xe8Z6ZFxHs",
    "StartTime": 1571796311
}
- Send the sample data to your data pipeline by running the following command (a scripted alternative is sketched at the end of this section).
cat demodata.txt | while read line; do echo $line | ./scloud ingest post-events -host Buttercup -source syslog -sourcetype Unknown; done
- Navigate back to your saved pipeline and click the Preview Results tab. Verify that your events are flowing through your pipeline.
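The scloud login output shown earlier includes an expires_in value and a StartTime value. Assuming expires_in is the token lifetime in seconds and StartTime is the issue time in Unix epoch seconds (a reasonable reading of the sample output, but an assumption), the following minimal Python sketch estimates when you'll need to log in again. It expects the JSON portion of the login output to have been saved to a file; login.json is a hypothetical name used only for this example.

import json
import time

# Load the JSON that scloud printed after login. "login.json" is a
# hypothetical filename; paste or redirect the output there yourself.
with open("login.json") as f:
    login = json.load(f)

# Assumption: StartTime is the issue time in Unix epoch seconds and
# expires_in is the token lifetime in seconds.
expires_at = login["StartTime"] + login["expires_in"]
remaining = expires_at - time.time()

if remaining <= 0:
    print("Access token has expired; run ./scloud login again.")
else:
    print(f"Access token expires in about {remaining / 3600:.1f} hours.")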
Now you have a basic pipeline that receives data from the Ingest service and sends it to your default Splunk Enterprise instance. In the following section, you'll apply transformations to your data as it moves through the pipeline.
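If you prefer a script over the shell one-liner used in the steps above, here is a rough Python equivalent. It shells out to the same scloud ingest post-events command, sending one event per line of the tutorial data, and assumes scloud sits in the current directory and that you are already logged in, exactly as in the shell version.

import subprocess

# Send each line of the tutorial data as one event, reusing the exact
# scloud invocation from this tutorial.
with open("demodata.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        subprocess.run(
            ["./scloud", "ingest", "post-events",
             "-host", "Buttercup", "-source", "syslog", "-sourcetype", "Unknown"],
            input=line, text=True, check=True,
        )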
Transform your data
Now that you've verified that your pipeline is ingesting your data successfully, let's add transformations to your data before sending it to a Splunk index.
In this section, you reorganize your top-level fields, extract interesting nested fields into top-level fields, and redact credit card information from the data before sending it to a Splunk index for indexing.
- Click the + icon between the Read from Splunk Firehose and Write to Index functions, and select Eval from the function picker.
- Enter the following Streams DSL in the text box. This Streams DSL updates the order of your top-level fields, and changes your sourcetype from Unknown to purchases.
as(get("body"), "body");
as(get("host"), "host");
as(get("id"), "id");
as(get("kind"), "kind");
as(get("nanos"), "nanos");
as(get("source"), "source");
as(get("timestamp"), "timestamp");
as(literal("purchases"), "source_type");
- (Optional) Validate your pipeline to check that all functions in your pipeline are configured properly.
- (Optional) To see how this function transforms your data, send your sample data to your pipeline once again. Click Start Preview to restart your preview session, wait a few seconds, and then re-enter the Splunk Cloud CLI command. Because your pipeline isn't activated, re-sending your sample data will not result in data duplication.
cat demodata.txt | while read line; do echo $line | ./scloud ingest post-events -host Buttercup -source syslog -sourcetype Unknown; done
- If you preview the data, you'll notice there are several interesting fields in the body field, including the type of card used, the purchase amount, and the sale date, among others. The following DSL extracts some of those nested fields and populates the attributes field with them. Enter the following Streams DSL in the same Eval function as step 2. This DSL takes the contents of body and uses the extract_regex function to extract key-value pairs from the body using regular expressions. extract_regex outputs its results as a map, which is written to the attributes field; the cast scalar function is used first to cast the contents of body to a string so that the regular expression can be applied. (A Python approximation of this extraction, and of the redaction step that follows, appears after the final step below.)
as(extract_regex(cast(get("body"), "string"), /(?<tid>[A-Z0-9]+?),(?<cid>[A-Z0-9]+?),(?<Type>[\w]+\s\w+),(?<Amount>[\S]+),(?<sdate>[\S]+)\s(?<stime>[\S]+),(?<edate>[\S]+)\s(?<etime>[\S]+?),(?<Longitude>[\S]+?),(?<Latitude>[\S]+?),(?<Card>[\d]*)/), "attributes");
- Now that you've extracted some of your nested fields into attributes, take it one step further and promote these attributes as top-level fields in your data. Click the + icon between the Eval and Write to Index functions and add a new Eval function.
- Enter the following Streams DSL in the new Eval function. This Streams DSL turns the key-value pairs in the attributes field into top-level fields so that you can easily see the fields that you've extracted.
as(map-get(get("attributes"), "tid"), "Transaction_ID");
as(map-get(get("attributes"), "cid"), "Customer_ID");
as(map-get(get("attributes"), "Type"), "Type");
as(map-get(get("attributes"), "Amount"), "Amount");
as(map-get(get("attributes"), "sdate"), "Start_Date");
as(map-get(get("attributes"), "stime"), "Start_Time");
as(map-get(get("attributes"), "edate"), "End_Date");
as(map-get(get("attributes"), "etime"), "End_Time");
as(map-get(get("attributes"), "Longitude"), "Longitude");
as(map-get(get("attributes"), "Latitude"), "Latitude");
as(map-get(get("attributes"), "Card"), "Credit_Card");
- Notice that your data contains the credit card number used to make a purchase. You can redact that information before sending it to your index. Add a new Eval function, and enter the following Streams DSL to redact any credit card information from your data. This Streams DSL uses a regular expression pattern to detect credit card numbers and replaces any numbers found with <redacted>.
as(replace(cast(get("Credit_Card"), "string"), /\b\d{15,16}\b/, "<redacted>"), "Credit_Card");
- Because the original body of your event had sensitive credit card information, remove the body field so that the original event does not get indexed. Click on the + icon between the Eval and Write to Index functions and add a new Normalize function.
- Find the body field under the Mapping columns, and click Delete to remove this field from your output. This removes the body field from your data before it gets indexed so you don't accidentally index sensitive credit card information.
- Now that you've constructed a full pipeline, preview your data again to see what your data looks like at this point. Click Start Preview to restart your preview session, wait a few seconds, and then re-enter the command. Because your pipeline isn't activated yet, re-sending your sample data will not result in data duplication.
cat demodata.txt | while read line; do echo $line | ./scloud ingest post-events -host Buttercup -source syslog -sourcetype Unknown; done
- Click on the ellipsis (...) in the right-hand corner, and then select Save As.
- Give your pipeline a name and a description, and save it as a pipeline.
- Activate your pipeline. Do not check Skip Restore State or Allow Non Restored State; neither of these options is valid when you activate your pipeline for the first time. Activating your pipeline also checks that your pipeline is valid. If you are unable to activate your pipeline, check that you've configured your functions correctly.
- Wait a few seconds after activating your pipeline, and then send your Buttercup data to your activated pipeline.
cat demodata.txt | while read line; do echo $line | ./scloud ingest post-events -host Buttercup -source syslog -sourcetype Unknown; done
- Open your Splunk Enterprise instance and search for your data.
index="main"
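As an aside for readers who want to see the transformation logic outside of DSP, the following Python sketch approximates, on one sample record, what the functions above do: it reuses the extraction pattern from the first Eval function (rewritten with Python's (?P<name>...) group syntax), promotes the captured values to top-level fields as the second Eval function does, redacts the card number like the third, and drops body like the Normalize function. The helper names are illustrative only; this is not how DSP executes the pipeline.

import re

# The same pattern used in the extract_regex step, using Python's
# (?P<name>...) syntax for named capture groups.
PATTERN = re.compile(
    r"(?P<tid>[A-Z0-9]+?),(?P<cid>[A-Z0-9]+?),(?P<Type>[\w]+\s\w+),"
    r"(?P<Amount>[\S]+),(?P<sdate>[\S]+)\s(?P<stime>[\S]+),"
    r"(?P<edate>[\S]+)\s(?P<etime>[\S]+?),(?P<Longitude>[\S]+?),"
    r"(?P<Latitude>[\S]+?),(?P<Card>[\d]*)"
)

# Mapping from capture-group names to the top-level field names used in the
# second Eval function.
RENAMES = {
    "tid": "Transaction_ID", "cid": "Customer_ID", "Type": "Type",
    "Amount": "Amount", "sdate": "Start_Date", "stime": "Start_Time",
    "edate": "End_Date", "etime": "End_Time", "Longitude": "Longitude",
    "Latitude": "Latitude", "Card": "Credit_Card",
}

def transform(body: str) -> dict:
    """Extract, promote, redact, and drop body, mirroring the pipeline steps."""
    event = {"body": body}
    match = PATTERN.search(body)
    if match:
        attributes = match.groupdict()          # extract_regex -> attributes map
        for key, field in RENAMES.items():      # map-get -> top-level fields
            event[field] = attributes.get(key)
    # Redact 15- or 16-digit card numbers, as in the replace() step.
    if event.get("Credit_Card"):
        event["Credit_Card"] = re.sub(r"\b\d{15,16}\b", "<redacted>", event["Credit_Card"])
    event.pop("body", None)                     # Normalize step: drop the body field
    return event

sample = ("A5D125F5550BE7822FC6EE156E37733A,08DB3F9FCF01530D6F7E70EB88C3AE5B,"
          "Credit Card,14,2018-01-13 04:37:00,2018-01-13 04:47:00,"
          "-73.966843,40.756741,4539385381557252")
print(transform(sample))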
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.0.0