Splunk® Data Stream Processor

Use the Data Stream Processor

DSP 1.2.0 is impacted by the CVE-2021-44228 and CVE-2021-45046 security vulnerabilities from Apache Log4j. To fix these vulnerabilities, you must upgrade to DSP 1.2.4. See Upgrade the Splunk Data Stream Processor to 1.2.4 for upgrade instructions.

On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Extracting fields in events data

You can extract fields from the records that are streaming through the Splunk Data Stream Processor (DSP), allowing you to capture and track information that is important to your needs. DSP does not automatically extract fields from your data. Instead, by default, records contain only the fields defined by their schema: the event schema, the metrics schema, or the custom schema specific to the selected data source.

To further enhance your data before you send it to a destination, DSP provides a set of functions that you can use to find and extract fields in your data. This topic describes how to use the following functions to perform field extractions.

  • Apply Timestamp Extraction extracts a timestamp from the body field using a specified extraction type.
  • Spath extracts information from maps or collections.
  • Parse with regex performs field extractions using named capturing groups in Java regular expressions. Parse with regex also creates a top-level field in your data where the value of your field extraction is placed.
  • Extract regex also performs field extractions using named capturing groups in Java regular expressions, but outputs your fields as a map.

Guidelines for extracting data

The exact method that you use to extract fields depends on what your data looks like and what data source you are ingesting data from. The following are some general guidelines on extracting fields:

  • Before you create custom field extractions, get to know your data. Ensuring that you are familiar with the formats and patterns present in your data makes it easier to create a field extraction that accurately captures field values from it.
  • If your pipeline is streaming records that use the standard event or metric event schemas, check the body and attributes fields for data that needs to be extracted. The body field is a union-typed field, and the attributes field is a map of string type to union type.
  • For guidance on how to extract data from nested fields, see Working with nested data.

Extract timestamps in your data

DSP provides several methods for extracting timestamps from the body field of your data. The most straightforward way is to use the Apply Timestamp Extraction function, which extracts a timestamp from your record's body using a user-specified extraction type. See Apply Timestamp Extraction in the Function Reference for more information.
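To build intuition for what a timestamp-extraction step does, the following is a minimal, hypothetical sketch in Python of a fixed-format extraction: find a timestamp-shaped substring in the body and parse it. This is not the Apply Timestamp Extraction implementation, and real extraction types in DSP support many more formats.

```python
import re
from datetime import datetime

def extract_timestamp(body: str):
    """Illustrative sketch only: find an ISO-like timestamp in a record
    body and parse it. Mimics a single fixed-format extraction type."""
    match = re.search(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", body)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")

# The sample body from the Parse with regex example below contains
# two such timestamps; the first one found wins.
print(extract_timestamp("Game Card,12.5,2018-01-13 09:15:00,2018-01-13 09:29:00"))
```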

Extract fields from maps

If your data is formatted as a collection (list) or as a map of key-value pairs, you can use the spath scalar function to extract fields from your data. The spath function has the additional benefit of returning type any, making its output easy to work with in downstream functions. For an example of how to use the spath function, see Promote a nested field to a top-level field.
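Conceptually, spath walks a path through nested maps and lists and returns whatever value it finds. The following Python sketch illustrates that idea with a simple dot-separated path; it is not the DSP implementation, and the sample record and path syntax are illustrative assumptions.

```python
def spath(value, path: str):
    """Illustrative sketch of path-based extraction from nested maps
    and lists, similar in spirit to DSP's spath scalar function."""
    current = value
    for key in path.split("."):
        if isinstance(current, list):
            current = current[int(key)]   # numeric segments index into lists
        elif isinstance(current, dict):
            current = current.get(key)    # string segments look up map keys
        else:
            return None                   # path descends past a scalar
    return current

record_body = {"user": {"name": "AmorAubrey", "groups": ["admin", "ops"]}}
print(spath(record_body, "user.name"))      # AmorAubrey
print(spath(record_body, "user.groups.1"))  # ops
```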

Extract fields to create top-level fields in your data

To extract fields from your data, use the Parse with regex function to extract a field with a Java regular expression and add that field as a top-level field in your data. In this example, we'll use the Parse with regex function to extract a game card number from body.

Assume that your incoming records contain the following in the body, where the last string of numbers 45393853815572 is a game card number.

18C4DF96F5A69E35952134948DB94424,98B4686144A13EE8378510888F22D782,Game Card,12.5,2018-01-13 09:15:00,2018-01-13 09:29:00,-73.986061,40.727932, 45393853815572
  1. From the UI, click Build Pipeline and select the Splunk DSP Firehose source function.
  2. Prepare the body field so that it can be parsed and values can be extracted from it.
    1. Click the + icon, and add an Eval function to the pipeline.
    2. Enter the following expression in the function field to cast body to a string:
      body=cast(body, "string")
  3. Extract the game card value from the body field.
    1. Click the + icon, and add the Parse with regex function to the pipeline.
    2. In the Parse with Regex function, complete the following fields:
      • Field: The field that you want to extract information from. Example: body
      • Pattern: The Java regular expression that defines the information to match and extract from the specified field. You can include named capturing groups, as shown in the example. Example: (?<gamecard>\b\d{14}\b)
      • offset_field: The name of the new top-level field that you'd like to create. Example: gamecard_number
  4. (Optional) Verify that the game card number is being extracted.
    1. Click Start Preview and select the Parse with regex function.
    2. Log in to SCloud.
      ./scloud login

      SCloud doesn't return your login metadata or access token. If you want to see your access token, log in to SCloud using the verbose flag: ./scloud login --verbose.

    3. Send a sample record to your pipeline to verify that the game card number is being extracted.
      ./scloud ingest post-events <<< "18C4DF96F5A69E35952134948DB94424,98B4686144A13EE8378510888F22D782,Game Card,12.5,2018-01-13 09:15:00,2018-01-13 09:29:00,-73.986061,40.727932, 45393853815572"
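
The steps above can be sketched outside of DSP as plain Python: cast the body to a string, apply the regular expression, and promote the named capture to a new top-level field. Note that Python spells named groups `(?P<name>...)` while Java regular expressions (which DSP uses) spell them `(?<name>...)`; the field and record shapes here are illustrative, not DSP's internal representation.

```python
import re

# The pattern from the example above, in Python named-group syntax.
PATTERN = re.compile(r"(?P<gamecard>\b\d{14}\b)")

def parse_with_regex(record: dict) -> dict:
    """Sketch of the pipeline: cast body to a string, then add the
    named capture as a new top-level field (gamecard_number)."""
    body = str(record["body"])
    match = PATTERN.search(body)
    if match:
        record["gamecard_number"] = match.group("gamecard")
    return record

record = {"body": "18C4DF96F5A69E35952134948DB94424,"
                  "98B4686144A13EE8378510888F22D782,Game Card,12.5,"
                  "2018-01-13 09:15:00,2018-01-13 09:29:00,"
                  "-73.986061,40.727932, 45393853815572"}
print(parse_with_regex(record)["gamecard_number"])  # 45393853815572
```

The \b word boundaries matter here: they keep the pattern from matching 14-digit runs embedded inside the longer alphanumeric IDs at the start of the record.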

Extract fields as maps

In this example, we'll use the extract regex scalar function to extract ASA numbers from an incoming stream of Cisco ASA firewall log data. In Cisco ASA data, the strings of numbers that follow %ASA-#- have specific meanings. You can find their definitions in the Cisco documentation. When you have unique event identifiers like these in your data, specify them as required text in your field extraction.

Assume that you have an incoming stream of Cisco ASA firewall log data that looks like this:

Record 1 Jul 15 20:10:27 10.11.36.31 %ASA-6-113003: AAA group policy for user AmorAubrey is being set to Acme_techoutbound
Record 2 Jul 15 20:12:42 10.11.36.11 %ASA-7-710006: IGMP request discarded from 10.11.36.36 to outside:87.194.216.51
Record 3 Jul 15 20:13:52 10.11.36.28 %ASA-6-302014: Teardown TCP connection 517934 for Outside:128.241.220.82/1561 to Inside:10.123.124.28/8443 duration 0:05:02 bytes 297 Tunnel has been torn down (AMOSORTILEGIO)
Record 4 Apr 19 11:24:32 PROD-MFS-002 %ASA-4-106103: access-list fmVPN-1300 denied udp for user 'sdewilde7' outside/12.130.60.4(137) -> inside1/10.157.200.154(137) hit-cnt 1 first hit [0x286364c7, 0x0] "

The following example shows how you would extract these ASA numbers from the body field using the extract regex scalar function.

  1. From the UI, click Build Pipeline and select the Splunk DSP Firehose source function.
  2. Extract the ASA number from body.
    1. Click the + icon, and add the Eval function to the pipeline.
    2. Enter the following expression in the function field to extract the ASA number into a new top-level field called ASA.
      ASA=extract_regex(cast(body, "string"), /(?<ASA>ASA-\d-\d{6})/i)
  3. (Optional) Verify that the ASA number is being extracted.
    1. Click Start Preview and select the Eval function.
    2. Log in to SCloud.
      ./scloud login

      SCloud doesn't return your login metadata or access token. If you want to see your access token, log in to SCloud using the verbose flag: ./scloud login --verbose.

    3. Send a sample record to your pipeline to verify that the ASA number is being extracted. Notice that extract_regex outputs a map. For the following sample, the extracted value looks like this: {"ASA":"ASA-6-113003"}.
      ./scloud ingest post-events <<< "Jul 15 20:10:27 10.11.36.31 %ASA-6-113003: AAA group policy for user AmorAubrey is being set to Acme_techoutbound"
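
The map output of this step can be sketched in Python as well: run the pattern, then return all named captures as a map. The helper below is an illustrative stand-in, not DSP's extract_regex implementation; the Python named-group syntax (?P<...>) and re.IGNORECASE stand in for the Java (?<...>) syntax and /i flag used in the eval expression above.

```python
import re

def extract_regex(value: str, pattern: str, flags=0):
    """Illustrative sketch: return named captures as a map, similar in
    spirit to DSP's extract_regex scalar function."""
    match = re.search(pattern, value, flags)
    return match.groupdict() if match else {}

body = ("Jul 15 20:10:27 10.11.36.31 %ASA-6-113003: AAA group policy "
        "for user AmorAubrey is being set to Acme_techoutbound")
print(extract_regex(body, r"(?P<ASA>ASA-\d-\d{6})", re.IGNORECASE))
# {'ASA': 'ASA-6-113003'}
```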
      

See also

Functions
Apply Timestamp Extraction
Spath
Parse with regex
Extract regex
Eval
Cast
Related topics
Working with nested data
Last modified on 11 March, 2022

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

