Extracting fields in events data
You can extract fields from the records that are streaming through the Data Stream Processor, allowing you to capture and track the information that matters to you. The Data Stream Processor does not automatically extract fields from your data. Instead, by default, records have only the fields in their schema: the fields associated with the event schema, the metrics schema, or the custom schema specific to the selected data source.
To further enhance your data before you send it to a destination, the Data Stream Processor provides functions that you can use to find and extract fields in your data. This topic describes how to use the following functions to perform field extractions:
- Apply Timestamp Extraction extracts a timestamp from the body field using a specified extraction type.
- Spath extracts information from maps or collections.
- Parse with regex performs field extractions using named capturing groups in Java regular expressions. Parse with regex also creates a top-level field in your data where the value of your field extraction is placed.
- Extract regex also performs field extractions using named capturing groups in Java regular expressions, but outputs your fields as a map.
Guidelines for extracting data
The exact method that you use to extract fields depends on what your data looks like and what data source you are ingesting data from. The following are some general guidelines on extracting fields:
- Before you create custom field extractions, get to know your data. Ensuring that you are familiar with the formats and patterns present in your data makes it easier to create a field extraction that accurately captures field values from it.
- If your pipeline is streaming records that use the standard event or metric event schemas, check the body and attributes fields for data that needs to be extracted. The body field is a union-typed field, and the attributes field is a map of string type to union type.
- For guidance on how to extract data from nested fields, see Working with nested data.
Extract timestamps in your data
The Data Stream Processor has several methods to extract timestamps from the body field of your data. The most straightforward way is to use the Apply Timestamp Extraction function, which extracts a timestamp from your record's body using a user-specified extraction type. See Apply Timestamp Extraction in the Function Reference for more information.
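To make the idea concrete, here is a minimal Python sketch of what extracting a timestamp from a body string amounts to. It is not the Apply Timestamp Extraction function itself: the pattern, the helper name extract_timestamp_ms, and the choice of UTC are all assumptions made for illustration, and the sample body is the game card record used later in this topic.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern: matches timestamps shaped like "2018-01-13 09:15:00".
TIMESTAMP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def extract_timestamp_ms(body, assume_tz=timezone.utc):
    """Return the first timestamp found in body as epoch milliseconds, or None."""
    match = TIMESTAMP_PATTERN.search(body)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
    # The sample data carries no time zone, so UTC is assumed here.
    return int(parsed.replace(tzinfo=assume_tz).timestamp() * 1000)


body = (
    "18C4DF96F5A69E35952134948DB94424,98B4686144A13EE8378510888F22D782,"
    "Game Card,12.5,2018-01-13 09:15:00,2018-01-13 09:29:00,"
    "-73.986061,40.727932, 45393853815572"
)

# The first of the two timestamps in the body is the one that is returned.
print(extract_timestamp_ms(body))
```

The sketch takes the first match only. If a record carries several timestamps, as this one does, decide which one represents the event time before relying on any extraction.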
Extract fields from maps
If your data is formatted as a collection (list) or as a map of key-value pairs, you can use the spath scalar function to extract fields from your data. The spath function has the additional benefit of returning type any, making its output easy to work with in downstream functions. For an example of how to use the spath function, see the example in promote a nested field to a top-level field.
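As a rough illustration of path-based extraction from maps and lists, the following Python sketch walks a nested structure one step at a time. It does not reproduce the spath function or its path syntax: the helper lookup_path, the list-of-steps path format, and the sample record are all hypothetical.

```python
def lookup_path(data, path):
    """Follow a sequence of map keys and list indexes; return None if any step is missing."""
    current = data
    for step in path:
        if isinstance(current, dict) and step in current:
            current = current[step]
        elif isinstance(current, list) and isinstance(step, int) and 0 <= step < len(current):
            current = current[step]
        else:
            return None
    return current


# Hypothetical record body: a map that contains a nested map and a list.
body = {
    "payment": {"type": "Game Card", "amount": 12.5},
    "rides": [{"duration_minutes": 14}, {"duration_minutes": 9}],
}

print(lookup_path(body, ["payment", "type"]))
print(lookup_path(body, ["rides", 1, "duration_minutes"]))
print(lookup_path(body, ["payment", "currency"]))
```

Returning None for a missing step is a design choice of this sketch, so that one malformed record does not stop the rest from being processed.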
Extract fields to create top-level fields in your data
To extract fields from your data, use the Parse with regex function to extract a field with a Java regular expression and add that field as a top-level field in your data. In this example, we'll use the Parse with regex function to extract a game card number from body.
Assume that your incoming records contain the following in the body, where the last string of numbers, 45393853815572, is a game card number.
18C4DF96F5A69E35952134948DB94424,98B4686144A13EE8378510888F22D782,Game Card,12.5,2018-01-13 09:15:00,2018-01-13 09:29:00,-73.986061,40.727932, 45393853815572
- From the UI, click on Build Pipeline and select the Splunk DSP Firehose source function.
- Prepare the body field so that it can be parsed and values can be extracted from it.
  - Click the + icon, and add an Eval function to the pipeline.
  - Enter the following expression in the function field to cast body to a string:
    body=cast(body, "string")
- Extract the game card value from the body field.
  - Click the + icon, and add the Parse with regex function to the pipeline.
  - In the Parse with regex function, complete the following fields:
    Field | Description | Example
    Field | The field that you want to extract information from. | body
    Pattern | The Java regular expression that defines the information to match and extract from the specified field. You can include named capturing groups, as shown in the example. | (?<gamecard>\b\d{14}\b)
    offset_field | The name of the new top-level field that you'd like to create. | gamecard_number
- (Optional) Verify that the game card number is being extracted.
  - Click Start Preview and select the Parse with regex function.
  - Log in to SCloud.
    ./scloud login
    SCloud doesn't return your login metadata or access token. If you want to see your access token, you must log in to SCloud using the verbose flag:
    ./scloud login --verbose
  - Send a sample record to your pipeline to verify that the game card number is being extracted.
    ./scloud ingest post-events <<< "18C4DF96F5A69E35952134948DB94424,98B4686144A13EE8378510888F22D782,Game Card,12.5,2018-01-13 09:15:00,2018-01-13 09:29:00,-73.986061,40.727932, 45393853815572"
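If you want to check the pattern before you build the pipeline, a minimal Python sketch like the one below can help. It only approximates the behavior described above and is not the Parse with regex function: the helper parse_with_regex is hypothetical, and naming the new top-level field after the capturing group is an assumption of this sketch. Note that Python writes named groups as (?P<name>...), whereas the Java syntax used in the pipeline is (?<name>...).

```python
import re

# Python spelling of the Java pattern (?<gamecard>\b\d{14}\b).
GAMECARD_PATTERN = re.compile(r"(?P<gamecard>\b\d{14}\b)")


def parse_with_regex(record, field, pattern):
    """Return a copy of record with each named group added as a top-level field."""
    result = dict(record)
    # str() mirrors the cast of body to a string in the Eval step.
    match = pattern.search(str(record.get(field, "")))
    if match:
        result.update(match.groupdict())
    return result


record = {
    "body": (
        "18C4DF96F5A69E35952134948DB94424,98B4686144A13EE8378510888F22D782,"
        "Game Card,12.5,2018-01-13 09:15:00,2018-01-13 09:29:00,"
        "-73.986061,40.727932, 45393853815572"
    )
}

parsed = parse_with_regex(record, "body", GAMECARD_PATTERN)
print(parsed["gamecard"])
```

The word boundaries in the pattern matter here: the two long hexadecimal identifiers at the start of the record contain runs of digits, but those runs are embedded between letters and are shorter than 14 digits, so only the trailing game card number matches.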
Extract fields as maps
In this example, we'll use the extract regex scalar function to extract ASA numbers from an incoming stream of Cisco ASA firewall log data. In Cisco ASA data, the numbers that follow %ASA-#- have specific meanings. You can find their definitions in the Cisco documentation. When you have unique event identifiers like these in your data, specify them as required text in your field extraction.
Assume that you have an incoming stream of Cisco ASA firewall log data that looks like this:
Record 1 | Jul 15 20:10:27 10.11.36.31 %ASA-6-113003: AAA group policy for user AmorAubrey is being set to Acme_techoutbound
Record 2 | Jul 15 20:12:42 10.11.36.11 %ASA-7-710006: IGMP request discarded from 10.11.36.36 to outside:87.194.216.51
Record 3 | Jul 15 20:13:52 10.11.36.28 %ASA-6-302014: Teardown TCP connection 517934 for Outside:128.241.220.82/1561 to Inside:10.123.124.28/8443 duration 0:05:02 bytes 297 Tunnel has been torn down (AMOSORTILEGIO)
Record 4 | Apr 19 11:24:32 PROD-MFS-002 %ASA-4-106103: access-list fmVPN-1300 denied udp for user 'sdewilde7' outside/12.130.60.4(137) -> inside1/10.157.200.154(137) hit-cnt 1 first hit [0x286364c7, 0x0] "
The following example shows how you would extract these ASA numbers from the body field using the extract regex scalar function.
- From the UI, click on Build Pipeline and select the Splunk DSP Firehose source function.
- Extract the ASA number from body.
  - Click the + icon, and add the Eval function to the pipeline.
  - Enter the following expression in the function field to extract the ASA number into a new top-level field called ASA.
    ASA=extract_regex(cast(body, "string"), /(?<ASA>ASA-\d-\d{6})/i)
- (Optional) Verify that the ASA number is being extracted.
  - Click Start Preview and select the Eval function.
  - Log in to SCloud.
    ./scloud login
    SCloud doesn't return your login metadata or access token. If you want to see your access token, you must log in to SCloud using the verbose flag:
    ./scloud login --verbose
  - Send a sample record to your pipeline to verify that the ASA number is being extracted.
    ./scloud ingest post-events <<< "Jul 15 20:10:27 10.11.36.31 %ASA-6-113003: AAA group policy for user AmorAubrey is being set to Acme_techoutbound"
    Notice that extract_regex outputs a map. For this sample, the extracted value looks like:
    {"ASA":"ASA-6-113003"}
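To see the map-shaped output for several records at once, the following Python sketch applies an equivalent pattern to the first three sample records. It is an approximation for local testing, not the extract_regex scalar function: the Python helper of the same name is hypothetical, and returning an empty map when nothing matches is an assumption of this sketch. The re.IGNORECASE flag plays the role of the /i modifier, and Python writes the named group as (?P<ASA>...).

```python
import re

# Python spelling of the pipeline pattern /(?<ASA>ASA-\d-\d{6})/i.
ASA_PATTERN = re.compile(r"(?P<ASA>ASA-\d-\d{6})", re.IGNORECASE)


def extract_regex(text, pattern):
    """Return the named groups of the first match as a map (empty if no match)."""
    match = pattern.search(text)
    return match.groupdict() if match else {}


records = [
    "Jul 15 20:10:27 10.11.36.31 %ASA-6-113003: AAA group policy for user "
    "AmorAubrey is being set to Acme_techoutbound",
    "Jul 15 20:12:42 10.11.36.11 %ASA-7-710006: IGMP request discarded from "
    "10.11.36.36 to outside:87.194.216.51",
    "Jul 15 20:13:52 10.11.36.28 %ASA-6-302014: Teardown TCP connection 517934 "
    "for Outside:128.241.220.82/1561 to Inside:10.123.124.28/8443 duration "
    "0:05:02 bytes 297 Tunnel has been torn down (AMOSORTILEGIO)",
]

extracted = [extract_regex(body, ASA_PATTERN) for body in records]
for value in extracted:
    print(value)
```

Because the result is a map rather than a top-level string, a downstream step that needs the bare identifier has to read the ASA key out of it.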
See also
Related topics
- Working with nested data
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5