Splunk® Enterprise

Knowledge Manager Manual

Configure advanced extractions with field transforms

A transform field extraction is made up of two components: a field transform configuration in transforms.conf and a REPORT-<class> field extraction configuration in props.conf. You can find transforms.conf and props.conf in $SPLUNK_HOME/etc/system/local. This section shows you how to configure field transforms in transforms.conf. To configure a field transform in Splunk Web, see manage field transforms.

In transform extractions, the regular expression is in transforms.conf and the field extraction is in props.conf. You can apply one regular expression to multiple field extraction configurations, or have multiple regular expressions for one field extraction configuration. See configure custom fields at search time.

Field transforms contain a field-extracting regular expression and other attributes that govern the way the transform extracts fields. Field transforms are always created in conjunction with field extraction stanzas in props.conf.

Transform extractions and the search-time operations sequence

Search-time operation order

Extraction transforms are second in the search-time operations sequence and are processed after inline field extractions.

Restrictions

Splunk software processes all inline field extractions that belong to a specific host, source, or source type in ASCII sort order according to their <class> value. Because uppercase letters sort before lowercase letters in ASCII, EXTRACT-ZZZ is processed before EXTRACT-aaa, and EXTRACT-ddd is processed after it. As a result, you cannot reference a field extracted by EXTRACT-aaa in the field extraction definition for EXTRACT-ZZZ, but you can reference a field extracted by EXTRACT-aaa in the field extraction definition for EXTRACT-ddd.
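The ordering rule can be checked with a short Python sketch. Python compares strings by code point, which for ASCII class names is the same as ASCII sort order. The helper names are illustrative only; the sketch models the ordering rule described above, not how Splunk software schedules extractions internally.

```python
# Sketch: order inline extraction classes the way an ASCII sort would.
# Uppercase letters (A-Z, codes 65-90) sort before lowercase letters
# (a-z, codes 97-122), so "ZZZ" comes before "aaa".

def processing_order(class_names):
    """Return extraction class names in ASCII (code point) order."""
    return sorted(class_names)

def can_reference(earlier, later, class_names):
    """True if `later` runs after `earlier`, so it can use fields `earlier` extracted."""
    order = processing_order(class_names)
    return order.index(earlier) < order.index(later)

classes = ["EXTRACT-aaa", "EXTRACT-ZZZ", "EXTRACT-ddd"]
print(processing_order(classes))
# ['EXTRACT-ZZZ', 'EXTRACT-aaa', 'EXTRACT-ddd']
print(can_reference("EXTRACT-aaa", "EXTRACT-ZZZ", classes))  # False
print(can_reference("EXTRACT-aaa", "EXTRACT-ddd", classes))  # True
```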

For more information

See The sequence of search-time operations.

Configure a transform extraction

Transform extractions use the REPORT extraction configuration in props.conf. Each REPORT extraction stanza references a field transform that is defined in transforms.conf. The field transform contains the regular expression that Splunk Enterprise uses to extract fields at search time, and other attributes that govern the way that the transform extracts those fields.

Caution: Do not edit files in $SPLUNK_HOME/etc/system/default/. Upgrades and migrations overwrite the files in that directory, so your changes are lost, and editing those files can cause Splunk software to break.

Prerequisites
Review the following topics.

Steps

  1. Identify the source type, source, or host that provides the events that your field is extracted from.
    Extraction configurations in props.conf are restricted to a specific source, source type, or host.
  2. Configure a regular expression that identifies the field in the event.
    If your event lists field/value pairs or field values separated by delimiters, you can instead configure a delimiter-based field extraction, which does not require a regular expression.
  3. Configure a field transform in transforms.conf that utilizes this regular expression or delimiter configuration.
    The transform can define a source key and event value formatting.
  4. Follow the format for the REPORT field extraction type to configure a field extraction stanza in props.conf that uses the host, source, or source type identified earlier.
  5. (Optional) Configure additional field extraction stanzas for other hosts, sources, and source types that refer to the same field transform.
  6. Restart your Splunk deployment for your changes to take effect.

Field transform syntax

You can use transforms in two ways: for regex-based field extractions and for delimiter-based field extractions. Use the following format when you define a search-time field transform in transforms.conf:

[<unique_transform_stanza_name>]
REGEX = <regular expression>
FORMAT = <string>
SOURCE_KEY = <string>
DELIMS = <quoted string list>
FIELDS = <quoted string list>
MV_ADD = [true|false]
CLEAN_KEYS = [true|false]
KEEP_EMPTY_VALS = [true|false]
CAN_OPTIMIZE = [true|false]

The <unique_transform_stanza_name> is required for all search-time transforms. <unique_transform_stanza_name> values do not have to follow field name syntax restrictions (see field name syntax): they can contain characters other than a-z, A-Z, and 0-9, including spaces, and they are not subject to key cleaning.

Attribute          Default         Required or optional
REGEX              Empty string    Required, unless you are setting up an ASCII-only delimiter-based field extraction. See DELIMS.
FORMAT             Empty string    Optional
SOURCE_KEY         _raw            Optional
DELIMS             Empty string    Optional
FIELDS             Empty string    Optional
MV_ADD             False           Optional
CLEAN_KEYS         True            Optional
KEEP_EMPTY_VALS    False           Optional
CAN_OPTIMIZE       True            Optional

Field transform syntax descriptions


REGEX

A regular expression that operates on your data to extract fields.

REGEX and the FORMAT field

Name-capturing groups in the REGEX are extracted directly to fields. You do not have to specify FORMAT for simple field extraction cases.

If the REGEX extracts both the field name and its corresponding value, you can use special capturing groups named _KEY_<string> and _VAL_<string>, such as _KEY_1 and _VAL_1, so that you do not have to specify the mapping in FORMAT.

Example of REGEX and FORMAT

The following two configurations are equivalent.

Using FORMAT:
REGEX = ([a-z]+)=([a-z]+)
FORMAT = $1::$2

Not using FORMAT:
REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
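To see why the two configurations are equivalent, the following Python sketch emulates both: the numbered-group regex with a $1::$2 mapping, and the _KEY_1/_VAL_1 groups. It is an illustration under assumptions, not Splunk's implementation: Python spells named groups (?P<name>...) where transforms.conf uses (?<name>...), and pairing a _KEY_ group with the _VAL_ group that shares its suffix is inferred from the _KEY_1/_VAL_1 example. Because $1::$2 uses a variable field name, the sketch applies the regex repeatedly with finditer.

```python
import re

# Form 1: numbered groups, mapped by FORMAT = $1::$2
numbered = re.compile(r"([a-z]+)=([a-z]+)")
# Form 2: special named groups, no FORMAT needed (Python syntax: (?P<name>...))
named = re.compile(r"(?P<_KEY_1>[a-z]+)=(?P<_VAL_1>[a-z]+)")

def extract_with_format(text):
    """Emulate FORMAT = $1::$2: group 1 is the field name, group 2 the value."""
    return {m.group(1): m.group(2) for m in numbered.finditer(text)}

def extract_with_key_val(text):
    """Pair each _KEY_<s> group with the _VAL_<s> group that shares its suffix (assumed rule)."""
    fields = {}
    for m in named.finditer(text):
        groups = m.groupdict()
        for name, key in groups.items():
            if name.startswith("_KEY_"):
                suffix = name[len("_KEY_"):]
                fields[key] = groups["_VAL_" + suffix]
    return fields

event = "user=alice action=login status=ok"
print(extract_with_format(event))   # {'user': 'alice', 'action': 'login', 'status': 'ok'}
print(extract_with_key_val(event) == extract_with_format(event))  # True
```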

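Example of using REGEX for DELIMS-like functionality

DELIMS accepts only ASCII delimiters, so use REGEX when your data uses a non-ASCII delimiter. Each invalid DELIMS configuration below is followed by a valid REGEX configuration that extracts the same fields.

Field/value pairs separated by ¦ or |, with field names separated from values by ≈ or =:

Invalid DELIMS:
DELIMS = "¦|", "≈="

Valid REGEX:
REGEX = ^([^¦|≈=]+)[≈=]([^¦|≈=]+)[¦|]([^¦|≈=]+)[≈=]([^¦|≈=]+)[¦|]([^¦|≈=]+)[≈=]([^¦|≈=]+)$
FORMAT = $1::$2 $3::$4 $5::$6

Field values only, separated by ¦ or |:

Invalid DELIMS:
DELIMS = "¦|"
FIELDS = ace, bubbles, cupcake

Valid REGEX:
REGEX = ^(?<ace>[^¦|]+)[¦|](?<bubbles>[^¦|]+)[¦|](?<cupcake>[^¦|]+)$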
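The following Python sketch runs both REGEX configurations against made-up sample events to show that they produce the field/value pairs that the invalid DELIMS settings were meant to produce. The dictionary building in extract_pairs stands in for FORMAT = $1::$2 $3::$4 $5::$6, and Python writes the named groups as (?P<name>...). The function names are hypothetical.

```python
import re

# Pairs: '¦' or '|' separates pairs, '≈' or '=' separates name from value.
PAIRS = re.compile(
    r"^([^¦|≈=]+)[≈=]([^¦|≈=]+)[¦|]([^¦|≈=]+)[≈=]([^¦|≈=]+)[¦|]([^¦|≈=]+)[≈=]([^¦|≈=]+)$"
)
# Values only: the named groups supply the names that FIELDS would have supplied.
VALUES = re.compile(r"^(?P<ace>[^¦|]+)[¦|](?P<bubbles>[^¦|]+)[¦|](?P<cupcake>[^¦|]+)$")

def extract_pairs(text):
    """Emulate FORMAT = $1::$2 $3::$4 $5::$6 for three name/value pairs."""
    m = PAIRS.match(text)
    if m is None:
        return {}
    g = m.groups()
    # Odd-numbered groups ($1, $3, $5) are names; the following group is the value.
    return {g[i]: g[i + 1] for i in range(0, len(g), 2)}

def extract_values(text):
    """Return the named-group fields, or an empty dict if the event does not match."""
    m = VALUES.match(text)
    return m.groupdict() if m else {}

print(extract_pairs("a≈1¦b=2|c≈3"))   # {'a': '1', 'b': '2', 'c': '3'}
print(extract_values("x¦y|z"))        # {'ace': 'x', 'bubbles': 'y', 'cupcake': 'z'}
```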

FORMAT

Use FORMAT to specify the format of the field/value pair(s) that you are extracting. You do not need to specify the FORMAT if you have a simple REGEX with name-capturing groups.


Configuration

For search-time extractions, the pattern for the FORMAT field is as follows:

FORMAT = <field-name>::<field-value>( <field-name>::<field-value>)*

where:

field-name = [<string>|$<extracting-group-number>]
field-value = [<string>|$<extracting-group-number>]

Restrictions

You cannot create concatenated fields with FORMAT at search time. This functionality is available only for index-time field transforms. To concatenate a set of regular expression extractions into a single field value, use FORMAT in an index-time field extraction. For example, if you have the string 192(x)0(y)2(z)1 in your event data, you can extract it at index time as an IP address field value in the format 192.0.2.1. See Configure index-time field extractions in the Getting Data In manual. Do not make extensive changes to your set of indexed fields, because doing so can negatively impact indexing performance and search times.
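The following plain-Python sketch shows what concatenation means for the 192(x)0(y)2(z)1 example: four capture groups are joined into one value. It only illustrates the result; it is not Splunk's index-time FORMAT syntax, which is documented in the Getting Data In manual. The pattern and function name are hypothetical.

```python
import re

# Hypothetical pattern for strings like 192(x)0(y)2(z)1: four digit runs
# separated by the literal markers (x), (y), and (z).
OCTETS = re.compile(r"(\d+)\(x\)(\d+)\(y\)(\d+)\(z\)(\d+)")

def concatenated_ip(text):
    """Join the four captured groups with '.' into one value, or return None."""
    m = OCTETS.search(text)
    return ".".join(m.groups()) if m else None

print(concatenated_ip("src=192(x)0(y)2(z)1"))  # 192.0.2.1
```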

Example of search-time FORMAT usage

  1. FORMAT = firstfield::$1 secondfield::$2 thirdfield::other-value
  2. FORMAT = $1::$2

If you configure FORMAT with a variable field name, the regular expression is repeatedly applied to the source event text to match and extract all field/value pairs.
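The FORMAT rules above can be sketched in Python as follows. The apply_format helper is hypothetical and assumes that FORMAT is a space-separated list of <field-name>::<field-value> tokens where each side is either a literal string or a whole $<group-number> reference. When any field name is a $ reference, the regex is applied repeatedly to extract every pair; otherwise only the first match is used. A repeated field name keeps its last value, which is a simplification (see MV_ADD for how Splunk software controls repeated fields).

```python
import re

def apply_format(regex, fmt, text):
    """Sketch of search-time FORMAT: space-separated '<name>::<value>' tokens.

    Each name or value is either a literal string or $<group-number>.
    If any field name is a $ reference, the regex is applied repeatedly
    (finditer); otherwise only the first match (search) is used.
    """
    pattern = re.compile(regex)
    pairs = [token.split("::", 1) for token in fmt.split()]

    def resolve(part, match):
        # "$2" -> text of capturing group 2; anything else is a literal
        return match.group(int(part[1:])) if part.startswith("$") else part

    variable_names = any(name.startswith("$") for name, _ in pairs)
    matches = pattern.finditer(text) if variable_names else [pattern.search(text)]
    fields = {}
    for m in matches:
        if m is None:
            continue  # no match: nothing is extracted
        for name, value in pairs:
            fields[resolve(name, m)] = resolve(value, m)
    return fields

# Example 1: fixed field names plus a literal value
print(apply_format(r"(\w+) (\w+)",
                   "firstfield::$1 secondfield::$2 thirdfield::other-value",
                   "hello world"))
# {'firstfield': 'hello', 'secondfield': 'world', 'thirdfield': 'other-value'}

# Example 2: variable field name, so every name=value pair is extracted
print(apply_format(r"(\w+)=(\w+)", "$1::$2", "a=1 b=2 c=3"))
# {'a': '1', 'b': '2', 'c': '3'}
```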


SOURCE_KEY

Use SOURCE_KEY to extract values from a field other than the raw event text (_raw), which is the default. You can use any field that is available at the time this field extraction runs.


Configuration

To configure SOURCE_KEY, set it to the name of the field that the transform's REGEX is applied to.
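As a minimal sketch, assuming an event is represented as a plain Python dictionary of field names to values, SOURCE_KEY changes which value the REGEX is run against. The apply_transform helper, the uri field, and the item_id pattern below are hypothetical.

```python
import re

def apply_transform(event, regex, source_key="_raw"):
    """Run `regex` against event[source_key] and return its named-group fields.

    `event` is a plain dict standing in for an event's fields; the field
    named by source_key must already exist (it was extracted earlier).
    """
    value = event.get(source_key)
    if value is None:
        return {}
    m = re.search(regex, value)
    return {k: v for k, v in m.groupdict().items() if v is not None} if m else {}

event = {
    "_raw": "GET /shop/cart?item=42 HTTP/1.1",
    "uri": "/shop/cart?item=42",  # hypothetical field from an earlier extraction
}
print(apply_transform(event, r"item=(?P<item_id>\d+)", source_key="uri"))  # {'item_id': '42'}
print(apply_transform(event, r"^(?P<method>[A-Z]+) "))                     # {'method': 'GET'}
```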


DELIMS

Use DELIMS in place of REGEX when dealing with ASCII-only delimiter-based field extractions, where field values or field/value pairs are separated by delimiters such as commas, colons, spaces, tab spaces, line breaks, and so on.

Configuration

Each ASCII character in the delimiter string is used as a delimiter to split the event. If the event contains full delimiter-separated field value pairs, you enter two sets of quoted delimiters for DELIMS. The first set of quoted delimiters separates the field value pairs. The second set of quoted delimiters separates the field name from its corresponding value.

If the events contain only delimiter-separated values (no field names), use one set of quoted delimiters to separate the values. Use the FIELDS attribute to apply field names to the extracted values. Alternatively, Splunk software reads even tokens as field names and odd tokens as field values.

Restrictions

Delimiters must be specified within double quotes (DELIMS="|,;"). Special escape sequences are \t (tab), \n (newline), \r (carriage return), \\ (backslash) and \" (double quotes). If a value contains an embedded unescaped double quote character, such as "foo"bar", use REGEX, not DELIMS. Non-ASCII delimiters require the use of REGEX. See REGEX for examples of usage of DELIMS-like functionality.

Example

The following example of DELIMS usage applies to an event where field value pairs are separated by '|' symbols, and the field names are separated from their corresponding values by '=' symbols.

[pipe_eq]
DELIMS = "|", "="


FIELDS

Use FIELDS in conjunction with DELIMS when you perform a delimiter-based field extraction and you have only field values to extract. Use FIELDS to provide field names for the extracted field values, as a list in the order in which the values are extracted.

If a field name contains spaces or commas, enclose it in double quotes (" "). To escape a character, use a backslash (\).

Example

The following example shows a delimiter-based extraction for events that contain three field values, separated by a comma and a space.

[commalist]
DELIMS = ", "
FIELDS = field1, field2, field3
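A sketch of the [commalist] transform, plus the no-FIELDS alternative from the DELIMS description, might look like this in Python. Two assumptions are labeled in the code: a run of consecutive delimiter characters such as ", " is treated as one separator so that the comma-and-space example splits cleanly, and "even" and "odd" token positions are counted from zero. The delims_fields helper is hypothetical.

```python
import re

def delims_fields(text, delims, field_names=None):
    """Emulate DELIMS = "<delims>" for value-only events, with optional FIELDS."""
    # Assumption: a run of delimiter characters (such as ", ") is one separator.
    tokens = re.split("[" + re.escape(delims) + "]+", text)
    if field_names is None:
        # No FIELDS: even-position tokens are names, odd-position tokens are
        # values (assumption: positions counted from zero).
        return dict(zip(tokens[0::2], tokens[1::2]))
    # FIELDS given: names are applied in the order the values were extracted.
    return dict(zip(field_names, tokens))

print(delims_fields("red, green, blue", ", ", ["field1", "field2", "field3"]))
# {'field1': 'red', 'field2': 'green', 'field3': 'blue'}
print(delims_fields("user|alice|status|ok", "|"))
# {'user': 'alice', 'status': 'ok'}
```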


MV_ADD

Use MV_ADD for events that have multiple occurrences of the same field with different values, and you want to keep each value.

Configuration

When MV_ADD = true, Splunk software transforms fields that appear multiple times in an event with different values into multivalue fields. The field name appears once. The multiple values for the field follow the = sign.

When MV_ADD = false, Splunk software keeps the first value found for a field in an event, and discards every subsequent value found.
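The difference between the two MV_ADD settings can be sketched in Python as a merge over the name/value pairs found in one event. The collect helper is hypothetical, and representing a multivalue field as a Python list is an illustration choice. The sketch appends every repeated value; it does not model how identical repeated values are handled.

```python
def collect(pairs, mv_add=False):
    """Merge (name, value) pairs found in one event, emulating MV_ADD.

    mv_add=True: repeated names become a list of values (a multivalue field).
    mv_add=False: the first value for a name is kept; later ones are dropped.
    """
    fields = {}
    for name, value in pairs:
        if name not in fields:
            fields[name] = [value] if mv_add else value
        elif mv_add:
            fields[name].append(value)
    return fields

pairs = [("ip", "10.0.0.1"), ("port", "443"), ("ip", "10.0.0.2")]
print(collect(pairs, mv_add=True))   # {'ip': ['10.0.0.1', '10.0.0.2'], 'port': ['443']}
print(collect(pairs, mv_add=False))  # {'ip': '10.0.0.1', 'port': '443'}
```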


CLEAN_KEYS

Controls whether Splunk software applies key cleaning to the field names that it extracts. Key cleaning replaces any non-alphanumeric characters in field names with underscores and removes leading underscores and 0-9 characters from field names.

Configuration

Add CLEAN_KEYS = false to your transform to keep your field names intact with no removal of leading underscores or 0-9 characters.
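Key cleaning as described above can be sketched in Python. The clean_key helper is hypothetical. The sketch assumes that non-alphanumeric characters are replaced first and that leading underscores and digits are stripped afterward, and it treats only ASCII letters and digits as alphanumeric; check real extractions if your field names depend on those details.

```python
import re

def clean_key(name, clean_keys=True):
    """Apply key cleaning to a field name, unless clean_keys is False."""
    if not clean_keys:
        return name  # CLEAN_KEYS = false: field name kept intact
    # Assumed order: replace every non-alphanumeric character with '_' ...
    cleaned = re.sub(r"[^A-Za-z0-9]", "_", name)
    # ... then strip leading underscores and digits.
    return cleaned.lstrip("_0123456789")

print(clean_key("_9 user-id"))                    # user_id
print(clean_key("_9 user-id", clean_keys=False))  # _9 user-id
```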


KEEP_EMPTY_VALS

Controls whether Splunk software keeps field value pairs when the value is an empty string.

This option does not apply to field/value pairs that are generated by the Splunk software autoKV extraction (automatic field extraction) process. AutoKV ignores field/value pairs with empty values.
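As a minimal Python sketch (the finalize helper is hypothetical), KEEP_EMPTY_VALS acts as a filter on the name/value pairs that a transform extracts:

```python
def finalize(pairs, keep_empty_vals=False):
    """Drop name/value pairs whose value is an empty string, unless keep_empty_vals."""
    return {name: value for name, value in pairs if keep_empty_vals or value != ""}

pairs = [("user", "alice"), ("session", ""), ("status", "ok")]
print(finalize(pairs))                        # {'user': 'alice', 'status': 'ok'}
print(finalize(pairs, keep_empty_vals=True))  # {'user': 'alice', 'session': '', 'status': 'ok'}
```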


CAN_OPTIMIZE

Controls whether Splunk software can disable the extraction.

When CAN_OPTIMIZE = true (the default), Splunk software can disable an extraction if none of the fields that the extraction identifies are needed to evaluate a search. Set CAN_OPTIMIZE = false when you run searches in a search mode that disables field discovery and you need to ensure that Splunk software still discovers the specific fields that this transform extracts.
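The decision described above can be sketched in Python. The Transform class, the always_on transform, and the idea of a precomputed set of needed fields are all illustrative assumptions; the sketch shows only the rule that an optimizable transform is skipped when none of its fields are needed, while a transform with CAN_OPTIMIZE = false always runs.

```python
from dataclasses import dataclass

@dataclass
class Transform:
    name: str
    fields: frozenset          # fields the transform can extract
    can_optimize: bool = True  # mirrors CAN_OPTIMIZE (default true)

def transforms_to_run(transforms, needed_fields):
    """Return names of transforms that still run for a search needing needed_fields."""
    return [
        t.name
        for t in transforms
        # never skip when can_optimize is False; otherwise run only if needed
        if not t.can_optimize or t.fields & needed_fields
    ]

transforms = [
    Transform("pipe_eq", frozenset({"user", "action"})),
    Transform("commalist", frozenset({"field1", "field2", "field3"})),
    Transform("always_on", frozenset({"session_id"}), can_optimize=False),  # hypothetical
]
print(transforms_to_run(transforms, {"user"}))  # ['pipe_eq', 'always_on']
```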


Syntax for a transform-referencing field extraction configuration

To set up a search-time field extraction in props.conf that is associated with a field transform, use the REPORT field extraction class. Use the following format.

[<spec>]
REPORT-<class> = <unique_transform_stanza_name1>, <unique_transform_stanza_name2>,...
<spec>              Description
<source type>       Source type of an event
host::<host>        Host for an event
source::<source>    Source for an event

You can associate multiple field transform stanzas to a single field extraction by listing them after the initial <unique_transform_stanza_name>, separated by commas. See examples of transform extractions.

REPORT-<class> element            Description
<class>                           A unique literal string that identifies the namespace of the field you are extracting. <class> values do not have to follow field name syntax restrictions and are not subject to key cleaning.
<unique_transform_stanza_name>    Name of your field transform stanza from transforms.conf.
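props.conf and transforms.conf are not guaranteed to be valid INI files, but simple stanzas like the ones in this topic can be parsed with Python's configparser for illustration. The following sketch, using a hypothetical my_sourcetype source type and REPORT-pipes class, checks that every transform named in a REPORT-<class> value has a matching transforms.conf stanza and lists the transforms in the order given.

```python
import configparser

PROPS = """
[my_sourcetype]
REPORT-pipes = pipe_eq, commalist
"""

TRANSFORMS = """
[pipe_eq]
DELIMS = "|", "="

[commalist]
DELIMS = ", "
FIELDS = field1, field2, field3
"""

def load(text):
    """Parse simple .conf stanzas; '=' is the only key/value delimiter."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key case (REPORT-pipes, DELIMS)
    parser.read_string(text)
    return parser

def resolve_reports(props, transforms, spec):
    """Map each REPORT-<class> under `spec` to its transform stanzas, in listed order."""
    resolved = {}
    for key, value in props[spec].items():
        if key.startswith("REPORT-"):
            names = [n.strip() for n in value.split(",") if n.strip()]
            missing = [n for n in names if not transforms.has_section(n)]
            if missing:
                raise KeyError(f"{key} references undefined transforms: {missing}")
            resolved[key] = {n: dict(transforms[n]) for n in names}
    return resolved

props, transforms = load(PROPS), load(TRANSFORMS)
for report, stanzas in resolve_reports(props, transforms, "my_sourcetype").items():
    print(report, "->", list(stanzas))
# REPORT-pipes -> ['pipe_eq', 'commalist']
```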

This documentation applies to the following versions of Splunk® Enterprise: 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2

