Splunk® Enterprise

Knowledge Manager Manual

Configure advanced extractions with field transforms

A transform extraction is made up of two components: a field transform configuration in transforms.conf and a REPORT-<class> field extraction configuration in props.conf. You can find transforms.conf and props.conf in $SPLUNK_HOME/etc/system/local. This topic shows you how to configure field transforms in transforms.conf. To configure a field transform in Splunk Web, see Manage field transforms.

In transform extractions, the regular expression is in transforms.conf and the field extraction is in props.conf. You can apply one regular expression to multiple field extraction configurations, or have multiple regular expressions for one field extraction configuration. See Configure custom fields at search time.

Field transforms contain a field-extracting regular expression and other settings that govern the way the transform extracts fields. Field transforms are always created in conjunction with field extraction stanzas in props.conf.

Transform extractions and the search-time operations sequence

Search-time operation order

Extraction transforms are third in the search-time operations sequence and are processed after inline field extractions.

See Extracting a field that was already extracted during inline field extraction.

Restrictions

Splunk software processes all inline field extractions that belong to a specific host, source, or source type in ASCII sort order according to their <class> value. In ASCII, uppercase letters sort before lowercase letters, so EXTRACT-ZZZ is processed before EXTRACT-aaa, and EXTRACT-aaa is processed before EXTRACT-ddd. As a result, you cannot reference a field extracted by EXTRACT-aaa in the field extraction definition for EXTRACT-ZZZ, but you can reference it in the field extraction definition for EXTRACT-ddd.
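The following props.conf fragment is a minimal sketch of this ordering rule. The source type my_sourcetype, the class names, and the fields status, user, and user_domain are hypothetical. It uses the EXTRACT form "<regex> in <src_field>", which applies a regular expression to a field that has already been extracted.

```ini
# props.conf (hypothetical source type, classes, and field names)
[my_sourcetype]
# Processed first: "ZZZ" sorts before "aaa" because uppercase Z (0x5A)
# comes before lowercase a (0x61) in ASCII. The "user" field does not
# exist yet at this point, so a definition such as the following would
# not work here:
# EXTRACT-ZZZ = @(?<user_domain>\S+)$ in user
EXTRACT-ZZZ = status=(?<status>\d+)

# Processed second: extracts "user" (for example alice@example.com).
EXTRACT-aaa = user=(?<user>\S+)

# Processed third: "user" already exists, so it can be the source field.
EXTRACT-ddd = @(?<user_domain>\S+)$ in user
```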

For more information

See The sequence of search-time operations.

Configure a transform extraction

Transform extractions use the REPORT extraction configuration in props.conf. Each REPORT extraction stanza references a field transform that is defined in transforms.conf. The field transform contains the regular expression that Splunk Enterprise uses to extract fields at search time, and other settings that govern the way that the transform extracts those fields.

Caution: Do not edit files in $SPLUNK_HOME/etc/system/default/. An upgrade or migration will overwrite your configuration and cause Splunk software to break.

Prerequisites
Review the following topics.

Steps

  1. Identify the source type, source, or host that provides the events that your field is extracted from.
    Extraction configurations in props.conf are restricted to a specific source, source type, or host.
  2. Configure a regular expression that identifies the field in the event.
    If your events list field/value pairs or field values separated by ASCII delimiters, you can instead configure a delimiter-based field extraction, which does not require a regular expression.
  3. Configure a field transform in transforms.conf that uses this regular expression or delimiter configuration.
    The transform can also define a source key (SOURCE_KEY) and the format of the extracted field/value pairs (FORMAT).
  4. Follow the format for the REPORT field extraction type to configure a field extraction stanza in props.conf that uses the host, source, or source type identified earlier.
  5. (Optional) Configure additional field extraction stanzas for other hosts, sources, and source types that refer to the same field transform.
  6. Restart your Splunk deployment for your changes to take effect.
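The steps above can be sketched as the following pair of configuration fragments. This is a minimal illustration, assuming a hypothetical source type named my_app_logs, a hypothetical host named web01, and events that contain text such as user=alice. The transform stanza name extract_user and the class name user are also illustrative.

```ini
# $SPLUNK_HOME/etc/system/local/transforms.conf
# Step 3: the named capturing group "user" becomes the field name.
[extract_user]
REGEX = user=(?<user>\w+)

# $SPLUNK_HOME/etc/system/local/props.conf
# Step 4: reference the transform from a REPORT extraction for the source type.
[my_app_logs]
REPORT-user = extract_user

# Step 5 (optional): reuse the same transform for events from a host.
[host::web01]
REPORT-user = extract_user
```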

Field transform syntax

You can use field transforms in two ways: for regex-based field extractions and for delimiter-based field extractions. Use the following format when you define a search-time field transform in transforms.conf:

[<unique_transform_stanza_name>]
REGEX = <regular expression>
FORMAT = <string>
MATCH_LIMIT = <integer>
DEPTH_LIMIT = <integer>
SOURCE_KEY = <string>
DELIMS = <quoted string list>
FIELDS = <quoted string list>
MV_ADD = [true|false]
CLEAN_KEYS = [true|false]
KEEP_EMPTY_VALS = [true|false]
CAN_OPTIMIZE = [true|false]

A <unique_transform_stanza_name> is required for all search-time transforms. Stanza names do not have to follow field name syntax restrictions (see field name syntax): you can use characters other than a-z, A-Z, and 0-9, including spaces. Stanza names are not subject to key cleaning.

Field transforms support the following settings. If a setting is not included in the transforms.conf file, the default for that setting is applied. See the "Field transform syntax descriptions" section on this page for a description of each setting.

Setting | Default | Required or optional
REGEX | Empty string | Required, unless you are setting up an ASCII-only delimiter-based field extraction. See DELIMS.
FORMAT | Empty string | Optional
MATCH_LIMIT | 100000 | Optional
DEPTH_LIMIT | 1000 | Optional
SOURCE_KEY | _raw | Optional
DELIMS | Empty string | Optional
FIELDS | Empty string | Optional
MV_ADD | False | Optional
CLEAN_KEYS | True | Optional
KEEP_EMPTY_VALS | False | Optional
CAN_OPTIMIZE | True | Optional

Field transform syntax descriptions


REGEX

A regular expression that operates on your data to extract fields.

REGEX and the FORMAT field

Named capturing groups in the REGEX are extracted directly to fields. You do not have to specify FORMAT for simple field extraction cases.

If the REGEX extracts both the field name and its corresponding value, you can use the special capturing groups _KEY_<string> and _VAL_<string> to avoid specifying the mapping in FORMAT. Use the same <string> value for the name group and the value group of each key-value pair. For example, you could use _KEY_1 and _VAL_1 as the capturing groups for a field name and its corresponding value.

Example of REGEX and FORMAT

Using FORMAT:
REGEX = ([a-z]+)=([a-z]+)
FORMAT = $1::$2

is equivalent to not using FORMAT:
REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)

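Example of using REGEX for DELIMS-like functionality

DELIMS supports only ASCII delimiters. When your delimiters include non-ASCII characters, use REGEX instead. Each invalid DELIMS configuration below is followed by an equivalent valid REGEX configuration.

Invalid DELIMS configuration:
DELIMS = "¦|", "≈="

Valid REGEX configuration:
REGEX = ^([^¦|≈=]+)[≈=]([^¦|≈=]+)[¦|]([^¦|≈=]+)[≈=]([^¦|≈=]+)[¦|]([^¦|≈=]+)[≈=]([^¦|≈=]+)$
FORMAT = $1::$2 $3::$4 $5::$6

Invalid DELIMS configuration:
DELIMS = "¦|"
FIELDS = ace, bubbles, cupcake

Valid REGEX configuration:
REGEX = ^(?<ace>[^¦|]+)[¦|](?<bubbles>[^¦|]+)[¦|](?<cupcake>[^¦|]+)$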

FORMAT

Use FORMAT to specify the format of the field/value pair(s) that you are extracting. You do not need to specify FORMAT if you have a simple REGEX with named capturing groups.


Configuration

For search-time extractions, the pattern for the FORMAT field is as follows:

FORMAT = <field-name>::<field-value>( <field-name>::<field-value>)*

where:
field-name = [<string>|$<extracting-group-number>]
field-value = [<string>|$<extracting-group-number>]

Restrictions

You cannot create concatenated fields with FORMAT at search time. This functionality is available only for index-time field transforms. To concatenate a set of regular expression extractions into a single field value, use FORMAT in an index-time field extraction. For example, if your event data contains the string 192(x)0(y)2(z)1, you can extract it at index time as an IP address field value in the format 192.0.2.1. See Configure index-time field extractions in the Getting Data In manual. Avoid making extensive changes to your set of indexed fields, because doing so can negatively impact indexing performance and search times.

Example of search-time FORMAT usage

  1. FORMAT = firstfield::$1 secondfield::$2 thirdfield::other-value
  2. FORMAT = $1::$2

If you configure FORMAT with a variable field name, the regular expression is repeatedly applied to the source event text to match and extract all field/value pairs.
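For example, assuming a hypothetical event such as action=login user=alice status=success, the following transforms.conf sketch uses a variable field name in FORMAT. Because the field name comes from a capturing group, the regular expression is applied repeatedly, and the transform is expected to extract the fields action, user, and status. The stanza name all_kv_pairs is illustrative.

```ini
# transforms.conf
# Sample event (hypothetical): action=login user=alice status=success
[all_kv_pairs]
# Group 1 captures the field name, group 2 captures the field value.
REGEX = (\w+)=(\w+)
# Variable field name: the REGEX is reapplied to match every pair.
FORMAT = $1::$2
```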


MATCH_LIMIT

Use MATCH_LIMIT to set an upper bound on how many times PCRE calls an internal function, match(). If set too low, PCRE may fail to correctly match a pattern.


Configuration

Limits the amount of resources that are spent by PCRE when running patterns that will not match. Defaults to 100000.


DEPTH_LIMIT

Use DEPTH_LIMIT to limit the depth of nested backtracking in an internal PCRE function, match(). If set too low, PCRE might fail to correctly match a pattern.


Configuration

Limits the amount of resources that are spent by PCRE when running patterns that will not match. Defaults to 1000.
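Both limits are set per transform stanza. The following sketch raises them for a hypothetical transform whose pattern relies on more backtracking. The stanza name, the pattern, and the values 200000 and 2000 are illustrative only, not tuning recommendations.

```ini
# transforms.conf
[complex_pattern]
REGEX = ^(?<prefix>(?:\w+\s)+?)id=(?<id>\d+)
# Upper bound on calls to the internal PCRE match() function (default 100000).
MATCH_LIMIT = 200000
# Upper bound on nested backtracking depth in match() (default 1000).
DEPTH_LIMIT = 2000
```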


SOURCE_KEY

Use SOURCE_KEY to extract values from a field other than _raw, which is the default. You can use any field that is available when this field extraction runs.


Configuration

To configure SOURCE_KEY, identify the field to which the transform's REGEX is to be applied.
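A minimal sketch of SOURCE_KEY, assuming an inline extraction has already produced a field named uri for a hypothetical source type my_web_logs. Because inline extractions are processed before transform extractions, uri is available when the transform runs. The stanza, class, and field names are hypothetical.

```ini
# props.conf
[my_web_logs]
# Inline extraction: processed first, creates the "uri" field.
EXTRACT-uri = (?:GET|POST)\s+(?<uri>\S+)
# Transform extraction: processed afterward, reads from "uri".
REPORT-uri_parts = split_uri

# transforms.conf
[split_uri]
# Apply REGEX to the uri field instead of the default _raw.
SOURCE_KEY = uri
REGEX = ^(?<uri_path>[^?]+)(?:\?(?<uri_query>.*))?$
```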


DELIMS

Use DELIMS in place of REGEX for ASCII-only delimiter-based field extractions, where field values or field/value pairs are separated by delimiters such as commas, colons, spaces, tabs, line breaks, and so on.

Configuration

Each ASCII character in the delimiter string is used as a delimiter to split the event. If the event contains full delimiter-separated field value pairs, you enter two sets of quoted delimiters for DELIMS. The first set of quoted delimiters separates the field value pairs. The second set of quoted delimiters separates the field name from its corresponding value.

If the events contain only delimiter-separated values (no field names), use one set of quoted delimiters to separate the values, and use the FIELDS setting to apply field names to the extracted values. If you do not set FIELDS, Splunk software instead reads even tokens as field names and odd tokens as field values.

Restrictions

Delimiters must be specified within double quotes (DELIMS="|,;"). Special escape sequences are \t (tab), \n (newline), \r (carriage return), \\ (backslash) and \" (double quotes). If a value contains an embedded unescaped double quote character, such as "foo"bar", use REGEX, not DELIMS. Non-ASCII delimiters require the use of REGEX. See REGEX for examples of usage of DELIMS-like functionality.

Example

The following example of DELIMS usage applies to an event where field value pairs are separated by '|' symbols, and the field names are separated from their corresponding values by '=' symbols.

[pipe_eq]
DELIMS = "|", "="
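To put the pipe_eq transform to use, reference it from a REPORT extraction in props.conf. The source type my_pipe_logs, the class name, and the sample event are hypothetical. Based on the description above, an event such as user=alice|action=login|status=ok is expected to yield the fields user, action, and status.

```ini
# Sample event (hypothetical): user=alice|action=login|status=ok

# transforms.conf
[pipe_eq]
# First set separates pairs; second set separates names from values.
DELIMS = "|", "="

# props.conf
[my_pipe_logs]
REPORT-pipe_fields = pipe_eq
```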


FIELDS

Use FIELDS with DELIMS when you perform a delimiter-based field extraction and your events contain only field values. FIELDS provides names for the extracted field values, listed in the order in which the values are extracted.

If a field name contains spaces or commas, enclose it in double quotes ("). To escape a character, use a backslash (\).

Example

Following is an example of a delimiter-based extraction where three field values appear in an event. They are separated by a comma and a space.

[commalist]
DELIMS = ", "
FIELDS = field1, field2, field3
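The following sketch illustrates the quoting rule for field names that contain spaces or commas. The stanza name, field names, and sample event are hypothetical, and a single pipe delimiter is used so that commas can appear inside a quoted field name.

```ini
# transforms.conf
# Sample event (hypothetical): alice|Acme Corp|EMEA
[pipe_values]
DELIMS = "|"
# Names that contain spaces or commas are enclosed in double quotes.
FIELDS = user, "company name", "region, country"
```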


MV_ADD

Use MV_ADD for events that have multiple occurrences of the same field with different values, and you want to keep each value.

See Extracting a field that was already extracted during inline field extraction.

Configuration

When MV_ADD = true, Splunk software turns a field that appears multiple times in an event with different values into a multivalue field. The field name appears once, and the field holds each of the extracted values.

When MV_ADD = false, Splunk software keeps the first value found for a field in an event, and discards every subsequent value found.
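A minimal sketch, assuming a hypothetical event in which the tag field appears twice, such as tag=red tag=blue. With MV_ADD = true, tag is expected to become a multivalue field holding both red and blue; with the default of false, only the first value, red, is kept. The stanza name all_tags is illustrative.

```ini
# transforms.conf
# Sample event (hypothetical): tag=red tag=blue
[all_tags]
REGEX = (\w+)=(\w+)
FORMAT = $1::$2
# Keep every value of a repeated field as a multivalue field.
MV_ADD = true
```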


CLEAN_KEYS

Controls whether the system strips leading underscores and 0-9 characters from the field names it extracts. Key cleaning is the practice of replacing any non-alphanumeric characters in field names with underscores, as well as the removal of leading underscores and 0-9 characters from field names.

Configuration

Add CLEAN_KEYS = false to your transform to keep your field names intact with no removal of leading underscores or 0-9 characters.
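A minimal sketch, assuming hypothetical events that contain keys with leading underscores, such as _session=abc123. With CLEAN_KEYS = false, the transform keeps such names as extracted instead of applying key cleaning. The stanza name raw_keys is illustrative.

```ini
# transforms.conf
# Sample event (hypothetical): _session=abc123 _retry=2
[raw_keys]
# \w also matches the underscore, so "_session" is captured whole.
REGEX = (\w+)=(\w+)
FORMAT = $1::$2
# Keep leading underscores and digits in extracted field names.
CLEAN_KEYS = false
```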


KEEP_EMPTY_VALS

Controls whether Splunk software keeps field value pairs when the value is an empty string.

This option does not apply to field/value pairs that are generated by the Splunk software autoKV extraction (automatic field extraction) process. AutoKV ignores field/value pairs with empty values.
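For example, in a hypothetical event such as user=alice|dept=|role=admin, the dept field has an empty value. The following delimiter-based sketch is expected to keep dept as a field with an empty value because KEEP_EMPTY_VALS = true; with the default of false, that pair is not kept. The stanza name is illustrative.

```ini
# transforms.conf
# Sample event (hypothetical): user=alice|dept=|role=admin
[pipe_eq_keep_empty]
DELIMS = "|", "="
# Keep field/value pairs whose value is an empty string.
KEEP_EMPTY_VALS = true
```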


CAN_OPTIMIZE

Controls whether Splunk software can disable the extraction.

Set CAN_OPTIMIZE = false when you run searches under a search mode that disables field discovery, such as Fast mode, and you want to ensure that Splunk software discovers the fields this transform extracts. By default, Splunk software disables an extraction when none of the fields identified by the extraction are needed to evaluate a search.
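A minimal sketch of a transform that opts out of this optimization. The stanza name and the order_id pattern are hypothetical.

```ini
# transforms.conf
[always_extract_order_id]
REGEX = order_id=(?<order_id>\d+)
# Prevent Splunk software from disabling this extraction, even when
# the search does not appear to need order_id.
CAN_OPTIMIZE = false
```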


Syntax for a transform-referencing field extraction configuration

To set up a search-time field extraction in props.conf that is associated with a field transform, use the REPORT field extraction class. Use the following format.

[<spec>]
REPORT-<class> = <unique_transform_stanza_name1>, <unique_transform_stanza_name2>,...
<spec> | Description
<source type> | Source type of an event
host::<host> | Host for an event
source::<source> | Source for an event

You can associate multiple field transform stanzas with a single field extraction by listing them after the initial <unique_transform_stanza_name>, separated by commas. See Examples of transform extractions.

REPORT-<class> element | Description
<class> | A unique literal string that identifies the namespace of the field you are extracting. <class> values do not have to follow field name syntax restrictions and are not subject to key cleaning.
<unique_transform_stanza_name> | Name of your field transform stanza from transforms.conf.
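A props.conf sketch that combines the three <spec> forms and a REPORT-<class> setting that lists more than one transform. It reuses the pipe_eq and commalist transforms defined in the DELIMS and FIELDS examples on this page; the source type, host, source path, and class name are hypothetical.

```ini
# props.conf
# Source type: two transforms listed in one REPORT setting, separated by commas.
[my_sourcetype]
REPORT-app_fields = pipe_eq, commalist

# Host: one transform for events from a specific host.
[host::web01]
REPORT-app_fields = pipe_eq

# Source: one transform for events from a specific file.
[source::/var/log/myapp/app.log]
REPORT-app_fields = commalist
```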

Extracting a field that was already extracted during inline field extraction

As a result of the sequence of search-time operations, inline field extractions (also known as EXTRACT configurations) are processed before transform field extractions (also known as REPORT configurations). This order has implications for fields that are extracted during both of these operations. For example, say an inline field extraction extracts a field called userName. Then a subsequent transform field extraction extracts another field called userName that has a different value. Because the inline field extraction happens first in the sequence, by default, its version of the userName field is retained and the version of the field extracted by the transform field extraction is discarded. This happens because the MV_ADD setting is set to false by default, so the "old" value that is found for a field in an event is kept, and every subsequent "new" value that is found is discarded. In other words, the EXTRACT configuration "wins" over the REPORT configuration.

But what if you want to keep the value for the field that is extracted second, by the transform field extraction? Set MV_ADD to true in the field transform so that its value is not discarded when a field with the same name has already been extracted. When MV_ADD is true, fields that appear multiple times in an event with different values are transformed into multivalue fields. As a result, the field name, such as userName in this example, appears only once, and both the "old" and "new" values are preserved in a multivalue field.
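A minimal sketch of the userName scenario, assuming a hypothetical source type my_auth_logs and hypothetical patterns in which both an inline extraction and a transform extract userName. Setting MV_ADD = true on the transform is expected to keep both values when they differ.

```ini
# props.conf
[my_auth_logs]
# Inline extraction: processed first, so its userName value is kept.
EXTRACT-user = login=(?<userName>\w+)
# Transform extraction: processed after the EXTRACT configuration.
REPORT-user = user_from_account

# transforms.conf
[user_from_account]
REGEX = account=(?<userName>\w+)
# Without this setting, the transform's userName value is discarded.
MV_ADD = true
```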

Last modified on 17 July, 2024

This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.11, 8.1.13, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.1.7, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.2.4, 9.3.0, 9.3.1, 9.3.2, 9.4.0, 8.1.10, 8.1.12, 8.1.14, 8.1.2

