Configure advanced extractions with field transforms
A transform extraction is made up of two components: a field transform configuration in transforms.conf and a REPORT-<class> field extraction configuration in props.conf. You can find both files in $SPLUNK_HOME/etc/system/local. This section shows you how to configure field transforms in transforms.conf. To configure a field transform in Splunk Web, see manage field transforms.
In transform extractions, the regular expression is in transforms.conf and the field extraction is in props.conf. You can apply one regular expression to multiple field extraction configurations, or have multiple regular expressions for one field extraction configuration. See configure custom fields at search time.
Field transforms contain a field-extracting regular expression and other settings that govern the way the transform extracts fields. Field transforms are always created in conjunction with field extraction stanzas in props.conf.
Transform extractions and the search-time operations sequence
Search-time operation order
Transform extractions are second in the search-time operations sequence and are processed after inline field extractions.
See Extracting a field that was already extracted during inline field extraction.
Splunk software processes all inline field extractions that belong to a specific host, source, or source type in ASCII sort order according to their <class> value. Because uppercase letters sort before lowercase letters in ASCII, EXTRACT-ZZZ is processed before EXTRACT-aaa. You cannot reference a field extracted by EXTRACT-aaa in the field extraction definition for EXTRACT-ZZZ, but you can reference a field extracted by EXTRACT-aaa in the field extraction definition for an inline extraction whose <class> value sorts after aaa.
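The ordering rule can be illustrated with a small props.conf sketch. The stanza name, field names, and regular expressions here are hypothetical and chosen only to show the sort order. The second extraction uses the "<regex> in <src_field>" form of EXTRACT to read a field produced by the first.

```ini
# props.conf (hypothetical source type and field names)
[my_sourcetype]
# Processed first: "Z" (0x5A) sorts before "a" (0x61) in ASCII.
EXTRACT-ZZZ = user=(?<user>\w+)
# Processed second, so it can apply its regex to the "user" field extracted above.
EXTRACT-aaa = ^(?<user_initial>\w) in user
```

Swapping the two class names would reverse the processing order, and the extraction that reads the user field would then run before that field exists.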
For more information
See The sequence of search-time operations.
Configure a transform extraction
Transform extractions use the REPORT extraction configuration in props.conf. Each REPORT extraction stanza references a field transform that is defined in transforms.conf. The field transform contains the regular expression that Splunk Enterprise uses to extract fields at search time, and other settings that govern the way that the transform extracts those fields.
Caution: Do not edit files in $SPLUNK_HOME/etc/system/default/. An upgrade or migration will overwrite your configuration and cause Splunk software to break.
Review the following topics.
- Configure custom fields at search time for information on different types of field extraction.
- Configure inline extractions for information on configuring inline extractions.
- About default fields (host, source, source type, and more) for information about hosts, sources, and sourcetypes.
- Regular expressions and field name syntax for information about field-extracting regular expressions.
- "Field transform syntax" on this page for information on the format for transform definitions.
- "Syntax for a transform-referencing field extraction configuration" on this page for the syntax of transformation extractions.
Then complete the following steps.
- Access the props.conf and transforms.conf files, located in $SPLUNK_HOME/etc/system/local/ or in your custom app directory.
- Identify the source type, source, or host that provides the events that your field is extracted from.
Extraction configurations in props.conf are restricted to a specific source, source type, or host.
- Configure a regular expression that identifies the field in the event.
If your event lists field/value pairs or field values, configure a delimiter-based field extraction that does not require a regular expression.
- Configure a field transform in transforms.conf that uses this regular expression or delimiter configuration.
The transform can define a source key and event value formatting.
- Follow the format for the REPORT field extraction type to configure a field extraction stanza in props.conf that uses the host, source, or source type identified earlier.
- (Optional) You can configure additional field extraction stanzas for other hosts, sources, and source types that refer to the same field transform.
- Restart your Splunk deployment for your changes to take effect.
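The steps above can be sketched as a minimal pair of stanzas. All names here (the transform name, the source type, and the status_code field) are hypothetical, and the regular expression assumes events that contain text such as status=404. Because the REGEX uses a name-capturing group, no FORMAT setting is needed.

```ini
# transforms.conf (hypothetical transform name and field name)
[extract_status_code]
REGEX = status=(?<status_code>\d{3})

# props.conf (hypothetical source type)
[my_sourcetype]
REPORT-status = extract_status_code
```

A second props.conf stanza for another host, source, or source type could reference extract_status_code in the same way, which is the main reason to prefer a transform over an inline extraction.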
Field transform syntax
There are two ways to use transforms: one for regex-based field extractions and one for delimiter-based field extractions. Use the following format when you define a search-time field transform in transforms.conf.
[<unique_transform_stanza_name>]
REGEX = <regular expression>
FORMAT = <string>
MATCH_LIMIT = <integer>
DEPTH_LIMIT = <integer>
SOURCE_KEY = <string>
DELIMS = <quoted string list>
FIELDS = <quoted string list>
MV_ADD = [true|false]
CLEAN_KEYS = [true|false]
KEEP_EMPTY_VALS = [true|false]
CAN_OPTIMIZE = [true|false]
<unique_transform_stanza_name> is required for all search-time transforms.
<unique_transform_stanza_name> values are not required to follow field name syntax restrictions. See field name syntax. You can use characters other than a-z, A-Z, and 0-9, and spaces are allowed. They are not subject to key cleaning.
Field transforms support the following settings. If a setting is not specified or included in the transforms.conf file, the default for that setting is applied.
REGEX: Defaults to an empty string. Required unless you are setting up an ASCII-only delimiter-based field extraction. See DELIMS.
Field transform syntax descriptions
REGEX
A regular expression that operates on your data to extract fields.
REGEX and the FORMAT setting
Name-capturing groups in the REGEX are extracted directly to fields. You do not have to specify FORMAT for simple field extraction cases.
If the REGEX extracts both the field name and its corresponding value, you can use the special capturing groups _KEY_<string> and _VAL_<string> to avoid specifying the mapping in FORMAT. The <string> value should be the same for each key-value pair represented in the regular expression. For example, you could use _KEY_1 and _VAL_1 as the capturing groups for a field name and its corresponding value.
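As a rough illustration of the two approaches, the following sketch defines the same key-value extraction twice. The stanza names are hypothetical, and the character classes assume lowercase alphabetic names and values such as color=red.

```ini
# transforms.conf (hypothetical stanza names)

# Using FORMAT: numbered capturing groups are mapped to name and value.
[pairs_with_format]
REGEX = ([a-z]+)=([a-z]+)
FORMAT = $1::$2

# Not using FORMAT: the special _KEY_1 and _VAL_1 groups carry the mapping.
[pairs_without_format]
REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
```

Both stanzas are intended to produce the same fields. The second form keeps the mapping inside the regular expression, which can be easier to read when there is only one pair per match.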
Example of using REGEX for a non-ASCII delimiter.
FORMAT
Use FORMAT to specify the format of the field/value pairs that you are extracting. You do not need to specify FORMAT if you have a simple REGEX with name-capturing groups.
For search-time extractions, the pattern for the FORMAT field is as follows:
FORMAT = <field-name>::<field-value>(<field-name>::<field-value>)*
field-name = [<string>|$<extracting-group-number>]
field-value = [<string>|$<extracting-group-number>]
You cannot create concatenated fields with FORMAT at search time. This functionality is available only for index-time field transforms. To concatenate a set of regular expression extractions into a single field value, use FORMAT in an index-time extraction. For example, if you have the string 192(x)0(y)2(z)1 in your event data, you can extract it at index time as an ip address field value in the format 192.0.2.1. See Configure index-time field extractions in the Getting Data In manual. Do not make extensive changes to your set of indexed fields, because doing so can negatively impact indexing performance and search times.
Examples of search-time FORMAT usage:
- FORMAT = firstfield::$1 secondfield::$2 thirdfield::other-value
- FORMAT = $1::$2
If you configure FORMAT with a variable field name, the regular expression is repeatedly applied to the source event text to match and extract all field/value pairs.
MATCH_LIMIT
Use MATCH_LIMIT to set an upper bound on how many times PCRE calls an internal function, match(). If set too low, PCRE might fail to correctly match a pattern.
This setting limits the amount of resources that PCRE spends when running patterns that will not match. Defaults to 100000.
DEPTH_LIMIT
Use DEPTH_LIMIT to limit the depth of nested backtracking in an internal PCRE function, match(). If set too low, PCRE might fail to correctly match a pattern.
This setting limits the amount of resources that PCRE spends when running patterns that will not match. Defaults to 1000.
SOURCE_KEY
Use SOURCE_KEY to extract values from another field. You can use any field that is available at the time of the execution of this field extraction.
For SOURCE_KEY, identify the field to which the transform's REGEX is to be applied.
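A minimal sketch of SOURCE_KEY follows. It assumes that a field named email is already available when the transform runs, for example from an earlier inline extraction. The stanza name, the email field, and the email_domain field are all hypothetical.

```ini
# transforms.conf (hypothetical stanza and field names)
[extract_domain_from_email]
# Apply the REGEX to the value of the "email" field instead of the raw event text.
SOURCE_KEY = email
REGEX = @(?<email_domain>[^@\s]+)$
```

If the email field is not present on an event, the transform has nothing to match against and extracts nothing for that event.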
DELIMS
Use DELIMS in place of REGEX when dealing with ASCII-only delimiter-based field extractions, where field values or field/value pairs are separated by delimiters such as commas, colons, spaces, tab spaces, line breaks, and so on.
Each ASCII character in the delimiter string is used as a delimiter to split the event. If the event contains full delimiter-separated field/value pairs, enter two sets of quoted delimiters for DELIMS. The first set of quoted delimiters separates the field/value pairs. The second set of quoted delimiters separates the field name from its corresponding value.
If the events contain only delimiter-separated values (no field names), use one set of quoted delimiters to separate the values. Use the FIELDS setting to apply field names to the extracted values. Alternatively, Splunk software reads even tokens as field names and odd tokens as field values.
Delimiters must be specified within double quotes (DELIMS="|,;"). Special escape sequences are \t (tab), \n (newline), \r (carriage return), \\ (backslash) and \" (double quotes). If a value contains an embedded unescaped double quote character, such as "foo"bar", use REGEX, not DELIMS. Non-ASCII delimiters require the use of REGEX. See REGEX for examples of usage of DELIMS-like functionality.
The following example of DELIMS usage applies to an event where field/value pairs are separated by '|' symbols, and the field names are separated from their corresponding values by '=' symbols.
DELIMS = "|", "="
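Placed in a complete stanza, the setting might look like the following sketch. The stanza name and the sample event in the comment are hypothetical.

```ini
# transforms.conf (hypothetical stanza name)
# Hypothetical event: user=alice|action=login|status=ok
[pipe_delimited_pairs]
# First set splits the event into pairs; second set splits each pair into name and value.
DELIMS = "|", "="
```

For the sample event, the intent is to produce the fields user, action, and status without writing a regular expression.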
FIELDS
Use FIELDS in conjunction with DELIMS when you perform delimiter-based field extraction and you only have field values to extract. Use FIELDS to provide field names for the extracted field values in list format, according to the order in which the values are extracted.
If field names contain spaces or commas, enclose them in double quotes (" "). To escape a character, use a backslash (\).
Following is an example of a delimiter-based extraction where three field values appear in an event. They are separated by a comma and a space.
DELIMS = ", "
FIELDS = field1, field2, field3
MV_ADD
Use MV_ADD for events that have multiple occurrences of the same field with different values, and you want to keep each value.
See Extracting a field that was already extracted during inline field extraction.
When MV_ADD = true, Splunk software transforms fields that appear multiple times in an event with different values into multivalue fields. The field name appears once. The multiple values for the field follow the = sign.
When MV_ADD = false, Splunk software keeps the first value found for a field in an event, and discards every subsequent value found.
CLEAN_KEYS
Controls whether the system cleans the field names it extracts. Key cleaning is the practice of replacing any non-alphanumeric characters in field names with underscores, as well as the removal of leading underscores and 0-9 characters from field names.
Add CLEAN_KEYS = false to your transform to keep your field names intact with no removal of leading underscores or 0-9 characters.
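As a sketch of where the setting goes, the following hypothetical transform extracts key-value pairs whose names may contain dots or hyphens, such as http.status-code=200, and turns key cleaning off so that those names are kept as written. The stanza name and the character classes are assumptions made for illustration.

```ini
# transforms.conf (hypothetical stanza name)
[keep_raw_field_names]
REGEX = (?<_KEY_1>[\w.-]+)=(?<_VAL_1>\S+)
# Without this line, key cleaning would replace "." and "-" in field names with underscores.
CLEAN_KEYS = false
```

Field names that are not purely alphanumeric can be awkward to reference in searches, so this setting is usually worth changing only when the original names must be preserved.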
KEEP_EMPTY_VALS
Controls whether Splunk software keeps field/value pairs when the value is an empty string.
This option does not apply to field/value pairs that are generated by the Splunk software autoKV extraction (automatic field extraction) process. AutoKV ignores field/value pairs with empty values.
CAN_OPTIMIZE
Controls whether Splunk software can disable the extraction.
Set CAN_OPTIMIZE = false when you run searches under a search mode setting that disables field discovery, to ensure that Splunk software discovers specific fields. Splunk software disables an extraction when none of the fields identified by the extraction are needed for the evaluation of a search.
Syntax for a transform-referencing field extraction configuration
To set up a search-time field extraction in props.conf that is associated with a field transform, use the REPORT field extraction class. Use the following format.
[<spec>]
REPORT-<class> = <unique_transform_stanza_name1>, <unique_transform_stanza_name2>,...
<spec> can take the following forms:
- <sourcetype>: Source type of an event
- host::<host>: Host for an event
- source::<source>: Source for an event
You can associate multiple field transform stanzas to a single field extraction by listing them after the initial <unique_transform_stanza_name>, separated by commas. See examples of transform extractions.
- <class>: A unique literal string that identifies the namespace of the field you are extracting. <class> values do not have to follow field name syntax restrictions and are not subject to key cleaning.
- <unique_transform_stanza_name>: Name of your field transform stanza from transforms.conf.
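The syntax can be sketched with two props.conf stanzas. The host name, source path, class names, and transform names below are hypothetical, and the transforms are assumed to be defined elsewhere in transforms.conf.

```ini
# props.conf (hypothetical host, source, class, and transform names)

# Host-based stanza that applies two transforms, listed in a comma-separated list.
[host::webserver01]
REPORT-web = extract_status_code, extract_domain_from_email

# Source-based stanza that reuses one of the same transforms.
[source::/var/log/app.log]
REPORT-app = extract_status_code
```

Reusing a transform across stanzas keeps the regular expression in one place, so a later fix to the pattern applies everywhere it is referenced.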
Extracting a field that was already extracted during inline field extraction
As a result of the sequence of search-time operations, inline field extractions (also known as EXTRACT configurations) are processed before transform field extractions (also known as REPORT configurations). This order has implications for fields that are extracted during both of these operations. For example, say an inline field extraction extracts a field called userName. Then a subsequent transform field extraction extracts another field called userName that has a different value. Because the inline field extraction happens first in the sequence, by default, its version of the userName field is retained and the version of the field extracted by the transform field extraction is discarded. This happens because the MV_ADD setting is set to false by default, so the "old" value that is found for a field in an event is kept, and every subsequent "new" value that is found is discarded. In other words, the EXTRACT configuration "wins" over the REPORT configuration.
But what if you want to keep the value for the field that is extracted second in line by the transform field extraction? You can set MV_ADD to true in the field transform so that the later value is not discarded in favor of the value that has already been extracted. When MV_ADD is set to true, fields that appear multiple times in an event with different values are transformed into multivalue fields. As a result, the field name, such as userName in our example, appears only once and both the "old" and "new" values are preserved in a multivalue field.
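The userName scenario might be configured as in the following sketch. The source type, class names, transform name, and regular expressions are hypothetical. They assume events that contain both a phrase such as "login by alice" and a pair such as user=bob.

```ini
# transforms.conf (hypothetical transform name)
[extract_username]
REGEX = user=(?<userName>\w+)
# Keep this value alongside any userName value that was extracted earlier.
MV_ADD = true

# props.conf (hypothetical source type)
[my_sourcetype]
# Inline extraction: processed first.
EXTRACT-user = login by (?<userName>\w+)
# Transform extraction: processed second.
REPORT-user = extract_username
```

With MV_ADD left at its default of false, only the value from the inline extraction would remain on the event.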
This documentation applies to the following versions of Splunk Cloud Platform™: 9.0.2303, 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209 (latest FedRAMP release)