Configure advanced extractions with field transforms
A transform extraction is made up of two components: a field transform configuration in transforms.conf and a REPORT-<class> field extraction configuration in props.conf. You can find both files in $SPLUNK_HOME/etc/system/local. This section shows you how to configure field transforms in transforms.conf. To configure a field transform in Splunk Web, see Manage field transforms.
In transform extractions, the regular expression is in transforms.conf and the field extraction is in props.conf. You can apply one regular expression to multiple field extraction configurations, or have multiple regular expressions for one field extraction configuration. See Configure custom fields at search time.
Field transforms contain a field-extracting regular expression and other attributes that govern the way the transform extracts fields. Field transforms are always created in conjunction with field extraction stanzas in props.conf.
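As a minimal sketch of how the two components fit together, the following pair of stanzas defines a field transform and references it from a field extraction. The stanza name extract_user_action, the source type app_audit, the class name audit, and the field names are all hypothetical and chosen only for illustration.

```ini
# transforms.conf
# Hypothetical field transform: the regular expression lives here.
[extract_user_action]
REGEX = user=(?<user>\w+)\s+action=(?<action>\w+)

# props.conf
# Hypothetical field extraction: applies the transform to one source type.
[app_audit]
REPORT-audit = extract_user_action
```

The REPORT-<class> line carries no regular expression of its own. It only names the transform, which is what allows several extraction configurations to share one transform.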
Transform extractions and the search-time operations sequence
Search-time operation order
Extraction transforms are second in the search-time operations sequence and are processed after inline field extractions.
Splunk software processes all inline field extractions that belong to a specific host, source, or source type in ASCII sort order according to their <class> value. Because uppercase letters sort before lowercase letters in ASCII, EXTRACT-ZZZ is processed before EXTRACT-aaa. You cannot reference a field extracted by EXTRACT-aaa in the field extraction definition for EXTRACT-ZZZ, but you can reference a field extracted by EXTRACT-aaa in the field extraction definition for any extraction whose <class> value sorts after aaa.
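The ordering rule can be illustrated with a small props.conf sketch. The source type name, field names, and regular expressions here are hypothetical. The example relies on the inline extraction form that reads from a previously extracted field (<regex> in <src_field>).

```ini
# props.conf (hypothetical source type)
[my_sourcetype]
# 'Z' (0x5A) sorts before 'a' (0x61) in ASCII, so EXTRACT-ZZZ is processed first.
EXTRACT-ZZZ = user=(?<user>\S+)
# Processed later, so it can read the user field extracted by EXTRACT-ZZZ.
EXTRACT-aaa = @(?<user_domain>\S+) in user
```

Swapping the two class names would reverse the processing order, and the second extraction would no longer find the user field.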
Configure a transform extraction
Transform extractions use the REPORT extraction configuration in props.conf. Each REPORT extraction stanza references a field transform that is defined in transforms.conf. The field transform contains the regular expression that Splunk Enterprise uses to extract fields at search time, and other attributes that govern the way that the transform extracts those fields.
Caution: Do not edit files in $SPLUNK_HOME/etc/system/default/. An upgrade or migration will overwrite your configuration and cause Splunk software to break.
Review the following topics.
- Configure custom fields at search time for information on different types of field extraction.
- Configure inline extractions for information on configuring inline extractions.
- About default fields (host, source, source type, and more) for information about hosts, sources, and sourcetypes.
- Regular expressions and field name syntax for information about field-extracting regular expressions.
- Field transform syntax for information on the format for transform definitions.
- Syntax for transform configuration for the syntax of transformation extractions.
To configure a transform extraction, complete the following steps.
- Access the props.conf and transforms.conf files, located in $SPLUNK_HOME/etc/system/local/ or in your custom app directory.
- Identify the source type, source, or host that provides the events that your field is extracted from.
Extraction configurations in props.conf are restricted to a specific source, source type, or host.
- Configure a regular expression that identifies the field in the event.
If your event lists field/value pairs or field values, you can instead configure a delimiter-based field extraction that does not require a regular expression.
- Configure a field transform in transforms.conf that uses this regular expression or delimiter configuration.
The transform can define a source key and event value formatting.
- Follow the format for the REPORT field extraction type to configure a field extraction stanza in props.conf that uses the host, source, or source type identified earlier.
- (Optional) You can configure additional field extraction stanzas for other hosts, sources, and source types that refer to the same field transform.
- Restart your Splunk deployment for your changes to take effect.
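The steps above can be sketched as follows for a hypothetical web log whose events contain text such as status=404. The transform name, source type name, source path, and field name are assumptions made for illustration only.

```ini
# transforms.conf
# Field transform that holds the regular expression (hypothetical names).
[extract_status_code]
REGEX = status=(?<status_code>\d{3})

# props.conf
# Field extraction stanza for the source type identified earlier.
[my_web_logs]
REPORT-status = extract_status_code

# Optional: a second stanza for a different source that reuses the same transform.
[source::/var/log/legacy/access.log]
REPORT-status = extract_status_code
```

After a restart, searches over either the source type or the source should be able to use the status_code field.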
Field transform syntax
There are two ways to use transforms: one for regex-based field extractions and one for delimiter-based field extractions. Use the following format when you define a search-time field transform in transforms.conf:
[<unique_transform_stanza_name>]
REGEX = <regular expression>
FORMAT = <string>
MATCH_LIMIT = <integer>
DEPTH_LIMIT = <integer>
SOURCE_KEY = <string>
DELIMS = <quoted string list>
FIELDS = <quoted string list>
MV_ADD = [true|false]
CLEAN_KEYS = [true|false]
KEEP_EMPTY_VALS = [true|false]
CAN_OPTIMIZE = [true|false]
<unique_transform_stanza_name> is required for all search-time transforms.
<unique_transform_stanza_name> values are not required to follow field name syntax restrictions. See field name syntax. You can use characters other than a-z, A-Z, and 0-9, and spaces are allowed. They are not subject to key cleaning.
REGEX defaults to an empty string. It is required unless you are setting up an ASCII-only delimiter-based field extraction. See DELIMS.
Field transform syntax descriptions
REGEX is a regular expression that operates on your data to extract fields.
REGEX and FORMAT work together: name-capturing groups in the REGEX are extracted directly to fields, so you do not have to specify FORMAT for simple field extraction cases.
If the REGEX extracts both the field name and its corresponding value, you can use special capturing groups to avoid specifying the mapping in FORMAT.
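The following sketch contrasts the two approaches for events made of lowercase name=value pairs. It assumes the special capturing group names _KEY_<string> and _VAL_<string> as described in the transforms.conf specification. The stanza names are hypothetical.

```ini
# Variant 1: positional groups mapped to a field name and value through FORMAT.
[kv_with_format]
REGEX = ([a-z]+)=([a-z]+)
FORMAT = $1::$2

# Variant 2: special capturing groups carry the mapping, so FORMAT is omitted.
[kv_without_format]
REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
```

Both variants are intended to produce the same fields. The second keeps the whole mapping inside the regular expression.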
You can also use REGEX for delimiter-based extraction when the delimiter is a non-ASCII character.
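Because DELIMS handles only ASCII delimiters, a REGEX-based transform is one way to split events on a non-ASCII character. The following is a sketch under stated assumptions, not a verified configuration: the stanza name and sample event are hypothetical, and it assumes the configuration file is saved as UTF-8 so that the literal section sign is read correctly.

```ini
# Hypothetical event: user=alice§action=login§status=ok
# Pairs are separated by the non-ASCII section sign, names and values by "=".
[extract_section_sign_pairs]
REGEX = ([^=§]+)=([^§]*)
FORMAT = $1::$2
```

The variable field name in FORMAT causes the regular expression to be applied repeatedly, so each pair in the event is picked up in turn.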
Use FORMAT to specify the format of the field/value pairs that you are extracting. You do not need to specify FORMAT if you have a simple REGEX with name-capturing groups.
For search-time extractions, the pattern for the FORMAT field is as follows:
FORMAT = <field-name>::<field-value>( <field-name>::<field-value>)*
field-name = [<string>|$<extracting-group-number>]
field-value = [<string>|$<extracting-group-number>]
You cannot create concatenated fields with FORMAT at search time. This functionality is available only for index-time field transforms. To concatenate a set of regular expression extractions into a single field value, use the FORMAT attribute as an index-time extraction. For example, if you have the string 192(x)0(y)2(z)1 in your event data, you can extract it at index time as an ip address field value in the format 192.0.2.1. See Configure index-time field extractions in the Getting Data In manual. Do not make extensive changes to your set of indexed fields as it can negatively impact indexing performance and search times.
Examples of search-time FORMAT usage:
- FORMAT = firstfield::$1 secondfield::$2 thirdfield::other-value
- FORMAT = $1::$2
If you configure FORMAT with a variable field name, the regular expression is repeatedly applied to the source event text to match and extract all field/value pairs.
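To show FORMAT with fixed field names and a literal value, here is a short sketch for a hypothetical event such as "login by alice from 192.0.2.1". The stanza name and field names are illustrative assumptions.

```ini
# Hypothetical transform: two captured values plus one literal value.
[extract_login]
REGEX = login by (\w+) from (\d+\.\d+\.\d+\.\d+)
# user and src_ip take the captured groups; action is always the string "login".
FORMAT = user::$1 src_ip::$2 action::login
```

Because every field name here is a fixed string, this transform describes one set of fields per event rather than a repeated scan for pairs.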
Use MATCH_LIMIT to set an upper bound on how many times PCRE calls an internal function, match(). If set too low, PCRE might fail to correctly match a pattern.
MATCH_LIMIT limits the amount of resources that PCRE spends when running patterns that will not match. Defaults to 100000.
Use DEPTH_LIMIT to limit the depth of nested backtracking in an internal PCRE function, match(). If set too low, PCRE might fail to correctly match a pattern.
DEPTH_LIMIT limits the amount of resources that PCRE spends when running patterns that will not match. Defaults to 1000.
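A transform that overrides both limits might look like the following sketch. The stanza name, regular expression, and numeric values are hypothetical and are not tuning recommendations.

```ini
# Hypothetical transform with a backtracking-heavy pattern for quoted strings.
[extract_quoted_request]
REGEX = request="(?<request>(?:[^"\\]|\\.)*)"
# Illustrative values only; the defaults are 100000 and 1000.
MATCH_LIMIT = 200000
DEPTH_LIMIT = 2000
```

Raising the limits lets PCRE spend more effort before giving up on a match, at the cost of more resources spent on events that never match.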
Use SOURCE_KEY to extract values from another field. You can use any field that is available at the time of the execution of this field extraction.
To configure SOURCE_KEY, identify the field to which the transform's REGEX is to be applied.
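As an illustration, the following sketch applies a regular expression to a field named uri_query instead of the raw event text. The field uri_query is hypothetical and is assumed to have been extracted already, for example by an inline extraction that runs earlier in the search-time sequence.

```ini
# Hypothetical transform that parses name=value pairs out of a query string field.
[extract_query_params]
# Apply REGEX to the uri_query field rather than the raw event.
SOURCE_KEY = uri_query
REGEX = ([^&=]+)=([^&]*)
FORMAT = $1::$2
```

If uri_query is not available when this transform runs, the transform has nothing to match against and extracts no fields.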
Use DELIMS in place of REGEX when dealing with ASCII-only delimiter-based field extractions, where field values or field/value pairs are separated by delimiters such as commas, colons, spaces, tab spaces, line breaks, and so on.
Each ASCII character in the delimiter string is used as a delimiter to split the event. If the event contains full delimiter-separated field/value pairs, enter two sets of quoted delimiters for DELIMS. The first set of quoted delimiters separates the field/value pairs. The second set of quoted delimiters separates the field name from its corresponding value.
If the events contain only delimiter-separated values (no field names), use one set of quoted delimiters to separate the values. Use the FIELDS attribute to apply field names to the extracted values. Alternatively, Splunk software reads even tokens as field names and odd tokens as field values.
Delimiters must be specified within double quotes (DELIMS="|,;"). Special escape sequences are \t (tab), \n (newline), \r (carriage return), \\ (backslash) and \" (double quotes). If a value contains an embedded unescaped double quote character, such as "foo"bar", use REGEX, not DELIMS. Non-ASCII delimiters also require the use of REGEX. See REGEX for examples of usage of DELIMS-like functionality.
The following example of DELIMS usage applies to an event where field value pairs are separated by '|' symbols, and the field names are separated from their corresponding values by '=' symbols.
DELIMS = "|", "="
Use FIELDS in conjunction with DELIMS when you perform delimiter-based field extraction, and you only have field values to extract. Use FIELDS to provide field names for the extracted field values in list format according to the order in which the values are extracted.
If field names contain spaces or commas, enclose them in double quotes (" "). To escape a character, use a backslash (\).
The following is an example of a delimiter-based extraction where three field values appear in an event. They are separated by a comma and a space.
DELIMS = ", "
FIELDS = field1, field2, field3
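The same approach works with an escape sequence as the delimiter. The following sketch assumes a hypothetical tab-separated event with three columns. The stanza name and field names are illustrative.

```ini
# Hypothetical event (columns separated by tabs): 10042<TAB>alice<TAB>19.99
[extract_tsv_columns]
# \t is the escape sequence for a tab character.
DELIMS = "\t"
# Names are assigned in the order in which the values appear.
FIELDS = order_id, customer, amount
```

With only one set of delimiters and a FIELDS list, each value is paired with the name in the same position.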
Use MV_ADD for events that have multiple occurrences of the same field with different values, when you want to keep each value.
When MV_ADD = true, Splunk software transforms fields that appear multiple times in an event with different values into multivalue fields. The field name appears once. The multiple values for the field follow the = sign.
When MV_ADD = false, Splunk software keeps the first value found for a field in an event, and discards every subsequent value found.
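A sketch of MV_ADD in use, for a hypothetical event in which the same field name occurs twice. The stanza name and sample event are assumptions.

```ini
# Hypothetical event: recipient=alice recipient=bob
[extract_pairs_multivalue]
REGEX = (\w+)=(\S+)
FORMAT = $1::$2
# Keep every value found for a repeated field name instead of only the first.
MV_ADD = true
```

With this setting the event should yield one recipient field that holds both values. With MV_ADD = false, only the first value would be kept.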
CLEAN_KEYS controls whether the system strips leading underscores and 0-9 characters from the field names it extracts. Key cleaning is the practice of replacing any non-alphanumeric characters in field names with underscores, as well as the removal of leading underscores and 0-9 characters from field names.
Add CLEAN_KEYS = false to your transform to keep your field names intact with no removal of leading underscores or 0-9 characters.
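The following sketch turns key cleaning off for a hypothetical event whose field names begin with an underscore or a digit. The stanza name and sample event are illustrative.

```ini
# Hypothetical event: _raw_id=42 9lives=yes
[extract_pairs_keep_names]
REGEX = (\S+)=(\S+)
FORMAT = $1::$2
# Leave extracted field names exactly as they appear in the event.
CLEAN_KEYS = false
```

With key cleaning enabled, the leading underscore and the leading digit would be removed from these names.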
KEEP_EMPTY_VALS controls whether Splunk software keeps field value pairs when the value is an empty string.
This option does not apply to field/value pairs that are generated by the Splunk software autoKV extraction (automatic field extraction) process. AutoKV ignores field/value pairs with empty values.
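As a sketch, the following transform keeps a field even when its value is empty. The stanza name and sample event are hypothetical.

```ini
# Hypothetical event: user=alice comment= status=ok
[extract_pairs_keep_empty]
# \S* allows the value part to match an empty string.
REGEX = (\w+)=(\S*)
FORMAT = $1::$2
# Keep comment as a field with an empty value instead of dropping it.
KEEP_EMPTY_VALS = true
```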
CAN_OPTIMIZE controls whether Splunk software can disable the extraction. Splunk software disables an extraction when none of the fields identified by the extraction are needed for the evaluation of a search.
Use CAN_OPTIMIZE when you run searches under a search mode setting that disables field discovery, to ensure that Splunk software discovers specific fields.
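A transform that should never be optimized away might be sketched as follows. The stanza name and regular expression are hypothetical, and the sketch assumes that setting the attribute to false is what prevents the extraction from being disabled.

```ini
# Hypothetical transform that must always run, even when field discovery is off.
[extract_session_id]
REGEX = session=(?<session_id>[0-9a-f]+)
# Assumption: false tells Splunk software not to disable this extraction.
CAN_OPTIMIZE = false
```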
Syntax for a transform-referencing field extraction configuration
To set up a search-time field extraction in props.conf that is associated with a field transform, use the REPORT field extraction class. Use the following format.
[<spec>]
REPORT-<class> = <unique_transform_stanza_name1>, <unique_transform_stanza_name2>,...
<spec> can be the source type of an event, the host for an event, or the source for an event.
You can associate multiple field transform stanzas with a single field extraction by listing them after the initial <unique_transform_stanza_name>, separated by commas. See examples of transform extractions.
<class> is a unique literal string that identifies the namespace of the field you are extracting. <class> values do not have to follow field name syntax restrictions and are not subject to key cleaning.
<unique_transform_stanza_name> is the name of your field transform stanza from transforms.conf.
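The following props.conf sketch shows a source type stanza that lists two transforms in one extraction, and a host-based stanza. All stanza, class, and transform names are hypothetical, and both transforms are assumed to be defined in transforms.conf.

```ini
# props.conf
# Source type stanza that applies two hypothetical transforms in one extraction.
[app_audit]
REPORT-audit = extract_user_action, extract_status_code

# Host-based stanza: host::<host> is the props.conf form for matching a host.
[host::web01]
REPORT-status = extract_status_code
```

Listing several transforms after one REPORT-<class> key keeps related extractions together under a single class name.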
This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2