Configure advanced extractions with field transforms
A transform extraction is made up of two components: a field transform configuration in transforms.conf and a REPORT-<class> field extraction configuration in props.conf. You can find transforms.conf and props.conf in $SPLUNK_HOME/etc/system/local. This section shows you how to configure field transforms in transforms.conf. To configure a field transform in Splunk Web, see Manage field transforms.
In transform extractions, the regular expression is in transforms.conf and the field extraction is in props.conf. You can apply one regular expression to multiple field extraction configurations, or have multiple regular expressions for one field extraction configuration. See Configure custom fields at search time.
Field transforms contain a field-extracting regular expression and other settings that govern the way the transform extracts fields. Field transforms are always created in conjunction with field extraction stanzas in props.conf.
Transform extractions and the search-time operations sequence
Search-time operation order
Transform extractions are second in the search-time operations sequence and are processed after inline field extractions.
See Extracting a field that was already extracted during inline field extraction.
Restrictions
Splunk software processes all inline field extractions that belong to a specific host, source, or source type in ASCII sort order according to their <class> value. Because uppercase letters sort before lowercase letters in ASCII, EXTRACT-ZZZ is processed before EXTRACT-aaa. You cannot reference a field extracted by EXTRACT-aaa in the field extraction definition for EXTRACT-ZZZ, but you can reference a field extracted by EXTRACT-aaa in the field extraction definition for EXTRACT-ddd.
For more information
See The sequence of search-time operations.
Configure a transform extraction
Transform extractions use the REPORT extraction configuration in props.conf. Each REPORT extraction stanza references a field transform that is defined in transforms.conf. The field transform contains the regular expression that Splunk Enterprise uses to extract fields at search time, and other settings that govern the way that the transform extracts those fields.
Caution: Do not edit files in $SPLUNK_HOME/etc/system/default/. An upgrade or migration overwrites the files in that directory, which discards your configuration and can cause Splunk software to break.
Prerequisites
Review the following topics.
- Configure custom fields at search time for information on different types of field extraction.
- Configure inline extractions for information on configuring inline extractions.
- About default fields (host, source, source type, and more) for information about hosts, sources, and source types.
- Regular expressions and field name syntax for information about field-extracting regular expressions.
- "Field transform syntax" on this page for information on the format for transform definitions.
- "Syntax for a transform-referencing field extraction configuration" on this page for the syntax of transformation extractions.
- Access the props.conf and the transforms.conf files, located in $SPLUNK_HOME/etc/system/local/, or in your custom app directory in $SPLUNK_HOME/etc/apps/.
Steps
- Identify the source type, source, or host that provides the events that your field is extracted from.
  Extraction configurations in props.conf are restricted to a specific source, source type, or host.
- Configure a regular expression that identifies the field in the event.
  If your event lists field/value pairs or field values, configure a delimiter-based field extraction that does not require a regular expression.
- Configure a field transform in transforms.conf that uses this regular expression or delimiter configuration.
  The transform can define a source key and event value formatting.
- Follow the format for the REPORT field extraction type to configure a field extraction stanza in props.conf that uses the host, source, or source type identified earlier.
- (Optional) Configure additional field extraction stanzas for other hosts, sources, and source types that refer to the same field transform.
- Restart your Splunk deployment for your changes to take effect.
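As a minimal sketch of how the two files fit together, the following pair of stanzas extracts a session ID from events of one source type. The source type name my_sourcetype, the transform name session_id_example, the class name session, and the event layout (session=<value>) are all hypothetical and chosen only for illustration.

```ini
# transforms.conf
# Hypothetical transform: captures the value after "session=" into a
# field named session_id by using a name-capturing group.
[session_id_example]
REGEX = session=(?<session_id>\w+)

# props.conf
# Hypothetical source type stanza: the REPORT class references the
# transform stanza above by name.
[my_sourcetype]
REPORT-session = session_id_example
```

Because the regular expression lives in transforms.conf, other props.conf stanzas can reference session_id_example without repeating it.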
Field transform syntax
There are two ways to use transforms: one for regex-based field extractions and one for delimiter-based field extractions. Use the following format when you define a search-time field transform in transforms.conf:
[<unique_transform_stanza_name>]
REGEX = <regular expression>
FORMAT = <string>
MATCH_LIMIT = <integer>
DEPTH_LIMIT = <integer>
SOURCE_KEY = <string>
DELIMS = <quoted string list>
FIELDS = <quoted string list>
MV_ADD = [true|false]
CLEAN_KEYS = [true|false]
KEEP_EMPTY_VALS = [true|false]
CAN_OPTIMIZE = [true|false]
The <unique_transform_stanza_name> is required for all search-time transforms. <unique_transform_stanza_name> values are not required to follow field name syntax restrictions. See Field name syntax. You can use characters other than a-z, A-Z, and 0-9, and spaces are allowed. They are not subject to key cleaning.
Field transforms support the following settings. If a setting is not specified or included in the transforms.conf file, the default for that setting is applied. See the "Field transform syntax descriptions" section on this page for a description of each setting.
Setting | Default | Required or optional |
---|---|---|
REGEX | Empty string | Required unless you are setting up an ASCII-only delimiter-based field extraction. See DELIMS. |
FORMAT | Empty string | Optional. |
MATCH_LIMIT | 100000 | Optional. |
DEPTH_LIMIT | 1000 | Optional. |
SOURCE_KEY | _raw | Optional. |
DELIMS | Empty string | Optional. |
FIELDS | Empty string | Optional. |
MV_ADD | False | Optional. |
CLEAN_KEYS | True | Optional. |
KEEP_EMPTY_VALS | False | Optional. |
CAN_OPTIMIZE | True | Optional. |
Field transform syntax descriptions
REGEX
A regular expression that operates on your data to extract fields.
REGEX and the FORMAT field
Name-capturing groups in the REGEX are extracted directly to fields. You do not have to specify FORMAT for simple field extraction cases.
If the REGEX extracts both the field name and its corresponding value, you can use the following special capturing groups to avoid specifying the mapping in FORMAT: _KEY_<string>, _VAL_<string>. The <string> value should be the same for each key-value pair represented in the regular expression. For example, you could use _KEY_1 and _VAL_1 as the capturing groups for a field name and its corresponding value.
Example of REGEX and FORMAT
The following two configurations are equivalent.
Using FORMAT | Not using FORMAT |
---|---|
REGEX = ([a-z]+)=([a-z]+) with FORMAT = $1::$2 | REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+) |
Example of using REGEX for DELIMS-like functionality
Use REGEX for a non-ASCII delimiter.
- Invalid DELIMS: DELIMS = "¦|", "≈="
  Valid REGEX configuration: REGEX = ^([^¦|≈=]+)[≈=]([^¦|≈=]+)[¦|]([^¦|≈=]+)[≈=]([^¦|≈=]+)[¦|]([^¦|≈=]+)[≈=]([^¦|≈=]+)$
- Invalid DELIMS: DELIMS = "¦|"
  Valid REGEX configuration: REGEX = ^(?<ace>[^¦|]+)[¦|](?<bubbles>[^¦|]+)[¦|](?<cupcake>[^¦|]+)$
FORMAT
Use FORMAT to specify the format of the field/value pair(s) that you are extracting. You do not need to specify the FORMAT if you have a simple REGEX with name-capturing groups.
Configuration
For search-time extractions, the pattern for the FORMAT field is as follows:
FORMAT = <field-name>::<field-value>( <field-name>::<field-value>)*
where:
field-name = [<string>|$<extracting-group-number>]
field-value = [<string>|$<extracting-group-number>]
Restrictions
You cannot create concatenated fields with FORMAT at search time. This functionality is available only for index-time field transforms. To concatenate a set of regular expression extractions into a single field value, use the FORMAT setting as an index-time extraction. For example, if you have the string 192(x)0(y)2(z)1 in your event data, you can extract it at index time as an ip address field value in the format 192.0.2.1. See Configure index-time field extractions in the Getting Data In manual. Do not make extensive changes to your set of indexed fields as it can negatively impact indexing performance and search times.
Example of search-time FORMAT usage
- FORMAT = firstfield::$1 secondfield::$2 thirdfield::other-value
- FORMAT = $1::$2
If you configure FORMAT with a variable field name, the regular expression is repeatedly applied to the source event text to match and extract all field/value pairs.
MATCH_LIMIT
Use MATCH_LIMIT to set an upper bound on how many times PCRE calls an internal function, match(). If set too low, PCRE might fail to correctly match a pattern.
Configuration
Limits the amount of resources that are spent by PCRE when running patterns that will not match. Defaults to 100000.
DEPTH_LIMIT
Use DEPTH_LIMIT to limit the depth of nested backtracking in an internal PCRE function, match(). If set too low, PCRE might fail to correctly match a pattern.
Configuration
Limits the amount of resources that are spent by PCRE when running patterns that will not match. Defaults to 1000.
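If a complex regular expression stops matching events that it should match, one option is to raise these limits for that transform only. The sketch below assumes a hypothetical transform name and regular expression, and the limit values are illustrative rather than recommended.

```ini
# transforms.conf
[nested_payload_example]
REGEX = payload=(?<payload>(?:\{[^{}]*\}\s*)+)
# Illustrative values only; the defaults are 100000 and 1000.
MATCH_LIMIT = 200000
DEPTH_LIMIT = 2000
```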
SOURCE_KEY
Use SOURCE_KEY to extract values from another field. You can use any field that is available at the time of the execution of this field extraction.
Configuration
To configure SOURCE_KEY, identify the field to which the transform's REGEX is to be applied.
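For illustration, the following sketch applies a transform to an already-extracted field instead of _raw. It assumes that a hypothetical field named uri is available when the transform runs, for example because an inline extraction produced it. The stanza name and the output field names are also hypothetical.

```ini
# transforms.conf
[uri_parts_example]
# Apply REGEX to the value of the uri field rather than to _raw.
SOURCE_KEY = uri
REGEX = ^/(?<app_name>[^/]+)/(?<page>[^/?]+)
```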
DELIMS
Use DELIMS in place of REGEX when dealing with ASCII-only delimiter-based field extractions, where field values or field/value pairs are separated by delimiters such as commas, colons, spaces, tab spaces, line breaks, and so on.
Configuration
Each ASCII character in the delimiter string is used as a delimiter to split the event. If the event contains full delimiter-separated field/value pairs, enter two sets of quoted delimiters for DELIMS. The first set of quoted delimiters separates the field/value pairs. The second set of quoted delimiters separates the field name from its corresponding value.
If the events contain only delimiter-separated values (no field names), use one set of quoted delimiters to separate the values. Use the FIELDS setting to apply field names to the extracted values. Alternatively, Splunk software reads even tokens as field names and odd tokens as field values.
Restrictions
Delimiters must be specified within double quotes (DELIMS="|,;"). Special escape sequences are \t (tab), \n (newline), \r (carriage return), \\ (backslash) and \" (double quotes). If a value contains an embedded unescaped double quote character, such as "foo"bar", use REGEX, not DELIMS. Non-ASCII delimiters require the use of REGEX. See REGEX for examples of usage of DELIMS-like functionality.
Example
The following example of DELIMS usage applies to an event where field/value pairs are separated by '|' symbols, and the field names are separated from their corresponding values by '=' symbols.
[pipe_eq]
DELIMS = "|", "="
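To make the effect concrete, here is the same stanza with a hypothetical sample event and the fields that this configuration is expected to produce. The props.conf source type name and class name are assumptions made for the sketch.

```ini
# Sample event (hypothetical):
#   user=alice|action=login|status=ok
# Expected fields: user=alice, action=login, status=ok

# transforms.conf
[pipe_eq]
DELIMS = "|", "="

# props.conf
[my_sourcetype]
REPORT-pipe = pipe_eq
```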
FIELDS
Use FIELDS in conjunction with DELIMS when you perform delimiter-based field extraction and you only have field values to extract. Use FIELDS to provide field names for the extracted field values in list format according to the order in which the values are extracted.
If field names contain spaces or commas, enclose them in double quotes (" "). To escape a character, use a backslash (\).
Example
Following is an example of a delimiter-based extraction where three field values appear in an event. They are separated by a comma and a space.
[commalist]
DELIMS = ", "
FIELDS = field1, field2, field3
MV_ADD
Use MV_ADD when events have multiple occurrences of the same field with different values and you want to keep each value.
See Extracting a field that was already extracted during inline field extraction.
Configuration
When MV_ADD = true, Splunk software transforms fields that appear multiple times in an event with different values into multivalue fields. The field name appears once. The multiple values for the field follow the = sign.
When MV_ADD = false, Splunk software keeps the first value found for a field in an event, and discards every subsequent value found.
CLEAN_KEYS
Controls whether the system strips leading underscores and 0-9 characters from the field names it extracts. Key cleaning is the practice of replacing any non-alphanumeric characters in field names with underscores, as well as the removal of leading underscores and 0-9 characters from field names.
Configuration
Add CLEAN_KEYS = false to your transform to keep your field names intact with no removal of leading underscores or 0-9 characters.
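As a sketch, the following transform takes field names from the event itself and turns key cleaning off. The stanza name and the event layout are hypothetical. Going by the description above, a key such as 2xx-count would be expected to become xx_count under the default CLEAN_KEYS = true, and to stay 2xx-count with CLEAN_KEYS = false.

```ini
# transforms.conf
[raw_keys_example]
# _KEY_1 captures the field name and _VAL_1 captures its value.
REGEX = (?<_KEY_1>[\w-]+)=(?<_VAL_1>\S+)
CLEAN_KEYS = false
```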
KEEP_EMPTY_VALS
Controls whether Splunk software keeps field value pairs when the value is an empty string.
This option does not apply to field/value pairs that are generated by the Splunk software autoKV extraction (automatic field extraction) process. AutoKV ignores field/value pairs with empty values.
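For example, the following sketch keeps pairs whose value is empty. The stanza name and the sample event are hypothetical.

```ini
# Sample event (hypothetical):
#   user=alice proxy= status=ok
# With KEEP_EMPTY_VALS = true, proxy is expected to be kept with an
# empty value; with the default (false) it would be dropped.

# transforms.conf
[keep_empty_example]
REGEX = (\w+)=(\w*)
FORMAT = $1::$2
KEEP_EMPTY_VALS = true
```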
CAN_OPTIMIZE
Controls whether Splunk software can disable the extraction.
Splunk software disables an extraction when none of the fields identified by the extraction are needed for the evaluation of a search. Set CAN_OPTIMIZE = false when you run searches under a search mode setting that disables field discovery and you want to ensure that Splunk software still discovers specific fields.
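A minimal sketch, assuming a hypothetical transform whose field should always be extracted even when field discovery is disabled:

```ini
# transforms.conf
[always_extract_example]
REGEX = txn_id=(?<txn_id>\w+)
# Prevent Splunk software from disabling this extraction.
CAN_OPTIMIZE = false
```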
Syntax for a transform-referencing field extraction configuration
To set up a search-time field extraction in props.conf that is associated with a field transform, use the REPORT field extraction class. Use the following format.
[<spec>]
REPORT-<class> = <unique_transform_stanza_name1>, <unique_transform_stanza_name2>,...
<spec> | Description |
---|---|
<source type> | Source type of an event |
host::<host> | Host for an event |
source::<source> | Source for an event |
You can associate multiple field transform stanzas to a single field extraction by listing them after the initial <unique_transform_stanza_name>, separated by commas. See examples of transform extractions.
REPORT-<class> | Description |
---|---|
<class> | A unique literal string that identifies the namespace of the field you are extracting. <class> values do not have to follow field name syntax restrictions and are not subject to key cleaning. |
<unique_transform_stanza_name> | Name of your field transform stanza from transforms.conf. |
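The three <spec> forms could look like the following in props.conf. All stanza values, class names, and transform names here are hypothetical, and each transform name would need a matching stanza in transforms.conf.

```ini
# props.conf
# Source type spec, referencing two transforms in one REPORT class.
[my_sourcetype]
REPORT-session = session_id_example, uri_parts_example

# Host spec.
[host::webserver01]
REPORT-session = session_id_example

# Source spec.
[source::/var/log/myapp/app.log]
REPORT-session = session_id_example
```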
Extracting a field that was already extracted during inline field extraction
As a result of the sequence of search-time operations, inline field extractions (also known as EXTRACT configurations) are processed before transform field extractions (also known as REPORT configurations). This order has implications for fields that are extracted during both of these operations. For example, say an inline field extraction extracts a field called userName. Then a subsequent transform field extraction extracts another field called userName that has a different value. Because the inline field extraction happens first in the sequence, by default, its version of the userName field is retained and the version of the field extracted by the transform field extraction is discarded. This happens because the MV_ADD setting is set to false by default, so the "old" value that is found for a field in an event is kept, and every subsequent "new" value that is found is discarded. In other words, the EXTRACT configuration "wins" over the REPORT configuration.
But what if you want to keep the value for the field that is extracted second in line by the transform field extraction? You can set MV_ADD to true to prevent a field from being overwritten by another field that has already been extracted. When MV_ADD is true, fields that appear multiple times in an event with different values are transformed into multivalue fields. As a result, the field name, such as userName in our example, appears only once and both the "old" and "new" values are preserved in a multivalue field.
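The userName scenario can be sketched as follows. The source type name, class names, transform name, and event layout are hypothetical.

```ini
# props.conf
[my_sourcetype]
# Inline extraction: runs first and extracts userName.
EXTRACT-user = user=(?<userName>\w+)
# Transform extraction: runs second and extracts userName again.
REPORT-login = login_user_example

# transforms.conf
[login_user_example]
REGEX = login_name=(?<userName>\w+)
# Keep both values in a multivalue userName field instead of
# discarding the value found by this transform.
MV_ADD = true
```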
This documentation applies to the following versions of Splunk Cloud Platform™: 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release), 9.3.2408