Knowledge Manager Manual

Create and maintain search-time field extractions through configuration files

While you can set up and manage search-time field extractions via Splunk Web, it's important to understand how they are handled at the props.conf and transforms.conf level, because those are the configuration files that the Field extractions and Field transformations pages in Splunk Web read from and write to.

Many knowledge managers, especially those who have been using Splunk Enterprise for some time, find it easier to manage their custom fields through configuration files, which can be used to add, maintain, and review libraries of custom field additions for their teams. The configuration files also enable a wider range of field extraction options than you'll get with the Settings pages for field extraction.

This topic shows you how you can:

  • Set up basic "inline" search-time field extractions through edits to props.conf.
  • Design more complex search-time field extractions through a combination of edits to props.conf and transforms.conf.

Regular expressions and field name syntax

Splunk Enterprise uses regular expressions, or regexes, to extract fields from event data. When you use the interactive field extractor (IFX), Splunk Enterprise attempts to generate field-extracting regexes for you, but it can only create regular expressions that extract one field at a time from the events that match them.

On the other hand, when you set up field extractions manually through configuration files, you have to provide the regexes yourself--but you can design them so that they extract two or more fields from the events that match them, if necessary.

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test your regex by using it in a search with the rex search command. Splunk Enterprise also maintains a list of useful third-party tools for writing and testing regular expressions.
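For example, to test a regex before committing it to a configuration file, you can run it through the rex command in a search. This hypothetical search (the source type and field name are placeholders) extracts a numeric status code and tables it:

sourcetype=access_combined | rex "status=(?<status_code>\d+)" | table status_code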

Important: The capturing groups in your regex must identify field names that contain only alphanumeric characters or underscores. See "Use proper field name syntax," below.

Use proper field name syntax

Splunk Enterprise only accepts field names that contain alphanumeric characters or underscores:

  • Valid characters for field names are a-z, A-Z, 0-9, or _ .
  • Field names cannot begin with 0-9 or _ . Leading underscores are reserved for Splunk Enterprise's internal variables.
  • International characters are not allowed.

Splunk Enterprise applies the following "key cleaning" rules to all extracted fields when they are extracted at search time, either by default or through a custom configuration:

1. All characters that are not in a-z, A-Z, and 0-9 ranges are replaced with an underscore (_).

2. When key cleaning is enabled (it is enabled by default), Splunk Enterprise removes all leading underscores and 0-9 characters from extracted fields.
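For example, if an extraction captures a hypothetical field named _2user-id from the event text, key cleaning first replaces the hyphen with an underscore (_2user_id) and then strips the leading underscore and digit, leaving user_id.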

You can disable key cleaning for a particular search-time field extraction by configuring it as an advanced REPORT extraction type, and then having the referenced field transform stanza include the setting CLEAN_KEYS=false. See below for more information about the REPORT extraction configuration.

Note: You cannot turn off key cleaning for basic EXTRACT (props.conf only) field extraction configurations.
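Here is a minimal sketch of such a configuration, assuming a hypothetical source type mysourcetype and a hypothetical field _session_id whose leading underscore you want to preserve. In transforms.conf:

# hypothetical transform: keeps the leading underscore on _session_id
[keep_raw_key]
REGEX = session=(?<_session_id>\w+)
CLEAN_KEYS = false

And in props.conf:

# hypothetical source type
[mysourcetype]
REPORT-session = keep_raw_key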

Create basic search-time field extractions with props.conf edits

You can create basic search-time field extractions (field extractions that are defined entirely within props.conf, as opposed to extractions that reference field transforms in transforms.conf) by editing the props.conf configuration file. You can find props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom app directory in $SPLUNK_HOME/etc/apps/. (We recommend using the latter directory if you want to make it easy to transfer your data customizations to other search servers.)

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

For more information on configuration files in general, see "About configuration files" in the Admin Manual.

Steps for defining basic search-time field extractions with props.conf

Basic search-time field extractions use the EXTRACT extraction configuration in props.conf. Each EXTRACT extraction stanza contains the regular expression that Splunk Enterprise uses to extract one or more fields at search time, as well as other attributes that govern the manner in which those fields are extracted.

Follow these steps when you create a basic search-time field extraction:

1. All extraction configurations in props.conf are restricted to a specific source, source type, or host. Start by identifying the source type, source, or host that provides the events from which your field should be extracted.

Note: For information about hosts, sources, and sourcetypes, see "About default fields (host, source, source type, and more)" in the Getting Data In manual.

2. Create a regular expression that identifies the field in the event. Use named capturing groups to provide the field names for the extracted values. Use the field name syntax as described in the preceding sections.

3. Follow the format for the EXTRACT field extraction type (defined in the next section) to create a field extraction stanza in props.conf that includes the host/source/sourcetype and regex that you have identified. Edit the props.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom app directory in $SPLUNK_HOME/etc/apps/.

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

4. If your field value is a portion of a word, you must also add an entry to fields.conf. See the example "Create a field from a subtoken" below.

5. Restart Splunk Enterprise for your changes to take effect.

Add an EXTRACT field extraction stanza to props.conf

Follow this format when adding an EXTRACT field extraction to props.conf:

[<spec>]
EXTRACT-<class> = [<regular_expression>|<regular_expression> in <source_field>]
  • <spec> can be:
    • <source type>, the source type of an event.
    • host::<host>, where <host> is the host for an event.
    • source::<source>, where <source> is the source for an event.
    • rule::<rulename>, where <rulename> is the unique name of a source type classification rule.
    • delayedrule::<rulename>, where <rulename> is a unique name of a delayed source type classification rule.

Note: rule and delayedrule are only considered as a last resort before generating a new source type based on the source that Splunk Enterprise sees.

  • <class> is a unique literal string that identifies the namespace of the field (key) you're extracting.
    • Note: <class> values do not have to follow field name syntax restrictions (see above). You can use characters other than a-z, A-Z, and 0-9, and spaces are allowed. <class> values are not subject to key cleaning.
  • The <regular_expression> is required to have named capturing groups; each group represents a different extracted field. When the <regular_expression> matches an event, the named capturing groups and their values are added to the event.
  • Use <regular_expression> in <source_field> to match the regex against the values of a specific field. Otherwise it matches against _raw (all raw event data). See the example after this list.
    • Note: <source_field> is a field name, which means it must follow field name syntax. It can only contain alphanumeric characters (a-z, A-Z, and 0-9).
  • If your regex needs to end with in <string> where <string> is not a field name, change the regex to end with [i]n <string> to ensure that Splunk Enterprise doesn't try to match <string> to a field name.
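For instance, this hypothetical stanza (the source type and field names are placeholders) runs its regex against the values of an already-extracted uri field rather than against _raw:

# hypothetical example: extract article_id from the uri field
[access_combined]
EXTRACT-article = article_id=(?<article_id>\d+) in uri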

Precedence rules for the EXTRACT field extraction type:

  • For each field extraction, Splunk Enterprise takes the configuration from the highest precedence configuration stanza.
  • When there are multiple categories of matching [<spec>] stanzas, [host::<host>] settings override [<sourcetype>] settings.
  • [source::<source>] settings override both [host::<host>] and [<sourcetype>] settings.
  • Similarly, if a particular field extraction is specified in ../local/ for a <spec>, it overrides that class in ../default/.

There's more to [<spec>] stanza precedence; see props.conf.spec for all the details.

Note: Unlike index-time field extraction configurations, search-time field transforms in transforms.conf require no DEST_KEY, because nothing is written to the index during search-time field extraction. Fields extracted at search time are not persisted in the index as keys.

Splunk Enterprise follows precedence rules when it runs search-time field extractions. It runs inline field extractions (EXTRACT-<class>) first, and then runs field extractions that reference field transforms (REPORT-<class>).

Setting KV_MODE for search-time data

You can use the KV_MODE attribute to specify the field/value extraction mode for your data. You can add KV_MODE to an EXTRACT or REPORT stanza. Its format is:

KV_MODE = [none|auto|auto_escaped|multi|json|xml]
  • none: Disables field extraction for the source, source type, or host identified by the stanza name. You can use this setting to ensure that other regexes that you have created are not overridden by automatic field/value extraction for a particular source, source type, or host. You can also use this setting to increase search performance by disabling extraction for common but nonessential fields. We have some field extraction examples at the end of this topic that demonstrate the disabling of field extraction in different circumstances.
  • auto: Extracts field/value pairs that are separated by equal signs. This is the default field extraction behavior if you do not include this attribute in your field extraction stanza.
  • auto_escaped: Extracts field/value pairs that are separated by equal signs, and additionally honors \" and \\ as escaped sequences within quoted values. For example: field="value with \"nested\" quotes".
  • multi: This invokes the multikv search command, which extracts field values from table-formatted events.
  • xml: Use this setting if you intend to use the field extraction stanza to extract fields from XML data.
  • json: Use this setting if you intend to use the field extraction stanza to extract fields from JSON data.
  • The xml and json modes will not extract any fields when used on data that isn't of the indicated format (XML or JSON).
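For example, this hypothetical props.conf stanza applies JSON field extraction to a source type that emits JSON events (the source type name is a placeholder):

# hypothetical source type that emits JSON events
[my_json_sourcetype]
KV_MODE = json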

Inline (props.conf only) search-time field extraction examples

Here are a set of examples of search-time custom field extraction, set up using props.conf only.

Add a new error code field

This example shows how to create a new "error code" field by configuring a field extraction in props.conf. The field can be identified by the occurrence of device_id= followed by a word within brackets and a text string terminating with a colon. The field should be extracted from events related to the testlog source type.

In props.conf, add:

[testlog]
EXTRACT-errors = device_id=\[\w+\](?<err_code>[^:]+)
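Given a hypothetical matching event such as

device_id=[ts03]ERR-1402: connection refused

this extraction produces err_code=ERR-1402 (the regex captures everything between the closing bracket and the first colon).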

Extract multiple fields using one regex

This is an example of a field extraction that pulls out five separate fields. You can then use these fields in concert with some event types to help you find port flapping events and report on them.

Here's a sample of the event data that the fields are being extracted from:

#%LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet9/16, changed state to down

The stanza in props.conf for the extraction looks like this:

[syslog]
EXTRACT-port_flapping = Interface\s(?<interface>(?<media>[^\d]+)(?<slot>\d+)\/(?<port>\d+))\,\schanged\sstate\sto\s(?<port_status>up|down)

Note that five separate fields are extracted as named groups: interface, media, slot, port, and port_status.

The following two steps aren't required for field extraction--they show you what you might do with the extracted fields to find port flapping events and then report on them.

Next, define a couple of event types in eventtypes.conf:

[cisco_ios_port_down]
search = "changed state to down"

[cisco_ios_port_up]
search = "changed state to up"

Finally, create a report in savedsearches.conf that ties much of the above together to find port flapping and report on the results:

[port flapping]
search = eventtype=cisco_ios_port_down OR eventtype=cisco_ios_port_up starthoursago=3 | stats count by interface,host,port_status | sort -count

Create a field from a subtoken

You may run into problems if you are extracting a field value that is a subtoken--a part of a larger token. Tokens are chunks of event data that have been run through event processing prior to being indexed. During event processing, events are broken up into segments, and this is the point where tokens are created--each segment created is a token.

Tokens are never smaller than a complete word or number. For example, you may have the word foo123 in your event. If it has been run through event processing and indexing, it's a token, and it can be a value of a field. However, if your extraction pulls out the foo as a field value unto itself, you're extracting a subtoken. The problem is that while foo123 exists in the index, foo does not, which means that you'll likely get few results if you search on that subtoken, even though it may appear to be extracted correctly in your search results.

In short, because tokens are never smaller than individual words within strings, an extracted subtoken will not itself be in the index--only the larger word of which it is a part will be.

If your field value is a smaller part of a token, you must configure props.conf as explained above. Then, add an entry to fields.conf:

[<fieldname>]
INDEXED = False
INDEXED_VALUE = False
  • Fill in <fieldname> with the name of your field.
    • For example, [url] if you've configured a field named "url."
  • Set INDEXED and INDEXED_VALUE to false.
    • This tells Splunk Enterprise that the value you're searching for is not a token in the index.
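As a concrete sketch, suppose you extract the foo portion of the foo123 token mentioned above into a hypothetical field named foo_prefix for a hypothetical source type mysourcetype. In props.conf:

# hypothetical subtoken extraction
[mysourcetype]
EXTRACT-prefix = (?<foo_prefix>foo)\d+

And in fields.conf:

# tells Splunk Enterprise that foo_prefix values are not tokens in the index
[foo_prefix]
INDEXED = False
INDEXED_VALUE = False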

Note: As of release 4.3, you no longer need to add this entry to fields.conf for cases where you are extracting a field's value from the value of a default field (such as host, source, sourcetype, or timestamp) that is not indexed (and therefore not tokenized).

For more information on the tokenization of event data, see "About segmentation" in the Getting Data In Manual.

Create advanced search-time field extractions with field transforms

While you can define most search-time field extractions entirely within props.conf, some advanced search-time field extractions reference an additional component called a field transform. This section shows you how to configure field transforms in transforms.conf.

Field transforms contain a field-extracting regular expression and other attributes that govern the way the transform extracts fields. Field transforms are always created in conjunction with field extraction stanzas in props.conf--they cannot stand alone.

Your search-time field extractions require a field transform component if you need to:

  • Reuse the same field-extracting regular expression across multiple sources, source types, or hosts (in other words, configure one field transform for multiple field extractions). If you find yourself using the same regex to extract fields for different sources, source types, and hosts, you may want to set it up as a transform. Then, if you need to update the regex, you only have to do so once, even though it is used in more than one field extraction.
  • Apply more than one field-extracting regular expression to the same source, source type, or host (in other words, apply multiple field transforms to the same field extraction). This is sometimes necessary in cases where the field or fields that you want to extract from a particular source/source type/host appear in two or more very different event patterns.
  • Set up delimiter-based field extractions. Delimiter-based extractions come in handy when your event data presents field-value pairs (or just field values) that are separated by delimiters such as commas, colons, bars, line breaks, and tab spaces.
  • Configure extractions for multivalued fields. When you do this, Splunk Enterprise appends additional field values to the field as it finds them in the event data.
  • Extract fields with names that begin with numbers or underscores. Ordinarily key cleaning removes leading numeric characters and underscores from field names, but you can configure your transform to turn this functionality off if necessary.

You can also configure transforms to:

  • Extract fields from the values of another field (other than _raw) by using the SOURCE_KEY attribute.
  • Manage the formatting of extracted fields, in cases where you are extracting multiple fields or are extracting both the field name and field value, by using the FORMAT attribute.

Both of these configurations can now be set up directly in the regular expression as well. See the "Define a field transform" section below for more information about how to do this.

NOTE: If you need to concatenate a set of regex extractions into a single field value, you can do this with the FORMAT attribute, but only if you set it up as an index-time extraction. For example, if you have a string like 192(x)0(y)2(z)1 in your event data, you can extract it at index time as an ip address field value in the format 192.0.2.1. For more information, see "Configure index-time field extractions" in the Getting Data In Manual. However we DO NOT RECOMMEND that you make extensive changes to your set of indexed fields--do so sparingly if at all.

Steps for defining custom search-time field extractions that reference field transforms

Advanced search-time field extractions use the REPORT extraction configuration in props.conf. Each REPORT extraction stanza references a field transform that is defined separately in transforms.conf. The field transform contains the regular expression that Splunk Enterprise uses to extract fields at search time, as well as other attributes that govern the way that the transform extracts those fields.

Follow these steps when you create an advanced search-time field extraction:

1. All extraction configurations in props.conf are restricted to a specific source, source type, or host. Start by identifying the source type, source, or host that provides the events from which your field should be extracted. (Don't update props.conf yet.)

Note: For information about hosts, sources, and sourcetypes, see "About default fields (host, source, source type, and more)" in the Getting Data In manual.

2. Create a regular expression that identifies the field in the event. Use named capturing groups to provide the field names for the extracted values. Use the field name syntax as described in the preceding sections.

Note: If your event lists field/value pairs or just field values, you can create a delimiter-based field extraction that does not require a regex. See the description of the DELIMS attribute, below.

3. Create a field transform in transforms.conf that utilizes this regex (or delimiter configuration). The transform can also define a source key and/or event value formatting.

Edit the transforms.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom app directory in $SPLUNK_HOME/etc/apps/.

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

4. Follow the format for the REPORT field extraction type (defined two sections down) to create a field extraction stanza in props.conf that uses the host, source, or source type that you identified in Step 1. If necessary, you can create additional field extraction stanzas for other hosts, sources, and source types that refer to the same field transform.

Edit the props.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom app directory in $SPLUNK_HOME/etc/apps/.

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

5. Restart Splunk Enterprise for your changes to take effect.

First, define a field transform

Follow this format when defining a search-time field transform in transforms.conf:

[<unique_transform_stanza_name>]
REGEX = <regular expression>
FORMAT = <string>
SOURCE_KEY = <string>
DELIMS = <quoted string list>
FIELDS = <quoted string list>
MV_ADD = [true|false]
CLEAN_KEYS = [true|false]
KEEP_EMPTY_VALS = [true|false]
CAN_OPTIMIZE = [true|false]
  • The <unique_transform_stanza_name> is required for all search-time transforms. Note: <unique_transform_stanza_name> values do not have to follow field name syntax restrictions (see above). You can use characters other than a-z, A-Z, and 0-9, and spaces are allowed. They are not subject to key cleaning.
  • REGEX is a regular expression that operates on your data to extract fields. It is required for all search-time field transforms unless you are setting up a delimiter-based transaction, in which case you use the DELIMS attribute instead (see the DELIMS attribute description, below).
    • Defaults to an empty string.
  • REGEX and the FORMAT attribute:
    • Name-capturing groups in the REGEX are extracted directly to fields, which means that you don't have to specify FORMAT for simple field extraction cases.
    • If the REGEX extracts both the field name and its corresponding value, you can use the special capturing groups _KEY_<string> and _VAL_<string> to skip specifying the mapping in FORMAT.
  • For example, the following are equivalent:
Using FORMAT:
REGEX = ([a-z]+)=([a-z]+)
FORMAT = $1::$2
Not using FORMAT:
REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
  • In both of these cases, Splunk Enterprise applies the regular expression against the source text of an event repeatedly to extract all of the field/value combinations that it can identify.
  • FORMAT is optional. Use it to specify the format of the field/value pair(s) that you are extracting. You don't need to specify the FORMAT if you have a simple REGEX with name-capturing groups.
    • For search-time extractions, this is the pattern for the FORMAT field:
FORMAT = <field-name>::<field-value>(<field-name>::<field-value>)*
where:
field-name = [<string>|$<extracting-group-number>]
field-value = [<string>|$<extracting-group-number>]
Examples of search-time FORMAT usage:
1. FORMAT = firstfield::$1 secondfield::$2 thirdfield::other-value
2. FORMAT = $1::$2
  • If you configure FORMAT with a variable field name (such as in example #2 just above, where $1 represents the field name), Splunk Enterprise applies the regular expression against the source event text repeatedly to match and extract as many field/value pairs as it can find.
    • Note: You cannot create concatenated fields with FORMAT at search time. This functionality is only available for index-time field transforms.
    • FORMAT defaults to an empty string.
  • SOURCE_KEY is optional. Use it to extract one or more values from the values of another field. You can use any field that is available at the time of the execution of this field extraction.
    • To configure SOURCE_KEY, identify the field to which Splunk Enterprise should apply the transform's REGEX.
    • By default, SOURCE_KEY is set to _raw, which means it is applied to the raw, unprocessed text of all events.
  • DELIMS is optional. Use it in place of REGEX when dealing with delimiter-based field extractions, where field values--or field/value pairs--are separated by delimiters such as commas, colons, spaces, tab spaces, line breaks, and so on.
    • Delimiters must be quoted with " " . You can use a backslash to escape double quotation marks within a value if necessary (\").
    • IMPORTANT: If a value may contain an embedded unescaped double quote character, such as "foo"bar", we recommend that you use REGEX, not DELIMS.
    • Each character in the delimiter string is used as a delimiter to split the event.
    • If the event contains full delimiter-separated field/value pairs, you enter two sets of quoted delimiters for DELIMS. The first set of quoted delimiters separates the field/value pairs. The second set of quoted delimiters separates the field name from its corresponding value.
    • If the events only contain delimiter-separated values (no field names), you use one set of quoted delimiters to separate the values. Then you use the FIELDS attribute to apply field names to the extracted values (see FIELDS, below). Alternatively, if you do not specify FIELDS, Splunk Enterprise reads even-numbered tokens as field names and odd-numbered tokens as field values.
    • Splunk Enterprise consumes consecutive delimiter characters unless you specify a list of field names.
    • Defaults to empty string.
    • This example of DELIMS usage applies to an event where field/value pairs are separated by '|' symbols and the field names are separated from their corresponding values by '=' symbols (a sample event for this transform appears after this list):
[pipe_eq]
DELIMS = "|", "="
  • FIELDS is used in conjunction with DELIMS when you are performing delimiter-based field extraction, but you only have field values to extract. Use FIELDS to provide field names for the extracted field values, in list format according to the order in which the values are extracted.
    • Note: If field names contain spaces or commas they must be quoted with " " (to escape, use \).
    • Defaults to an empty string.
    • Here's an example of a delimiter-based extraction where three field values appear in an event, separated by a comma and then a space (a sample event appears after this list).
[commalist]
DELIMS = ", "
FIELDS = field1, field2, field3
  • MV_ADD is optional. Use it when you have events that have multiple occurrences of the same field but with different values and you want to keep each of the field's values.
    • When MV_ADD = true, Splunk Enterprise transforms fields that appear multiple times in an event with different values into multivalued fields (the field name appears once, the multiple values for the field follow the '=' sign).
    • When MV_ADD=false, Splunk Enterprise keeps the first value found for a field in an event and discards every subsequent value found for that same field in that same event.
    • Defaults to false.
  • CLEAN_KEYS is optional. It controls whether or not the system strips leading underscores and 0-9 characters from the keys (field names) it extracts (see the subtopic "Use proper field name syntax," above, for more information). "Key cleaning" is the practice of replacing any non-alphanumeric characters (characters other than those falling between the a-z, A-Z, and 0-9 ranges) in field names with underscores, as well as the stripping of leading underscores and 0-9 characters from field names.
    • Add CLEAN_KEYS = false to your transform if you need to keep your field names intact (no removal of leading underscores and/or 0-9 characters).
    • By default, CLEAN_KEYS is always set to true for transforms.
  • KEEP_EMPTY_VALS is optional. It controls whether Splunk Enterprise keeps field/value pairs when the value is an empty string.
    • This option does not apply to field/value pairs that are generated by Splunk Enterprise's autokv extraction (automatic field extraction) process. Autokv ignores field/value pairs with empty values.
    • Defaults to false.
  • CAN_OPTIMIZE is optional. It controls whether Splunk Enterprise can optimize the extraction out (or, in other words, disable the extraction).
    • You might set CAN_OPTIMIZE = false if you run searches under a Search Mode setting that disables field discovery, to ensure that Splunk Enterprise always discovers the fields that this extraction provides.
    • Splunk Enterprise only disables an extraction if it can determine that none of the fields identified by the extraction will ever be needed for the successful evaluation of a search.
    • Note: This attribute should rarely be set to false.
    • Defaults to true.
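To illustrate the two delimiter-based transforms above, consider these hypothetical sample events. Applied to an event like

code=fatal|severity=2|origin=cache

the [pipe_eq] transform extracts code=fatal, severity=2, and origin=cache. Applied to an event like

alpha, beta, gamma

the [commalist] transform extracts field1=alpha, field2=beta, and field3=gamma.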

Second, configure a REPORT field extraction stanza in props.conf and associate it with the field transform

When you're setting up a search-time field extraction in props.conf that is associated with a field transform, you use the REPORT field extraction class. Follow this format.

You can associate multiple field transform stanzas with a single field extraction by listing them after the initial <unique_transform_stanza_name>, separated by commas. (For more information, see the example later in this topic.)

[<spec>]
REPORT-<class> = <unique_transform_stanza_name1>, <unique_transform_stanza_name2>,...
  • <spec> can be:
    • <sourcetype>, the source type of an event.
    • host::<host>, where <host> is the host for an event.
    • source::<source>, where <source> is the source for an event.
  • <class> is a unique literal string that identifies the namespace of the field (key) you're extracting. Note: <class> values do not have to follow field name syntax restrictions (see above). You can use characters other than a-z, A-Z, and 0-9, and spaces are allowed. <class> values are not subject to key cleaning.
  • <unique_transform_stanza_name> is the name of your field transform stanza from transforms.conf.
  • Precedence rules for the REPORT field extraction class:
    • For each class, Splunk Enterprise takes the configuration from the highest precedence configuration block.
    • If a particular class is specified for a source and a sourcetype, the class for source wins out.
    • Similarly, if a particular class is specified in ../local/ for a <spec>, it overrides that class in ../default/.

If you have a set of transforms that must be run in a specific order and which belong to the same host, source, or source type, you can place them in a comma-separated list within the same props.conf stanza. Splunk Enterprise applies them in the specified order. For example, this sequence ensures that the [yellow] field transform gets applied first, then [blue], and then [red]:

[source::color_logs]
REPORT-colorchange = yellow, blue, red

If you need to change the order, rearrange the list.

Examples of custom search-time field extractions using field transforms

These examples present custom field extraction use cases that require you to configure one or more field transform stanzas in transforms.conf and then reference them in a props.conf field extraction stanza.

Configuring a field extraction that utilizes multiple field transforms

This example of search-time field transform setup demonstrates how:

  • you can create transforms that pull varying field name/value pairs from events.
  • you can create a field extraction that references two or more field transforms.

Let's say you have logs that contain multiple field name/field value pairs. While the fields vary from event to event, the pairs always appear in one of two formats.

The logs often come in this format:

[fieldName1=fieldValue1] [fieldName2=fieldValue2]

However, at times they are more complicated, logging multiple name/value pairs as a list, in which case the format looks like:

[headerName=fieldName1] [headerValue=fieldValue1], [headerName=fieldName2] [headerValue=fieldValue2]

Note that the list items are separated by commas, and that each fieldName is matched with a corresponding fieldValue. In these secondary cases you still want to pull out the field names and values so that the search results are

fieldName1=fieldValue1
fieldName2=fieldValue2

and so on.

To make things clearer, here's an example of an HTTP request event that combines both of the above formats.

[method=GET] [IP=10.1.1.1] [headerName=Host] [headerValue=www.example.com], [headerName=User-Agent] [headerValue=Mozilla], [headerName=Connection] [headerValue=close] [byteCount=255]

You want to develop a single field extraction that would pull the following field/value pairs from that event:

method=GET
IP=10.1.1.1
Host=www.example.com
User-Agent=Mozilla
Connection=close
byteCount=255

Solution

To efficiently and reliably pull out both formats of field/value pairs, you'll want to design two different regexes that are optimized for each format. One regex will identify events with the first format and pull out all of the matching field/value pairs. The other regex will identify events with the other format and pull out those field/value pairs.

You then create two unique transforms in transforms.conf--one for each regex--and then unite them in the corresponding field extraction stanza in props.conf.

The first transform you add to transforms.conf catches the fairly conventional [fieldName1=fieldValue1] [fieldName2=fieldValue2] case.

[myplaintransform]
REGEX=\[(?!(?:headerName|headerValue))([^\s\=]+)\=([^\]]+)\]
FORMAT=$1::$2

The second transform (also added to transforms.conf) catches the slightly more complex [headerName=fieldName1] [headerValue=fieldValue1], [headerName=fieldName2] [headerValue=fieldValue2] case:

[mytransform]
REGEX = \[headerName\=([^\]]+)\]\s\[headerValue=([^\]]+)\]
FORMAT = $1::$2

Both transforms use the <fieldName>::<fieldValue> FORMAT to match each field name in the event with its corresponding value. This setting in FORMAT enables Splunk Enterprise to keep matching the regex against a matching event until every matching field/value combination is extracted.

Finally, this field extraction stanza, which you create in props.conf, references both of the field transforms:

[mysourcetype]
KV_MODE=none
REPORT-a = mytransform, myplaintransform

Note that, besides using multiple field transforms, the field extraction stanza also sets KV_MODE=none. This disables automatic field/value extraction for the identified source type (while letting your manually defined extractions continue). It ensures that these new regexes aren't overridden by automatic field extraction, and it also helps increase your search performance. (See the following subsection for more on disabling key/value extraction.)

Configuring delimiter-based field extraction

You can use the DELIMS attribute in field transforms to configure field extractions for events where field values or field/value pairs are separated by delimiters such as commas, colons, tab spaces, and more.

For example, say you have a recurring multiline event where a different field/value pair sits on a separate line, and each pair is separated by a colon followed by a tab space. Here's a sample event:

ComponentId:     Application Server
ProcessId:   5316
ThreadId:    00000000
ThreadName:  P=901265:O=0:CT
SourceId:    com.ibm.ws.runtime.WsServerImpl
ClassName:   
MethodName:  
Manufacturer:    IBM
Product:     WebSphere
Version:     Platform 7.0.0.7 [BASE 7.0.0.7 cf070942.55]
ServerName:  sfeserv36Node01Cell\sfeserv36Node01\server1
TimeStamp:   2010-04-27 09:15:57.671000000
UnitOfWork:  
Severity:    3
Category:    AUDIT
PrimaryMessage:  WSVR0001I: Server server1 open for e-business
ExtendedMessage: 

Now you could set up a bulky, wordy search-time field extraction stanza in props.conf that handles all of these fields:

[activityLog]
LINE_BREAKER = [-]{8,}([\r\n]+)
SHOULD_LINEMERGE = false
EXTRACT-ComponentId = ComponentId:\t(?<ComponentId>.*)
EXTRACT-ProcessId = ProcessId:\t(?<ProcessId>.*)
EXTRACT-ThreadId = ThreadId:\t(?<ThreadId>.*)
EXTRACT-ThreadName = ThreadName:\t(?<ThreadName>.*)
EXTRACT-SourceId = SourceId:\t(?<SourceId>.*)
EXTRACT-ClassName = ClassName:\t(?<ClassName>.*)
EXTRACT-MethodName = MethodName:\t(?<MethodName>.*)
EXTRACT-Manufacturer = Manufacturer:\t(?<Manufacturer>.*)
EXTRACT-Product = Product:\t(?<Product>.*)
EXTRACT-Version = Version:\t(?<Version>.*)
EXTRACT-ServerName = ServerName:\t(?<ServerName>.*)
EXTRACT-TimeStamp = TimeStamp:\t(?<TimeStamp>.*)
EXTRACT-UnitOfWork = UnitOfWork:\t(?<UnitOfWork>.*)
EXTRACT-Severity = Severity:\t(?<Severity>.*)
EXTRACT-Category = Category:\t(?<Category>.*)
EXTRACT-PrimaryMessage = PrimaryMessage:\t(?<PrimaryMessage>.*)
EXTRACT-ExtendedMessage = ExtendedMessage:\t(?<ExtendedMessage>.*)

But that solution is pretty over-the-top. Is there a more elegant way to handle it that would remove the need for all these EXTRACT lines? Yes!

Configure the following stanza in transforms.conf:

[activity_report]
DELIMS = "\n", ":\t"

This states that the field/value pairs in the event are on separate lines ("\n"), and then specifies that the field name and field value on each line is separated by a colon and tab space (":\t").

To complete this configuration, rewrite the wordy props.conf stanza mentioned above as:

[activityLog]
LINE_BREAKER = [-]{8,}([\r\n]+)
SHOULD_LINEMERGE = false
REPORT-activity = activity_report

These two brief configurations will extract the same set of fields as before, but they leave less room for error and are more flexible.
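After restarting Splunk Enterprise, you can verify the extraction with a search along these lines (a hypothetical check; adjust the source type to match your data):

sourcetype=activityLog | table ComponentId, ProcessId, Severity, PrimaryMessage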

Handling events with multivalued fields

You can use the MV_ADD attribute to extract fields in situations where the same field is used more than once in an event, but has a different value each time. Ordinarily, Splunk Enterprise only extracts the first occurrence of a field in an event; every subsequent occurrence is discarded. But when MV_ADD is set to true in transforms.conf, Splunk Enterprise treats the field like a multivalue field and extracts each unique field/value pair in the event.

Say you have a set of events that look like this:

event1.epochtime=1282182111 type=type1 value=value1 type=type3 value=value3
event2.epochtime=1282182111 type=type2 value=value4 type=type3 value=value5 type=type4 value=value6

See how the type and value fields are repeated several times in each event? What you'd like to do is search type=type3 and have both of these events be returned. Or you'd like to run a count(type) report on these two events that returns 5.

So, what you want to do is create a custom multivalue extraction of the type field for these events. Here's how you would set up your transforms.conf and props.conf files to enable it:

First, transforms.conf:

[mv-type]
REGEX = type=(?<type>\S+)
MV_ADD = true

Then, in props.conf for your sourcetype or source, set:

REPORT-type = mv-type
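With this configuration in place, a search on type=type3 returns both sample events, and a hypothetical reporting search like the following (the source type name is a placeholder) returns a count of 5, because every occurrence of type in each event is kept:

sourcetype=my_mv_sourcetype | stats count(type)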

Disabling automatic search-time extraction for specific sources, source types, or hosts

You can disable automatic search-time field extraction for specific sources, source types, or hosts through edits in props.conf. Add KV_MODE = none for the appropriate [<spec>] in props.conf.

Note: Custom field extractions set up manually via the configuration files or Splunk Web will still be processed for the affected source, source type, or host when KV_MODE = none.

[<spec>]
KV_MODE = none

<spec> can be:

  • <sourcetype> - an event source type.
  • host::<host>, where <host> is the host for an event.
  • source::<source>, where <source> is the source for an event.
