Create and maintain search-time field extractions through configuration files
Contents
- Regular expressions and field name syntax
- Use proper field name syntax
- Create basic search-time field extractions with props.conf edits
- Steps for defining basic search-time field extractions with props.conf
- Add an EXTRACT field extraction stanza to props.conf
- Setting KV_MODE for search-time data
- Inline (props.conf only) search-time field extraction examples
- Add a new error code field
- Extract multiple fields using one regex
- Create a field from a subtoken
- Create advanced search-time field extractions with field transforms
- Steps for defining custom search-time field extractions that reference field transforms
- First, define a field transform
- Second, configure a field extraction and associate it with the field transform
- Examples of custom search-time field extractions using field transforms
- Configuring a field extraction that utilizes multiple field transforms
- Configuring delimiter-based field extraction
- Handling events with multivalued fields
- Disabling automatic search-time extraction for specific sources, source types, or hosts
Create and maintain search-time field extractions through configuration files
While you can set up and manage search-time field extractions via Splunk Manager, it's important to understand how they are handled at the props.conf and transforms.conf level, because those are the configuration files that the Field extractions and Field transformations pages in Manager read from and write to.
Many knowledge managers, especially those who have been using Splunk for some time, find it easier to manage their custom fields through configuration files, which can be used to add, maintain, and review libraries of custom field additions for their teams.
This topic shows you how you can:
- Set up basic "inline" search-time field extractions through edits to props.conf.
- Design more complex search-time field extractions through a combination of edits to props.conf and transforms.conf.
Regular expressions and field name syntax
Splunk uses regular expressions, or regexes, to extract fields from event data. When you use the interactive field extractor (IFX), Splunk attempts to generate field-extracting regexes for you, but it can only create regular expressions that extract one field at a time from the events that match them.
On the other hand, when you set up field extractions manually through configuration files, you have to provide the regex yourself--but you can design them so that they extract two or more fields from the events that match them, if necessary.
For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test your regex by using it in a search with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.
Important: The capturing groups in your regex must identify field names that contain alpha-numeric characters or an underscore. See "When Splunk creates field names," above.
Use proper field name syntax
Splunk only accepts field names that contain alpha-numeric characters or an underscore:
- Valid characters for field names are a-z, A-Z, 0-9, or _ .
- Field names cannot begin with 0-9 or _ . Leading underscores are reserved for Splunk's internal variables.
- International characters are not allowed.
Splunk applies the following "key cleaning" rules to all extracted fields when they are extracted at search time, either by default or through a custom configuration:
1. All characters that are not in a-z, A-Z, and 0-9 ranges are replaced with an underscore (_).
2. When key cleaning is enabled (it is enabled by default), Splunk removes all leading underscores and 0-9 characters from the names of extracted fields.
You can disable key cleaning for a particular search-time field extraction by configuring it as an advanced REPORT extraction type, and then having the referenced field transform stanza include the setting CLEAN_KEYS=false. See below for more information about the REPORT extraction configuration.
Note: You cannot turn off key cleaning for basic EXTRACT (props.conf only) field extraction configurations.
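As a rough sketch of the configuration described above, the following pairs a REPORT extraction with a field transform that sets CLEAN_KEYS = false. The source type name (mysourcetype), class name (rawkeys), transform name (keep_raw_keys), and the regex are hypothetical placeholders chosen for illustration, not values from any shipped configuration.

```ini
# props.conf (hypothetical source type and class name)
[mysourcetype]
REPORT-rawkeys = keep_raw_keys

# transforms.conf (hypothetical transform name and regex)
[keep_raw_keys]
# Capture name=value pairs; the name may start with a digit or underscore.
REGEX = (\w+)=(\w+)
FORMAT = $1::$2
# Keep extracted field names exactly as they appear in the event.
CLEAN_KEYS = false
```

Under the key cleaning rules listed earlier, a pair such as 2xx_count=14 would normally surface as xx_count; with this transform the field should keep its original name.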
Create basic search-time field extractions with props.conf edits
You can create basic search-time field extractions (field extractions that are defined entirely within props.conf, as opposed to extractions that reference field transforms in transforms.conf) by editing the props.conf configuration file. You can find props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. (We recommend using the latter directory if you want to make it easy to transfer your data customizations to other search servers.)
Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.
For more information on configuration files in general, see "About configuration files" in the Admin Manual.
Steps for defining basic search-time field extractions with props.conf
Basic search-time field extractions use the EXTRACT extraction configuration in props.conf. Each EXTRACT extraction stanza contains the regular expression that Splunk uses to extract one or more fields at search time, as well as other attributes that govern the manner in which those fields are extracted.
Follow these steps when you create a basic search-time field extraction:
1. All extraction configurations in props.conf are restricted by a specific source, source type, or host. Start by identifying the source type, source, or host that provides the events that your field should be extracted from.
Note: For information about hosts, sources, and sourcetypes, see "About default fields (host, source, source type, and more)" in the Getting Data In manual.
2. Create a regular expression that identifies the field in the event. Use named capturing groups to provide the field names for the extracted values. Use the field name syntax as described in the preceding sections.
3. Follow the format for the EXTRACT field extraction type (defined in the next section) to create a field extraction stanza in props.conf that includes the host/source/sourcetype and regex that you have identified. Edit the props.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.
Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.
4. If your field value is a portion of a word, you must also add an entry to fields.conf. See the example "Create a field from a subtoken" below.
5. Restart Splunk for your changes to take effect.
Add an EXTRACT field extraction stanza to props.conf
Follow this format when adding an EXTRACT field extraction to props.conf:
    [<spec>]
    EXTRACT-<class> = [<regular_expression>|<regular_expression> in <source_field>]
- <spec> can be:
  - <sourcetype>, the source type of an event.
  - host::<host>, where <host> is the host for an event.
  - source::<source>, where <source> is the source for an event.
  - rule::<rulename>, where <rulename> is the unique name of a source type classification rule.
  - delayedrule::<rulename>, where <rulename> is the unique name of a delayed source type classification rule.
  - Note: rule and delayedrule are only considered as a last resort before generating a new source type based on the source that Splunk sees.
- <class> is a unique literal string that identifies the namespace of the field (key) you're extracting.
  - Note: <class> values do not have to follow field name syntax restrictions (see above). You can use characters other than a-z, A-Z, and 0-9, and spaces are allowed. <class> values are not subject to key cleaning.
- The <regular_expression> is required to have named capturing groups; each group represents a different extracted field. When the <regular_expression> matches an event, the named capturing groups and their values are added to the event.
- Use <regular_expression> in <source_field> to match the regex against the values of a specific field. Otherwise it matches against _raw (all raw event data).
  - Note: <source_field> is a field name, which means it must follow field name syntax. It can only contain alphanumeric characters (a-z, A-Z, and 0-9).
- If your regex needs to end with in <string> where <string> is not a field name, change the regex to end with [i]n <string> to ensure that Splunk doesn't try to match <string> to a field name.
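The two forms of the EXTRACT value can be sketched as follows. The source type (mysourcetype), the class names, the extracted field names, and both regexes are hypothetical; the only names taken from this topic are the default field source and the [i]n convention.

```ini
# props.conf (hypothetical source type, class names, and regexes)
[mysourcetype]
# Match against the value of the default field "source" instead of _raw.
EXTRACT-logdir = ^/var/log/(?<log_dir>[^/]+)/ in source
# This regex must end with the literal text "in progress", so the "i" is
# bracketed to keep Splunk from treating "progress" as a field name.
EXTRACT-job = job\s(?<job_id>\d+)\s[i]n progress
```

Comments sit on their own lines because a # that follows a value on the same line would be read as part of that value.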
Precedence rules for the EXTRACT field extraction type:
- For each field extraction, Splunk takes the configuration from the highest precedence configuration stanza.
- When there are multiple categories of matching [<spec>] stanzas, [host::<host>] settings override [<sourcetype>] settings.
- [source::<source>] settings override both [host::<host>] and [<sourcetype>] settings.
- Similarly, if a particular field extraction is specified in ../local/ for a <spec>, it overrides that class in ../default/.
There's more to [<spec>] stanza precedence; see props.conf.spec for all the details.
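As an illustration of the stanza precedence rules above, consider two stanzas that define the same extraction class. The source type, source path, class name, and regexes here are all hypothetical.

```ini
# props.conf (hypothetical source type, source path, and regexes)
[mysourcetype]
EXTRACT-status = status=(?<status>\d+)

[source::/var/log/myapp/app.log]
# Same class name as above. For events that come from this source, this
# definition should override the one in [mysourcetype].
EXTRACT-status = code=(?<status>\d+)
```

Events of mysourcetype that come from any other source would still use the first definition.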
Note: Unlike the procedure for configuring the default set of fields that Splunk extracts at index time, transforms.conf requires no DEST_KEY since nothing is being written to the index during search-time field extraction. Fields extracted at search time are not persisted in the index as keys.
Splunk follows precedence rules when it runs search-time field extractions. It runs inline field extractions (EXTRACT-<class>) first, and then runs field extractions that reference field transforms (REPORT-<class>).
Setting KV_MODE for search-time data
You can use the KV_MODE attribute to specify the field/value extraction mode for your data. You can add KV_MODE to an EXTRACT or REPORT stanza. Its format is:
    KV_MODE = [none|auto|auto_escaped|multi|json|xml]
- none: Disables field extraction for the source, source type, or host identified by the stanza name. You can use this setting to ensure that other regexes that you have created are not overridden by automatic field/value extraction for a particular source, source type, or host. You can also use this setting to increase search performance by disabling extraction for common but nonessential fields. We have some field extraction examples at the end of this topic that demonstrate the disabling of field extraction in different circumstances.
- auto: Extracts field/value pairs that are separated by equal signs. This is the default field extraction behavior if you do not include this attribute in your field extraction stanza.
- auto_escaped: Extracts field/value pairs that are separated by equal signs. In addition, this setting ensures that Splunk honors \" and \\ as escaped sequences within quoted values. For example: field="value with \"nested\" quotes".
- multi: Invokes the multikv search command, which extracts field values from table-formatted events.
- xml: Use this setting if you intend to use the field extraction stanza to extract fields from XML data.
- json: Use this setting if you intend to use the field extraction stanza to extract fields from JSON data.
- The xml and json modes will not extract any fields when used on data that isn't of the indicated format (XML or JSON).
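A brief sketch of how KV_MODE might appear in props.conf follows. The source type names, the class name, and the regex are hypothetical.

```ini
# props.conf (hypothetical source type names)
[my_json_events]
# Events are JSON, so use JSON-aware field extraction.
KV_MODE = json

[my_custom_log]
# Turn off automatic field/value extraction for this source type.
# Manually defined extractions, such as the one below, still run.
KV_MODE = none
EXTRACT-user = user=(?<user_name>\w+)
```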
Inline (props.conf only) search-time field extraction examples
Here are a set of examples of search-time custom field extraction, set up using props.conf only.
Add a new error code field
This example shows how to create a new "error code" field by configuring a field extraction in props.conf. The field can be identified by the occurrence of device_id= followed by a word within brackets and a text string terminating with a colon. The field should be extracted from events related to the testlog source type.
In props.conf, add:
    [testlog]
    EXTRACT-errors = device_id=\[\w+\](?<err_code>[^:]+)
Extract multiple fields using one regex
This is an example of a field extraction that pulls out five separate fields. You can then use these fields in concert with some event types to help you find port flapping events and report on them.
Here's a sample of the event data that the fields are being extracted from:
    #%LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet9/16, changed state to down
The stanza in props.conf for the extraction looks like this:
    [syslog]
    EXTRACT-port_flapping = Interface\s(?<interface>(?<media>[^\d]+)(?<slot>\d+)\/(?<port>\d+))\,\schanged\sstate\sto\s(?<port_status>up|down)
Note that five separate fields are extracted as named groups: interface, media, slot, port, and port_status.
The following two steps aren't required for field extraction--they show you what you might do with the extracted fields to find port flapping events and then report on them.
Use tags to define a couple of event types in eventtypes.conf:
    [cisco_ios_port_down]
    search = "changed state to down"

    [cisco_ios_port_up]
    search = "changed state to up"
Finally, create a saved search in savedsearches.conf that ties much of the above together to find port flapping and report on the results:
    [port flapping]
    search = eventtype=cisco_ios_port_down OR eventtype=cisco_ios_port_up starthoursago=3 | stats count by interface,host,port_status | sort -count
Create a field from a subtoken
You may run into problems if you are extracting a field value that is a subtoken--a part of a larger token. Tokens are chunks of event data that have been run through event processing prior to being indexed. During event processing, events are broken up into segments, and this is the point where tokens are created--each segment created is a token.
Tokens are never smaller than a complete word or number. For example, you may have the word foo123 in your event. If it has been run through event processing and indexing, it's a token, and it can be a value of a field. However, if your extraction pulls out the foo as a field value unto itself, you're extracting a subtoken. The problem is that while foo123 exists in the index, foo does not, which means that you'll likely get few results if you search on that subtoken, even though it may appear to be extracted correctly in your search results.
Because tokens cannot be smaller than individual "words" within strings, a field extraction of a subtoken (a part of a word) can cause problems because subtokens will not themselves be in the index, only the larger word of which they are a part.
If your field value is a smaller part of a token, you must configure props.conf as explained above. Then, add an entry to fields.conf:
    [<fieldname>]
    INDEXED = False
    INDEXED_VALUE = False
- Fill in <fieldname> with the name of your field.
  - For example, [url] if you've configured a field named "url."
- Set INDEXED and INDEXED_VALUE to false.
  - This tells Splunk that the value you're searching for is not a token in the index.
Note: As of release 4.3, you no longer need to add this entry to fields.conf for cases where you are extracting a field's value from the value of a default field (such as host, source, sourcetype, or timestamp) that is not indexed (and therefore not tokenized).
For more information on the tokenization of event data, see "About segmentation" in the Getting Data In Manual.
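Putting the two files together for the foo123 case described above, a minimal sketch might look like this. The source type (mysourcetype), class name, field name (prefix), and regex are hypothetical.

```ini
# props.conf (hypothetical source type, class name, and field name)
[mysourcetype]
# Pulls "foo" out of a token such as "foo123".
EXTRACT-prefix = \b(?<prefix>[a-z]+)\d+\b

# fields.conf
[prefix]
# The extracted value is only part of an indexed token, so tell Splunk
# not to look for it as a token in the index.
INDEXED = False
INDEXED_VALUE = False
```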
Create advanced search-time field extractions with field transforms
While you can define most search-time field extractions entirely within props.conf, some advanced search-time field extractions reference an additional component called a field transform. This section shows you how to configure field transforms in transforms.conf.
Field transforms contain a field-extracting regular expression and other attributes that govern the way the transform extracts fields. Field transforms are always created in conjunction with field extraction stanzas in props.conf--they cannot stand alone.
Your search-time field extractions require a field transform component if you need to:
- Reuse the same field-extracting regular expression across multiple sources, source types, or hosts (in other words, configure one field transform for multiple field extractions). If you find yourself using the same regex to extract fields for different sources, source types, and hosts, you may want to set it up as a transform. Then, if you find that you need to update the regex, you only have to do so once, even though it is used in more than one field extraction.
- Apply more than one field-extracting regular expression to the same source, source type, or host (in other words, apply multiple field transforms to the same field extraction). This is sometimes necessary in cases where the field or fields that you want to extract from a particular source/source type/host appear in two or more very different event patterns.
- Set up delimiter-based field extractions. Delimiter-based extractions come in handy when your event data presents field-value pairs (or just field values) that are separated by delimiters such as commas, colons, bars, line breaks, and tab spaces.
- Configure extractions for multivalued fields. When you do this, Splunk appends additional field values to the field as it finds them in the event data.
- Extract fields with names that begin with numbers or underscores. Ordinarily key cleaning removes leading numeric characters and underscores from field names, but you can configure your transform to turn this functionality off if necessary.
You can also configure transforms to:
- Extract fields from the values of another field (other than _raw) by using the SOURCE_KEY attribute.
- Manage the formatting of extracted fields, in cases where you are extracting multiple fields or are extracting both the field name and field value, by using the FORMAT attribute.
Both of these configurations can now be set up directly in the regular expression as well. See the "Define a field transform" section below for more information about how to do this.
NOTE: If you need to concatenate a set of regex extractions into a single field value, you can do this with the FORMAT attribute, but only if you set it up as an index-time extraction. For example, if you have a string like 192(x)0(y)2(z)1 in your event data, you can extract it at index time as an IP address field value in the format 192.0.2.1. For more information, see "Configure index-time field extractions" in the Getting Data In Manual. However, we DO NOT RECOMMEND that you make extensive changes to your set of indexed fields--do so sparingly if at all.
Steps for defining custom search-time field extractions that reference field transforms
Advanced search-time field extractions use the REPORT extraction configuration in props.conf. Each REPORT extraction stanza references a field transform that is defined separately in transforms.conf. The field transform contains the regular expression that Splunk uses to extract fields at search time, as well as other attributes that govern the way that the transform extracts those fields.
Follow these steps when you create an advanced search-time field extraction:
1. All extraction configurations in props.conf are restricted by a specific source, source type, or host. Start by identifying the source type, source, or host that provides the events that your field should be extracted from. (Don't update props.conf yet.)
Note: For information about hosts, sources, and sourcetypes, see "About default fields (host, source, source type, and more)" in the Getting Data In manual.
2. Create a regular expression that identifies the field in the event. Use named capturing groups to provide the field names for the extracted values. Use the field name syntax as described in the preceding sections.
Note: If your event lists field/value pairs or just field values, you can create a delimiter-based field extraction that won't require a regex; see the information on the DELIMS attribute, below.
3. Create a field transform in transforms.conf that utilizes this regex (or delimiter configuration). The transform can also define a source key and/or event value formatting.
Edit the transforms.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.
Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.
4. Follow the format for the REPORT field extraction type (defined two sections down) to create a field extraction stanza in props.conf that uses the host, source, or source type that you identified in Step 1. If necessary, you can create additional field extraction stanzas for other hosts, sources, and source types that refer to the same field transform.
Edit the props.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.
Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.
5. Restart Splunk for your changes to take effect.
First, define a field transform
Follow this format when defining a search-time field transform in transforms.conf:
    [<unique_transform_stanza_name>]
    REGEX = <regular expression>
    FORMAT = <string>
    SOURCE_KEY = <string>
    DELIMS = <quoted string list>
    FIELDS = <quoted string list>
    MV_ADD = [true|false]
    CLEAN_KEYS = [true|false]
    KEEP_EMPTY_VALS = [true|false]
    CAN_OPTIMIZE = [true|false]
- The <unique_transform_stanza_name> is required for all search-time transforms. Note: <unique_transform_stanza_name> values do not have to follow field name syntax restrictions (see above). You can use characters other than a-z, A-Z, and 0-9, and spaces are allowed. They are not subject to key cleaning.
- REGEX is a regular expression that operates on your data to extract fields. It is required for all search-time field transforms unless you are setting up a delimiter-based extraction, in which case you use the DELIMS attribute instead (see the DELIMS attribute description, below).
  - Defaults to an empty string.
- REGEX and the FORMAT attribute:
  - Name-capturing groups in the REGEX are extracted directly to fields, which means that you don't have to specify FORMAT for simple field extraction cases.
  - If the REGEX extracts both the field name and its corresponding value, you can use the following special capturing groups to skip specifying the mapping in FORMAT: _KEY_<string>, _VAL_<string>.
  - For example, the following are equivalent:
    - Using FORMAT:
        REGEX = ([a-z]+)=([a-z]+)
        FORMAT = $1::$2
    - Not using FORMAT:
        REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
  - In both of these cases, Splunk applies the regular expression against the source text of an event repeatedly to extract all of the field/value combinations that it can identify.
- FORMAT is optional. Use it to specify the format of the field/value pair(s) that you are extracting. You don't need to specify the FORMAT if you have a simple REGEX with name-capturing groups.
  - For search-time extractions, this is the pattern for the FORMAT field:
        FORMAT = <field-name>::<field-value>(<field-name>::<field-value>)*
    where:
        field-name = [<string>|$<extracting-group-number>]
        field-value = [<string>|$<extracting-group-number>]
  - Examples of search-time FORMAT usage:
    1. FORMAT = firstfield::$1 secondfield::$2 thirdfield::other-value
    2. FORMAT = $1::$2
  - If you configure FORMAT with a variable field name (such as in example #2 just above, where $1 represents the field name), Splunk applies the regular expression against the source event text repeatedly to match and extract as many field/value pairs as it can find.
  - Note: You cannot create concatenated fields with FORMAT at search time. This functionality is only available for index-time field transforms.
  - FORMAT defaults to an empty string.
- SOURCE_KEY is optional. Use it to extract one or more values from the values of another field. You can use any field that is available at the time of the execution of this field extraction.
  - To configure SOURCE_KEY, identify the field to which Splunk should apply the transform's REGEX.
  - By default, SOURCE_KEY is set to _raw, which means it is applied to the raw, unprocessed text of all events.
- DELIMS is optional. Use it in place of REGEX when dealing with delimiter-based field extractions, where field values--or field/value pairs--are separated by delimiters such as commas, colons, spaces, tab spaces, line breaks, and so on.
  - Delimiters must be quoted with " ". You can use a backslash to escape double quotes around a value if necessary (\").
  - IMPORTANT: If a value may contain an embedded unescaped double quote character, such as "foo"bar", we recommend that you use REGEX, not DELIMS.
  - Each character in the delimiter string is used as a delimiter to split the event.
  - If the event contains full delimiter-separated field/value pairs, you enter two sets of quoted delimiters for DELIMS. The first set of quoted delimiters separates the field/value pairs. The second set of quoted delimiters separates the field name from its corresponding value.
  - If the events only contain delimiter-separated values (no field names), you use one set of quoted delimiters, to separate the values. Then you use the FIELDS attribute to apply field names to the extracted values (see FIELDS below). Alternatively, Splunk reads even tokens as field names and odd tokens as field values.
  - Splunk consumes consecutive delimiter characters unless you specify a list of field names.
  - Defaults to an empty string.
  - This example of DELIMS usage applies to an event where field/value pairs are separated by '|' symbols, and the field names are separated from their corresponding values by '=' symbols:
        [pipe_eq]
        DELIMS = "|", "="
- FIELDS is used in conjunction with DELIMS when you are performing delimiter-based field extraction, but you only have field values to extract. Use FIELDS to provide field names for the extracted field values, in list format according to the order in which the values are extracted.
  - Note: If field names contain spaces or commas they must be quoted with " " (to escape, use \).
  - Defaults to an empty string.
  - Here's an example of a delimiter-based extraction where three field values appear in an event. They are separated by a comma and then a space.
        [commalist]
        DELIMS = ", "
        FIELDS = field1, field2, field3
- MV_ADD is optional. Use it when you have events that have multiple occurrences of the same field but with different values and you want to keep each of the field's values.
  - When MV_ADD = true, Splunk transforms fields that appear multiple times in an event with different values into multivalued fields (the field name appears once, the multiple values for the field follow the '=' sign).
  - When MV_ADD = false, Splunk keeps the first value found for a field in an event and discards every subsequent value found for that same field in that same event.
  - Defaults to false.
- CLEAN_KEYS is optional. It controls whether or not the system strips leading underscores and 0-9 characters from the keys (field names) it extracts (see the subtopic "Use proper field name syntax," above, for more information). "Key cleaning" is the practice of replacing any non-alphanumeric characters (characters other than those falling between the a-z, A-Z, and 0-9 ranges) in field names with underscores, as well as the stripping of leading underscores and 0-9 characters from field names.
  - Add CLEAN_KEYS = false to your transform if you need to keep your field names intact (no removal of leading underscores and/or 0-9 characters).
  - By default, CLEAN_KEYS is always set to true for transforms.
- KEEP_EMPTY_VALS is optional. It controls whether Splunk keeps field/value pairs when the value is an empty string.
  - This option does not apply to field/value pairs that are generated by Splunk's autokv extraction (automatic field extraction) process. Autokv ignores field/value pairs with empty values.
  - Defaults to false.
- CAN_OPTIMIZE is optional. It controls whether Splunk can optimize the extraction out (or, in other words, disable the extraction).
  - You might use this if you're running searches under a Search Mode setting that disables field discovery--it ensures that Splunk always discovers specific fields.
  - Splunk only disables an extraction if it can determine that none of the fields identified by the extraction will ever be needed for the successful evaluation of a search.
  - Note: This attribute should rarely be set to false.
  - Defaults to true.
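To tie a few of these attributes together, here is a minimal sketch of a transform that relies on the special _KEY_ and _VAL_ capturing groups and turns on MV_ADD. The transform name (kv_pairs_mv), source type (mysourcetype), and class name (kvpairs) are hypothetical; the REGEX is the one used in the FORMAT equivalence example above.

```ini
# transforms.conf (hypothetical transform name)
[kv_pairs_mv]
# Field name and value are both captured, so no FORMAT is needed.
REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
# Keep every value when the same field name appears more than once.
MV_ADD = true

# props.conf (hypothetical source type and class name)
[mysourcetype]
REPORT-kvpairs = kv_pairs_mv
```

Under these assumptions, an event containing color=red color=blue should yield a multivalued color field holding both red and blue. With MV_ADD left at its default of false, only the first value, red, would be kept.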
Second, configure a field extraction and associate it with the field transform
When you're setting up a search-time field extraction in props.conf that is associated with a field transform, you use the REPORT field extraction class. Follow this format.
You can associate multiple field transform stanzas to a single field extraction by listing them after the initial <unique_transform_stanza_name>, separated by commas. (For more information, see the example later in this topic.)
    [<spec>]
    REPORT-<class> = <unique_transform_stanza_name1>, <unique_transform_stanza_name2>,...
- <spec> can be:
  - <sourcetype>, the source type of an event.
  - host::<host>, where <host> is the host for an event.
  - source::<source>, where <source> is the source for an event.
- <class> is a unique literal string that identifies the namespace of the field (key) you're extracting. Note: <class> values do not have to follow field name syntax restrictions (see above). You can use characters other than a-z, A-Z, and 0-9, and spaces are allowed. <class> values are not subject to key cleaning.
- <unique_transform_stanza_name> is the name of your field transform stanza from transforms.conf.
- Precedence rules for the REPORT field extraction class:
  - For each class, Splunk takes the configuration from the highest precedence configuration block.
  - If a particular class is specified for a source and a sourcetype, the class for source wins out.
  - Similarly, if a particular class is specified in ../local/ for a <spec>, it overrides that class in ../default/.
If you have a set of transforms that must be run in a specific order and which belong to the same host, source, or source type, you can place them in a comma-separated list within the same props.conf stanza. Splunk will apply them in the specified order. For example, this sequence ensures that the [yellow] field transform gets applied first, then [blue], and then [red]:
    [source::color_logs]
    REPORT-colorchange = yellow, blue, red
If you need to change the order, rearrange the list.
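The reverse arrangement, one transform shared by several field extractions, can be sketched as follows. The transform name, regex, source type, source path, and class name are hypothetical.

```ini
# transforms.conf (hypothetical transform name and regex)
[session_id_transform]
REGEX = session=(?<session_id>\w+)

# props.conf (hypothetical source type, source path, and class name)
[web_access]
REPORT-session = session_id_transform

[source::/var/log/myapp/auth.log]
REPORT-session = session_id_transform
```

If the regex ever needs to change, only the [session_id_transform] stanza has to be edited; both props.conf stanzas pick up the update.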
Examples of custom search-time field extractions using field transforms
These examples present custom field extraction use cases that require you to configure one or more field transform stanzas in transforms.conf and then reference them in a props.conf field extraction stanza.
Configuring a field extraction that utilizes multiple field transforms
This example of search-time field transform setup demonstrates how:
- you can create transforms that pull varying field name/value pairs from events.
- you can create a field extraction that references two or more field transforms.
Let's say you have logs that contain multiple field name/field value pairs. While the fields vary from event to event, the pairs always appear in one of two formats.
The logs often come in this format:
    [fieldName1=fieldValue1] [fieldName2=fieldValue2]
However, at times they are more complicated, logging multiple name/value pairs as a list, in which case the format looks like:
    [headerName=fieldName1] [headerValue=fieldValue1], [headerName=fieldName2] [headerValue=fieldValue2]
Note that the list items are separated by commas, and that each fieldName is matched with a corresponding fieldValue. In these secondary cases you still want to pull out the field names and values so that the search results are
    fieldName1=fieldValue1
    fieldName2=fieldValue2
and so on.
To make this clearer, here's an example of an HTTP request event that combines both of the above formats.
    [method=GET] [IP=10.1.1.1] [headerName=Host] [headerValue=www.example.com], [headerName=User-Agent] [headerValue=Mozilla], [headerName=Connection] [headerValue=close] [byteCount=255]
You want to develop a single field extraction that would pull the following field/value pairs from that event:
    method=GET
    IP=10.1.1.1
    Host=www.example.com
    User-Agent=Mozilla
    Connection=close
    byteCount=255
Solution
To efficiently and reliably pull out both formats of field/value pairs, you'll want to design two different regexes that are optimized for each format. One regex will identify events with the first format and pull out all of the matching field/value pairs. The other regex will identify events with the other format and pull out those field/value pairs.
You then create two unique transforms in transforms.conf--one for each regex--and then unite them in the corresponding field extraction stanza in props.conf.
The first transform you add to transforms.conf catches the fairly conventional [fieldName1=fieldValue1] [fieldName2=fieldValue2] case.
[myplaintransform]
REGEX=\[(?!(?:headerName|headerValue))([^\s\=]+)\=([^\]]+)\]
FORMAT=$1::$2
The second transform (also added to transforms.conf) catches the slightly more complex [headerName=fieldName1] [headerValue=fieldValue1], [headerName=fieldName2] [headerValue=fieldValue2] case:
[mytransform]
REGEX = \[headerName\=([^\]]+)\]\s\[headerValue\=([^\]]+)\]
FORMAT = $1::$2
Both transforms use the <fieldName>::<fieldValue> FORMAT to match each field name in the event with its corresponding value. This setting in FORMAT enables Splunk to keep matching the regex against a matching event until every matching field/value combination is extracted.
Finally, this field extraction stanza, which you create in props.conf, references both of the field transforms:
[mysourcetype]
KV_MODE=none
REPORT-a = mytransform, myplaintransform
Note that, besides using multiple field transforms, the field extraction stanza also sets KV_MODE=none. This disables automatic field/value extraction for the identified source type (while letting your manually defined extractions continue). It ensures that these new regexes aren't overridden by automatic field extraction, and it also helps increase your search performance. (See the subsection on disabling automatic search-time extraction, at the end of this topic, for more on disabling key/value extraction.)
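As a rough check of the two-pattern approach, the following Python sketch runs two regular expressions over the sample HTTP event and merges the results. It is an illustration only: Python's re module stands in for Splunk's regex engine, the extract function and the constant names are hypothetical, and the header pattern is written to fit the sample event as printed above (whitespace between the headerName and headerValue brackets, hyphens allowed in header names), which is an assumption of this sketch rather than Splunk configuration.

```python
import re

# Sample event from the text above.
EVENT = (
    "[method=GET] [IP=10.1.1.1] [headerName=Host] [headerValue=www.example.com], "
    "[headerName=User-Agent] [headerValue=Mozilla], [headerName=Connection] "
    "[headerValue=close] [byteCount=255]"
)

# Plain [name=value] pairs; the negative lookahead skips header brackets.
PLAIN = re.compile(r"\[(?!(?:headerName|headerValue))([^\s\=]+)\=([^\]]+)\]")

# Header pairs (assumed form): [headerName=X] followed by [headerValue=Y].
HEADER = re.compile(r"\[headerName\=([^\]]+)\]\s\[headerValue\=([^\]]+)\]")


def extract(event):
    """Apply both patterns repeatedly and collect name -> value pairs."""
    fields = {}
    # Header pattern first, then the plain pattern, mirroring the order
    # in which the stanza above lists the two transforms.
    for pattern in (HEADER, PLAIN):
        for name, value in pattern.findall(event):
            fields.setdefault(name, value)  # keep the first value seen
    return fields


fields = extract(EVENT)
for name, value in fields.items():
    print(f"{name}={value}")
```

Each pattern is applied as many times as it matches, which is the behavior the $1::$2 FORMAT is described as enabling; the lookahead in the plain pattern keeps the two patterns from claiming the same brackets.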
Configuring delimiter-based field extraction
You can use the DELIMS attribute in field transforms to configure field extractions for events where field values or field/value pairs are separated by delimiters such as commas, colons, tab spaces, and more.
For example, say you have a recurring multiline event where a different field/value pair sits on a separate line, and each pair is separated by a colon followed by a tab space. Here's a sample event:
ComponentId: Application Server
ProcessId: 5316
ThreadId: 00000000
ThreadName: P=901265:O=0:CT
SourceId: com.ibm.ws.runtime.WsServerImpl
ClassName:
MethodName:
Manufacturer: IBM
Product: WebSphere
Version: Platform 7.0.0.7 [BASE 7.0.0.7 cf070942.55]
ServerName: sfeserv36Node01Cell\sfeserv36Node01\server1
TimeStamp: 2010-04-27 09:15:57.671000000
UnitOfWork:
Severity: 3
Category: AUDIT
PrimaryMessage: WSVR0001I: Server server1 open for e-business
ExtendedMessage:
Now you could set up a bulky, wordy search-time field extraction stanza in props.conf that handles all of these fields:
[activityLog]
LINE_BREAKER = [-]{8,}([\r\n]+)
SHOULD_LINEMERGE = false
EXTRACT-ComponentId = ComponentId:\t(?<ComponentId>.*)
EXTRACT-ProcessId = ProcessId:\t(?<ProcessId>.*)
EXTRACT-ThreadId = ThreadId:\t(?<ThreadId>.*)
EXTRACT-ThreadName = ThreadName:\t(?<ThreadName>.*)
EXTRACT-SourceId = SourceId:\t(?<SourceId>.*)
EXTRACT-ClassName = ClassName:\t(?<ClassName>.*)
EXTRACT-MethodName = MethodName:\t(?<MethodName>.*)
EXTRACT-Manufacturer = Manufacturer:\t(?<Manufacturer>.*)
EXTRACT-Product = Product:\t(?<Product>.*)
EXTRACT-Version = Version:\t(?<Version>.*)
EXTRACT-ServerName = ServerName:\t(?<ServerName>.*)
EXTRACT-TimeStamp = TimeStamp:\t(?<TimeStamp>.*)
EXTRACT-UnitOfWork = UnitOfWork:\t(?<UnitOfWork>.*)
EXTRACT-Severity = Severity:\t(?<Severity>.*)
EXTRACT-Category = Category:\t(?<Category>.*)
EXTRACT-PrimaryMessage = PrimaryMessage:\t(?<PrimaryMessage>.*)
EXTRACT-ExtendedMessage = ExtendedMessage:\t(?<ExtendedMessage>.*)
But that solution is pretty over-the-top. Is there a more elegant way to handle it that would remove the need for all these EXTRACT lines? Yes!
Configure the following stanza in transforms.conf:
[activity_report]
DELIMS = "\n", ":\t"
This states that the field/value pairs in the event are on separate lines ("\n"), and then specifies that the field name and field value on each line are separated by a colon and tab space (":\t").
To complete this configuration, rewrite the wordy props.conf stanza mentioned above as:
[activityLog]
LINE_BREAKER = [-]{8,}([\r\n]+)
SHOULD_LINEMERGE = false
REPORT-activity = activity_report
These two brief configurations will extract the same set of fields as before, but they leave less room for error and are more flexible.
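The two-level splitting that the delimiter pair describes can be pictured with a short Python sketch. This is only a rough analogue under stated assumptions: parse_delimited is a hypothetical helper, it treats ":\t" as one literal two-character separator, and Splunk's own handling of delimiter characters, quoting, and empty values may differ.

```python
def parse_delimited(event, pair_delim="\n", kv_delim=":\t"):
    """Split an event into pairs, then split each pair into name and value."""
    fields = {}
    for line in event.split(pair_delim):
        # Split on the first separator only, so colons inside a value survive.
        name, sep, value = line.partition(kv_delim)
        if not sep:
            continue  # no name/value separator on this line, skip it
        fields[name.strip()] = value.strip()
    return fields


# A shortened version of the sample event, with explicit tabs and newlines.
SAMPLE = (
    "ComponentId:\tApplication Server\n"
    "ProcessId:\t5316\n"
    "ThreadName:\tP=901265:O=0:CT\n"
    "PrimaryMessage:\tWSVR0001I: Server server1 open for e-business"
)

fields = parse_delimited(SAMPLE)
for name, value in fields.items():
    print(f"{name} -> {value}")
```

Because the sketch splits each line only at the first separator, values that themselves contain colons, such as the ThreadName value, stay intact.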
Handling events with multivalued fields
You can use the MV_ADD attribute to extract fields in situations where the same field is used more than once in an event, but has a different value each time. Ordinarily, Splunk only extracts the first occurrence of a field in an event; every subsequent occurrence is discarded. But when MV_ADD is set to true in transforms.conf, Splunk treats the field like a multivalue field and extracts each unique field/value pair in the event.
Say you have a set of events that look like this:
event1.epochtime=1282182111 type=type1 value=value1 type=type3 value=value3
event2.epochtime=1282182111 type=type2 value=value4 type=type3 value=value5 type=type4 value=value6
See how the type and value fields are repeated several times in each event? What you'd like to do is search type=type3 and have both of these events be returned. Or you'd like to run a count(type) report on these two events that returns 5.
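The difference between keeping only the first occurrence and keeping every occurrence can be sketched in Python. This is an illustration under assumptions, not Splunk's implementation: the helper names are hypothetical, and Python's re module (which writes named groups as (?P<name>...)) stands in for Splunk's regex engine.

```python
import re

EVENTS = [
    "event1.epochtime=1282182111 type=type1 value=value1 type=type3 value=value3",
    "event2.epochtime=1282182111 type=type2 value=value4 type=type3 value=value5 "
    "type=type4 value=value6",
]

# \S+ captures the run of non-whitespace characters after "type=".
TYPE = re.compile(r"\btype=(?P<type>\S+)")


def first_only(event):
    """Single-value behavior: keep just the first match."""
    match = TYPE.search(event)
    return [match.group("type")] if match else []


def all_values(event):
    """Multivalue behavior: keep every match, in order."""
    return [m.group("type") for m in TYPE.finditer(event)]


single_hits = [e for e in EVENTS if "type3" in first_only(e)]
multi_hits = [e for e in EVENTS if "type3" in all_values(e)]
total_types = sum(len(all_values(e)) for e in EVENTS)

print(len(single_hits), len(multi_hits), total_types)
```

With first-occurrence extraction neither event matches type3, while the multivalue version matches both events and counts five type values in total.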
So, what you want to do is create a custom multivalue extraction of the type field for these events. Here's how you would set up your transforms.conf and props.conf files to enable it:
First, transforms.conf:
[mv-type]
REGEX = type=(?<type>\S+)
MV_ADD = true
Then, in props.conf for your sourcetype or source, set:
REPORT-type = mv-type
Disabling automatic search-time extraction for specific sources, source types, or hosts
You can disable automatic search-time field extraction for specific sources, source types, or hosts through edits in props.conf. Add KV_MODE = none for the appropriate [<spec>] in props.conf.
Note: Custom field extractions set up manually via the configuration files or Manager will still be processed for the affected source, source type, or host when KV_MODE = none.
[<spec>]
KV_MODE = none
<spec> can be:
- <sourcetype>, an event source type.
- host::<host>, where <host> is the host for an event.
- source::<source>, where <source> is the source for an event.
This documentation applies to the following versions of Splunk: 5.0, 5.0.1, 5.0.2, 5.0.3
This seems to imply, though not outright state, that the "DELIMS" capability will handle quoted strings:
---
IMPORTANT: If a value may contain an embedded unescaped double quote character, such as "foo"bar", use REGEX, not DELIMS. An escaped double quote (\") is ok.
---
It would be nice if the documentation explicitly addressed whether or not Splunk automatically detects quoted strings during this delimited automatic field extraction.