Knowledge Manager Manual

Data interpretation: Fields and field extractions

Create and maintain search-time field extractions through configuration files

While you can set up and manage search-time field extractions via Splunk Manager, it's important to understand how they are handled at the props.conf and transforms.conf level, because those are the configuration files that the Field extractions and Field transformations pages in Manager read from and write to.

Many knowledge managers, especially those who have been using Splunk for some time, find it easier to manage their custom fields through configuration files, which can be used to add, maintain, and review libraries of custom field additions for their teams.

This topic shows you how you can:

  • Set up basic "inline" search-time field extractions through edits to props.conf.
  • Design more complex search-time field extractions through a combination of edits to props.conf and transforms.conf.

Regular expressions and field name syntax

Splunk uses regular expressions, or regexes, to extract fields from event data. When you use the interactive field extractor (IFX), Splunk attempts to generate field-extracting regexes for you, but it can only create regular expressions that extract one field at a time from the events that match them.

On the other hand, when you set up field extractions manually through configuration files, you have to provide the regexes yourself--but you can design them so that they extract two or more fields from the events that match them, if necessary.

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test your regex by using it in a search with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.
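For example, to try out a candidate regex before committing it to a configuration file, you can apply it inline with the rex command (the source type and user field here are just illustrations):

sourcetype=syslog | rex "user=(?<user>\w+)" | top user

If the regex is correct, the extracted user field appears in the search results.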

Important: The capturing groups in your regex must identify field names that contain only alphanumeric characters or underscores. See "Use proper field name syntax," below.

Use proper field name syntax

Splunk only accepts field names that contain alphanumeric characters or underscores:

  • Valid characters for field names are a-z, A-Z, 0-9, or _ .
  • Field names cannot begin with 0-9 or _ . Leading underscores are reserved for Splunk's internal variables.
  • International characters are not allowed.

Splunk applies the following "key cleaning" rules to all extracted fields when they are extracted at search time, either by default or through a custom configuration:

1. All characters that are not in a-z, A-Z, and 0-9 ranges are replaced with an underscore (_).

2. When key cleaning is enabled (it is enabled by default), Splunk removes all leading underscores and 0-9 characters from extracted fields.

You can disable key cleaning for a particular search-time field extraction by configuring it as an advanced REPORT extraction type, and then having the referenced field transform stanza include the setting CLEAN_KEYS=false. See below for more information about the REPORT extraction configuration.
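As a minimal sketch, assuming a hypothetical field whose name must keep its leading underscore, the transform and its referencing extraction might look like this:

In transforms.conf:

[preserve_underscore]
REGEX = session=(?<_session_id>\w+)
CLEAN_KEYS = false

In props.conf:

[mysourcetype]
REPORT-session = preserve_underscore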

Note: You cannot turn off key cleaning for basic EXTRACT (props.conf only) field extraction configurations.

Create basic search-time field extractions with props.conf edits

You can create basic search-time field extractions (field extractions that are defined entirely within props.conf, as opposed to extractions that reference field transforms in transforms.conf) by editing the props.conf configuration file. You can find props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. (We recommend using the latter directory if you want to make it easy to transfer your data customizations to other search servers.)

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

For more information on configuration files in general, see "About configuration files" in the Admin manual.

Steps for defining basic search-time field extractions with props.conf

Basic search-time field extractions use the EXTRACT extraction configuration in props.conf. Each EXTRACT extraction stanza contains the regular expression that Splunk uses to extract one or more fields at search time, as well as other attributes that govern the manner in which those fields are extracted.

Follow these steps when you create a basic search-time field extraction:

1. All extraction configurations in props.conf are restricted to a specific source, source type, or host. Start by identifying the source type, source, or host that provides the events that your field should be extracted from.

Note: For information about hosts, sources, and sourcetypes, see "About default fields (host, source, source type, and more)" in the Getting Data In manual.

2. Create a regular expression that identifies the field in the event. Use named capturing groups to provide the field names for the extracted values. Use the field name syntax as described in the preceding sections.

3. Follow the format for the EXTRACT field extraction type (defined in the next section) to create a field extraction stanza in props.conf that includes the host/source/sourcetype and regex that you have identified. Edit the props.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

4. If your field value is a portion of a word--or if it is a value of a default field such as host, source, sourcetype, timestamp, or linecount--you must also add an entry to fields.conf. See the example "Create a field from a subtoken", below.

5. Restart Splunk for your changes to take effect.

Add an EXTRACT field extraction stanza to props.conf

Follow this format when adding an EXTRACT field extraction stanza to props.conf:

[<spec>]
EXTRACT-<name> = <regular_expression>
  • <spec> can be:
    • <sourcetype>, the source type of an event.
    • host::<host>, where <host> is the host for an event.
    • source::<source>, where <source> is the source for an event.
  • <name> is any unique name you want to give to your stanza to identify its namespace.
  • <regular_expression> is a regex that recognizes one or more custom field values in the events that it matches. The regex is required to have named capturing groups; each group represents a different extracted field.
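For instance, a minimal stanza of this form (the source type and field name are hypothetical) might look like:

[access_common]
EXTRACT-auth = user=(?<user>\w+)

This extracts a user field from events of the access_common source type wherever the pattern user=<value> appears.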

Precedence rules for EXTRACT stanzas:

  • For each field extraction, Splunk takes the configuration from the highest precedence configuration stanza (see precedence rules in props.conf.spec).
  • If a particular class is specified for a source and a source type, the class for source wins out.
  • Similarly, if a particular field extraction is specified in ../local/ for a <spec>, it overrides that class in ../default/.

Note: Unlike the procedure for configuring the default set of fields that Splunk extracts at index time, transforms.conf requires no DEST_KEY, since nothing is being written to the index during search-time field extraction. Fields extracted at search time are not persisted in the index as keys.

Splunk follows precedence rules when it runs search-time field extractions. It runs inline field extractions (EXTRACT-<name>) first, and then runs field extractions that reference field transforms (REPORT-<name>).

Inline (props.conf only) search-time field extraction examples

Here are a set of examples of search-time custom field extraction, set up using props.conf only.

Add a new error code field

This example shows how to create a new "error code" field by configuring a field extraction in props.conf. The field can be identified by the occurrence of device_id= followed by a word within brackets and a text string terminating with a colon. The field should be extracted from events related to the testlog source type.

In props.conf, add:

[testlog]
EXTRACT-errors = device_id=\[\w+\](?<err_code>[^:]+)
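For reference, here is a hypothetical event fragment that this regex would match, with the resulting extraction shown below it:

device_id=[W26] ERR-1012: Connection refused

err_code = " ERR-1012" (everything between the closing bracket and the first colon, including the leading space)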

Extract multiple fields using one regex

This is an example of a field extraction that pulls out five separate fields. You can then use these fields in concert with some event types to help you find port flapping events and report on them.

Here's a sample of the event data that the fields are being extracted from:

#%LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet9/16, changed state to down

The stanza in props.conf for the extraction looks like this:

[syslog]
EXTRACT-port_flapping = Interface\s(?<interface>(?<media>[^\d]+)(?<slot>\d+)\/(?<port>\d+))\,\schanged\sstate\sto\s(?<port_status>up|down)

Note that five separate fields are extracted as named groups: interface, media, slot, port, and port_status.

The following two steps aren't required for field extraction--they show you what you might do with the extracted fields to find port flapping events and then report on them.

Next, define a couple of event types in eventtypes.conf:

[cisco_ios_port_down]
search = "changed state to down"

[cisco_ios_port_up]
search = "changed state to up"

Finally, create a saved search in savedsearches.conf that ties much of the above together to find port flapping and report on the results:

[port flapping]
search = eventtype=cisco_ios_port_down OR eventtype=cisco_ios_port_up starthoursago=3 | stats count by interface,host,port_status | sort -count

Create a field from a subtoken, or from a default field value

You may run into problems if you are extracting a field value that is not a token or is part of a token. Most of these cases break down into two types:

  • Field extractions of subtokens (where the value being extracted is a part of a larger word, in other words). For example, your field's value is 123, but in your event it occurs as fool123.
  • Search time field extractions from default field values (such as values of host, source, sourcetype or timestamp values). For example, you have an event with host=inputbackup-i-z5gl22gg2.prod and you want to add the field extraction chef_role=inputbackup to that event.

Tokens are chunks of event data that have been run through event processing prior to their being indexed. During event processing, events are broken up into segments, and this is the point where tokens are created--each segment created is a token. Because tokens cannot be smaller than individual "words" within strings, a field extraction of a subtoken (a part of a word) can cause problems because while the complete tokens are in the index, subtokens extracted from them are not. This means that a search on that subtoken field value will likely yield no results.

In a similar way, extractions from values of default fields are also a problem, because default fields and their values are extracted prior to the segmentation step of event processing, which means they are not tokenized at all. Even if you extract the entire field value of a host, source, or source type, you may have problems searching on it because it is not an indexed token. To continue the example used above, you may find that the chef_role field is extracted correctly when you search on host=inputbackup-i-z5gl22gg2.prod. But if you search on host=inputbackup-i-z5gl22gg2.prod chef_role=inputbackup, no results are returned.

For these types of field extractions to perform as expected, you must configure props.conf as explained above. Then add an entry to fields.conf:

[<fieldname>]
INDEXED = false
INDEXED_VALUE = false
  • Fill in <fieldname> with the name of your field.
    • For example, [url] if you've configured a field named "url."
  • Set INDEXED and INDEXED_VALUE to false.
    • This tells Splunk that the value you're searching for is not a token in the index.
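To continue the chef_role example from above, the entry would be:

[chef_role]
INDEXED = false
INDEXED_VALUE = false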

For another take on this issue, see the Splunk blog entry "Cannot search based on an extracted field" by Ledion Bitincka.

For more information on the tokenization of event data, see "Configure segmentation to manage disk usage" in the Admin manual.

Create advanced search-time field extractions with field transforms

While you can define most search-time field extractions entirely within props.conf, some advanced search-time field extractions reference an additional component called a field transform. This section shows you how to configure field transforms in transforms.conf.

Field transforms contain a field-extracting regular expression and other attributes that govern the way the transform extracts fields. Field transforms are always created in conjunction with field extraction stanzas in props.conf--they cannot stand alone.

Your search-time field extractions require a field transform component if you need to:

  • Reuse the same field-extracting regular expression across multiple sources, source types, or hosts (in other words, configure one field transform for multiple field extractions). If you find yourself using the same regex to extract fields for different sources, source types, and hosts, you may want to set it up as a transform. Then, if you find that you need to update the regex, you only have to do so once, even though it is used in more than one field extraction.
  • Apply more than one field-extracting regular expression to the same source, source type, or host (in other words, apply multiple field transforms to the same field extraction). This is sometimes necessary in cases where the field or fields that you want to extract from a particular source/source type/host appear in two or more very different event patterns.
  • Set up delimiter-based field extractions. Delimiter-based extractions come in handy when your event data presents field-value pairs (or just field values) that are separated by delimiters such as commas, colons, bars, line breaks, and tab spaces.
  • Configure extractions for multivalued fields. When you do this, Splunk appends additional field values to the field as it finds them in the event data.
  • Extract fields with names that begin with numbers or underscores. Ordinarily key cleaning removes leading numeric characters and underscores from field names, but you can configure your transform to turn this functionality off if necessary.

You can also configure transforms to:

  • Extract fields from the values of another field (other than _raw) by using the SOURCE_KEY attribute.
  • Apply special formatting to the information being extracted, by using the FORMAT attribute.

However, both of these configurations can now be set up directly in the regex; see the "Define a field transform" section, below, for more information about how to do this.

NOTE: If you need to concatenate a set of regex extractions into a single field value, you can do this with the FORMAT attribute, but only if you set it up as an index-time extraction. For example, if you have a string like 192(x)0(y)2(z)1 in your event data, you can extract it at index time as an IP address field value in the format 192.0.2.1. For more information, see "Configure index-time field extractions" in the Getting Data In manual. However, we DO NOT RECOMMEND that you make extensive changes to your set of indexed fields--do so sparingly, if at all.

Steps for defining custom search-time field extractions that reference field transforms

Advanced search-time field extractions use the REPORT extraction configuration in props.conf. Each REPORT extraction stanza references a field transform that is defined separately in transforms.conf. The field transform contains the regular expression that Splunk uses to extract fields at search time, as well as other attributes that govern the way that the transform extracts those fields.

Follow these steps when you create an advanced search-time field extraction:

1. All extraction configurations in props.conf are restricted to a specific source, source type, or host. Start by identifying the source type, source, or host that provides the events that your field should be extracted from. (Don't update props.conf yet.)

Note: For information about hosts, sources, and sourcetypes, see "About default fields (host, source, source type, and more)" in the Getting Data In manual.

2. Create a regular expression that identifies the field in the event. Use named capturing groups to provide the field names for the extracted values. Use the field name syntax as described in the preceding sections.

Note: If your event lists field/value pairs or just field values, you can create a delimiter-based field extraction that doesn't require a regex; see the information on the DELIMS attribute, below.

3. Create a field transform in transforms.conf that utilizes this regex (or delimiter configuration). The transform can also define a source key and/or event value formatting.

Edit the transforms.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

4. Follow the format for the REPORT field extraction type (defined two sections down) to create a field extraction stanza in props.conf that uses the host, source, or source type that you identified in Step 1. If necessary, you can create additional field extraction stanzas for other hosts, sources, and source types that refer to the same field transform.

Edit the props.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

5. Restart Splunk for your changes to take effect.

First, define a field transform

Follow this format when defining a search-time field transform in transforms.conf:

[<unique_transform_stanza_name>]
REGEX = <regular expression>
FORMAT = <string>
SOURCE_KEY = <string>
DELIMS = <quoted string list>
FIELDS = <quoted string list>
MV_ADD = [true|false]
CLEAN_KEYS = [true|false]
KEEP_EMPTY_VALS = [true|false]
CAN_OPTIMIZE = [true|false]
  • The <unique_transform_stanza_name> is required for all search-time transforms.
  • REGEX is a regular expression that operates on your data to extract fields. It is required for all search-time field transforms unless you are setting up a delimiter-based extraction, in which case you use DELIMS instead.
  • REGEX and the FORMAT attribute:
    • Name-capturing groups in the REGEX are extracted directly to fields, which means that you don't have to specify FORMAT for simple field extraction cases.
    • If the REGEX extracts both the field name and its corresponding value, you can use the following special capturing groups to skip specifying the mapping in FORMAT:
_KEY_<string>, _VAL_<string>.
  • For example, the following are equivalent:
Using FORMAT:
REGEX = ([a-z]+)=([a-z]+)
FORMAT = $1::$2
Not using FORMAT:
REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
  • FORMAT is optional. Use it to specify the format of the field/value pair(s) that you are extracting, including any field names or values you want to add. You don't need to specify the FORMAT if you have a simple REGEX with name-capturing groups.
    • For search-time extractions, this is the pattern for the FORMAT field:
FORMAT = <field-name>::<field-value>( <field-name>::<field-value>)*
where:
field-name = [<string>|$<extracting-group-number>]
field-value = [<string>|$<extracting-group-number>]
Examples of search-time FORMAT usage:
1. FORMAT = first::$1 second::$2 third::other-value
2. FORMAT = $1::$2 $4::$3
  • Note: You cannot create concatenated fields with FORMAT at search time. This functionality is only available for index-time field transforms.
  • SOURCE_KEY is optional. Use it to identify a field whose values the transform REGEX should be applied to.
    • You can use this attribute to extract one or more values from the values of another field. You can use any field that is available at the time of the execution of this field extraction. (A brief sketch of SOURCE_KEY usage follows this list.)
    • By default, SOURCE_KEY is set to _raw, which means it is applied to the entire event.
  • DELIMS is optional. Use it in place of REGEX when dealing with delimiter-based field extractions, where field values--or field/value pairs--are separated by delimiters such as commas, colons, spaces, tab spaces, line breaks, and so on.
    • Delimiters must be quoted with " " (use \ to escape).
    • Each character in the delimiter string is used as a delimiter to split the event.
    • If the event contains full delimiter-separated field/value pairs, you enter two sets of quoted delimiters for DELIMS. The first set of quoted delimiters separates the field/value pairs. The second set of quoted delimiters separates the field name from its corresponding value.
    • If the events only contain delimiter-separated values (no field names), you use one set of quoted delimiters, to separate the values. Then you use the FIELDS attribute to apply field names to the extracted values (see FIELDS below). Alternately, Splunk reads even tokens as field names and odd tokens as field values.
    • Splunk consumes consecutive delimiter characters unless you specify a list of field names.
    • IMPORTANT: If a value may contain an embedded unescaped double quote character, such as "foo"bar", use REGEX, not DELIMS. An escaped double quote (\") is ok.
    • Defaults to empty string.
    • This example of DELIMS usage applies to an event where field/value pairs are separated by '|' symbols, and the field names are separated from their corresponding values by '=' symbols:
[pipe_eq]
DELIMS = "|", "="
  • FIELDS is used in conjunction with DELIMS when you are performing delimiter-based field extraction, but you only have field values to extract. Use FIELDS to provide field names for the extracted field values, in list format according to the order in which the values are extracted.
    • Note: If field names contain spaces or commas they must be quoted with " " (to escape, use \).
    • Defaults to empty string.
    • Here's an example of a delimiter-based extraction where three field values appear in an event. They are separated by a comma and then a space.
[commalist]
DELIMS = ", "
FIELDS = field1, field2, field3
  • MV_ADD is optional. Use it when you have events that repeat the same field but with different values. When MV_ADD = true, Splunk makes any field that is used more than once in an event (but with different values) a multivalued field and appends each value it finds for that field.
    • When set to false, Splunk keeps the first value found for a field in an event and discards every subsequent value found for that same field in that same event.
    • Defaults to false.
  • CLEAN_KEYS is optional. It controls whether or not the system strips leading underscores and 0-9 characters from the field names it extracts (see the subtopic "Use proper field name syntax," above, for more information).
    • Add CLEAN_KEYS = false to your transform if you need to extract field names (keys) with leading underscores and/or 0-9 characters.
    • By default, CLEAN_KEYS is set to true for transforms.
  • KEEP_EMPTY_VALS is optional. It controls whether Splunk keeps field/value pairs when the value is an empty string.
    • This option does not apply to field/value pairs that are generated by Splunk's autokv extraction. Autokv ignores field/value pairs with empty values.
    • Defaults to false.
  • CAN_OPTIMIZE is optional. It controls whether Splunk can optimize the extraction out (or, in other words, disable the extraction).
    • You might use this when you have field discovery turned off--it ensures that certain fields are always discovered.
    • Splunk only disables an extraction if it can determine that none of the fields identified by the extraction will ever be needed for the successful evaluation of a search.
    • Note: This attribute should rarely be set to false.
    • Defaults to true.
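As promised above, here is a brief sketch of SOURCE_KEY usage. Assuming a hypothetical dest field that holds values like 192.0.2.1:8080, this transform extracts a dest_port field from the values of dest rather than from _raw:

[dest_port_extraction]
SOURCE_KEY = dest
REGEX = :(?<dest_port>\d+)$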

Second, configure a field extraction and associate it with the field transform

When you're setting up a search-time field extraction in props.conf that is associated with a field transform, you use the REPORT extraction class.

You can associate multiple field transform stanzas with a single field extraction by listing them after the initial <unique_transform_stanza_name>, separated by commas. (For more information, see the example later in this topic.) Follow this format:

[<spec>]
REPORT-<name> = <unique_transform_stanza_name>
  • <spec> can be:
    • <sourcetype>, the source type of an event.
    • host::<host>, where <host> is the host for an event.
    • source::<source>, where <source> is the source for an event.
  • <name> is any name you want to give your stanza to identify its namespace.
  • <unique_transform_stanza_name> is the name of your field transform stanza from transforms.conf.
  • Precedence rules for the REPORT class:
    • For each class, Splunk takes the configuration from the highest precedence configuration block.
    • If a particular class is specified for a source and a sourcetype, the class for source wins out.
    • Similarly, if a particular class is specified in ../local/ for a <spec>, it overrides that class in ../default/.

If you have a set of transforms that must be run in a specific order and which belong to the same host, source, or source type, you can place them in a comma-separated list within the same props.conf stanza. Splunk will apply them in the specified order. For example, this sequence ensures that the [yellow] field transform gets applied first, then [blue], and then [red]:

[source::color_logs]
REPORT-colorchange = yellow, blue, red

If you need to change the order, rearrange the list.

Examples of custom search-time field extractions using field transforms

These examples present custom field extraction use cases that require you to configure one or more field transform stanzas in transforms.conf and then reference them in a props.conf field extraction stanza.

Configuring a field extraction that utilizes multiple field transforms

This example of search-time field transform setup demonstrates how:

  • you can create transforms that pull varying field name/value pairs from events.
  • you can create a field extraction that references two or more field transforms.

Let's say you have logs that contain multiple field name/field value pairs. While the fields vary from event to event, the pairs always appear in one of two formats.

The logs often come in this format:

[fieldName1=fieldValue1] [fieldName2=fieldValue2]

However, at times they are more complicated, logging multiple name/value pairs as a list, in which case the format looks like:

[headerName=fieldName1] [headerValue=fieldValue1], [headerName=fieldName2] [headerValue=fieldValue2]

Note that the list items are separated by commas, and that each fieldName is matched with a corresponding fieldValue. In these secondary cases you still want to pull out the field names and values so that the search results are

fieldName1=fieldValue1
fieldName2=fieldValue2

and so on.

To make things more clear, here's an example of an HTTP request event that combines both of the above formats.

[method=GET] [IP=10.1.1.1] [headerName=Host] [headerValue=www.example.com], [headerName=User-Agent] [headerValue=Mozilla], [headerName=Connection] [headerValue=close] [byteCount=255]

You want to develop a single field extraction that would pull the following field/value pairs from that event:

method=GET
IP=10.1.1.1
Host=www.example.com
User-Agent=Mozilla
Connection=close
byteCount=255

Solution

To efficiently and reliably pull out both formats of field/value pairs, you'll want to design two different regexes that are optimized for each format. One regex will identify events with the first format and pull out all of the matching field/value pairs. The other regex will identify events with the other format and pull out those field/value pairs.

You then create two unique transforms in transforms.conf--one for each regex--and then unite them in the corresponding field extraction stanza in props.conf.

The first transform you add to transforms.conf catches the fairly conventional [fieldName1=fieldValue1] [fieldName2=fieldValue2] case.

[myplaintransform]
REGEX=\[(?!(?:headerName|headerValue))([^\s\=]+)\=([^\]]+)\]
FORMAT=$1::$2

The second transform (also added to transforms.conf) catches the slightly more complex [headerName=fieldName1] [headerValue=fieldValue1], [headerName=fieldName2] [headerValue=fieldValue2] case:

[mytransform]
REGEX= \[headerName\=(\w+)\],\s\[headerValue=([^\]]+)\]
FORMAT= $1::$2

Both transforms use the <fieldName>::<fieldValue> FORMAT to match each field name in the event with its corresponding value. This setting in FORMAT enables Splunk to keep matching the regex against a matching event until every matching field/value combination is extracted.

Finally, this field extraction stanza, which you create in props.conf, references both of the field transforms:

[mysourcetype]
KV_MODE=none
REPORT-a = mytransform, myplaintransform

Note that, besides using multiple field transforms, the field extraction stanza also sets KV_MODE=none. This disables automatic field/value extraction for the identified source type (while letting your manually defined extractions continue). It ensures that these new regexes aren't overridden by automatic field extraction, and it also helps increase your search performance. (See the following subsection for more on disabling key/value extraction.)

Configuring delimiter-based field extraction

You can use the DELIMS attribute in field transforms to configure field extractions for events where field values or field/value pairs are separated by delimiters such as commas, colons, tab spaces, and more.

For example, say you have a recurring multiline event where a different field/value pair sits on a separate line, and each pair is separated by a colon followed by a tab space. Here's a sample event:

ComponentId:     Application Server
ProcessId:   5316
ThreadId:    00000000
ThreadName:  P=901265:O=0:CT
SourceId:    com.ibm.ws.runtime.WsServerImpl
ClassName:   
MethodName:  
Manufacturer:    IBM
Product:     WebSphere
Version:     Platform 7.0.0.7 [BASE 7.0.0.7 cf070942.55]
ServerName:  sfeserv36Node01Cell\sfeserv36Node01\server1
TimeStamp:   2010-04-27 09:15:57.671000000
UnitOfWork:  
Severity:    3
Category:    AUDIT
PrimaryMessage:  WSVR0001I: Server server1 open for e-business
ExtendedMessage: 

Now you could set up a bulky, wordy search-time field extraction stanza in props.conf that handles all of these fields:

[activityLog]
LINE_BREAKER = [-]{8,}([\r\n]+)
SHOULD_LINEMERGE = false
EXTRACT-ComponentId = ComponentId:\t(?<ComponentId>.*)
EXTRACT-ProcessId = ProcessId:\t(?<ProcessId>.*)
EXTRACT-ThreadId = ThreadId:\t(?<ThreadId>.*)
EXTRACT-ThreadName = ThreadName:\t(?<ThreadName>.*)
EXTRACT-SourceId = SourceId:\t(?<SourceId>.*)
EXTRACT-ClassName = ClassName:\t(?<ClassName>.*)
EXTRACT-MethodName = MethodName:\t(?<MethodName>.*)
EXTRACT-Manufacturer = Manufacturer:\t(?<Manufacturer>.*)
EXTRACT-Product = Product:\t(?<Product>.*)
EXTRACT-Version = Version:\t(?<Version>.*)
EXTRACT-ServerName = ServerName:\t(?<ServerName>.*)
EXTRACT-TimeStamp = TimeStamp:\t(?<TimeStamp>.*)
EXTRACT-UnitOfWork = UnitOfWork:\t(?<UnitOfWork>.*)
EXTRACT-Severity = Severity:\t(?<Severity>.*)
EXTRACT-Category = Category:\t(?<Category>.*)
EXTRACT-PrimaryMessage = PrimaryMessage:\t(?<PrimaryMessage>.*)
EXTRACT-ExtendedMessage = ExtendedMessage:\t(?<ExtendedMessage>.*)

But that solution is pretty over-the-top. Is there a more elegant way to handle it that would remove the need for all these EXTRACT lines? Yes!

Configure the following stanza in transforms.conf:

[activity_report]
DELIMS = "\n", ":\t"

This states that the field/value pairs in the event are on separate lines ("\n"), and then specifies that the field name and field value on each line is separated by a colon and tab space (":\t").

To complete this configuration, rewrite the wordy props.conf stanza mentioned above as:

[activityLog]
LINE_BREAKER = [-]{8,}([\r\n]+)
SHOULD_LINEMERGE = false
REPORT-activity = activity_report

These two brief configurations will extract the same set of fields as before, but they leave less room for error and are more flexible.

Handling events with multivalued fields

You can use the MV_ADD attribute to extract fields in situations where the same field is used more than once in an event, but has a different value each time. Ordinarily, Splunk only extracts the first occurrence of a field in an event; every subsequent occurrence is discarded. But when MV_ADD is set to true in transforms.conf, Splunk treats the field like a multivalue field and extracts each unique field/value pair in the event.

Say you have a set of events that look like this:

event1.epochtime=1282182111 type=type1 value=value1 type=type3 value=value3
event2.epochtime=1282182111 type=type2 value=value4 type=type3 value=value5 type=type4 value=value6

See how the type and value fields are repeated several times in each event? What you'd like to do is search type=type3 and have both of these events be returned. Or you'd like to run a count(type) report on these two events that returns 5.

So, what you want to do is create a custom multivalue extraction of the type field for these events. Here's how you would set up your transforms.conf and props.conf files to enable it:

First, transforms.conf:

[mv-type]
REGEX = type=(?<type>\w+)
MV_ADD = true

Then, in props.conf for your sourcetype or source, set:

REPORT-type = mv-type
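With these configurations in place, searching on type=type3 should return both sample events, and a count of the type field (the source type name here is hypothetical) should return 5:

sourcetype=mysourcetype type=type3

sourcetype=mysourcetype | stats count(type)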

Disabling automatic search-time extraction for specific sources, source types, or hosts

You can disable automatic search-time field extraction for specific sources, source types, or hosts through edits in props.conf. Add KV_MODE = none for the appropriate [<spec>] in props.conf.

Note: Custom field extractions set up manually via the configuration files or Manager will still be processed for the affected source, source type, or host when KV_MODE = none.

[<spec>]
KV_MODE = none

<spec> can be:

  • <sourcetype> - an event source type.
  • host::<host>, where <host> is the host for an event.
  • source::<source>, where <source> is the source for an event.
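For example, to turn off automatic key/value extraction for events from a hypothetical host:

[host::webserver01]
KV_MODE = none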
