Create and maintain search-time field extractions through configuration files
Contents
- Regular expressions and field name syntax
- Use proper field name syntax
- Create basic search-time field extractions with props.conf edits
- Steps for defining basic search-time field extractions with props.conf
- Add an EXTRACT field extraction stanza to props.conf
- Inline (props.conf only) search-time field extraction examples
- Add a new error code field
- Extract multiple fields using one regex
- Create a field from a subtoken, or from a default field value
- Create advanced search-time field extractions with field transforms
- Steps for defining custom search-time field extractions that reference field transforms
- First, define a field transform
- Second, configure a field extraction and associate it with the field transform
- Examples of custom search-time field extractions using field transforms
- Configuring a field extraction that utilizes multiple field transforms
- Configuring delimiter-based field extraction
- Handling events with multivalued fields
- Disabling automatic search-time extraction for specific sources, source types, or hosts
While you can set up and manage search-time field extractions via Splunk Manager, it's important to understand how they are handled at the props.conf and transforms.conf level, because those are the configuration files that the Field extractions and Field transformations pages in Manager read from and write to.
Many knowledge managers, especially those who have been using Splunk for some time, find it easier to manage their custom fields through configuration files, which can be used to add, maintain, and review libraries of custom field additions for their teams.
This topic shows you how you can:
- Set up basic "inline" search-time field extractions through edits to props.conf.
- Design more complex search-time field extractions through a combination of edits to props.conf and transforms.conf.
Regular expressions and field name syntax
Splunk uses regular expressions, or regexes, to extract fields from event data. When you use the interactive field extractor (IFX), Splunk attempts to generate field-extracting regexes for you, but it can only create regular expressions that extract one field at a time from the events that match them.
On the other hand, when you set up field extractions manually through configuration files, you have to provide the regex yourself, but you can design it to extract two or more fields from the events that match it, if necessary.
For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test your regex by using it in a search with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.
Important: The capturing groups in your regex must identify field names that contain only alphanumeric characters or underscores. See "Use proper field name syntax," below.
Use proper field name syntax
Splunk only accepts field names that contain alpha-numeric characters or an underscore:
- Valid characters for field names are a-z, A-Z, 0-9, or _ .
- Field names cannot begin with 0-9 or _ . Leading underscores are reserved for Splunk's internal variables.
- International characters are not allowed.
Splunk applies the following "key cleaning" rules to all extracted fields when they are extracted at search time, either by default or through a custom configuration:
1. All characters that are not in a-z, A-Z, and 0-9 ranges are replaced with an underscore (_).
2. When key cleaning is enabled (it is enabled by default), Splunk removes all leading underscores and 0-9 characters from the names of extracted fields.
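As a rough illustration, the two key-cleaning rules can be modeled in a few lines of Python. This is a sketch of the rules as listed above, not Splunk's implementation. The `clean_key` function and its `clean_keys` flag are hypothetical names, and the sketch assumes that rule 1 still applies when key cleaning (rule 2) is turned off.

```python
import re


def clean_key(name: str, clean_keys: bool = True) -> str:
    """Hypothetical model of search-time key cleaning for one field name."""
    # Rule 1: every character outside a-z, A-Z, 0-9 becomes an underscore.
    # (Assumption: this still happens when key cleaning is disabled.)
    cleaned = re.sub(r"[^a-zA-Z0-9]", "_", name)
    # Rule 2: with key cleaning enabled (the default), strip any leading
    # underscores and digits.
    if clean_keys:
        cleaned = cleaned.lstrip("_0123456789")
    return cleaned


print(clean_key("User-Agent"))
print(clean_key("_2nd.field"))
print(clean_key("_2nd.field", clean_keys=False))
```

The last call corresponds to a transform with CLEAN_KEYS=false, which keeps the leading underscore and digit.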
You can disable key cleaning for a particular search-time field extraction by configuring it as an advanced REPORT extraction type, and then having the referenced field transform stanza include the setting CLEAN_KEYS=false. See below for more information about the REPORT extraction configuration.
Note: You cannot turn off key cleaning for basic EXTRACT (props.conf only) field extraction configurations.
Create basic search-time field extractions with props.conf edits
You can create basic search-time field extractions (field extractions that are defined entirely within props.conf, as opposed to extractions that reference field transforms in transforms.conf) by editing the props.conf configuration file. You can find props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. (We recommend using the latter directory if you want to make it easy to transfer your data customizations to other search servers.)
Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.
For more information on configuration files in general, see "About configuration files" in the Admin manual.
Steps for defining basic search-time field extractions with props.conf
Basic search-time field extractions use the EXTRACT extraction configuration in props.conf. Each EXTRACT extraction stanza contains the regular expression that Splunk uses to extract one or more fields at search time, as well as other attributes that govern the manner in which those fields are extracted.
Follow these steps when you create a basic search-time field extraction:
1. All extraction configurations in props.conf are restricted by a specific source, source type, or host. Start by identifying the source type, source, or host that provides the events that your field should be extracted from.
Note: For information about hosts, sources, and sourcetypes, see "About default fields (host, source, source type, and more)" in the Getting Data In manual.
2. Create a regular expression that identifies the field in the event. Use named capturing groups to provide the field names for the extracted values. Use the field name syntax as described in the preceding sections.
3. Follow the format for the EXTRACT field extraction type (defined in the next section) to create a field extraction stanza in props.conf that includes the host/source/sourcetype and regex that you have identified. Edit the props.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.
Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.
4. If your field value is a portion of a word--or if it is a value of a default field such as host, source, sourcetype, timestamp, or linecount--you must also add an entry to fields.conf. See the example "Create a field from a subtoken", below.
5. Restart Splunk for your changes to take effect.
Add an EXTRACT field extraction stanza to props.conf
Follow this format when adding an EXTRACT field extraction stanza to props.conf:
[<spec>]
EXTRACT-<name> = <regular_expression>
- <spec> can be:
  - <sourcetype>, the source type of an event.
  - host::<host>, where <host> is the host for an event.
  - source::<source>, where <source> is the source for an event.
- <name> is any unique name you want to give to your stanza to identify its namespace.
- <regular_expression> is a regex that recognizes one or more custom field values in the events that it matches. The regex is required to have named capturing groups; each group represents a different extracted field.
Precedence rules for EXTRACT stanzas:
- For each field extraction, Splunk takes the configuration from the highest precedence configuration stanza (see precedence rules in props.conf.spec).
- If a particular class is specified for a source and a source type, the class for source wins out.
- Similarly, if a particular field extraction is specified in ../local/ for a <spec>, it overrides that class in ../default/.
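The two precedence rules above can be pictured as a small resolution function. This is a hypothetical model, not Splunk's configuration engine: it ignores host stanzas (the rules above only compare source and source type), it assumes the kind of stanza is compared before the directory layer, and the file paths and regex labels are placeholders.

```python
# Ranks for the two precedence rules listed above (a hypothetical model).
SPEC_RANK = {"sourcetype": 0, "source": 1}   # source wins over source type
LAYER_RANK = {"default": 0, "local": 1}      # ../local/ overrides ../default/


def resolve(settings, source, sourcetype):
    """Return the winning regex for each EXTRACT class that applies to one event.

    settings holds (layer, spec_kind, spec_value, class_name, regex) tuples.
    """
    event_spec = {"source": source, "sourcetype": sourcetype}
    best = {}
    for layer, kind, value, cls, regex in settings:
        if event_spec[kind] != value:
            continue  # this stanza does not apply to the event
        # Compare stanza kind first, then directory layer (an assumption).
        rank = (SPEC_RANK[kind], LAYER_RANK[layer])
        if cls not in best or rank > best[cls][0]:
            best[cls] = (rank, regex)
    return {cls: regex for cls, (_, regex) in best.items()}


settings = [
    ("default", "sourcetype", "testlog", "EXTRACT-errors", "default-sourcetype-regex"),
    ("local", "sourcetype", "testlog", "EXTRACT-errors", "local-sourcetype-regex"),
    ("default", "source", "/var/log/test.log", "EXTRACT-errors", "default-source-regex"),
]
print(resolve(settings, "/var/log/test.log", "testlog"))
print(resolve(settings, "/var/log/other.log", "testlog"))
```

For an event from /var/log/test.log the source stanza wins; for any other source, the ../local/ copy of the source type stanza overrides the ../default/ one.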
Note: Unlike the procedure for configuring the default set of fields that Splunk extracts at index time, transforms.conf requires no DEST_KEY for search-time field extraction, since nothing is being written to the index. Fields extracted at search time are not persisted in the index as keys.
Splunk follows precedence rules when it runs search-time field extractions. It runs inline field extractions (EXTRACT-<name>) first, and then runs field extractions that reference field transforms (REPORT-<name>).
Inline (props.conf only) search-time field extraction examples
Here are a set of examples of search-time custom field extraction, set up using props.conf only.
Add a new error code field
This example shows how to create a new "error code" field by configuring a field extraction in props.conf. The field can be identified by the occurrence of device_id= followed by a word within brackets and a text string terminating with a colon. The field should be extracted from events related to the testlog source type.
In props.conf, add:
[testlog]
EXTRACT-errors = device_id=\[\w+\](?<err_code>[^:]+)
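To see what this kind of EXTRACT regex captures, here is a Python sketch. Python's re module spells a named group (?P<name>...) rather than the PCRE-style (?<name>...) used in props.conf, and the sample event is hypothetical, because the text above does not include one. The bracketed word is matched with `\[\w+\]`.

```python
import re

# Same pattern as a props.conf EXTRACT, but with Python's (?P<name>...)
# spelling of a named capturing group.
ERR_CODE = re.compile(r"device_id=\[\w+\](?P<err_code>[^:]+)")

# Hypothetical testlog event (not taken from the original documentation).
event = "2010-08-19 10:12:01 device_id=[eth0]E1042: link negotiation failed"

match = ERR_CODE.search(event)
# Each named group becomes one extracted field.
fields = match.groupdict() if match else {}
print(fields)
```

The captured value stops at the first colon because of `[^:]+`.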
Extract multiple fields using one regex
This is an example of a field extraction that pulls out five separate fields. You can then use these fields in concert with some event types to help you find port flapping events and report on them.
Here's a sample of the event data that the fields are being extracted from:
#%LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet9/16, changed state to down
The stanza in props.conf for the extraction looks like this:
[syslog]
EXTRACT-port_flapping = Interface\s(?<interface>(?<media>[^\d]+)(?<slot>\d+)\/(?<port>\d+))\,\schanged\sstate\sto\s(?<port_status>up|down)
Note that five separate fields are extracted as named groups: interface, media, slot, port, and port_status.
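A quick way to sanity-check a multi-field regex like this one outside Splunk is to run it in Python against the sample event. The sketch below converts PCRE-style (?<name>...) groups into Python's (?P<name>...) form with a small hypothetical helper, `pcre_to_python`. It writes `changed\sstate` with nothing between `changed` and `\s`; a literal space followed by `\s` would require two whitespace characters and would not match the sample event.

```python
import re


def pcre_to_python(pattern: str) -> str:
    # Rewrite PCRE named groups (?<name>...) as Python's (?P<name>...).
    # The lookahead leaves lookbehinds such as (?<= and (?<! untouched.
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pattern)


# The port-flapping regex, in the PCRE spelling used in props.conf.
PORT_FLAPPING = (
    r"Interface\s(?<interface>(?<media>[^\d]+)(?<slot>\d+)\/(?<port>\d+))"
    r"\,\schanged\sstate\sto\s(?<port_status>up|down)"
)

event = ("#%LINEPROTO-5-UPDOWN: Line protocol on Interface "
         "GigabitEthernet9/16, changed state to down")

match = re.search(pcre_to_python(PORT_FLAPPING), event)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name}={value}")
```

The nested interface group still captures the full interface name, while media, slot, and port split it into its parts.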
The following two steps aren't required for field extraction--they show you what you might do with the extracted fields to find port flapping events and then report on them.
Define a couple of event types in eventtypes.conf:
[cisco_ios_port_down]
search = "changed state to down"

[cisco_ios_port_up]
search = "changed state to up"
Finally, create a saved search in savedsearches.conf that ties much of the above together to find port flapping and report on the results:
[port flapping]
search = eventtype=cisco_ios_port_down OR eventtype=cisco_ios_port_up starthoursago=3 | stats count by interface,host,port_status | sort -count
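The reporting part of this saved search (stats count by interface,host,port_status | sort -count) groups the extracted values and orders the groups by size. A rough Python equivalent is shown below. The event tuples, including the host names, are hypothetical stand-ins for what the extraction might return.

```python
from collections import Counter

# Hypothetical extracted (interface, host, port_status) values from
# port up/down events over the last three hours.
events = [
    ("GigabitEthernet9/16", "switch01", "down"),
    ("GigabitEthernet9/16", "switch01", "up"),
    ("GigabitEthernet9/16", "switch01", "down"),
    ("GigabitEthernet2/1", "switch02", "down"),
]

# Rough equivalent of: stats count by interface,host,port_status | sort -count
counts = Counter(events)
for (interface, host, status), count in counts.most_common():
    print(interface, host, status, count)
```

An interface that appears near the top with both up and down rows is a likely port-flapping candidate.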
Create a field from a subtoken, or from a default field value
You may run into problems if you are extracting a field value that is not a token or is part of a token. Most of these cases break down into two types:
- Field extractions of subtokens (in other words, the value being extracted is part of a larger word). For example, your field's value is 123, but in your event it occurs as fool123.
- Search-time field extractions from default field values (such as values of host, source, sourcetype, or timestamp). For example, you have an event with host=inputbackup-i-z5gl22gg2.prod and you want to add the field extraction chef_role=inputbackup to that event.
Tokens are chunks of event data that have been run through event processing prior to being indexed. During event processing, events are broken up into segments, and this is the point where tokens are created: each segment created is a token. Because tokens cannot be smaller than individual "words" within strings, a field extraction of a subtoken (a part of a word) can cause problems: the complete tokens are in the index, but the subtokens extracted from them are not. This means that a search on that subtoken field value will likely yield no results.
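The difference between a token and a subtoken can be illustrated with a deliberately simplified segmenter in Python. This is an assumption for illustration only: Splunk's actual segmentation rules (major and minor breakers) are more involved than splitting on whitespace and a few punctuation characters.

```python
import re

# Simplified segmentation: split raw text on whitespace and some punctuation.
# This stands in for Splunk's real segmenters and is only an approximation.
BREAKERS = r"[\s,;:=\[\]()\"']+"


def tokens(raw: str) -> set:
    return {t for t in re.split(BREAKERS, raw) if t}


raw = "user fool123 logged in"
index_terms = tokens(raw)
print("fool123" in index_terms)  # the whole word is a token
print("123" in index_terms)      # the subtoken was never indexed
```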
In a similar way, extractions from values of default fields are also a problem, because default fields and their values are extracted prior to the segmentation step of event processing, which means they are not tokenized at all. Even if you extract the entire field value of a host, source, or source type, you may have problems searching on it because it is not an indexed token. To continue the example used above, you may find that the chef_role field is extracted correctly when you search on host=inputbackup-i-z5gl22gg2.prod. But if you search on host=inputbackup-i-z5gl22gg2.prod chef_role=inputbackup, no results are returned.
For these types of field extractions to perform as expected, you must configure props.conf as explained above. Then add an entry to fields.conf:
[<fieldname>]
INDEXED = False
INDEXED_VALUE = False
- Fill in <fieldname> with the name of your field.
  - For example, [url] if you've configured a field named "url."
- Set INDEXED and INDEXED_VALUE to false.
  - This tells Splunk that the value you're searching for is not a token in the index.
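Why the fields.conf entry helps can be sketched as a simplified search model in Python. It assumes that, when INDEXED_VALUE is true, a search for field=value also requires the value to appear as an index token, so events are pre-filtered on their tokens before the extraction runs. The segmenter, the `search` function, and the `code` field are hypothetical.

```python
import re

BREAKERS = r"[\s,;:=\[\]()\"']+"  # simplified segmentation (an assumption)


def tokens(raw: str) -> set:
    return {t for t in re.split(BREAKERS, raw) if t}


def search(events, field, value, regex, indexed_value=True):
    """Hypothetical model of searching field=value on an extracted field."""
    hits = []
    for raw in events:
        # Assumed INDEXED_VALUE = true behavior: the value must also be an
        # index token, so events without it are skipped before extraction.
        if indexed_value and value not in tokens(raw):
            continue
        m = re.search(regex, raw)
        if m and m.group(field) == value:
            hits.append(raw)
    return hits


events = ["user fool123 logged in", "user bar456 logged in"]
subtoken = r"fool(?P<code>\d+)"  # hypothetical field taken from a subtoken
print(search(events, "code", "123", subtoken))
print(search(events, "code", "123", subtoken, indexed_value=False))
```

In this model, the first search returns nothing because 123 is never an index token; setting INDEXED_VALUE to false skips the token check, and the event is found through the extraction alone.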
For another take on this issue, see the Splunk blog entry "Cannot search based on an extracted field" by Ledion Bitincka.
For more information on the tokenization of event data, see "Configure segmentation to manage disk usage" in the Admin manual.
Create advanced search-time field extractions with field transforms
While you can define most search-time field extractions entirely within props.conf, some advanced search-time field extractions reference an additional component called a field transform. This section shows you how to configure field transforms in transforms.conf.
Field transforms contain a field-extracting regular expression and other attributes that govern the way the transform extracts fields. Field transforms are always created in conjunction with field extraction stanzas in props.conf--they cannot stand alone.
Your search-time field extractions require a field transform component if you need to:
- Reuse the same field-extracting regular expression across multiple sources, source types, or hosts (in other words, configure one field transform for multiple field extractions). If you find yourself using the same regex to extract fields for different sources, source types, and hosts, you may want to set it up as a transform. Then, if you find that you need to update the regex, you only have to do so once, even though it is used by more than one field extraction.
- Apply more than one field-extracting regular expression to the same source, source type, or host (in other words, apply multiple field transforms to the same field extraction). This is sometimes necessary in cases where the field or fields that you want to extract from a particular source/source type/host appear in two or more very different event patterns.
- Set up delimiter-based field extractions. Delimiter-based extractions come in handy when your event data presents field-value pairs (or just field values) that are separated by delimiters such as commas, colons, bars, line breaks, and tab spaces.
- Configure extractions for multivalued fields. When you do this, Splunk appends additional field values to the field as it finds them in the event data.
- Extract fields with names that begin with numbers or underscores. Ordinarily key cleaning removes leading numeric characters and underscores from field names, but you can configure your transform to turn this functionality off if necessary.
You can also configure transforms to:
- Extract fields from the values of another field (other than _raw) by using the SOURCE_KEY attribute.
- Apply special formatting to the information being extracted by using the FORMAT attribute.
However, both of these configurations can now be set up directly in the regex; see the "First, define a field transform" section below for more information about how to do this.
NOTE: If you need to concatenate a set of regex extractions into a single field value, you can do this with the FORMAT attribute, but only if you set it up as an index-time extraction. For example, if you have a string like 192(x)0(y)2(z)1 in your event data, you can extract it at index time as an IP address field value in the format 192.0.2.1. For more information, see "Configure index-time field extractions" in the Getting Data In Manual. However, we DO NOT RECOMMEND that you make extensive changes to your set of indexed fields; do so sparingly if at all.
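The effect of that index-time concatenation can be pictured in Python. This only mimics the result of joining four captured groups with dots; it is not index-time configuration, and the regex and the `concat_ip` helper are hypothetical.

```python
import re

# Hypothetical pattern for strings such as 192(x)0(y)2(z)1.
OCTETS = re.compile(r"(\d+)\(x\)(\d+)\(y\)(\d+)\(z\)(\d+)")


def concat_ip(raw: str):
    # Join the four captured groups with dots, the way an index-time
    # FORMAT that concatenates $1 through $4 would.
    m = OCTETS.search(raw)
    return ".".join(m.groups()) if m else None


print(concat_ip("src=192(x)0(y)2(z)1 action=allow"))
```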
Steps for defining custom search-time field extractions that reference field transforms
Advanced search-time field extractions use the REPORT extraction configuration in props.conf. Each REPORT extraction stanza references a field transform that is defined separately in transforms.conf. The field transform contains the regular expression that Splunk uses to extract fields at search time, as well as other attributes that govern the way that the transform extracts those fields.
Follow these steps when you create an advanced search-time field extraction:
1. All extraction configurations in props.conf are restricted by a specific source, source type, or host. Start by identifying the source type, source, or host that provides the events that your field should be extracted from. (Don't update props.conf yet.)
Note: For information about hosts, sources, and sourcetypes, see "About default fields (host, source, source type, and more)" in the Getting Data In manual.
2. Create a regular expression that identifies the field in the event. Use named capturing groups to provide the field names for the extracted values. Use the field name syntax as described in the preceding sections.
Note: If your event lists field/value pairs or just field values, you can create a delimiter-based field extraction that doesn't require a regex; see the information on the DELIMS attribute, below.
3. Create a field transform in transforms.conf that utilizes this regex (or delimiter configuration). The transform can also define a source key and/or event value formatting.
Edit the transforms.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.
Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.
4. Follow the format for the REPORT field extraction type (defined two sections down) to create a field extraction stanza in props.conf that uses the host, source, or source type that you identified in Step 1. If necessary, you can create additional field extraction stanzas for other hosts, sources, and source types that refer to the same field transform.
Edit the props.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.
Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.
5. Restart Splunk for your changes to take effect.
First, define a field transform
Follow this format when defining a search-time field transform in transforms.conf:
[<unique_transform_stanza_name>]
REGEX = <regular expression>
FORMAT = <string>
SOURCE_KEY = <string>
DELIMS = <quoted string list>
FIELDS = <quoted string list>
MV_ADD = [true|false]
CLEAN_KEYS = [true|false]
KEEP_EMPTY_VALS = [true|false]
CAN_OPTIMIZE = [true|false]
- <unique_transform_stanza_name> is required for all search-time transforms.
- REGEX is a regular expression that operates on your data to extract fields. It is required for all search-time field transforms unless you are setting up a delimiter-based field extraction, in which case you use DELIMS instead.
- REGEX and the FORMAT attribute:
  - Name-capturing groups in the REGEX are extracted directly to fields, which means that you don't have to specify FORMAT for simple field extraction cases.
  - If the REGEX extracts both the field name and its corresponding value, you can use the following special capturing groups to skip specifying the mapping in FORMAT: <_KEY_><string>, <_VAL_><string>.
  - For example, the following are equivalent:
    - Using FORMAT:
      REGEX = ([a-z]+)=([a-z]+)
      FORMAT = $1::$2
    - Not using FORMAT:
      REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
- FORMAT is optional. Use it to specify the format of the field/value pair(s) that you are extracting, including any field names or values you want to add. You don't need to specify FORMAT if you have a simple REGEX with name-capturing groups.
  - For search-time extractions, this is the pattern for the FORMAT field:
    FORMAT = <field-name>::<field-value>( <field-name>::<field-value>)*
    where:
    field-name = [<string>|$<extracting-group-number>]
    field-value = [<string>|$<extracting-group-number>]
  - Examples of search-time FORMAT usage:
    1. FORMAT = first::$1 second::$2 third::other-value
    2. FORMAT = $1::$2 $4::$3
  - Note: You cannot create concatenated fields with FORMAT at search time. This functionality is only available for index-time field transforms.
- SOURCE_KEY is optional. Use it to identify a field whose values the transform REGEX should be applied to.
  - You can use this attribute to extract one or more values from the values of another field. You can use any field that is available at the time of the execution of this field extraction.
  - By default, SOURCE_KEY is set to _raw, which means it is applied to the entire event.
- DELIMS is optional. Use it in place of REGEX when dealing with delimiter-based field extractions, where field values (or field/value pairs) are separated by delimiters such as commas, colons, spaces, tab spaces, line breaks, and so on.
  - Delimiters must be quoted with " " (use \ to escape).
  - Each character in the delimiter string is used as a delimiter to split the event.
  - If the event contains full delimiter-separated field/value pairs, you enter two sets of quoted delimiters for DELIMS. The first set of quoted delimiters separates the field/value pairs. The second set of quoted delimiters separates the field name from its corresponding value.
  - If the events only contain delimiter-separated values (no field names), you use one set of quoted delimiters to separate the values. Then you use the FIELDS attribute to apply field names to the extracted values (see FIELDS below). Alternately, Splunk reads even tokens as field names and odd tokens as field values.
  - Splunk consumes consecutive delimiter characters unless you specify a list of field names.
  - IMPORTANT: If a value may contain an embedded unescaped double quote character, such as "foo"bar", use REGEX, not DELIMS. An escaped double quote (\") is ok.
  - Defaults to empty string.
  - This example of DELIMS usage applies to an event where field/value pairs are separated by '|' symbols, and the field names are separated from their corresponding values by '=' symbols:
    [pipe_eq]
    DELIMS = "|", "="
- FIELDS is used in conjunction with DELIMS when you are performing delimiter-based field extraction but only have field values to extract. Use FIELDS to provide field names for the extracted field values, in list format according to the order in which the values are extracted.
  - Note: If field names contain spaces or commas, they must be quoted with " " (to escape, use \).
  - Defaults to empty string.
  - Here's an example of a delimiter-based extraction where three field values appear in an event. They are separated by a comma and then a space.
    [commalist]
    DELIMS = ", "
    FIELDS = field1, field2, field3
- MV_ADD is optional. Use it when you have events that repeat the same field but with different values. When MV_ADD = true, Splunk makes any field that is used more than once in an event (but with different values) a multivalued field and appends each value it finds for that field.
  - When set to false, Splunk keeps the first value found for a field in an event and discards every subsequent value found for that same field in that same event.
  - Defaults to false.
- CLEAN_KEYS is optional. It controls whether or not the system strips leading underscores and 0-9 characters from the field names it extracts (see the subtopic "Use proper field name syntax," above, for more information).
  - Add CLEAN_KEYS = false to your transform if you need to extract field names (keys) with leading underscores and/or 0-9 characters.
  - By default, CLEAN_KEYS is always set to true for transforms.
- KEEP_EMPTY_VALS is optional. It controls whether Splunk keeps field/value pairs when the value is an empty string.
  - This option does not apply to field/value pairs that are generated by Splunk's autokv extraction. Autokv ignores field/value pairs with empty values.
  - Defaults to false.
- CAN_OPTIMIZE is optional. It controls whether Splunk can optimize the extraction out (in other words, disable the extraction).
  - You might set it to false when you have field discovery turned off; doing so ensures that certain fields are always discovered.
  - Splunk only disables an extraction if it can determine that none of the fields identified by the extraction will ever be needed for the successful evaluation of a search.
  - Note: This attribute should rarely be set to false.
  - Defaults to true.
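To make the REGEX/FORMAT equivalence from the attribute list concrete, here is a Python sketch that models both styles. It is a simulation under stated assumptions, not Splunk code: `apply_format` assumes a search-time FORMAT of space-separated <name>::<value> pairs with $n group references, applied to every match; `apply_key_val` assumes each _KEY_<x> group is paired with the _VAL_<x> group of the same suffix. Python writes the special groups as (?P<_KEY_1>...) instead of (?<_KEY_1>...).

```python
import re


def expand(template: str, m: re.Match) -> str:
    # Replace each $<n> in a FORMAT template with the text of group n.
    return re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))) or "", template)


def apply_format(regex: str, fmt: str, raw: str) -> dict:
    """Model of REGEX + FORMAT = <name>::<value> ..., applied to every match."""
    fields = {}
    for m in re.finditer(regex, raw):
        for pair in fmt.split():
            name, value = pair.split("::", 1)
            fields[expand(name, m)] = expand(value, m)
    return fields


def apply_key_val(regex: str, raw: str) -> dict:
    """Model of the special _KEY_<x> / _VAL_<x> capturing groups."""
    fields = {}
    for m in re.finditer(regex, raw):
        groups = m.groupdict()
        for gname, key in groups.items():
            if gname.startswith("_KEY_") and key is not None:
                fields[key] = groups.get("_VAL_" + gname[len("_KEY_"):])
    return fields


raw = "user=alice role=admin"
with_format = apply_format(r"([a-z]+)=([a-z]+)", "$1::$2", raw)
without_format = apply_key_val(r"(?P<_KEY_1>[a-z]+)=(?P<_VAL_1>[a-z]+)", raw)
print(with_format)
print(with_format == without_format)
```

The same `apply_format` helper also covers fixed names and literal values, as in the first FORMAT usage example (first::$1 second::$2 third::other-value).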
Second, configure a field extraction and associate it with the field transform
When you're setting up a search-time field extraction in props.conf that is associated with a field transform, you use the REPORT extraction class. Follow this format.
You can associate multiple field transform stanzas to a single field extraction by listing them after the initial <unique_transform_stanza_name>, separated by commas. (For more information, see the example later in this topic.)
[<spec>]
REPORT-<name> = <unique_transform_stanza_name>
- <spec> can be:
  - <sourcetype>, the source type of an event.
  - host::<host>, where <host> is the host for an event.
  - source::<source>, where <source> is the source for an event.
- <name> is any name you want to give your stanza to identify its namespace.
- <unique_transform_stanza_name> is the name of your field transform stanza from transforms.conf.
- Precedence rules for the REPORT class:
  - For each class, Splunk takes the configuration from the highest precedence configuration block.
  - If a particular class is specified for a source and a sourcetype, the class for source wins out.
  - Similarly, if a particular class is specified in ../local/ for a <spec>, it overrides that class in ../default/.
If you have a set of transforms that must be run in a specific order and that belong to the same host, source, or source type, you can place them in a comma-separated list within the same props.conf stanza. Splunk applies them in the specified order. For example, this sequence ensures that the [yellow] field transform is applied first, then [blue], and then [red]:
[source::color_logs]
REPORT-colorchange = yellow, blue, red
If you need to change the order, rearrange the list.
Examples of custom search-time field extractions using field transforms
These examples present custom field extraction use cases that require you to configure one or more field transform stanzas in transforms.conf and then reference them in a props.conf field extraction stanza.
Configuring a field extraction that utilizes multiple field transforms
This example of search-time field transform setup demonstrates how:
- you can create transforms that pull varying field name/value pairs from events.
- you can create a field extraction that references two or more field transforms.
Let's say you have logs that contain multiple field name/field value pairs. While the fields vary from event to event, the pairs always appear in one of two formats.
The logs often come in this format:
[fieldName1=fieldValue1] [fieldName2=fieldValue2]
However, at times they are more complicated, logging multiple name/value pairs as a list, in which case the format looks like:
[headerName=fieldName1] [headerValue=fieldValue1], [headerName=fieldName2] [headerValue=fieldValue2]
Note that the list items are separated by commas, and that each fieldName is matched with a corresponding fieldValue. In these secondary cases you still want to pull out the field names and values so that the search results are
fieldName1=fieldValue1 fieldName2=fieldValue2
and so on.
To make things more clear, here's an example of an HTTP request event that combines both of the above formats.
[method=GET] [IP=10.1.1.1] [headerName=Host] [headerValue=www.example.com], [headerName=User-Agent] [headerValue=Mozilla], [headerName=Connection] [headerValue=close] [byteCount=255]
You want to develop a single field extraction that would pull the following field/value pairs from that event:
method=GET IP=10.1.1.1 Host=www.example.com User-Agent=Mozilla Connection=close byteCount=255
Solution
To efficiently and reliably pull out both formats of field/value pairs, you'll want to design two different regexes that are optimized for each format. One regex will identify events with the first format and pull out all of the matching field/value pairs. The other regex will identify events with the other format and pull out those field/value pairs.
You then create two unique transforms in transforms.conf--one for each regex--and then unite them in the corresponding field extraction stanza in props.conf.
The first transform you add to transforms.conf catches the fairly conventional [fieldName1=fieldValue1] [fieldName2=fieldValue2] case.
[myplaintransform]
REGEX = \[(?!(?:headerName|headerValue))([^\s\=]+)\=([^\]]+)\]
FORMAT = $1::$2
The second transform (also added to transforms.conf) catches the slightly more complex [headerName=fieldName1] [headerValue=fieldValue1], [headerName=fieldName2] [headerValue=fieldValue2] case:
[mytransform]
REGEX = \[headerName\=([^\]]+)\]\s\[headerValue=([^\]]+)\]
FORMAT = $1::$2
Both transforms use the <fieldName>::<fieldValue> FORMAT to match each field name in the event with its corresponding value. This setting in FORMAT enables Splunk to keep matching the regex against a matching event until every matching field/value combination is extracted.
Finally, this field extraction stanza, which you create in props.conf, references both of the field transforms:
[mysourcetype]
KV_MODE = none
REPORT-a = mytransform, myplaintransform
Note that, besides using multiple field transforms, the field extraction stanza also sets KV_MODE=none. This disables automatic field/value extraction for the identified source type (while letting your manually defined extractions continue). It ensures that these new regexes aren't overridden by automatic field extraction, and it also helps increase your search performance. (See "Disabling automatic search-time extraction for specific sources, source types, or hosts," below, for more on disabling key/value extraction.)
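The example can be checked with a short Python sketch that runs both transform regexes over the sample HTTP event. It is a simulation, not Splunk's extraction engine: it assumes each FORMAT = $1::$2 transform is applied to every match in the event, it does not model key cleaning, and it writes the mytransform regex with `\s` (not `,\s`) between the bracketed header name and value and with `[^\]]+` (not `\w+`) for the name, so that it matches `[headerName=User-Agent] [headerValue=Mozilla]` as the sample event is written.

```python
import re

# The two transform regexes, in a form Python's re module accepts as-is.
MYTRANSFORM = r"\[headerName\=([^\]]+)\]\s\[headerValue=([^\]]+)\]"
MYPLAINTRANSFORM = r"\[(?!(?:headerName|headerValue))([^\s\=]+)\=([^\]]+)\]"


def run_transform(regex: str, raw: str) -> dict:
    # FORMAT = $1::$2: each match yields one field named by group 1 whose
    # value is group 2, and matching repeats across the whole event.
    return {m.group(1): m.group(2) for m in re.finditer(regex, raw)}


def report(raw: str, *transforms: str) -> dict:
    # Model of REPORT-a = mytransform, myplaintransform: apply in order.
    fields = {}
    for regex in transforms:
        fields.update(run_transform(regex, raw))
    return fields


event = ("[method=GET] [IP=10.1.1.1] [headerName=Host] [headerValue=www.example.com], "
         "[headerName=User-Agent] [headerValue=Mozilla], "
         "[headerName=Connection] [headerValue=close] [byteCount=255]")

result = report(event, MYTRANSFORM, MYPLAINTRANSFORM)
print(result)
```

The negative lookahead in myplaintransform keeps it from also capturing headerName and headerValue as ordinary fields, so the two transforms do not step on each other.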
Configuring delimiter-based field extraction
You can use the DELIMS attribute in field transforms to configure field extractions for events where field values or field/value pairs are separated by delimiters such as commas, colons, tab spaces, and more.
For example, say you have a recurring multiline event where a different field/value pair sits on a separate line, and each pair is separated by a colon followed by a tab space. Here's a sample event:
ComponentId: Application Server
ProcessId: 5316
ThreadId: 00000000
ThreadName: P=901265:O=0:CT
SourceId: com.ibm.ws.runtime.WsServerImpl
ClassName:
MethodName:
Manufacturer: IBM
Product: WebSphere
Version: Platform 7.0.0.7 [BASE 7.0.0.7 cf070942.55]
ServerName: sfeserv36Node01Cell\sfeserv36Node01\server1
TimeStamp: 2010-04-27 09:15:57.671000000
UnitOfWork:
Severity: 3
Category: AUDIT
PrimaryMessage: WSVR0001I: Server server1 open for e-business
ExtendedMessage:
Now you could set up a bulky, wordy search-time field extraction stanza in props.conf that handles all of these fields:
[activityLog]
LINE_BREAKER = [-]{8,}([\r\n]+)
SHOULD_LINEMERGE = false
EXTRACT-ComponentId = ComponentId:\t(?<ComponentId>.*)
EXTRACT-ProcessId = ProcessId:\t(?<ProcessId>.*)
EXTRACT-ThreadId = ThreadId:\t(?<ThreadId>.*)
EXTRACT-ThreadName = ThreadName:\t(?<ThreadName>.*)
EXTRACT-SourceId = SourceId:\t(?<SourceId>.*)
EXTRACT-ClassName = ClassName:\t(?<ClassName>.*)
EXTRACT-MethodName = MethodName:\t(?<MethodName>.*)
EXTRACT-Manufacturer = Manufacturer:\t(?<Manufacturer>.*)
EXTRACT-Product = Product:\t(?<Product>.*)
EXTRACT-Version = Version:\t(?<Version>.*)
EXTRACT-ServerName = ServerName:\t(?<ServerName>.*)
EXTRACT-TimeStamp = TimeStamp:\t(?<TimeStamp>.*)
EXTRACT-UnitOfWork = UnitOfWork:\t(?<UnitOfWork>.*)
EXTRACT-Severity = Severity:\t(?<Severity>.*)
EXTRACT-Category = Category:\t(?<Category>.*)
EXTRACT-PrimaryMessage = PrimaryMessage:\t(?<PrimaryMessage>.*)
EXTRACT-ExtendedMessage = ExtendedMessage:\t(?<ExtendedMessage>.*)
But that solution is pretty over-the-top. Is there a more elegant way to handle it that would remove the need for all these EXTRACT lines? Yes!
Configure the following stanza in transforms.conf:
[activity_report]
DELIMS = "\n", ":\t"
This states that the field/value pairs in the event are on separate lines ("\n"), and that the field name and field value on each line are separated by a colon and tab space (":\t").
To complete this configuration, rewrite the wordy props.conf stanza mentioned above as:
[activityLog]
LINE_BREAKER = [-]{8,}([\r\n]+)
SHOULD_LINEMERGE = false
REPORT-activity = activity_report
These two brief configurations will extract the same set of fields as before, but they leave less room for error and are more flexible.
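As a rough check on how DELIMS = "\n", ":\t" splits this event, here is a Python sketch of delimiter-based extraction. It follows the DELIMS description (each character in a delimiter string is a delimiter, and runs of consecutive delimiter characters are consumed) and drops empty values as KEEP_EMPTY_VALS = false does by default. Two parts are assumptions: the name is split from the value only at the first run of key/value delimiters, so later colons stay in the value, and `delims_extract` is a hypothetical helper, not a Splunk function. The event below is a few lines of the sample, with tabs after the colons.

```python
import re


def delims_extract(raw: str, pair_delims: str, kv_delims: str,
                   keep_empty_vals: bool = False) -> dict:
    """Hypothetical model of DELIMS = "<pair_delims>", "<kv_delims>"."""
    # Every character in a delimiter string is a delimiter; runs of
    # consecutive delimiter characters are treated as one.
    pair_split = "[" + re.escape(pair_delims) + "]+"
    kv_split = "[" + re.escape(kv_delims) + "]+"
    fields = {}
    for pair in re.split(pair_split, raw):
        if not pair:
            continue
        # Assumption: only the first run of key/value delimiters separates
        # the name from the value.
        parts = re.split(kv_split, pair, maxsplit=1)
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if value or keep_empty_vals:
            fields[name] = value
    return fields


event = ("ComponentId:\tApplication Server\n"
         "ProcessId:\t5316\n"
         "ThreadName:\tP=901265:O=0:CT\n"
         "ClassName:\t\n"
         "PrimaryMessage:\tWSVR0001I: Server server1 open for e-business")

fields = delims_extract(event, "\n", ":\t")
print(fields)
```

The same helper handles the [pipe_eq] transform from the DELIMS description, for example delims_extract("a=1|b=2", "|", "=").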
Handling events with multivalued fields
You can use the MV_ADD attribute to extract fields in situations where the same field is used more than once in an event but has a different value each time. Ordinarily, Splunk only extracts the first occurrence of a field in an event; every subsequent occurrence is discarded. But when MV_ADD is set to true in transforms.conf, Splunk treats the field like a multivalue field and extracts and saves each unique field/value pair in the event.
Say you have a set of events that look like this:
event1.epochtime=1282182111 type=type1 value=value1 type=type3 value=value3
event2.epochtime=1282182111 type=type2 value=value4 type=type3 value=value5 type=type4 value=value6
See how the type and value fields are repeated several times in each event? What you'd like to do is search type=type3 and have both of these events be returned. Or you'd like to run a count(type) report on these two events that returns 5.
So, what you want to do is create a custom multivalue extraction of the type field for these events. Here's how you would set up your transforms.conf and props.conf files to enable it:
First, transforms.conf:
[mv-type]
REGEX = type=(?<type>\S+)
MV_ADD = true
Then, in props.conf for your sourcetype or source, set:
REPORT-type = mv-type
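The difference MV_ADD makes for these two events can be simulated in Python. The sketch uses `\S+` (one or more non-whitespace characters) for the value; a `\s+` (whitespace) pattern would never match text such as `type=type1`. The `extract_type` helper is hypothetical, and unlike the description above it does not deduplicate repeated identical values.

```python
import re

TYPE_REGEX = re.compile(r"type=(?P<type>\S+)")


def extract_type(raw: str, mv_add: bool) -> list:
    # With MV_ADD = true every occurrence becomes another value of the
    # multivalued field; with false only the first value is kept.
    values = [m.group("type") for m in TYPE_REGEX.finditer(raw)]
    return values if mv_add else values[:1]


events = [
    "event1.epochtime=1282182111 type=type1 value=value1 type=type3 value=value3",
    "event2.epochtime=1282182111 type=type2 value=value4 type=type3 value=value5 type=type4 value=value6",
]

for mv_add in (False, True):
    extracted = [extract_type(e, mv_add) for e in events]
    matches = sum("type3" in values for values in extracted)
    total = sum(len(values) for values in extracted)
    print(f"MV_ADD={mv_add}: type=type3 matches {matches} events, count(type)={total}")
```

With MV_ADD off, type=type3 matches neither event; with it on, both events match and count(type) is 5, as described above.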
Disabling automatic search-time extraction for specific sources, source types, or hosts
You can disable automatic search-time field extraction for specific sources, source types, or hosts through edits in props.conf. Add KV_MODE = none for the appropriate [<spec>] in props.conf.
Note: Custom field extractions set up manually via the configuration files or Manager will still be processed for the affected source, source type, or host when KV_MODE = none.
[<spec>]
KV_MODE = none
<spec> can be:
- <sourcetype>, an event source type.
- host::<host>, where <host> is the host for an event.
- source::<source>, where <source> is the source for an event.
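The effect of KV_MODE = none can be pictured with a small Python model of search-time extraction. Several parts are assumptions: automatic key/value extraction is approximated by a simple key=value regex, "auto" stands in for any mode other than none, and automatic extraction is allowed to overwrite a custom field of the same name. The function and sample event are hypothetical.

```python
import re

# Simplified stand-in for automatic key/value extraction (an assumption;
# the real automatic mode handles quoting and many more cases).
AUTO_KV = re.compile(r"(\w+)=([^\s,]+)")

# A custom extraction, written with Python's (?P<name>...) group syntax.
ERR_CODE = r"device_id=\[\w+\](?P<err_code>[^:]+)"


def search_time_fields(raw: str, kv_mode: str, custom_regexes: list) -> dict:
    fields = {}
    # Custom extractions run regardless of the KV_MODE setting.
    for regex in custom_regexes:
        m = re.search(regex, raw)
        if m:
            fields.update(m.groupdict())
    # Automatic key/value extraction is skipped when KV_MODE = none.
    # In this model it may overwrite a custom field with the same name.
    if kv_mode != "none":
        fields.update(AUTO_KV.findall(raw))
    return fields


raw = "device_id=[eth0]E1042: user=alice"
print(search_time_fields(raw, "none", [ERR_CODE]))
print(search_time_fields(raw, "auto", [ERR_CODE]))
```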
This documentation applies to the following versions of Splunk: 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5.