Configure index-time custom field extraction
This documentation does not apply to the most recent version of Splunk.
Contents
- Define additional indexed fields
- Add a regex stanza for the new field to transforms.conf
- Link the new field to props.conf
- Add an entry to fields.conf for the new field
- Restart Splunk for your changes to take effect
- How Splunk builds indexed fields
- When Splunk creates field names
- Index-time field extraction examples
- Define a new indexed field
- Define two new indexed fields with one regex
Configure index-time custom field extraction
We do not recommend that you add custom fields to the set of default fields that Splunk extracts and indexes at index time, such as timestamp, punct, host, source, and sourcetype. Adding to this list of fields can negatively impact indexing performance and search times, because each indexed field increases the size of the searchable index. Indexed fields are also less flexible: whenever you make changes to your set of indexed fields, you must reindex your entire dataset.
With those caveats, there are times when you may need to change or add to your indexed fields. One case is when certain search-time field extractions noticeably impact search performance. This can happen if you commonly search a large event set with expressions like foo!=bar or NOT foo=bar, and the field foo nearly always takes on the value bar.
Conversely, you may want to add an indexed field if the value of a search-time extracted field exists outside of the field more often than not. For example, if you commonly search only for foo=1, but 1 occurs in many events that do not have foo=1, you may want to add foo to the list of fields extracted by Splunk at index time.
For more information about index-time and search-time, see "Index time versus search time" in this manual.
Define additional indexed fields
Define additional indexed fields by editing props.conf, transforms.conf and fields.conf.
Edit these files in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see "About configuration files" in the Admin manual.
Splunk only accepts field names that contain alpha-numeric characters or an underscore:
- Valid characters for field names are a-z, A-Z, 0-9, or _ .
- Field names cannot begin with 0-9 or _ . Leading underscores are reserved for Splunk's internal variables.
- International characters are not allowed.
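The naming rules above can be checked before you commit a field name to a configuration file. The following is a minimal sketch in Python; the pattern and the function name `is_valid_field_name` are assumptions derived from the three rules listed here, not something Splunk ships.

```python
import re

# Assumed pattern built from the rules above: ASCII letters, digits, and
# underscores only, and the first character must be a letter.
_FIELD_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_field_name(name: str) -> bool:
    """Return True if name follows the field-name rules listed above."""
    return _FIELD_NAME.fullmatch(name) is not None


print(is_valid_field_name("err_code"))      # True
print(is_valid_field_name("_internal"))     # False: leading underscore
print(is_valid_field_name("9lives"))        # False: leading digit
print(is_valid_field_name("login-result"))  # False: hyphen is not allowed
```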
Add a regex stanza for the new field to transforms.conf
Add the following lines to transforms.conf:
    [<unique_stanza_name>]
    REGEX = <your_regex>
    FORMAT = <your_custom_field_name>::"$1"
    WRITE_META = true

- <unique_stanza_name> names your stanza. Use this name later to configure props.conf.
- REGEX = create a regex that recognizes your custom field value.
- FORMAT = inserts <your_custom_field_name> before the value you've extracted via regex as $1.
  - In order to properly display field values containing whitespace in Splunk Web, apply quotes to the FORMAT key:
    FORMAT = <your_custom_field_name>::"$1"
  - You can extract multiple fields using a single regex that contains multiple match groups:
    FORMAT = <your_first_field>::"$1" <your_second_field>::"$2"
- WRITE_META = set this to true to write your field name and value to _meta, which is where Splunk stores indexed fields. (See "How Splunk builds indexed fields," below.)
Note: For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test regexes by using them in searches with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.
Note: The capturing groups in your regex must identify field names that use ASCII characters (a-zA-Z0-9_-.). International characters will not work.
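To see how REGEX and FORMAT work together, it can help to reproduce the substitution outside Splunk. The sketch below is an approximation in Python, not Splunk's implementation: the function `build_meta_text`, the sample event, and the field names `user_name` and `action` are all hypothetical. It only handles $1, $2, and so on. $0 is deliberately left alone.

```python
import re
from typing import Optional


def build_meta_text(event: str, regex: str, fmt: str) -> Optional[str]:
    """Expand $1, $2, ... in fmt with the capture groups of the first match.

    Returns None when the regex does not match the event, mirroring the idea
    that a non-matching transform contributes nothing.
    """
    match = re.search(regex, event)
    if match is None:
        return None
    # Only $1 and above are substituted; $0 is intentionally not handled.
    return re.sub(
        r"\$([1-9]\d*)",
        lambda m: match.group(int(m.group(1))) or "",
        fmt,
    )


# Hypothetical event and field names, used only for illustration.
text = build_meta_text(
    "user=alice action=login",
    r"user=(\w+) action=(\w+)",
    'user_name::"$1" action::"$2"',
)
print(text)  # user_name::"alice" action::"login"
```

The resulting text has the same field::"value" shape that the FORMAT key is meant to produce.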
Link the new field to props.conf
To props.conf, add the following lines:
    [<spec>]
    TRANSFORMS-<value> = <unique_stanza_name>

- <spec> can be:
  - <sourcetype>, the sourcetype of an event.
  - host::<host>, where <host> is the host for an event.
  - source::<source>, where <source> is the source for an event.
- <unique_stanza_name> is the name of your stanza from transforms.conf.
- <value> is any value you want, to give your attribute its name-space.
Note: For index-time field extraction, props.conf uses TRANSFORMS-<value>, as opposed to EXTRACT-<value>, which is used for configuring search-time field extraction.
Add an entry to fields.conf for the new field
Add an entry to fields.conf for the new indexed field:
    [<your_custom_field_name>]
    INDEXED=true

- <your_custom_field_name> is the name of the custom indexed field you set in the unique stanza that you added to transforms.conf. The field name cannot contain spaces and can only contain lowercase characters.
- Set INDEXED=true to indicate that the field is indexed.
Note: If a field of the same name is extracted at search time, you must set INDEXED=false for the field. In addition, you must also set INDEXED_VALUE=false if events exist that have values of that field that are not pulled out at index time, but which are extracted at search time.
For example, say you're performing a simple <field>::1234 extraction at index time. This could work, but you would have problems if you also implement a search-time field extraction based on a regex like A(\d+)B, where the string A1234B yields a value for that field of 1234. This would turn up events for 1234 at search time that Splunk would be unable to locate at index time with the <field>::1234 extraction.
Restart Splunk for your changes to take effect
Changes to configuration files such as props.conf and transforms.conf won't take effect until you shut down and restart Splunk.
How Splunk builds indexed fields
Splunk builds indexed fields by writing to _meta. Here's how it works:
- _meta is modified by all matching transforms in transforms.conf that contain either DEST_KEY = _meta or WRITE_META = true.
- Each matching transform can overwrite _meta, so use WRITE_META = true to append to _meta.
  - If you don't use WRITE_META, then start your FORMAT with $0.
- After _meta is fully built during parsing, Splunk interprets the text in the following way:
  - The text is broken into units; each unit is separated by whitespace.
  - Quotation marks (" ") group characters into larger units, regardless of whitespace.
  - Backslashes ( \ ) immediately preceding quotation marks disable the grouping properties of quotation marks.
  - Backslashes preceding a backslash disable that backslash.
  - Units of text that contain a double colon (::) are turned into extracted fields. The text on the left side of the double colon becomes the field name, and the right side becomes the value.
Note: Indexed fields with regex-extracted values containing quotation marks will generally not work, and backslashes may also have problems. Fields extracted at search time do not have these limitations.
Here's an example of a set of index-time extractions that uses quotation marks to group text, and backslashes to disable quotation marks and backslashes.
    WRITE_META = true
    FORMAT = field1::value field2::"value 2" field3::"a field with a \" quotation mark" field4::"a field which ends with a backslash\\"
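The interpretation rules in this section can be sketched as a small tokenizer. The Python below is an illustration written from those rules only; it is not Splunk's parser, and the name `parse_meta` is hypothetical. It assumes that a field name ends at the first double colon in a unit and that a repeated field name simply overwrites the earlier value.

```python
from typing import Dict, List


def parse_meta(text: str) -> Dict[str, str]:
    """Split _meta-style text into units and turn name::value units into fields."""
    units: List[str] = []
    current: List[str] = []
    in_quotes = False
    started = False  # True once the current unit has seen any character
    i = 0
    while i < len(text):
        ch = text[i]
        # A backslash before a quotation mark or another backslash makes
        # that next character literal.
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in '"\\':
            current.append(text[i + 1])
            started = True
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes  # quotation marks group, but are dropped
            started = True
        elif ch.isspace() and not in_quotes:
            if started:
                units.append("".join(current))
            current, started = [], False
        else:
            current.append(ch)
            started = True
        i += 1
    if started:
        units.append("".join(current))

    fields: Dict[str, str] = {}
    for unit in units:
        name, sep, value = unit.partition("::")
        if sep:  # only units that contain "::" become fields
            fields[name] = value
    return fields


# The FORMAT value from the example above, as a raw string.
EXAMPLE = (r'field1::value field2::"value 2" '
           r'field3::"a field with a \" quotation mark" '
           r'field4::"a field which ends with a backslash\\"')

for name, value in parse_meta(EXAMPLE).items():
    print(name, "=", value)
```

Run against the example, the sketch yields four fields: field2 keeps its embedded space, field3 contains a literal quotation mark, and field4 ends with a single backslash.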
When Splunk creates field names
When Splunk creates field names, it applies the following rules to all extracted fields, whether they are extracted at index-time or search-time, by default or through a custom configuration:
1. All characters that are not in a-z,A-Z, and 0-9 ranges are replaced with an underscore (_).
2. All leading underscores are removed (because in Splunk, leading underscores are reserved for internal variables).
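The two rules above are easy to mirror in code when you want to predict what a field will be called. This is a minimal Python sketch of the rules as stated here; the function name `clean_field_name` is hypothetical and the code is not taken from Splunk.

```python
import re


def clean_field_name(raw: str) -> str:
    """Apply the two field-name rules described above."""
    # Rule 1: anything outside a-z, A-Z, and 0-9 becomes an underscore.
    replaced = re.sub(r"[^A-Za-z0-9]", "_", raw)
    # Rule 2: leading underscores are removed.
    return replaced.lstrip("_")


print(clean_field_name("login-result"))   # login_result
print(clean_field_name("_private.name"))  # private_name
print(clean_field_name("user id"))        # user_id
```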
Index-time field extraction examples
The following examples show configuration file setups for custom index-time field extractions.
Define a new indexed field
This example creates an indexed field called err_code.
transforms.conf
In transforms.conf add:
    [netscreen-error]
    REGEX = device_id=\[\w+\](?<err_code>[^:]+)
    FORMAT = err_code::"$1"
    WRITE_META = true

This stanza matches device_id= followed by a word within brackets, and then captures the text string that follows, up to a colon. The source type of the events is testlog.
Comments:
- The FORMAT = line contains the following values:
  - err_code:: is the name of the field.
  - $1 refers to the new field written to the index. It is the value extracted by REGEX.
- WRITE_META = true is an instruction to write the content of FORMAT to the index.
props.conf
Add the following lines to props.conf:
    [testlog]
    TRANSFORMS-netscreen = netscreen-error
fields.conf
Add the following lines to fields.conf:
    [err_code]
    INDEXED=true
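Before restarting Splunk, you may want to check the extraction against a sample event. The Python sketch below tests a regex that follows the description of the netscreen-error stanza (device_id=, a word within brackets, then text up to a colon). Two things are assumptions: Python's re module spells named groups as (?P<name>...) rather than (?<name>...), and the sample event is invented for illustration, because this page does not show a testlog event.

```python
import re
from typing import Optional

# Python spelling of the pattern described for the netscreen-error stanza.
NETSCREEN_ERROR = re.compile(r"device_id=\[\w+\](?P<err_code>[^:]+)")


def extract_err_code(event: str) -> Optional[str]:
    """Return the err_code value for an event, or None if nothing matches."""
    match = NETSCREEN_ERROR.search(event)
    return match.group("err_code") if match else None


# Hypothetical testlog event, not taken from real data.
sample = "device_id=[ns5gt]system-warning-00518: Admin user login failed"
print(extract_err_code(sample))                # system-warning-00518
print(extract_err_code("no device id here"))   # None
```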
Define two new indexed fields with one regex
This example creates two indexed fields called username and login_result.
transforms.conf
In transforms.conf add:
    [ftpd-login]
    REGEX = Attempt to login by user: (.*): login (.*)\.
    FORMAT = username::"$1" login_result::"$2"
    WRITE_META = true
This stanza finds the literal text Attempt to login by user:, extracts a username followed by a colon, and then the result, which is followed by a period. A line might look like:
2008-10-30 14:15:21 mightyhost awesomeftpd INFO Attempt to login by user: root: login FAILED.
props.conf
Add the following lines to props.conf:
    [ftpd-log]
    TRANSFORMS-login = ftpd-login
fields.conf
Add the following lines to fields.conf:
    [username]
    INDEXED=true

    [login_result]
    INDEXED=true
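The ftpd-login regex can be tried against the sample line from this example before you deploy it. The following Python sketch uses the same pattern with the standard re module; the function name `extract_login_fields` is hypothetical, and Python's regex engine is assumed to behave like Splunk's for this simple pattern.

```python
import re
from typing import Dict, Optional

# Same pattern as the REGEX line in the ftpd-login stanza.
FTPD_LOGIN = re.compile(r"Attempt to login by user: (.*): login (.*)\.")


def extract_login_fields(event: str) -> Optional[Dict[str, str]]:
    """Return username and login_result for an event, or None if no match."""
    match = FTPD_LOGIN.search(event)
    if match is None:
        return None
    return {"username": match.group(1), "login_result": match.group(2)}


line = ("2008-10-30 14:15:21 mightyhost awesomeftpd INFO "
        "Attempt to login by user: root: login FAILED.")
print(extract_login_fields(line))
# {'username': 'root', 'login_result': 'FAILED'}
```

Both capture groups are greedy, so a line that contains a later period would pull the extra text into login_result. A narrower group such as (\S+) may be a safer choice if your events vary.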
This documentation applies to the following versions of Splunk: 4.0 , 4.0.1 , 4.0.2 , 4.0.3 , 4.0.4 , 4.0.5 , 4.0.6 , 4.0.7 , 4.0.8 , 4.0.9 , 4.0.10 , 4.0.11 View the Article History for its revisions.