Create custom fields at index time
In general, you should try to extract your fields at search time. However, there are times when you might need to add to the set of custom indexed fields that are applied to your events at index time.
For example, you might have certain search-time field extractions that noticeably impact search performance. This can happen if you typically search a large event set with expressions like foo!=bar or NOT foo=bar, and the field foo nearly always takes on the value bar.
You also might want to add an indexed field if the value of a search-time extracted field exists outside of the field more often than not. For example, if you typically search only for foo=1, but 1 occurs in many events that do not have foo=1, you might want to add foo to the list of fields extracted by Splunk at index time.
For more information about creating custom field extractions see About fields in the Knowledge Manager manual.
If you have Splunk Cloud Platform and want to define index-time field extractions, you must create a private app that contains your desired configurations. If you are a Splunk Cloud Platform administrator with experience creating private apps, see Manage private apps in your Splunk Cloud Platform deployment. If you have not created private apps, contact your Splunk account representative for help with this customization.
Unless absolutely necessary, do not add custom fields to the set of default fields that Splunk software automatically extracts and indexes at index time. This includes fields such as timestamp, punct, host, source, and sourcetype. Adding to this list of fields decreases performance, as each indexed field increases the size of the searchable index. Also, you can't change the fields on data that have already been indexed. You can only apply search-time knowledge to those events.
See About default fields.
Define additional indexed fields
Define additional indexed fields by editing props.conf, transforms.conf, and fields.conf.
Edit these files in $SPLUNK_HOME/etc/system/local/ or in your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see About configuration files in the Admin manual.
Where to put the configuration changes in a distributed environment
If you have a distributed search deployment, processing is split between search peers (indexers) and a search head. You must deploy the changes as follows:
- Deploy the props.conf and transforms.conf changes to each of the search peers.
- Deploy the fields.conf changes to the search head.
If you are employing heavy forwarders in front of your search peers, the props and transforms processing takes place on the forwarders, not the search peers. Therefore, you must deploy the props and transforms changes to the forwarders, not the search peers.
For details on Splunk Enterprise distributed components, read Scale your deployment with Splunk Enterprise components in the Distributed Deployment Manual.
For details on where you need to put configuration settings, read Configuration parameters and the data pipeline in the Admin Manual.
Field name syntax restrictions
You can assign field names as follows:
- Valid characters for field names are a-z, A-Z, 0-9, ., :, and _.
- Field names cannot begin with 0-9 or _. Splunk reserves leading underscores for its internal variables.
- Avoid assigning field names that match any of the default field names.
- Do not assign field names that contain international characters.
Add a regex stanza for the new field to transforms.conf
Follow this format when you define an index-time field transform in transforms.conf (Note: Some of these attributes, such as LOOKAHEAD and DEST_KEY, are only required for certain use cases):
[<unique_transform_stanza_name>]
REGEX = <regular_expression>
FORMAT = <your_custom_field_name>::$1
WRITE_META = [true|false]
DEST_KEY = <KEY>
DEFAULT_VALUE = <string>
SOURCE_KEY = <KEY>
REPEAT_MATCH = [true|false]
LOOKAHEAD = <integer>
Note the following:
- The <unique_stanza_name> is required for all transforms, as is the REGEX.
- REGEX is a regular expression that operates on your data to extract fields.
  - Name-capturing groups in the REGEX are extracted directly to fields, which means that you don't have to specify a FORMAT for simple field extraction cases.
  - If the REGEX extracts both the field name and its corresponding value, you can use the following special capturing groups to skip specifying the mapping in the FORMAT attribute:
    _KEY_<string>, _VAL_<string>
  - For example, the following are equivalent:
    - Using FORMAT:
      REGEX = ([a-z]+)=([a-z]+)
      FORMAT = $1::$2
    - Not using FORMAT:
      REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
- FORMAT is optional. Use it to specify the format of the field-value pair(s) that you are extracting, including any field names or values that you want to add. You don't need to specify the FORMAT if you have a simple REGEX with name-capturing groups. FORMAT behaves differently depending on whether the extraction takes place at search time or index time.
  - For index-time transforms, you use $n to specify the output of each REGEX match (for example, $1, $2, and so on).
  - If the REGEX does not have n groups, the matching fails.
  - FORMAT defaults to <unique_transform_stanza_name>::$1.
  - The special identifier $0 represents what was in the DEST_KEY before the REGEX was performed (in the case of index-time field extractions the DEST_KEY is _meta). For more information, see "How Splunk builds indexed fields," below.
  - For index-time field extractions, you can set up FORMAT in several ways. It can be a <field-name>::<field-value> setup like:
    FORMAT = field1::$1 field2::$2
    (where the REGEX extracts field values for captured groups "field1" and "field2")
    or:
    FORMAT = $1::$2
    (where the REGEX extracts both the field name and the field value)
  - However, you can also set up index-time field extractions that create concatenated fields:
    FORMAT = ipaddress::$1.$2.$3.$4
  - When you create concatenated fields with FORMAT, it's important to understand that $ is the only special character. It is treated as a prefix for regex capturing groups only if it is followed by a number and only if that number applies to an existing capturing group.
  - So if your regex has only one capturing group and its value is bar, then:
    - FORMAT = foo$1 would yield foobar
    - FORMAT = foo$bar would yield foo$bar
    - FORMAT = foo$1234 would yield foo$1234
    - FORMAT = foo$1\$2 would yield foobar\$2
- WRITE_META = true writes the extracted field name and value to _meta, which is where Splunk stores indexed fields. This attribute setting is required for all index-time field extractions, except for those where DEST_KEY = _meta (see the discussion of DEST_KEY, below).
  - For more information about _meta and its role in indexed field creation, see "How Splunk builds indexed fields," below.
- DEST_KEY is required for index-time field extractions where WRITE_META = false or is not set. It specifies where Splunk sends the results of the REGEX.
  - For index-time field extractions, DEST_KEY = _meta, which is where Splunk stores indexed fields. For other possible KEY values see the transforms.conf page in this manual.
  - For more information about _meta and its role in indexed field creation, see "How Splunk builds indexed fields," below.
  - When you use DEST_KEY = _meta you should also add $0 to the start of your FORMAT attribute. $0 represents the DEST_KEY value before Splunk performs the REGEX (in other words, _meta).
  - Note: The $0 value is in no way derived from the REGEX.
- DEFAULT_VALUE is optional. The value for this attribute is written to DEST_KEY if the REGEX fails.
  - Defaults to empty.
- SOURCE_KEY is optional. You use it to identify a KEY whose values the REGEX should be applied to.
  - By default, SOURCE_KEY = _raw, which means it is applied to the entirety of all events.
  - Typically used in conjunction with REPEAT_MATCH.
  - For other possible KEY values see the transforms.conf page in this manual.
- REPEAT_MATCH is optional. Set it to true to run the REGEX multiple times on the SOURCE_KEY. REPEAT_MATCH starts wherever the last match stopped and continues until no more matches are found. Useful for situations where an unknown number of field-value matches are expected per event.
  - Defaults to false.
- LOOKAHEAD is optional. Use it to specify how many characters to search into an event.
  - Defaults to 4096. You might want to increase your LOOKAHEAD value if you have events with line lengths longer than 4096 characters.
  - Specifically, if the text you need to match is past this number of characters, you will need to increase this value.
  - Be aware, however, that complex regexes can have very high costs when scanning larger text segments. The speed may fall off quadratically or worse when using multiple greedy branches or lookaheads / lookbehinds.
Note: For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test regexes by using them in searches with the rex search command.
Note: The capturing groups in your regex must identify field names that follow field name syntax restrictions. They can only contain ASCII characters (a-z, A-Z, 0-9, or _). International characters will not work.
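The FORMAT substitution rule described in the attribute notes above (a $ counts as a group reference only when it is followed by a number that names an existing capturing group) can be modeled outside Splunk. The following Python sketch is an illustration of that rule, not Splunk's implementation. The function name expand_format is hypothetical, and the sketch deliberately leaves $0 untouched because $0 refers to the prior DEST_KEY contents rather than to a capturing group.

```python
import re


def expand_format(fmt: str, groups: list[str]) -> str:
    """Expand $n placeholders in a FORMAT-style string (illustrative model only)."""

    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        # Replace $n only when n names an existing capturing group.
        if 1 <= index <= len(groups):
            return groups[index - 1]
        # Otherwise leave the text exactly as written.
        return match.group(0)

    return re.sub(r"\$(\d+)", substitute, fmt)


# One capturing group whose value is "bar", as in the list above.
groups = ["bar"]
for fmt in ["foo$1", "foo$bar", "foo$1234", r"foo$1\$2"]:
    print(fmt, "->", expand_format(fmt, groups))
```

Under these assumptions the four FORMAT strings produce foobar, foo$bar, foo$1234, and foobar\$2, which matches the outcomes listed in the FORMAT notes.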
Link the new field to props.conf
To props.conf, add the following lines:
[<spec>]
TRANSFORMS-<class> = <unique_stanza_name>
Note the following:
- <spec> can be:
  - <sourcetype>, the sourcetype of an event.
  - host::<host>, where <host> is the host for an event.
  - source::<source>, where <source> is the source for an event.
  - Note: You can use regex-type syntax when setting the <spec>. Also, source and source type stanzas match in a case-sensitive manner while host stanzas do not. For more information, see the props.conf spec file.
- <class> is a unique literal string that identifies the namespace of the field (key) you are extracting. Note: <class> values do not have to follow field name syntax restrictions (see above). You can use characters other than a-z, A-Z, and 0-9, and spaces are allowed.
- <unique_stanza_name> is the name of your stanza from transforms.conf.
Note: For index-time field extraction, props.conf uses TRANSFORMS-<class>, as opposed to EXTRACT-<class>, which is used for configuring search-time field extraction.
Add an entry to fields.conf for the new field
The Splunk platform uses configurations in fields.conf to determine which custom field extractions should be treated as indexed fields.
If the new indexed field comes from a source type that is fully or partially composed of unstructured data, you create a separate configuration for each custom indexed field. The stanza names of these configurations are the field names.
If you are adding fields that come from a source type that is composed entirely of structured data, such as JSON-formatted data, you can design source-type-scoped configurations that utilize wildcard expressions to match sets of indexed fields. See Extract fields from files with structured data.
Add an entry to fields.conf for the new indexed field:
[<your_custom_field_name>]
INDEXED=true
- <your_custom_field_name> is the name of the custom field you set in the unique stanza that you added to transforms.conf.
- Set INDEXED=true to indicate that the field is indexed.
If a field of the same name is extracted at search time, you must set INDEXED=false for the field. You must also set INDEXED_VALUE=false if events exist that have values of that field that are not pulled out at index time, but which are extracted at search time.
For example, say you're performing a simple <field>::1234 extraction at index time. This could work, but you would have problems if you also implement a search-time field extraction based on a regex like A(\d+)B, where the string A1234B yields a value for that field of 1234. This would turn up events for 1234 at search time that Splunk would be unable to locate at index time with the <field>::1234 extraction.
Restart Splunk for your changes to take effect
Changes to configuration files such as props.conf and transforms.conf won't take effect until you shut down and restart Splunk on all affected components.
How Splunk builds indexed fields
Splunk builds indexed fields by writing to _meta. Here's how it works:
- _meta is modified by all matching transforms in transforms.conf that contain either DEST_KEY = _meta or WRITE_META = true.
- Each matching transform can overwrite _meta, so use WRITE_META = true to append to _meta.
  - If you don't use WRITE_META, then start your FORMAT with $0.
- After _meta is fully built during parsing, Splunk interprets the text in the following way:
  - The text is broken into units; each unit is separated by whitespace.
  - Quotation marks (" ") group characters into larger units, regardless of whitespace.
  - Backslashes ( \ ) immediately preceding quotation marks disable the grouping properties of quotation marks.
  - Backslashes preceding a backslash disable that backslash.
  - Units of text that contain a double colon (::) are turned into indexed fields. The text on the left side of the double colon becomes the field name, and the right side becomes the value.
Indexed fields with regex-extracted values containing quotation marks will generally not work, and backslashes might also have problems. Fields extracted at search time do not have these limitations.
Here's an example of a set of index-time extractions that use quotation marks to group values, and backslashes to escape quotation marks and backslashes:
WRITE_META = true
FORMAT = field1::value field2::"value 2" field3::"a field with a \" quotation mark" field4::"a field which ends with a backslash\\"
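To make the interpretation rules above concrete, the following Python sketch tokenizes a _meta-style string in the way the list describes: whitespace separates units, quotation marks group, a backslash escapes a following quotation mark or backslash, and units containing :: become fields. It is a simplified model for illustration, not Splunk's parser. The function name parse_meta is hypothetical, and the sketch keeps only the last value when a field name repeats.

```python
def parse_meta(meta: str) -> dict[str, str]:
    """Split a _meta-style string into fields (simplified illustrative model)."""
    units = []
    current = []
    in_quotes = False
    i = 0
    while i < len(meta):
        ch = meta[i]
        if ch == "\\" and i + 1 < len(meta) and meta[i + 1] in '"\\':
            # A backslash before a quotation mark or backslash makes it literal.
            current.append(meta[i + 1])
            i += 2
            continue
        if ch == '"':
            # Unescaped quotation marks toggle grouping and are not kept.
            in_quotes = not in_quotes
        elif ch.isspace() and not in_quotes:
            # Whitespace outside quotation marks ends the current unit.
            if current:
                units.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if current:
        units.append("".join(current))

    fields = {}
    for unit in units:
        if "::" in unit:
            # Left of the first double colon is the name, right is the value.
            name, value = unit.split("::", 1)
            fields[name] = value
    return fields


example = (
    r'field1::value field2::"value 2" '
    r'field3::"a field with a \" quotation mark" '
    r'field4::"a field which ends with a backslash\\"'
)
for name, value in parse_meta(example).items():
    print(name, "=", value)
```

Applied to the FORMAT value from the example above, this model yields four fields: value, value 2, a value containing one literal quotation mark, and a value ending in a single backslash.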
When Splunk creates field names
Remember: When Splunk creates field names, it applies field name syntax restrictions to them.
1. All characters that are not in the a-z, A-Z, and 0-9 ranges are replaced with an underscore (_).
2. All leading underscores are removed. In Splunk, leading underscores are reserved for internal fields.
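The two cleaning steps above can be sketched in a few lines of Python. This is a model of the rules exactly as stated in this section, not Splunk's own code, and the function name clean_field_name is hypothetical.

```python
import re


def clean_field_name(raw_name: str) -> str:
    """Apply the two field-name cleaning steps described above (illustrative only)."""
    # Step 1: replace every character outside a-z, A-Z, and 0-9 with an underscore.
    replaced = re.sub(r"[^a-zA-Z0-9]", "_", raw_name)
    # Step 2: remove all leading underscores.
    return replaced.lstrip("_")


for raw in ["err-code", "_internal name", "user.id"]:
    print(repr(raw), "->", repr(clean_field_name(raw)))
```

With these rules, err-code becomes err_code, _internal name becomes internal_name, and user.id becomes user_id.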
Index-time field extraction examples
Here is a set of examples of configuration file setups for index-time field extractions.
Define a new indexed field
This basic example creates an indexed field called err_code.
transforms.conf
In transforms.conf add:
[netscreen-error]
REGEX = device_id=\[\w+\](?<err_code>[^:]+)
FORMAT = err_code::"$1"
WRITE_META = true
This stanza takes device_id= followed by a word within brackets and a text string terminating with a colon. The source type of the events is testlog.
Comments:
- The FORMAT = line contains the following values:
  - err_code:: is the name of the field.
  - $1 refers to the new field written to the index. It is the value extracted by REGEX.
- WRITE_META = true is an instruction to write the content of FORMAT to the index.
A line might look like:
Aug 15 10:30:14 136.10.247.130 SSG350M: NetScreen device_id=[Juniper111] [Root]system-warning-00518: Admin user "userid" login attempt for Web(http) management (port 20480) from 1.1.1.1:22560 failed. (2012-08-15 11:33:36)
props.conf
Add the following lines to props.conf:
[testlog]
TRANSFORMS-netscreen = netscreen-error
fields.conf
Add the following lines to fields.conf:
[err_code]
INDEXED=true
Restart Splunk for your configuration file changes to take effect.
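Before deploying a transform like this one, it can help to check the regular expression against a sample event outside Splunk. The following Python sketch does that for the netscreen-error REGEX and the sample line above. It assumes Python's re module is close enough to Splunk's regex engine for this pattern. The only change is that Python spells named groups as (?P<name>...) rather than (?<name>...).

```python
import re

# Same pattern as the netscreen-error transform, with Python's named-group syntax.
ERR_CODE_RE = re.compile(r"device_id=\[\w+\](?P<err_code>[^:]+)")

SAMPLE = (
    'Aug 15 10:30:14 136.10.247.130 SSG350M: NetScreen device_id=[Juniper111] '
    '[Root]system-warning-00518: Admin user "userid" login attempt for '
    'Web(http) management (port 20480) from 1.1.1.1:22560 failed. '
    '(2012-08-15 11:33:36)'
)

match = ERR_CODE_RE.search(SAMPLE)
err_code = match.group("err_code") if match else None
# repr() makes any leading or trailing whitespace in the captured value visible.
print(repr(err_code))
```

In this check the captured value begins with the space that follows the closing bracket of device_id=[Juniper111], because [^:]+ starts matching immediately after that bracket. If that leading space is unwanted in the indexed value, the pattern would need to consume it before the capturing group.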
Define two new indexed fields with one regex
This example creates two indexed fields called username and login_result.
transforms.conf
In transforms.conf add:
[ftpd-login]
REGEX = Attempt to login by user: (.*): login (.*)\.
FORMAT = username::"$1" login_result::"$2"
WRITE_META = true
This stanza finds the literal text Attempt to login by user:, extracts a username followed by a colon, and then the result, which is followed by a period. A line might look like:
2008-10-30 14:15:21 mightyhost awesomeftpd INFO Attempt to login by user: root: login FAILED.
props.conf
Add the following lines to props.conf:
[ftpd-log]
TRANSFORMS-login = ftpd-login
fields.conf
Add the following lines to fields.conf:
[username]
INDEXED=true
[login_result]
INDEXED=true
Restart Splunk for your configuration file changes to take effect.
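As a rough check of how the two capturing groups in the ftpd-login transform line up with the FORMAT string, the following Python sketch runs the same pattern over the sample line and builds the text that FORMAT would describe. It is an approximation that assumes Python's re module behaves like Splunk's regex engine for this pattern. The helper name meta_text is hypothetical.

```python
import re

# Same pattern as the ftpd-login transform.
LOGIN_RE = re.compile(r"Attempt to login by user: (.*): login (.*)\.")

SAMPLE = (
    "2008-10-30 14:15:21 mightyhost awesomeftpd INFO "
    "Attempt to login by user: root: login FAILED."
)


def meta_text(line):
    """Return the field text that the FORMAT line describes, or None if no match."""
    match = LOGIN_RE.search(line)
    if match is None:
        return None
    username, login_result = match.groups()
    # Mirrors FORMAT = username::"$1" login_result::"$2"
    return f'username::"{username}" login_result::"{login_result}"'


print(meta_text(SAMPLE))
```

Both groups use the greedy (.*), so on lines that contain the text ": login " more than once the first group extends to the last occurrence. A narrower character class may be safer if usernames can contain colons.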
Concatenate field values from event segments at index time
This example shows you how an index-time transform can be used to extract separate segments of an event and combine them to create a single field, using the FORMAT option.
Let's say you have the following event:
20100126 08:48:49 781 PACKET 078FCFD0 UDP Rcv 127.0.0.0 8226 R Q [0084 A NOERROR] A (4)www(8)google(3)com(0)
Now, what you want to do is get (4)www(8)google(3)com(0) extracted as a value of a field named dns_requestor. But you don't want those garbage parentheses and numerals, you just want something that looks like www.google.com. How do you achieve this?
transforms.conf
You would start by setting up a transform in transforms.conf named dnsRequest:
[dnsRequest]
REGEX = UDP[^\(]+\(\d\)(\w+)\(\d\)(\w+)\(\d\)(\w+)
FORMAT = dns_requestor::$1.$2.$3
WRITE_META = true
This transform defines a custom field named dns_requestor. It uses its REGEX to pull out the three segments of the dns_requestor value. Then it uses FORMAT to order those segments with periods between them, like a proper domain name.
Note: This method of concatenating event segments into a complete field value is something you can only perform with index-time extractions; search-time extractions have practical restrictions that prevent it. If you find that you must use FORMAT in this manner, you will have to create a new indexed field to do it.
props.conf
Then, the next step would be to define a field extraction in props.conf that references the dnsRequest transform and applies it to events coming from the server1 source type:
[server1]
TRANSFORMS-dnsExtract = dnsRequest
fields.conf
Finally, you would enter the following stanza in fields.conf:
[dns_requestor]
INDEXED = true
Restart Splunk for your configuration file changes to take effect.
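The concatenation performed by the dnsRequest transform can be previewed with a short Python sketch that applies the same REGEX to the sample event and joins the three captured segments with periods, as FORMAT = dns_requestor::$1.$2.$3 describes. This is an approximation that assumes Python's re module matches this pattern the same way Splunk does. The function name dns_requestor is hypothetical and simply reuses the field name.

```python
import re

# Same pattern as the dnsRequest transform.
DNS_RE = re.compile(r"UDP[^\(]+\(\d\)(\w+)\(\d\)(\w+)\(\d\)(\w+)")

EVENT = (
    "20100126 08:48:49 781 PACKET 078FCFD0 UDP Rcv 127.0.0.0 8226 "
    "R Q [0084 A NOERROR] A (4)www(8)google(3)com(0)"
)


def dns_requestor(event):
    """Join the three captured segments with periods, or return None if no match."""
    match = DNS_RE.search(event)
    if match is None:
        return None
    # Mirrors the $1.$2.$3 part of the FORMAT line.
    return ".".join(match.groups())


print(dns_requestor(EVENT))
```

The pattern expects exactly three segments, each preceded by a single-digit length marker such as (4). Names with more or fewer segments, or with segments of ten or more characters, would need a different REGEX.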
This documentation applies to the following versions of Splunk Cloud Platform™: 9.2.2406, 8.2.2112, 8.2.2202, 9.0.2205, 8.2.2201, 8.2.2203, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403 (latest FedRAMP release)