Splunk® Enterprise

Getting Data In


Create custom fields at index-time

Caution: We do not recommend that you add custom fields to the set of default fields that Splunk automatically extracts and indexes at index time, such as timestamp, punct, host, source, and sourcetype. Adding to this list of fields can negatively impact indexing performance and search times, because each indexed field increases the size of the searchable index. Indexed fields are also less flexible--whenever you make changes to your set of fields, you must re-index your entire dataset. For more information, see "Index time versus search time" in the Admin manual.

With those caveats, there are times when you might find reason to add custom indexed fields. For example, you might have a situation where certain search-time field extractions are noticeably impacting search performance. This can happen, for example, if you commonly search a large event set with expressions like foo!=bar or NOT foo=bar, and the field foo nearly always takes on the value bar.

Conversely, you might want to add an indexed field if the value of a search-time extracted field exists outside of the field more often than not. For example, if you commonly search only for foo=1, but 1 occurs in many events that do not have foo=1, you might want to add foo to the list of fields extracted by Splunk at index time.

In general, you should try to extract your fields at search time. For more information see "Create search-time field extractions" in the Knowledge Manager manual.

Define additional indexed fields

Define additional indexed fields by editing props.conf, transforms.conf, and fields.conf.

Edit these files in $SPLUNK_HOME/etc/system/local/ or in your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see "About configuration files" in the Admin manual.

Splunk only accepts field names that contain alphanumeric characters or underscores:

  • Valid characters for field names are a-z, A-Z, 0-9, or _ .
  • Field names cannot begin with 0-9 or _ . Leading underscores are reserved for Splunk's internal variables.
  • International characters are not allowed.
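As an illustration of these naming rules, a small Python check (a hypothetical helper, not part of Splunk) could be:

```python
import re

# Hypothetical helper (not part of Splunk) that mirrors the documented rules:
# only a-z, A-Z, 0-9, and _ are allowed, and the first character may not be
# a digit or an underscore.
VALID_FIELD_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")

def is_valid_field_name(name):
    """Return True if name is a legal custom indexed field name."""
    return bool(VALID_FIELD_NAME.match(name))

print(is_valid_field_name("err_code"))   # True
print(is_valid_field_name("_internal"))  # False: leading underscore reserved
print(is_valid_field_name("9lives"))     # False: cannot begin with a digit
```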

Add a regex stanza for the new field to transforms.conf

Follow this format when you define an index-time field transform in transforms.conf (Note: Some of these attributes, such as LOOKAHEAD and DEST_KEY, are only required for certain use cases):

[<unique_stanza_name>]
REGEX = <regular_expression>
FORMAT = <your_custom_field_name>::$1
WRITE_META = [true|false]
DEFAULT_VALUE = <string>
DEST_KEY = <KEY>
REPEAT_MATCH = [true|false]
SOURCE_KEY = <KEY>
LOOKAHEAD = <integer>

Note the following:

  • The <unique_stanza_name> is required for all transforms, as is the REGEX.
  • REGEX is a regular expression that operates on your data to extract fields.
    • Name-capturing groups in the REGEX are extracted directly to fields, which means that you don't have to specify a FORMAT for simple field extraction cases.
    • If the REGEX extracts both the field name and its corresponding value, you can use the special capturing groups _KEY_<string> and _VAL_<string> to skip specifying the mapping in the FORMAT attribute.
    • For example, the following two configurations are equivalent.
      Using FORMAT:
      REGEX = ([a-z]+)=([a-z]+)
      FORMAT = $1::$2
      Not using FORMAT:
      REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
  • FORMAT is optional. Use it to specify the format of the field/value pair(s) that you are extracting, including any field names or values that you want to add. You don't need to specify the FORMAT if you have a simple REGEX with name-capturing groups.
  • FORMAT behaves differently depending on whether the extraction takes place at search time or index time.
    • For index-time transforms, you use $n to specify the output of each REGEX match (for example, $1, $2, and so on).
    • If the REGEX does not have n groups, the matching fails.
    • FORMAT defaults to <unique_transform_stanza_name>::$1.
    • The special identifier $0 represents what was in the DEST_KEY before the REGEX was performed (in the case of index-time field extractions the DEST_KEY is _meta). For more information, see "How Splunk builds indexed fields," below.
    • For index-time field extractions, you can set up FORMAT in several ways. It can be a <field-name>::<field-value> setup like:
FORMAT = field1::$1 field2::$2 (where the REGEX captures the values for fields named "field1" and "field2")
FORMAT = $1::$2 (where the REGEX extracts both the field name and the field value)
However you can also set up index-time field extractions that create concatenated fields:
FORMAT = ipaddress::$1.$2.$3.$4
When you create concatenated fields with FORMAT, it's important to understand that $ is the only special character. It is treated as a prefix for regex capturing groups only if it is followed by a number and only if that number applies to an existing capturing group.
So if your regex has only one capturing group and its value is bar, then:
FORMAT = foo$1 would yield foobar
FORMAT = foo$bar would yield foo$bar
FORMAT = foo$1234 would yield foo$1234
FORMAT = foo$1\$2 would yield foobar\$2
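The $-handling rules above can be sketched in Python (a hypothetical re-implementation for illustration only; Splunk's actual parser may differ):

```python
import re

def apply_format(format_string, groups):
    """Hypothetical re-implementation, for illustration only, of how an
    index-time FORMAT treats '$': a '$' followed by digits is replaced by
    the corresponding capture group, but only when a group with that exact
    number exists; otherwise the text passes through unchanged."""
    def substitute(match):
        number = int(match.group(1))
        if 1 <= number <= len(groups):
            return groups[number - 1]
        return match.group(0)   # no such capturing group: keep the literal

    return re.sub(r"\$(\d+)", substitute, format_string)

groups = ["bar"]   # a regex with a single capturing group that matched "bar"
print(apply_format("foo$1", groups))     # foobar
print(apply_format("foo$bar", groups))   # foo$bar
print(apply_format("foo$1234", groups))  # foo$1234
```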
  • WRITE_META = true writes the extracted field name and value to _meta, which is where Splunk stores indexed fields. This attribute setting is required for all index-time field extractions, except for those where DEST_KEY = _meta (see the discussion of DEST_KEY, below).
    • For more information about _meta and its role in indexed field creation, see "How Splunk builds indexed fields," below.
  • DEST_KEY is required for index-time field extractions where WRITE_META = false or is not set. It specifies where Splunk sends the results of the REGEX.
    • For index-time extractions, DEST_KEY = _meta, which is where Splunk stores indexed fields. For other possible KEY values see the transforms.conf page in this manual.
    • For more information about _meta and its role in indexed field creation, see "How Splunk builds indexed fields," below.
    • When you use DEST_KEY = _meta, you should also add $0 to the start of your FORMAT attribute. $0 represents the DEST_KEY value before Splunk performs the REGEX (in other words, _meta).
    • Note: The $0 value is in no way derived from the REGEX.
  • DEFAULT_VALUE is optional. The value for this attribute is written to DEST_KEY if the REGEX fails.
    • Defaults to empty.
  • SOURCE_KEY is optional. You use it to identify a KEY whose values the REGEX should be applied to.
    • By default, SOURCE_KEY = _raw, which means the REGEX is applied to the entire text of each event.
    • Typically used in conjunction with REPEAT_MATCH.
    • For other possible KEY values see the transforms.conf page in this manual.
  • REPEAT_MATCH is optional. Set it to true to run the REGEX multiple times on the SOURCE_KEY.
    • REPEAT_MATCH starts wherever the last match stopped and continues until no more matches are found. Useful for situations where an unknown number of field/value matches are expected per event.
    • Defaults to false.
  • LOOKAHEAD is optional. Use it to specify how many characters to search into an event.
    • Defaults to 256. You might want to increase your LOOKAHEAD value if you have events with line lengths longer than 256 characters.
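Putting these attributes together, a minimal index-time transform using only the required attributes might look like the following sketch (the stanza name, field name, and regex here are illustrative placeholders, not from this topic):

```
[netops_user]
REGEX = user=(\w+)
FORMAT = netops_user::$1
WRITE_META = true
```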

Note: For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test regexes by using them in searches with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.

Note: The capturing groups in your regex must identify field names that use ASCII characters (a-zA-Z0-9_-.). International characters will not work.

Link the new field to props.conf

To props.conf, add the following stanza:

[<spec>]
TRANSFORMS-<value> = <unique_stanza_name>

Note the following:

  • <spec> can be:
    • <sourcetype>, the sourcetype of an event.
    • host::<host>, where <host> is the host for an event.
    • source::<source>, where <source> is the source for an event.
    • Note: You can use regex-type syntax when setting the <spec>. Also, source and source type stanzas match in a case-sensitive manner while host stanzas do not. For more information, see the props.conf spec file.
  • <value> is any value you want, to give your attribute its name-space.
  • <unique_stanza_name> is the name of your stanza from transforms.conf.

Note: For index-time field extraction, props.conf uses TRANSFORMS-<value>, as opposed to EXTRACT-<value>, which is used for configuring search-time field extraction.

Add an entry to fields.conf for the new field

Add an entry to fields.conf for the new indexed field:

[<your_custom_field_name>]
INDEXED = true
Note the following:

  • <your_custom_field_name> is the name of the custom field you set in the unique stanza that you added to transforms.conf.
  • Set INDEXED=true to indicate that the field is indexed.

Note: If a field of the same name is extracted at search time, you must set INDEXED=false for the field. In addition, set INDEXED_VALUE=false if events exist that have values of that field that are not pulled out at index time, but which are extracted at search time.

For example, say you're performing a simple <field>::1234 extraction at index time. This could work, but you would have problems if you also implement a search-time field extraction based on a regex like A(\d+)B, where the string A1234B yields a value for that field of 1234. This would turn up events for 1234 at search time that Splunk would be unable to locate at index time with the <field>::1234 extraction.

Restart Splunk for your changes to take effect

Changes to configuration files such as props.conf and transforms.conf won't take effect until you shut down and restart Splunk.

How Splunk builds indexed fields

Splunk builds indexed fields by writing to _meta. Here's how it works:

  • _meta is modified by all matching transforms in transforms.conf that contain either DEST_KEY = _meta or WRITE_META = true.
  • Each matching transform can overwrite _meta, so use WRITE_META = true to append to _meta rather than overwrite it.
    • If you don't use WRITE_META, then start your FORMAT with $0.
  • After _meta is fully built during parsing, Splunk interprets the text in the following way:
    • The text is broken into units; each unit is separated by whitespace.
    • Quotation marks (" ") group characters into larger units, regardless of whitespace.
    • Backslashes ( \ ) immediately preceding quotation marks disable the grouping properties of quotation marks.
    • Backslashes preceding a backslash disable that backslash.
    • Units of text that contain a double colon (::) are turned into extracted fields. The text on the left side of the double colon becomes the field name, and the right side becomes the value.
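The parsing rules above can be sketched in Python (a hypothetical illustration; Splunk's actual parser may differ):

```python
def parse_meta(meta):
    """Hypothetical sketch of the parsing rules above (not Splunk's actual
    parser): whitespace separates units, double quotes group text across
    whitespace, a backslash escapes a following quote or backslash, and any
    unit containing '::' becomes a field/value pair."""
    fields = {}
    unit = []
    in_quotes = False

    def finish_unit():
        token = "".join(unit)
        unit.clear()
        if "::" in token:
            name, _, value = token.partition("::")
            fields[name] = value

    chars = iter(meta)
    for ch in chars:
        if ch == "\\":
            escaped = next(chars, "")
            # A backslash disables the special meaning of a following
            # quote or backslash; otherwise both characters pass through.
            unit.append(escaped if escaped in ('"', "\\") else ch + escaped)
        elif ch == '"':
            in_quotes = not in_quotes   # quotes group text but are not kept
        elif ch.isspace() and not in_quotes:
            if unit:
                finish_unit()
        else:
            unit.append(ch)
    if unit:
        finish_unit()
    return fields

print(parse_meta('field1::value field2::"value 2" field3::"a \\" mark"'))
# → {'field1': 'value', 'field2': 'value 2', 'field3': 'a " mark'}
```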

Note: Indexed fields whose regex-extracted values contain quotation marks generally do not work, and values containing backslashes can also be problematic. Fields extracted at search time do not have these limitations.

Here's an example of an index-time FORMAT that uses quotation marks to group values containing whitespace, and backslashes to escape quotation marks and backslashes:

FORMAT = field1::value field2::"value 2" field3::"a field with a \" quotation mark" field4::"a field which ends with a backslash\\"

When Splunk creates field names

When Splunk creates field names, it applies the following rules to all extracted fields, whether they are extracted at index-time or search-time, by default or through a custom configuration:

1. All characters that are not in the a-z, A-Z, and 0-9 ranges are replaced with an underscore (_).

2. All leading underscores are removed. In Splunk, leading underscores are reserved for internal variables.
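These two rules can be illustrated in Python (a hypothetical sketch, not Splunk's actual implementation):

```python
import re

def clean_field_name(name):
    """Hypothetical illustration of the two naming rules above."""
    name = re.sub(r"[^A-Za-z0-9]", "_", name)  # rule 1: non-alphanumerics -> _
    return name.lstrip("_")                    # rule 2: drop leading underscores

print(clean_field_name("ip-address"))  # ip_address
print(clean_field_name("_raw.size"))   # raw_size
```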

Index-time field extraction examples

Here are several examples of configuration file setups for index-time field extractions.

Define a new indexed field

This basic example creates an indexed field called err_code.


In transforms.conf add:

[netscreen-error]
REGEX = device_id=\[\w+\](?<err_code>[^:]+)
FORMAT = err_code::"$1"
WRITE_META = true

This stanza matches device_id= followed by a word within brackets and a text string terminating with a colon. The source type of the events is testlog.


  • The FORMAT = line contains the following values:
    • err_code:: is the name of the field.
    • $1 is the value extracted by the REGEX; it is written to the index as the value of err_code.
  • WRITE_META = true is an instruction to write the content of FORMAT to the index.


Add the following lines to props.conf:

[testlog]
TRANSFORMS-netscreen = netscreen-error


Add the following lines to fields.conf:

[err_code]
INDEXED = true
Restart Splunk for your configuration file changes to take effect.

Define two new indexed fields with one regex

This example creates two indexed fields called username and login_result.


In transforms.conf add:

[ftpd-login]
REGEX = Attempt to login by user: (.*): login (.*)\.
FORMAT = username::"$1" login_result::"$2"
WRITE_META = true

This stanza finds the literal text Attempt to login by user:, extracts a username (terminated by a colon), and then the login result (terminated by a period). A line might look like:

2008-10-30 14:15:21 mightyhost awesomeftpd INFO Attempt to login by user: root: login FAILED.
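Outside of Splunk, you can verify what the two capture groups pick up from that sample line with Python's re module:

```python
import re

# Applying the transform's REGEX to the sample log line above, outside of
# Splunk, to show what the two capture groups pick up.
line = ("2008-10-30 14:15:21 mightyhost awesomeftpd INFO "
        "Attempt to login by user: root: login FAILED.")
match = re.search(r"Attempt to login by user: (.*): login (.*)\.", line)
print(match.group(1))  # root
print(match.group(2))  # FAILED
```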


Add the following lines to props.conf, where <spec> identifies the source type, source, or host of your ftpd events:

[<spec>]
TRANSFORMS-login = ftpd-login


Add the following lines to fields.conf:

[username]
INDEXED = true

[login_result]
INDEXED = true

Restart Splunk for your configuration file changes to take effect.

Concatenate field values from event segments at index time

This example shows you how an index-time transform can be used to extract separate segments of an event and combine them to create a single field, using the FORMAT option.

Let's say you have the following event:

20100126 08:48:49 781 PACKET 078FCFD0 UDP Rcv 8226 R Q [0084 A NOERROR] A (4)www(8)google(3)com(0)

Now, what you want to do is get (4)www(8)google(3)com(0) extracted as a value of a field named dns_requestor. But you don't want those garbage parentheses and numerals, you just want something that looks like www.google.com. How do you achieve this?


You would start by setting up a transform in transforms.conf named dnsRequest:

[dnsRequest]
REGEX = UDP[^\(]+\(\d\)(\w+)\(\d\)(\w+)\(\d\)(\w+)
FORMAT = dns_requestor::$1.$2.$3
WRITE_META = true

This transform defines a custom field named dns_requestor. It uses its REGEX to pull out the three segments of the dns_requestor value. Then it uses FORMAT to order those segments with periods between them, like a proper URL.
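You can reproduce the effect of this REGEX and FORMAT pair outside of Splunk with Python's re module, to confirm that the three captured segments are joined with periods:

```python
import re

# Reproducing the transform's REGEX and FORMAT outside of Splunk, to confirm
# that the three captured segments are joined with periods.
event = ("20100126 08:48:49 781 PACKET 078FCFD0 UDP Rcv 8226 R Q "
         "[0084 A NOERROR] A (4)www(8)google(3)com(0)")
match = re.search(r"UDP[^\(]+\(\d\)(\w+)\(\d\)(\w+)\(\d\)(\w+)", event)
dns_requestor = "{0}.{1}.{2}".format(*match.groups())
print(dns_requestor)  # www.google.com
```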

Note: This method of concatenating event segments into a complete field value is something you can only perform with index-time extractions; search-time extractions have practical restrictions that prevent it. If you find that you must use FORMAT in this manner, you will have to create a new indexed field to do it.


Then, the next step would be to define a field extraction in props.conf that references the dnsRequest transform and applies it to events coming from the server1 source type:

[server1]
TRANSFORMS-dnsExtract = dnsRequest


Finally, you would enter the following stanza in fields.conf:

[dns_requestor]
INDEXED = true

Restart Splunk for your configuration file changes to take effect.


This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7

