Admin Manual

 



Automatic header-based field extraction

This documentation does not apply to the most recent version of Splunk.


Automatic header-based field extraction lets you configure Splunk to extract fields automatically from data sources that contain headers, such as CSV, TM3, and MS Exchange log files.


How automatic header-based field extraction works

If you enable automatic header-based field extraction for a source or source type, Splunk scans that source or source type for header information to use to extract fields. If a source has the necessary information, Splunk extracts fields using delimiter-based key/value extraction.

Splunk does this by creating an entry in transforms.conf for the source, and populating it with transforms to extract the fields. Splunk also adds a source type stanza to props.conf to tie the field extraction transforms to the source. Splunk then applies the transforms to events from the source at search time.

Note: Automatic header-based field extraction doesn't impact index size or indexing performance. Splunk scans for header information during source typing (before index time) and applies the resulting transforms at search time.

Once Splunk has extracted fields, you can use them for filtering and reporting just like any other field by selecting them from the Fields picker in Splunk Web.
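The following Python sketch illustrates the general idea of delimiter-based key/value extraction described above: each delimited value in an event is paired with the header field name at the same position. It is only an illustration and is not Splunk's implementation; the function name extract_fields is hypothetical.

```python
def extract_fields(event, fields, delim):
    """Pair each delimited value in an event with the field name at the same position.

    Hypothetical helper for illustration only; not part of Splunk.
    """
    values = event.split(delim)
    # zip() stops at the shorter sequence, so extra values or names are ignored.
    return dict(zip(fields, values))


# Field names as they would appear in a header line.
fields = ["time", "client-ip", "cs-method", "sc-status"]
event = "14:13:11 10.1.1.9 HELO 250"

extracted = extract_fields(event, fields, " ")
print(extracted)
```

Each key in the resulting dictionary corresponds to a field that you could then use for filtering and reporting.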


Configure automatic header-based field extraction

Configure automatic header-based field extraction for any source or source type by editing props.conf. Edit this file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

For more information on configuration files in general, see how configuration files work.

To turn on automatic header-based field extraction for a source or source type, add CHECK_FOR_HEADER=TRUE under that source or source type's stanza in props.conf.

Important: If you have already defined a source type for the source for which you want to enable automatic header-based field extraction, first edit the stanza for that source in inputs.conf and remove the sourcetype = [name] setting. Then set CHECK_FOR_HEADER=TRUE in props.conf. Otherwise, the source type you defined conflicts with the source type that automatic extraction generates.

Example props.conf entry for an MS Exchange source:

[MSExchange] 
CHECK_FOR_HEADER=TRUE
...

Note: Set CHECK_FOR_HEADER=FALSE to turn off automatic header-based field extraction for a source or source type.


Changes Splunk makes to configuration files

If you enable automatic header-based field extraction for a source or source type, Splunk adds information to copies of transforms.conf and props.conf in $SPLUNK_HOME/etc/apps/learned/ when it extracts fields for that source or source type.

Important: Don't edit this information afterward, or the extracted fields will not work.

Splunk creates a stanza in transforms.conf for each source type with unique header information that matches a source type defined in props.conf. Splunk names each stanza it creates [AutoHeader-M], where M is an integer that increments sequentially for each source that has a unique header ([AutoHeader-1], [AutoHeader-2],...,[AutoHeader-M]). Splunk populates each stanza with transforms that extract the fields, using the header information.


Example of a transforms.conf entry made automatically by Splunk for the MS Exchange source mentioned above:

...
[AutoHeader-1]
FIELDS="time", "client-ip", "cs-method", "sc-status"
DELIMS=" "
...
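The [AutoHeader-M] naming scheme can be pictured with a short Python sketch. It assumes that a header counts as "unique" when its ordered list of field names has not been seen before; that rule and the function name assign_stanza_names are assumptions made for illustration, not a description of Splunk's internals.

```python
def assign_stanza_names(headers):
    """Map each distinct header (ordered field names) to an AutoHeader-M name.

    Hypothetical helper: M starts at 1 and increases by one per new header.
    """
    names = {}
    for header in headers:
        key = tuple(header)
        if key not in names:
            names[key] = f"AutoHeader-{len(names) + 1}"
    return names


seen_headers = [
    ["time", "client-ip", "cs-method", "sc-status"],
    ["foo", "bar", "anotherfoo", "anotherbar"],
    ["time", "client-ip", "cs-method", "sc-status"],  # repeat, no new stanza
]
stanza_names = assign_stanza_names(seen_headers)
for header, name in stanza_names.items():
    print(name, header)
```

In this sketch, the repeated header reuses its existing stanza name rather than producing a third one.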

Splunk then adds new source type stanzas to props.conf for each unique source. Splunk names the stanzas as [yoursource-N], where yoursource is the source type configured with automatic header-based field extraction, and N is an integer that increments sequentially for each transform in transforms.conf.

Example props.conf entry using the MS Exchange file from the introduction:

# the original source you configured
[MSExchange] 
CHECK_FOR_HEADER=TRUE
...
# source type that Splunk added to handle transforms for automatic header-based field extraction for the same source
[MSExchange-1]
REPORT-AutoHeader = AutoHeader-1
...
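As a rough illustration of how the generated source type stanzas relate to the transforms, the Python sketch below renders one props.conf stanza per transform. It assumes that N is counted separately for each original source type, which is consistent with the examples in this topic (where [CSV-1] refers to AutoHeader-2) but is not confirmed as Splunk's exact rule. The function name link_transforms is hypothetical.

```python
from collections import defaultdict


def link_transforms(assignments):
    """Render a props.conf stanza for each (source_type, transform_name) pair.

    Hypothetical helper. Assumes N counts up per original source type.
    """
    counters = defaultdict(int)
    stanzas = []
    for source_type, transform_name in assignments:
        counters[source_type] += 1
        n = counters[source_type]
        stanzas.append(f"[{source_type}-{n}]\nREPORT-AutoHeader = {transform_name}")
    return stanzas


stanzas = link_transforms([
    ("MSExchange", "AutoHeader-1"),
    ("CSV", "AutoHeader-2"),
])
print("\n\n".join(stanzas))
```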


Note about search and header-based field extraction

To return all events that Splunk has typed with a source type it generated while running automatic header-based field extraction, use a wildcard to search for all events of that source type.

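For example, to find all events for the source type yoursource, including events that Splunk typed with the generated source types (yoursource-1, yoursource-2, and so on), search for: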

sourcetype=yoursource*
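To see why the trailing wildcard matters, the Python sketch below uses shell-style pattern matching from the standard fnmatch module as a stand-in for the search wildcard. This is only an analogy; Splunk's own wildcard handling may differ in detail, and the list of source type names is made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical source type names, for illustration only.
source_types = ["MSExchange", "MSExchange-1", "MSExchange-2", "CSV-1"]

# An exact match finds only the source type you configured.
exact = [st for st in source_types if st == "MSExchange"]

# A trailing wildcard also matches the generated source types.
wildcard = [st for st in source_types if fnmatchcase(st, "MSExchange*")]

print(exact)
print(wildcard)
```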


Examples

These examples show how header-based field extraction works with common source types.

MS Exchange source file

This example shows how Splunk extracts fields from an MS Exchange file using automatic header-based field extraction.

This sample MS Exchange log file has a header containing a list of field names, delimited by spaces:

# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240

Splunk creates a header and transform in transforms.conf:

[AutoHeader-1]
FIELDS="time", "client-ip", "cs-method", "sc-status"
DELIMS=" "
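The step from the "# Fields:" header line to the transform stanza can be sketched in Python as follows. This is a simplified illustration under the assumption that the header is the first line starting with "# Fields:" and that field names are separated by whitespace. The function names find_header_fields and render_transform are hypothetical and do not correspond to anything in Splunk.

```python
def find_header_fields(lines, marker="# Fields:"):
    """Return the field names from the first line that starts with the marker."""
    for line in lines:
        if line.startswith(marker):
            return line[len(marker):].split()
    return None


def render_transform(name, fields, delim):
    """Render a transforms.conf-style stanza as text (illustration only)."""
    quoted = ", ".join(f'"{field}"' for field in fields)
    return f'[{name}]\nFIELDS={quoted}\nDELIMS="{delim}"'


log_lines = [
    "# Message Tracking Log File",
    "# Exchange System Attendant Version 6.5.7638.1",
    "# Fields: time client-ip cs-method sc-status",
    "14:13:11 10.1.1.9 HELO 250",
]

header_fields = find_header_fields(log_lines)
stanza = render_transform("AutoHeader-1", header_fields, " ")
print(stanza)
```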

Splunk then ties the transform to the source by adding this to the source type stanza in props.conf:

# Original source type stanza you create
[MSExchange] 
CHECK_FOR_HEADER=TRUE
...
# source type stanza that Splunk creates
[MSExchange-1]
REPORT-AutoHeader = AutoHeader-1
...

Splunk automatically extracts the time, client-ip, cs-method, and sc-status fields from each of the following events:

14:13:11 10.1.1.9 HELO 250

14:13:13 10.1.1.9 MAIL 250

14:13:19 10.1.1.9 RCPT 250

14:13:29 10.1.1.9 DATA 250

14:13:31 10.1.1.9 QUIT 240

CSV file

This example shows how Splunk extracts fields from a CSV file using automatic header-based field extraction.

Example CSV file contents:

foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!

Splunk creates a header and transform in transforms.conf (located in: $SPLUNK_HOME/etc/apps/learned/transforms.conf):

# Some previous automatic header-based field extraction 
[AutoHeader-1]
...
# transform stanza that Splunk creates for the CSV source
[AutoHeader-2]
FIELDS="foo", "bar", "anotherfoo", "anotherbar"
DELIMS=","

Splunk then ties the transform to the source by adding this to a new source type stanza in props.conf:

...
[CSV-1] 
REPORT-AutoHeader = AutoHeader-2
...

Splunk extracts the foo, bar, anotherfoo, and anotherbar fields from each of the following events:

100,21,this is a long file,nomore

200,22,wow,o rly?

300,12,ya rly!,no wai!
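For comparison, the Python sketch below reads the same sample with the standard csv module, which also uses the first line as the list of field names. It is meant only to show which value ends up under which field name, not how Splunk performs the extraction.

```python
import csv
import io

sample = """foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!
"""

# DictReader treats the first row as the header and keys each later row by it.
rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    print(row)
```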

This documentation applies to the following versions of Splunk: 3.3, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.5, 3.4.6, 3.4.8, 3.4.9, 3.4.10, 3.4.11, 3.4.12, 3.4.13, 3.4.14

