Knowledge Manager Manual

 


Extract fields from file headers during source typing

Certain data sources and source types, such as CSV and MS Exchange log files, can have headers that contain field information. You can configure Splunk to automatically extract these fields during source typing.

For example, a legacy CSV file--which is essentially a static table--could have a header row like

name, location, message, "start date"

which behaves like a series of column headers for the values listed afterwards in the file.

Note: Automatic header-based field extraction doesn't impact index size or indexing performance because it occurs during source typing (before index time).


How automatic header-based field extraction works

When you enable automatic header-based field extraction for a specific source or source type, Splunk scans it for header field information, which it then uses for field extraction. If a source has the necessary header information, Splunk extracts fields using delimiter-based key/value extraction.

Splunk does this by creating an entry in transforms.conf for the source, and populating it with transforms to extract the fields. Splunk also adds a source type stanza to props.conf to tie the field extraction transforms to the source. Splunk then applies the transforms to events from the source at search time.
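As a rough illustration of delimiter-based key/value extraction, the Python sketch below pairs a list of header field names with the delimited values of a single event. The function name extract_fields is hypothetical and this is not Splunk's implementation; it only assumes that the field names and the delimiter are already known from the header.

```python
def extract_fields(field_names, delimiter, event):
    """Pair header field names with the delimited values of one event."""
    values = [value.strip() for value in event.split(delimiter)]
    # zip() stops at the shorter sequence, so surplus names or values are dropped.
    return dict(zip(field_names, values))


# Field names and delimiter as they might be read from a header (illustrative values).
fields = ["time", "client-ip", "cs-method", "sc-status"]
print(extract_fields(fields, " ", "14:13:11 10.1.1.9 HELO 250"))
```

Each value is matched to a field name purely by position, which is why the header must list the fields in the same order as they appear in the events.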

You can use fields extracted by Splunk for filtering and reporting just like any other field by selecting them from the fields sidebar in the Search view (select Pick fields to see a complete list of available fields).


Enable automatic header-based field extraction

Enable automatic header-based field extraction for any source or source type by editing props.conf. Edit this file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

For more information on configuration files in general, see "About configuration files" in the Admin manual.

To turn on automatic header-based field extraction for a source or source type, add CHECK_FOR_HEADER=TRUE under that source or source type's stanza in props.conf.

Important: If you have already defined a source type for the source for which you want to enable automatic header-based field extraction, you must edit that source's stanza in inputs.conf and remove the sourcetype = [name] setting before you set CHECK_FOR_HEADER=TRUE in props.conf, so that it doesn't conflict with the value generated by the automatic extraction.

Example props.conf entry for an MS Exchange source:

[MSExchange] 
CHECK_FOR_HEADER=TRUE
...

Note: Set CHECK_FOR_HEADER=FALSE to turn off automatic header-based field extraction for a source or source type.

Important: Changes you make to props.conf (such as enabling automatic header-based field extraction) won't take effect until you restart Splunk.

Changes Splunk makes to configuration files

If you enable automatic header-based field extraction for a source or source type, Splunk adds stanzas to copies of transforms.conf and props.conf in $SPLUNK_HOME/etc/apps/learned/ when it extracts fields for that source or source type.

Important: Don't edit these stanzas after Splunk adds them, or the related extracted fields won't work.

Splunk creates a stanza in transforms.conf for each source type with unique header information matching a source type defined in props.conf. Splunk names each stanza it creates as [AutoHeader-M], where M is an integer that increments sequentially for each source that has a unique header ([AutoHeader-1], [AutoHeader-2],...,[AutoHeader-M]). Splunk populates each stanza with transforms that extract the fields (using header information).

Here is an example of a transforms.conf entry that Splunk might make automatically for the MS Exchange source that was enabled for header-based field extraction in the preceding example:

...
[AutoHeader-1]
FIELDS="time", "client-ip", "cs-method", "sc-status"
DELIMS=" "
...

Splunk then adds new source type stanzas to props.conf for each unique source. Splunk names the stanzas as [yoursource-N], where yoursource is the source type configured with automatic header-based field extraction, and N is an integer that increments sequentially for each transform in transforms.conf.

Example props.conf entry (including the MS Exchange file from the introduction):

# the original source you configured
[MSExchange] 
CHECK_FOR_HEADER=TRUE
...
# source type that Splunk added to props.conf to handle transforms for automatic header-based field extraction for the same source
[MSExchange-1]
REPORT-AutoHeader = AutoHeader-1
...
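The naming scheme for the generated stanzas can be sketched in Python as follows. This is only an illustration, not Splunk's code: the function build_learned_stanzas is hypothetical, and it assumes (judging from the examples on this page) that the AutoHeader number is shared across all sources while the source type suffix counts separately for each source type.

```python
from collections import defaultdict


def build_learned_stanzas(observed):
    """Sketch stanza text for each unique (source type, header) pair.

    observed: list of (sourcetype, field_names, delimiter) in arrival order.
    Returns a list of (transforms_stanza, props_stanza) string pairs.
    """
    seen = set()                       # headers that already have stanzas
    per_sourcetype = defaultdict(int)  # N counter, kept per source type (assumption)
    stanzas = []
    for sourcetype, field_names, delimiter in observed:
        key = (sourcetype, tuple(field_names), delimiter)
        if key in seen:
            continue  # a repeated header gets no new stanzas
        seen.add(key)
        m = len(seen)  # M counter, shared by all sources (assumption)
        per_sourcetype[sourcetype] += 1
        n = per_sourcetype[sourcetype]
        quoted = ", ".join(f'"{name}"' for name in field_names)
        transforms = f'[AutoHeader-{m}]\nFIELDS={quoted}\nDELIMS="{delimiter}"'
        props = f"[{sourcetype}-{n}]\nREPORT-AutoHeader = AutoHeader-{m}"
        stanzas.append((transforms, props))
    return stanzas


# Two illustrative sources, each with its own unique header.
for transforms, props in build_learned_stanzas([
    ("MSExchange", ["time", "client-ip", "cs-method", "sc-status"], " "),
    ("CSV", ["foo", "bar", "anotherfoo", "anotherbar"], ","),
]):
    print(transforms)
    print(props)
    print()
```

Under these assumptions the first source yields [AutoHeader-1] and [MSExchange-1], and the second yields [AutoHeader-2] and [CSV-1].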

Note about search and header-based field extraction

Use a wildcard to search for events associated with source types that Splunk generated during header-based field extraction.

For example, to find all events for the source type yoursource, including events under generated source types such as yoursource-1, search for:

sourcetype=yoursource*
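To see why the trailing wildcard matters, the Python sketch below applies the same pattern to a list of made-up source type names. It uses the standard fnmatch module, which only approximates Splunk's wildcard matching, and the names in the list are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical source type names, including ones generated by header extraction.
generated = ["yoursource", "yoursource-1", "yoursource-2", "othersource-1"]

# The trailing * matches the original name as well as the numbered variants.
matches = [name for name in generated if fnmatchcase(name, "yoursource*")]
print(matches)
```

Without the wildcard, only the exact name yoursource would match, and events filed under yoursource-1 or yoursource-2 would be missed.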

Examples of header-based field extraction

These examples show how header-based field extraction works with common source types.

MS Exchange source file

This example shows how Splunk extracts fields from an MS Exchange file using automatic header-based field extraction.

This sample MS Exchange log file has a header containing a list of field names, delimited by spaces:

# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240

Splunk creates a header and transform in transforms.conf:

[AutoHeader-1]
FIELDS="time", "client-ip", "cs-method", "sc-status"
DELIMS=" "

Note that Splunk automatically detects that the delimiter is a space.

Splunk then ties the transform to the source by adding this to the source type stanza in props.conf:

# Original source type stanza you create
[MSExchange] 
CHECK_FOR_HEADER=TRUE
...
# source type stanza that Splunk creates
[MSExchange-1]
REPORT-AutoHeader = AutoHeader-1
...

Splunk automatically extracts the following fields from each event:

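time=14:13:11 client-ip=10.1.1.9 cs-method=HELO sc-status=250
time=14:13:13 client-ip=10.1.1.9 cs-method=MAIL sc-status=250
time=14:13:19 client-ip=10.1.1.9 cs-method=RCPT sc-status=250
time=14:13:29 client-ip=10.1.1.9 cs-method=DATA sc-status=250
time=14:13:31 client-ip=10.1.1.9 cs-method=QUIT sc-status=240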
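The mapping from the "# Fields:" header line to the values in each event can be mimicked with a short Python sketch. The function parse_exchange_log is hypothetical and only assumes the layout of the sample file above: comment lines start with #, one of them lists the field names, and every other non-blank line is an event.

```python
LOG = """\
# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240
"""


def parse_exchange_log(text):
    """Yield one dict per event, keyed by the names on the '# Fields:' line."""
    field_names = []
    for line in text.splitlines():
        if line.startswith("# Fields:"):
            field_names = line[len("# Fields:"):].split()
        elif line.startswith("#") or not line.strip():
            continue  # other header comments and blank lines carry no events
        elif field_names:
            yield dict(zip(field_names, line.split()))


events = list(parse_exchange_log(LOG))
print(events[0])
```

Every event line produces the four fields named in the header: time, client-ip, cs-method, and sc-status.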

CSV file

This example shows how Splunk extracts fields from a CSV file using automatic header-based field extraction.

Example CSV file contents:

foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!

Splunk creates a header and transform in transforms.conf (located in: $SPLUNK_HOME/etc/apps/learned/transforms.conf):

# Some previous automatic header-based field extraction 
[AutoHeader-1]
...
# transform stanza that Splunk creates
[AutoHeader-2]
FIELDS="foo", "bar", "anotherfoo", "anotherbar"
DELIMS=","

Note that Splunk automatically detects that the delimiter is a comma.

Splunk then ties the transform to the source by adding this to a new source type stanza in props.conf:

...
[CSV-1] 
REPORT-AutoHeader = AutoHeader-2
...

Splunk extracts the following fields from each event:

foo=100 bar=21 anotherfoo="this is a long file" anotherbar=nomore
foo=200 bar=22 anotherfoo=wow anotherbar="o rly?"
foo=300 bar=12 anotherfoo="ya rly!" anotherbar="no wai!"
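For comparison, Python's standard csv module performs a similar header-driven extraction on the sample CSV file. This is an analogy rather than a description of how Splunk parses the file; csv.DictReader takes the first row as the field names and maps each later row onto them.

```python
import csv
import io

CSV_TEXT = """\
foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!
"""

# DictReader reads the first row as field names, then yields one mapping per row.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
rows = list(reader)
for row in rows:
    print(row)
```

Values that contain spaces, such as "this is a long file", stay intact because only the comma acts as the delimiter.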

This documentation applies to the following versions of Splunk: 3.4.10, 3.4.11, 3.4.12, 3.4.13, 3.4.14

