Splunk® Enterprise

Getting Data In


Splunk Enterprise version 5.0 reached its End of Life on December 1, 2017. Please see the migration information.
This documentation does not apply to the most recent version of Splunk. Click here for the latest version.

Extract fields from file headers

Note: The feature described in this topic (CHECK_FOR_HEADER attribute) has been deprecated in Splunk version 5.0. For a list of all deprecated features, see the topic "Deprecated features" in the Release Notes.

CSV files can have headers that contain field information. You can configure Splunk to automatically extract these fields during index-time event processing.

For example, a legacy CSV file could start with a header row that contains column headers for the values in subsequent rows:

name, location, message, "start date"

Important: You cannot use this method with universal or light forwarders. Instead, define a search-time field transform in transforms.conf that uses the DELIMS attribute. For details, see "Create and maintain search-time field extractions through configuration files" in the Knowledge Manager manual.
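As a rough sketch of that search-time alternative, the fragments below define a delimiter-based transform for the example header above and tie it to a source type. The stanza names legacy_csv_fields and legacy_csv are hypothetical, and start_date is an assumed stand-in for the quoted "start date" column; adjust all three to match your own data.

```ini
# transforms.conf
# Hypothetical transform name. DELIMS sets the field delimiter and
# FIELDS names the columns in the order they appear in each row.
[legacy_csv_fields]
DELIMS = ","
FIELDS = "name", "location", "message", "start_date"

# props.conf
# Hypothetical source type name. REPORT ties the transform above
# to events of this source type at search time.
[legacy_csv]
REPORT-legacy_csv_fields = legacy_csv_fields
```

Because the field names are listed by hand here, Splunk does not need to read them from the file header.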

How automatic header-based field extraction works

When you enable automatic header-based field extraction for files from a specific source or source type, Splunk scans each file for header field information, which it then uses for field extraction. If a source has the necessary header information, Splunk extracts fields using delimiter-based key/value extraction.

Splunk does this just prior to indexing by changing the source type of the incoming data to [original_sourcetype]-N, where N is a number. Next, it creates a stanza for this new source type in props.conf, defines a delimiter-based extraction rule for the static table header in transforms.conf, and then ties that extraction rule to the new source type in its new props.conf stanza. Finally, at search time, Splunk applies the field transform to events from the source (the static table file).

Automatic header-based field extraction doesn't affect index size or indexing performance because it occurs during sourcetyping (before index time).

You can use fields extracted by Splunk for filtering and reporting just like any other field by selecting them from the fields sidebar in the Search view (select Pick fields to see a complete list of available fields).

Note: Splunk records the header line of a static table in a CSV file as an event. To count the events in the file without including the header event, run a search that identifies the file as the source and explicitly excludes the comma-delimited list of header names that appears in that event. Here's an example:

source=/my/file.csv NOT "header_field1,header_field2,header_field3,..." | stats count

Enable automatic header-based field extraction

Enable automatic header-based field extraction for any source or source type by editing props.conf. Edit this file in $SPLUNK_HOME/etc/system/local/ or in your own custom application directory in $SPLUNK_HOME/etc/apps/<app_name>/local.

For more information on configuration files in general, see "About configuration files" in the Admin manual.

To turn on automatic header-based field extraction for a source or source type, add CHECK_FOR_HEADER=TRUE under that source or source type's stanza in props.conf.

To turn off automatic header-based field extraction for a source or source type, set CHECK_FOR_HEADER=FALSE.

Important: Changes you make to props.conf (such as enabling automatic header-based field extraction) won't take effect until you reload Splunk.

Note: CHECK_FOR_HEADER must be in a source or source type stanza.
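A minimal sketch of what these settings might look like in props.conf. The source type name csv and the source path /my/file.csv are illustrative assumptions; only one of the two stanzas is needed, depending on whether you scope the setting to a source type or to a single source.

```ini
# props.conf (in $SPLUNK_HOME/etc/system/local/ or an app's local directory)

# Source type stanza: applies to all data assigned the (assumed) source type "csv"
[csv]
CHECK_FOR_HEADER = TRUE

# Source stanza: applies only to the (assumed) file /my/file.csv
[source::/my/file.csv]
CHECK_FOR_HEADER = TRUE
```

To turn the feature off again, change the value to FALSE in the same stanza.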

Changes Splunk makes to configuration files

If you enable automatic header-based field extraction for a source or source type, Splunk adds stanzas to copies of transforms.conf and props.conf in $SPLUNK_HOME/etc/apps/learned/local/ when it extracts fields for that source or source type.

Important: Don't edit these stanzas after Splunk adds them, or the related extracted fields won't work.

transforms.conf

Splunk creates a stanza in transforms.conf for each source type with unique header information matching a source type defined in props.conf. Splunk names each stanza it creates as [AutoHeader-N], where N is an integer that increments sequentially for each source that has a unique header ([AutoHeader-1], [AutoHeader-2],...,[AutoHeader-N]). Splunk populates each stanza with transforms for the fields, using header information.

props.conf

Splunk then adds new sourcetype stanzas to props.conf for each source with a unique name, fieldset, and delimiter. Splunk names the stanzas as [yoursource-N], where yoursource is the source type configured with automatic header-based field extraction, and N is an integer that increments sequentially for each transform in transforms.conf.

For example, say you're indexing a number of CSV files. If each of those files has the same set of header fields and uses the same delimiter in transforms.conf, Splunk maps the events indexed from those files to a source type of csv-1 in props.conf. But if that batch of CSV files also includes a couple of files with unique sets of fields and delimiters, Splunk gives the events it indexes from those files source types of csv-2 and csv-3, respectively. Events from files with the same source, fieldset, and delimiter in transforms.conf will have the same source type value.

Note: If you want to enable automatic header-based field extraction for a particular source, and you have already manually specified a source type value for that source (either by defining the source type in Splunk Web or by directly adding the source type to a stanza in inputs.conf), be aware that setting CHECK_FOR_HEADER=TRUE for that source allows Splunk to override the source type value you've set for it with the source types generated by the automatic header-based field extraction process. This means that even though you may have set things up in inputs.conf so that all csv files get a source type of csv, once you set CHECK_FOR_HEADER=TRUE, Splunk overrides that source type setting with the incremental source types described above.
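To make the override concrete, here is a hedged sketch of the setup this note describes. The monitored path is hypothetical; the stanza layout follows the standard inputs.conf monitor syntax.

```ini
# inputs.conf
# Hypothetical monitor input that assigns every matching file the source type "csv"
[monitor:///var/data/exports/*.csv]
sourcetype = csv

# props.conf
# Enabling header checking for "csv" lets Splunk replace that source type
# with the generated ones (csv-1, csv-2, ...) described above
[csv]
CHECK_FOR_HEADER = TRUE
```

With both stanzas in place, events from these files would be searchable under the generated source types rather than under csv itself.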

Search and header-based field extraction

Use a wildcard to search for events associated with source types that Splunk generated during header-based field extraction.

For example, to find events from every source type that Splunk generated from yoursource (yoursource-1, yoursource-2, and so on), run a search like this:

sourcetype=yoursource*

Example

This example shows how Splunk extracts fields from a CSV file using automatic header-based field extraction.

Example CSV file contents:

foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!

Splunk creates a transform stanza for the header in transforms.conf (located in $SPLUNK_HOME/etc/apps/learned/local/):

# Some previous automatic header-based field extraction 
[AutoHeader-1]
...
# transform stanza that Splunk creates for this file's header
[AutoHeader-2]
FIELDS="foo", "bar", "anotherfoo", "anotherbar"
DELIMS=","

Note that Splunk automatically detects that the delimiter is a comma.

Splunk then ties the transform to the source by adding this to a new source type stanza in props.conf:

...
[CSV-1] 
REPORT-AutoHeader = AutoHeader-2
...

Splunk extracts the following fields from each event:

100,21,this is a long file,nomore

  • foo="100" bar="21" anotherfoo="this is a long file" anotherbar="nomore"

200,22,wow,o rly?

  • foo="200" bar="22" anotherfoo="wow" anotherbar="o rly?"

300,12,ya rly!,no wai!

  • foo="300" bar="12" anotherfoo="ya rly!" anotherbar="no wai!"

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around extracting fields.


This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18


Comments

Hi Yuvalba,

Use iis-2 and set CHECK_FOR_HEADER=FALSE. Splunk only makes changes to files when you ask it to check for the header. Once you've gotten Splunk to find the headers of the files the way you want, you can then reuse that sourcetype to index the remainder of your files.

Malmoore, Splunker
August 5, 2013

I have multiple IIS log files with the same structure. Splunk created sourcetype "iis-2", which works.

Now, when I configure monitoring of additional such files, should I choose the original iis type again or use the iis-2 type? (I would like to prevent further types from being created because there is no need.)

Yuvalba
July 27, 2013

Hi Xeshxzh. You can always use the DELIMS stanza in transforms.conf to manually set field extraction from a CSV header definition. We are working on new ways to auto-extract headers and will not remove CHECK_FOR_HEADER from the product until these improved methods are introduced.

Ogdin, Splunker
January 19, 2013

If we are deprecating this feature, what are the alternatives? Our data is stored in CSV files with header, so this is a very important feature for us. Thank you!

Xeshxzh
January 18, 2013
