Splunk® Enterprise

Getting Data In

Splunk version 4.x reached its End of Life on October 1, 2013.
This documentation does not apply to the most recent version of Splunk.
Extract fields from file headers

CSV files can have headers that contain field information. You can configure Splunk to automatically extract these fields during index-time event processing.

For example, a legacy CSV file could start with a header row that contains column headers for the values in subsequent rows:

name, location, message, "start date"

Important: You cannot use this method with universal or light forwarders. Instead, define a search-time field transform in transforms.conf that uses the DELIMS attribute. For details, see "Create and maintain search-time field extractions through configuration files" in the Knowledge Manager manual.
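
As a rough illustration of that search-time alternative, a delimiter-based transform for the header row shown above might look like the following sketch. The stanza names legacy_csv_fields and legacy_csv are hypothetical, and start_date stands in for the "start date" column on the assumption that a field name without a space is easier to work with. Adjust both files to match your own data.

```ini
# transforms.conf
# Hypothetical stanza name. DELIMS names the column separator and
# FIELDS lists the column names in the order they appear in each row.
[legacy_csv_fields]
DELIMS = ","
FIELDS = "name", "location", "message", "start_date"

# props.conf
# Hypothetical source type. The REPORT- setting ties the transform
# above to this source type so it is applied at search time.
[legacy_csv]
REPORT-legacy_csv_fields = legacy_csv_fields
```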

How automatic header-based field extraction works

When you enable automatic header-based field extraction for files from a specific source or source type, Splunk scans each file for header field information, which it then uses for field extraction. If a source has the necessary header information, Splunk extracts fields using delimiter-based key/value extraction.

Splunk does this just prior to indexing by changing the source type of the incoming data to [original_sourcetype]-N, where N is a number. Next, it creates a stanza for this new source type in props.conf, defines a delimiter-based extraction rule for the static table header in transforms.conf, and then ties that extraction rule to the new source type in its new props.conf stanza. Finally, at search time, Splunk applies the field transform to events from the source (the static table file).

Automatic header-based field extraction doesn't affect index size or indexing performance because it occurs during sourcetyping (before index time).

You can use fields extracted by Splunk for filtering and reporting just like any other field by selecting them from the fields sidebar in the Search view (select Pick fields to see a complete list of available fields).

Note: Splunk records the header line of a static table in a CSV file as an event. To count the events in the file without including the header event, run a search that identifies the file as the source and explicitly excludes the comma-delimited list of header names that appears in that event. Here's an example:

source=/my/file.csv NOT "header_field1,header_field2,header_field3,..." | stats count

Enable automatic header-based field extraction

Enable automatic header-based field extraction for any source or source type by editing props.conf. Edit this file in $SPLUNK_HOME/etc/system/local/ or in your own custom application directory in $SPLUNK_HOME/etc/apps/<app_name>/local.

For more information on configuration files in general, see "About configuration files" in the Admin manual.

To turn on automatic header-based field extraction for a source or source type, add CHECK_FOR_HEADER=TRUE under that source or source type's stanza in props.conf.

To turn off automatic header-based field extraction for a source or source type, set CHECK_FOR_HEADER=FALSE.

Important: Changes you make to props.conf (such as enabling automatic header-based field extraction) won't take effect until you reload Splunk.

Note: CHECK_FOR_HEADER must be in a source or source type stanza.
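
A minimal sketch of what such stanzas might look like in props.conf follows. The file path in the source stanza and the csv source type name are assumptions chosen for illustration. In practice you would normally set the attribute in only one of the two stanzas.

```ini
# props.conf

# Enable header checking for a specific source (hypothetical path).
[source::/var/log/legacy/*.csv]
CHECK_FOR_HEADER = TRUE

# Or enable it for every source that has a given source type
# (csv is used here only as an example name).
[csv]
CHECK_FOR_HEADER = TRUE
```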

Changes Splunk makes to configuration files

If you enable automatic header-based field extraction for a source or source type, Splunk adds stanzas to copies of transforms.conf and props.conf in $SPLUNK_HOME/etc/apps/learned/local/ when it extracts fields for that source or source type.

Important: Don't edit these stanzas after Splunk adds them, or the related extracted fields won't work.

transforms.conf

Splunk creates a stanza in transforms.conf for each source type with unique header information matching a source type defined in props.conf. Splunk names each stanza it creates as [AutoHeader-N], where N is an integer that increments sequentially for each source that has a unique header ([AutoHeader-1], [AutoHeader-2],...,[AutoHeader-N]). Splunk populates each stanza with transforms for the fields, using header information.

props.conf

Splunk then adds new sourcetype stanzas to props.conf for each source with a unique name, fieldset, and delimiter. Splunk names the stanzas as [yoursource-N], where yoursource is the source type configured with automatic header-based field extraction, and N is an integer that increments sequentially for each transform in transforms.conf.

For example, say you're indexing a number of CSV files. If each of those files has the same set of header fields and uses the same delimiter in transforms.conf, Splunk maps the events indexed from those files to a source type of csv-1 in props.conf. But if that batch of CSV files also includes a couple of files with unique sets of fields and delimiters, Splunk gives the events it indexes from those files source types of csv-2 and csv-3, respectively. Events from files with the same source, fieldset, and delimiter in transforms.conf will have the same source type value.

Note: If you want to enable automatic header-based field extraction for a particular source, and you have already manually specified a source type value for that source (either by defining the source type in Splunk Web or by directly adding the source type to a stanza in inputs.conf), be aware that setting CHECK_FOR_HEADER=TRUE for that source allows Splunk to override the source type value you've set for it with the source types generated by the automatic header-based field extraction process. This means that even though you may have set things up in inputs.conf so that all csv files get a source type of csv, once you set CHECK_FOR_HEADER=TRUE, Splunk overrides that source type setting with the incremental source types described above.
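
To make the override concrete, consider the following sketch, in which the monitored path is hypothetical. The input assigns a source type of csv, but because header checking is enabled for that source type, events from these files would be indexed with generated source types such as csv-1 and csv-2 rather than csv.

```ini
# inputs.conf
# Hypothetical monitor input that assigns the csv source type.
[monitor:///var/log/legacy/*.csv]
sourcetype = csv

# props.conf
# With this setting, the csv value above is replaced by the
# incremental source types (csv-1, csv-2, ...) described earlier.
[csv]
CHECK_FOR_HEADER = TRUE
```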

Search and header-based field extraction

Use a wildcard to search for events associated with source types that Splunk generated during header-based field extraction.

For example, to find events from every source type that Splunk generated from the source type yoursource, add a trailing wildcard to the source type name:

sourcetype=yoursource*

Example

This example shows how Splunk extracts fields from a CSV file using automatic header-based field extraction.

Example CSV file contents:

foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!

Splunk creates a transform stanza for the header in transforms.conf (located in $SPLUNK_HOME/etc/apps/learned/local/):

# Some previous automatic header-based field extraction 
[AutoHeader-1]
...
# transform stanza that Splunk creates for the example file
[AutoHeader-2]
FIELDS="foo", "bar", "anotherfoo", "anotherbar"
DELIMS=","

Note that Splunk automatically detects that the delimiter is a comma.

Splunk then ties the transform to the source by adding this to a new source type stanza in props.conf:

...
[CSV-1] 
REPORT-AutoHeader = AutoHeader-2
...

Splunk extracts the following fields from each event:

100,21,this is a long file,nomore

  • foo="100" bar="21" anotherfoo="this is a long file" anotherbar="nomore"

200,22,wow,o rly?

  • foo="200" bar="22" anotherfoo="wow" anotherbar="o rly?"

300,12,ya rly!,no wai!

  • foo="300" bar="12" anotherfoo="ya rly!" anotherbar="no wai!"

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around extracting fields.

This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7


Comments

TJ Green,

I don't think there is, but why don't you try asking this question in Splunk Answers? - http://splunk-base.splunk.com/answers/ - Someone there should be able to tell you definitively.

Sgoodman, Splunker
October 11, 2012

If you don't have access to the config files, is there a way around the field definition issue?

For example, I tried the below (w/ more fields) but got a "Regex: subpattern name is too long (maximum 32 characters)" error

sourcetype="recordfile" | head 100 | rex field=_raw "^(?[^,]*),(?[^,]*),(?[^,]*)"

TJ Green
October 10, 2012

Rnavis: it's mostly because the page is talking about two things at the same time. The "automatically extract" part is thinking about "CHECK_FOR_HEADER", which is a props.conf key that makes Splunk itself check the header row and generate an AutoHeader rule. Then the rest of the page is talking about just the 'AutoHeader' rules, which might have been created automatically by CHECK_FOR_HEADER, or might have been created manually by an admin.

In the case of manual AutoHeader rules, you have to put them on the search head, not the indexer.

However, for the CHECK_FOR_HEADER config, it's worse than that. If CHECK_FOR_HEADER is used with dist search or for that matter with any kind of Splunk forwarding, you have to keep not only the CHECK_FOR_HEADER, but also the autogenerated AutoHeader rules, mirrored on both search head and indexer.

Sideview
April 30, 2012

Rnavis,

Thanks for catching that inconsistency. The field extraction happens at index-time, so the conf files need to be on the indexer(s), not on the search head. I've removed the note that stated otherwise.

Sgoodman, Splunker
March 14, 2012

Correct me if I'm wrong, but it appears there is conflicting information in this document. Earlier in the document it states,
"You can configure Splunk to automatically extract these fields during index-time event processing."
Then later on it states the following:
"Note: If you are using Splunk in a distributed environment, be sure to place the props.conf and transforms.conf files that you update for header-based field extraction on your search head, not the indexer"

Doesn't this imply that the header based field extraction is search time vs index time?

Rnavis
March 12, 2012
