Extract fields from file headers
This documentation does not apply to the most recent version of Splunk.
CSV files can have headers that contain field information. You can configure Splunk to automatically extract these fields during index-time event processing.
For example, a legacy CSV file could start with a header row that contains column headers for the values in subsequent rows:
name, location, message, "start date"
Important: You cannot use this method with universal or light forwarders. Instead, define a search-time field transform in transforms.conf that uses the DELIMS attribute. For details, see "Create and maintain search-time field extractions through configuration files" in the Knowledge Manager manual.
How automatic header-based field extraction works
When you enable automatic header-based field extraction for files from a specific source or source type, Splunk scans each file for header field information, which it then uses for field extraction. If a source has the necessary header information, Splunk extracts fields using delimiter-based key/value extraction.
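As a rough illustration of delimiter-based key/value extraction, the following Python sketch treats a header row as the list of field names and pairs each value in a later row with the name in the same position. It is not Splunk's implementation: the function names and the sample data row are hypothetical, and Python's csv module stands in for Splunk's own parser.

```python
import csv
from typing import Dict, List


def header_fields(header_line: str, delim: str = ",") -> List[str]:
    """Split a header row into field names, honoring quotes and
    ignoring the space that follows each delimiter."""
    return next(csv.reader([header_line], delimiter=delim, skipinitialspace=True))


def extract(fields: List[str], row_line: str, delim: str = ",") -> Dict[str, str]:
    """Pair each value in a data row with the field name in the same position.
    zip() stops at the shorter list, so surplus values or names are dropped."""
    values = next(csv.reader([row_line], delimiter=delim, skipinitialspace=True))
    return dict(zip(fields, values))


# Header row from the example above; the data row is made up for illustration.
fields = header_fields('name, location, message, "start date"')
event = extract(fields, "web01, rack 4, disk full, 2012-01-15")
print(fields)               # ['name', 'location', 'message', 'start date']
print(event["start date"])  # 2012-01-15
```

Using a real CSV parser rather than a plain split keeps the quoted "start date" header intact as a single field name.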
Splunk does this just prior to indexing by changing the source type of the incoming data to [original_sourcetype]-N, where N is a number. Next, it creates a stanza for this new source type in props.conf, defines a delimiter-based extraction rule for the static table header in transforms.conf, and then ties that extraction rule to the new source type back in its new props.conf stanza. Finally, at search time, Splunk applies the field transform to events from the source (the static table file).
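To make these configuration changes concrete, this Python sketch renders the kind of stanzas described above for one detected header, following the stanza layout used in the example at the end of this topic. The function name and its parameters are hypothetical; Splunk writes these stanzas itself, and the sketch only shows their shape.

```python
from __future__ import annotations


def learned_stanzas(sourcetype: str, st_n: int, transform_n: int,
                    fields: list[str], delim: str) -> tuple[str, str]:
    """Return (props.conf text, transforms.conf text) for one unique header.

    st_n and transform_n are separate counters: the new source type is
    <sourcetype>-<st_n> and the transform is AutoHeader-<transform_n>.
    """
    transform = f"AutoHeader-{transform_n}"
    field_list = ", ".join(f'"{name}"' for name in fields)
    # transforms.conf: a delimiter-based extraction rule built from the header
    transforms_conf = f'[{transform}]\nFIELDS={field_list}\nDELIMS="{delim}"\n'
    # props.conf: a new source type stanza that references the transform
    props_conf = f"[{sourcetype}-{st_n}]\nREPORT-AutoHeader = {transform}\n"
    return props_conf, transforms_conf


props, transforms = learned_stanzas(
    "CSV", 1, 2, ["foo", "bar", "anotherfoo", "anotherbar"], ",")
print(props)
print(transforms)
```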
Automatic header-based field extraction doesn't affect index size or indexing performance because it occurs during sourcetyping (before index time).
You can use fields extracted by Splunk for filtering and reporting just like any other field by selecting them from the fields sidebar in the Search view (select Pick fields to see a complete list of available fields).
Note: Splunk records the header line of a static table in a CSV file as an event. To count the events in the file without including the header event, run a search that identifies the file as the source and explicitly excludes the comma-delimited list of header names that appears in the header event. For example:
source=/my/file.csv NOT "header_field1,header_field2,header_field3,..." | stats count
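The search above keeps every event from the file except the one containing the header text. This small Python sketch models the same filtering, assuming each event is one line of the file and approximating the NOT clause as a substring test; the function name and sample lines are hypothetical.

```python
from typing import Iterable


def count_without_header(events: Iterable[str], header: str) -> int:
    """Count events, skipping any event that contains the header text,
    approximating how the NOT "..." clause excludes the header event."""
    return sum(1 for event in events if header not in event)


events = [
    "header_field1,header_field2,header_field3",  # the header event
    "a,b,c",
    "d,e,f",
]
print(count_without_header(events, "header_field1,header_field2,header_field3"))  # 2
```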
Enable automatic header-based field extraction
Enable automatic header-based field extraction for any source or source type by editing props.conf. Edit this file in $SPLUNK_HOME/etc/system/local/ or in your own custom application directory.
For more information on configuration files in general, see "About configuration files" in the Admin manual.
To turn on automatic header-based field extraction for a source or source type, add CHECK_FOR_HEADER=TRUE under that source or source type's stanza in props.conf.
To turn off automatic header-based field extraction for a source or source type, set CHECK_FOR_HEADER=FALSE.
Important: Changes you make to props.conf (such as enabling automatic header-based field extraction) won't take effect until you reload Splunk.
Note: CHECK_FOR_HEADER must be in a source or source type stanza.
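If you manage props.conf with scripts, a sketch like the following adds or updates the setting in one stanza. It assumes Python's configparser can read your file, which holds for simple files like this one but not for every .conf feature; note that configparser also discards comments when it writes. The stanza name source::/my/file.csv and the function name are hypothetical.

```python
import configparser
import io


def set_check_for_header(conf_text: str, stanza: str, enabled: bool) -> str:
    """Return conf_text with CHECK_FOR_HEADER set to TRUE or FALSE in `stanza`."""
    # Only '=' separates keys from values; interpolation is off so '%' is literal.
    cfg = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    cfg.optionxform = str  # keep attribute names such as CHECK_FOR_HEADER in upper case
    cfg.read_string(conf_text)
    if not cfg.has_section(stanza):
        cfg.add_section(stanza)
    cfg.set(stanza, "CHECK_FOR_HEADER", "TRUE" if enabled else "FALSE")
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


original = "[source::/my/file.csv]\nsourcetype=csv\n"
print(set_check_for_header(original, "source::/my/file.csv", True))
```

Setting enabled=False on the same stanza writes CHECK_FOR_HEADER=FALSE, which turns the feature off again.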
Changes Splunk makes to configuration files
If you enable automatic header-based field extraction for a source or source type, Splunk adds stanzas to copies of transforms.conf and props.conf in $SPLUNK_HOME/etc/apps/learned/local/ when it extracts fields for that source or source type.
Important: Don't edit these stanzas after Splunk adds them, or the related extracted fields won't work.
Splunk creates a stanza in transforms.conf for each source type with unique header information matching a source type defined in props.conf. Splunk names each stanza it creates as [AutoHeader-N], where N is an integer that increments sequentially for each source that has a unique header ([AutoHeader-1], [AutoHeader-2], ..., [AutoHeader-N]). Splunk populates each stanza with transforms for the fields, using header information.
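The numbering rule can be pictured as a registry keyed on the header: the first time a new header appears, the next AutoHeader-N name is allocated, and a header seen before reuses its name. This Python class is a hypothetical model of that bookkeeping, not Splunk code; keying on the delimiter as well as the field names is an assumption.

```python
from typing import Dict, Sequence, Tuple


class AutoHeaderRegistry:
    """Hypothetical model of how [AutoHeader-N] names are handed out."""

    def __init__(self) -> None:
        self._names: Dict[Tuple[Tuple[str, ...], str], str] = {}

    def transform_for(self, fields: Sequence[str], delim: str) -> str:
        key = (tuple(fields), delim)
        if key not in self._names:
            # N increments once per previously unseen header
            self._names[key] = f"AutoHeader-{len(self._names) + 1}"
        return self._names[key]


registry = AutoHeaderRegistry()
print(registry.transform_for(["a", "b"], ","))      # AutoHeader-1
print(registry.transform_for(["foo", "bar"], ","))  # AutoHeader-2
print(registry.transform_for(["a", "b"], ","))      # AutoHeader-1 (reused)
```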
Splunk then adds new source type stanzas to props.conf for each source with a unique name, fieldset, and delimiter. Splunk names the stanzas as [yoursource-N], where yoursource is the source type configured with automatic header-based field extraction, and N is an integer that increments sequentially for each transform in transforms.conf.
For example, say you're indexing a number of CSV files. If each of those files has the same set of header fields and uses the same delimiter in transforms.conf, Splunk maps the events indexed from those files to a source type of csv-1 in props.conf. But if that batch of CSV files also includes a couple of files with unique sets of fields and delimiters, Splunk gives the events it indexes from those files source types of csv-2 and csv-3, respectively. Events from files with the same source, fieldset, and delimiter in transforms.conf will have the same source type value.
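The grouping in the example above, where matching files share one source type and each distinct header gets the next number, can be modeled with a short Python sketch that numbers each new (fields, delimiter) signature in order of first appearance. The file names and headers are invented for illustration, and the function models the documented behavior rather than Splunk's implementation.

```python
from typing import Dict, List, Tuple

# (file name, header fields, delimiter) for each indexed file
FileInfo = Tuple[str, Tuple[str, ...], str]


def assign_sourcetypes(base: str, files: List[FileInfo]) -> Dict[str, str]:
    """Map each file to <base>-N, where N counts distinct (fields, delimiter) pairs."""
    numbers: Dict[Tuple[Tuple[str, ...], str], int] = {}
    result: Dict[str, str] = {}
    for name, fields, delim in files:
        signature = (fields, delim)
        if signature not in numbers:
            numbers[signature] = len(numbers) + 1
        result[name] = f"{base}-{numbers[signature]}"
    return result


files = [
    ("a.csv", ("host", "status"), ","),
    ("b.csv", ("host", "status"), ","),          # same header and delimiter as a.csv
    ("c.csv", ("user", "action", "time"), ","),  # different fields
    ("d.csv", ("host", "status"), "|"),          # same fields, different delimiter
]
print(assign_sourcetypes("csv", files))
# {'a.csv': 'csv-1', 'b.csv': 'csv-1', 'c.csv': 'csv-2', 'd.csv': 'csv-3'}
```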
Note: If you want to enable automatic header-based field extraction for a particular source, and you have already manually specified a source type value for that source (either by defining the source type in Splunk Web or by directly adding the source type to a stanza in inputs.conf), be aware that setting CHECK_FOR_HEADER=TRUE for that source allows Splunk to override the source type value you've set for it with the source types generated by the automatic header-based field extraction process. This means that even though you may have set things up in inputs.conf so that all CSV files get a source type of csv, once you set CHECK_FOR_HEADER=TRUE, Splunk overrides that source type setting with the incremental source types described above.
Search and header-based field extraction
Use a wildcard to search for events associated with source types that Splunk generated during header-based field extraction.
For example, a search for sourcetype="yoursource" looks like this:
sourcetype=yoursource*
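Python's fnmatch module gives a rough feel for how a trailing * wildcard picks up every generated source type. This is only an approximation: Splunk search has its own wildcard rules (fnmatch also treats ? and [...] as special), the case-insensitive comparison is an assumption, and the source type names below are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def matching_sourcetypes(pattern: str, sourcetypes: List[str]) -> List[str]:
    """Return the source types that a *-style pattern selects.
    Both sides are lowercased, so matching is case-insensitive (an assumption)."""
    pat = pattern.lower()
    return [st for st in sourcetypes if fnmatchcase(st.lower(), pat)]


known = ["yoursource-1", "yoursource-2", "YourSource-3", "othersource-1"]
print(matching_sourcetypes("yoursource*", known))
# ['yoursource-1', 'yoursource-2', 'YourSource-3']
```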
This example shows how Splunk extracts fields from a CSV file using automatic header-based field extraction.
Example CSV file contents:
foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!
Splunk creates a transform stanza in transforms.conf (in the $SPLUNK_HOME/etc/apps/learned/local/ directory described above):
# Some previous automatic header-based field extraction
[AutoHeader-1]
...
# transform stanza that Splunk creates
[AutoHeader-2]
FIELDS="foo", "bar", "anotherfoo", "anotherbar"
DELIMS=","
Note that Splunk automatically detects that the delimiter is a comma.
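This topic does not describe how Splunk detects the delimiter. As a stand-in, Python's csv.Sniffer can make a similar guess from a sample of the file; this swaps in Python's own heuristic and is not Splunk's algorithm. The candidate delimiter list is an assumption.

```python
import csv

SAMPLE = (
    "foo,bar,anotherfoo,anotherbar\n"
    "100,21,this is a long file,nomore\n"
    "200,22,wow,o rly?\n"
    "300,12,ya rly!,no wai!\n"
)


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the field delimiter from a text sample, limited to `candidates`."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


print(repr(detect_delimiter(SAMPLE)))   # ','
print(csv.Sniffer().has_header(SAMPLE))  # True: the first row looks like field names
```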
Splunk then ties the transform to the source by adding this to a new source type stanza in props.conf:
...
[CSV-1]
REPORT-AutoHeader = AutoHeader-2
...
Splunk extracts the following fields from each event:
100,21,this is a long file,nomore
foo="100" bar="21" anotherfoo="this is a long file" anotherbar="nomore"

200,22,wow,o rly?
foo="200" bar="22" anotherfoo="wow" anotherbar="o rly?"

300,12,ya rly!,no wai!
foo="300" bar="12" anotherfoo="ya rly!" anotherbar="no wai!"
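The extraction above can be reproduced with a short Python sketch that reads the example file contents, takes the first row as the field names, and prints each remaining row in the same field="value" form. It models the result, not how Splunk produces it, and the function names are hypothetical. Unlike Splunk, which also records the header line as an event, csv.DictReader consumes the header row and returns only the data rows.

```python
import csv
import io
from typing import Dict, List

CSV_TEXT = (
    "foo,bar,anotherfoo,anotherbar\n"
    "100,21,this is a long file,nomore\n"
    "200,22,wow,o rly?\n"
    "300,12,ya rly!,no wai!\n"
)


def extract_events(text: str, delim: str = ",") -> List[Dict[str, str]]:
    """Use the first row as field names and map every later row onto them."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delim)
    return [dict(row) for row in reader]


def render(event: Dict[str, str]) -> str:
    """Format one event's fields as name="value" pairs, in header order."""
    return " ".join(f'{name}="{value}"' for name, value in event.items())


for event in extract_events(CSV_TEXT):
    print(render(event))
```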