Extract fields from file headers at index time
This documentation does not apply to the most recent version of Splunk.
Certain data sources and source types, such as CSV and MS Exchange log files, can have headers that contain field information. You can configure Splunk to automatically extract these fields during index-time event processing.
For example, a legacy CSV file, which is essentially a static table, could have a header row like this:
name, location, message, "start date"
This row behaves like a series of column headers for the values listed afterward in the file.
Note: Automatic header-based field extraction doesn't impact index size or indexing performance because it occurs during source typing (before index time).
How automatic header-based field extraction works
When you enable automatic header-based field extraction for a specific source or source type, Splunk scans it for header field information, which it then uses for field extraction. If a source has the necessary header information, Splunk extracts fields using delimiter-based key/value extraction.
Splunk does this at index time by changing the source type of the incoming data to [original_sourcetype]-N, where N is a number. Next, it creates a stanza for this new source type in props.conf, defines a delimiter-based extraction rule for the static table header in transforms.conf, and then ties that extraction rule to the new source type back in its new props.conf stanza. Finally, at search time, Splunk applies the field transform to events from the source (the static table file).
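To make delimiter-based key/value extraction concrete, here is a minimal Python sketch. It is not Splunk's implementation: it treats the first line as the header, splits every later line on the same delimiter, and pairs values with field names by column position. The function name extract_with_header is hypothetical, and the data row is made up; only the header row comes from the CSV example above.

```python
def extract_with_header(lines, delim=","):
    """Treat lines[0] as the header and map each later line to its fields.

    Hypothetical helper: it splits naively on delim, so quoted values that
    contain the delimiter are not handled the way a real CSV parser would.
    """
    header, *rows = lines
    fields = [name.strip().strip('"') for name in header.split(delim)]
    events = []
    for row in rows:
        values = [value.strip() for value in row.split(delim)]
        # Pair values with field names by column position.
        events.append(dict(zip(fields, values)))
    return events


# The header comes from the example above; the data row is made up.
sample = [
    'name, location, message, "start date"',
    "alice, lab 1, started, 2010-01-04",
]
print(extract_with_header(sample))
```

Because the header drives the field names, any later line with the same number of delimited values maps cleanly onto those fields.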
You can use fields extracted by Splunk for filtering and reporting just like any other field by selecting them from the fields sidebar in the Search view (select Pick fields to see a complete list of available fields).
Note: Splunk records the header line of a static table in a CSV or similar file as an event. To count the events in the file without including the header event, run a search that identifies the file as the source and explicitly excludes the comma-delimited list of header names that appears in that event. Here's an example:
source=/my/file.csv NOT "header_field1,header_field2,header_field3,..." | stats count
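The search above counts every event from the file except the one that contains the header text. A rough Python analogy, assuming each line of the file is one event and using the sample rows from the CSV example later on this page, looks like this. The function name count_non_header_events is hypothetical, and substring matching here only approximates how SPL matches a quoted phrase.

```python
HEADER = "foo,bar,anotherfoo,anotherbar"

events = [
    "foo,bar,anotherfoo,anotherbar",
    "100,21,this is a long file,nomore",
    "200,22,wow,o rly?",
    "300,12,ya rly!,no wai!",
]


def count_non_header_events(events, header):
    """Count events that do not contain the header text (like NOT "...")."""
    return sum(1 for event in events if header not in event)


print(count_non_header_events(events, HEADER))  # 3
```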
Enable automatic header-based field extraction
Enable automatic header-based field extraction for any source or source type by editing props.conf. Edit this file in $SPLUNK_HOME/etc/system/local/ or in your own custom application directory.
Note: If you are using Splunk in a distributed environment, be sure to place the transforms.conf files that you update for header-based field extraction on your search head, not the indexer.
For more information on configuration files in general, see "About configuration files" in the Admin manual.
To turn on automatic header-based field extraction for a source or source type, add CHECK_FOR_HEADER=TRUE under that source or source type's stanza in props.conf. For example, here is a props.conf entry for an MS Exchange source:
[MSExchange]
CHECK_FOR_HEADER=TRUE
...
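The following stanzas assign the MSExchange source type to a specific log file and turn on header checking for that source type:
[source::C:\\Program Files\\Exchsrvr\\ServerName.log]
sourcetype=MSExchange

[MSExchange]
CHECK_FOR_HEADER=TRUE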
Set CHECK_FOR_HEADER=FALSE to turn off automatic header-based field extraction for a source or source type.
Important: Changes you make to props.conf (such as enabling automatic header-based field extraction) won't take effect until you restart Splunk.
Note: CHECK_FOR_HEADER must be in a source or source type stanza.
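If you manage several props.conf files, a short script can confirm which stanzas have header checking turned on. This is a hypothetical helper built on Python's standard configparser module, and it assumes the file consists of plain INI-style stanzas with key=value lines. Splunk's .conf syntax is close to INI but is not guaranteed to parse identically; for example, settings placed outside any stanza would make configparser raise an error.

```python
import configparser

# Stanzas copied from the example above (raw string keeps the backslashes).
PROPS = r"""
[source::C:\\Program Files\\Exchsrvr\\ServerName.log]
sourcetype=MSExchange

[MSExchange]
CHECK_FOR_HEADER=TRUE
"""


def header_check_stanzas(text):
    """Return the names of stanzas whose CHECK_FOR_HEADER is TRUE."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep setting names exactly as written
    parser.read_string(text)
    return [
        name
        for name in parser.sections()
        if parser[name].get("CHECK_FOR_HEADER", "FALSE").upper() == "TRUE"
    ]


print(header_check_stanzas(PROPS))  # ['MSExchange']
```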
Changes Splunk makes to configuration files
If you enable automatic header-based field extraction for a source or source type, Splunk adds stanzas to copies of transforms.conf and props.conf in $SPLUNK_HOME/etc/apps/learned/local/ when it extracts fields for that source or source type.
Important: Don't edit these stanzas after Splunk adds them, or the related extracted fields won't work.
Splunk creates a stanza in transforms.conf for each source type with unique header information matching a source type defined in props.conf. Splunk names each stanza it creates [AutoHeader-N], where N is an integer that increments sequentially for each source that has a unique header ([AutoHeader-1], [AutoHeader-2], ..., [AutoHeader-N]). Splunk populates each stanza with transforms that extract the fields using the header information.
Here's the transforms.conf entry that Splunk would add for the MS Exchange source, which was enabled for automatic header-based field extraction in the preceding example:
...
[AutoHeader-1]
DELIMS=" "
FIELDS="time", "client-ip", "cs-method", "sc-status"
...
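The [AutoHeader-N] numbering can be sketched in Python. This sketch is hypothetical and only reproduces the naming and stanza layout described here: each distinct combination of header fields and delimiter gets the next number, and a repeated header reuses the existing stanza. The function name auto_header_stanzas is illustrative and not part of Splunk.

```python
def auto_header_stanzas(headers):
    """Build one [AutoHeader-N] stanza per unique (fields, delimiter) pair.

    headers: iterable of (tuple_of_field_names, delimiter), in the order
    the sources are seen. Hypothetical helper for illustration only.
    """
    numbers = {}
    for key in headers:
        # setdefault assigns the next number only the first time a key appears
        numbers.setdefault(key, len(numbers) + 1)

    stanzas = []
    for (fields, delim), n in numbers.items():
        field_list = ", ".join(f'"{name}"' for name in fields)
        stanzas.append(f'[AutoHeader-{n}]\nDELIMS="{delim}"\nFIELDS={field_list}')
    return stanzas


exchange = (("time", "client-ip", "cs-method", "sc-status"), " ")
for stanza in auto_header_stanzas([exchange, exchange]):
    print(stanza)
```

Seeing the same header twice still produces a single [AutoHeader-1] stanza, matching the "unique header" rule.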
Splunk then adds new source type stanzas to props.conf for each source with a unique name, set of fields, and delimiter. Splunk names the stanzas [yoursource-N], where yoursource is the source type configured with automatic header-based field extraction, and N is an integer that increments sequentially for each transform in transforms.conf.
For example, say you're indexing a number of CSV files. If each of those files has the same set of header fields and the same delimiter in transforms.conf, Splunk maps the events indexed from those files to a source type of csv-1 in props.conf. But if that batch of CSV files also includes a couple of files with unique sets of fields and delimiters, Splunk gives the events it indexes from those files source types of csv-2 and csv-3, respectively. Events from files with the same source, field set, and delimiter in transforms.conf will have the same source type value.
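The csv-1, csv-2, csv-3 assignment can be illustrated with a small Python sketch. It is a hypothetical model, not Splunk's code: it keys only on the header fields and delimiter, and the file names and headers are made up. Files that share a header share a source type; each new combination gets the next number.

```python
def assign_sourcetypes(base, files):
    """Map each file to base-N, reusing N for files with the same header.

    files: dict of file name -> (tuple_of_field_names, delimiter).
    Hypothetical helper; it keys only on fields and delimiter.
    """
    numbers = {}
    assigned = {}
    for name, signature in files.items():
        n = numbers.setdefault(signature, len(numbers) + 1)
        assigned[name] = f"{base}-{n}"
    return assigned


files = {
    "a.csv": (("foo", "bar"), ","),
    "b.csv": (("foo", "bar"), ","),
    "c.csv": (("id", "name"), ","),
    "d.csv": (("id", "name"), ";"),  # same fields, different delimiter
}
print(assign_sourcetypes("csv", files))
# {'a.csv': 'csv-1', 'b.csv': 'csv-1', 'c.csv': 'csv-2', 'd.csv': 'csv-3'}
```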
Note: If you want to enable automatic header-based field extraction for a particular source, and you have already manually specified a source type value for that source (either by defining the source type in Splunk Web or by directly adding the source type to a stanza in inputs.conf), be aware that setting CHECK_FOR_HEADER=TRUE for that source allows Splunk to override the source type value you've set with the source types generated by the automatic header-based field extraction process. This means that even though you may have set things up in inputs.conf so that all CSV files get a source type of csv, once you set CHECK_FOR_HEADER=TRUE, Splunk overrides that source type setting with the incremental source type names described above.
Here's the source type stanza that Splunk would add to props.conf to tie the transform to the MS Exchange source mentioned earlier:
[MSExchange-1]
REPORT-AutoHeader = AutoHeader-1
...
Note about search and header-based field extraction
Use a wildcard to search for events associated with source types that Splunk generated during header-based field extraction.
For example, a search for all source types generated from yoursource looks like this:
sourcetype=yoursource*
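Python's fnmatch module offers a rough analogy for how a trailing wildcard widens the match to include the generated -N source types. This is only an illustration with made-up source type names; SPL wildcard matching is not implemented with fnmatch and may treat case differently.

```python
from fnmatch import fnmatchcase

# Hypothetical source type names: the original plus two generated ones.
sourcetypes = ["MSExchange", "MSExchange-1", "MSExchange-2", "csv-1"]

# A trailing * matches the original name and every generated -N suffix.
matches = [st for st in sourcetypes if fnmatchcase(st, "MSExchange*")]
print(matches)  # ['MSExchange', 'MSExchange-1', 'MSExchange-2']
```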
Examples of header-based field extraction
These examples show how header-based field extraction works with common source types.
MS Exchange source file
This example shows how Splunk extracts fields from an MS Exchange file using automatic header-based field extraction.
This sample MS Exchange log file has a header containing a list of field names, delimited by spaces:
# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240
Splunk creates a header and transform in transforms.conf:
[AutoHeader-1]
FIELDS="time", "client-ip", "cs-method", "sc-status"
DELIMS=" "
Note that Splunk automatically detects that the delimiter is whitespace.
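This page does not say how Splunk detects the delimiter. One simple heuristic, offered purely as a hypothetical illustration, is to pick the first candidate character that splits the header and every data line into the same number of fields (more than one). The function name guess_delimiter and the candidate list are assumptions.

```python
def guess_delimiter(header, rows, candidates=(",", "\t", " ", ";", "|")):
    """Return the first candidate that yields a consistent column count.

    Hypothetical heuristic for illustration; not Splunk's detection logic.
    Comma is tried before space so CSV values containing spaces still work.
    """
    for delim in candidates:
        width = len(header.split(delim))
        if width > 1 and all(len(row.split(delim)) == width for row in rows):
            return delim
    return None


print(repr(guess_delimiter("time client-ip cs-method sc-status",
                           ["14:13:11 10.1.1.9 HELO 250",
                            "14:13:31 10.1.1.9 QUIT 240"])))  # ' '
```

The candidate order is a design choice: trying the comma first lets CSV rows such as "100,21,this is a long file,nomore" resolve to a comma even though they also contain spaces.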
Splunk then ties the transform to the source by adding this to the source type stanza in props.conf:
# Original source type stanza you create
[MSExchange]
CHECK_FOR_HEADER=TRUE
...
# source type stanza that Splunk creates
[MSExchange-1]
REPORT-AutoHeader = AutoHeader-1
...
Splunk automatically extracts the following fields from each event:
14:13:11 10.1.1.9 HELO 250
time="14:13:11" client-ip="10.1.1.9" cs-method="HELO" sc-status="250"
14:13:13 10.1.1.9 MAIL 250
time="14:13:13" client-ip="10.1.1.9" cs-method="MAIL" sc-status="250"
14:13:19 10.1.1.9 RCPT 250
time="14:13:19" client-ip="10.1.1.9" cs-method="RCPT" sc-status="250"
14:13:29 10.1.1.9 DATA 250
time="14:13:29" client-ip="10.1.1.9" cs-method="DATA" sc-status="250"
14:13:31 10.1.1.9 QUIT 240
time="14:13:31" client-ip="10.1.1.9" cs-method="QUIT" sc-status="240"
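A short Python sketch can reproduce this result for the sample log. It is an illustration rather than Splunk's code: it reads field names from the "# Fields:" line, skips the other "#" comment lines, and splits each remaining line on whitespace. The function name parse_exchange_log is hypothetical.

```python
LOG = """\
# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240
"""


def parse_exchange_log(text):
    """Return one dict per data line, keyed by the names in '# Fields:'."""
    fields = []
    events = []
    for line in text.splitlines():
        if line.startswith("# Fields:"):
            # Field names follow the prefix, separated by spaces.
            fields = line[len("# Fields:"):].split()
        elif line.startswith("#") or not line.strip():
            continue  # other comment lines are metadata, not events
        else:
            events.append(dict(zip(fields, line.split())))
    return events


for event in parse_exchange_log(LOG):
    print(" ".join(f'{k}="{v}"' for k, v in event.items()))
```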
CSV file
This example shows how Splunk extracts fields from a CSV file using automatic header-based field extraction.
Example CSV file contents:
foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!
Splunk creates a header and transform in transforms.conf:
# Some previous automatic header-based field extraction
[AutoHeader-1]
...
# transform stanza that Splunk creates
[AutoHeader-2]
FIELDS="foo", "bar", "anotherfoo", "anotherbar"
DELIMS=","
Note that Splunk automatically detects that the delimiter is a comma.
Splunk then ties the transform to the source by adding this to a new source type stanza in props.conf:
...
[CSV-1]
REPORT-AutoHeader = AutoHeader-2
...
Splunk extracts the following fields from each event:
100,21,this is a long file,nomore
foo="100" bar="21" anotherfoo="this is a long file" anotherbar="nomore"
200,22,wow,o rly?
foo="200" bar="22" anotherfoo="wow" anotherbar="o rly?"
300,12,ya rly!,no wai!
foo="300" bar="12" anotherfoo="ya rly!" anotherbar="no wai!"
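Python's standard csv module performs the same header-driven mapping, which makes it a convenient way to preview the fields a CSV file is likely to produce before you index it. This is an analogy, not Splunk's implementation; extract_csv is a hypothetical helper name.

```python
import csv
import io

CSV_TEXT = """\
foo,bar,anotherfoo,anotherbar
100,21,this is a long file,nomore
200,22,wow,o rly?
300,12,ya rly!,no wai!
"""


def extract_csv(text):
    """Use the header row as field names and return one dict per data row."""
    # DictReader takes its field names from the first row by default.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


for event in extract_csv(CSV_TEXT):
    print(" ".join(f'{k}="{v}"' for k, v in event.items()))
```

Unlike a naive split on commas, csv.DictReader also honors quoted values, so a field such as "start date" in a quoted header is read as a single name.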