
Extract fields from files with structured data
Many structured data files, such as comma-separated value (CSV) files and Internet Information Server (IIS) web server logs, have information in the file header that can be extracted as fields during indexing. You can configure Splunk Enterprise and the Splunk universal forwarder to automatically extract these values into fields that can be searched. For example, a CSV file starts with a header row that contains column headers for the values in subsequent rows:
host,status,message,"start date"
srv1.splunk.com,error,"No space left on device",2013-06-10T06:35:00
srv2.splunk.com,ok,-,2013-06-11T06:00:00
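For example, to have the header of a file like this extracted automatically, you can define a source type for it in props.conf. A minimal sketch, assuming a hypothetical source type named my_csv (the attributes are described later in this topic):

[my_csv]
INDEXED_EXTRACTIONS=csv
FIELD_DELIMITER=,
FIELD_QUOTE="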
Input types that the indexed field extraction feature supports
This feature works with the following input types:
- File-based inputs only (such as monitoring files, directories, or archives).
- Inputs that use the oneshot input type (or the "Upload" feature in Splunk Web).
It does not work with modular inputs, network inputs, or any other type of input.
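For example, you can index a single structured data file through the oneshot input from the command line. A sketch, assuming a hypothetical file path and a source type that you have already defined in props.conf:

$SPLUNK_HOME/bin/splunk add oneshot /opt/test/data/sample.csv -sourcetype my_csv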
Use configuration files to enable automatic header-based field extraction
You can also use a combination of inputs.conf and props.conf to extract fields from structured data files. Edit these files in $SPLUNK_HOME/etc/system/local/ or in your own custom application directory in $SPLUNK_HOME/etc/apps/<app_name>/local. The inputs.conf file specifies the files that you want to monitor and the source type to be applied to the events they contain, and props.conf defines the source types themselves. If you have Splunk Enterprise, you can edit these settings on indexer machines or on machines where you run the Splunk universal forwarder. You must restart Splunk Enterprise for any changes that you make to inputs.conf and props.conf to take effect. If you have Splunk Cloud and want to configure the extraction of fields from structured data, use the Splunk universal forwarder.
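For example, to restart a *nix instance from the command line:

$SPLUNK_HOME/bin/splunk restart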
Special characters or values are available for some attributes
You can use special characters or values such as spaces, vertical and horizontal tabs, and form feeds in some attributes. The following table lists these characters:
Special value | props.conf representation
---|---
form feed | \f
space | space or ' '
horizontal tab | \t or tab
vertical tab | \v
whitespace | whitespace
none | none or \0
file separator | fs or \034
group separator | gs or \035
record separator | rs or \036
unit separator | us or \037
You can use these special characters for the following attributes only:
- FIELD_DELIMITER
- FIELD_HEADER_REGEX
- FIELD_QUOTE
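For example, a minimal sketch of a source type for tab-separated data that explicitly uses the \t special value (the source type name is hypothetical):

[tab_separated_data]
INDEXED_EXTRACTIONS=tsv
FIELD_DELIMITER=\t
FIELD_QUOTE="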
Edit configuration files to create and reference source types
To create and reference new source types that extract fields from files with headers:
- Using a text editor, open the props.conf file in the appropriate location, as described in Use configuration files to enable automatic header-based field extraction earlier in this topic. If the props.conf file does not exist, you must create it.
- Define a new source type by creating a stanza that tells Splunk Enterprise how to extract the file header and structured file data, using the attributes described above. You can define as many stanzas, and thus as many source types, as you like in the file. For example:
[HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim]
FIELD_DELIMITER=,
HEADER_FIELD_DELIMITER=\s
FIELD_QUOTE="
- Save the props.conf file and close it.
- Create a file named inputs.conf in the same directory, if it does not already exist.
- Open the file for editing.
- Add a stanza that represents the file or files from which you want Splunk Enterprise to extract the file header and structured data. You can add as many stanzas as you wish. For example:
[monitor:///opt/test/data/StructuredData/HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim.csv]
sourcetype=HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim
- Save the inputs.conf file and close it.
- Restart Splunk Enterprise or the universal forwarder for the changes to take effect.
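After the restart, you can confirm that the fields were extracted by searching on the new source type. A minimal SPL sketch, using the VeryLargeCSVFile source type from the examples later in this topic (the field names come from that file's header row):

sourcetype=VeryLargeCSVFile | table Category,DayOfWeek,Date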
Caveats to extracting fields from structured data files
Splunk software does not parse structured data that has been forwarded to an indexer
When you forward structured data to an indexer, it is not parsed when it arrives at the indexer, even if you have configured props.conf on that indexer with INDEXED_EXTRACTIONS. Forwarded data skips the following pipelines on the indexer, which precludes any parsing of that data on the indexer:
- parsing
- merging
- typing
The forwarded data must arrive at the indexer already parsed.
Field extraction settings for forwarded structured data must be configured on the forwarder
If you want to forward fields that you extract from structured data files to another Splunk instance, you must configure the props.conf settings that define the field extractions on the forwarder that sends the data. This includes configuration of INDEXED_EXTRACTIONS and any other parsing, filtering, anonymizing, and routing rules. Performing these actions on the instance that indexes the data will have no effect, as the forwarded data must arrive at the indexer already parsed.
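For example, if the hypothetical my_csv source type sketched earlier in this topic is defined in props.conf on a universal forwarder, the matching inputs.conf stanza belongs on that same forwarder (the monitored path here is illustrative):

[monitor:///var/log/myapp/sample.csv]
sourcetype=my_csv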
When you use Splunk Web to modify event break and time stamp settings, it records all of the proposed changes as a stanza for props.conf. You can find those settings in the "Advanced" tab on the "Set Source type" page. Use the "Copy to clipboard" link in the "Advanced" tab to copy the proposed changes to props.conf to the system clipboard. You can then paste this stanza into props.conf in a text editor on Splunk instances that monitor and forward similar files.
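The exact attributes depend on the file, but a stanza copied this way might look something like the following sketch (the source type name and attribute values are illustrative):

[my_csv]
INDEXED_EXTRACTIONS=csv
TIMESTAMP_FIELDS=start date
SHOULD_LINEMERGE=false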
Only header fields containing data are indexed
When Splunk software extracts header fields from structured data files, it only extracts those fields where data is present in at least one row. If a header field has no data in any row, it is skipped (that is, not indexed). Take, for example, the following CSV file:
header1,header2,header3,header4,header5
one,1,won,,111
two,2,too,,222
three,3,thri,,333
four,4,fore,,444
five,5,faiv,,555
When Splunk software reads this file, it notes that the rows in the header4 column are all empty, and does not index that header field or any of the rows in it. This means that neither header4 nor any of the data in its column can be searched for in the index.
If, however, the header4 field contains rows with empty strings (for example, ""), the field and all the rows underneath it are indexed.
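For illustration, here is a version of the same file in which header4 would be indexed, because each of its rows contains an explicit empty string:

header1,header2,header3,header4,header5
one,1,won,"",111
two,2,too,"",222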
No support for mid-file renaming of header fields
Some software, such as Internet Information Server, supports the renaming of header fields in the middle of the file. Splunk software does not recognize changes such as this. If you attempt to index a file that has header fields renamed within the file, the renamed header field is not indexed.
Example configuration and data files
The following are example inputs.conf and props.conf files that give you an idea of how to use the file header extraction attributes.
To extract the data locally, edit inputs.conf and props.conf to define inputs and sourcetypes for the structured data files, and use the attributes described above to specify how to deal with the files. To forward this data to another Splunk instance, edit inputs.conf and props.conf on the forwarding instance, and props.conf on the receiving instance.
Inputs.conf
[monitor:///opt/test/data/StructuredData/CSVWithFewHeaderFieldsWithoutAnyValues.csv]
sourcetype=CSVWithFewHeaderFieldsWithoutAnyValues

[monitor:///opt/test/data/StructuredData/VeryLargeCSVFile.csv]
sourcetype=VeryLargeCSVFile

[monitor:///opt/test/data/StructuredData/UselessLongHeaderToBeIgnored.log]
sourcetype=UselessLongHeaderToBeIgnored

[monitor:///opt/test/data/StructuredData/HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim.csv]
sourcetype=HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim

[monitor:///opt/test/data/FieldHeaderRegex.log]
sourcetype=ExtractCorrectHeaders
Props.conf
[CSVWithFewHeaderFieldsWithoutAnyValues]
FIELD_DELIMITER=,

[VeryLargeCSVFile]
FIELD_DELIMITER=,

[UselessLongHeaderToBeIgnored]
HEADER_FIELD_LINE_NUMBER=35
TIMESTAMP_FIELDS=Date,Time,TimeZone
FIELD_DELIMITER=\s
FIELD_QUOTE="

[HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim]
FIELD_DELIMITER=,
HEADER_FIELD_DELIMITER=\s
FIELD_QUOTE="

[ExtractCorrectHeaders]
FIELD_HEADER_REGEX=Ignore_This_Stuff:\s(.*)
FIELD_DELIMITER=,
Sample files
The following are snippets of the files referenced in the above inputs.conf and props.conf examples, to give you an idea of what the files look like.
You might need to scroll right quite a bit to see all of the content.
CSVWithFewHeaderFieldsWithoutAnyValues.csv
vqmcallhistoryid,serialnumber,vqmavgjbenvdelay,vqmavgjbenvnegdelta,vqmavgjbenvposdelta,vqmbitrate,vqmburstcount,vqmburstlengthavgms,vqmburstlengthavgpkts,vqmburstlostrateavg,vqmburstrvaluelq,vqmccmid,vqmcalldurationms,vqmcallid,vqmcallstart,vqmdegradationdelay,vqmdegradationdiscard,vqmdegradationecholevel,vqmdegradationloss,vqmdegradationnoiselevel,vqmdegradationrecency,vqmdegradationsignallevel,vqmdegradationvocoder,vqmdelayavgmsec,vqmdelaycurrentmsec,vqmdelaydecreasecount,vqmdelayincreasecount,vqmdelaymaxmsec,vqmdelayminmsec,vqmdestifc,vqmdestintname,vqmdiscardrateavg,vqmdiscards,vqmdscp,vqmduplicatepkts,vqmearlypeakjitterms,vqmearlypkts,vqmearlythresholdms,vqmearlythresholdpc,vqmearlytotalcount,vqmearlyunderthresholdcount,vqmexcessburst,vqmexcessgap,vqmexternalrvaluecqin,vqmexternalrvaluecqout,vqmexternalrvaluelqin,vqmexternalrvaluelqout,vqmfrom,vqmgapcount,vqmgaplengthavgms,vqmgaplengthavgpkts,vqmgaplostrateavg,vqmgaprvalue,vqmjbmaxdelay,vqmjbmindelay,vqmjbnomdelay,vqmjbtype,vqmjbresetcount,vqmlatepeakjitterms,vqmlatepkts,vqmlatethresholdms,vqmlatethresholdpc,vqmlatetotalcount,vqmlateunderthreshold,vqmlocaldelayaveragems,vqmlocaldelaymaximumms,vqmlocaldelayminimumms,vqmloss,vqmlossrateavg,vqmmaxjbenvnegdelta,vqmmaxjbenvposdelta,vqmmeanpdvabsmaxavg,vqmmeanpdvavg,vqmmeanpdvmaxavg,vqmmeanpdvtrue,vqmminjbenvnegdelta,vqmminjbenvposdelta,vqmmoscq,vqmmoscqfixed,vqmmoslq,vqmmoslqfixed,vqmmosnominal,vqmmosnominalfixed,vqmmospq,vqmmospqfixed,vqmnetworklossrateavg,vqmonewaydelayaverage,vqmonewaydelayinstant,vqmonewaydelaymaximum,vqmoriginationdelayaverage,vqmoriginationdelayinstant,vqmoriginationdelaymaximum,vqmoutoforder,vqmoverrundiscardpkts,vqmppdvms,vqmpdvaveragems,vqmpdvmaximumms,vqmpktsrcvd,vqmrvaluecq,vqmrvalueg107,vqmrvaluelq,vqmrvaluenominal,vqmreliabilityindex,vqmresynccount,vqmrtdelayaverage,vqmrtdelayinstant,vqmrtdelaymaximum,vqmrtpdesturi,vqmrtpdestinationip,vqmrtpdestinationport,vqmrtpssrc,vqmrtpsourceip,vqmrtpsourceport,vqmrtpsourceuri,vqmsourceintname,vqmsrcifc,vqmstreamquality,vqmterminatedelayaverage,vqmterminatedelayinstant,vqmterminatedelaymaximum,vqmthroughputindex,vqmto,vqmunderrundiscardpkts,vqmvocoderclass,vqmvocodertype,created,modified
99152,CFG0730084,-3,-2,356,64000,1,280,14,14.29,36,3499,201000,BW163736844290611-173170743@10.253.143.13,2011-06-29 12:37:37.292,0,4.68,1.43,0.19,0,0,0,0,52,60,15,17,60,10,0,Loopback,0.48,48,46,0,30,1334,10,99.55,10008,9962,0,0,,,,,6096147095,2,100590,5029,0.48,87,200,10,50,1,625,487.5,8767,50,99.58,93,50,,518,,2,0.5,-60,975,488,179,192,999.3,0,0,4.07,,4.12,,4.2,,4.03,,0.02,63,76,76,,,,43,0,6.8,0,520,10054,87,87,89,93,9,79,12,12,12,6096147089,10.255.43.12,10010,706222942,10.253.136.231,25110,6096147095,eth 0/1,2,0,54,80,80,18500,6096147089,48,1,0,2011-06-29 12:41:47.303,2011-06-29 12:41:47.303
99154,CFG0730084,-3,-1,251,64000,4,195,9,20.52,28,3494,359000,BW163502270290611594566299@10.253.143.13,2011-06-29 12:35:02.324,0,2.88,1.11,3.44,0,0,0,0,40,40,26,24,50,10,0,Loopback,0.31,54,46,0,31,2455,10,99.8,17769,17732,0,0,,,,,6096147095,5,71556,3577,0.62,87,200,10,50,1,1120,496.5,15437,50,99.73,123,74,,529,,65,0.67,-62,993,496.5,126,139,3404.7,0,0,4.04,,4.07,,4.2,,3.94,,0.36,58,64,69,,,,49,0,286,0,529,17839,86,86,87,93,9,137,8,8,8,6096147089,10.255.43.12,10000,536353626,10.253.136.231,25096,6096147095,eth 0/1,2,0,48,60,70,30400,6096147089,54,1,0,2011-06-29 12:41:47.342,2011-06-29 12:41:47.342
VeryLargeCSVFile.csv
IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Location,X,Y
030203898,FRAUD,"FORGERY, CREDIT CARD",Tuesday,02/18/2003,16:30,NORTHERN,NONE,2800 Block of VAN NESS AV,-122.424612993055,37.8014488257836
000038261,WARRANTS,WARRANT ARREST,Thursday,04/17/2003,22:45,NORTHERN,"ARREST, BOOKED",POLK ST / SUTTER ST,-122.420120319211,37.7877570602182
030203901,LARCENY/THEFT,GRAND THEFT PICKPOCKET,Tuesday,02/18/2003,16:05,NORTHERN,NONE,VAN NESS AV / MCALLISTER ST,-122.42025048261,37.7800745746105
030203923,DRUG/NARCOTIC,SALE OF BASE/ROCK COCAINE,Tuesday,02/18/2003,17:00,BAYVIEW,"ARREST, BOOKED",1600 Block of KIRKWOOD AV,-122.390718076188,37.7385560584619
030203923,OTHER OFFENSES,CONSPIRACY,Tuesday,02/18/2003,17:00,BAYVIEW,"ARREST, BOOKED",1600 Block of KIRKWOOD AV,-122.390718076188,37.7385560584619
030203923,OTHER OFFENSES,PROBATION VIOLATION,Tuesday,02/18/2003,17:00,BAYVIEW,"ARREST, BOOKED",1600 Block of KIRKWOOD AV,-122.390718076188,37.7385560584619
UselessLongHeaderToBeIgnored.log
************ Start Display Current Environment ************
WebSphere Platform 6.1 [ND 6.1.0.21 cf210844.13] running with process name sammys_cell_A\fsgwws189Node_A\sammys_A_c01_s189_m06 and process id 17904
Detailed IFix information:
ID: 6.1.0-WS-WASSDK-AixPPC32-FP0000021 BuildVrsn: null Desc: Software Developer Kit 6.1.0.21
ID: 6.1.0-WS-WAS-AixPPC32-FP0000021 BuildVrsn: null Desc: WebSphere Application Server 6.1.0.21
ID: 6.1.0-WS-WASSDK-AixPPC32-FP0000019 BuildVrsn: null Desc: Software Developer Kit 6.1.0.19
ID: 6.1.0-WS-WAS-AixPPC32-FP0000019 BuildVrsn: null Desc: WebSphere Application Server 6.1.0.19
ID: sdk.FP61021 BuildVrsn: null Desc: WebSphere Application Server 6.1.0.21
ID: sdk.FP61019 BuildVrsn: null Desc: WebSphere Application Server 6.1.0.19
ID: was.embed.common.FP61021 BuildVrsn: null Desc: WebSphere Application Server 6.1.0.21
ID: was.embed.FP61021 BuildVrsn: null Desc: WebSphere Application Server 6.1.0.21
HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim.csv
"Field 1" "Field 3" "Field 4" "Field 6" Value11,Value12,Value13,Value14,Value15,Value16 Value21,Value22,Value23,Value24,Value25 Value31,Value32,Value33,Value34,Value35, Value36
FieldHeaderRegex.log
Garbage Garbage Garbage
Ignore_This_Stuff: Actual_Header1 Actual_Header2
Answers
Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around extracting fields.