Configure extractions of multivalue fields with fields.conf
Multivalue fields are fields that can appear multiple times in an event and have a different value for each appearance. A common example is the email address field, which typically appears two to three times in a single sendmail event: once for the sender, another time for the list of recipients, and possibly a third time for the list of Cc addresses, if one exists. If all of these fields are labeled identically (as AddressList, for example), they lose the meaning they would otherwise have if each were identified separately.
Multivalue fields are parsed at search time, which enables you to process the resulting values in the search pipeline. Search commands that work with multivalue fields include makemv, mvcombine, mvexpand, and nomv. For more information on these and other commands, see Manipulate and evaluate fields with multiple values in the Search Manual. The complete command reference is in the Search Reference manual.
Use the TOKENIZER setting to define a multivalue field in fields.conf
You can use the TOKENIZER setting to define a multivalue field in fields.conf. At search time, TOKENIZER uses a regular expression to tell the Splunk platform how to recognize and extract multiple field values for a recurring field in an event.
The TOKENIZER setting is used by the stats commands. It also provides the summary and XML outputs of the asynchronous search API.
Tokenization of indexed fields (fields extracted at index time) is not supported. If you have set INDEXED=true for a field, you cannot also use the TOKENIZER setting for that field. You can use a transform extraction defined in transforms.conf to break an indexed field into multiple values.
- Review the TOKENIZER multivalue field configuration syntax.
- See fields.conf in the Admin Manual.
- For an overview of configuration file usage in the Splunk platform, see About configuration files in the Admin Manual.
- For a primer on regular expression syntax and usage, see About Splunk regular expressions. You can test regular expressions by using them in searches with the rex search command.
- Open the fields.conf file that you want to edit.
If you have Splunk Enterprise, you edit fields.conf in $SPLUNK_HOME/etc/system/local/, or in your own custom app directory.
- Add a stanza for the multivalue field. The stanza name should be the name of the field.
- Add a line in the stanza that pairs the TOKENIZER setting with a regular expression designed to capture multiple values for the field.
- (Optional) If you have other attributes to set for the multivalue field, set them in the same stanza underneath the TOKENIZER line.
- Save your changes to the file.
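Before saving, it can help to confirm that each TOKENIZER value compiles and contains at least one capture group. The following is a minimal sketch of such a check, not a Splunk tool: the function name check_tokenizers and the sample stanza text are hypothetical, and Python's re module is only an approximation of the regular expression engine that the Splunk platform uses, so a pattern that passes here can still behave differently at search time.

```python
import configparser
import re

# Hypothetical stanza text, written the way it might appear in fields.conf.
STANZA_TEXT = r"""
[To]
TOKENIZER = (\w[\w\.\-]*@[\w\.\-]*\w)
"""


def check_tokenizers(conf_text):
    """Return {field_name: capture_group_count} for each stanza that sets TOKENIZER."""
    # interpolation=None keeps any '%' in a regex from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep setting names case-sensitive
    parser.read_string(conf_text)

    report = {}
    for field in parser.sections():
        pattern = parser[field].get("TOKENIZER")
        if not pattern:
            continue  # no tokenizer set for this stanza
        compiled = re.compile(pattern)  # raises re.error on a malformed pattern
        report[field] = compiled.groups
    return report


if __name__ == "__main__":
    for field, groups in check_tokenizers(STANZA_TEXT).items():
        status = "ok" if groups >= 1 else "no capture group"
        print(f"{field}: {status}")
```

A stanza whose pattern reports zero capture groups is worth revisiting, because the field values are formed from the first group of each match.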
TOKENIZER multivalue field configuration syntax
[<field name 1>]
TOKENIZER = <regular expression>

[<field name 2>]
TOKENIZER = <regular expression>
<regular expression> should be designed to capture multiple values for a field. For example, if a field name is followed by a list of email addresses, the regular expression should be able to extract each individual address as a separate value of the field without capturing delimiters like commas and spaces.
TOKENIZER defaults to empty. When TOKENIZER is empty, the field can only take on a single value.
When TOKENIZER is not empty, the first group is taken from each match to form the set of field values.
TOKENIZER separates the multiple values of a field with delimiter characters.
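The empty and non-empty cases described above can be approximated outside the Splunk platform. This is a rough sketch under stated assumptions: the function name tokenize and the sample values are hypothetical, and Python's re module stands in for the platform's regular expression engine.

```python
import re


def tokenize(raw_value, tokenizer=""):
    """Approximate the documented TOKENIZER behavior for one raw field value.

    An empty tokenizer yields a single value. Otherwise the first capture
    group of each match becomes one value. A pattern with no capture group
    raises IndexError in this sketch.
    """
    if not tokenizer:
        return [raw_value]
    return [match.group(1) for match in re.finditer(tokenizer, raw_value)]


if __name__ == "__main__":
    # Hypothetical raw values, for illustration only.
    print(tokenize("red;green;blue"))                # no tokenizer: one value
    print(tokenize("red;green;blue", r"([^;]+)"))    # one value per match
    print(tokenize("a=1 b=2", r"(\w+)=(\w+)"))       # only the first group is kept
```

The last call shows why the position of the parentheses matters: anything matched outside the first group, including a second group, does not become part of a field value.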
You start with a poorly formatted email log file where all of the addresses involved are grouped together under AddressList. Here is a sample from that log file.
From: firstname.lastname@example.org
To: email@example.com, firstname.lastname@example.org, email@example.com
CC: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org
Subject: Multivalue fields are out there!
X-Mailer: Febooti Automation Workshop (Unregistered)
Content-Type: text/plain; charset=UTF-8
Date: Wed, 3 Nov 2017 17:13:54 +0200
X-Priority: 3 (normal)
This example from $SPLUNK_HOME/etc/system/README/fields.conf.example breaks email fields To, From, and CC into multiple values.
[To]
TOKENIZER = (\w[\w\.\-]*@[\w\.\-]*\w)

[From]
TOKENIZER = (\w[\w\.\-]*@[\w\.\-]*\w)

[Cc]
TOKENIZER = (\w[\w\.\-]*@[\w\.\-]*\w)
Because the TOKENIZER process adds a \n delimiter between each value it extracts for a field, the multiple values for To in the sample event for this example display on separate lines.
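One rough way to preview that result outside the Splunk platform is to apply the same regular expression to the sample event and join the captured values with \n. The sketch below is an illustration only: the function name multivalue_display is hypothetical, the header lookup is case-insensitive by assumption, and Python's re module is used in place of the platform's own regular expression engine.

```python
import re

# First four header lines of the sample event shown earlier.
SAMPLE_EVENT = """\
From: firstname.lastname@example.org
To: email@example.com, firstname.lastname@example.org, email@example.com
CC: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org
Subject: Multivalue fields are out there!
"""

# Same pattern as the TOKENIZER lines in the example stanzas.
TOKENIZER = r"(\w[\w\.\-]*@[\w\.\-]*\w)"


def multivalue_display(event, header, tokenizer=TOKENIZER):
    """Return the values of one header joined by "\\n", or "" if the header is absent."""
    for line in event.splitlines():
        name, sep, raw_value = line.partition(":")
        # Assumption: header names are compared without regard to case.
        if sep and name.strip().lower() == header.lower():
            values = [match.group(1) for match in re.finditer(tokenizer, raw_value)]
            return "\n".join(values)
    return ""


if __name__ == "__main__":
    print(multivalue_display(SAMPLE_EVENT, "To"))
```

Calling the function with "From" or "CC" instead of "To" previews the other two fields in the same way.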
This documentation applies to the following versions of Splunk® Enterprise: 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.1.0, 8.1.1