Splunk® Enterprise

Knowledge Manager Manual

Download manual as PDF

This documentation does not apply to the most recent version of Splunk. Click here for the latest version.
Download topic as PDF

Configure CSV and external lookups

Lookups add fields from an external source to your events based on the values of fields currently present in those events. This topic discusses two kinds of lookups: CSV and external.

Lookup type Description
CSV lookup Populates your events with fields pulled from CSV files. Also referred to as a "static lookup" because CSV files represent static tables of data. Each column in a CSV table is interpreted as the potential values of a field.
External lookup Uses Python scripts or binary executables to populate your events with field values from an external source.

KV store lookups are a third kind of lookup that you can define using KV store collections. See "Configure KV store lookups" in this manual.

This topic shows you how to set up and manage CSV and external lookups by configuring lookup stanzas in props.conf and transforms.conf. Configuration files give you a greater degree of control over lookup design and behavior than you get when you set up lookup files using Splunk Web.

If you do not have access to the .conf files, or if you prefer to maintain lookups through Splunk Web whenever possible, you can configure all three lookup types using the pages at Settings > Lookups. See "Use lookups to add information to your events" in this manual.

Step 1 - Add a lookup stanza to transforms.conf

The lookup stanza identifies the lookup type (CSV or external) and the location of the lookup table. It can optionally include field matching rules and rules for time-bounded lookups. The details of how you configure the lookup in transforms.conf vary according to the lookup type.

If you want a lookup to be available globally, add its lookup stanza to the version of transforms.conf in $SPLUNK_HOME/etc/system/local/. If you want the lookup to be specific to a particular app, add its stanza to the version of transforms.conf in $SPLUNK_HOME/etc/apps/<app_name>/local/.

Caution: Do not edit configuration files in $SPLUNK_HOME/etc/system/default.

Create a CSV lookup stanza

CSV lookups match field values from your events to field values in the static table represented by a CSV file. Then they output corresponding field values from that table to your events.

1. Add the CSV file for the lookup to your Splunk Enterprise implementation.

The CSV file must be located in one of two places:
$SPLUNK_HOME/etc/system/lookups
$SPLUNK_HOME/etc/apps/<app_name>/lookups
Create the lookups directory if it does not exist.
The table represented by the CSV file should have at least two columns. One of those columns should represent a field with a set of values that includes those belonging to a field in your events. The column does not have to have the same name as the event field. Any column can have multiple instances of the same value, as this represents a multivalue field.
The CSV file cannot contain non-utf-8 characters. Plain ascii text is ok, as is any character set that is also valid utf-8.
CSV files with Pre-OS X (OS9 or earlier) Macintosh-style line endings (aka carriage return only, "\r") are not supported. CSV files with column names that exceed 4096 characters are also not supported.

2. Add a CSV lookup stanza to transforms.conf.

The CSV lookup stanza names the lookup table and provides the name of the CSV file that the lookup uses. It uses these required attributes.
  • [<lookup_name>]: The name of the lookup.
  • filename = <string>: The name of the CSV file that the lookup references.
The CSV lookup stanza can an optionally include attributes that:

3. Save your changes to the .conf file.

4. If you want to make this an automatic lookup, move on to Step 2.

Otherwise, go to Step 3 and restart Splunk Enterprise.

Create an external lookup stanza

External lookups invoke a script that matches fields in your events with fields in an external source and outputs corresponding fields from that external source and adds them to your events. This kind of lookup is also referred to as a scripted lookup.

1. Add the script for the lookup to your Splunk Enterprise implementation.

The script must be located in one of two places:
  • $SPLUNK_HOME/etc/searchscripts
  • $SPLUNK_HOME/etc/apps/<app_name>/bin
See "More about the external lookup script" for more information about how such scripts work.

2. Add an external lookup stanza to transforms.conf.

The external lookup stanza names the lookup table, provides the script and argument to perform lookups, identifies the script type, and supplies a list of fields that are supported by the script. It uses these required attributes.
  • [<lookup_name>]: The name of the lookup.
  • external_cmd = <string>: The command and arguments that Splunk Enterprise should invoke to perform the lookup. The command is expected to be the name of the script, such as external_lookup.py.
  • external_type = [python|executable|kvstore]: The type of script being used for the lookup. Can be python, for a Python script, or executable, for a binary executable. The kvstore value is reserved for KV store lookups.
  • fields_list = <string>: is a list of all fields that are supported by the external lookup. The fields must be delimited by a comma followed by a space.

The external lookup stanza can optionally include attributes that:

3. Save your changes to the .conf file.

4. If you want to make this an automatic lookup, move on to Step 2.

Otherwise, go to Step 3 and restart Splunk Enterprise.

Step 2 - Make the lookup automatic

In this step you create a lookup configuration in props.conf that:

  • References the lookup table you created in Step 1
  • Specifies the fields in your events that the lookup should match in the lookup table
  • Specifies the corresponding fields that the lookup should output from the lookup table to your events.

The result is an automatic lookup: a lookup that automatically adds fields to your events from the lookup table.

This step is the same for CSV and external lookup types. It does not apply to KV store lookups.

Note: This step is optional. Skip it if you do not want to create an automatic lookup: a lookup that automatically adds fields to your events from a lookup table whenever you run a search with a related host, source, or source type. If you do not create an automatic lookup you can still invoke the transforms.conf configuration with the lookup, inputlookup, and outputlookup commands.

Automatic lookups can access any data that belongs to you or which is shared. When you access lookups with search commands, only shared data can be accessed.

1. In props.conf, locate the stanza for the host, source, or source type that you want to associate this lookup with.

If the stanza does not exist, create it.
The format of the stanza header is [<spec>]. <spec> can be:
  • <sourcetype>, the source type of an event.
  • host::<host>, where <host></code> is the host, or host-matching pattern, for an event.
  • source::<source>, where <source> is the source, or source-matching pattern, for an event.
<spec> cannot use regular expression syntax.

2. Add a LOOKUP-<class> configuration to the stanza that you have identified or created.

The LOOKUP-<class> configuration outputs fields to events with the host, source, or source type of the stanza, if those events have fields that match fields in the lookup table. Follow this syntax:
[<spec>]
LOOKUP-<class> = $TRANSFORM <match_field_in_table> OUTPUT|OUTPUTNEW <output_field_in_table>
  • $TRANSFORM: References the transforms.conf stanza that defines the lookup table.
  • match_field_in_table: Each column in a CSV table represents a field and its possible values. This variable is the CSV table column that matches to a field in events with the same host, source, or source as the props.conf stanza.
  • output_field_in_table: The column in the lookup table that you want Splunk Enterprise to add to your events. Use OUTPUTNEW if you do not want to overwrite existing values in your output field.
You can have multiple fields on either side of the lookup. For example, you can have $TRANSFORM <match_field1>, <match_field2> OUTPUT|OUTPUTNEW <match_field3>, <match_field4>. You can also have one matching field return two output fields, three matching fields return one output field, and so on.
If you do not include an OUTPUT|OUTPUTNEW clause, Splunk Enterprise adds all the field names and values from the lookup table to your events.
Use the AS clause if the field names in the lookup table and your events do not match or if you want to rename the field in your event:
[<stanza name>]
LOOKUP-<class> = $TRANSFORM <match_field_in_table> AS <match_field_in_event> 
OUTPUT|OUTPUTNEW <output_field_in_table> AS <output_field_in_event>

Note: You can have multiple LOOKUP-<class> configurations in a single props.conf stanza. Each lookup should have its own unique lookup name. For example, if you have multiple lookups, you can name them LOOKUP-table1, LOOKUP-table2, and so on.

Step 3 - Restart Splunk Enterprise

The changes you have made to the configuration files are not implemented until you restart Splunk Enterprise.

If you have set up an automatic lookup, after restart you should see the output fields from your lookup table listed in the fields sidebar. From there, you can select the fields to display in each of the matching search results.

CSV lookup example

This example explains how you can set up a lookup for HTTP status codes in an access_combined log. In this example, you design a lookup that matches the status field in your events with the status column in a lookup table named http_status.csv. Then you have the lookup output the corresponding status_description and status_type fields to your events.

The following is the text for the http_status.csv file.

status,status_description,status_type
100,Continue,Informational
101,Switching Protocols,Informational
200,OK,Successful
201,Created,Successful
202,Accepted,Successful
203,Non-Authoritative Information,Successful
204,No Content,Successful
205,Reset Content,Successful
206,Partial Content,Successful
300,Multiple Choices,Redirection
301,Moved Permanently,Redirection
302,Found,Redirection
303,See Other,Redirection
304,Not Modified,Redirection
305,Use Proxy,Redirection
307,Temporary Redirect,Redirection
400,Bad Request,Client Error
401,Unauthorized,Client Error
402,Payment Required,Client Error
403,Forbidden,Client Error
404,Not Found,Client Error
405,Method Not Allowed,Client Error
406,Not Acceptable,Client Error
407,Proxy Authentication Required,Client Error
408,Request Timeout,Client Error
409,Conflict,Client Error
410,Gone,Client Error
411,Length Required,Client Error
412,Precondition Failed,Client Error
413,Request Entity Too Large,Client Error
414,Request-URI Too Long,Client Error
415,Unsupported Media Type,Client Error
416,Requested Range Not Satisfiable,Client Error
417,Expectation Failed,Client Error
500,Internal Server Error,Server Error
501,Not Implemented,Server Error
502,Bad Gateway,Server Error
503,Service Unavailable,Server Error
504,Gateway Timeout,Server Error
505,HTTP Version Not Supported,Server Error

1. Put the http_status.csv file in $SPLUNK_HOME/etc/apps/search/lookups/. This indicates that the lookup is specific to the Search App.

2. In the transforms.conf file located in $SPLUNK_HOME/etc/apps/search/local, put:

[http_status]
filename = http_status.csv

3. In the props.conf file located in $SPLUNK_HOME/etc/apps/search/local/, put:

[access_combined]
LOOKUP-http = http_status status OUTPUT status_description, status_type

4. Restart Splunk Enterprise.

Now, when you run a search that returns Web access information for the access_combined source type, you will see the fields status_description and status_type listed in your fields sidebar menu. All events with an http_status value get corresponding status_description and status_type field/value pairs from the lookup.

External lookup example

Here's an example of an external lookup that is delivered with Splunk Enterprise. It matches with information from a DNS server. It does not have a props.conf component, so it is not an automatic lookup. You access it by running a search with the lookup command.

Splunk Enterprise ships with a script located in $SPLUNK_HOME/etc/system/bin/ called external_lookup.py, which is a DNS lookup script that:

  • if given a host, returns the IP address.
  • if given an IP address, returns the host name.

Splunk also ships with a configuration for this script in $SPLUNK_HOME/etc/system/default/transforms.conf.

[dnslookup]
external_cmd = external_lookup.py clienthost clientip 
fields_list = clienthost,clientip

You can run a search with the lookup command that uses the the [dnslookup] stanza from the default transforms.conf.

sourcetype=access_combined | lookup dnslookup clienthost AS host | stats count by clientip

This search:

  • Matches the clienthost field in the external lookup table with the host field in your events</code>
  • Returns a table that provides a count for each of the clientip values that corresponds with the clienthost matches.

This search does not add fields to your events.

You can design a search that performs a reverse lookup, which returns a host value for each IP address it receives.

sourcetype=access_combined | lookup dnslookup clientip | stats count by clienthost

Note that this reverse lookup search does not include an AS clause. This is because Splunk automatically extracts IP addresses as clientip.

More about the external lookup script

Your external lookup script must take in a partially empty CSV file and output a filled-in CSV file. The arguments that you pass to the script are the headers for these input and output files.

In the DNS lookup example above, the CSV file contains two fields: clienthost and clientip. The fields that you pass to this script are the ones you specify in transforms.conf using the external_cmd attribute. If you do not pass these arguments, the script returns an error.

external_cmd = external_lookup.py clienthost clientip

When you run this search string:

... | lookup dnsLookup clienthost

You are telling Splunk Enterprise to:

  1. Use the lookup table that you defined in transforms.conf as [dnsLookup]
  2. Pass the values for the clienthost field into the external command script as a CSV file. The CSV file looks like this.
clienthost,clientip
work.com
home.net

This is a CSV file with clienthost and clientip as column headers, but without values for clientip. The script includes the two headers because they are the fields you specified in the fields_list attribute of the [dnslookup] stanza in the default transforms.conf.

The script then outputs the following CSV file and returns it to Splunk Enterprise, which populates the clientip field in your results:

host,ip
work.com,127.0.0.1
home.net,127.0.0.2

Note: When writing your script, if you refer to any external resources (such as a file), the reference must be relative to the directory where the script is located.

Optional field matching rules for lookup configurations

These attributes provide field matching rules for lookups. They can be applied to all three lookup types. Add them to the transforms.conf stanza for your lookup.

Attribute Type Description Default
max_matches Integer The maximum number of possible matches for each value input to the lookup table from your events. Range is 1-1000. If the time_field attribute is is not specified, Splunk Enterprise uses the first <integer> entries, in file order. If the time_field attribute is specified (because it is a time-bounded lookup), Splunk Enterprise uses the first <integer> entries, in descending time order. In other words, up to <max_matches> are allowed to match. When this number is surpassed, Splunk Enterprise uses the matches closest to the lookup value. 1000 if the time_field attribute is not specified. 1 if the time_field attribute is specified.
min_matches Integer The minimum number of possible matches for each value input to the lookup table from your events. You can use default_match to help with situations where Splunk Enterprise finds fewer than min_matches for any given input. 0 for both non-time-bounded lookups and time-bounded lookups, which means Splunk Enterprise outputs nothing to your event if it cannot find a match.
default_match String When min_matches is greater than 0 and and Splunk Enterprise finds less than min_matches for any given input, it provides this default_match value one or more times until the min_matches threshold is reached. Empty string
case_sensitive_match Boolean When set to true, Splunk Enterprise performs case sensitive matching for all fields in the lookup table. When set to false, Splunk Enterprise disregards case when matching lookup table fields. True
match_type String Allows non-exact matching of one or more fields arranged in a list delimited by a comma followed by a space. Format is match_type = <match_type>(<field_name1>, <field_name2>,...<field_nameN>). Set match_type to WILDCARD to apply wildcard matching, or set it to CIDR to apply CIDR matching (specifically for IP address values). EXACT (does not need to be specified)

Set up a time-bounded lookup

If your lookup table has a field that represents time, you can use it to create a time-bounded lookup (also referred to as a temporal lookup). You can configure all three lookup types as time-bounded lookups.

To create a time-bounded lookups, add the following lines to your lookup stanza in transforms.conf:

time_field = <field_name>
time_format = <string>

If the time_field attribute is present, max_matches = 1 by default and Splunk Enterprise applies the first matching entry in descending order.

The time_format attribute specifies the strptime() format of the time_field attribute. The default value for the time_format attribute is %s.%Q, where you enter a Unix epoch time value in seconds (%s) and can optionally include milliseconds (%Q).

Note: Splunk Enterprise lets you use some nonstandard date-time strptime() formats. For example, when you define ISO 8601 timestamps (a Unix epoch time value in seconds), you may use time_format = '%s.%Q', where %s represents seconds and %Q represents milliseconds. See the subtopic "Enhanced strptime() support" in "Configure timestamp recognition," in the Getting Data In Manual.

For a match to occur with time-bounded lookups, you can also specify offsets for the minimum and maximum amounts of time that an event may be later than a lookup entry. To do this, add the following lines to your stanza:

max_offset_secs = <integer>
min_offset_secs = <integer>

By default there is no maximum offset. The default minimum offset is 0.

Time-bounded lookup example

Here's an example of how you might use DHCP logs to identify users on your network based on their IP address and the timestamp. Let's say the DHCP logs are in a file, dhcp.csv, which contains the timestamp, IP address, and the user's name and MAC address.

1. In a transforms.conf file, put:

[dhcpLookup]
filename = dhcp.csv
time_field = timestamp
time_format = %d/%m/%y %H:%M:%S

2. In a props.conf file, put:

[dhcp]
LOOKUP-table = dhcpLookup ip mac OUTPUT user

3. Restart Splunk Enterprise.

Use search results to populate a lookup table

You can edit a local or app-specific copy of savedsearches.conf to use the results of a report to populate a lookup table.

In a report stanza, where the search returns a results table:

1. Add the following line to enable the lookup population action.

action.populate_lookup = 1

This tells Splunk Enterprise to save your results table into a CSV file.

2. Add the following line to tell Splunk Enterprise where to copy your lookup table.

action.populate_lookup.dest = <string>

The action.populate_lookup.dest value is a lookup name from transforms.conf or a path to a CSV file where Splunk Enterprise should copy the search results. If it is a path to a CSV file, the path should be relative to $SPLUNK_HOME.

For example, if you want to save the results to a global lookup table, you might include:

action.populate_lookup.dest = etc/system/lookups/myTable.csv

The destination directory, $SPLUNK_HOME/etc/system/lookups or $SPLUNK_HOME/etc/<app_name>/lookups, should already exist.

3. Add the following line if you want this search to run when Splunk Enterprise starts up.

run_on_startup = true

If it does not run on startup, it will run at the next scheduled time. We recommend that you set run_on_startup = true for scheduled searches that populate lookup tables.

Because Splunk Enterprise copies the results of the report to a CSV file, you can set up this lookup the same way you set up a CSV lookup.

PREVIOUS
Use field lookups to add information to your events
  NEXT
Configure KV Store lookups

This documentation applies to the following versions of Splunk® Enterprise: 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.2.15


Comments

Ohoppe--

Thanks for your feedback. We agree that restarts are not needed when lookups are managed via the Splunk Web interface. However, this topic discusses the creation and update of lookups via direct edits to .conf files. You will need to restart after you make updates to these files.

Mness
March 13, 2015

The documentation is a bit outdated here. A restart is not needed when you add a new lookup file, lookup definition or automatic lookup via the webinterface. Please correct.

Ohoppe
February 26, 2015

Was this documentation topic helpful?

Enter your email address, and someone from the documentation team will respond to you:

Please provide your comments here. Ask a question or make a suggestion.

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters