Splunk® Enterprise

Getting Data In

Reduce lookup overhead with ingest-time lookups

If you have certain lookups that you routinely apply to all of your incoming events, consider processing them at ingest time with ingest-time lookups. You can do this by configuring an ingest-time eval that uses the lookup() eval function to add values from lookup tables to events before they are added to your indexes.

Ingest-time lookup prerequisites

This section covers things you should know before you attempt to configure an ingest-time lookup.

The lookup() eval function

Ingest-time lookups rely upon the lookup() function for eval.

The lookup() function performs a CSV lookup much as the lookup command does. Unlike the command, it uses JSON functions for eval, such as json_object, and returns its output fields and values as a JSON object.

For more about the lookup() function and the JSON functions, see the Search Reference.

Filesystem access

You must have Splunk filesystem access to set up an ingest-time eval expression. Ingest-time eval expressions cannot be configured in the context of an app or user.

Never change or copy the configuration files in the default directory. The files in the default directory must remain intact and in their original location. Make changes to the files in the local directory.

Ingest-time eval expressions

Ingest-time lookups are a type of ingest-time eval expression. You use ingest-time eval expressions to create new fields and perform a wide range of operations on incoming data, including mathematical, statistical, and cryptographic functions. For an overview, see Process events with ingest-time eval.

Configure ingest-time eval expressions in transforms.conf with the INGEST_EVAL setting at $SPLUNK_HOME/etc/system/local.
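
As a minimal sketch of where such an expression lives, the following fragment assumes a hypothetical source type named my_sourcetype and a hypothetical transform stanza named add_event_length. The stanza names and the event_length field are illustrative only. The props.conf TRANSFORMS-<class> setting is what applies the transform to incoming events.

```ini
# $SPLUNK_HOME/etc/system/local/transforms.conf
# Hypothetical stanza: store the length of the raw event in a new field.
[add_event_length]
INGEST_EVAL = event_length=len(_raw)

# $SPLUNK_HOME/etc/system/local/props.conf
# Hypothetical source type: apply the transform as events are ingested.
[my_sourcetype]
TRANSFORMS-add_event_length = add_event_length
```

An ingest-time lookup is wired up the same way. Only the expression on the right side of INGEST_EVAL changes.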

CSV lookup files and definitions

Ingest-time lookups are CSV file lookups and as such use CSV files as their lookup tables. Ingest-time lookups expect the CSV lookup file to be stored in $SPLUNK_HOME/etc/system/lookups. If you have a single instance of Splunk you can manually load the file to this location. If you have a distributed search environment, you can use the deployer to update the files on your peers.

You can optionally specify a CSV lookup definition instead of a CSV lookup file. CSV lookup definitions include references to CSV lookup files. CSV lookup definitions can also include filters, field and value matching rules, and other settings. If you specify a CSV lookup definition, it must be configured as a transforms.conf stanza at $SPLUNK_HOME/etc/system/local. See Configure CSV lookups in the Knowledge Manager Manual.
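
A lookup definition of this kind might look like the following sketch. The stanza name http_status_lookup is hypothetical, and the matching rule is included only to illustrate the kind of setting a definition can carry.

```ini
# $SPLUNK_HOME/etc/system/local/transforms.conf
# Hypothetical CSV lookup definition. It points at a CSV file stored in
# $SPLUNK_HOME/etc/system/lookups.
[http_status_lookup]
filename = http_status.csv
# Optional matching rule: compare field values without regard to case.
case_sensitive_match = false
```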

Ingest-time lookup syntax and usage

Ingest-time lookups have a syntax that is similar to that of the lookup command, and to the syntax of configurations for automatic search-time lookups. See Make your lookup automatic in the Knowledge Manager Manual.

Ingest-time lookups run the lookup() function through an INGEST_EVAL expression. The syntax looks like this:

[lookup1]
INGEST_EVAL= <string>=lookup("<lookup_table>", json_object("<input_field>",<match_field>,...), json_array("<output_field>",...)) 

If the first quoted string supplied for <lookup_table> does not end with the ".csv" file extension, the Splunk software assumes that it is the name of a CSV lookup definition.

Specify a lookup definition if you want the settings associated with that definition to apply to the ingest-time lookup. These can include filters, field and value matching rules, and more.
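
To illustrate the difference, the following sketch shows the same lookup written both ways. It assumes that a CSV file named http_status.csv exists in the lookups directory and that a hypothetical lookup definition named http_status_lookup references it. The stanza names and the status_json field are also illustrative.

```ini
# $SPLUNK_HOME/etc/system/local/transforms.conf

# The first argument ends in .csv, so it is treated as a CSV lookup file.
[lookup-by-file]
INGEST_EVAL = status_json=lookup("http_status.csv", json_object("status", status), json_array("status_description"))

# The first argument has no .csv extension, so it is treated as the name
# of a CSV lookup definition (hypothetical name: http_status_lookup).
[lookup-by-definition]
INGEST_EVAL = status_json=lookup("http_status_lookup", json_object("status", status), json_array("status_description"))
```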

A lookup() function can use multiple <input_field>/<match_field> pairs to identify events, and multiple <output_field> values can be applied to those events. Here is an example of valid lookup() syntax with multiple inputs, matches, and outputs.

[lookup1]
INGEST_EVAL= <string>=lookup("<lookup_table>", json_object("<input_field1>", <match_field1>, "<input_field2>", <match_field2>), json_array("<output_field1>", "<output_field2>", "<output_field3>"))
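
As a concrete sketch of the multiple-field form, suppose a hypothetical lookup file named asset_info.csv has the columns ip, port, owner, location, and criticality, and that incoming events carry src_ip and dest_port fields. None of these names come from Splunk itself. They are assumptions made for illustration.

```ini
# $SPLUNK_HOME/etc/system/local/transforms.conf
# Hypothetical example: match on two lookup columns, return three.
# "ip" and "port" are columns in asset_info.csv (quoted).
# src_ip and dest_port are event fields (unquoted).
[asset-lookup]
INGEST_EVAL = asset_json=lookup("asset_info.csv", json_object("ip", src_ip, "port", dest_port), json_array("owner", "location", "criticality"))
```

Because lookup() returns a JSON object, asset_json would hold the three output fields together. Wrap the call in json_extract if you want a single value in its own field.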

You can set up INGEST_EVAL expressions that nest a lookup() function inside another eval function. This example uses a json_extract function to pull a field value from the JSON object produced by the lookup() function:

[lookup-extract]
INGEST_EVAL= status_detail=json_extract(lookup("http_status.csv", json_object("status", status), json_array("status_description")), "status_description") 

This results in the ingest-time addition of field-value pairs like status_detail=Created and status_detail=Not Found to your events, depending on the value of the status field in those events.

Limits.conf settings

Two limits.conf settings give you deployment-wide control over the usage of ingest-time lookups.

ingest_max_memtable_bytes

The ingest_max_memtable_bytes setting puts an upper limit on the size of CSV lookup tables that are used for ingest-time lookup processing. This keeps ingest-time lookup tables from taking up too much space in memory. By default, lookup tables larger than 10 MB cannot be used by the lookup() eval function when it is used with INGEST_EVAL. When a lookup table used by an ingest-time lookup exceeds the ingest_max_memtable_bytes setting, any lookups that rely on it fail. Error messages reporting the failure appear in splunkd.log when Splunk is restarted.

The ingest_max_memtable_bytes setting is intentionally separate from max_memtable_bytes, a similar setting for search-time lookups. It is set to a lower value so that ingest-time lookups have less impact on the memory and CPU usage of the indexing pipeline.
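
If a larger table is unavoidable, raising the limit might look like the following sketch. It assumes that the setting belongs in the [lookup] stanza of limits.conf. Confirm the stanza and the default against the limits.conf specification for your version before you rely on it.

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf
# Assumption: ingest-time lookup limits are set in the [lookup] stanza.
[lookup]
# Raise the ceiling from the 10 MB default to 20 MB (20 * 1024 * 1024 bytes).
ingest_max_memtable_bytes = 20971520
```

A higher value lets larger tables load at the cost of more memory in the indexing pipeline.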

ingest_lookup_refresh_period_secs

If you have ingest-time lookups that are based on CSV lookup tables that change frequently, you can adjust the ingest_lookup_refresh_period_secs setting to ensure that these changes are captured. By default, the in-memory lookup tables that are used with the lookup() function at ingest time are refreshed every 60 seconds.
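
For example, a shorter refresh period could be sketched as follows. As with the size limit, the [lookup] stanza placement is an assumption to verify against the limits.conf specification, and the value of 30 is arbitrary.

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf
# Assumption: ingest-time lookup limits are set in the [lookup] stanza.
[lookup]
# Refresh in-memory ingest-time lookup tables every 30 seconds
# instead of the default 60.
ingest_lookup_refresh_period_secs = 30
```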

Last modified on 31 October, 2020

This documentation applies to the following versions of Splunk® Enterprise: 8.1.0

