Reduce lookup overhead with ingest-time lookups
If you routinely apply certain lookups to all of your incoming events, consider processing them at ingest time with ingest-time lookups. You can do this by configuring an ingest-time eval that uses the lookup() eval function to add values from lookup tables to events before they are added to your indexes.
Ingest-time lookup prerequisites
This section covers things you should know before you attempt to configure an ingest-time lookup.
The lookup() eval function
Ingest-time lookups rely upon the lookup() function for eval. The lookup() function performs a CSV lookup much as the lookup command does, except that it uses JSON functions for eval, such as json_object, to apply output fields and values in the form of JSON objects. The output of a lookup() function is a JSON object.
For more about the lookup() function and the JSON functions, see the Search Reference.
You must have Splunk filesystem access to set up an ingest-time eval expression. Ingest-time eval expressions cannot be configured in the context of an app or user.
- Review the steps in How to edit a configuration file in the Splunk Enterprise Admin Manual.
- You can have configuration files with the same name in your default, local, and app directories. Read Where you can place (or find) your modified configuration files in the Splunk Enterprise Admin Manual.
- If your Splunk deployment uses distributed search, use the configuration bundle deployer to push CSV lookup files and configurations to cluster members and peers. See Use the deployer to distribute apps and configuration changes in Distributed Search.
Never change or copy the configuration files in the default directory. The files in the default directory must remain intact and in their original location. Make changes to the files in the local directory.
Ingest-time eval expressions
Ingest-time lookups are a type of ingest-time eval expression. You use ingest-time eval expressions to create new fields and perform a wide range of operations on incoming data, including mathematical, statistical, and cryptographic functions. For an overview, see Process events with ingest-time eval.
Configure ingest-time eval expressions in
transforms.conf with the
INGEST_EVAL setting at
CSV lookup files and definitions
Ingest-time lookups are CSV file lookups and as such use CSV files as their lookup tables. Ingest-time lookups expect the CSV lookup file to be stored in $SPLUNK_HOME/etc/system/lookups. If you have a single instance of Splunk, you can manually load the file to this location. If you have a distributed search environment, you can use the deployer to update the files on your peers.
You can optionally specify a CSV lookup definition instead of a CSV lookup file. CSV lookup definitions include references to CSV lookup files. CSV lookup definitions can also include filters, field and value matching rules, and other settings. If you specify a CSV lookup definition, it must be configured as a transforms.conf stanza at $SPLUNK_HOME/etc/system/local. See Configure CSV lookups in the Knowledge Manager Manual.
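As a rough illustration, a CSV lookup definition might be sketched as the following transforms.conf stanza. The stanza name http_status_lookup and the file name http_status.csv are hypothetical. The filename and case_sensitive_match settings are standard lookup definition settings, but check the transforms.conf specification for your version before relying on them.

```ini
# $SPLUNK_HOME/etc/system/local/transforms.conf
# Hypothetical lookup definition that wraps a CSV lookup file.
[http_status_lookup]
filename = http_status.csv
# Optional matching rule: treat input values case-insensitively.
case_sensitive_match = false
```

An ingest-time lookup would then refer to this definition by its stanza name, http_status_lookup, rather than by the CSV file name.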
Ingest-time lookup syntax and usage
Ingest-time lookups have a syntax that is similar to that of the lookup command, and to the syntax of configurations for automatic search-time lookups. See Make your lookup automatic in the Knowledge Manager Manual.
Ingest-time lookups run the lookup() function through an INGEST_EVAL expression. The syntax looks like this:
[lookup1]
INGEST_EVAL= <string>=lookup("<lookup_table>", json_object("<input_field>",<match_field>,...), json_array("<output_field>",...))
If the first quoted string, which is supplied for <lookup_table>, lacks a ".csv" file extension, the Splunk software assumes it is the name of a CSV lookup definition.
Specify a lookup definition if you want the various settings associated with the definition to apply to the ingest-time lookup. These can include filters, field and value matching rules, and more.
The lookup() function can use multiple <input_field>/<match_field> pairs to identify events, and multiple <output_field> values can be applied to those events. Here is an example of valid lookup() syntax with multiple inputs, matches, and outputs.
[lookup1]
INGEST_EVAL= <string>=lookup("<lookup_table>", json_object("<input_field1>", <match_field1>, "<input_field2>", <match_field2>), json_array("<output_field1>", "<output_field2>", "<output_field3>"))
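To make the placeholders concrete, here is a minimal sketch of a multi-input, multi-output ingest-time lookup. The stanza name, the lookup file asset_inventory.csv, its columns (host, sourcetype, owner, priority, location), and the target field asset_info are all hypothetical and chosen only for illustration.

```ini
# transforms.conf
# Hypothetical example: match on two event fields, return three columns.
[lookup-assets]
# "host" and "sourcetype" (quoted) are column names in the CSV file;
# host and sourcetype (unquoted) are the event fields they are matched against.
INGEST_EVAL = asset_info=lookup("asset_inventory.csv", json_object("host", host, "sourcetype", sourcetype), json_array("owner", "priority", "location"))
```

Because the output of lookup() is a JSON object, asset_info in this sketch would hold a JSON object containing the owner, priority, and location values rather than three separate fields.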
You can set up INGEST_EVAL expressions that nest a lookup() function inside another eval function. This example uses a json_extract function to pull a field value from the JSON object produced by the lookup() function.
[lookup-extract]
INGEST_EVAL= status_detail=json_extract(lookup("http_status.csv", json_object("status", status), json_array("status_description")), "status_description")
This results in the ingest-time addition of field-value pairs like status_detail=Not Found to your events, depending on the value of the status field in those events.
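A transforms.conf stanza such as [lookup-extract] generally takes effect only when a props.conf stanza references it through a TRANSFORMS-<class> setting. The following is a minimal sketch, assuming the events arrive with the sourcetype access_combined. Both that sourcetype and the class name status_lookup are assumptions made for illustration.

```ini
# props.conf
# Hypothetical binding: apply the [lookup-extract] transform at ingest time
# to events whose sourcetype is access_combined.
[access_combined]
TRANSFORMS-status_lookup = lookup-extract
```

Substitute the sourcetype, source, or host stanza that matches the events you want to enrich.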
limits.conf settings give you deployment-wide control over the usage of ingest-time lookups.
The ingest_max_memtable_bytes setting puts an upper bound on the size of CSV lookup tables that are used for ingest-time lookup processing. This ensures that ingest-time lookup files never take up too much space in memory. By default, lookup tables larger than 10 MB cannot be used by the lookup() eval function when it is used with INGEST_EVAL. When the size of a lookup table used by an ingest-time lookup exceeds the ingest_max_memtable_bytes setting, any lookups that rely on it fail. Error messages indicating the failure appear in splunkd.log when Splunk is restarted.
The ingest_max_memtable_bytes setting is intentionally separate from max_memtable_bytes, a similar setting for search-time lookups. It is set to a lower value to limit the memory and CPU impact on the indexing pipeline.
If you have ingest-time lookups that are based on CSV lookup tables that change frequently, you can adjust the ingest_lookup_refresh_period_secs setting to ensure that these changes are captured. By default, this setting ensures that the in-memory lookup tables that are used with the lookup() function at ingest time are refreshed every 60 seconds.
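The two settings described above might be adjusted together as in the following sketch. The values are illustrative only. Placing the settings under the [lookup] stanza is an assumption based on where max_memtable_bytes lives, so confirm the stanza against the limits.conf specification for your version.

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf
# Assumed stanza: [lookup]
[lookup]
# Raise the ingest-time lookup table ceiling from 10 MB to 20 MB (value in bytes).
ingest_max_memtable_bytes = 20971520
# Refresh in-memory ingest-time lookup tables every 30 seconds instead of 60.
ingest_lookup_refresh_period_secs = 30
```

A larger table ceiling or a shorter refresh period increases the memory and CPU load on the indexing pipeline, so raise these values cautiously.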
This documentation applies to the following versions of Splunk® Enterprise: 8.1.0, 8.1.1