
Reduce lookup overhead with ingest-time lookups
If you routinely apply certain lookups to all of your incoming events, consider processing them at ingest time with ingest-time lookups. To do this, configure an ingest-time eval expression that uses the lookup() eval function to add values from lookup tables to events before they are added to your indexes.
Ingest-time lookup prerequisites
This section covers things you should know before you attempt to configure an ingest-time lookup.
The lookup() eval function
Ingest-time lookups rely on the lookup() eval function.
The lookup() function performs a CSV lookup in much the same way as the lookup command, except that it uses JSON functions for eval, such as json_object, to specify its input and output fields. The output of a lookup() function is a JSON object.
For more about the lookup() function and the JSON functions for eval, see the Search Reference.
Filesystem access
You must have filesystem access to your Splunk deployment to set up an ingest-time eval expression. You cannot configure ingest-time eval expressions in the context of an app or user.
- Review the steps in How to edit a configuration file in the Splunk Enterprise Admin Manual.
- You can have configuration files with the same name in your default, local, and app directories. Read Where you can place (or find) your modified configuration files in the Splunk Enterprise Admin Manual.
- If your Splunk deployment uses distributed search, use the configuration bundle deployer to push CSV lookup files and configurations to cluster members and peers. See Use the deployer to distribute apps and configuration changes in Distributed Search.
Never change or copy the configuration files in the default directory. The files in the default directory must remain intact and in their original location. Make changes to the files in the local directory.
Ingest-time eval expressions
Ingest-time lookups are a type of ingest-time eval expression. You use ingest-time eval expressions to create new fields and perform a wide range of operations on incoming data, including mathematical, statistical, and cryptographic functions. For an overview, see Process events with ingest-time eval.
Configure ingest-time eval expressions with the INGEST_EVAL setting in transforms.conf at $SPLUNK_HOME/etc/system/local.
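An INGEST_EVAL stanza in transforms.conf runs only when a props.conf stanza references it through a TRANSFORMS- setting. The following is a minimal sketch that wires up an ingest-time lookup against the http_status.csv file used later in this topic. The sourcetype my_web_access, the class name http_status, and the stanza name http-status-lookup are hypothetical; place both files on the instances that parse your incoming data.

```ini
# $SPLUNK_HOME/etc/system/local/props.conf
# Hypothetical sourcetype: replace with the sourcetype of your events.
[my_web_access]
# Apply the transforms.conf stanza below to events of this sourcetype.
TRANSFORMS-http_status = http-status-lookup

# $SPLUNK_HOME/etc/system/local/transforms.conf
# Add a status_info field holding the JSON object returned by lookup().
[http-status-lookup]
INGEST_EVAL = status_info=lookup("http_status.csv", json_object("status", status), json_array("status_description"))
```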
CSV lookup files and definitions
Ingest-time lookups are CSV file lookups, so they use CSV files as their lookup tables. Ingest-time lookups expect the CSV lookup file to be stored in $SPLUNK_HOME/etc/system/lookups. If you have a single Splunk instance, you can manually load the file to this location. If you have a distributed search environment, you can use the deployer to update the files on your peers.
You can optionally specify a CSV lookup definition instead of a CSV lookup file. CSV lookup definitions include references to CSV lookup files. CSV lookup definitions can also include filters, field and value matching rules, and other settings. If you specify a CSV lookup definition, it must be configured as a transforms.conf stanza at $SPLUNK_HOME/etc/system/local. See Configure CSV lookups in the Knowledge Manager Manual.
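As an illustration, a CSV lookup definition is a transforms.conf stanza that points to a lookup file. The following minimal sketch uses a hypothetical stanza name, http_status_lookup, and the http_status.csv file from this topic. The case_sensitive_match setting is shown only as an example of a matching rule that is part of the definition.

```ini
# $SPLUNK_HOME/etc/system/local/transforms.conf
# Hypothetical CSV lookup definition that references a lookup file
# stored in $SPLUNK_HOME/etc/system/lookups.
[http_status_lookup]
filename = http_status.csv
# Example matching rule: match input values without regard to case.
case_sensitive_match = false
```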
Ingest-time lookup syntax and usage
Ingest-time lookups use a syntax similar to that of the lookup command and of the configurations for automatic search-time lookups. See Make your lookup automatic in the Knowledge Manager Manual.
Ingest-time lookups run the lookup() function through an INGEST_EVAL expression. The syntax looks like this:
[lookup1]
INGEST_EVAL = <string>=lookup("<lookup_table>", json_object("<input_field>",<match_field>,...), json_array("<output_field>",...))
If the quoted string supplied for <lookup_table> lacks a .csv file extension, the Splunk software assumes it is the name of a CSV lookup definition.
Specify a lookup definition if you want the settings associated with the definition to apply to the ingest-time lookup. These can include filters, field and value matching rules, and more.
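For example, assuming a CSV lookup definition named http_status_lookup (a hypothetical name) is configured in transforms.conf, an ingest-time lookup can refer to it by name. Because the string has no .csv extension, it is treated as a lookup definition, so the settings in that definition also apply.

```ini
# $SPLUNK_HOME/etc/system/local/transforms.conf
# "http_status_lookup" has no .csv extension, so it is read as a
# lookup definition name rather than a lookup file name.
[lookup-by-definition]
INGEST_EVAL = status_info=lookup("http_status_lookup", json_object("status", status), json_array("status_description"))
```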
A lookup() function can use multiple <input_field>/<match_field> pairs to identify events, and multiple <output_field> values can be applied to those events. Here is an example of valid lookup() syntax with multiple inputs, matches, and outputs.
[lookup1]
INGEST_EVAL = <string>=lookup("<lookup_table>", json_object("<input_field1>", <match_field1>, "<input_field2>", <match_field2>), json_array("<output_field1>", "<output_field2>", "<output_field3>"))
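Here is a concrete sketch of that pattern. It assumes a hypothetical lookup file, assets.csv, with the columns ip, hostname, owner, location, and priority. It follows the convention of the http_status.csv example in this topic: each quoted name is a lookup table column, and each unquoted name is an event field whose value is matched against that column.

```ini
# $SPLUNK_HOME/etc/system/local/transforms.conf
# Hypothetical lookup: match events on two columns and return three
# output columns as a single JSON object in the asset_info field.
[asset-lookup]
INGEST_EVAL = asset_info=lookup("assets.csv", json_object("ip", src_ip, "hostname", host), json_array("owner", "location", "priority"))
```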
You can set up INGEST_EVAL expressions that nest a lookup() function inside another eval function. This example uses a json_extract function to pull a field value from the JSON object produced by the lookup() function:
[lookup-extract]
INGEST_EVAL = status_detail=json_extract(lookup("http_status.csv", json_object("status", status), json_array("status_description")), "status_description")
This adds field-value pairs such as status_detail=Created and status_detail=Not Found to your events at ingest time, depending on the value of the status field in those events.
Limits.conf settings
Two limits.conf settings give you deployment-wide control over the usage of ingest-time lookups.
ingest_max_memtable_bytes
The ingest_max_memtable_bytes setting puts an upper limit on the size of CSV lookup tables used for ingest-time lookup processing, so that ingest-time lookup files never take up too much memory. By default, lookup tables larger than 10 MB cannot be used by the lookup() eval function when it runs in an INGEST_EVAL expression. When a lookup table used by an ingest-time lookup exceeds the ingest_max_memtable_bytes setting, any lookups that rely on it fail. Error messages about these failures appear in splunkd.log when Splunk is restarted.
The ingest_max_memtable_bytes setting is intentionally separate from max_memtable_bytes, a similar setting for search-time lookups. It has a lower value so that ingest-time lookups have less impact on the memory and CPU usage of the indexing pipeline.
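If your ingest-time lookup tables are larger than the default limit, you can raise it in limits.conf. The following sketch assumes the setting belongs in the [lookup] stanza, as described in limits.conf.spec. The value 20000000 is an arbitrary illustration, and a higher limit increases memory use in the indexing pipeline.

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf
[lookup]
# Hypothetical value: allow CSV lookup tables of up to 20,000,000 bytes
# for ingest-time lookups. The default limit is 10 MB.
ingest_max_memtable_bytes = 20000000
```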
ingest_lookup_refresh_period_secs
If your ingest-time lookups use CSV lookup tables that change frequently, you can adjust the ingest_lookup_refresh_period_secs setting to make sure those changes are picked up. By default, the in-memory lookup tables that the lookup() function uses at ingest time are refreshed every 60 seconds.
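For example, to pick up lookup table changes more quickly, you can shorten the refresh period. This sketch assumes the setting belongs in the [lookup] stanza of limits.conf, as described in limits.conf.spec; the value 30 is an arbitrary illustration, and more frequent refreshes mean more frequent reloads of the tables.

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf
[lookup]
# Hypothetical value: reload in-memory ingest-time lookup tables every
# 30 seconds instead of the default 60 seconds.
ingest_lookup_refresh_period_secs = 30
```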
This documentation applies to the following versions of Splunk Cloud™: 8.1.2008, 8.1.2009, 8.1.2011, 8.1.2012, 8.1.2101