Reduce lookup overhead with ingest-time lookups
If you have certain lookups that you routinely apply to all of your incoming events in Splunk Enterprise or Splunk Cloud Platform, consider processing them at ingest time. You can use the following methods to configure ingest-time lookups and add values from lookup tables to events before they are added to your indexes:
- If the data is being ingested into Splunk Enterprise, you can configure an ingest-time eval in the transforms.conf file that uses the lookup() eval function. This configuration method is supported only in Splunk Enterprise, not in Splunk Cloud Platform. For more information, see the rest of this documentation page.
- If you have access to the Edge Processor solution, you can use an Edge Processor to apply lookups to your data before routing that data to Splunk Enterprise or Splunk Cloud Platform. For more information, see About the Edge Processor solution and Enrich data with lookups using an Edge Processor in the Use Edge Processors manual.
Ingest-time lookup prerequisites
This section covers things you should know before you attempt to configure an ingest-time lookup in Splunk Enterprise.
- Familiarize yourself with the lookup() function for eval, as ingest-time lookups rely on it to apply output fields and values in the form of JSON objects. See Comparison and conditional functions in the Search Reference.
- Get configuration file access.
  - Review the steps in How to edit a configuration file in the Splunk Enterprise Admin Manual.
  - Read Where you can place (or find) your modified configuration files in the Splunk Enterprise Admin Manual.
  - If your Splunk deployment uses distributed search, use the configuration bundle cluster manager to push CSV lookup files and lookup configurations to indexer cluster members and peers. See Update common peer configurations and apps in Managing Indexers and Clusters of Indexers.
- Learn about ingest-time eval expressions. Ingest-time lookups are a type of ingest-time eval expression. You use ingest-time eval expressions to create new fields and perform a wide range of operations on incoming data, including mathematical, statistical, and cryptographic functions. For an overview, see Process events with ingest-time eval.
- Ingest-time lookups for Splunk Enterprise are CSV file lookups and as such use CSV files as their lookup tables.
  - Ingest-time lookups expect the CSV lookup file to be stored in $SPLUNK_HOME/etc/system/lookups. Make sure the lookup file is located in the add-on's lookups folder. If you have a single instance of Splunk Enterprise, you can manually copy the file to this location. If you have a distributed search environment, you can use the configuration bundle cluster manager to update the files on your peers.
  - You can optionally specify a CSV lookup definition instead of a CSV lookup file. CSV lookup definitions include references to CSV lookup files. CSV lookup definitions can also include filters, field and value matching rules, and other settings. If you specify a CSV lookup definition, you must configure the definition as a transforms.conf stanza at $SPLUNK_HOME/etc/system/local.
  - For more information about CSV file lookups, see Configure CSV lookups in the Knowledge Manager Manual.
- Ingest-time lookups have a syntax that is similar to that of the lookup command and to the syntax of configurations for automatic search-time lookups. See Make your lookup automatic in the Knowledge Manager Manual.
Ingest-time lookup syntax
Ingest-time lookups run the lookup() function through an INGEST_EVAL expression. The syntax looks like this:
[lookup1]
INGEST_EVAL = <string>=lookup("<lookup_table>", json_object("<input_field>", <match_field>, ...), json_array("<output_field>", ...))
If the first quoted string supplied for <lookup_table> lacks a ".csv" file extension, the Splunk software assumes it is the name of a CSV lookup definition.
Specify a CSV lookup definition if you want the various settings associated with the definition to apply to the ingest-time lookup. These can include filters, field and value matching rules, and more.
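The naming rule above can be pictured with a small Python sketch. This is only an illustration of the rule as stated, not Splunk code: the function name classify_lookup_table is hypothetical, and the sketch assumes the check is a plain, case-sensitive test for a ".csv" suffix, which the source does not spell out.

```python
def classify_lookup_table(name: str) -> str:
    """Say how the first lookup() argument would be read under the rule above."""
    # Assumption: a simple, case-sensitive suffix test on ".csv".
    if name.endswith(".csv"):
        return "csv_file"
    # Anything without the ".csv" suffix is taken as a lookup definition name.
    return "lookup_definition"


file_kind = classify_lookup_table("http_status.csv")
definition_kind = classify_lookup_table("http_status_lookup")  # hypothetical definition name
print(file_kind, definition_kind)
```

In this model, "http_status.csv" is treated as a CSV lookup file, while a name without the suffix is treated as a CSV lookup definition.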
Ingest-time lookup examples
A lookup() function can use multiple <input_field>/<match_field> pairs to identify events, and multiple <output_field> values can be applied to those events. Here is an example of valid lookup() syntax with multiple inputs, matches, and outputs.
[lookup1]
INGEST_EVAL = <string>=lookup("<lookup_table>", json_object("<input_field1>", <match_field1>, "<input_field2>", <match_field2>), json_array("<output_field1>", "<output_field2>", "<output_field3>"))
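To make the matching behavior more concrete, here is a minimal Python model of a lookup with two input pairs and three output fields. It is a sketch under stated assumptions, not the Splunk implementation: the table contents, the column names (host, dc, owner, team, tier), and the choice to return the first matching row or nothing are all hypothetical.

```python
import csv
import io
import json

# Hypothetical lookup table; column names and rows are illustrative only.
ASSETS_CSV = """host,dc,owner,team,tier
web01,east,alice,frontend,gold
web01,west,bob,frontend,silver
db01,east,carol,data,gold
"""


def lookup(table_csv, inputs, outputs):
    """Return a JSON object (as text) holding the output fields of the first
    row whose columns equal every input value, or None when no row matches."""
    for row in csv.DictReader(io.StringIO(table_csv)):
        if all(row.get(col) == str(val) for col, val in inputs.items()):
            return json.dumps({name: row[name] for name in outputs})
    return None


event = {"host": "web01", "dc": "west"}
result = lookup(
    ASSETS_CSV,
    {"host": event["host"], "dc": event["dc"]},  # plays the role of json_object(...)
    ["owner", "team", "tier"],                   # plays the role of json_array(...)
)
print(result)
```

Both input pairs must match the same row, so the event with host web01 and dc west picks up the values from the second row only.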
You can set up INGEST_EVAL expressions that nest a lookup() function inside another eval function. This example uses a json_extract function to pull a field value from the JSON object produced by the lookup() function:
[lookup-extract]
INGEST_EVAL = status_detail=json_extract(lookup("http_status.csv", json_object("status", status), json_array("status_description")), "status_description")
This results in the ingest-time addition of field-value pairs like status_detail=Created and status_detail=Not Found to your events, depending on the value of the status field in those events.
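The extraction step can be modeled in a few lines of Python. This is an illustrative sketch rather than Splunk's eval engine: the lookup results are hard-coded JSON strings of the shape the text describes, the helper names are hypothetical, and leaving the event unchanged when nothing matches is an assumption.

```python
import json

# Hypothetical stand-ins for what lookup() might return for two status values.
LOOKUP_RESULTS = {
    "201": '{"status_description": "Created"}',
    "404": '{"status_description": "Not Found"}',
}


def json_extract(json_text, key):
    """Pull one value out of a JSON object given as text; None if absent."""
    if json_text is None:
        return None
    return json.loads(json_text).get(key)


def add_status_detail(event):
    """Return a copy of the event with status_detail added when a match exists."""
    enriched = dict(event)
    detail = json_extract(LOOKUP_RESULTS.get(event.get("status")), "status_description")
    if detail is not None:  # assumption: unmatched events are left as they are
        enriched["status_detail"] = detail
    return enriched


created = add_status_detail({"status": "201"})
missing = add_status_detail({"status": "404"})
unmatched = add_status_detail({"status": "999"})
print(created, missing, unmatched)
```

The nesting matters because the lookup result is a JSON object, so a second function is needed to turn it into a plain field value.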
Limits.conf settings
Two limits.conf settings give you deployment-wide control over the usage of ingest-time lookups.
ingest_max_memtable_bytes
The ingest_max_memtable_bytes setting puts an upper boundary on the size of CSV lookup tables that are used for ingest-time lookup processing. This ensures that ingest-time lookup files never take up too much space in memory. By default, lookup tables larger than 10 MB cannot be used for the lookup() eval function when it is used with INGEST_EVAL. When the size of a lookup table being used by an ingest-time lookup exceeds the ingest_max_memtable_bytes setting, any lookups that rely on it fail. Error messages indicating their failure appear in splunkd.log when Splunk is restarted.
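Because an oversized table makes every dependent lookup fail, it can be useful to check a file's size before deploying it. The following Python sketch shows one way to do that. It is not part of Splunk: the function name is hypothetical, and it assumes the 10 MB default corresponds to 10 * 1024 * 1024 bytes and that the limit is compared against the file size on disk.

```python
import os
import tempfile

# Assumption: the 10 MB default is read as 10 * 1024 * 1024 bytes.
DEFAULT_INGEST_MAX_MEMTABLE_BYTES = 10 * 1024 * 1024


def fits_ingest_limit(path, limit_bytes=DEFAULT_INGEST_MAX_MEMTABLE_BYTES):
    """Return True when the file at path is no larger than limit_bytes."""
    return os.path.getsize(path) <= limit_bytes


# Write a tiny sample CSV so the check has something to measure.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("status,status_description\n201,Created\n404,Not Found\n")
    sample_path = handle.name

small_ok = fits_ingest_limit(sample_path)                     # well under the default
tiny_limit_ok = fits_ingest_limit(sample_path, limit_bytes=8)  # artificially low limit
os.remove(sample_path)
print(small_ok, tiny_limit_ok)
```

Running a check like this before pushing a lookup file can catch the problem earlier than waiting for the failure to show up in splunkd.log.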
The ingest_max_memtable_bytes setting is intentionally separate from max_memtable_bytes, a similar setting for search-time lookups. It is set to a lower value to limit the memory and CPU impact on the indexing pipeline.
ingest_lookup_refresh_period_secs
If you have ingest-time lookups that are based on CSV lookup tables that change frequently, you can adjust the ingest_lookup_refresh_period_secs setting to ensure that these changes are captured. By default, the in-memory lookup tables that are used with the lookup() function at ingest time are refreshed every 60 seconds.
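The effect of a refresh period can be illustrated with a small Python model of a periodically reloaded in-memory table. This is a conceptual sketch only: the class name, the reload-on-access strategy, and the injectable clock are assumptions made for the example and say nothing about how Splunk schedules its refreshes.

```python
import csv
import io
import time


class RefreshingLookupTable:
    """Hold parsed CSV rows in memory and reload them once they are
    older than the refresh period."""

    def __init__(self, read_csv_text, refresh_period_secs=60, clock=time.monotonic):
        self._read_csv_text = read_csv_text  # callable returning the current CSV text
        self._period = refresh_period_secs
        self._clock = clock
        self._rows = None
        self._loaded_at = None

    def rows(self):
        now = self._clock()
        if self._rows is None or now - self._loaded_at >= self._period:
            self._rows = list(csv.DictReader(io.StringIO(self._read_csv_text())))
            self._loaded_at = now
        return self._rows


# Demo with a fake clock so the example does not have to wait 60 seconds.
source = {"text": "status,status_description\n404,Not Found\n"}
fake_now = [0.0]
table = RefreshingLookupTable(lambda: source["text"], 60, clock=lambda: fake_now[0])

first = table.rows()[0]["status_description"]
source["text"] = "status,status_description\n404,Missing\n"  # the table changes
fake_now[0] = 30.0
stale = table.rows()[0]["status_description"]  # still inside the refresh period
fake_now[0] = 60.0
fresh = table.rows()[0]["status_description"]  # period elapsed, table reloaded
print(first, stale, fresh)
```

In this model, a change to the table is invisible until the period elapses, which is the trade-off a shorter refresh period reduces at the cost of more frequent reloads.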
This documentation applies to the following versions of Splunk Cloud Platform™: 9.3.2408, 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release)