Splunk Cloud Platform

Getting Data In

Reduce lookup overhead with ingest-time lookups

If you have certain lookups that you routinely apply to all of your incoming events in Splunk Enterprise or Splunk Cloud Platform, consider processing them at ingest time. You can use the following methods to configure ingest-time lookups and add values from lookup tables to events before they are added to your indexes:

  • If you are ingesting data into Splunk Enterprise, you can configure an ingest-time eval in the transforms.conf file that uses the lookup() eval function. This configuration method is supported only in Splunk Enterprise, not in Splunk Cloud Platform. The rest of this topic describes this method.
  • If you have access to the Edge Processor solution, you can use an Edge Processor to apply lookups to your data before routing that data to Splunk Enterprise or Splunk Cloud Platform. For more information, see About the Edge Processor solution and Enrich data with lookups using an Edge Processor in the Use Edge Processors manual.

Ingest-time lookup prerequisites

This section covers things you should know before you attempt to configure an ingest-time lookup in Splunk Enterprise.

  • Familiarize yourself with the lookup() function for eval, as ingest-time lookups rely upon it to apply output fields and values in the form of JSON objects. See Comparison and conditional functions in the Search Reference.
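  • Make sure that you have access to the configuration files for your Splunk Enterprise deployment. You configure ingest-time lookups in the transforms.conf file.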
  • Learn about ingest-time eval expressions. Ingest-time lookups are a type of ingest-time eval expression. You use ingest-time eval expressions to create new fields and perform a wide range of operations on incoming data, including mathematical, statistical, and cryptographic functions. For an overview, see Process events with ingest-time eval.
  • Ingest-time lookups for Splunk Enterprise are CSV file lookups and as such use CSV files as their lookup tables.
    • Ingest-time lookups expect the CSV lookup file to be stored in $SPLUNK_HOME/etc/system/lookups. Make sure the lookup file is located in the add-on's lookups folder. If you have a single Splunk Enterprise instance, you can copy the file to this location manually. If you have a distributed search environment, you can use the cluster manager to distribute the file to your peers in a configuration bundle.
    • You can optionally specify a CSV lookup definition instead of a CSV lookup file. CSV lookup definitions include references to CSV lookup files. CSV lookup definitions can also include filters, field and value matching rules, and other settings. If you specify a CSV lookup definition, you must configure the definition as a transforms.conf stanza at $SPLUNK_HOME/etc/system/local.
    • For more information about CSV file lookups, see Configure CSV lookups in the Knowledge Manager Manual.
  • Ingest-time lookups have a syntax that is similar to that of the lookup command, and to the syntax of configurations for automatic search-time lookups. See Make your lookup automatic in the Knowledge Manager Manual.

Ingest-time lookup syntax

Ingest-time lookups run the lookup() function through an INGEST_EVAL expression. The syntax looks like this:

[lookup1]
INGEST_EVAL= <string>=lookup("<lookup_table>", json_object("<input_field>",<match_field>,...), json_array("<output_field>",...)) 

If the quoted string supplied for <lookup_table> does not end with the ".csv" file extension, the Splunk software assumes that it is the name of a CSV lookup definition.

Specify a CSV lookup definition if you want the various settings associated with the definition to apply to the ingest-time lookup. These can include filters, field and value matching rules, and more.
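
As a minimal sketch of the two forms, the following transforms.conf fragment defines a CSV lookup definition and then references the lookup file directly in one stanza and the definition in another. The stanza names, the status_json field, and the case_sensitive_match value are illustrative assumptions. The http_status.csv file and its status and status_description fields are taken from the nested example in this topic.

```ini
# $SPLUNK_HOME/etc/system/local/transforms.conf
# Sketch only: stanza names and the status_json field are hypothetical.

# A CSV lookup definition that references the lookup file and adds a matching rule.
[http_status_lookup]
filename = http_status.csv
case_sensitive_match = false

# Form 1: the name ends in ".csv", so it is treated as a CSV lookup file.
[lookup-by-file]
INGEST_EVAL = status_json=lookup("http_status.csv", json_object("status", status), json_array("status_description"))

# Form 2: the name has no ".csv" extension, so it is treated as a CSV lookup definition.
[lookup-by-definition]
INGEST_EVAL = status_json=lookup("http_status_lookup", json_object("status", status), json_array("status_description"))
```

In practice you would use only one of the two INGEST_EVAL stanzas. Both appear here for comparison.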

Ingest-time lookup examples

A lookup() function can use multiple <input_field>/<match_field> pairs to identify events, and multiple <output_field> values can be applied to those events. Here is an example of valid lookup() syntax with multiple inputs, matches, and outputs.

[lookup1]
INGEST_EVAL= <string>=lookup("<lookup_table>", json_object("<input_field1>", <match_field1>, "<input_field2>", <match_field2>), json_array("<output_field1>", "<output_field2>", "<output_field3>"))
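
With concrete names filled in, the multiple-field form might look like the following sketch. Every name in it is a hypothetical placeholder chosen for illustration: the assets.csv lookup file, the ip and zone lookup fields, the src_ip and src_zone event fields, the three output fields, and the asset_info field that receives the resulting JSON object.

```ini
# transforms.conf
# Sketch only: the lookup file and all field names are hypothetical.
[lookup-multi]
# Match on two input fields and request three output fields from the lookup table.
INGEST_EVAL = asset_info=lookup("assets.csv", json_object("ip", src_ip, "zone", src_zone), json_array("owner", "priority", "location"))
```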

You can set up INGEST_EVAL expressions that nest a lookup() function inside another eval function. This example uses a json_extract function to pull a field value from the JSON object produced by the lookup() function:

[lookup-extract]
INGEST_EVAL= status_detail=json_extract(lookup("http_status.csv", json_object("status", status), json_array("status_description")), "status_description") 

This results in the ingest-time addition of field-value pairs like status_detail=Created and status_detail=Not Found to your events, depending on the value of the status field in those events.
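
An INGEST_EVAL stanza in transforms.conf takes effect only when a props.conf stanza references it through a TRANSFORMS-<class> setting. The following is a minimal sketch of that reference. It assumes a hypothetical source type named my_web_logs and a hypothetical class name status_lookup. Only the lookup-extract stanza name comes from the preceding example.

```ini
# props.conf
# Sketch only: the source type and the class name are hypothetical.
[my_web_logs]
# Apply the lookup-extract stanza from transforms.conf to events of this source type.
TRANSFORMS-status_lookup = lookup-extract
```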

Limits.conf settings

Two limits.conf settings give you deployment-wide control over the usage of ingest-time lookups.

ingest_max_memtable_bytes

The ingest_max_memtable_bytes setting sets an upper limit on the size of the CSV lookup tables that can be used for ingest-time lookup processing. This limit keeps ingest-time lookup files from taking up too much space in memory. By default, lookup tables larger than 10 MB cannot be used by the lookup() eval function when it runs in an INGEST_EVAL expression. When a lookup table used by an ingest-time lookup exceeds the ingest_max_memtable_bytes setting, any lookups that rely on it fail. Error messages that report the failure appear in splunkd.log when the Splunk software is restarted.

The ingest_max_memtable_bytes setting is intentionally separate from max_memtable_bytes, a similar setting for search-time lookups. It is set to a lower value so that ingest-time lookups have less impact on the memory and CPU usage of the indexing pipeline.

ingest_lookup_refresh_period_secs

If your ingest-time lookups are based on CSV lookup tables that change frequently, you can adjust the ingest_lookup_refresh_period_secs setting to make sure that those changes are captured. By default, the in-memory lookup tables that the lookup() function uses at ingest time are refreshed every 60 seconds.
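
The following limits.conf fragment is a sketch of how both settings might be adjusted together. It assumes that the settings belong in the [lookup] stanza and that ingest_max_memtable_bytes takes a value in bytes, as its name suggests. Verify both points against the limits.conf specification for your version. The values shown are illustrative, not recommendations.

```ini
# limits.conf
# Sketch only: the [lookup] stanza placement is an assumption and the values are illustrative.
[lookup]
# Raise the ingest-time lookup table size limit from the 10 MB default to 20 MB (20 * 1024 * 1024 bytes).
ingest_max_memtable_bytes = 20971520
# Refresh the in-memory ingest-time lookup tables every 30 seconds instead of every 60.
ingest_lookup_refresh_period_secs = 30
```

A larger size limit and a shorter refresh period both add load to the indexing pipeline, so change them conservatively.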

Last modified on 20 March, 2024

This documentation applies to the following versions of Splunk Cloud Platform: 9.3.2408, 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release)

