Upload a CSV file to the Splunk Data Stream Processor to enrich data with a lookup

CSV lookups are file-based lookups that match field values from your events to field values in the static table represented by a CSV file. They output the corresponding field values from the table to your data. They are also referred to as static lookups. Use lookups to enrich your streaming data by adding fields from CSV files.

CSV lookups are best for small sets of data. The general workflow for creating a CSV lookup in the Splunk Data Stream Processor is to upload a file in the Lookups tab and then invoke the CSV file using the lookup function.

Lookup table files

Lookup table files are files that contain a lookup table. A standard lookup pulls fields out of this table and adds them to your records when corresponding fields in the table are matched in your records.

A single lookup table file can be used by multiple pipelines.
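To make the matching behavior concrete, the following sketch emulates a lookup locally with awk. It only illustrates the idea of matching a field value against a table and returning a corresponding field; it is not how the product implements lookups. The function name, column names, and sample data are hypothetical, and the sketch assumes a simple CSV with a header row and no quoted commas.

```shell
# Emulate a CSV lookup locally (illustration only, not the DSP implementation).
# Assumes a header row and no quoted commas. All names are hypothetical.
lookup_csv() {
  # $1 = CSV file, $2 = column to match on, $3 = column to output, $4 = value to match
  awk -F',' -v key="$2" -v out="$3" -v val="$4" '
    NR == 1 {
      # Locate the match column and the output column by header name.
      for (i = 1; i <= NF; i++) {
        if ($i == key) k = i
        if ($i == out) o = i
      }
      next
    }
    # Print the output field of the first row whose match field equals the value.
    k && o && $k == val { print $o; exit }
  ' "$1"
}
```

For example, with a table whose header is `status_code,status_description`, calling `lookup_csv table.csv status_code status_description 404` prints the description stored in the row for 404, and prints nothing when no row matches.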

Upload the lookup table file

To use a CSV lookup, you must first upload a lookup table file to the Splunk Data Stream Processor.

Prerequisites

An available .csv file. The maximum file size is 50MiB.
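A quick local check can confirm that a file is within the size limit before you upload it. This is a minimal sketch: the function and variable names are hypothetical, and it assumes the 50MiB limit is inclusive.

```shell
# Documented upload limit: 50 MiB, expressed in bytes.
MAX_LOOKUP_BYTES=$((50 * 1024 * 1024))

# Succeed if the file exists and is no larger than MAX_LOOKUP_BYTES.
# Returns 2 if the file does not exist. Names are hypothetical.
lookup_file_fits() {
  # $1 = path to the CSV file
  [ -f "$1" ] || return 2
  # Arithmetic expansion strips any padding that wc adds on some systems.
  size=$(( $(wc -c < "$1") ))
  [ "$size" -le "$MAX_LOOKUP_BYTES" ]
}
```

A typical use is `lookup_file_fits my_lookup.csv && echo "OK to upload"`.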

Steps

In the Splunk Data Stream Processor, select Lookups.
On the Lookups page, click Add lookup.
Enter a name for your lookup.
Upload the CSV file.
Check whether your file has a header in the first row. If your file doesn't have a header, enter the header fields separated by commas.
Click Save.

You can now use the lookup file in your pipelines using the lookup function.

Update a CSV lookup

Follow these steps to upload a new version of a lookup table file. By default, the Splunk Data Stream Processor automatically detects when you upload a new version of a CSV lookup file, and active pipelines automatically switch to the latest version of the file.

Update a CSV lookup using the UI

In the Splunk Data Stream Processor, select Lookups and find the lookup that you'd like to update.
Click on the name of the lookup, and then click the Edit lookup button.
Upload the new CSV file.
Check whether your file has a header in the first row. If your file doesn't have a header, enter the header fields separated by commas.
Click Save.

You now have an updated lookup file that can be used in your pipelines with the lookup function.

Update a CSV lookup using the Streams API

Log in to the Splunk Cloud Services CLI. Copy and save the bearer token returned to a preferred location.
```
./scloud login --verbose
```
Upload the new CSV file that you want to use. Copy and save the returned id value to a preferred location.
```
curl -k --location --request POST 'https://<DSP_HOST>/default/streams/v3beta1/lookups/files' \
--header 'Authorization: Bearer <my-bearer-token>' \
--form 'file=@/path/to/my/csv_file.csv'
```
This CSV file must contain the same schema, or headers, as the previous CSV file. If you want to use a CSV file containing a different schema, then you must create a new lookup connection.
Now that you've uploaded the new CSV file that you want to use, retrieve the id corresponding to the connection to the CSV file. Copy and save the returned id value to a preferred location.
```
curl -k  --location --request GET 'https://<DSP_HOST>/default/streams/v3beta1/connections' \
--header 'Authorization: Bearer <my-bearer-token>' 
```

Modify the existing lookup connection to use the updated CSV file. Replace <connection_id> with the connection id that you retrieved in the previous step, and replace <file_id> with the id that was returned when you uploaded the new CSV file.

```
curl -k -X PATCH "https://<DSP_HOST>/default/streams/v3beta1/connections/<connection_id>" \
    -H "Authorization: Bearer <my-bearer-token>" \
    -H "Content-Type: application/json" \
    -d '{"data": {"file_id": "<file_id>"}}'
```

The following table lists the full range of options available when you update the lookup connection.

JSON Parameter	Format	Description
check_for_new_connection_secs	integer	Optional. How often, in seconds, the Splunk Data Stream Processor checks for updates to CSV lookup files. When an update is found, any active pipelines using the CSV file automatically switch to the latest version of the file. Defaults to 60 seconds. Set this to 0 to disable automatic updates; otherwise, the value must be 30 seconds or greater. Set this to a higher value, such as 300 seconds (5 minutes), to decrease network traffic. If this setting is enabled and your pipeline fails shortly after you update an in-use CSV lookup file, check whether you have exceeded the total cache quota for your pipeline. The cumulative size of all CSV lookups in a single pipeline cannot exceed 50MiB. For example, in a single pipeline, you can use one 50MiB CSV lookup file or five 10MiB files. If an update to a CSV file pushes the pipeline over this quota, the pipeline fails.
trim_edge_whitespace	boolean	Optional. Set to false if you do not want to trim leading and trailing whitespaces from your file headers or data rows. Defaults to true.
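The constraints described for check_for_new_connection_secs and the per-pipeline quota can be checked locally before you change a connection or replace a file. The following is a minimal sketch with hypothetical function names. It does not call the Streams API, and it reads the rule for the interval as "0, or an integer of 30 or greater".

```shell
# Validate a candidate value for check_for_new_connection_secs:
# 0 disables automatic updates; any other value must be an integer >= 30.
valid_check_interval() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -eq 0 ] || [ "$1" -ge 30 ]
}

# Print the combined size, in bytes, of the given CSV files so that it can be
# compared against the 50 MiB (52428800 bytes) per-pipeline quota.
total_lookup_bytes() {
  total=0
  for f in "$@"; do
    total=$((total + $(wc -c < "$f")))
  done
  echo "$total"
}
```

For example, `valid_check_interval 300` succeeds and `valid_check_interval 10` fails. Passing every CSV lookup file used by one pipeline to `total_lookup_bytes` gives the number to compare with the quota.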

You now have an updated lookup file that can be used in your pipelines with the lookup function.
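Because a replacement file uploaded through the Streams API must have the same schema as the previous file, it can help to compare the header rows locally first. This is a minimal sketch with a hypothetical function name. It assumes the header is the first line of each file and treats "same schema" as an exact match of the header line, including column order, which may be stricter than what the service requires.

```shell
# Succeed only if both CSV files have an identical, non-empty header row.
# Assumes the header is the first line; a trailing carriage return
# (Windows line ending) is ignored. The function name is hypothetical.
same_csv_header() {
  # $1 = current lookup file, $2 = replacement file
  old_header=$(head -n 1 "$1" | tr -d '\r')
  new_header=$(head -n 1 "$2" | tr -d '\r')
  [ -n "$old_header" ] && [ "$old_header" = "$new_header" ]
}
```

A typical use is `same_csv_header old.csv new.csv || echo "Headers differ: create a new lookup connection instead"`.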

Enable or disable automatic updates using the DSP CLI tool

By default, the Splunk Data Stream Processor checks every minute to see if there have been any updates to your CSV lookup files. If an update is detected, then any active pipelines using the CSV file automatically switch to using the latest version of the file.

As an alternative to configuring this automatic update behavior using the check_for_new_connection_secs JSON parameter described in the previous section, you can also enable or disable automatic updates using the DSP CLI tool.

Navigate to the working directory of a DSP controller node.
Run one of the following commands to enable or disable automatic updates for CSV lookup files:
- To enable automatic updates, run the following command:
```
dsp admin connection update-lookup --enable-update-checks
```
- To disable automatic updates, run the following command:
```
dsp admin connection update-lookup --disable-update-checks
```

Deleting unused lookup files

After 24 hours, the Splunk Data Stream Processor automatically deletes any unused lookup files. This means that any files that are not associated with an existing lookup connection, and any old versions of a lookup file, are automatically deleted from the system.

If you want to modify how frequently unused lookup files are deleted, perform the following steps.

From the working directory of a DSP controller node, run the following command to configure how frequently you want to delete unused lookup files. Set to 0 to disable automatic cleanup of lookup files.
```
dsp config set streams pipelines_data_file_cleanup_frequency_in_hrs=<value>
```
Deploy your changes.
```
dsp deploy streams
```
