Upload a CSV file to the Data Stream Processor to enrich data with a lookup
CSV lookups are file-based lookups that match field values from your events to field values in the static table represented by a CSV file. They output corresponding field values from the table to your data. They are also referred to as static lookups. Use CSV lookups to enrich your streaming data with fields from CSV files.
CSV lookups are best for small sets of data. The general workflow for creating a CSV lookup in the Data Stream Processor is to upload a file on the Lookups management page and then invoke the CSV file using the lookup function.
Lookup table files
Lookup table files are files that contain a lookup table. A standard lookup pulls fields out of this table and adds them to your records when corresponding fields in the table are matched in your records.
A single lookup table file can be used by multiple pipelines.
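To make the matching behavior concrete, here is a minimal Python sketch of what a lookup does conceptually: it indexes the table by a match field and copies the chosen output fields into each record that has a matching value. This is not the Data Stream Processor's lookup function. The table contents, field names, and helper names are hypothetical, and leaving unmatched records unchanged is an assumption of this sketch.

```python
import csv
import io

# Hypothetical lookup table; in practice this would be the uploaded CSV file.
LOOKUP_CSV = """status_code,status_description
200,OK
404,Not Found
500,Internal Server Error
"""

def load_lookup(csv_text, match_field):
    """Index the table rows by the value of the match field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[match_field]: row for row in reader}

def enrich(record, table, match_field, output_fields):
    """Return a copy of the record, adding output fields when a table row matches."""
    enriched = dict(record)
    # CSV values are strings, so compare against the string form of the record value.
    row = table.get(str(record.get(match_field)))
    if row is not None:
        for field in output_fields:
            enriched[field] = row[field]
    return enriched

table = load_lookup(LOOKUP_CSV, "status_code")
records = [
    {"host": "web-1", "status_code": 404},  # has a matching row in the table
    {"host": "web-2", "status_code": 302},  # has no matching row
]
enriched_records = [
    enrich(record, table, "status_code", ["status_description"]) for record in records
]
```

In this sketch the first record gains a status_description field and the second record passes through without changes.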
Upload the lookup table file
To use a CSV lookup, you must first upload a lookup table file to the Data Stream Processor.
Prerequisites
- An available .csv file. The maximum file size is 50MiB.
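If you want to check a file against this prerequisite before uploading it, a small local script can help. The following Python sketch checks the extension and compares the file size to the 50MiB limit. The function name and the demonstration file are hypothetical, and the sketch assumes that a file of exactly 50MiB is accepted.

```python
import os
import tempfile

MAX_LOOKUP_FILE_BYTES = 50 * 1024 * 1024  # 50MiB, the documented maximum file size

def check_lookup_file(path):
    """Return (ok, message) for basic pre-upload checks on a lookup file."""
    if not path.lower().endswith(".csv"):
        return False, "file does not have a .csv extension"
    size = os.path.getsize(path)
    if size > MAX_LOOKUP_FILE_BYTES:
        return False, f"file is {size} bytes, over the {MAX_LOOKUP_FILE_BYTES}-byte limit"
    return True, f"file is {size} bytes"

# Demonstration with a small throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("status_code,status_description\n200,OK\n")
    demo_path = handle.name

result = check_lookup_file(demo_path)
os.remove(demo_path)
```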
Steps
- From the UI, click Data Management > Lookups to go to the Lookups management page.
- In the Lookups management page, click Add lookup.
- Enter a name for your lookup.
- Upload the CSV file.
- Check whether your file has a header in the first row. If your file doesn't have a header, enter the header fields separated by commas.
- Click Save.
You can now use the lookup file in your pipelines using the lookup function.
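For the header step above, it can help to look at the first row of the file before you decide. The following Python sketch reads that row and, for a file without a header, builds the comma-separated field list. The helper names and sample values are hypothetical, and stripping spaces around the names is an assumption of this sketch, not a documented requirement.

```python
import csv
import io

def first_row(csv_text):
    """Return the first row of the CSV text as a list of cell values."""
    return next(csv.reader(io.StringIO(csv_text)), [])

def header_fields_entry(field_names):
    """Join field names into a comma-separated string."""
    return ",".join(name.strip() for name in field_names)

headerless = "200,OK\n404,Not Found\n"
preview = first_row(headerless)  # these look like data values, not field names
entry = header_fields_entry(["status_code", " status_description"])
```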
Update a CSV lookup
Follow these steps to upload a new version of a lookup table file.
Update a CSV lookup using the UI
- From the UI, click Data Management > Lookups to go to the Lookups management page.
- In the Lookups management page, find the lookup that you'd like to update.
- Click on the name of the lookup, and then click the Edit lookup button.
- Upload the new CSV file.
- Check whether your file has a header in the first row. If your file doesn't have a header, enter the header fields separated by commas.
- Click Save.
- If you have any active pipelines currently using this lookup, restart those pipelines to use the latest version of the lookup.
You now have an updated lookup file that can be used in your pipelines with the lookup function.
Update a CSV lookup using the Streams API
- Log in to SCloud. Copy and save the bearer token returned to a preferred location.
./scloud login --verbose
- Upload the new CSV file that you want to use. Copy and save the id value returned to a preferred location.
curl -k --location --request POST 'https://<DSP_HOST>:31000/default/streams/v3beta1/files' \
--header 'Authorization: Bearer <my-bearer-token>' \
--form 'file=@/path/to/my/csv_file.csv'
This CSV file must contain the same schema, or headers, as the previous CSV file. If you want to use a CSV file containing a different schema, then you must create a new lookup connection.
- Retrieve the id of the connection that corresponds to the CSV lookup. Copy and save the id value returned to a preferred location.
curl -k --location --request GET 'https://<DSP_HOST>:31000/default/streams/v3beta1/connections' \
--header 'Authorization: Bearer <my-bearer-token>'
- Modify the existing lookup connection to use the updated CSV file. Replace <connection_id> with the connection id that you retrieved in the previous step, and replace <file_id> with the file id that was returned when you uploaded the new CSV file.
curl -k -X PATCH "https://<DSP_HOST>:31000/default/streams/v3beta1/connections/<connection_id>" \
-H "Authorization: Bearer <my-bearer-token>" \
-H "Content-Type: application/json" \
-d '{"data": {"file_id": "<file_id>"}}'
The following table lists the full range of options available when you update the lookup connection.
| JSON parameter | Format | Description |
| --- | --- | --- |
| check_for_new_connection_secs | integer | Optional. By default, the Data Stream Processor does not automatically check for updates to the CSV file. This option enables automatic checks and sets how frequently, in seconds, to check for updates to the CSV file. If the Data Stream Processor detects that the CSV file has been updated, any active pipelines using this CSV file automatically switch to the latest version. This value must be 30 seconds or greater. Set this to a higher value, such as 300 seconds (5 minutes), to decrease network traffic. If you have this setting enabled and your pipeline fails shortly after you update an in-use CSV lookup file, check whether you have exceeded the total allowed cache quota for your pipeline. The cumulative size of all CSV lookups in a single pipeline cannot exceed 50MiB. For example, in a single pipeline, you can use one 50MiB CSV lookup file or five 10MiB files. If you update your CSV file and exceed this quota, your pipeline fails. To prevent this, keep the cumulative size of all CSV lookups in a single pipeline at or below 50MiB. |
| trim_edge_whitespace | boolean | Optional. Set to false if you do not want to trim leading and trailing whitespace from your file headers or data rows. Defaults to true. |
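As a rough illustration of how these options might be combined with file_id in the request body, the following Python sketch enforces the documented 30-second minimum and builds the PATCH request without sending it. It assumes that the optional parameters sit inside the same data object as file_id, which this page does not confirm. The helper names, host, ids, and token are hypothetical.

```python
import json
import urllib.request

MIN_CHECK_SECS = 30  # documented minimum for check_for_new_connection_secs

def build_update_payload(file_id, check_for_new_connection_secs=None,
                         trim_edge_whitespace=None):
    """Build the JSON body; optional settings are included only when given."""
    # Assumption: optional parameters go in the same "data" object as file_id.
    data = {"file_id": file_id}
    if check_for_new_connection_secs is not None:
        if check_for_new_connection_secs < MIN_CHECK_SECS:
            raise ValueError(
                f"check_for_new_connection_secs must be {MIN_CHECK_SECS} or greater"
            )
        data["check_for_new_connection_secs"] = check_for_new_connection_secs
    if trim_edge_whitespace is not None:
        data["trim_edge_whitespace"] = bool(trim_edge_whitespace)
    return {"data": data}

def build_patch_request(host, connection_id, token, payload):
    """Create the PATCH request object for the connection. Nothing is sent here."""
    url = f"https://{host}:31000/default/streams/v3beta1/connections/{connection_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )

# Placeholder values for illustration only.
payload = build_update_payload("my-file-id", check_for_new_connection_secs=300)
request = build_patch_request("dsp.example.com", "my-connection-id", "my-token", payload)
```

Building the request separately from sending it makes it easy to inspect the URL and body before you run the equivalent curl command.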
You now have an updated lookup file that can be used in your pipelines with the lookup function.
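Two constraints from this procedure are easy to check locally before you update a lookup: the new file must keep the same headers as the previous file, and the CSV lookups used by a single pipeline cannot exceed 50MiB in total. The following Python sketch shows one way to check both. The function names are hypothetical. The sketch assumes that both files have a header row and that "same schema" means the same field names in the same order, which is stricter than this page states.

```python
import csv
import io

PIPELINE_LOOKUP_QUOTA_BYTES = 50 * 1024 * 1024  # 50MiB across all CSV lookups in one pipeline

def csv_header(csv_text):
    """Return the header row of the CSV text as a list of field names."""
    return next(csv.reader(io.StringIO(csv_text)), [])

def same_schema(old_csv_text, new_csv_text):
    """True when both files have identical header rows (same names, same order)."""
    return csv_header(old_csv_text) == csv_header(new_csv_text)

def within_quota(lookup_sizes_bytes):
    """True when the combined size of a pipeline's CSV lookups fits the quota."""
    return sum(lookup_sizes_bytes) <= PIPELINE_LOOKUP_QUOTA_BYTES

old_file = "status_code,status_description\n200,OK\n"
new_file = "status_code,status_description\n200,OK\n404,Not Found\n"
changed_file = "code,description\n200,OK\n"

schema_ok = same_schema(old_file, new_file)           # headers unchanged
schema_changed = same_schema(old_file, changed_file)  # headers differ

ten_mib = 10 * 1024 * 1024
five_files_ok = within_quota([ten_mib] * 5)  # five 10MiB files fit exactly
six_files_ok = within_quota([ten_mib] * 6)   # a sixth file exceeds the quota
```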
Deleting unused lookup files
After 24 hours, the Data Stream Processor automatically deletes any unused lookup files. This means that any files that are not associated with an existing lookup connection and any old versions of a lookup file are automatically deleted from the system.
If you want to modify how frequently unused lookup files are deleted, perform the following steps.
- Run the following command to set how frequently, in hours, unused lookup files are deleted. Set the value to 0 to disable automatic cleanup of lookup files.
./set-config K8S_PIPELINES_DATA_FILE_CLEANUP_FREQUENCY_IN_HRS <value>
- Deploy your changes.
./deploy
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5