Upload a CSV file to the Data Stream Processor to enrich data with a lookup

CSV lookups are file-based lookups that match field values from your events to field values in the static table represented by a CSV file. They output the corresponding field values from the table to your data. They are also referred to as static lookups. Use lookups to enrich your streaming data by adding fields from CSV files.

CSV lookups are best for small sets of data. The general workflow for creating a CSV lookup in the Data Stream Processor is to upload a file on the Lookups Management page and then invoke the CSV file using the lookup function.

Lookup table files

Lookup table files are files that contain a lookup table. A standard lookup pulls fields out of this table and adds them to your records when corresponding fields in the table are matched in your records.
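
The matching behavior described above can be illustrated with a minimal sketch. This is not how the lookup function is implemented. It only shows the idea of pulling fields out of a table when a record field matches a table field. The table contents, the field names, and the `load_lookup` and `enrich` helpers are all hypothetical.

```python
import csv
import io

def load_lookup(csv_text, key_field):
    """Index the rows of a CSV table by the value of key_field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field]: row for row in reader}

def enrich(record, table, record_field, key_field):
    """Return a copy of record with the matching row's other fields added."""
    row = table.get(str(record.get(record_field)))
    if row is None:
        return dict(record)  # no match: the record passes through unchanged
    extra = {k: v for k, v in row.items() if k != key_field}
    return {**record, **extra}

# Hypothetical lookup table and record, for illustration only.
TABLE_CSV = "status_code,status_description\n200,OK\n404,Not Found\n"
table = load_lookup(TABLE_CSV, "status_code")
enriched = enrich({"host": "web-01", "status": "404"}, table, "status", "status_code")
```

Here the record's `status` value is matched against the table's `status_code` column, and the remaining column, `status_description`, is added to the record.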

A single lookup table file can be used by multiple pipelines.

Upload the lookup table file

To use a CSV lookup, you must first upload a lookup table file to the Data Stream Processor.

Prerequisites

An available .csv file. The maximum file size is 50MiB.
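
Before uploading, it can be useful to confirm locally that the file is within the size limit. The following is a rough sketch of such a check. The `check_lookup_size` helper and the sample file are hypothetical, and the only value taken from this page is the 50MiB limit.

```python
import os
import tempfile

MAX_LOOKUP_BYTES = 50 * 1024 * 1024  # 50 MiB, the limit stated above

def check_lookup_size(path, limit=MAX_LOOKUP_BYTES):
    """Return (fits, size_in_bytes) for the file at path."""
    size = os.path.getsize(path)
    return size <= limit, size

# Write a tiny sample file so the check can be demonstrated end to end.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("status_code,status_description\n200,OK\n")
    sample_path = f.name

ok, size = check_lookup_size(sample_path)
os.remove(sample_path)
```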

Steps

1. From the UI, click Data Management > Lookups to go to the Lookups management page.
2. In the Lookups management page, click Add lookup.
3. Enter a name for your lookup.
4. Upload the CSV file.
5. Check whether your file has a header in the first row. If your file doesn't have a header, enter the header fields separated by commas.
6. Click Save.

You can now use the lookup file in your pipelines using the lookup function.
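
The header step above distinguishes between files whose first row names the fields and files that need the field names supplied separately. The sketch below shows the same distinction using Python's `csv` module. It does not reflect how the upload form parses files, and the `read_rows` helper and the sample data are hypothetical.

```python
import csv
import io

def read_rows(csv_text, header_fields=None):
    """Parse CSV text into a list of dicts.

    header_fields: comma-separated field names to use when the file has no
    header row. Leave as None when the first row is the header.
    """
    fieldnames = None
    if header_fields is not None:
        fieldnames = [name.strip() for name in header_fields.split(",")]
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fieldnames)
    return list(reader)

# File with a header row: field names come from the first line.
with_header = read_rows("code,description\n200,OK\n")

# File without a header row: field names are supplied separately.
without_header = read_rows("200,OK\n404,Not Found\n", header_fields="code, description")
```

If the field names are left out for a file that has no header, the first data row is consumed as the header, which is why this check matters.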

Update a CSV lookup

Follow these steps to upload a new version of a lookup table file.

Update a CSV lookup using the UI

1. From the UI, click Data Management > Lookups to go to the Lookups management page.
2. In the Lookups management page, find the lookup that you'd like to update.
3. Click on the name of the lookup, and then click the Edit lookup button.
4. Upload the new CSV file.
5. Check whether your file has a header in the first row. If your file doesn't have a header, enter the header fields separated by commas.
6. Click Save.
7. If you have any active pipelines currently using this lookup, restart those pipelines to use the latest version of the lookup.

You now have an updated lookup file that can be used in your pipelines with the lookup function.

Update a CSV lookup using the Streams API

1. Log in to SCloud. Copy and save the bearer token returned to a preferred location.
```
./scloud login --verbose
```
2. Upload the new CSV file that you want to use. Copy and save the id value returned to a preferred location.
```
curl -k --location --request POST 'https://<DSP_HOST>:31000/default/streams/v3beta1/files' \
--header 'Authorization: Bearer <my-bearer-token>' \
--form 'file=@/path/to/my/csv_file.csv'
```
This CSV file must contain the same schema, or headers, as the previous CSV file. If you want to use a CSV file containing a different schema, then you must create a new lookup connection.
3. Now that you've uploaded the new CSV file that you want to use, retrieve the id corresponding to the connection to the CSV file. Copy and save the id value returned to a preferred location.
```
curl -k --location --request GET 'https://<DSP_HOST>:31000/default/streams/v3beta1/connections' \
--header 'Authorization: Bearer <my-bearer-token>'
```
4. Modify the existing lookup connection to use the updated CSV file. Replace <connection_id> with the id from step 3 and <file_id> with the id from step 2.
```
curl -k -X PATCH "https://<DSP_HOST>:31000/default/streams/v3beta1/connections/<connection_id>" \
    -H "Authorization: Bearer <my-bearer-token>" \
    -H "Content-Type: application/json" \
    -d '{"data": {"file_id": "<file_id>"}}'
```
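
Because the replacement file must keep the same headers as the previous file, comparing the two header rows before uploading can catch a mismatch early. The sketch below assumes that "same schema" means identical header names in the same order, which may be stricter or looser than the actual check. The `header_of` and `same_schema` helpers are hypothetical.

```python
import csv
import io

def header_of(csv_text):
    """Return the first row of CSV text as a list of trimmed field names."""
    first_row = next(csv.reader(io.StringIO(csv_text)), [])
    return [field.strip() for field in first_row]

def same_schema(old_csv_text, new_csv_text):
    """True when both files have the same header fields in the same order.

    Assumption: order and exact names both matter.
    """
    return header_of(old_csv_text) == header_of(new_csv_text)

# Hypothetical old and new versions of a lookup file.
old_csv = "status_code,status_description\n200,OK\n"
new_csv = "status_code,status_description\n200,OK\n404,Not Found\n"
renamed_csv = "status_code,description\n200,OK\n"

compatible = same_schema(old_csv, new_csv)
incompatible = same_schema(old_csv, renamed_csv)
```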

The following table lists the full range of options available when you update the lookup connection.

JSON Parameter	Format	Description
check_for_new_connection_secs	integer	Optional. By default, the Data Stream Processor does not automatically check whether the CSV file has been updated. This option enables automatic updates and sets how frequently, in seconds, to check for updates to the CSV file. If the Data Stream Processor detects an update to the CSV file, any active pipelines using this CSV file automatically switch to the latest version of the CSV file. This value must be 30 seconds or greater. Set this to a higher value, such as 300 seconds (5 minutes), to decrease network traffic. If you have this setting enabled and your pipeline fails shortly after you update an in-use CSV lookup file, check whether you have exceeded the total allowed cache quota for your pipeline. The cumulative size of all CSV lookups in a single pipeline cannot exceed 50MiB. For example, in a single pipeline, you can use one 50MiB CSV lookup file or five 10MiB files. If you update your CSV file and exceed this quota, then your pipeline fails. To prevent this, make sure that the cumulative size of all CSV lookups in a single pipeline does not exceed 50MiB.
trim_edge_whitespace	boolean	Optional. Set to false if you do not want to trim leading and trailing whitespace from your file headers or data rows. Defaults to true.
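
As a sketch of how these options might be combined with the file id in the PATCH request body, the helper below builds the JSON and enforces the 30-second minimum. It assumes that the optional parameters sit inside the `data` object next to `file_id`. This page does not show their exact placement, so treat that as an assumption to verify. The `build_patch_body` helper is hypothetical.

```python
import json

MIN_CHECK_SECS = 30  # minimum value stated in the table above

def build_patch_body(file_id, check_for_new_connection_secs=None,
                     trim_edge_whitespace=None):
    """Build a JSON body for updating a lookup connection.

    Assumption: optional parameters are placed inside "data" next to file_id.
    """
    data = {"file_id": file_id}
    if check_for_new_connection_secs is not None:
        if check_for_new_connection_secs < MIN_CHECK_SECS:
            raise ValueError("check_for_new_connection_secs must be 30 or greater")
        data["check_for_new_connection_secs"] = check_for_new_connection_secs
    if trim_edge_whitespace is not None:
        data["trim_edge_whitespace"] = bool(trim_edge_whitespace)
    return json.dumps({"data": data})

# Placeholder file id, checking for updates every 5 minutes.
body = build_patch_body("<file_id>", check_for_new_connection_secs=300)
```

The resulting string could be passed as the `-d` argument of the PATCH command in place of the minimal body.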

You now have an updated lookup file that can be used in your pipelines with the lookup function.
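
The cumulative 50MiB quota described in the table is simple arithmetic, and checking it before updating an in-use file can help avoid a failed pipeline. The following sketch uses the two examples from the table plus one update that pushes the total over the limit. The `within_pipeline_quota` helper is hypothetical.

```python
MIB = 1024 * 1024
PIPELINE_QUOTA_BYTES = 50 * MIB  # cumulative limit per pipeline, from the table

def within_pipeline_quota(sizes_in_bytes, quota=PIPELINE_QUOTA_BYTES):
    """True when the combined size of all CSV lookups fits in the quota."""
    return sum(sizes_in_bytes) <= quota

one_large = within_pipeline_quota([50 * MIB])          # one 50MiB file
five_small = within_pipeline_quota([10 * MIB] * 5)     # five 10MiB files
# One of the five files grows to 11MiB after an update: 51MiB in total.
after_update = within_pipeline_quota([10 * MIB] * 4 + [11 * MIB])
```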

Delete unused lookup files

When you upload a new version of a lookup file, the Data Stream Processor does not automatically delete the old version of the lookup file. Follow these steps to delete any old versions of lookup files that are no longer in use.

1. Log in to SCloud. Copy and save the bearer token returned to a preferred location.
```
./scloud login --verbose
```
2. Run the following command to list all of the files uploaded to your tenant. Replace <my-bearer-token> with the token that you copied in step 1. Locate the file that you want to delete in the response, and save the associated ID to a preferred location.
```
curl -k --location --request GET 'https://<DSP_HOST>:31000/default/streams/v3beta1/files' \
--header 'Authorization: Bearer <my-bearer-token>'
```
3. Run the following command to delete the desired file. Replace <file_id> with the ID from the previous step.
```
curl -k -X DELETE 'https://<DSP_HOST>:31000/default/streams/v3beta1/lookups/files/<file_id>' \
-H "Authorization: Bearer <my-bearer-token>" \
-H "Content-Type: application/json"
```
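
If you script the cleanup instead of running curl by hand, the delete request can be assembled with Python's standard library. The sketch below only constructs the request and does not send it. The URL path and headers are copied from the command above, while the host, file id, and token values are placeholders, and the `build_delete_request` helper is hypothetical.

```python
import urllib.request

def build_delete_request(dsp_host, file_id, bearer_token):
    """Construct (but do not send) the DELETE request for a lookup file."""
    url = (
        f"https://{dsp_host}:31000/default/streams/v3beta1"
        f"/lookups/files/{file_id}"
    )
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )

# Placeholder values, for illustration only.
req = build_delete_request("dsp.example.com", "1234", "my-token")
```

Sending the request, for example with `urllib.request.urlopen`, is left out here. Note that the curl command uses `-k` to skip certificate verification, which this sketch does not reproduce.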
