Splunk® Data Stream Processor

Use the Data Stream Processor



DSP 1.2.1 is impacted by the CVE-2021-44228 and CVE-2021-45046 security vulnerabilities from Apache Log4j. To fix these vulnerabilities, you must upgrade to DSP 1.2.4. See Upgrade the Splunk Data Stream Processor to 1.2.4 for upgrade instructions.

On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Upload a CSV file to the Splunk Data Stream Processor to enrich data with a lookup

CSV lookups are file-based lookups that match field values from your events to field values in the static table represented by a CSV file, and then output the corresponding field values from the table to your data. They are also referred to as static lookups. Use CSV lookups to enrich your streaming data by adding fields from CSV files.
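
To make the matching behavior concrete, the following shell sketch imitates a CSV lookup with awk. It is an illustration only and does not show how the Splunk Data Stream Processor implements lookups. The sample data, the field names (host, location), and the function name enrich are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: imitate a CSV lookup with awk.
# The table and the events are hypothetical sample data.

# Lookup table: a header row, then one row per known host.
table=$(mktemp)
trap 'rm -f "$table"' EXIT
cat > "$table" <<'EOF'
host,location
web-01,us-east
web-02,eu-west
EOF

# enrich TABLE: read "host,message" events on stdin and append the
# matching location from TABLE, or "unknown" when no row matches.
enrich() {
  awk -F',' -v OFS=',' '
    NR == FNR { if (FNR > 1) loc[$1] = $2; next }   # load table, skip header
    { print $1, $2, (($1 in loc) ? loc[$1] : "unknown") }
  ' "$1" -
}

printf 'web-01,login ok\nweb-03,timeout\n' | enrich "$table"
```

The first event matches a row in the table and gains the value us-east. The second event has no matching row, so the sketch fills in unknown.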

CSV lookups are best for small sets of data. The general workflow for creating a CSV lookup in the Splunk Data Stream Processor is to upload a file in the Lookups Management page and then invoke the CSV file using the lookup function.

Lookup table files

Lookup table files are files that contain a lookup table. A standard lookup pulls fields out of this table and adds them to your records when corresponding fields in the table are matched in your records.

A single lookup table file can be used by multiple pipelines.

Upload the lookup table file

To use a CSV lookup, you must first upload a lookup table file to the Splunk Data Stream Processor.

Prerequisites

  • An available .csv file. The maximum file size is 50MiB.
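
As a convenience, you can check the file size locally before you upload. The following is a minimal sketch; the function name within_size_limit and the variable MAX_LOOKUP_BYTES are hypothetical, and the check treats a file of exactly 50MiB as allowed.

```shell
#!/usr/bin/env bash
# 50 MiB expressed in bytes (50 * 1024 * 1024).
MAX_LOOKUP_BYTES=$((50 * 1024 * 1024))

# within_size_limit FILE: succeed if FILE exists and is at most
# MAX_LOOKUP_BYTES bytes. Returns 2 if FILE is not a regular file.
within_size_limit() {
  local file=$1 size
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 2; }
  # Arithmetic expansion strips the padding that some wc builds emit.
  size=$(( $(wc -c < "$file") ))
  [ "$size" -le "$MAX_LOOKUP_BYTES" ]
}
```

For example, `within_size_limit /path/to/my/csv_file.csv && echo "OK to upload"` prints the message only when the file is within the limit.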

Steps

  1. From the UI, click Data Management > Lookups to go to the Lookups management page.
  2. In the Lookups management page, click Add lookup.
  3. Enter a name for your lookup.
  4. Upload the CSV file.
  5. Check whether your file has a header in the first row. If your file doesn't have a header, enter the header fields separated by commas.
  6. Click Save.

You can now use the lookup file in your pipelines using the lookup function.
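
If you are not sure whether your file starts with a header row, you can inspect it locally before step 5. The following sketch prints the first row and counts its columns, which tells you how many header fields to enter if the file has no header. The function names are hypothetical, and the column count assumes that no field in the first row contains a quoted comma.

```shell
#!/usr/bin/env bash
# show_first_row FILE: print the first row so you can judge whether
# it is a header (field names) or data.
show_first_row() {
  head -n 1 "$1"
}

# column_count FILE: print the number of comma-separated fields in
# the first row. Naive split: assumes no quoted commas in that row.
column_count() {
  head -n 1 "$1" | awk -F',' '{ print NF }'
}
```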

Update a CSV lookup

Follow these steps to upload a new version of a lookup table file.

Update a CSV lookup using the UI

  1. From the UI, click Data Management > Lookups to go to the Lookups management page.
  2. In the Lookups management page, find the lookup that you'd like to update.
  3. Click on the name of the lookup, and then click the Edit lookup button.
  4. Upload the new CSV file.
  5. Check whether your file has a header in the first row. If your file doesn't have a header, enter the header fields separated by commas.
  6. Click Save.
  7. If you have any active pipelines currently using this lookup, restart those pipelines to use the latest version of the lookup.

You now have an updated lookup file that can be used in your pipelines with the lookup function.

Update a CSV lookup using the Streams API

  1. Log in to SCloud. Copy and save the bearer token returned to a preferred location.
    ./scloud login --verbose
  2. Upload the new CSV file that you want to use. Copy and save the id value returned to a preferred location.
    curl -k  --location --request POST 'https://<DSP_HOST>:31000/default/streams/v3beta1/files' \
    --header 'Authorization: Bearer <my-bearer-token>' \
    --form 'file=@/path/to/my/csv_file.csv'
    

    This CSV file must contain the same schema, or headers, as the previous CSV file. If you want to use a CSV file containing a different schema, then you must create a new lookup connection.

  3. Now that you've uploaded the new CSV file that you want to use, retrieve the id corresponding to the connection to the CSV file. Copy and save the id value returned to a preferred location.
    curl -k  --location --request GET 'https://<DSP_HOST>:31000/default/streams/v3beta1/connections' \
    --header 'Authorization: Bearer <my-bearer-token>' 
    
  4. Modify the existing lookup connection to use the updated CSV file. Replace connection_id with the id from step 3 and file_id with the id from step 2.
    curl -k -X PATCH "https://<DSP_HOST>:31000/default/streams/v3beta1/connections/<connection_id>" \
        -H "Authorization: Bearer <my-bearer-token>" \
        -H "Content-Type: application/json" \
        -d '{"data": {"file_id": "<file_id>"}}'
    

    The following list describes the full range of options available when you update the lookup connection.

    JSON Parameter: check_for_new_connection_secs
    Format: integer
    Description: Optional. By default, the Splunk Data Stream Processor does not automatically check whether the CSV file has been updated. This option enables automatic updates and sets how frequently, in seconds, to check for updates to the CSV file. If the Splunk Data Stream Processor detects that the CSV file has been updated, any active pipelines using this CSV file automatically switch to the latest version of the file. This value must be 30 seconds or greater. Set this to a higher value, such as 300 seconds (5 minutes), to decrease network traffic.

    If you have this setting enabled and your pipeline fails shortly after you update an in-use CSV lookup file, check whether you have exceeded the total allowed cache quota for your pipeline. The cumulative size of all CSV lookups in a single pipeline cannot exceed 50MiB. For example, in a single pipeline, you can use one 50MiB CSV lookup file or five 10MiB files. If you update your CSV file and exceed this quota, your pipeline fails. To prevent this, make sure that the cumulative size of all CSV lookups in a single pipeline does not exceed 50MiB.

    JSON Parameter: trim_edge_whitespace
    Format: boolean
    Description: Optional. Set to false if you do not want to trim leading and trailing whitespace from your file headers or data rows. Defaults to true.
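
The following sketch builds a request body for the PATCH request in step 4 that includes the optional parameters. It assumes that the optional parameters are placed next to file_id inside the data object; this topic does not show their exact position, so verify the placement against your deployment. The function name build_patch_body is hypothetical.

```shell
#!/usr/bin/env bash
# build_patch_body FILE_ID [CHECK_SECS] [TRIM]
# Print a JSON body for the PATCH request in step 4.
# ASSUMPTION: the optional parameters sit next to file_id inside "data".
# FILE_ID is inserted as-is; it must not contain quotes or backslashes.
build_patch_body() {
  local file_id=$1 secs=${2:-} trim=${3:-}
  local body="\"file_id\": \"$file_id\""

  if [ -n "$secs" ]; then
    case $secs in
      *[!0-9]*)
        echo "check_for_new_connection_secs must be an integer" >&2
        return 1 ;;
    esac
    secs=$((10#$secs))            # force base 10, drop leading zeros
    if [ "$secs" -lt 30 ]; then
      echo "check_for_new_connection_secs must be 30 or greater" >&2
      return 1
    fi
    body="$body, \"check_for_new_connection_secs\": $secs"
  fi

  if [ -n "$trim" ]; then
    case $trim in
      true|false) body="$body, \"trim_edge_whitespace\": $trim" ;;
      *)
        echo "trim_edge_whitespace must be true or false" >&2
        return 1 ;;
    esac
  fi

  printf '{"data": {%s}}\n' "$body"
}
```

You can then pass the result to the curl command from step 4 in place of the literal body, for example `-d "$(build_patch_body "<file_id>" 300)"`.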

You now have an updated lookup file that can be used in your pipelines with the lookup function.
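
Two of the constraints in this procedure can be checked locally before you upload a new file: the new CSV file must have the same headers as the previous one, and the CSV lookups in a single pipeline cannot exceed 50MiB in total. The following is a minimal sketch under these assumptions: both files have a header in the first row, "same schema" is interpreted strictly as an identical first row, and you know which files a pipeline uses. The function names and the LOOKUP_QUOTA_BYTES variable are hypothetical.

```shell
#!/usr/bin/env bash
# same_header OLD NEW: succeed if both CSV files have an identical
# first row. Strict comparison: field order and spacing must match.
same_header() {
  [ "$(head -n 1 "$1")" = "$(head -n 1 "$2")" ]
}

# total_within_quota FILE...: succeed if the files together are at
# most 50 MiB. LOOKUP_QUOTA_BYTES (hypothetical) overrides the limit.
total_within_quota() {
  local limit=${LOOKUP_QUOTA_BYTES:-$((50 * 1024 * 1024))}
  local total=0 f
  for f in "$@"; do
    total=$(( total + $(wc -c < "$f") ))
  done
  [ "$total" -le "$limit" ]
}
```

For example, `same_header old.csv new.csv && total_within_quota new.csv other_lookup.csv` succeeds only when the headers match and the combined size is within the quota.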

Deleting unused lookup files

After 24 hours, the Splunk Data Stream Processor automatically deletes any unused lookup files. This means that any files that are not associated with an existing lookup connection, and any old versions of a lookup file, are automatically deleted from the system.

If you want to modify how frequently unused lookup files are deleted, perform the following steps.

  1. Run the following command to configure how frequently you want to delete unused lookup files. Set to 0 to disable automatic cleanup of lookup files.
    ./set-config K8S_PIPELINES_DATA_FILE_CLEANUP_FREQUENCY_IN_HRS <value>
    
  2. Deploy your changes.
    ./deploy
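
The two steps above can be wrapped in a small helper that validates the value first. This is a sketch that assumes the value is a whole number of hours, which is inferred from the setting name and from the fact that 0 disables cleanup. The function names are hypothetical, and the helper must be run from the directory that contains set-config and deploy.

```shell
#!/usr/bin/env bash
# valid_hours VALUE: succeed if VALUE is a non-negative whole number.
valid_hours() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# set_cleanup_hours VALUE: validate VALUE, then run the two commands
# from the steps above. ASSUMPTION: VALUE is a whole number of hours.
set_cleanup_hours() {
  valid_hours "$1" || {
    echo "value must be a non-negative whole number" >&2
    return 1
  }
  ./set-config K8S_PIPELINES_DATA_FILE_CLEANUP_FREQUENCY_IN_HRS "$1" && ./deploy
}
```

For example, `set_cleanup_hours 48` would set the cleanup frequency to 48 and deploy the change, and `set_cleanup_hours 0` would disable automatic cleanup.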
Last modified on 01 July, 2021
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

