Splunk® Data Stream Processor

Use the Data Stream Processor



DSP 1.2.1 is impacted by the CVE-2021-44228 and CVE-2021-45046 security vulnerabilities from Apache Log4j. To fix these vulnerabilities, you must upgrade to DSP 1.2.4. See Upgrade the Splunk Data Stream Processor to 1.2.4 for upgrade instructions.

On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Connect the Splunk Data Stream Processor to a Splunk Enterprise KV Store

You can connect the Splunk Data Stream Processor to a Splunk Enterprise KV Store to do the following.

  • Enrich streaming data with data from a Splunk Enterprise KV Store collection.
  • Write data from the Splunk Data Stream Processor to a Splunk Enterprise KV Store collection.

For both use cases, you must first connect the Splunk Data Stream Processor to a Splunk Enterprise KV Store.

The general workflow for using a KV Store lookup in the Splunk Data Stream Processor is to connect to a Splunk Enterprise KV Store collection on the Lookups page, and then use the lookup functions to either enrich your data with data from the connected KV Store or write data to the connected KV Store. You can use a single lookup connection across multiple pipelines.

The Splunk Data Stream Processor currently supports lookups to KV Store collections up to 10 GB in size or 6.5 million records, whichever is lower.

About KV Store collections

Before you create a KV Store lookup, you must have at least one KV Store collection defined in collections.conf in Splunk Enterprise. See Use configuration files to create a KV Store collection on the Splunk Developer Portal for information about creating a Splunk Enterprise KV Store collection.

A Splunk Enterprise KV Store collection is a database that stores your data as key-value pairs. When you create a KV Store lookup, the collection must have at least two fields. One of those fields should have a set of values that match with the values of a field in your event data, so that lookup matching can take place.

When you invoke the lookup in a pipeline with the lookup function, you designate a field in your incoming data to match with the field in your KV Store collection. When a value of this field matches a value of the designated field in your KV Store collection, the corresponding values for the other fields in your KV Store collection can be added to the outgoing data.
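The matching behavior described above can be pictured with a small sketch. This is not how the lookup function is implemented; it is an in-memory illustration, and the function name `enrich`, the field names, and the sample rows are all hypothetical.

```python
def enrich(record, collection, record_field, collection_field):
    """Return a copy of `record` with the other fields of the first matching row added.

    `collection` stands in for a KV Store collection: a list of dicts.
    A row matches when row[collection_field] equals record[record_field].
    """
    if record_field not in record:
        return dict(record)
    wanted = record[record_field]
    for row in collection:
        if collection_field in row and row[collection_field] == wanted:
            # Add every field from the matching row except the match field itself.
            extra = {k: v for k, v in row.items() if k != collection_field}
            return {**record, **extra}
    # No match: the record passes through unchanged.
    return dict(record)


# Hypothetical collection with two fields, as the text requires at minimum.
collection = [
    {"code": "200", "description": "OK"},
    {"code": "404", "description": "Not Found"},
]
event = {"status": "404", "path": "/missing"}
enriched = enrich(event, collection, "status", "code")
print(enriched)
```

Here the incoming field `status` is matched against the collection field `code`, and the `description` value from the matching row is added to the outgoing record.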

How the Splunk Data Stream Processor handles data types for Splunk Enterprise KV Store lookups

The Splunk Enterprise KV Store is a generic store of key-value data that supports a limited set of data types. Because the Splunk Data Stream Processor has a more robust typing system, KV Store types must be mapped to DSP types. To start, a lookup must determine the schema (data types) of the KV Store fields. It can do that in one of two ways.

  • (Recommended) The Splunk Data Stream Processor checks the Splunk Enterprise KV Store to see if a schema was defined for the KV Store collection in collections.conf.
  • As an emergency fallback, if the KV Store collection does not have a user-defined schema, the Splunk Data Stream Processor samples 10 records from the collection to try to infer the data types of its fields.

Data types are defined in the KV Store collection

To ensure the proper mapping of KV Store types to DSP types, the best practice is to define and enforce data types for the fields in the KV Store collection. To learn how to define and enforce data types for the KV Store collection, see Use configuration files to create a KV Store collection on the Splunk Developer Portal.

The following table describes the supported Splunk Enterprise KV Store data types and their DSP data type equivalents.

| Splunk Enterprise KV Store data type | DSP data type | Notes |
| --- | --- | --- |
| string | string | |
| number | double | By default, the Splunk Data Stream Processor maps the KV Store type number to the DSP type double. However, if the record field you are mapping to has a different numeric type, such as long or integer, then that numeric type is used instead. |
| bool | boolean | |
| time | double | The Splunk Data Stream Processor and Splunk Enterprise both use Unix epoch time, but represent and store timestamps in different ways. In Splunk Enterprise, time is defined in Unix epoch time and stored in a fixed point numeric format. When you store Splunk Enterprise time values in the Splunk Enterprise KV Store, they are saved as floating point numbers. In the Splunk Data Stream Processor, timestamp values are stored in the timestamp field in epoch time format in milliseconds. To convert the Splunk Enterprise time format to the DSP time format, use the parse_millis and parse_nanos functions. |
| array | collection<any> | |
| cidr | string | CIDR matching is not currently supported in the Splunk Data Stream Processor. |

The Splunk Enterprise KV Store defines two built-in fields for all records: _key and _user, both of type string. If you want to use either of those fields in a lookup, you'll need to include them in your schema definition in the collections.conf file.
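The mapping rules in the table can be summarized in a few lines of code. The following is only a sketch for reasoning about the rules, not product code: the names `KV_TO_DSP_TYPE`, `dsp_type_for`, and `epoch_seconds_to_millis` are hypothetical, the set of numeric types is limited to the ones named on this page, and the time helper assumes the stored floating point value is seconds since the Unix epoch. In a pipeline, use the parse_millis and parse_nanos functions mentioned above rather than this arithmetic.

```python
# KV Store type -> default DSP type, as listed in the table above.
KV_TO_DSP_TYPE = {
    "string": "string",
    "number": "double",
    "bool": "boolean",
    "time": "double",
    "array": "collection<any>",
    "cidr": "string",
}

# Numeric types named on this page; other numeric types are not covered by this sketch.
NUMERIC_TYPES = {"integer", "long", "double"}


def dsp_type_for(kv_type, record_field_type=None):
    """Return the DSP type for a KV Store type.

    For `number`, a numeric type already used by the record field wins over
    the default of double.
    """
    if kv_type not in KV_TO_DSP_TYPE:
        raise ValueError(f"unsupported KV Store type: {kv_type}")
    if kv_type == "number" and record_field_type in NUMERIC_TYPES:
        return record_field_type
    return KV_TO_DSP_TYPE[kv_type]


def epoch_seconds_to_millis(seconds):
    """Assumption: `seconds` is a float of seconds since the Unix epoch."""
    return round(seconds * 1000)


print(dsp_type_for("number"))          # default mapping
print(dsp_type_for("number", "long"))  # record field type takes precedence
print(epoch_seconds_to_millis(1647388800.25))
```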

Data types are not defined in the KV Store collection

If you do not define data types in the KV Store collection, the Splunk Data Stream Processor attempts to infer the data types of your KV Store collection fields. To do so, the lookup function samples 10 records from the KV Store collection. Because the KV Store collection supports a limited set of data types, the Splunk Data Stream Processor only attempts to infer whether the data type is a string, number, boolean, or array. If there are any type conflicts, an error is shown when you activate a pipeline containing this lookup.

If there are any fields where all of the sampled values are null, a subsequent query is sent to the KV Store asking for a row where that field has a non-null value. The type is inferred from the row returned. If no rows are returned, an error is shown when you activate a pipeline containing this lookup.

For the Splunk Data Stream Processor to infer the schema of the KV Store collection, the collection must have the following properties:

  • The data in the collection must have a consistent schema.
  • Fields in the collection must be consistently typed, and a field in the collection should not have two different types. If at least two rows contain the same field with different types, then the Splunk Data Stream Processor cannot infer the schema.
  • Fields in the collection must have at least one non-null value.
  • The collection cannot be empty.
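To make these requirements concrete, here is a minimal sketch of sample-based inference. It is an illustration under stated assumptions, not the product's algorithm: `infer_schema` and `kv_type_of` are hypothetical names, records are plain Python dicts, and a field that is null in every sampled record raises an error immediately instead of triggering the follow-up query described above.

```python
def kv_type_of(value):
    """Classify a non-null value as string, number, boolean, or array."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    raise TypeError(f"cannot infer a KV Store type for {value!r}")


def infer_schema(records, sample_size=10):
    """Infer {field: type} from the first `sample_size` records."""
    if not records:
        raise ValueError("the collection cannot be empty")
    schema = {}
    seen_fields = set()
    for record in records[:sample_size]:
        for field, value in record.items():
            seen_fields.add(field)
            if value is None:
                continue  # nulls carry no type information
            inferred = kv_type_of(value)
            if schema.setdefault(field, inferred) != inferred:
                raise ValueError(f"type conflict for field {field!r}")
    unresolved = sorted(seen_fields - set(schema))
    if unresolved:
        raise ValueError(f"no non-null value sampled for: {unresolved}")
    return schema


sample = [
    {"_key": "1", "code": 200, "ok": True, "tags": ["a"], "note": None},
    {"_key": "2", "code": 404, "ok": False, "tags": [], "note": "missing"},
]
schema = infer_schema(sample)
print(schema)
```

A collection where `code` is a number in one row and a string in another fails with a type conflict, which mirrors the consistency requirement in the list above.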

Connect to a Splunk Enterprise KV Store collection using the UI

You can connect to a Splunk Enterprise KV Store using either the UI or the Streams API. To perform a KV Store lookup or to write data to a KV Store, you must first create a connection to the Splunk Enterprise KV Store.

Prerequisites

  • A KV Store collection. For the Splunk Data Stream Processor to perform lookups to the KV Store collection, there are some additional prerequisites around what the KV Store collection can contain. See the "How the Splunk Data Stream Processor handles data types for Splunk Enterprise KV Store lookups" section on this page. If you have not created a KV Store collection, see Use configuration files to create a KV Store collection.

Steps

  1. From the UI, click Data Management > Lookups to go to the Lookups management page.
  2. In the Lookups management page, click Add lookup.
  3. Enter a name for your lookup. This is the name that you use in the lookup functions.
  4. (Optional) Provide a description for the lookup connection.
  5. Click Connect to Splunk Enterprise KV Store.
  6. Enter the URL for Splunk Enterprise REST API access. This is the URL to the Splunk Enterprise management port. For example:
    https://localhost:8089/
  7. Enter the username and password used to log in to Splunk Web.
  8. Select whether you want to use a lookup already defined in a Splunk Enterprise transforms.conf file. If not, skip to step 9.
    1. Provide the lookup name as defined in transforms.conf.
    2. Provide the application name that this lookup resides in. This is the application name where the transforms.conf file is located. The KV Store collection it uses will also be in this app. For example, $SPLUNK_HOME/etc/apps/yourappname/default/transforms.conf.
    3. Provide the username of the user that can access the lookup. For shared lookups, type nobody.
  9. (Optional) If you are not using a lookup already defined in Splunk Enterprise transforms.conf and just want to use the Splunk Enterprise KV Store directly, do the following:
    1. Provide the name of the collection as defined in collections.conf.
    2. Provide the application name that the collection resides in. This is the application name where the collections.conf file is located. For example, $SPLUNK_HOME/etc/apps/yourappname/default/collections.conf.
    3. Provide the username of the user that can access the collection. This value must be set to nobody.
  10. Choose a fallback behavior for active pipelines using this lookup in case there is an issue with this lookup later on.
    • Ignore this lookup if the connection fails.
      • Select this option if you want active pipelines to continue running even if there is an issue with the lookup connection.
    • Fail pipelines using this lookup if the connection fails.
      • Select this option if you want the Splunk Data Stream Processor to fail the pipeline if the lookup connection fails. Once the lookup connection is re-established, you must restart the pipeline to start processing data again.
  11. Click Save.

You can now use the lookup function to enrich incoming data with data from a Splunk Enterprise KV Store.

Connect to the Splunk Enterprise KV Store using the Streams API

The following example demonstrates how to use the Streams API to connect to a Splunk Enterprise KV Store.

Prerequisites

  • A KV Store collection. For the Splunk Data Stream Processor to perform lookups to the KV Store collection, there are some additional prerequisites around what the KV Store collection can contain. See the "How the Splunk Data Stream Processor handles data types for Splunk Enterprise KV Store lookups" section on this page. If you have not created a KV Store collection, see Use configuration files to create a KV Store collection.
  • A valid bearer token. You can get a bearer token by logging in to SCloud using the --verbose flag.
    scloud login --verbose

Steps if you want to connect to a KV Store lookup with a transforms.conf stanza

A transforms.conf KV Store lookup stanza provides the location of the KV Store collection that is to be used as a lookup table. Follow these steps if your transforms.conf file contains a KV Store lookup stanza for your KV Store collection.

  1. Enter the following from a command line. You must provide a name for your lookup in the Splunk Data Stream Processor (this can be different from the name of the lookup in Splunk Enterprise), the base_url, and the username and password used to log in to Splunk Enterprise. You must also provide the splunk_lookup_name, the splunk_lookup_app, and the splunk_lookup_user.
    curl -X POST "https://<DSP_HOST>:31000/default/streams/v3beta1/connections" \
        -H "Authorization: Bearer <token>" \
        -H "Content-Type: application/json" \
        -d '{"connectorId": "adb4705d-4fba-44e0-b1d2-3d8f1878b9a2",
         "name": "lookup_demo",
         "description": "",
         "data": {
            "base_url": "https://localhost:8089",
            "username": "admin",
            "password": "changeme",
            "splunk_lookup_name": "my_splunk_lookup",
            "splunk_lookup_app": "myapp",
            "splunk_lookup_user": "nobody"
            }
         }'
    

    If the application where the lookup is defined is different from the application where the collection was created, then you must additionally provide the collection_app and collection_user values.

    | Parameter | Type | Description |
    | --- | --- | --- |
    | base_url | String | The URL to the Splunk Enterprise REST API, including the management port. For example, https://localhost:8089/ |
    | username | String | The username used to log in to Splunk Enterprise. |
    | password | String | The password used to log in to Splunk Enterprise. |
    | splunk_lookup_name | String | The name of the lookup as defined in transforms.conf. For example, http_status. This field is required when integrating with an existing Splunk Enterprise lookup. |
    | splunk_lookup_app | String | The name of the application that this lookup resides in. This is the application name where the transforms.conf file is located. For example, $SPLUNK_HOME/etc/apps/yourappname/default/transforms.conf. This field is required when integrating with an existing Splunk Enterprise lookup. |
    | splunk_lookup_user | String | The username of the user that can access the lookup. For shared lookups, enter nobody. |
    | collection_app | String | The name of the app containing the KV Store collection. This is the application name where the collections.conf file is located. For example, $SPLUNK_HOME/etc/apps/yourappname/default/collections.conf. This field is optional when integrating with an existing Splunk Enterprise lookup. |
    | collection_user | String | Must be set to nobody. |

    In addition, the following optional tuning parameters are available through the Streams API. Use these optional parameters if you are experiencing throughput issues when receiving data from a KV Store into the lookup function.

    | Parameter | Type | Description |
    | --- | --- | --- |
    | cache_size | Long | The maximum cache size in bytes. Entries expire based on the cache_expiry_after_access and cache_expiry_after_write settings. Defaults to 10 MB. |
    | cache_expiry_after_access | Long | Expire a cache entry after this amount of time, in milliseconds, if there is no cache hit for the entry. This timer is reset each time the entry is accessed. Defaults to 0. If both cache_expiry_after_access and cache_expiry_after_write are configured, then the value of cache_expiry_after_write is ignored. |
    | cache_expiry_after_write | Long | Expire a cache entry after this amount of time, in milliseconds, after the entry is written. In other words, this sets a timer to expire a cache entry after it's updated. Defaults to 24 hours. After 24 hours, if the cached entry hasn't been updated, it is automatically removed from the cache. If both cache_expiry_after_access and cache_expiry_after_write are configured, then the value of cache_expiry_after_write is ignored. The Splunk Data Stream Processor does not globally invalidate the cache when a key in the KV Store collection is updated. This means that the cache might be partially invalidated and you might sometimes see stale values for the duration set in cache_expiry_after_write. |
    | batch_size | Long | The number of records to accumulate before performing a lookup. A batch accumulates until it reaches the target batch size or the time specified in batch_expiry elapses. Batching records decreases the number of network calls and improves overall throughput. Applies to both lookup reads and writes. Defaults to 1000. |
    | batch_expiry | Long | The maximum amount of time, in milliseconds, to wait before doing a batch lookup. Applies to both lookup reads and writes. Defaults to 1 second. |
    | kv_store_lookup_timeout_ms | Long | The maximum amount of time, in milliseconds, to wait for a response when reading or writing to a Splunk Enterprise KV Store. If the Splunk Data Stream Processor does not receive a response within this time, an error is thrown and the pipeline restarts unless fail_on_disconnect is set to false. Defaults to 9 seconds. |
    | fail_on_disconnect | Boolean | By default, active pipelines continue to run even if the lookup function loses its connection to the lookup service. Set to true to deactivate the pipeline if the lookup function cannot connect to the KV Store service. |
    | output_lookup_status | Boolean | Add an additional field to each record that specifies whether a lookup match was found. If you are using the lookup function, this field is named lookup_result and contains a map of a key-value pair where the key is the lookup name and the value is one of the following: ENRICHED, NO_MATCH, or CONNECTION_ERROR, depending on the lookup's success. If you are using the Write Thru KV Store function, this field is named lookup_write_result and contains a map of a key-value pair where the key is the lookup name and the value is one of the following: WRITE_SUCCESSFUL, WRITE_SKIPPED, or WRITE_ERROR. |
  2. (Optional) You can use the PATCH endpoint to update the connection to the KV Store. In the following example, we'll PATCH the KV Store connection we created in step 1 to add a batch_size of 10.
    curl -X PATCH "https://<DSP_HOST>:31000/default/streams/v3beta1/connections/adb4705d-4fba-44e0-b1d2-3d8f1878b9a2" \
        -H "Authorization: Bearer <token>" \
        -H "Content-Type: application/json" \
        -d '{
         "name": "lookup_demo",
         "description": "",
         "data": {
            "batch_size": "10"
                  }
             }'
    

    You must restart any pipelines that are using this lookup connection in order for these changes to be used in your pipeline.

You can now use the lookup function to enrich incoming data with data from a Splunk Enterprise KV Store.
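Hand-written JSON inside a shell command is easy to get subtly wrong, because a single missing comma makes the request body invalid. One option is to generate the body programmatically and pass it to curl or an HTTP client. The following sketch builds the same request body as the POST example in step 1. The helper name `build_lookup_connection` is hypothetical; the field names and the connectorId value are copied from the example above, and nothing here contacts a server.

```python
import json

# connectorId copied from the curl example on this page.
KV_STORE_CONNECTOR_ID = "adb4705d-4fba-44e0-b1d2-3d8f1878b9a2"


def build_lookup_connection(name, base_url, username, password,
                            lookup_name, lookup_app, lookup_user="nobody",
                            description=""):
    """Build the request body for a transforms.conf-based KV Store lookup connection."""
    return {
        "connectorId": KV_STORE_CONNECTOR_ID,
        "name": name,
        "description": description,
        "data": {
            "base_url": base_url,
            "username": username,
            "password": password,
            "splunk_lookup_name": lookup_name,
            "splunk_lookup_app": lookup_app,
            "splunk_lookup_user": lookup_user,
        },
    }


# json.dumps always emits syntactically valid JSON.
body = json.dumps(
    build_lookup_connection(
        name="lookup_demo",
        base_url="https://localhost:8089",
        username="admin",
        password="changeme",
        lookup_name="my_splunk_lookup",
        lookup_app="myapp",
    ),
    indent=2,
)
print(body)
```

The printed text can be saved to a file and sent with curl's `-d @file` form, which also keeps the password out of the shell history.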

Steps to use a Splunk Enterprise KV Store collection directly

If your transforms.conf does not contain a KV Store lookup stanza for your collection, you must tell the Splunk Data Stream Processor how to connect directly to the Splunk Enterprise KV Store that you want to use. To do so, follow these steps.

  1. Enter the following from a command line. You must provide a name for your lookup in the Splunk Data Stream Processor (this can be different from the name of the lookup in Splunk Enterprise), the base_url, and the username and password used to log in to Splunk Enterprise. You must also provide the collection_name, the collection_app, and the collection_user.
    curl -X POST "https://<DSP_HOST>:31000/default/streams/v3beta1/connections" \
        -H "Authorization: Bearer <token>" \
        -H "Content-Type: application/json" \
        -d '{"connectorId": "adb4705d-4fba-44e0-b1d2-3d8f1878b9a2",
         "name": "lookup_demo",
         "description": "",
         "data": {
            "base_url": "https://localhost:8089",
            "username": "admin",
            "password": "changeme",
            "collection_name": "demo_collection",
            "collection_app": "myapp",
            "collection_user": "nobody"
            }
         }'
    
    | Parameter | Type | Description |
    | --- | --- | --- |
    | base_url | String | The URL to the Splunk Enterprise REST API, including the management port. For example, https://localhost:8089/ |
    | username | String | The username used to log in to Splunk Enterprise. |
    | password | String | The password used to log in to Splunk Enterprise. |
    | collection_name | String | The name of the KV Store collection as defined in collections.conf. For example, http_status. |
    | collection_app | String | The name of the app containing the KV Store collection. This is the application name where the collections.conf file is located. For example, $SPLUNK_HOME/etc/apps/yourappname/default/collections.conf. |
    | collection_user | String | Must be set to nobody. This field is optional when integrating with an existing Splunk Enterprise lookup. |

    In addition, the following optional tuning parameters are available through the Streams API. Use these optional parameters if you are experiencing throughput issues when receiving data from a KV Store into the lookup function.

    | Parameter | Type | Description |
    | --- | --- | --- |
    | cache_size | Long | The maximum cache size in bytes. Entries expire based on the cache_expiry_after_access and cache_expiry_after_write settings. Defaults to 10 MB. |
    | cache_expiry_after_access | Long | Expire a cache entry after this amount of time, in milliseconds, if there is no cache hit for the entry. This timer is reset each time the entry is accessed. Defaults to 0. If both cache_expiry_after_access and cache_expiry_after_write are configured, then the value of cache_expiry_after_write is ignored. |
    | cache_expiry_after_write | Long | Expire a cache entry after this amount of time, in milliseconds, after the entry is written. In other words, this sets a timer to expire a cache entry after it's updated. Defaults to 24 hours. After 24 hours, if the cached entry hasn't been updated, it is automatically removed from the cache. If both cache_expiry_after_access and cache_expiry_after_write are configured, then the value of cache_expiry_after_write is ignored. The Splunk Data Stream Processor does not globally invalidate the cache when a key in the KV Store collection is updated. This means that the cache might be partially invalidated and you might sometimes see stale values for the duration set in cache_expiry_after_write. |
    | batch_size | Long | The number of records to accumulate before performing a lookup. A batch accumulates until it reaches the target batch size or the time specified in batch_expiry elapses. Batching records decreases the number of network calls and improves overall throughput. Applies to both lookup reads and writes. Defaults to 1000. |
    | batch_expiry | Long | The maximum amount of time, in milliseconds, to wait before doing a batch lookup. Applies to both lookup reads and writes. Defaults to 1 second. |
    | kv_store_lookup_timeout_ms | Long | The maximum amount of time, in milliseconds, to wait for a response when reading or writing to a Splunk Enterprise KV Store. If the Splunk Data Stream Processor does not receive a response within this time, an error is thrown and the pipeline restarts unless fail_on_disconnect is set to false. Defaults to 9 seconds. |
    | fail_on_disconnect | Boolean | By default, active pipelines continue to run even if the lookup function loses its connection to the lookup service. Set to true to deactivate the pipeline if the lookup function cannot connect to the KV Store service. |
    | output_lookup_status | Boolean | Add an additional field to each record that specifies whether a lookup match was found. If you are using the lookup function, this field is named lookup_result and contains a map of a key-value pair where the key is the lookup name and the value is one of the following: ENRICHED, NO_MATCH, or CONNECTION_ERROR, depending on the lookup's success. If you are using the Write Thru KV Store function, this field is named lookup_write_result and contains a map of a key-value pair where the key is the lookup name and the value is one of the following: WRITE_SUCCESSFUL, WRITE_SKIPPED, or WRITE_ERROR. |
  2. (Optional) You can use the PATCH endpoint to update the connection to the KV Store. In the following example, we'll PATCH the KV Store connection we created in step 1 to add a batch_size of 10.
    curl -X PATCH "https://<DSP_HOST>:31000/default/streams/v3beta1/connections/adb4705d-4fba-44e0-b1d2-3d8f1878b9a2" \
        -H "Authorization: Bearer <token>" \
        -H "Content-Type: application/json" \
        -d '{
         "name": "lookup_demo",
         "description": "",
         "data": {
            "batch_size": "10"
                  }
             }'
    

    You must restart any pipelines that are using this lookup connection in order for these changes to be used in your pipeline.
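When choosing values for batch_size and batch_expiry, it can help to picture how the two limits interact: a batch is sent when it is full or when it has waited long enough, whichever happens first. The sketch below models that rule only. The `Batcher` class is hypothetical, time is passed in explicitly so the behavior is deterministic, and starting the expiry timer at the first record of each batch is an assumption, since this page does not say when the timer starts.

```python
class Batcher:
    """Accumulate records until batch_size is reached or batch_expiry_ms elapses."""

    def __init__(self, batch_size=1000, batch_expiry_ms=1000):
        self.batch_size = batch_size
        self.batch_expiry_ms = batch_expiry_ms
        self._records = []
        self._first_at_ms = None  # arrival time of the oldest buffered record

    def add(self, record, now_ms):
        """Buffer a record; return a batch if one is due, otherwise None."""
        if not self._records:
            self._first_at_ms = now_ms
        self._records.append(record)
        return self._flush_if_due(now_ms)

    def poll(self, now_ms):
        """Check the expiry timer without adding a record."""
        return self._flush_if_due(now_ms)

    def _flush_if_due(self, now_ms):
        if not self._records:
            return None
        full = len(self._records) >= self.batch_size
        expired = now_ms - self._first_at_ms >= self.batch_expiry_ms
        if full or expired:
            batch, self._records = self._records, []
            self._first_at_ms = None
            return batch
        return None


batcher = Batcher(batch_size=3, batch_expiry_ms=1000)
results = [
    batcher.add("a", now_ms=0),
    batcher.add("b", now_ms=10),
    batcher.add("c", now_ms=20),   # third record fills the batch
    batcher.add("d", now_ms=30),
    batcher.poll(now_ms=500),      # not full, not expired yet
    batcher.poll(now_ms=1030),     # 1000 ms after "d" arrived
]
print(results)
```

With a small batch_size such as the 10 used in the PATCH example, batches fill quickly and each lookup call carries few records; with the default of 1000, low-volume streams are more likely to be flushed by batch_expiry instead.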

You can now use the lookup function to enrich incoming data with data from a Splunk Enterprise KV Store.
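If output_lookup_status is enabled, the lookup_result field described in the parameter table can be used to check how often the lookup matches. The following sketch tallies the status values over a handful of records. It assumes the records have already been exported as Python dicts; the function name `tally_lookup_status` and the sample records are hypothetical, while the field name and the status values come from the table above.

```python
from collections import Counter


def tally_lookup_status(records, lookup_name, field="lookup_result"):
    """Count status values for one lookup across records.

    Each record is expected to carry a map such as
    {"lookup_demo": "ENRICHED"} in the `field` key. Records without
    that map, or without an entry for `lookup_name`, are skipped.
    """
    counts = Counter()
    for record in records:
        status = (record.get(field) or {}).get(lookup_name)
        if status is not None:
            counts[status] += 1
    return counts


# Hypothetical sample records.
records = [
    {"status": "200", "lookup_result": {"lookup_demo": "ENRICHED"}},
    {"status": "404", "lookup_result": {"lookup_demo": "ENRICHED"}},
    {"status": "999", "lookup_result": {"lookup_demo": "NO_MATCH"}},
    {"status": "500"},  # produced before output_lookup_status was enabled
]
counts = tally_lookup_status(records, "lookup_demo")
print(dict(counts))
```

For the Write Thru KV Store function, the same helper can be pointed at the lookup_write_result field by passing `field="lookup_write_result"`.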

Last modified on 16 March, 2022

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

