Splunk® Data Stream Processor

Use the Data Stream Processor


About lookup cache quotas

You can cache the contents of a lookup to improve lookup performance. However, there are some limitations that you should be aware of. These limitations apply only when you use the Lookup function, not the Write Thru KV Store function.

The lookup cache is subject to a quota, or a maximum amount of data that can be contained, per pipeline. The following table describes the cache quota that applies for each type of lookup.

Lookup type    Default cache quota per pipeline
CSV            50 MiB
KV Store       200 MiB

Although each lookup type has its own cache quota, the percentage of the quota that can be used is shared between CSV and KV Store lookups. For example, if 30% of the CSV cache quota is used, then only 70% of the KV Store cache quota remains available. As another example, suppose you have the following:

  • A pipeline with four lookup functions: two lookups to CSV files and two lookups to KV Stores.
  • CSV files that are 10MiB and 20MiB in size.

In this case, you have used 30MiB of the 50MiB CSV cache quota, or 60%. That leaves 40% of the KV Store cache quota, or 80MiB, for KV Store lookups (0.4 * 200MiB = 80MiB). To stay under the cache size limits, set the cache_size parameter for each KV Store connection to 40MiB (41943040 bytes) or less. Because the pipeline has two KV Store lookups, this adds up to 80MiB, or 83886080 bytes.
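The arithmetic in this example can be sketched as a small shell function. This is only an illustration of the shared-percentage rule described above, using the default quotas from the table. The function name kv_remaining_bytes and its argument order are hypothetical and are not part of the product.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the product): remaining KV Store
# cache quota in bytes, assuming the shared-percentage rule above.
MIB=1048576

# Usage: kv_remaining_bytes CSV_QUOTA_MIB KV_QUOTA_MIB [CSV_FILE_SIZE_MIB...]
kv_remaining_bytes() {
  local csv_quota=$1 kv_quota=$2
  shift 2
  local used=0 size
  # Add up the sizes of all CSV files used by the pipeline.
  for size in "$@"; do
    used=$((used + size))
  done
  if [ "$used" -gt "$csv_quota" ]; then
    echo "CSV files exceed the CSV cache quota" >&2
    return 1
  fi
  # The unused fraction of the CSV quota is applied to the KV Store quota.
  echo $(( kv_quota * MIB * (csv_quota - used) / csv_quota ))
}

# Defaults from the table (50 MiB CSV, 200 MiB KV Store), two CSV files.
total=$(kv_remaining_bytes 50 200 10 20)
echo "KV Store quota remaining: $total bytes"
echo "Per KV Store lookup (2 lookups): $((total / 2)) bytes"
```

With the values from the example, the script reports 83886080 bytes remaining in total and 41943040 bytes for each of the two KV Store lookups.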

Configure the maximum lookup cache quota

To ensure that pipelines that use a lookup are not cancelled, make sure that all lookup results fit into the cache. Two settings control the lookup cache quota.

  • The K8S_SS_REST_LOOKUP_QUOTA_MAX_STATIC_MB setting specifies the maximum cache quota per pipeline for CSV lookups.
  • The K8S_SS_REST_LOOKUP_QUOTA_MAX_CACHED_BYTES setting specifies the maximum cache quota per pipeline for KV Store lookups.

In addition to the two settings above, you may need to update the Kubernetes memory settings to make sure that you have enough memory to support the desired cache quotas.

  • The K8S_FLINK_TASK_MGR_MEM_REQUEST setting specifies the minimum amount of memory assigned to the pod.
  • The K8S_FLINK_TASK_MGR_MEM_LIMIT setting specifies the maximum amount of memory assigned to the pod.
  • The K8S_FLINK_TASK_MGR_HEAP_MB setting specifies the heap size for the pod.

Configure the cache quota for CSV lookups

Complete the following steps to increase the cache quota for CSV lookups. This allows you to upload CSV files larger than 50MiB using the Streams API.

  1. Configure the cache quota for CSV lookups by running the following command from the command line. The value must be in mebibytes (MiB).
    ./set-config K8S_SS_REST_LOOKUP_QUOTA_MAX_STATIC_MB <value>
  2. Increase the heap size. As a best practice, increase the heap size by at least four times the size of the file. For example, if your heap is currently set to 3000 (3GB) and you are increasing the lookup cache quota to 100MB, set the heap size to at least 3400 MB.
    ./set-config K8S_FLINK_TASK_MGR_HEAP_MB <value> 
  3. Since you are increasing the cache quota, you should also increase the minimum amount of memory allocated to a pod accordingly. For a list of accepted memory sizes, see the "Managing Resources for Containers" section in the Kubernetes documentation.
    ./set-config K8S_FLINK_TASK_MGR_MEM_REQUEST <value>
  4. Since you are increasing the cache quota, you should also increase the maximum amount of memory allocated to a pod accordingly. For a list of accepted memory sizes, see the "Managing Resources for Containers" section in the Kubernetes documentation.
    ./set-config K8S_FLINK_TASK_MGR_MEM_LIMIT <value>
  5. Deploy your changes.
    ./deploy
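The steps above can be sketched as a dry run that prints the commands instead of executing them. The helper names csv_heap_mb and print_csv_quota_commands are hypothetical. The heap calculation assumes the "at least 4x" guidance from step 2 is applied to the new quota value, as in that step's example. The memory request and limit are left as <value> placeholders because this topic does not give a formula for them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggested heap size (MB) after raising the CSV
# cache quota, assuming the "at least 4x" guidance from step 2.
csv_heap_mb() {
  local current_heap_mb=$1 new_quota_mb=$2
  echo $(( current_heap_mb + 4 * new_quota_mb ))
}

# Dry run: print the commands for review rather than running them.
# Usage: print_csv_quota_commands CURRENT_HEAP_MB NEW_QUOTA_MB
print_csv_quota_commands() {
  local current_heap_mb=$1 new_quota_mb=$2
  echo "./set-config K8S_SS_REST_LOOKUP_QUOTA_MAX_STATIC_MB $new_quota_mb"
  echo "./set-config K8S_FLINK_TASK_MGR_HEAP_MB $(csv_heap_mb "$current_heap_mb" "$new_quota_mb")"
  # No formula is given for these two, so they stay as placeholders.
  echo "./set-config K8S_FLINK_TASK_MGR_MEM_REQUEST <value>"
  echo "./set-config K8S_FLINK_TASK_MGR_MEM_LIMIT <value>"
  echo "./deploy"
}

# Example from step 2: heap currently 3000, quota raised to 100.
print_csv_quota_commands 3000 100
```

Review the printed commands and fill in the memory values for your environment before running them.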

Even if you increase the CSV cache quota, the UI still limits CSV file uploads to a maximum file size of 50MB. To upload a larger CSV file, use the Streams API. See the topic about uploading a CSV file to enrich data with a lookup.

Configure the cache quota for KV Store lookups

  1. Configure the cache quota for KV Store lookups by running the following command from the command line. The value must be in bytes.
    ./set-config K8S_SS_REST_LOOKUP_QUOTA_MAX_CACHED_BYTES <value>
  2. Increase the heap size. As a best practice, increase the heap size by the same amount that you are increasing the KV Store lookup cache size. For example, if you increase the K8S_SS_REST_LOOKUP_QUOTA_MAX_CACHED_BYTES value by 500MiB, increase the heap size by at least 500MiB.
    ./set-config K8S_FLINK_TASK_MGR_HEAP_MB <value> 
  3. Since you are increasing the cache quota, you should also increase the minimum amount of memory allocated to a pod accordingly. For a list of accepted memory sizes, see the "Managing Resources for Containers" section in the Kubernetes documentation.
    ./set-config K8S_FLINK_TASK_MGR_MEM_REQUEST <value>
  4. Since you are increasing the cache quota, you should also increase the maximum amount of memory allocated to a pod accordingly. For a list of accepted memory sizes, see the "Managing Resources for Containers" section in the Kubernetes documentation.
    ./set-config K8S_FLINK_TASK_MGR_MEM_LIMIT <value>
  5. Deploy your changes.
    ./deploy
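Because the KV Store quota is set in bytes while quotas are usually discussed in MiB, a conversion step is easy to get wrong. The following dry run is a sketch of the steps above that prints the commands instead of executing them. The helper names mib_to_bytes and print_kv_quota_commands are hypothetical, and the script assumes that the heap setting can be raised by the same number as the quota increase in MiB. The memory request and limit are left as <value> placeholders.

```shell
#!/usr/bin/env bash
MIB=1048576

# Convert a size in MiB to bytes.
mib_to_bytes() {
  echo $(( $1 * MIB ))
}

# Dry run: print the commands for review rather than running them.
# Usage: print_kv_quota_commands CURRENT_QUOTA_MIB NEW_QUOTA_MIB CURRENT_HEAP_MB
print_kv_quota_commands() {
  local current_quota_mib=$1 new_quota_mib=$2 current_heap_mb=$3
  # Step 2: grow the heap by the same amount as the quota increase
  # (assumption: the heap value and the MiB increase use the same scale).
  local increase=$(( new_quota_mib - current_quota_mib ))
  echo "./set-config K8S_SS_REST_LOOKUP_QUOTA_MAX_CACHED_BYTES $(mib_to_bytes "$new_quota_mib")"
  echo "./set-config K8S_FLINK_TASK_MGR_HEAP_MB $(( current_heap_mb + increase ))"
  # No formula is given for these two, so they stay as placeholders.
  echo "./set-config K8S_FLINK_TASK_MGR_MEM_REQUEST <value>"
  echo "./set-config K8S_FLINK_TASK_MGR_MEM_LIMIT <value>"
  echo "./deploy"
}

# Example: raise the quota from the default 200 MiB by 500 MiB,
# with the heap currently set to 3000.
print_kv_quota_commands 200 700 3000
```

In this example, the quota becomes 734003200 bytes (700 MiB) and the heap value becomes 3500.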
Last modified on 19 January, 2021

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.1

