The SmartStore cache manager

Each indexer incorporates a cache manager that manages the SmartStore data in local storage. The cache manager attempts to maintain in local storage any data that is likely to participate in future searches. By caching the search working set, the cache manager minimizes the potential of search delays resulting from data being downloaded from the remote store.

Each indexer's cache manager operates independently from those on other indexers.

The cache manager performs these functions:

It copies buckets from local to remote storage when the buckets roll from hot to warm.
It fetches bucket files from remote storage into the local storage cache when a search needs the files.
It evicts files from the local storage cache when they are no longer likely to be needed for future searches.

How the cache manager handles hot buckets rolling to warm

When a hot bucket rolls to warm, the cache manager uploads a copy of the bucket to the remote storage. Once uploaded to remote storage, the bucket is eligible for eviction from the local cache. This process is described in detail in How indexing works in SmartStore.

Once a bucket is uploaded to remote storage, the files in it do not change, but a few files can be added to it later. For example, the cache manager might upload report acceleration summaries or delete journals as they are created on the indexer.

How the cache manager fetches buckets

The cache manager fetches bucket files from remote storage to fulfill search requests.

When the cache manager fetches a file from remote storage, it copies the file to the local cache. The master copy of the file remains on remote storage.

The cache manager does not always fetch entire buckets from remote storage. Instead, it can fetch individual bucket files if the search might not need the entire bucket. For example, by fetching the bloomfilter file first, the cache manager can, in some cases, avoid the need to fetch the rest of the bucket. Similarly, some searches, such as metadata and tstats, do not need the rawdata journal.gz file. Others, such as dbinspect and eventcount, do not need any bucket content.

The cache manager attempts to prefetch buckets during a search, so that each bucket is already local by the time the search is ready to access it. When prefetching buckets, the cache manager uses a heuristic that adjusts according to how the bloomfilter and tsidx files in other buckets eliminate results. Accordingly, the cache manager might prefetch either entire buckets or just individual files within buckets. For example, during a search, if the search on several previous buckets did not require the tsidx files, the cache manager does not prefetch tsidx files as it continues to process additional buckets.

When prefetching, the cache manager notes how long the search process is taking per bucket, and it prefetches buckets at a speed that allows the search to continue uninterrupted without the need to wait for a bucket fetch to complete. The cache manager also adjusts its speed to avoid prefetching an excessive number of buckets.

How the cache manager evicts buckets

The cache manager attempts to minimize the use of local storage by retaining those buckets and files that have a high likelihood of participating in a future search and evicting the rest. When the cache manager evicts a bucket, it removes the copy of that bucket residing on the cache. Because cached buckets are local copies of buckets whose master copies reside in remote storage, the eviction process does not result in loss of data.

When a bucket is evicted from the cache, its directory remains in the cache, but the directory is now empty. The .bucketManifest file for the bucket's index also retains metadata for the bucket.

The cache manager does not necessarily evict all files in a bucket. It favors evicting large files, such as the rawdata journal and the tsidx files, while leaving small files, such as bloomfilter and metadata, in the cache. By doing so, the cache manager can reduce the need to fetch the large files from remote storage. For example, inspection of a bucket's bloomfilter file at the start of a search can sometimes eliminate the need to search the bucket's data, therefore avoiding the need to fetch the bucket's rawdata journal and tsidx files. This behavior is configurable through the cache manager recency settings. See Set cache retention periods based on recency.

The cache manager operates according to a configurable policy. The default policy is "lru", which tells the cache manager to evict the least recently used bucket. To learn about the set of eviction policies available, see Set the eviction policy.

Configure the cache manager

See Configure the SmartStore cache manager.

Related answers from Splunk Community

The SmartStore cache manager

How the cache manager handles hot buckets rolling to warm

How the cache manager fetches buckets

How the cache manager evicts buckets

Configure the cache manager

Comments

The SmartStore cache manager

Was this topic useful?