How search works in SmartStore

SmartStore is optimized for certain characteristics that are common to the great majority of Splunk platform searches. Specifically, most searches have these characteristics:

They occur over near-term data. 97% of searches look back 24 hours or less.
They have spatial and temporal locality. If a search finds an event at a specific time or in a specific log, it's likely that other searches will look for events within a closely similar time range or in that log.

The cache manager favors recently created buckets and recently accessed buckets, attempting to ensure that most of the data that a search is likely to need is available in local cache. Similarly, the cache manager tends to evict buckets that are likely to participate in searches only infrequently.

Most fundamentals of search are the same for SmartStore and non-SmartStore indexes. An indexer searches buckets in response to a request from a search head. The process of searching buckets always occurs in local storage. As the first step in any search, the indexer compiles a list of buckets that it needs to search across.

To understand how search works on a SmartStore index and how it differs from search on a non-SmartStore index, it is necessary to differentiate between these bucket states:

Hot buckets
Warm buckets residing in the local cache
Warm buckets not residing in the local cache

Hot buckets always reside in local storage, whether the bucket is part of a SmartStore index or a non-SmartStore index. Therefore, searches of hot buckets work the same in both environments.

The search process differs, however, when it comes to searches of warm buckets.

When an indexer needs to search a warm bucket for a SmartStore index, it first "opens" that bucket. Later, when it has finished searching the bucket, it "closes" it. The designation of "open" and "closed" is necessary for the proper functioning of the cache eviction process. When a bucket is open, the indexer's cache manager knows that the bucket is participating in a search and is thus not eligible for eviction. Once a bucket is closed, it becomes eligible for eviction.

Once the warm bucket is open, the indexer determines whether the bucket currently resides in the local cache. The next stage of processing depends on whether the bucket is currently in the cache:

If the bucket resides in the local cache, the indexer searches it as usual. After the bucket has been searched and closed, it becomes eligible for eviction.

If the bucket does not reside in the local cache, the cache manager must first fetch a copy of it from remote storage and place it in the local cache. The indexer then searches the bucket as usual. After the bucket has been searched and closed, it becomes eligible for eviction.

Note: In some cases, the cache manager only fetches certain bucket files, not the entire bucket. See How the cache manager fetches buckets.

In the case of indexer clusters, primary bucket copies function similarly as with non-SmartStore indexes. With a hot bucket, only the primary copy of the bucket participates in a search. With a warm bucket, only the peer with the primary designation for that bucket fetches the bucket or accesses a cached copy.

Because the primary designation for a bucket rarely switches between peers, warm buckets in cache usually reside on the peer with the primary designation for that bucket. It is unusual for copies of the same warm bucket to reside in caches on multiple peers, or for the bucket's primary designation to reside on a peer without a local copy while another peer already has a local copy of that bucket.

Related answers from Splunk Community

How search works in SmartStore

Comments

Was this topic useful?