How indexing works in SmartStore

Indexers handle buckets in SmartStore indexes differently from buckets in non-SmartStore indexes.

Bucket states and SmartStore

Indexers maintain buckets for non-SmartStore indexes in these states:

Hot buckets
Warm buckets
Cold buckets

Indexers maintain buckets for SmartStore indexes in these states only:

Hot buckets
Warm buckets

The hot buckets of SmartStore indexes reside on local storage, just as with non-SmartStore indexes. Warm buckets reside on remote storage, although copies of those buckets might also reside temporarily in local storage.

The concept of cold buckets goes away, because the need to distinguish between warm and cold buckets no longer exists. With non-SmartStore indexes, the cold bucket state exists as a way to identify older buckets that can be safely moved to some type of cheaper storage, because buckets are typically searched less frequently as they age. But with SmartStore indexes, warm buckets are already on inexpensive storage, so there is no reason to move them to another type of storage as they age.

Buckets roll to frozen directly from warm.
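The two lifecycles can be summarized as a small state table. The following Python sketch is illustrative only: `LIFECYCLE` and `next_state` are hypothetical names, not part of any Splunk API, and the sketch does not model the migrated cold buckets covered later in this section.

```python
from typing import Optional

# Illustrative model of the bucket states described above; not Splunk code.
LIFECYCLE = {
    "non-smartstore": ["hot", "warm", "cold", "frozen"],
    "smartstore": ["hot", "warm", "frozen"],
}


def next_state(index_type: str, state: str) -> Optional[str]:
    """Return the state a bucket rolls to next, or None if it is already frozen."""
    order = LIFECYCLE[index_type]
    # list.index raises ValueError for a state that the index type does not use.
    position = order.index(state)
    if position + 1 < len(order):
        return order[position + 1]
    return None
```

For example, `next_state("smartstore", "warm")` returns `"frozen"`, while `next_state("non-smartstore", "warm")` returns `"cold"`.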

Cold buckets can, in fact, exist in a SmartStore index, but only under limited circumstances. Specifically, if you migrate an index from non-SmartStore to SmartStore, any migrated cold buckets continue to use the existing cold path as their cache location after the migration.

Cold buckets in SmartStore indexes are functionally equivalent to warm buckets. The cache manager manages migrated cold buckets in the same way that it manages warm buckets. The only difference is that cold buckets, when needed, are fetched into the cold path location rather than the home path location.
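As a minimal sketch of that one difference, the following Python function picks the local directory into which a bucket is fetched. The function name and the example paths are hypothetical and are used only for illustration.

```python
def cache_location(bucket_state: str, home_path: str, cold_path: str) -> str:
    """Pick the local directory that a SmartStore bucket is fetched into."""
    if bucket_state == "cold":
        # Only buckets migrated from a non-SmartStore index can be cold.
        return cold_path
    if bucket_state == "warm":
        return home_path
    raise ValueError(f"buckets in state {bucket_state!r} are not fetched from remote storage")
```

With example paths `/data/idx/db` and `/data/idx/colddb`, a warm bucket is fetched into `/data/idx/db` and a migrated cold bucket into `/data/idx/colddb`.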

The indexing process

The indexing process is the same with SmartStore and non-SmartStore indexes.

The indexer indexes the incoming data and writes the data to hot buckets in local storage. In the case of an indexer cluster, the source peer streams the hot bucket data to target peers to fulfill the replication factor.

When buckets roll to warm, however, the SmartStore process differs from non-SmartStore.

Warm bucket handling

From the point at which a bucket rolls from hot to warm, the indexer handles SmartStore indexes differently from non-SmartStore indexes.

When a bucket in a SmartStore index rolls to warm, the bucket is copied to remote storage.

The rolled bucket does not immediately get removed from the indexer's local storage. Rather, it remains cached locally until it is evicted in response to the cache manager's eviction policy. Because searches tend to occur most frequently across recent data, this process helps to minimize the number of buckets that need to be retrieved from remote storage to fulfill a search request.

After the cache manager eventually removes the bucket from the indexer's local storage, the indexer still retains metadata for that bucket in the index's .bucketManifest file. In addition, the indexer retains an empty directory for the bucket.

Note: Under certain circumstances the cache manager retains the bloomfilter file, as well as some other small files, when it otherwise evicts a bucket. That is, it deletes the bucket's rawdata journal and tsidx files but leaves the small files temporarily in place. This behavior is configurable through the cache manager recency settings. See Set cache retention periods based on recency.
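The difference between a full eviction and one that keeps the small files can be sketched as follows. This is an illustration under stated assumptions, not the cache manager's implementation: the function name, the `keep_small_files` flag, and the example file names are hypothetical, and the rule used to recognize large files (a `rawdata/` prefix or a `.tsidx` suffix) is a simplification.

```python
from typing import List, Tuple


def split_on_eviction(files: List[str], keep_small_files: bool) -> Tuple[List[str], List[str]]:
    """Return (evicted, retained) file names for one bucket eviction."""
    evicted, retained = [], []
    for name in files:
        # Assumption: the rawdata journal and tsidx files are the large files.
        is_large = name.startswith("rawdata/") or name.endswith(".tsidx")
        if is_large or not keep_small_files:
            evicted.append(name)
        else:
            retained.append(name)
    return evicted, retained
```

When `keep_small_files` is false, every file is evicted and only the empty bucket directory and the manifest metadata remain, as described above.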

In the case of an indexer cluster, when a bucket rolls to warm, the source peer uploads the bucket to remote storage. The source peer continues to retain its bucket copy in local cache until, in due course, the cache manager evicts the copy.

After successfully uploading the bucket, the source peer sends messages to the bucket's target peers, notifying them that the bucket has been uploaded to remote storage. The target peers, like the source peer, continue to retain their bucket copies in local cache until, in due course, their cache managers evict the copies.

During the upload process, if the target peers do not hear from the source peer within five minutes, they query the remote storage to learn whether the bucket was uploaded. If it wasn't uploaded by the source peer, one of the target peers then uploads it.
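The fallback can be sketched as a decision function that runs on a target peer. This is a simplified model rather than Splunk's implementation: all names are hypothetical, and the sketch does not model how the cluster chooses which target peer performs the upload.

```python
from typing import Callable

# Five minutes, the wait stated above, expressed in seconds.
UPLOAD_NOTICE_TIMEOUT_S = 5 * 60


def target_peer_action(
    seconds_since_roll: float,
    notified_by_source: bool,
    exists_on_remote: Callable[[], bool],
) -> str:
    """Decide what a target peer does next: 'done', 'wait', or 'upload'."""
    if notified_by_source:
        return "done"
    if seconds_since_roll < UPLOAD_NOTICE_TIMEOUT_S:
        return "wait"
    # Timed out: query remote storage only now, to avoid needless requests.
    if exists_on_remote():
        return "done"
    return "upload"
```

Passing `exists_on_remote` as a callable keeps the remote storage query lazy, so it is made only after the timeout has expired.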

When a bucket copy does get evicted from a peer's local cache, the peer retains metadata for the bucket, so that the cluster has enough copies of the bucket, in the form of its metadata, to match the replication factor.

In addition to retaining metadata information for the bucket, the source peer continues to retain the primary designation for the bucket. The peer with primary designation fetches the bucket from remote storage when the bucket is needed for a search.
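To make the bookkeeping concrete, the following sketch models the copies of one bucket across a cluster. The `PeerCopy` class and the helper functions are hypothetical and exist only to illustrate two points from the text: evicted copies still count toward the replication factor through their metadata, and the primary peer is the one that fetches the bucket for a search.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PeerCopy:
    peer: str             # hypothetical peer name
    cached_locally: bool  # False once the cache manager has evicted the copy
    primary: bool         # True on the peer that holds the primary designation


def meets_replication_factor(copies: List[PeerCopy], replication_factor: int) -> bool:
    """Every copy counts, cached or evicted, because its metadata is retained."""
    return len(copies) >= replication_factor


def fetching_peer(copies: List[PeerCopy]) -> Optional[str]:
    """Return the peer that fetches the bucket for a search, if one is primary."""
    for copy in copies:
        if copy.primary:
            return copy.peer
    return None
```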

How SmartStore handles report and data model acceleration summaries

SmartStore-enabled indexers handle summaries in a similar way to non-SmartStore-enabled indexers. In an indexer cluster, the summary is created on the peer node that is primary for the associated bucket or buckets. The peer then uploads the summary to remote storage. When a peer needs the summary, its cache manager fetches the summary from remote storage.

Summary replication is unnecessary, and is therefore unsupported, because the uploaded summary is available to all peer nodes.

When using SmartStore, the settings summaryHomePath and tstatsHomePath must remain unset. See Settings in indexes.conf that are incompatible with SmartStore or otherwise restricted.
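A simple pre-deployment check for this restriction might look like the following sketch. It assumes that an index stanza has already been parsed into a dictionary of setting names and values, and it covers only the two settings named here; the helper name and the example values are hypothetical. The referenced topic lists the full set of restricted settings.

```python
from typing import Dict, List

# Only the two settings named in this topic; see the referenced topic for others.
MUST_REMAIN_UNSET = ("summaryHomePath", "tstatsHomePath")


def restricted_settings_present(stanza: Dict[str, str]) -> List[str]:
    """Return the restricted settings that have a non-empty value in the stanza."""
    return [name for name in MUST_REMAIN_UNSET if stanza.get(name)]
```

For example, a stanza of `{"homePath": "$SPLUNK_DB/main/db", "summaryHomePath": "/fast/summaries/main"}` yields `["summaryHomePath"]`, which indicates a setting to remove before enabling SmartStore for that index.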

For details on report and data model acceleration summaries in indexer clusters, see How indexer clusters handle report and data model acceleration summaries. For general information on report and data model acceleration, see Manage report acceleration and Accelerate data models, respectively, in the Knowledge Manager Manual.

Bucket freezing and SmartStore

See Configure data retention for SmartStore indexes.
