Configure data retention for SmartStore indexes

Data retention policy for SmartStore indexes on indexer clusters is configured using settings similar to those for non-SmartStore indexes. However, with SmartStore indexes, data retention is managed cluster-wide, rather than on a per-indexer basis.

Once a bucket in the index meets the criteria for freezing, the cluster removes the bucket entirely from the system, both from remote storage and from any local caches where copies of it exist.

Because SmartStore indexes do not support the cold bucket state, buckets roll to frozen directly from warm.

For general information on bucket freezing and archiving, see Set a retirement and archiving policy and Archive indexed data. Most of the material in those topics is relevant to all indexes, SmartStore or not. The differences are covered here.

Data retention policy

Use these indexes.conf settings to configure data retention policies for a SmartStore index:

maxGlobalDataSizeMB
frozenTimePeriodInSecs

Like all indexes.conf settings, these settings must be the same on all peer nodes in an indexer cluster. Use the configuration bundle method to distribute the settings from the master node to the peer nodes, as described in Update common peer configurations and apps.
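
As an illustration, the two retention settings might appear together in an index stanza as sketched here. The index name and both values are hypothetical, and the other settings that the index needs (such as its paths and remote storage location) are omitted:

```ini
# indexes.conf, distributed to all peer nodes through the configuration bundle
# "my_smartstore_index" is a hypothetical index name.
[my_smartstore_index]
# Freeze the oldest buckets once the index's warm buckets exceed
# 512000 MB (about 500 GB), measured across the whole cluster.
maxGlobalDataSizeMB = 512000
# Freeze buckets once they are older than 90 days (90 * 86400 seconds).
frozenTimePeriodInSecs = 7776000
```

With both settings in place, a bucket freezes as soon as either condition is met.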

These settings, available for non-SmartStore indexes only, have no effect on a SmartStore index:

maxTotalDataSizeMB
maxWarmDBCount

maxGlobalDataSizeMB

The maxGlobalDataSizeMB setting specifies the maximum size, in MB, for all warm buckets in a SmartStore index on a cluster. When the size of an index's set of warm buckets exceeds this value, the cluster freezes the oldest buckets, until the size again falls below this value.

The total size of an index's warm buckets closely approximates the size that the index occupies in remote storage. Note these aspects of the size calculation:

It applies on a per-index basis.
It applies across all peers in the cluster.
It includes the sum of the size of all buckets that reside on remote storage, along with any buckets that have recently rolled from hot to warm on a peer node and are awaiting upload to remote storage.
It includes only one copy of each bucket. If a duplicate copy of a bucket exists on a peer node, the size calculation does not include it. For example, if the bucket exists both on remote storage and in a peer node's local cache, the calculation ignores the copy in the local cache.
It includes only the size of the buckets themselves. It does not include the size of any associated files, such as report acceleration or data model acceleration summaries.

If the total size that the warm buckets of an index occupy exceeds maxGlobalDataSizeMB, the cluster freezes the oldest bucket in the index. For example, assume that maxGlobalDataSizeMB is set to 5000 for an index, and the index's warm buckets occupy 4800 MB. If a 750 MB hot bucket then rolls to warm, the warm buckets occupy 5550 MB, which exceeds maxGlobalDataSizeMB by 550 MB and triggers bucket freezing. The cluster freezes the oldest buckets in the index, one after another, until the total warm bucket size falls below maxGlobalDataSizeMB.

You set this value under the stanza for the index to which it applies. To specify the same value for all indexes, you can set it at the global stanza level. In that case, however, the value still applies individually to each index. That is, if you set maxGlobalDataSizeMB at the global stanza level to 5000 MB, then indexA has its own maximum of 5000 MB, indexB has its own maximum of 5000 MB, and so on.
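
The following sketch shows the global-level setting described above, using the index names from the example. The override for indexB is an added illustration that relies on the usual rule that a setting in an index's own stanza takes precedence over the global value; its value is hypothetical:

```ini
# indexes.conf
[default]
# Applies to each index individually, not to all indexes combined.
maxGlobalDataSizeMB = 5000

[indexA]
# No value set here, so indexA has its own 5000 MB maximum.

[indexB]
# A value in the index's own stanza overrides the global value for this index.
maxGlobalDataSizeMB = 20000
```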

This setting defaults to 0, which means that it does not limit the amount of space that the warm buckets on an index can occupy.

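If an index reaches its maxGlobalDataSizeMB limit before its oldest buckets reach the age set by frozenTimePeriodInSecs, the cluster freezes those buckets before the configured time period has elapsed. Unless the peers are configured to archive frozen buckets, this can result in unintended data loss.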

frozenTimePeriodInSecs

This setting is the same setting used with non-SmartStore indexes. It specifies that buckets freeze when they reach the configured age. The default value is 188697600 seconds, or approximately 6 years.

For details on this setting, see Freeze data when it grows too old.

The process of freezing buckets

The process of freezing a SmartStore index's buckets proceeds in this fashion:

The master node runs a search every 15 minutes, by default, on all peer nodes, to identify any buckets that need to be frozen.
The search period is controlled by the remote_storage_retention_period setting in server.conf on the master.
For each bucket to be frozen, the master randomly assigns the job to one of the peers with a local copy of the bucket; that is, to one of the peers with metadata for the bucket in its index's .bucketManifest file.
For each bucket to be frozen:
1. The designated peer checks whether the bucket is present on remote storage. It continues the freezing process for that bucket only if the bucket exists on remote storage. Otherwise, the peer skips the bucket.
  The most likely reason for a warm bucket not existing on remote storage is that the bucket recently rolled from hot to warm and is still awaiting upload.
2. The next steps taken by the bucket's designated peer vary, depending on whether the cluster peers are configured to archive frozen buckets before deleting them:
  - If the cluster peers are configured to archive buckets, the designated peer fetches the bucket from remote storage if it is not already in local cache. It then archives the bucket and removes its local copy.
  - If the peers are configured to remove frozen buckets without first archiving them, the designated peer does not fetch the bucket. It simply removes its local copy.
  For information on how to configure archiving, see Archive indexed data.
3. The designated peer removes the bucket from the remote store.
4. The designated peer notifies the master of the remote store deletion.
5. The master tells each other peer with a local copy of the bucket to remove its copy.
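
The interval at which the master checks for buckets to freeze, mentioned at the start of this process, can be changed in server.conf on the master. This is a minimal sketch, assuming that remote_storage_retention_period belongs in the [clustering] stanza and takes a value in seconds (which would make the 15-minute default 900). Verify both assumptions against the server.conf specification for your version. Other clustering settings are omitted:

```ini
# server.conf on the master node
[clustering]
# Assumed unit: seconds. Check for buckets to freeze every 30 minutes
# instead of the default 15 minutes.
remote_storage_retention_period = 1800
```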

The term "local bucket copy" means all information about the bucket on the peer. When a peer removes its local bucket copy during the freezing process, it removes the bucket's metadata from the index's .bucketManifest file, as well as any copy of the bucket in its cache. If there is no copy in its cache, then it removes the empty directory for that bucket.
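
Whether the designated peer archives a bucket before removing it depends on the archiving configuration for the index. As a sketch, assuming that the coldToFrozenDir setting covered in Archive indexed data is the archiving method in use, and using a hypothetical index name and archive path:

```ini
# indexes.conf, distributed to all peer nodes
# "my_smartstore_index" and the archive path are hypothetical.
[my_smartstore_index]
# With an archive directory set, the designated peer copies the bucket
# to this location before the bucket is removed from the cluster.
coldToFrozenDir = /opt/splunk_archive/my_smartstore_index
```

Because only the designated peer archives a given bucket, archived buckets for one index can end up spread across the archive directories of several peers.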

Thawing data and SmartStore

You cannot thaw an archived bucket into a SmartStore index, even if the bucket, prior to freezing, was part of a SmartStore index.

Instead, you can thaw the bucket into the thawed directory of a non-SmartStore index. When thawing a bucket to a non-SmartStore index, you must make sure that its bucket ID is unique within that index.

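If you plan to thaw buckets frequently, you might want to create a set of non-SmartStore indexes that parallel the SmartStore indexes in name. For example, you might create a non-SmartStore index named "nonS2_main" to hold buckets thawed from the SmartStore index "main".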
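
A parallel index of this kind might be defined as sketched here. The paths are hypothetical, and the sketch assumes that remotePath is not set at the global stanza level, since an index that inherits remotePath would itself be a SmartStore index:

```ini
# indexes.conf
# Non-SmartStore index that receives buckets thawed from the SmartStore index "main".
[nonS2_main]
# No remotePath setting, so the index keeps its data in local storage.
homePath = $SPLUNK_DB/nonS2_main/db
coldPath = $SPLUNK_DB/nonS2_main/colddb
# Place archived buckets under this directory when thawing them.
thawedPath = $SPLUNK_DB/nonS2_main/thaweddb
```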

For information on bucket IDs, see Bucket names.

For information on thawing buckets, see Thaw a 4.2+ archive.
