Configure data retention for SmartStore indexes

Data retention policy for SmartStore indexes is configured with settings similar to those for non-SmartStore indexes.

On indexer clusters, data retention for SmartStore indexes is managed cluster-wide. Once a bucket in the index meets the criteria for freezing, the cluster removes the bucket entirely from the system, both from remote storage and from any local caches where copies of it exist.

Because SmartStore indexes do not support the cold bucket state (except in the case of migrated buckets), buckets roll directly from warm to frozen.

For general information on bucket freezing and archiving, see Set a retirement and archiving policy and Archive indexed data. Most of the material in those topics is relevant to all indexes, SmartStore or not. The differences are covered here.

Data retention policy

Use these indexes.conf settings to configure data retention policies for a SmartStore index:

maxGlobalDataSizeMB
maxGlobalRawDataSizeMB
frozenTimePeriodInSecs

Bucket freezing occurs whenever any of these limits is reached. This can result in unexpected data loss if the settings are improperly configured. For example, if maxGlobalDataSizeMB is reached before frozenTimePeriodInSecs, buckets will be rolled to frozen before the configured time period has elapsed. If you require data to remain in your system for a specific amount of time, ensure that the other settings won't pre-empt your frozenTimePeriodInSecs setting.
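To spot this kind of pre-emption before it causes data loss, you can roughly estimate how many days of data a size limit can hold and compare that with frozenTimePeriodInSecs. The following Python sketch assumes a steady daily growth rate that you measure yourself. The function names are hypothetical and are not part of any Splunk tool. The same comparison applies to maxGlobalRawDataSizeMB if you substitute daily raw ingest for stored growth.

```python
# Hypothetical planning helper: estimates whether a size cap such as
# maxGlobalDataSizeMB would freeze buckets before frozenTimePeriodInSecs does.
# daily_growth_mb is an assumed steady growth rate that you measure yourself.

SECONDS_PER_DAY = 86_400


def days_until_size_limit(max_size_mb: float, daily_growth_mb: float) -> float:
    """Days of data that fit under the size cap at a steady growth rate."""
    if max_size_mb == 0:  # 0 is the default and means "no size limit"
        return float("inf")
    return max_size_mb / daily_growth_mb


def size_preempts_time(max_size_mb: float, daily_growth_mb: float,
                       frozen_time_period_in_secs: int) -> bool:
    """True if the size cap is likely to be hit before buckets reach the age limit."""
    retention_days = frozen_time_period_in_secs / SECONDS_PER_DAY
    return days_until_size_limit(max_size_mb, daily_growth_mb) < retention_days


# Example: 500,000 MB cap, 2,000 MB/day of growth, 1-year age limit.
# 500,000 / 2,000 = 250 days of data, fewer than the 365 days required.
print(size_preempts_time(500_000, 2_000, 365 * SECONDS_PER_DAY))  # True
```

If the check returns True, either raise the size limit or accept that data will be frozen earlier than frozenTimePeriodInSecs specifies.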

When configuring data retention limits, be sure to accommodate any critical retention criteria.

In the case of an indexer cluster, data retention settings, like all indexes.conf settings, must be the same for all peer nodes. Use the configuration bundle method to distribute the settings from the manager node to the peer nodes, as described in Update common peer configurations and apps.

These settings, available for non-SmartStore indexes only, have no effect on a SmartStore index:

maxTotalDataSizeMB
maxWarmDBCount

maxGlobalDataSizeMB

The maxGlobalDataSizeMB setting specifies the maximum size, in MB, for all warm and cold buckets in a SmartStore index. When the size of an index's set of warm and cold buckets exceeds this value, the system freezes the oldest buckets, until the size again falls below this value.

The total size of an index's warm and cold buckets closely approximates the size that the index occupies in remote storage. Note these aspects of the size calculation:

It applies on a per-index basis.
In the case of an indexer cluster, it applies across all peers in the cluster.
In the case of a standalone indexer, it applies only to that indexer. Standalone indexers, by their nature, each manage their own data retention.
It includes the sum of the sizes of all buckets that reside on remote storage, along with any buckets that have recently rolled from hot to warm and are awaiting upload to remote storage.
It includes only one copy of each bucket. If a duplicate copy of a bucket exists on an indexer, the size calculation does not include it. For example, if the bucket exists both in remote storage and in an indexer's local cache, the calculation ignores the copy in the local cache.
It includes only the size of the buckets themselves. It does not include the size of any associated files, such as report acceleration or data model acceleration summaries.
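The rules in the list above can be modeled as a simple deduplicating sum. This Python sketch uses a hypothetical BucketCopy record (not a Splunk data structure) and assumes that every copy of a bucket carries the same bucket ID and size. Acceleration summary sizes are recorded but deliberately left out of the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketCopy:
    """Hypothetical record of one copy of a bucket (not a Splunk data structure)."""
    bucket_id: str                # same value for every copy of the same bucket
    location: str                 # for example "remote" or "peer1-cache"
    bucket_size_mb: float         # size of the bucket itself
    summary_size_mb: float = 0.0  # acceleration summaries: never counted


def retention_size_mb(copies: list[BucketCopy]) -> float:
    """Approximate the per-index size compared against maxGlobalDataSizeMB."""
    size_by_bucket: dict[str, float] = {}
    for bucket_copy in copies:
        # Count each bucket once; later duplicates of the same bucket are skipped.
        size_by_bucket.setdefault(bucket_copy.bucket_id, bucket_copy.bucket_size_mb)
    return sum(size_by_bucket.values())


copies = [
    BucketCopy("b1", "remote", 1000.0, summary_size_mb=50.0),
    BucketCopy("b1", "peer1-cache", 1000.0, summary_size_mb=50.0),  # duplicate copy
    BucketCopy("b2", "remote", 800.0),
    BucketCopy("b3", "peer2-awaiting-upload", 300.0),  # just rolled to warm
]
print(retention_size_mb(copies))  # 2100.0
```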

If the total size of an index's warm and cold buckets exceeds maxGlobalDataSizeMB, the system freezes the oldest buckets in the index. For example, assume that maxGlobalDataSizeMB is set to 5000 for an index, and the index's warm and cold buckets occupy 4800MB. If a 750MB hot bucket then rolls to warm, the index size grows to 5550MB, which exceeds maxGlobalDataSizeMB and triggers bucket freezing. The system freezes the oldest buckets in the index until the total warm and cold bucket size falls below maxGlobalDataSizeMB.
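The size rule in this example can be sketched as a loop that freezes the oldest bucket until the total no longer exceeds the limit. The individual bucket sizes below are hypothetical; only their totals (4800MB of existing buckets plus a 750MB bucket) come from the example. The sketch models only the size rule, not the freezing mechanics described later in this topic.

```python
def freeze_oldest_until_under(buckets_oldest_first, max_size_mb):
    """Freeze the oldest buckets until the index total no longer exceeds the cap.

    buckets_oldest_first: list of (bucket_name, size_mb) tuples, oldest first.
    Returns (frozen_names, remaining_buckets). A cap of 0 means "no limit".
    """
    remaining = list(buckets_oldest_first)
    frozen = []
    if max_size_mb == 0:
        return frozen, remaining
    total = sum(size for _, size in remaining)
    while remaining and total > max_size_mb:
        name, size = remaining.pop(0)  # the oldest bucket goes first
        frozen.append(name)
        total -= size
    return frozen, remaining


# Hypothetical sizes adding up to 4800MB, plus the 750MB bucket that just
# rolled from hot to warm (5550MB in total).
index = [("b1", 600), ("b2", 1200), ("b3", 1500), ("b4", 1500), ("b5", 750)]
frozen, remaining = freeze_oldest_until_under(index, 5000)
print(frozen)                               # ['b1']
print(sum(size for _, size in remaining))  # 4950
```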

You set this value under the stanza for the index to which it applies. To specify the same value for all indexes, you can set it at the global stanza level. In that case, however, the value still applies individually to each index. That is, if you set maxGlobalDataSizeMB at the global stanza level to 5000MB, then indexA has its own maximum of 5000MB, indexB has its own maximum of 5000MB, and so on.
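To illustrate how a global value still applies to each index separately, the following Python sketch parses a hypothetical indexes.conf fragment with the standard-library configparser module. Its default-section inheritance resembles the way a [default] stanza value is applied to each index stanza. configparser is not Splunk's configuration parser and does not reproduce Splunk's configuration file layering, so treat this only as an illustration.

```python
import configparser

# Hypothetical indexes.conf fragment. Settings in the [default] stanza apply
# to every index stanza that does not override them.
INDEXES_CONF = """
[default]
maxGlobalDataSizeMB = 5000

[indexA]
remotePath = volume:remote_store/$_index_name

[indexB]
remotePath = volume:remote_store/$_index_name
maxGlobalDataSizeMB = 8000
"""

parser = configparser.ConfigParser(
    default_section="default",  # treat [default] as the inherited section
    interpolation=None,         # leave "$_index_name" untouched
    delimiters=("=",),          # only "=" separates keys from values
)
parser.optionxform = str  # keep camelCase setting names as written
parser.read_string(INDEXES_CONF)

limits = {name: int(parser[name]["maxGlobalDataSizeMB"]) for name in parser.sections()}
print(limits)  # {'indexA': 5000, 'indexB': 8000}
```

Here indexA inherits its own 5000MB limit from [default], while indexB overrides it. Neither limit is a pool shared with the other index.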

This setting defaults to 0, which means that it does not limit the amount of space that the warm and cold buckets on an index can occupy.

maxGlobalRawDataSizeMB

The maxGlobalRawDataSizeMB setting specifies the maximum size, in MB, of raw data residing in all warm and cold buckets in a SmartStore index. When the size of an index's raw data in its warm and cold buckets exceeds this value, the system freezes the oldest buckets until the size again falls below this value.

Raw data size is the uncompressed size of data, as measured at the time that the indexers ingested it. It is not the size of the compressed rawdata journal files that reside in the buckets.

This setting is useful in cases where you want to retain data based on the amount of raw data originally ingested, rather than based on the age of the data or the size that the data occupies in storage. For example, you might have a requirement that an index retain the last 10TB of ingested raw data. Since the indexing process compresses and indexes ingested data, the size of the indexed data stored on disk could vary markedly from its raw size.

In the case of an indexer cluster, the raw data size that is compared against maxGlobalRawDataSizeMB is the total amount of raw data ingested for the index and currently residing in warm or cold buckets, across all peer nodes. Only ingested data counts, so data replication does not increase the raw data total. For example, in a three-peer cluster with a replication factor of 3, if peer1 ingests 300MB of raw data, peer2 ingests 400MB, and peer3 ingests 500MB, the total amount of raw data residing on the cluster is 1200MB, not 3600MB.
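The arithmetic in this example is a plain sum over the peers that ingested the data. This short Python sketch uses the figures from the example; the helper names are hypothetical. The second function shows the incorrect total you would get by multiplying by the replication factor.

```python
# Raw data ingested by each peer for the index and still residing in warm or
# cold buckets (figures from the example above).
ingested_raw_mb = {"peer1": 300, "peer2": 400, "peer3": 500}
REPLICATION_FACTOR = 3


def raw_size_for_retention(raw_mb_by_peer: dict[str, int]) -> int:
    """Raw size compared against maxGlobalRawDataSizeMB: ingested data only."""
    return sum(raw_mb_by_peer.values())


def raw_size_counting_replicas(raw_mb_by_peer: dict[str, int], rf: int) -> int:
    """The wrong answer you get by counting every replicated copy."""
    return rf * sum(raw_mb_by_peer.values())


print(raw_size_for_retention(ingested_raw_mb))                          # 1200
print(raw_size_counting_replicas(ingested_raw_mb, REPLICATION_FACTOR))  # 3600
```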

Note these key aspects of the size calculation:

It uses the raw data size, that is, the uncompressed size of the data at the time that the indexer ingested it.
It applies on a per-index basis.
It applies to warm and cold buckets only. Hot bucket data is ignored.
In the case of an indexer cluster, it applies across all peers in the cluster.

If the total raw ingested size of the data residing in an index's warm and cold buckets exceeds maxGlobalRawDataSizeMB, the system freezes the oldest buckets in the index. For example, assume that maxGlobalRawDataSizeMB is set to 5000 for an index, and the index's warm buckets contain 4800MB of raw data. If a hot bucket containing 500MB of raw data then rolls to warm, the amount of raw data in the index grows to 5300MB, which exceeds maxGlobalRawDataSizeMB and triggers bucket freezing. The system freezes the oldest buckets in the index until the total amount of raw data residing in warm and cold buckets falls below maxGlobalRawDataSizeMB.

You set this value under the stanza for the index to which it applies. To specify the same value for all indexes, you can set it at the global stanza level. In that case, however, the value still applies individually to each index. That is, if you set maxGlobalRawDataSizeMB at the global stanza level to 5000MB, then indexA has its own maximum of 5000MB, indexB has its own maximum of 5000MB, and so on.

This setting defaults to 0, which means that it does not limit the amount of raw data in an index.

The maxGlobalRawDataSizeMB setting is available only for indexers running version 7.3.0 or later.

frozenTimePeriodInSecs

This setting is the same setting used with non-SmartStore indexes. It specifies that buckets freeze when they reach the configured age. The default value is 188697600 seconds, or approximately 6 years.
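The following Python sketch converts the default value to days and years and shows a simple age test. It assumes that a bucket's age is measured from the most recent event it contains; see Freeze data when it grows too old for the exact rule. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_FROZEN_TIME_PERIOD_IN_SECS = 188_697_600

# Unit conversion for the default: 188,697,600 s / 86,400 s per day = 2,184 days.
days = DEFAULT_FROZEN_TIME_PERIOD_IN_SECS / 86_400
print(days)                     # 2184.0
print(round(days / 365.25, 2))  # 5.98 (years)


def older_than_retention(newest_event_time: datetime, now: datetime,
                         frozen_time_period_in_secs: int = DEFAULT_FROZEN_TIME_PERIOD_IN_SECS,
                         ) -> bool:
    """Assumption: a bucket's age is measured from the newest event it contains."""
    return now - newest_event_time > timedelta(seconds=frozen_time_period_in_secs)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(older_than_retention(now - timedelta(days=2185), now))  # True
print(older_than_retention(now - timedelta(days=2000), now))  # False
```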

For details on this setting, see Freeze data when it grows too old.

The process of freezing buckets on indexer clusters

The process of freezing a SmartStore index's buckets on an indexer cluster proceeds in this fashion:

The manager node runs a search every 15 minutes, by default, on all peer nodes, to identify any buckets that need to be frozen.
The search period is controlled by the remote_storage_retention_period setting in server.conf on the manager.
For each bucket to be frozen, the manager randomly assigns the job to one of the peers with a local copy of the bucket; that is, to one of the peers with metadata for the bucket in its index's .bucketManifest file.
For each bucket to be frozen:
1. The designated peer checks whether the bucket is present on remote storage. It continues the freezing process for that bucket only if the bucket exists on remote storage. Otherwise, the peer skips the bucket.
  The most likely reason that a warm bucket is not on remote storage is that it recently rolled from hot to warm and is still awaiting upload.
2. The next steps taken by the bucket's designated peer vary, depending on whether the cluster peers are configured to archive frozen buckets before deleting them:
  - If the cluster peers are configured to archive buckets, the designated peer fetches the bucket from remote storage if it is not already in local cache. It then archives the bucket and removes its local copy.
  - If the peers are configured to remove frozen buckets without first archiving them, the designated peer does not fetch the bucket. It simply removes its local copy.
  For information on how to configure archiving, see Archive indexed data.
3. The designated peer removes the bucket from the remote store.
4. The designated peer notifies the manager of the remote store deletion.
5. The manager tells each other peer with a local copy of the bucket to remove its copy.

In this context, the term "local bucket copy" means all information about the bucket on the peer, optionally including the actual bucket copy. When a peer removes its local bucket copy during the freezing process, it removes the bucket's metadata from the index's .bucketManifest file, as well as any copy of the bucket in its cache. If there is no copy in its cache, then it removes the empty directory for that bucket.
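The manager's part of this process (choosing a designated peer for each bucket and, after the remote delete, telling the remaining peers to remove their copies) can be sketched in Python with an in-memory model. Everything here is hypothetical: the manifests dictionary stands in for each peer's .bucketManifest entries, and the sketch does not model the periodic search or remote_storage_retention_period. A seeded random.Random makes runs repeatable.

```python
import random

# Hypothetical model: peer name -> set of bucket IDs in that peer's .bucketManifest,
# that is, the peers that hold a "local bucket copy".
Manifests = dict[str, set[str]]


def assign_freeze_jobs(buckets_to_freeze: list[str], manifests: Manifests,
                       rng: random.Random) -> dict[str, str]:
    """Manager side: pick one designated peer at random for each bucket."""
    jobs = {}
    for bucket_id in buckets_to_freeze:
        holders = sorted(peer for peer, ids in manifests.items() if bucket_id in ids)
        if holders:  # only peers with a local copy are eligible
            jobs[bucket_id] = rng.choice(holders)
    return jobs


def peers_to_notify(bucket_id: str, designated_peer: str,
                    manifests: Manifests) -> list[str]:
    """Manager side, after the remote delete: every other peer with a local copy."""
    return sorted(peer for peer, ids in manifests.items()
                  if bucket_id in ids and peer != designated_peer)


manifests = {"peer1": {"b7", "b8"}, "peer2": {"b7"}, "peer3": {"b8"}}
jobs = assign_freeze_jobs(["b7", "b8"], manifests, random.Random(0))
for bucket_id, peer in jobs.items():
    print(bucket_id, "->", peer, "then notify", peers_to_notify(bucket_id, peer, manifests))
```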

The process of freezing buckets on standalone indexers

The process of freezing a SmartStore index's buckets on a standalone indexer proceeds in this fashion:

The indexer checks every 60 seconds, by default, to identify any buckets that need to be frozen.
The service period is controlled by the rotatePeriodInSecs setting in indexes.conf.
For each bucket to be frozen:
1. The indexer checks whether the bucket is present on remote storage. It continues the freezing process for that bucket only if the bucket exists on remote storage. Otherwise, it skips the bucket.
  The most likely reason that a warm bucket is not on remote storage is that it recently rolled from hot to warm and is still awaiting upload.
2. The next steps taken by the indexer vary, depending on whether the indexer is configured to archive frozen buckets before deleting them:
  - If the indexer is configured to archive buckets, the indexer fetches the bucket from remote storage if it is not already in local cache. It then archives the bucket and removes its local copy.
  - If the indexer is configured to remove frozen buckets without first archiving them, the indexer does not fetch the bucket. It simply removes its local copy.
  For information on how to configure archiving, see Archive indexed data.
3. The indexer removes the bucket from the remote store.

In this context, the term "local bucket copy" means all information about the bucket on the indexer, optionally including the actual bucket copy. When an indexer removes its local bucket copy during the freezing process, it removes the bucket's metadata from the index's .bucketManifest file, as well as any copy of the bucket in its cache. If there is no copy in its cache, then it removes the empty directory for that bucket.
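The per-bucket steps above can be sketched in Python as follows. The designated peer on an indexer cluster carries out the same three steps. Dictionaries, a set, and a list stand in for remote storage, the local cache, the .bucketManifest entries, and the archive destination. The function and its parameters are hypothetical, and the sketch performs no real I/O.

```python
def freeze_bucket(bucket_id: str, remote_store: dict, local_cache: dict,
                  manifest: set, archive_enabled: bool, archive: list) -> str:
    """One pass of steps 1-3 for a single bucket; returns "skipped" or "frozen".

    remote_store and local_cache map bucket IDs to stand-in bucket contents,
    manifest stands in for .bucketManifest entries, archive for the archive target.
    """
    # Step 1: skip a bucket that is not on remote storage (for example,
    # one that just rolled from hot to warm and is awaiting upload).
    if bucket_id not in remote_store:
        return "skipped"

    # Step 2: fetch from remote storage only when archiving and not already cached.
    if archive_enabled:
        data = local_cache.get(bucket_id)
        if data is None:
            data = remote_store[bucket_id]  # stand-in for a download
        archive.append((bucket_id, data))

    # In both cases, remove the local bucket copy: cached data and manifest metadata.
    local_cache.pop(bucket_id, None)
    manifest.discard(bucket_id)

    # Step 3: remove the bucket from the remote store.
    del remote_store[bucket_id]
    return "frozen"


remote = {"b1": "b1-data", "b2": "b2-data"}
cache = {"b2": "b2-data"}
manifest = {"b1", "b2", "b3"}
archive = []
print(freeze_bucket("b3", remote, cache, manifest, True, archive))   # skipped
print(freeze_bucket("b1", remote, cache, manifest, True, archive))   # frozen
print(freeze_bucket("b2", remote, cache, manifest, False, archive))  # frozen
print(archive)  # [('b1', 'b1-data')]
```

Bucket b3 is skipped because it is not on remote storage, b1 is fetched and archived, and b2 is removed without being archived.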

Thawing data and SmartStore

You cannot thaw an archived bucket into a SmartStore index, even if the bucket, prior to freezing, was part of a SmartStore index.

Instead, you can thaw the bucket into the thawed directory of a non-SmartStore index. When thawing a bucket to a non-SmartStore index, you must make sure that its bucket ID is unique within that index.
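Before thawing, you can check for ID collisions by comparing the archived bucket directory names with the bucket directories already in the target index. This Python sketch assumes the bucket directory naming format described in Bucket names and assumes that an archived bucket keeps its original directory name. It treats the (local ID, GUID) pair as the bucket ID, on the assumption that the GUID in clustered bucket names is part of the bucket's identity. The helper names are hypothetical.

```python
import re
from typing import Optional

# Assumed bucket directory name format (see Bucket names):
#   db_<newest_time>_<oldest_time>_<localid>          standalone indexer
#   db_<newest_time>_<oldest_time>_<localid>_<guid>   indexer cluster (rb_ for replicated copies)
BUCKET_NAME = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)(?:_([0-9A-Fa-f-]+))?$")


def bucket_id(dir_name: str) -> tuple[int, Optional[str]]:
    """Return (localid, guid) for a bucket directory; guid is None if absent."""
    match = BUCKET_NAME.match(dir_name)
    if match is None:
        raise ValueError(f"not a bucket directory name: {dir_name!r}")
    return int(match.group(3)), match.group(4)


def id_collisions(to_thaw: list[str], existing: list[str]) -> list[str]:
    """Archived buckets whose ID is already used by a bucket in the target index."""
    taken = {bucket_id(name) for name in existing}
    return [name for name in to_thaw if bucket_id(name) in taken]


existing = ["db_1700000000_1690000000_12", "db_1700500000_1700000001_13"]
to_thaw = [
    "db_1600000000_1590000000_12",  # same local ID, no GUID: collides
    "db_1600000000_1590000000_12_A1B2C3D4-0000-1111-2222-333344445555",  # has a GUID
    "db_1600000000_1590000000_40",
]
print(id_collisions(to_thaw, existing))  # ['db_1600000000_1590000000_12']
```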

If you plan to thaw buckets frequently, you might want to create a set of non-SmartStore indexes that parallel the SmartStore indexes in name, for example, "nonS2_main" to parallel "main".

For information on bucket IDs, see Bucket names.

For information on thawing buckets, see Thaw a 4.2+ archive.
