Migrate existing data on a standalone indexer to SmartStore

You can migrate the existing data on your standalone indexer from local storage to the remote store.

This procedure describes how to migrate all the indexes on the indexer to SmartStore. You can modify the procedure if you only want to migrate some of the indexes. Indexers support a mixed environment of SmartStore and non-SmartStore indexes.

Because this process requires the indexer to upload large amounts of data, it can take a long time to complete and can have a significant impact on concurrent indexing and searching.

You cannot revert an index to non-SmartStore after you migrate it to SmartStore.

Migrate data

Perform the migration operation in two phases:

1. Test the SmartStore configurations and remote connectivity on a test indexer.
2. Run the migration by applying the configurations to your production indexer.

Prerequisites

Read:
- SmartStore system requirements
- Configure SmartStore
- Configure the S3 remote store for SmartStore or Configure the GCS remote store for SmartStore.
- Choose the storage location for each index
- Documentation provided by the vendor of the remote storage service that you are using
Be aware of these configuration issues:
- The value of the path setting for each remote volume stanza must be unique to the indexer. You can share remote volumes only among indexes within a single standalone indexer. In other words, if indexes on one indexer use a particular remote volume, no index on any other standalone indexer or indexer cluster can use the same remote volume.
- Leave maxDataSize at its default value of "auto" (750MB) for each SmartStore index.
- The coldPath setting for each SmartStore index requires a value, even though the setting is ignored except in the case of migrated indexes.
- The thawedPath setting for each SmartStore index requires a value, even though the setting has no practical purpose because you cannot thaw data to a SmartStore index. See Thawing data and SmartStore.
Reconfigure the indexer as necessary to conform with the lists of unsupported features, current restrictions, and incompatible settings.
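
As a rough aid for the configuration issues listed above, the following Python sketch scans the text of an indexes.conf file and reports SmartStore index stanzas that lack coldPath or thawedPath, or that override maxDataSize. The function name is hypothetical, and the sketch assumes that every stanza other than [default] and [volume:...] is an index stanza. It reads a single file only, so it does not reflect settings that Splunk merges from other configuration files.

```python
import configparser


def check_smartstore_indexes(text):
    """Return a list of problems found in the given indexes.conf text."""
    parser = configparser.ConfigParser(
        interpolation=None,       # leave values such as $_index_name untouched
        delimiters=("=",),        # do not split on ':' (volume:remote_store)
        comment_prefixes=("#",),
        strict=False,             # tolerate repeated stanzas
    )
    parser.optionxform = str      # keep setting names case-sensitive
    parser.read_string(text)

    # A remotePath under [default] makes every index a SmartStore index.
    global_remote = parser.has_section("default") and parser.has_option(
        "default", "remotePath"
    )

    problems = []
    for name in parser.sections():
        if name == "default" or name.startswith("volume:"):
            continue
        stanza = parser[name]
        if not (global_remote or "remotePath" in stanza):
            continue  # not a SmartStore index
        for required in ("coldPath", "thawedPath"):
            if required not in stanza:
                problems.append(f"[{name}] is missing {required}")
        # Only the index stanza itself is inspected for maxDataSize.
        size = stanza.get("maxDataSize", "auto")
        if size != "auto":
            problems.append(f"[{name}] sets maxDataSize = {size}")
    return problems
```

An empty list means that the sketch found none of these particular issues. It cannot check that a remote volume path is unique to the indexer, because that depends on the configuration of other deployments.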

1. Test the configuration on a test indexer

The purpose of testing the configuration is to:

- Test remote store connectivity.
- Validate the configuration.

Steps

Ensure that you have met all prerequisites relevant to this test setup. In particular, read:
- SmartStore system requirements
- Configure the S3 remote store for SmartStore or Configure the GCS remote store for SmartStore.
Understand SmartStore security strategies and prepare to implement them as necessary during the deployment process. See SmartStore on S3 security strategies or SmartStore on GCS security strategies.
Install a new Splunk Enterprise instance. For information on how to install Splunk Enterprise, read the Installation Manual.

Edit indexes.conf in $SPLUNK_HOME/etc/system/local to specify the SmartStore settings for your indexes. These should be the same group of settings that you intend to use later on your production deployment.

Using an S3 remote object store:
This example configures SmartStore indexes, using an S3 remote object store. The SmartStore-related settings are configured at the global level, which means that all indexes are SmartStore-enabled, and they all use a single remote storage volume, named "remote_store". The example also creates one new index, "cs_index".

[default]
# Configure all indexes to use the SmartStore remote volume called
# "remote_store".
# Note: If you want only some of your indexes to use SmartStore, 
# place this setting under the individual stanzas for each of the 
# SmartStore indexes, rather than here.
remotePath = volume:remote_store/$_index_name

# Configure the remote volume.
[volume:remote_store]
storageType = remote

# The volume's 'path' setting points to the remote storage location where
# indexes reside. Each SmartStore index resides directly below the location 
# specified by the 'path' setting.   
path = s3://mybucket/some/path

# The following S3 settings are required only if you're using the access and secret 
# keys. They are not needed if you are using AWS IAM roles.

remote.s3.access_key = <S3 access key>
remote.s3.secret_key = <S3 secret key>
remote.s3.endpoint = <https|http>://<S3 host>

# This example stanza configures a custom index, "cs_index".
[cs_index]
homePath = $SPLUNK_DB/cs_index/db
# SmartStore-enabled indexes do not use thawedPath or coldPath, but you must still specify them here.
coldPath = $SPLUNK_DB/cs_index/colddb
thawedPath = $SPLUNK_DB/cs_index/thaweddb

For details on these settings, see Configure SmartStore. Also see indexes.conf.spec in the Admin Manual.
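
To illustrate how the remotePath and path settings in the example combine, here is a minimal Python sketch that expands a remotePath value into the remote location for one index. The function name and the dictionary of volumes are hypothetical. The sketch only mirrors the rule stated in the example comments, that each SmartStore index resides directly below the location specified by the volume's path setting, and it is not how the indexer itself resolves paths.

```python
def remote_location(remote_path, volumes, index_name):
    """Expand a value such as 'volume:remote_store/$_index_name'.

    'volumes' maps a volume name to the value of its 'path' setting.
    """
    prefix = "volume:"
    if not remote_path.startswith(prefix):
        raise ValueError("remotePath must reference a volume")
    # Split 'remote_store/$_index_name' into the volume name and the rest.
    volume_name, _, suffix = remote_path[len(prefix):].partition("/")
    base = volumes[volume_name].rstrip("/")
    suffix = suffix.replace("$_index_name", index_name)
    return f"{base}/{suffix}" if suffix else base


volumes = {"remote_store": "s3://mybucket/some/path"}
print(remote_location("volume:remote_store/$_index_name", volumes, "cs_index"))
```

With the example settings, the index cs_index maps to s3://mybucket/some/path/cs_index.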

Using a GCS remote object store:
This example configures SmartStore indexes, using a GCS remote object store. The SmartStore-related settings are configured at the global level, which means that all indexes are SmartStore-enabled, and they all use a single remote storage volume, named "remote_store". The example also creates one new index, "cs_index".

[default]
# Configure all indexes to use the SmartStore remote volume called
# "remote_store".
# Note: If you want only some of your indexes to use SmartStore, 
# place this setting under the individual stanzas for each of the 
# SmartStore indexes, rather than here.
remotePath = volume:remote_store/$_index_name

# Configure the remote volume.
[volume:remote_store]
storageType = remote

# The volume's 'path' setting points to the remote storage location where
# indexes reside. Each SmartStore index resides directly below the location 
# specified by the 'path' setting. 
path = gs://mybucket/some/path

# There are several ways to specify credentials. For details, see the topic, 
# "SmartStore on GCS security strategies." One way to specify credentials 
# is to point to a file, as shown here.
remote.gs.credential_file = credential.json

# This example stanza configures a custom index, "cs_index".
[cs_index]
homePath = $SPLUNK_DB/cs_index/db
# SmartStore-enabled indexes do not use thawedPath or coldPath, but you must still specify them here.
coldPath = $SPLUNK_DB/cs_index/colddb
thawedPath = $SPLUNK_DB/cs_index/thaweddb

For details on these settings, see Configure SmartStore. Also see indexes.conf.spec in the Admin Manual.

Restart the indexer.
Test the deployment:
1. To confirm remote storage access:
  1. Place a sample text file in the remote store.
  2. On the indexer, run this command, which recursively lists any files that are present in the remote store:
```
splunk cmd splunkd rfs -- ls --starts-with volume:remote_store
```
  If you see the sample file when you run the command, you have access to the remote store.
2. Validate data transfer to the remote store:
  1. Send some data to the indexer.
  2. Wait for buckets to roll. If you don't want to wait for buckets to roll naturally, you can manually roll some buckets:
```
splunk _internal call /data/indexes/<index_name>/roll-hot-buckets -auth <admin>:<password>
```
  3. Look for warm buckets being uploaded to remote storage.
3. Validate data transfer from the remote store:
  
  Note: At this point, you should be able to run normal searches against this data. In most cases, searches do not transfer any data from remote storage, because the data is already in the local cache. To validate data transfer from the remote store, first evict a bucket from the local cache.
  1. Evict a bucket from the cache, with a POST to this REST endpoint:
```
services/admin/cacheman/<cid>/evict
```
    where <cid> is bid|<bucketId>|. For example: "bid|cs_index~0~7D76564B-AA17-488A-BAF2-5353EA0E9CE5|"
    
    Note: To get the bucketId for a bucket, run a search on your test index. For example:
```
splunk search "|rest /services/admin/cacheman | search title=*cs_index*  | fields title" -auth <admin>:<password>
```
  2. Run a search that requires data from the evicted bucket.
  The indexer must now transfer the bucket from remote storage to run the search. After running the search, you can check that the bucket has reappeared in the cache.
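
The eviction step above can be sketched in Python as follows. The sketch builds the <cid> value from a bucket ID and prepares a POST request for the evict endpoint without sending it. The function names are hypothetical, and the base URL (the default management port 8089 on localhost), the HTTP basic authentication, and the placeholder credentials are assumptions to adjust for your instance. The | characters in the <cid> are percent-encoded, which is the usual treatment for such characters in a URL path.

```python
import base64
import urllib.parse
import urllib.request


def cache_id(bucket_id):
    """Wrap a bucket ID in the 'bid|<bucketId>|' form used by the endpoint."""
    return f"bid|{bucket_id}|"


def evict_request(bucket_id, base_url="https://localhost:8089",
                  user="admin", password="changeme"):
    """Build (but do not send) a POST request that evicts one bucket."""
    cid = urllib.parse.quote(cache_id(bucket_id), safe="")
    url = f"{base_url}/services/admin/cacheman/{cid}/evict"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=b"",                 # empty body; the presence of data implies POST
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


request = evict_request("cs_index~0~7D76564B-AA17-488A-BAF2-5353EA0E9CE5")
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen together with an SSL context that matches the certificate configuration of the indexer.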

2. Run the migration on the production indexer

In this procedure, you configure your production indexer for SmartStore. The goal of the procedure is to migrate all existing warm and cold buckets on all indexes to SmartStore. Going forward, all new warm buckets will also reside in SmartStore.

The migration can take a long time to complete, particularly if you have a large amount of data. Expect some degradation of indexing and search performance during the migration. For that reason, schedule the migration for a time when your indexer is relatively idle.

Steps

Ensure that you have met the prerequisites.
Understand SmartStore security strategies and prepare to implement them as necessary during the deployment process. See SmartStore on S3 security strategies or SmartStore on GCS security strategies.
Upgrade the indexer to the latest version of Splunk Enterprise.
Stop the indexer.

Edit the existing $SPLUNK_HOME/etc/system/local/indexes.conf file to make the following additions.

Do not replace the existing indexes.conf file. You need to retain its current settings, such as its index definition settings. Instead, merge these additional settings into the existing file. Be sure to remove any other copies of these settings from the file.

Specify the SmartStore index global and volume settings. Assuming that you have already tested these settings on your test instance, you can simply copy the settings over from the test instance. For example:

Using an S3 remote object store:

[default]
# Configure all indexes to use the SmartStore remote volume called
# "remote_store".
# Note: If you want only some of your indexes to use SmartStore, 
# place this setting under the individual stanzas for each of the 
# SmartStore indexes, rather than here.
remotePath = volume:remote_store/$_index_name

# Configure the remote volume.
[volume:remote_store]
storageType = remote

# The volume's 'path' setting points to the remote storage location where
# indexes reside. Each SmartStore index resides directly below the location 
# specified by the 'path' setting.   
path = s3://mybucket/some/path

# The following S3 settings are required only if you're using the access and secret 
# keys. They are not needed if you are using AWS IAM roles.

remote.s3.access_key = <S3 access key>
remote.s3.secret_key = <S3 secret key>
remote.s3.endpoint = <https|http>://<S3 host>

# This example stanza configures a custom index, "cs_index".
[cs_index]
homePath = $SPLUNK_DB/cs_index/db
# SmartStore-enabled indexes do not use thawedPath or coldPath, but you must still specify them here.
coldPath = $SPLUNK_DB/cs_index/colddb
thawedPath = $SPLUNK_DB/cs_index/thaweddb

Using a GCS remote object store:

[default]
# Configure all indexes to use the SmartStore remote volume called
# "remote_store".
# Note: If you want only some of your indexes to use SmartStore, 
# place this setting under the individual stanzas for each of the 
# SmartStore indexes, rather than here.
remotePath = volume:remote_store/$_index_name

# Configure the remote volume.
[volume:remote_store]
storageType = remote

# The volume's 'path' setting points to the remote storage location where
# indexes reside. Each SmartStore index resides directly below the location 
# specified by the 'path' setting. 
path = gs://mybucket/some/path

# There are several ways to specify credentials. For details, see the topic, 
# "SmartStore on GCS security strategies." One way to specify credentials 
# is to point to a file, as shown here.
remote.gs.credential_file = credential.json

# This example stanza configures a custom index, "cs_index".
[cs_index]
homePath = $SPLUNK_DB/cs_index/db
# SmartStore-enabled indexes do not use thawedPath or coldPath, but you must still specify them here.
coldPath = $SPLUNK_DB/cs_index/colddb
thawedPath = $SPLUNK_DB/cs_index/thaweddb
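
After merging the SmartStore settings into the existing file, it can help to look for leftover copies of a setting. The following Python sketch, with a hypothetical function name, reports settings that appear more than once in the same stanza of an indexes.conf text, including the case where a stanza header itself is repeated. It assumes that settings placed before the first stanza header belong to [default], and it inspects one file only.

```python
def find_duplicate_settings(text):
    """Return (stanza, setting, first_line, repeat_line) for each repeat."""
    seen = {}
    duplicates = []
    stanza = "default"  # assumption: settings outside any stanza are global
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            stanza = line[1:-1]
            continue
        if "=" not in line:
            continue  # not a 'setting = value' line
        key = line.split("=", 1)[0].strip()
        first = seen.setdefault((stanza, key), number)
        if first != number:
            duplicates.append((stanza, key, first, number))
    return duplicates
```

Each reported tuple gives the line numbers of the first occurrence and of the repeat, so that you can decide which copy to remove.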

Configure the data retention settings, as necessary, to ensure that the indexer will follow your desired freezing behavior, post-migration. See Configure data retention for SmartStore indexes.
This step is essential to avoid unwanted bucket freezing and possible data loss. SmartStore bucket-freezing behavior and settings differ from the non-SmartStore behavior and settings.

Edit $SPLUNK_HOME/etc/system/local/server.conf to make any necessary changes to the SmartStore-related server.conf settings. In particular, configure the cache size to fit the needs of your deployment. See Configure the SmartStore cache manager.
Start the indexer.
Wait briefly for the indexer to begin uploading its warm and cold buckets to the remote store.
Cold buckets use the cold path as their cache location, post-migration.

In all respects, cold buckets are functionally equivalent to warm buckets. The cache manager manages the migrated cold buckets in the same way that it manages warm buckets. The only difference is that the cold buckets will be fetched into the cold path location, rather than the home path location.
To confirm remote storage access, run this command:
```
splunk cmd splunkd rfs -- ls --starts-with volume:remote_store
```
This command recursively lists any files that are present in the remote store. It should show that the indexer is starting to upload warm buckets to the remote store. If necessary, wait a little while for the first uploads to occur.
To determine that migration is complete, see Monitor the migration process.
Test SmartStore functionality. At this point, you should be able to run normal searches against this data. In the majority of cases, you will not be transferring any data from the remote storage, because the data will already be in the local cache. To validate data fetching from remote storage, do the following:
1. On the indexer, look for a fully populated bucket, containing both tsidx files and the rawdata file.
2. Evict the bucket from the cache, with a POST to this REST endpoint:
```
services/admin/cacheman/<cid>/evict
```
  where <cid> is bid|<bucketId>|. For example: "bid|cs_index~0~7D76564B-AA17-488A-BAF2-5353EA0E9CE5|"
  
  Note: To get the bucketId for a bucket, run a search on the index. For example:
```
splunk search "|rest /services/admin/cacheman | search title=*cs_index*  | fields title" -auth <admin>:<password>
```
3. Run a search locally on the indexer. The search must be one that requires data from the evicted bucket.
The indexer must now transfer the bucket from remote storage to run the search. After running the search, you can check that the bucket has reappeared in the cache.
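
For the first step of this validation, the following Python sketch checks whether a bucket directory looks fully populated. The function name is hypothetical. The sketch assumes that the tsidx files sit at the top level of the bucket directory with a .tsidx extension and that the rawdata file sits in a rawdata subdirectory; verify both assumptions against the buckets on your indexer before you rely on the result.

```python
from pathlib import Path


def is_fully_populated(bucket_dir):
    """Return True if the bucket has at least one tsidx file and rawdata."""
    bucket = Path(bucket_dir)
    # Assumption: tsidx files are stored directly in the bucket directory.
    has_tsidx = any(bucket.glob("*.tsidx"))
    # Assumption: the rawdata file is stored under <bucket>/rawdata/.
    rawdata = bucket / "rawdata"
    has_rawdata = rawdata.is_dir() and any(
        entry.is_file() for entry in rawdata.iterdir()
    )
    return has_tsidx and has_rawdata
```

You could call the function on candidate directories under the home path or cold path of an index and pick any bucket for which it returns True.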

If you need to restart the indexer during migration, upon restart, migration will continue from where it left off.

Monitor the migration process

You can use the monitoring console to monitor migration progress. See Troubleshoot with the monitoring console.

You can also query an endpoint from the indexer to determine the status of the migration:

$ splunk search "|rest /services/admin/cacheman/_metrics |fields splunk_server migration.*" -auth <admin>:<password>

The endpoint returns data on the migration, which you can use to determine how far along in the process the indexer is.

If the indexer restarts during migration, its migration information is lost, and this endpoint cannot be used to check status, although the migration will, in fact, resume. The indexer's reported status will remain "not_started" even after migration resumes.

Instead, you can run the following search on the indexer:

splunk search "|rest /services/admin/cacheman |search cm:bucket.stable=0 |stats count" -auth <admin>:<password>

The count equals the number of upload jobs remaining, where an upload job represents a single bucket to be uploaded. The count decrements to zero as migration continues.
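
The countdown described above lends itself to a simple polling loop. The following Python sketch calls a function repeatedly until it reports zero remaining upload jobs. The names are hypothetical, and get_remaining is a placeholder that you would implement yourself, for example by running the search above and parsing the count from its output.

```python
import time


def wait_for_uploads(get_remaining, poll_seconds=60.0, sleep=time.sleep):
    """Poll until no upload jobs remain; return the counts observed."""
    history = []
    while True:
        remaining = get_remaining()
        history.append(remaining)
        if remaining == 0:
            return history
        sleep(poll_seconds)


# Example with canned counts in place of a real query.
counts = iter([3, 1, 0])
print(wait_for_uploads(lambda: next(counts), poll_seconds=0))
```

The sleep function is passed in as a parameter so that the loop can be exercised without real delays.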
