Splunk® Enterprise

Managing Indexers and Clusters of Indexers

Migrate existing data on a standalone indexer to SmartStore

You can migrate the existing data on your standalone indexer from local storage to the remote store.

This procedure describes how to migrate all the indexes on the indexer to SmartStore. You can modify the procedure if you want to migrate only some of the indexes. Indexers support a mixed environment of SmartStore and non-SmartStore indexes.

Because this process requires the indexer to upload large amounts of data, it can take a long time to complete and can have a significant impact on concurrent indexing and searching.

You cannot revert an index to non-SmartStore after you migrate it to SmartStore.

Migrate data

Perform the migration operation in two phases:

  1. Test the SmartStore configurations and remote connectivity on a test indexer.
  2. Run the migration by applying the configurations to your production indexer.

Prerequisites

1. Test the configuration on a test indexer

The purpose of testing the configuration is to:

  • test remote store connectivity.
  • validate the configuration.

Steps

  1. Ensure that you have met all prerequisites relevant to this test setup. In particular, read:
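  2. Understand SmartStore security strategies and prepare to implement them as necessary during the deployment process. See the topic on security strategies for your remote storage type: "SmartStore on S3 security strategies," "SmartStore on GCS security strategies," or "SmartStore on Azure Blob security strategies."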
  3. Install a new Splunk Enterprise instance. For information on how to install Splunk Enterprise, read the Installation Manual.
  4. Edit indexes.conf in $SPLUNK_HOME/etc/system/local to specify the SmartStore settings for your indexes. Use the same settings that you intend to use later in your production deployment.

    Using an S3 remote object store:
    This example configures SmartStore indexes, using an S3 remote object store. The SmartStore-related settings are configured at the global level, which means that all indexes are SmartStore-enabled, and they all use a single remote storage volume, named "remote_store". The example also creates one new index, "cs_index".
    [default]
    # Configure all indexes to use the SmartStore remote volume called
    # "remote_store".
    # Note: If you want only some of your indexes to use SmartStore, 
    # place this setting under the individual stanzas for each of the 
    # SmartStore indexes, rather than here.
    remotePath = volume:remote_store/$_index_name
    
    # Configure the remote volume.
    [volume:remote_store]
    storageType = remote
    
    # The volume's 'path' setting points to the remote storage location where
    # indexes reside. Each SmartStore index resides directly below the location 
    # specified by the 'path' setting.   
    path = s3://mybucket/some/path
    
    # The following S3 settings are required only if you're using the access and secret 
    # keys. They are not needed if you are using AWS IAM roles.
    
    remote.s3.access_key = <S3 access key>
    remote.s3.secret_key = <S3 secret key>
    remote.s3.endpoint = https|http://<S3 host>
    
    # This example stanza configures a custom index, "cs_index".
    [cs_index]
    homePath = $SPLUNK_DB/cs_index/db
    # SmartStore-enabled indexes do not use thawedPath or coldPath, but you must still specify them here.
    coldPath = $SPLUNK_DB/cs_index/colddb
    thawedPath = $SPLUNK_DB/cs_index/thaweddb
    

    For details on these settings, see Configure SmartStore. Also see indexes.conf.spec in the Admin Manual.

    Using a GCS remote object store:
    This example configures SmartStore indexes, using a GCS remote object store. The SmartStore-related settings are configured at the global level, which means that all indexes are SmartStore-enabled, and they all use a single remote storage volume, named "remote_store". The example also creates one new index, "cs_index".

    [default]
    # Configure all indexes to use the SmartStore remote volume called
    # "remote_store".
    # Note: If you want only some of your indexes to use SmartStore, 
    # place this setting under the individual stanzas for each of the 
    # SmartStore indexes, rather than here.
    remotePath = volume:remote_store/$_index_name
    
    # Configure the remote volume.
    [volume:remote_store]
    storageType = remote
    
    # The volume's 'path' setting points to the remote storage location where
    # indexes reside. Each SmartStore index resides directly below the location 
    # specified by the 'path' setting. 
    path = gs://mybucket/some/path
    
    # There are several ways to specify credentials. For details, see the topic, 
    # "SmartStore on GCS security strategies." One way to specify credentials 
    # is to point to a file, as shown here.
    remote.gs.credential_file = credential.json
    
    # This example stanza configures a custom index, "cs_index".
    [cs_index]
    homePath = $SPLUNK_DB/cs_index/db
    # SmartStore-enabled indexes do not use thawedPath or coldPath, but you must still specify them here.
    coldPath = $SPLUNK_DB/cs_index/colddb
    thawedPath = $SPLUNK_DB/cs_index/thaweddb
    

    For details on these settings, see Configure SmartStore. Also see indexes.conf.spec in the Admin Manual.

    Using an Azure Blob remote object store:
    This example configures SmartStore indexes, using an Azure Blob remote object store. The SmartStore-related settings are configured at the global level, which means that all indexes are SmartStore-enabled, and they all use a single remote storage volume, named "remote_store". The example also creates one new index, "cs_index".

    [default]
    # Configure all indexes to use the SmartStore remote volume called
    # "remote_store".
    # Note: If you want only some of your indexes to use SmartStore, 
    # place this setting under the individual stanzas for each of the 
    # SmartStore indexes, rather than here.
    remotePath = volume:remote_store/$_index_name
    
    # Configure the remote volume.
    [volume:remote_store]
    storageType = remote
    
    # The volume's 'path' setting points to the remote storage location where
    # indexes reside. Each SmartStore index resides directly below the location 
    # specified by the 'path' setting. 
    # There are multiple ways to fully specify the location. Here, for example, the
    # Azure container is specified in its own setting, but it can also be specified as 
    # part of the "path" setting. See the indexes.conf.spec file for more information.
    remote.azure.endpoint = https://account-name.blob.core.windows.net
    remote.azure.container_name = your-container
    path = azure://example/20_39/TID_01
    
    # To authenticate with the remote storage service, you must use either hardcoded access/secret 
    # keys or Azure Active Directory with configured Managed Identity. See the topic, "SmartStore on 
    # Azure Blob security strategies."  
    
    # This example stanza configures a custom index, "cs_index".
    [cs_index]
    homePath = $SPLUNK_DB/cs_index/db
    # SmartStore-enabled indexes do not use thawedPath or coldPath, but you must still specify them here.
    coldPath = $SPLUNK_DB/cs_index/colddb
    thawedPath = $SPLUNK_DB/cs_index/thaweddb
    

    For details on these settings, see Configure SmartStore. Also see indexes.conf.spec in the Admin Manual.

  5. Restart the indexer.
  6. Test the deployment:
    1. To confirm remote storage access:
      1. Place a sample text file in the remote store.
      2. On the indexer, run this command, which recursively lists any files that are present in the remote store:
        splunk cmd splunkd rfs -- ls --starts-with volume:remote_store
        

      If you see the sample file when you run the command, you have access to the remote store.

    2. Validate data transfer to the remote store:
      1. Send some data to the indexer.
      2. Wait for buckets to roll. If you don't want to wait for buckets to roll naturally, you can manually roll some buckets:
        splunk _internal call /data/indexes/<index_name>/roll-hot-buckets -auth <admin>:<password>
        
      3. Confirm that the rolled warm buckets are uploaded to remote storage, for example by running the rfs ls command shown above again.
    3. Validate data transfer from the remote store:

      Note: At this point, you should be able to run normal searches against this data. In most cases, these searches do not transfer any data from remote storage, because the data is already in the local cache. Therefore, to validate data transfer from the remote store, first evict a bucket from the local cache.
      1. Evict a bucket from the cache, with a POST to this REST endpoint:
        services/admin/cacheman/<cid>/evict
        

        where <cid> is bid|<bucketId>|. For example: "bid|cs_index~0~7D76564B-AA17-488A-BAF2-5353EA0E9CE5|"

        Note: To get the bucketId for a bucket, run a search against the cache manager REST endpoint, filtering on your test index. For example:

        splunk search "|rest /services/admin/cacheman | search title=*cs_index*  | fields title" -auth <admin>:<password>
        
      2. Run a search that requires data from the evicted bucket.

      The indexer must now transfer the bucket from remote storage to run the search. After running the search, you can check that the bucket has reappeared in the cache.
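The eviction check above can also be scripted against the indexer's REST API. The following Python sketch is illustrative only and rests on several assumptions: the indexer's management port is the default 8089 on localhost, basic authentication with an admin account is acceptable on the test instance, and the helper names (`make_cache_id`, `evict_url`, `evict_bucket`) are hypothetical. Percent-encoding the `|` characters of the cache ID is a precaution assumed here, not a requirement stated in this topic.

```python
import base64
import ssl
import urllib.parse
import urllib.request

# ASSUMPTION: the default management port (8089) on the local indexer.
DEFAULT_BASE_URL = "https://localhost:8089"


def make_cache_id(bucket_id: str) -> str:
    """Build a cache ID of the form bid|<bucketId>|."""
    return f"bid|{bucket_id}|"


def evict_url(bucket_id: str, base_url: str = DEFAULT_BASE_URL) -> str:
    """Return the evict endpoint URL for a bucket.

    The cache ID is percent-encoded (| becomes %7C) so that it is safe in a
    URL path; this encoding is an assumption, not a documented requirement.
    """
    cid = urllib.parse.quote(make_cache_id(bucket_id), safe="")
    return f"{base_url}/services/admin/cacheman/{cid}/evict"


def evict_bucket(bucket_id: str, user: str, password: str,
                 base_url: str = DEFAULT_BASE_URL,
                 verify_tls: bool = True) -> int:
    """POST to the evict endpoint and return the HTTP status code."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        evict_url(bucket_id, base_url),
        data=b"",  # empty request body
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Test instances often use a self-signed certificate; disable
        # verification only on an isolated test indexer.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.status
```

For example, `evict_bucket("cs_index~0~7D76564B-AA17-488A-BAF2-5353EA0E9CE5", "admin", "<password>", verify_tls=False)` returns the HTTP status code if the POST succeeds (urllib raises `urllib.error.HTTPError` for error responses). Afterward, run a search that needs data from that bucket.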

2. Run the migration on the production indexer

In this procedure, you configure your production indexer for SmartStore. The goal of the procedure is to migrate all existing warm and cold buckets on all indexes to SmartStore. Going forward, all new warm buckets will also reside in SmartStore.

The migration can take a long time to complete, especially if you have a large amount of data. Expect some degradation of indexing and search performance during the migration. For that reason, schedule the migration for a time when your indexer will be relatively idle.

Steps

  1. Ensure that you have met the prerequisites. In particular, read:
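  2. Understand SmartStore security strategies and prepare to implement them as necessary during the deployment process. See the topic on security strategies for your remote storage type: "SmartStore on S3 security strategies," "SmartStore on GCS security strategies," or "SmartStore on Azure Blob security strategies."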
  3. Upgrade the indexer to the latest version of Splunk Enterprise.
  4. Stop the indexer.
  5. Edit the existing $SPLUNK_HOME/etc/system/local/indexes.conf file to make the following additions.

    Do not replace the existing indexes.conf file. You need to retain its current settings, such as its index definition settings. Instead, merge these additional settings into the existing file. Be sure to remove any other copies of these settings from the file.

    1. Specify the SmartStore index global and volume settings. If you have already tested these settings on your test instance, you can copy them over from that instance. For example:

      Using an S3 remote object store:
      [default]
      # Configure all indexes to use the SmartStore remote volume called
      # "remote_store".
      # Note: If you want only some of your indexes to use SmartStore, 
      # place this setting under the individual stanzas for each of the 
      # SmartStore indexes, rather than here.
      remotePath = volume:remote_store/$_index_name
      
      # Configure the remote volume.
      [volume:remote_store]
      storageType = remote
      
      # The volume's 'path' setting points to the remote storage location where
      # indexes reside. Each SmartStore index resides directly below the location 
      # specified by the 'path' setting.   
      path = s3://mybucket/some/path
      
      # The following S3 settings are required only if you're using the access and secret 
      # keys. They are not needed if you are using AWS IAM roles.
      
      remote.s3.access_key = <S3 access key>
      remote.s3.secret_key = <S3 secret key>
      remote.s3.endpoint = https|http://<S3 host>
      
      # This example stanza configures a custom index, "cs_index".
      [cs_index]
      homePath = $SPLUNK_DB/cs_index/db
      # SmartStore-enabled indexes do not use thawedPath or coldPath, but you must still specify them here.
      coldPath = $SPLUNK_DB/cs_index/colddb
      thawedPath = $SPLUNK_DB/cs_index/thaweddb
      

      Using a GCS remote object store:

      [default]
      # Configure all indexes to use the SmartStore remote volume called
      # "remote_store".
      # Note: If you want only some of your indexes to use SmartStore, 
      # place this setting under the individual stanzas for each of the 
      # SmartStore indexes, rather than here.
      remotePath = volume:remote_store/$_index_name
      
      # Configure the remote volume.
      [volume:remote_store]
      storageType = remote
      
      # The volume's 'path' setting points to the remote storage location where
      # indexes reside. Each SmartStore index resides directly below the location 
      # specified by the 'path' setting. 
      path = gs://mybucket/some/path
      
      # There are several ways to specify credentials. For details, see the topic, 
      # "SmartStore on GCS security strategies." One way to specify credentials 
      # is to point to a file, as shown here.
      remote.gs.credential_file = credential.json
      
      # This example stanza configures a custom index, "cs_index".
      [cs_index]
      homePath = $SPLUNK_DB/cs_index/db
      # SmartStore-enabled indexes do not use thawedPath or coldPath, but you must still specify them here.
      coldPath = $SPLUNK_DB/cs_index/colddb
      thawedPath = $SPLUNK_DB/cs_index/thaweddb
      

      Using an Azure Blob remote object store:

      [default]
      # Configure all indexes to use the SmartStore remote volume called
      # "remote_store".
      # Note: If you want only some of your indexes to use SmartStore, 
      # place this setting under the individual stanzas for each of the 
      # SmartStore indexes, rather than here.
      remotePath = volume:remote_store/$_index_name
      
      # Configure the remote volume.
      [volume:remote_store]
      storageType = remote
      
      # The volume's 'path' setting points to the remote storage location where
      # indexes reside. Each SmartStore index resides directly below the location 
      # specified by the 'path' setting. 
      # There are multiple ways to fully specify the location. Here, for example, the
      # Azure container is specified in its own setting, but it can also be specified as 
      # part of the "path" setting. See the indexes.conf.spec file for more information.
      remote.azure.endpoint = https://account-name.blob.core.windows.net
      remote.azure.container_name = your-container
      path = azure://example/20_39/TID_01
      
      # To authenticate with the remote storage service, you must use either hardcoded access/secret 
      # keys or Azure Active Directory with configured Managed Identity. See the topic, "SmartStore on 
      # Azure Blob security strategies."  
      
      # This example stanza configures a custom index, "cs_index".
      [cs_index]
      homePath = $SPLUNK_DB/cs_index/db
      # SmartStore-enabled indexes do not use thawedPath or coldPath, but you must still specify them here.
      coldPath = $SPLUNK_DB/cs_index/colddb
      thawedPath = $SPLUNK_DB/cs_index/thaweddb
      
    2. Configure the data retention settings, as necessary, to ensure that the indexer will follow your desired freezing behavior, post-migration. See Configure data retention for SmartStore indexes.

      This step is critical for avoiding unwanted bucket freezing and possible data loss, because SmartStore bucket-freezing behavior and settings differ from non-SmartStore behavior and settings.

  6. Edit $SPLUNK_HOME/etc/system/local/server.conf to make any necessary changes to the SmartStore-related server.conf settings. In particular, configure the cache size to fit the needs of your deployment. See Configure the SmartStore cache manager.
  7. Start the indexer.
  8. Wait briefly for the indexer to begin uploading its warm and cold buckets to the remote store.

    Cold buckets use the cold path as their cache location, post-migration.

    Otherwise, migrated cold buckets are functionally equivalent to warm buckets. The cache manager manages them in the same way that it manages warm buckets, except that it fetches them into the cold path location rather than the home path location.

  9. To confirm remote storage access, run this command:
    splunk cmd splunkd rfs -- ls --starts-with volume:remote_store
    

    This command recursively lists any files that are present in the remote store. The output should show that the indexer has started uploading buckets to the remote store. If it does not, wait a little while for the first uploads to occur and run the command again.

  10. To determine whether migration is complete, see Monitor the migration process.
  11. Test SmartStore functionality. At this point, you should be able to run normal searches against this data. In most cases, these searches do not transfer any data from remote storage, because the data is already in the local cache. To validate data fetching from remote storage, do the following:
    1. On the indexer, look for a fully populated bucket, containing both tsidx files and the rawdata file.
    2. Evict the bucket from the cache, with a POST to this REST endpoint:
      services/admin/cacheman/<cid>/evict
      

      where <cid> is bid|<bucketId>|. For example: "bid|cs_index~0~7D76564B-AA17-488A-BAF2-5353EA0E9CE5|"

      Note: To get the bucketId for a bucket, run a search against the cache manager REST endpoint, filtering on the index that contains the bucket. For example:

      splunk search "|rest /services/admin/cacheman | search title=*cs_index*  | fields title" -auth <admin>:<password>
      
    3. Run a search locally on the indexer. The search must be one that requires data from the evicted bucket.

    The indexer must now transfer the bucket from remote storage to run the search. After running the search, you can check that the bucket has reappeared in the cache.
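Because step 5 merges new settings into an existing indexes.conf, it can help to sanity-check the merged file before you start the indexer. The following Python sketch is a hypothetical pre-flight check, not a Splunk tool, and the function names are assumptions for illustration. It reads only the text of a single indexes.conf file and verifies that each index stanza resolves a remotePath (its own or the one under [default]) that points to a defined volume with storageType = remote. In a mixed SmartStore and non-SmartStore deployment, a "no remotePath" message can be intentional.

```python
from __future__ import annotations

import configparser
import re

# A remotePath such as "volume:remote_store/$_index_name" names its volume
# before the first slash.
VOLUME_REF = re.compile(r"^volume:([^/]+)/")


def is_index_stanza(name: str) -> bool:
    """Treat every stanza except [default] and [volume:...] as an index."""
    return name != "default" and not name.startswith("volume:")


def check_smartstore_config(text: str) -> list[str]:
    """Return problems found in the text of a single indexes.conf file.

    Simplified: ignores other configuration layers and conf-file features
    such as backslash line continuation.
    """
    # interpolation=None keeps $_index_name and $SPLUNK_DB literal.
    # strict=True raises DuplicateSectionError/DuplicateOptionError on
    # repeated stanzas or settings, flagging leftover copies.
    parser = configparser.ConfigParser(interpolation=None, strict=True)
    parser.optionxform = str  # preserve key case (remotePath, storageType)
    parser.read_string(text)

    # "[default]" is an ordinary section to configparser (its own default
    # section is "DEFAULT"), so inheritance is resolved explicitly here.
    default_remote = parser.get("default", "remotePath", fallback=None)

    problems: list[str] = []
    for section in filter(is_index_stanza, parser.sections()):
        remote_path = parser.get(section, "remotePath", fallback=default_remote)
        if remote_path is None:
            problems.append(f"[{section}] has no remotePath (not SmartStore-enabled)")
            continue
        match = VOLUME_REF.match(remote_path)
        if match is None:
            problems.append(f"[{section}] remotePath does not reference a volume: {remote_path}")
            continue
        volume = f"volume:{match.group(1)}"
        if not parser.has_section(volume):
            problems.append(f"[{section}] references undefined [{volume}]")
        elif parser.get(volume, "storageType", fallback=None) != "remote":
            problems.append(f"[{volume}] is missing storageType = remote")
    return problems
```

Pass it the contents of the edited $SPLUNK_HOME/etc/system/local/indexes.conf. An empty list means only that these particular checks passed. Because the sketch reads one file, it does not see settings from other configuration layers; `splunk btool indexes list --debug` shows the merged configuration that Splunk actually uses.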

If you need to restart the indexer during migration, the migration continues from where it left off after the restart.

Monitor the migration process

You can use the monitoring console to monitor migration progress. See Troubleshoot with the monitoring console.

You can also run a search against a REST endpoint on the indexer to determine the status of the migration:

splunk search "|rest /services/admin/cacheman/_metrics |fields splunk_server migration.*" -auth <admin>:<password>

The endpoint returns data on the migration, which you can use to determine how far along in the process the indexer is.
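To turn one result row from the `_metrics` search above into a rough progress figure, you might compare completed upload jobs to total upload jobs. The following Python sketch assumes that the row is available as a dictionary of strings and that it contains fields named `migration.status`, `migration.current_job`, and `migration.total_jobs`. Those field names are assumptions, and treating `current_job` as a completed-job count makes the percentage approximate; confirm the `migration.*` fields that your indexer actually returns before relying on this.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class MigrationProgress:
    server: str
    status: str
    done: Optional[int]
    total: Optional[int]

    @property
    def percent(self) -> Optional[float]:
        """Approximate percentage of upload jobs completed, or None if unknown."""
        if self.done is None or not self.total:
            return None
        return 100.0 * self.done / self.total


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a search-result string to int; None if absent or non-numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def summarize_migration(row: Mapping[str, str]) -> MigrationProgress:
    """Summarize one result row of the cacheman _metrics search.

    ASSUMPTION: the migration.* field names below are illustrative.
    """
    return MigrationProgress(
        server=row.get("splunk_server", "<unknown>"),
        status=row.get("migration.status", "<missing>"),
        done=_to_int(row.get("migration.current_job")),
        total=_to_int(row.get("migration.total_jobs")),
    )
```

When the counts are missing, for example while the status still reads "not_started", `percent` is None rather than a misleading zero.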

If the indexer restarts during migration, the migration resumes, but the indexer loses its migration information, so you cannot use this endpoint to check status. The indexer's reported status remains "not_started" even after migration resumes.

Instead, run the following search on the indexer:

splunk search "|rest /services/admin/cacheman |search cm:bucket.stable=0 |stats count" -auth <admin>:<password>

The count equals the number of upload jobs remaining, where an upload job represents a single bucket to be uploaded. The count decrements to zero as migration continues.
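Because this count only decreases as buckets finish uploading, you can poll it until it reaches zero. The following Python sketch assumes you supply a hypothetical `fetch_count` function that runs the search above (for example, through the Splunk CLI or REST API) and returns the count as an integer. The polling interval and optional timeout are arbitrary choices, not Splunk recommendations.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


def wait_for_uploads(
    fetch_count: Callable[[], int],
    poll_seconds: float = 300.0,
    timeout_seconds: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> list[int]:
    """Poll the remaining-upload count until it reaches zero.

    fetch_count (hypothetical) should run
    '|rest /services/admin/cacheman |search cm:bucket.stable=0 |stats count'
    and return the count. Returns every count observed, in order.
    Raises TimeoutError if timeout_seconds elapses before the count hits zero.
    """
    observed: list[int] = []
    deadline = None if timeout_seconds is None else clock() + timeout_seconds
    while True:
        count = fetch_count()
        observed.append(count)
        if count == 0:
            return observed
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"{count} bucket uploads still pending")
        sleep(poll_seconds)
```

The `sleep` and `clock` parameters exist so the loop can be exercised without waiting in real time; in normal use, leave them at their defaults.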

Last modified on 15 September, 2021

This documentation applies to the following versions of Splunk® Enterprise: 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.2.0

