Resync the KV store

When a KV store member fails to apply all of the write operations to its data, the member might become stale. To resolve this issue, you must resynchronize the member.

Before downgrading Splunk Enterprise to version 7.1 or earlier, you must use the REST API to resynchronize the KV store.

Identify a stale KV store member

You can check the status of the KV store using the command line.

Log into the shell of any KV store member.
Navigate to the bin subdirectory in the Splunk Enterprise installation directory.
Type ./splunk show kvstore-status. The command line returns a summary of the KV store member you are logged into, as well as information about every other member in the KV store cluster.
Look at the replicationStatus field and identify any members that have neither "KV store captain" nor "Non-captain KV store member" as values.
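The last step can be sketched as a small shell filter. This is a minimal sketch, assuming that each replicationStatus field is printed on its own line together with its value. The function name stale_status_lines is hypothetical and is not part of the Splunk CLI.

```shell
# Hypothetical helper (not part of the Splunk CLI): reads the output of
# "./splunk show kvstore-status" on stdin and prints every replicationStatus
# line whose value is neither of the two healthy values named above.
stale_status_lines() {
    grep 'replicationStatus' \
        | grep -v -e 'KV store captain' -e 'Non-captain KV store member' \
        || true   # finding no matching lines is not an error
}

# Usage, from the bin subdirectory of the Splunk Enterprise installation:
#   ./splunk show kvstore-status | stale_status_lines
```

If the filter prints nothing, every member reported one of the two healthy values.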

Resync stale KV store members

If more than half of the members are stale, you can either recreate the cluster or resync it from one of the members. See Back up KV store for details about restoring from backup.

To resync the cluster from one of the members, use the following procedure. This procedure triggers the recreation of the KV store cluster: all members of the existing KV store cluster resynchronize all data from the current member (or from the member specified in -source sourceId). You can run the command to resync the KV store cluster only from the node that is operating as search head cluster captain.

Determine which node is currently the search head cluster captain. Use the CLI command splunk show shcluster-status.
Log into the shell on the search head cluster captain node.
Run the command splunk resync kvstore [-source sourceId]. The -source parameter is optional. Use it if you want a member other than the search head cluster captain to be the source. sourceId is the GUID of the search head member that you want to use.
Enter your admin login credentials.
Wait for a confirmation message on the command line.
Use the splunk show kvstore-status command to verify that the cluster is resynced.
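The resync command in this procedure has two forms, with and without a source. As a minimal sketch, the hypothetical helper resync_kvstore_command below only assembles the command line from the syntax given above, so that you can review it before running it on the captain.

```shell
# Hypothetical helper: prints the resync command described above.
# $1 (optional) is the GUID of the search head member to use as the source.
resync_kvstore_command() {
    if [ -n "${1:-}" ]; then
        printf 'splunk resync kvstore -source %s\n' "$1"
    else
        printf 'splunk resync kvstore\n'
    fi
}

# Usage:
#   resync_kvstore_command              # prints: splunk resync kvstore
#   resync_kvstore_command <sourceId>   # prints the form with -source <sourceId>
```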

If fewer than half of the members are stale, resync each member individually.

Stop the search head that has the stale KV store member.
Run the command splunk clean kvstore --local.
Restart the search head. This triggers the initial synchronization from other KV store members.
Run the command splunk show kvstore-status to verify synchronization.
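The four steps above can be sketched as one shell function. This is a sketch under stated assumptions, not a supported tool: the function name and the DRY_RUN variable are hypothetical, the splunk binary is assumed to be on the PATH of the stale member, and "Stop" and "Restart" are taken to mean splunk stop and splunk start.

```shell
# Hypothetical wrapper around the four steps above. With DRY_RUN=1 (the
# default here) it only prints the commands; set DRY_RUN=0 to execute them.
resync_stale_member() {
    for cmd in 'splunk stop' \
               'splunk clean kvstore --local' \
               'splunk start' \
               'splunk show kvstore-status'; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf '%s\n' "$cmd"      # dry run: show the command only
        else
            $cmd || return 1          # stop at the first failing step
        fi
    done
}

# Usage on the search head that has the stale member:
#   resync_stale_member               # review the commands first
#   DRY_RUN=0 resync_stale_member     # then run them
```

Because the initial synchronization can take some time, you might need to repeat the final status check.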

Prevent stale members by increasing operations log size

If KV store members become stale so often (daily or even hourly) that you are resyncing the KV store frequently, apps or users are probably writing a lot of data to the KV store and the operations log is too small. Increasing the size of the operations log (or oplog) might help.

After initial synchronization, noncaptain KV store members no longer access the captain collection. Instead, new entries in the KV store collection are inserted in the operations log, and the members replicate the newly inserted data from there. When the operations log reaches its allocation (1 GB by default), it overwrites the beginning of the oplog.

Consider a lookup that is close to the size of the allocation. The KV store rolls the data (and overwrites starting from the beginning of the oplog) only after the majority of the members have accessed it, for example, three out of five members in a KV store cluster. Once that happens, the oplog rolls, and a minority member (one of the two remaining members in this example) can no longer access the beginning of the oplog. That minority member then becomes stale and needs to be resynced, which means reading the entire collection (which is likely much larger than the operations log).

To decide whether to increase the operations log size, visit the Monitoring Console KV store: Instance dashboard or use the command line as follows:

Determine which search head cluster member is currently the KV store captain by running splunk show kvstore-status from any cluster member.
On the KV store captain, run splunk show kvstore-status.
Compare the oplog start and end timestamps. The start is the oldest change, and the end is the newest one. If the difference is on the order of a minute, you should probably increase the operations log size.
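The comparison in the last step can be sketched as simple arithmetic. This is a minimal sketch, assuming both timestamps are available as epoch seconds. The function names are hypothetical, and the 120-second default threshold is only an illustrative stand-in for "on the order of a minute", not a Splunk-recommended value.

```shell
# Hypothetical helper: $1 is the oplog start (oldest change) and $2 is the
# oplog end (newest change), both in epoch seconds. Prints the window length.
oplog_window_seconds() {
    echo $(( $2 - $1 ))
}

# Succeeds (exit 0) when the window is at or below the threshold in seconds.
# $3 is an optional threshold; the default of 120 is an assumption.
oplog_window_is_short() {
    [ "$(oplog_window_seconds "$1" "$2")" -le "${3:-120}" ]
}

# Usage with made-up timestamps that are 45 seconds apart:
#   oplog_window_is_short 1700000000 1700000045 && echo "consider a larger oplog"
```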

While keeping your operations log too small has obvious negative effects (like members becoming stale), setting an oplog size much larger than your needs might not be ideal either. The KV store takes the full log size that you allocate right away, regardless of how much data is actually being written to the log. Reading the oplog can take a fair bit of RAM, too, although it is loosely bound. Work with Splunk Support to determine an appropriate operations log size for your KV store use. The operations log is 1 GB by default.

To increase the log size:

Determine which search head cluster member is currently the KV store captain by running splunk show kvstore-status from any cluster member.
On the KV store captain, edit the server.conf file, located in $SPLUNK_HOME/etc/system/local/. Increase the oplogSize setting in the [kvstore] stanza. The default value is 1000 (in units of MB).
Restart the KV store captain.
For each of the other cluster members:
1. Stop the member.
2. Run splunk clean kvstore --local.
3. Edit the server.conf file, located in $SPLUNK_HOME/etc/system/local/. Increase the oplogSize setting in the [kvstore] stanza. The default value is 1000 (in units of MB).
4. Restart the member.
5. Run splunk show kvstore-status to verify synchronization.
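After you edit server.conf on each member, it can help to confirm that every member carries the same value. The following is a minimal sketch of such a check. The function name kvstore_oplog_size is hypothetical, and the value 2000 in the comment is only an illustration, not a recommended size.

```shell
# Hypothetical helper: prints the oplogSize value (in MB) from the [kvstore]
# stanza of the server.conf file given as $1. Prints nothing if the setting
# is absent, in which case the default of 1000 MB applies.
#
# The stanza it looks for has this shape (2000 is an illustrative value):
#   [kvstore]
#   oplogSize = 2000
kvstore_oplog_size() {
    awk '
        # Track whether the current stanza header is [kvstore].
        /^[ \t]*\[/ { in_kv = ($0 ~ /^[ \t]*\[kvstore\]/) }
        # Inside [kvstore], print the value to the right of "oplogSize =".
        in_kv && /^[ \t]*oplogSize[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")
            sub(/[ \t]+$/, "")
            print
        }
    ' "$1"
}

# Usage:
#   kvstore_oplog_size "$SPLUNK_HOME/etc/system/local/server.conf"
```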
