Configuration updates that the cluster replicates

The cluster automatically replicates certain runtime configuration changes that a user makes on one cluster member to all the other members.

Note: The cluster replicates configuration changes to all cluster members. The cluster's replication factor applies only to search artifact replication. See Choose the replication factor for the search head cluster.

The changes that the cluster replicates

These are the main types of configuration changes that the cluster replicates:

Runtime changes or additions to knowledge objects, such as saved searches, lookup tables, and dashboards. For example, when a user in Splunk Web defines a field extraction, the cluster replicates that field extraction to all search heads in the cluster.
Runtime changes to users and roles. See Add users to the search head cluster.

Replication operates under these constraints:

The cluster replicates only changes made at runtime through specific configuration methods.
An include list determines the specific types of changes that the cluster replicates.

Configuration methods that trigger replication

The cluster replicates changes made through these methods:

Splunk Web
The Splunk CLI
The REST API

The cluster does not replicate any configuration changes that you make manually, such as direct edits to configuration files.

For example, if a user creates a saved search in Splunk Web on a cluster member, the cluster replicates that saved search to all cluster members. However, if you, as the administrator, add a saved search by directly editing the savedsearches.conf file on one cluster member, the cluster does not replicate that saved search to the other cluster members. You must use the deployer to push that saved search to all cluster members.
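
A REST API call is one of the runtime methods that triggers replication. The following minimal sketch builds, but does not send, a request that creates a saved search on one member. It assumes the default management port 8089 and the servicesNS/<user>/<app>/saved/searches endpoint. The host name, app context, search string, and credentials are hypothetical placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_saved_search_request(host, user, app, name, search, username, password):
    """Build (but do not send) a REST request that creates a saved search.

    Assumes the default management port 8089 and the
    servicesNS/<user>/<app>/saved/searches endpoint.
    """
    url = f"https://{host}:8089/servicesNS/{user}/{app}/saved/searches"
    # Form-encoded body: 'name' and 'search' define the new saved search.
    body = urllib.parse.urlencode({"name": name, "search": search}).encode()
    req = urllib.request.Request(url, data=body, method="POST")
    # HTTP basic authentication with the (hypothetical) credentials.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


# Hypothetical member host, app context, search, and credentials.
req = build_saved_search_request(
    "sh1.example.com", "admin", "search",
    "errors_last_hour", "index=main error earliest=-1h",
    "admin", "changeme",
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it. In practice you would also need an SSL context that trusts the member's management certificate. Because the change arrives through the REST API rather than a direct file edit, the cluster treats it as a runtime change and replicates it.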

The replication include list

The cluster uses an include list to determine what changes to replicate. The default list is configured through the set of conf_replication_include attributes in the default version of server.conf, located in $SPLUNK_HOME/etc/system/default.

To add or remove items from the include list, update the affected conf_replication_include settings in a server.conf file within an app on the deployer. Then push the changes from the deployer to the cluster members. See Use the deployer to distribute apps and configuration updates.

For a comprehensive list of items in the include list, consult the default version of server.conf. This is the approximate set of include list items:

alert_actions 
authentication 
authorize 
datamodels 
event_renderers 
eventtypes 
fields 
html 
literals 
lookups 
macros 
manager 
models 
multikv 
nav 
panels 
passwd
passwords
props 
quickstart 
savedsearches 
searchbnf 
searchscripts 
segmenters 
tags 
times 
transforms 
transactiontypes 
ui-prefs 
user-prefs 
views 
viewstates 
workflow_actions

The cluster replicates changes to all files underlying the include list items. In addition to configuration files themselves, the set of replicated files includes dashboard and nav XML, lookup table files, data model JSON files, and so on. The cluster also replicates permissions stored in *.meta files.

These are examples of the types of files replicated for various include list items:

# escape-hatch HTML views
conf_replication_include.html = true
# lookup table files
conf_replication_include.lookups = true
# manager XML
conf_replication_include.manager = true
# datamodel JSON files
conf_replication_include.models = true
# nav XML
conf_replication_include.nav = true
# view XML
conf_replication_include.views = true

The cluster does not replicate user search history. This is reflected in the default server.conf file, which includes the line conf_replication_include.history = false. Changing that value to "true" has no effect and does not cause the cluster to replicate search history.
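
To make the include-list logic concrete, the following sketch reads conf_replication_include settings from a hypothetical server.conf excerpt and reports which items would be replicated. It is a model of the behavior described above, not Splunk's implementation. It assumes the settings sit in the [shclustering] stanza, uses Python's configparser as a stand-in for Splunk's conf parsing, and simplifies boolean handling. Search history is excluded regardless of its setting, matching the note above.

```python
import configparser

# Hypothetical excerpt of server.conf settings. This sketch assumes the
# include-list settings sit in the [shclustering] stanza.
SERVER_CONF = """
[shclustering]
conf_replication_include.lookups = true
conf_replication_include.savedsearches = true
conf_replication_include.indexes = false
# Setting this to true has no effect: search history is never replicated.
conf_replication_include.history = true
"""

PREFIX = "conf_replication_include."
NEVER_REPLICATED = {"history"}
TRUTHY = {"true", "1", "yes", "t", "y"}  # simplification of boolean parsing


def replicated_items(conf_text, stanza="shclustering"):
    """Return the include-list items whose setting is true."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written
    parser.read_string(conf_text)
    enabled = {
        key[len(PREFIX):]
        for key, value in parser[stanza].items()
        if key.startswith(PREFIX) and value.strip().lower() in TRUTHY
    }
    return enabled - NEVER_REPLICATED


print(sorted(replicated_items(SERVER_CONF)))  # ['lookups', 'savedsearches']
```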

The changes that the cluster ignores

The cluster ignores configuration changes for any items that are not on the include list. The ignored changes include changes to most system configuration files, such as indexes.conf and server.conf. For a complete list of such files, see Global configuration files in the Admin Manual. Exceptions to that list, which the cluster does replicate, include certain settings in authorize.conf and authentication.conf.

You cannot work around this situation by simply adding system configuration files to the include list. Most settings in system configuration files require a restart to take effect, and there's no mechanism to initiate an automatic restart of cluster members following replication of such configurations.

In addition, the cluster only replicates changes that are made through Splunk Web, the Splunk CLI, or the REST API. If you directly edit a configuration file, the cluster does not replicate it. Instead, you must use the deployer to distribute the file to all cluster members.

The cluster also does not replicate newly installed or upgraded apps.

For information on how to distribute such configuration changes through the deployer, see Use the deployer to distribute apps and configuration updates.

How replication works

When a user makes a configuration change to a cluster member search head, the member saves the change to a file, or set of files, locally and also sends the change to the captain. Approximately every five seconds, each cluster member contacts the captain and pulls any changes that have arrived since the last time it pulled changes. Each cluster member then applies the changes locally.

For example, assume a user on one cluster member uses Splunk Web to create a new field extraction. Splunk Web saves the field extraction in local files on that member. The member then sends the file changes to the captain. When each cluster member next contacts the captain, it pulls the changes, along with any other recent changes, and applies them locally. Within a few seconds, all cluster members have the new field extraction.
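
The push-then-pull flow can be modeled with a few lines of code. The sketch below is a toy model, not Splunk's implementation: the captain keeps an ordered log of changes, a member saves a change locally and sends it to the captain, and each member periodically pulls everything added to the log since its last pull. The member names and the configuration key are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Captain:
    # Ordered log of every replicated change; list position acts as a version.
    log: list = field(default_factory=list)

    def receive(self, change):
        self.log.append(change)

    def changes_since(self, position):
        return self.log[position:]


@dataclass
class Member:
    name: str
    config: dict = field(default_factory=dict)
    pulled_up_to: int = 0  # length of the captain's log at the last pull

    def make_change(self, captain, key, value):
        # Save the change locally, then send it to the captain.
        self.config[key] = value
        captain.receive((key, value))

    def pull(self, captain):
        # In a real cluster this happens roughly every five seconds.
        # Replaying the member's own changes keeps every member in log order.
        for key, value in captain.changes_since(self.pulled_up_to):
            self.config[key] = value
        self.pulled_up_to = len(captain.log)


captain = Captain()
members = [Member("sh1"), Member("sh2"), Member("sh3")]
members[0].make_change(captain, "props/EXTRACT-status", r"status=(?<status>\d+)")
for member in members:
    member.pull(captain)
print([m.config for m in members])
```

In this model, every member applies changes in the captain's log order, so two members that edit the same setting still converge on the value that reached the captain last.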

Note: Files replicated and updated this way are semantically and functionally equivalent across the set of cluster members. The files might not be identical on all members, however. For example, depending on circumstances such as the order in which changes reach the captain, it is possible that an updated setting in props.conf could appear in different locations within the file on different members.
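
The following sketch illustrates what "semantically equivalent but not identical" means. It compares two hypothetical copies of the same props.conf stanza whose settings appear in a different order. It uses Python's configparser as an approximation of Splunk's conf-file parsing, which is an assumption; the stanza and settings are illustrative.

```python
import configparser

# The same props.conf stanza as it might appear on two members,
# with the settings written in a different order.
MEMBER_A = """
[access_combined]
EXTRACT-status = status=(?<status>\\d+)
TRUNCATE = 20000
"""

MEMBER_B = """
[access_combined]
TRUNCATE = 20000
EXTRACT-status = status=(?<status>\\d+)
"""


def parse_conf(text):
    """Parse conf text into {stanza: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case instead of lowercasing
    parser.read_string(text)
    return {s: dict(parser[s]) for s in parser.sections()}


print(MEMBER_A == MEMBER_B)                          # False: the text differs
print(parse_conf(MEMBER_A) == parse_conf(MEMBER_B))  # True: same settings
```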

For details on the specifics of your cluster's configuration replication process, view the Search Head Clustering: Configuration Replication dashboard in the monitoring console. See Use the monitoring console to view search head cluster status and troubleshoot issues.

When replication happens

The purpose of replication is to keep search-related configurations in sync across all cluster members. To ensure this happens, replication occurs at various times, depending on the state of the member:

Each active cluster member contacts the captain every five seconds and pulls any changes that have arrived since the last time it pulled changes.

When a new member joins the cluster, it contacts the captain and downloads a tarball containing the current set of replicated configurations, including all changes that have been made over the life of the cluster. It applies the tarball locally.

When a member rejoins the cluster. Before you re-add the member, follow the procedure outlined in Add a member that was previously removed from the cluster, which includes cleaning the instance. The member then contacts the captain and downloads the tarball, the same way that a new member does.

During cluster recovery. See How a recovering member resyncs with the cluster.
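
For new and rejoining members, replication uses a full snapshot rather than incremental changes. The sketch below models that idea with an in-memory gzip tarball: the captain packs its current set of replicated files, and the member unpacks the archive in place of whatever configuration it had. The file paths and contents are hypothetical, and the archive layout is an assumption, not the format Splunk actually uses.

```python
import io
import tarfile


def build_snapshot(files):
    """Pack the current replicated configuration into an in-memory tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, text in sorted(files.items()):
            data = text.encode()
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def apply_snapshot(blob):
    """Unpack a snapshot; the result replaces the member's configuration."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read().decode()
    return files


# Hypothetical replicated files on the captain.
captain_files = {
    "apps/search/local/savedsearches.conf": "[errors]\nsearch = index=main error\n",
    "apps/search/metadata/local.meta": "[savedsearches/errors]\nexport = none\n",
}
new_member_files = apply_snapshot(build_snapshot(captain_files))
print(sorted(new_member_files))
```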

Replication of deployer configurations

The deployer distributes non-runtime configurations to the cluster. For some configuration types, it distributes the configurations directly to the cluster members. For other configuration types, it distributes the configurations to the captain, which then replicates the configurations to the members through the same method that it uses to distribute runtime configurations.

The deployer distributes these types of configurations to the captain:

User configurations
App local configurations

When the captain receives such configurations from the deployer, it replicates them to the members.

View replication status

The monitoring console contains a wealth of information about the status of configuration replication. See Use the monitoring console to view search head cluster status and troubleshoot issues.

To see when the members last pulled a set of configuration changes from the captain, run the splunk show shcluster-status command from any member:

splunk show shcluster-status

The output from this command includes, for each member, the field last_conf_replication. It indicates the last time that the member successfully pulled an updated set of configurations from the captain.
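
If you want to check this field from a script, the sketch below extracts last_conf_replication for each member from the command's text output. The sample output is hypothetical and abbreviated, and the parser assumes that each member block lists a label line before its last_conf_replication line. Adjust it to the output that your version actually produces.

```python
# Hypothetical, abbreviated output of "splunk show shcluster-status".
SAMPLE_OUTPUT = """
 Members:
	sh1
		                 label : sh1
		 last_conf_replication : Wed Mar  6 10:15:32 2024
		                status : Up
	sh2
		                 label : sh2
		 last_conf_replication : Wed Mar  6 10:15:30 2024
		                status : Up
"""


def last_replication_by_member(status_output):
    """Map each member label to its last_conf_replication value."""
    result = {}
    current = None
    for line in status_output.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "label":
            current = value
        elif key == "last_conf_replication" and current is not None:
            result[current] = value
    return result


print(last_replication_by_member(SAMPLE_OUTPUT))
```

To feed it real output, you could capture the command with subprocess.run(["splunk", "show", "shcluster-status"], capture_output=True, text=True), adding -auth credentials if your environment requires them.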

For general information on the command, see Show cluster status.

Replication synchronization issues

Under normal circumstances, the cluster continually replicates changes across all cluster members. Each member sends any changes to the captain, and the captain quickly replicates those changes to the other members. This process ensures that the members share a common baseline of configurations.

Certain conditions can cause a member's baseline to fall out of sync with the captain's baseline, and therefore with the baselines of the other members. In particular, a member can be out of sync when recovering from a loss of connectivity with the cluster. To remediate this situation, the member must resync with the cluster.

How a recovering member resyncs with the cluster

When a member rejoins the cluster, it must resync its baseline with the captain's baseline. Until the process is complete, the member is considered out of sync with the cluster.

To resync its baseline, the member contacts the captain to request the set of intervening replicated changes. What happens next depends on whether the member and the captain still share a common commit in their replication change histories:

If the captain and the member share a common commit, the member automatically downloads the intervening changes from the captain and applies them to its pre-offline configuration. The member also pushes its intervening changes, if any, to the captain, which replicates them to the other members. In this way, the member resyncs its baseline with the captain's baseline.

If the captain and the member do not share a common commit, they cannot properly sync without manual intervention. To update the member's configuration, you must instruct the member to download the entire configuration tarball from the captain, as described in Perform a manual resync. The tarball overwrites the member's existing set of configurations, causing it to lose any local changes that occurred during the time that it was disconnected from the cluster.
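
The decision between an automatic and a manual resync can be sketched as a search for the newest commit that both histories share. In this model, histories are lists of commit IDs ordered oldest first, which is a simplification of Splunk's internal change history. If a shared commit exists, the member pulls the captain's later commits and pushes its own. Otherwise only a destructive resync is possible.

```python
def plan_resync(member_history, captain_history):
    """Decide how a recovering member can resync.

    Histories are lists of commit IDs, oldest first. Returns
    ("incremental", changes_to_pull, changes_to_push) when the two share a
    commit, or ("destructive", None, None) when they do not.
    """
    captain_positions = {commit: i for i, commit in enumerate(captain_history)}
    # Walk the member's history backward to find the newest shared commit.
    for j in range(len(member_history) - 1, -1, -1):
        i = captain_positions.get(member_history[j])
        if i is not None:
            to_pull = captain_history[i + 1:]  # captain changes the member lacks
            to_push = member_history[j + 1:]   # member changes made while offline
            return "incremental", to_pull, to_push
    return "destructive", None, None


# Hypothetical commit IDs.
print(plan_resync(["c1", "c2", "m1"], ["c1", "c2", "c3", "c4"]))
print(plan_resync(["c1", "m1"], ["c5", "c6"]))
```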

Why a recovering member might need to resync manually

If the captain and the member do not share a common commit in their set of configuration changes, they cannot sync without manual intervention.

The members, including the captain, periodically purge older configuration changes from their change history. See Set replication history purging behavior.

If the recovering member has been disconnected from the cluster for so long that the cluster has purged some intervening change history, the recovering member will not share a common commit with the captain and therefore cannot apply the full set of intervening changes. Instead, the member must undergo a manual resync.

At the end of the manual resync process, the member once again shares a common baseline with the other members. In the process, the member loses any local changes made during the time that it was disconnected from the cluster. For this reason, a manual resync is also known as a "destructive resync."

See Handle failure of a search head cluster member.

A similar situation can occur if the entire cluster stops functioning for a while, and the members operate during that time as independent search heads. See Recovery from a non-functioning cluster.

Perform a manual resync

Upon rejoining the cluster, the member attempts to apply the set of intervening replicated changes from the captain. If the set exceeds the purge limits and the member and captain no longer share a common commit, a banner message appears on the member's UI, with text similar to the following:

Error pulling configurations from the search head cluster captain; consider performing a destructive configuration resync on this search head cluster member.

The message also appears in the member's splunkd.log file.

If this message appears, it means that the member is unable to update its configuration through the configuration change delta and must apply the entire configuration tarball. It does not do this automatically. Instead, it waits for your intervention.

You must then initiate the process of downloading and applying the tarball by running this CLI command on the member:

splunk resync shcluster-replicated-config

You do not need to restart the member after running this command.

Caution: This command causes an overwrite of the member's entire set of search-related configurations, resulting in the loss of any local changes.
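
Because the member waits for your intervention, you might want to check for the resync message before you run the command. The sketch below scans splunkd.log for the phrase shown in the banner. It assumes that the log line contains the same wording as the banner, which might not hold in every version, and the sample log lines are hypothetical.

```python
import os

RESYNC_HINT = "destructive configuration resync"


def find_resync_messages(lines):
    """Return the log lines that suggest a destructive resync."""
    return [line.rstrip("\n") for line in lines if RESYNC_HINT in line.lower()]


def check_member_log(splunk_home):
    # splunkd.log normally lives under $SPLUNK_HOME/var/log/splunk.
    path = os.path.join(splunk_home, "var", "log", "splunk", "splunkd.log")
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_resync_messages(f)


# Hypothetical log lines.
sample = [
    "INFO  routine startup message\n",
    "WARN  Error pulling configurations from the search head cluster captain; "
    "consider performing a destructive configuration resync on this search head "
    "cluster member.\n",
]
print(len(find_resync_messages(sample)))
```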

Set replication history purging behavior

The purging of the configuration change history is determined by these attributes in server.conf:

conf_replication_purge.eligibile_count. Its default is 20,000 changes.
conf_replication_purge.eligibile_age. Its default is one day.

When both limits have been exceeded on a member, the member begins to purge the change history, starting with the oldest changes.
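
The purge rule can be modeled as follows: the oldest change is dropped only while the history holds more than the count limit and that change is older than the age limit. Once old commits are gone, a member that was disconnected long enough no longer shares a commit with the captain, which is why a manual resync becomes necessary. This is a sketch of the documented behavior, not Splunk's code. The defaults mirror the values above, and treating each limit as a strict "greater than" comparison is an assumption.

```python
from collections import deque

DEFAULT_COUNT = 20_000           # conf_replication_purge.eligibile_count
DEFAULT_AGE_SECONDS = 24 * 3600  # conf_replication_purge.eligibile_age (one day)


def purge(history, now, max_count=DEFAULT_COUNT, max_age=DEFAULT_AGE_SECONDS):
    """Drop the oldest changes while both limits are exceeded.

    history is a deque of (commit_id, timestamp_seconds) pairs, oldest first.
    Returns the purged commit IDs in the order they were removed.
    """
    purged = []
    while len(history) > max_count and now - history[0][1] > max_age:
        commit_id, _ = history.popleft()
        purged.append(commit_id)
    return purged


# Hypothetical history: ten changes, one per hour, with small limits.
history = deque((f"c{i}", i * 3600) for i in range(10))
print(purge(history, now=10 * 3600, max_count=5, max_age=4 * 3600))
```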

For more information on purge limit attributes, see the server.conf specification file.

Captain election and out-of-sync members

During captain election, it is important to ensure that out-of-sync members do not become captain. By default, the cluster attempts to prevent this situation from occurring.

An out-of-sync member lacks an up-to-date baseline configuration. If it becomes captain, it cannot manage the baseline for the cluster.

See Prevent out-of-sync members from becoming captain.

Troubleshoot the baseline configuration

The monitoring console provides information on the state of the baseline configuration across all cluster members. See Troubleshoot baseline consistency.

Replication quarantine for large CSV lookups

Search head cluster replication of large CSV lookup files can cause cluster out-of-sync issues. For this reason, Splunk Enterprise 9.4.0 and higher provides replication quarantine for large CSV lookups.

If a lookup exceeds 5GB, the cluster automatically puts that lookup into quarantine and suspends replication of the lookup to other cluster members. When the size of the quarantined lookup falls below the 5GB limit, the cluster automatically removes the lookup from quarantine, and replication of the lookup across the cluster resumes.
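
The quarantine rule amounts to a small state check on each lookup's size. The sketch below models it. It assumes that "5GB" means 5 × 1024³ bytes, which this topic does not specify, and it keeps the current state when a file is exactly at the limit, which is also an assumption because the text covers only "exceeds" and "falls below".

```python
import os

# Assumption: the 5GB limit is interpreted as 5 * 1024**3 bytes.
QUARANTINE_LIMIT = 5 * 1024**3


def next_quarantine_state(size_bytes, quarantined, limit=QUARANTINE_LIMIT):
    """Return whether a CSV lookup should be quarantined after a size check."""
    if size_bytes > limit:
        return True         # exceeds the limit: suspend replication
    if size_bytes < limit:
        return False        # below the limit: resume replication
    return quarantined      # exactly at the limit: keep the current state


def check_lookup(path, quarantined):
    """Apply the rule to a lookup file on disk."""
    return next_quarantine_state(os.path.getsize(path), quarantined)


print(next_quarantine_state(6 * 1024**3, quarantined=False))
```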

The splunkd health report indicates if any CSV lookup files are in quarantine across the cluster. You can use the health report to monitor quarantined lookups and access information to help you remediate oversized lookup files.

For detailed information on replication quarantine for large CSV lookups, see Quarantining large CSV lookup files in search head clusters in the Knowledge Manager Manual.
