Configuration updates that the cluster replicates
The cluster automatically replicates certain runtime configuration changes that a user makes on one cluster member to all the other members.
Note: The cluster replicates configuration changes to all cluster members. The cluster's replication factor applies only to search artifact replication. See Choose the replication factor for the search head cluster.
The changes that the cluster replicates
These are the main types of configuration changes that the cluster replicates:
- Runtime changes or additions to knowledge objects, such as saved searches, lookup tables, and dashboards. For example, when a user in Splunk Web defines a field extraction, the cluster replicates that field extraction to all search heads in the cluster.
- Runtime changes to users and roles. See Add users to the search head cluster.
Replication operates under these constraints:
- The cluster only replicates changes made at runtime, through specific configuration methods.
- An include list determines the specific types of changes that the cluster replicates.
Configuration methods that trigger replication
The cluster replicates changes made through these methods:
- Splunk Web
- The Splunk CLI
- The REST API
The cluster does not replicate any configuration changes that you make manually, such as direct edits to configuration files.
For example, if a user creates a saved search in Splunk Web on a cluster member, the cluster replicates that saved search to all cluster members. However, if you, as the administrator, add a saved search by directly editing the savedsearches.conf file on one cluster member, the cluster does not replicate that saved search to the other cluster members. You must use the deployer to push that saved search to all cluster members.
The replication include list
The cluster uses an include list to determine what changes to replicate. The default list is configured through the set of conf_replication_include attributes in the default version of server.conf, located in $SPLUNK_HOME/etc/system/default.
To add or remove items from the include list, update the affected conf_replication_include settings in a server.conf file within an app on the deployer. Then push the changes from the deployer to the cluster members. See Use the deployer to distribute apps and configuration updates.
For a comprehensive list of items in the include list, consult the default version of server.conf. This is the approximate set of include list items:
alert_actions authentication authorize datamodels event_renderers eventtypes fields html literals lookups macros manager models multikv nav panels passwd passwords props quickstart savedsearches searchbnf searchscripts segmenters tags times transforms transactiontypes ui-prefs user-prefs views viewstates workflow_actions
The cluster replicates changes to all files underlying the include list items. In addition to configuration files themselves, the set of replicated files includes dashboard and nav XML, lookup table files, data model JSON files, and so on. The cluster also replicates permissions stored in *.meta files.
These are examples of the types of files replicated for various include list items:
# escape-hatch HTML views
conf_replication_include.html = true
# lookup table files
conf_replication_include.lookups = true
# manager XML
conf_replication_include.manager = true
# datamodel JSON files
conf_replication_include.models = true
# nav XML
conf_replication_include.nav = true
# view XML
conf_replication_include.views = true
The cluster does not replicate user search history. This is reflected in the default server.conf file, which includes the line conf_replication_include.history = false. Changing that value to "true" has no effect and does not cause the cluster to replicate search history.
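As an illustration only, the following Python sketch reads conf_replication_include settings from a server.conf-style fragment and reports which include list items are enabled. The sample fragment, the function name enabled_include_items, and the placement of the settings under an [shclustering] stanza are assumptions made for this example. The sketch is not how the cluster itself evaluates the list; consult the default server.conf for the actual contents.

```python
import configparser

# Hypothetical server.conf fragment. The [shclustering] stanza placement
# is an assumption for this example.
SAMPLE_SERVER_CONF = """
[shclustering]
# lookup table files
conf_replication_include.lookups = true
# view XML
conf_replication_include.views = true
# search history is never replicated, whatever this value says
conf_replication_include.history = true
"""

PREFIX = "conf_replication_include."
# Per the text above, setting history to true has no effect.
NEVER_REPLICATED = {"history"}


def enabled_include_items(conf_text, stanza="shclustering"):
    """Return the sorted include list items whose setting is true."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section(stanza):
        return []
    items = []
    for key in parser.options(stanza):
        if not key.startswith(PREFIX):
            continue
        item = key[len(PREFIX):]
        if item in NEVER_REPLICATED:
            continue
        if parser.getboolean(stanza, key):
            items.append(item)
    return sorted(items)


print(enabled_include_items(SAMPLE_SERVER_CONF))  # ['lookups', 'views']
```

The history item is filtered out explicitly to mirror the behavior described above, where the setting is present but cannot enable replication of search history.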
The changes that the cluster ignores
The cluster ignores configuration changes for any items that are not on the include list. The ignored configuration changes include most system configuration files, such as indexes.conf, server.conf, and so on. For a complete list of such files, see Global configuration files in the Admin Manual. Exceptions from that list include certain settings in authorize.conf and authentication.conf.
You cannot work around this situation by simply adding system configuration files to the include list. Most settings in system configuration files require a restart to take effect, and there's no mechanism to initiate an automatic restart of cluster members following replication of such configurations.
In addition, the cluster only replicates changes that are made through Splunk Web, the Splunk CLI, or the REST API. If you directly edit a configuration file, the cluster does not replicate it. Instead, you must use the deployer to distribute the file to all cluster members.
The cluster also does not replicate newly installed or upgraded apps.
For information on how to distribute such configuration changes through the deployer, see Use the deployer to distribute apps and configuration updates.
How replication works
When a user makes a configuration change to a cluster member search head, the member saves the change to a file, or set of files, locally and also sends the change to the captain. Approximately every five seconds, each cluster member contacts the captain and pulls any changes that have arrived since the last time it pulled changes. Each cluster member then applies the changes locally.
For example, assume a user on one cluster member uses Splunk Web to create a new field extraction. Splunk Web saves the field extraction in local files on that member. The member then sends the file changes to the captain. When each cluster member next contacts the captain, it pulls the changes, along with any other recent changes, and applies them locally. Within a few seconds, all cluster members have the new field extraction.
Note: Files replicated and updated this way are semantically and functionally equivalent across the set of cluster members. The files might not be identical on all members, however. For example, depending on circumstances such as the order in which changes reach the captain, it is possible that an updated setting in props.conf could appear in different locations within the file on different members.
For details on the specifics of your cluster's configuration replication process, view the Search Head Clustering: Configuration Replication dashboard in the monitoring console. See Use the monitoring console to view search head cluster status and troubleshoot issues.
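The push-then-pull flow described in this section can be modeled with a short Python sketch. This is a simplified illustration, not the cluster's implementation: the Captain and Member classes, their method names, and the use of an in-memory list as the change log are all assumptions made for the example. In a real cluster, each member pulls approximately every five seconds; here the pull is called once by hand.

```python
class Captain:
    """Keeps an ordered log of changes received from members (hypothetical model)."""

    def __init__(self):
        self.log = []

    def receive(self, change):
        self.log.append(change)

    def changes_since(self, position):
        return self.log[position:]


class Member:
    """A cluster member that saves changes locally and syncs through the captain."""

    def __init__(self, name, captain):
        self.name = name
        self.captain = captain
        self.config = {}   # settings applied locally
        self.position = 0  # how far into the captain's log this member has pulled

    def make_change(self, key, value):
        # Save the change locally, then send it to the captain.
        self.config[key] = value
        self.captain.receive((key, value))

    def pull(self):
        # Fetch and apply everything that arrived since the last pull.
        new_changes = self.captain.changes_since(self.position)
        for key, value in new_changes:
            self.config[key] = value
        self.position += len(new_changes)
        return len(new_changes)


captain = Captain()
members = [Member(name, captain) for name in ("sh1", "sh2", "sh3")]

# A user on sh1 creates a field extraction.
members[0].make_change("props.conf:EXTRACT-status", r"status=(\d+)")

for member in members:
    member.pull()

print(all(m.config == members[0].config for m in members))  # True
```

The originating member re-applies its own change when it pulls. Because applying the same setting twice gives the same result, this keeps the sketch simple without changing the outcome.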
When replication happens
The purpose of replication is to keep search-related configurations in sync across all cluster members. To ensure this happens, replication occurs at various times, depending on the state of the member:
- Each active cluster member contacts the captain every five seconds and pulls any changes that have arrived since the last time it pulled changes.
- When a new member joins the cluster, it contacts the captain and downloads a tarball containing the current set of replicated configurations, including all changes that have been made over the life of the cluster. It applies the tarball locally.
- When a member rejoins the cluster after having been removed, it contacts the captain and downloads the tarball, the same way that a new member does. Before you re-add the member, clean the instance by following the procedure outlined in Add a member that was previously removed from the cluster.
- During cluster recovery. See How a recovering member resyncs with the cluster.
Replication of deployer configurations
The deployer distributes non-runtime configurations to the cluster. For some configuration types, it distributes the configurations directly to the cluster members. For other configuration types, it distributes the configurations to the captain, which then replicates the configurations to the members through the same method that it uses to distribute runtime configurations.
The deployer distributes these types of configurations to the captain:
- User configurations
- App local configurations
When the captain receives such configurations from the deployer, it replicates them to the members.
See:
- What exactly does the deployer send to the cluster?
- Where deployed configurations live on the cluster members
View replication status
The monitoring console reports the status of configuration replication across the cluster. See Use the monitoring console to view search head cluster status and troubleshoot issues.
To see when the members last pulled a set of configuration changes from the captain, run the splunk show shcluster-status command from any member:
splunk show shcluster-status
The output from this command includes, for each member, the field last_conf_replication. It indicates the last time that the member successfully pulled an updated set of configurations from the captain.
For general information on the command, see Show cluster status.
Replication synchronization issues
Under normal circumstances, the cluster continually replicates changes across all cluster members. Each member sends any changes to the captain, and the captain quickly replicates those changes to the other members. This process ensures that the members share a common baseline of configurations.
Certain conditions can cause a member's baseline to get out-of-sync with the captain's baseline, and thus with the other members' baselines. In particular, a member can be out-of-sync when recovering from a loss of connectivity with the cluster. To remediate this situation, the member must resync with the cluster.
How a recovering member resyncs with the cluster
When a member rejoins the cluster, it must resync its baseline with the captain's baseline. Until the process is complete, the member is considered to be out-of-sync with the cluster.
To resync its baseline, the member contacts the captain to request the set of intervening replicated changes. What happens next depends on whether the member and the captain still share a common commit in their replication change histories:
- If the captain and the member share a common commit, the member automatically downloads the intervening changes from the captain and applies them to its pre-offline configuration. The member also pushes its intervening changes, if any, to the captain, which replicates them to the other members. In this way, the member resyncs its baseline with the captain's baseline.
- If the captain and the member do not share a common commit, they cannot properly sync without manual intervention. To update the member's configuration, you must instruct the member to download the entire configuration tarball from the captain, as described in Perform a manual resync. The tarball overwrites the member's existing set of configurations, causing it to lose any local changes that occurred during the time that it was disconnected from the cluster.
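The two outcomes above can be sketched in Python by treating each change history as an ordered list of commit identifiers. This is a conceptual model only: the function name plan_resync, the list representation, and the commit names are assumptions for the example and do not reflect how the cluster stores its history.

```python
def plan_resync(member_history, captain_history):
    """Decide how a rejoining member resyncs (hypothetical model).

    Both histories are lists of commit ids, oldest first. Returns a tuple
    (mode, to_pull, to_push), where mode is "automatic" or "manual".
    """
    captain_index = {commit: i for i, commit in enumerate(captain_history)}
    # Walk the member's history from newest to oldest to find a shared commit.
    for i in range(len(member_history) - 1, -1, -1):
        commit = member_history[i]
        if commit in captain_index:
            to_pull = captain_history[captain_index[commit] + 1:]
            to_push = member_history[i + 1:]
            return "automatic", to_pull, to_push
    # No common commit: the member needs the full tarball (destructive resync).
    return "manual", [], []


# The captain has purged c1 and c2 from its history.
captain = ["c3", "c4", "c5"]

print(plan_resync(["c2", "c3", "m1"], captain))  # ('automatic', ['c4', 'c5'], ['m1'])
print(plan_resync(["c1", "c2"], captain))        # ('manual', [], [])
```

In the first call, the member still shares commit c3 with the captain, so it pulls c4 and c5 and pushes its own change m1. In the second call, the shared history has been purged, so only a manual resync can restore the baseline.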
Why a recovering member might need to resync manually
If the captain and the member do not share a common commit in their set of configuration changes, they cannot sync without manual intervention.
The members, including the captain, periodically purge older configuration changes from their change history. See Set replication history purging behavior.
If the recovering member has been disconnected from the cluster for so long that the cluster has purged some intervening change history, the recovering member will not share a common commit with the captain and therefore cannot apply the full set of intervening changes. Instead, the member must undergo a manual resync.
At the end of the manual resync process, the member once again shares a common baseline with the other members. In the process, the member loses any local changes made during the time that it was disconnected from the cluster. For this reason, a manual resync is also known as a "destructive resync."
See Handle failure of a search head cluster member.
A similar situation can occur if the entire cluster stops functioning for a while, and the members operate during that time as independent search heads. See Recovery from a non-functioning cluster.
Perform a manual resync
Upon rejoining the cluster, the member attempts to apply the set of intervening replicated changes from the captain. If the set exceeds the purge limits and the member and captain no longer share a common commit, a banner message appears on the member's UI, with text similar to the following:
Error pulling configurations from the search head cluster captain; consider performing a destructive configuration resync on this search head cluster member.
The message also appears in the member's splunkd.log file.
If this message appears, it means that the member is unable to update its configuration through the configuration change delta and must apply the entire configuration tarball. It does not do this automatically. Instead, it waits for your intervention.
You must then initiate the process of downloading and applying the tarball by running this CLI command on the member:
splunk resync shcluster-replicated-config
You do not need to restart the member after running this command.
Caution: This command causes an overwrite of the member's entire set of search-related configurations, resulting in the loss of any local changes.
Set replication history purging behavior
The purging of the configuration change history is determined by these attributes in server.conf:
- conf_replication_purge.eligibile_count. Its default is 20,000 changes.
- conf_replication_purge.eligibile_age. Its default is one day.
When both limits have been exceeded on a member, the member begins to purge the change history, starting with the oldest changes.
For more information on purge limit attributes, see the server.conf specification file.
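The following Python sketch shows one reading of the purge rule: the oldest changes are dropped only while both the count limit and the age limit are exceeded. The function name purge_history, the (timestamp, change) representation, and the exact stopping condition are assumptions for the example; the server.conf specification file remains the authority on the actual behavior.

```python
from datetime import datetime, timedelta

ELIGIBLE_COUNT = 20_000           # default count limit stated above
ELIGIBLE_AGE = timedelta(days=1)  # default age limit stated above


def purge_history(history, now, eligible_count=ELIGIBLE_COUNT, eligible_age=ELIGIBLE_AGE):
    """Return the retained part of a change history (hypothetical model).

    history is a list of (timestamp, change) tuples sorted oldest first.
    """
    start = 0
    # Drop the oldest change only while BOTH limits are exceeded.
    while (len(history) - start > eligible_count
           and now - history[start][0] > eligible_age):
        start += 1
    return history[start:]


now = datetime(2025, 1, 1, 12, 0, 0)
history = [
    (now - timedelta(hours=5), "a"),
    (now - timedelta(hours=4), "b"),
    (now - timedelta(hours=3), "c"),
    (now - timedelta(minutes=30), "d"),
    (now - timedelta(minutes=10), "e"),
]

# Small limits so the effect is visible: keep at least 3 changes or 1 hour.
kept = purge_history(history, now, eligible_count=3, eligible_age=timedelta(hours=1))
print([change for _, change in kept])  # ['c', 'd', 'e']
```

Change c is older than the age limit but is retained, because the history no longer exceeds the count limit once a and b are gone.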
Captain election and out-of-sync members
During captain election, it is important to ensure that out-of-sync members do not become captain. By default, the cluster attempts to prevent this situation from occurring.
An out-of-sync member lacks an up-to-date baseline configuration. If it becomes captain, it cannot manage the baseline for the cluster.
See Prevent out-of-sync members from becoming captain.
Troubleshoot the baseline configuration
The monitoring console provides information on the state of the baseline configuration across all cluster members. See Troubleshoot baseline consistency.
Replication quarantine for large CSV lookups
Search head cluster replication of large CSV lookup files can cause cluster out-of-sync issues. For this reason, Splunk Enterprise 9.4.0 and higher provides replication quarantine for large CSV lookups.
If a lookup exceeds 5GB, the cluster automatically puts that lookup into quarantine and suspends replication of the lookup to other cluster members. When the size of the quarantined lookup falls below the 5GB limit, the cluster automatically removes the lookup from quarantine, and replication of the lookup across the cluster resumes.
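As a rough illustration of the threshold behavior, the following Python sketch updates a set of quarantined lookups from their current sizes. The function name update_quarantine, the sample file names, and the interpretation of "5GB" as 5 * 1024^3 bytes are assumptions for the example, not details taken from the product.

```python
QUARANTINE_LIMIT_BYTES = 5 * 1024**3  # assumption: "5GB" read as 5 GiB


def update_quarantine(lookup_sizes, quarantined):
    """Return the new set of quarantined lookup names (hypothetical model).

    lookup_sizes maps lookup file names to their current size in bytes.
    """
    result = set(quarantined)
    for name, size in lookup_sizes.items():
        if size > QUARANTINE_LIMIT_BYTES:
            result.add(name)      # too large: suspend replication
        elif size < QUARANTINE_LIMIT_BYTES:
            result.discard(name)  # back under the limit: resume replication
    return result


sizes = {"assets.csv": 6 * 1024**3, "users.csv": 10 * 1024**2}
state = update_quarantine(sizes, set())
print(sorted(state))  # ['assets.csv']

# The oversized lookup is trimmed, so it leaves quarantine.
sizes["assets.csv"] = 2 * 1024**3
state = update_quarantine(sizes, state)
print(sorted(state))  # []
```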
The splunkd health report indicates if any CSV lookup files are in quarantine across the cluster. You can use the health report to monitor quarantined lookups and access information to help you remediate oversized lookup files.
For detailed information on replication quarantine for large CSV lookups, see Quarantining large CSV lookup files in search head clusters in the Knowledge Manager Manual.
This documentation applies to the following versions of Splunk® Enterprise: 9.4.0