Delays due to coordination between cluster members
Coordination between the captain and the other cluster members can introduce latency of up to 1.5 minutes. For example, when you save a search job, Splunk Web might not update the job's state immediately. Similarly, it can take a minute or more for the captain to orchestrate the complete deletion of a job.
In addition, when an event triggers the election of a new captain, there is an interval of one to two minutes while the election completes. During this time, search heads can service only ad hoc search requests.
Limit to number of active alerts
The search head cluster can handle approximately 5000 active, unexpired alerts. To stay within this limit, use alert throttling or limit the alert retention time. See the Alerting Manual.
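As a sketch, both throttling and retention time are typically configured per saved search in savedsearches.conf. The stanza name and the specific values below are illustrative, not recommendations:

```
[my_alert]
# Hypothetical saved search. Suppress repeated triggers of this
# alert for 10 minutes after it fires.
alert.suppress = 1
alert.suppress.period = 10m

# Expire triggered alert records after 2 hours, reducing the
# number of active, unexpired alerts the cluster must track.
alert.expires = 2h
```

Consult the savedsearches.conf reference for the full set of alert settings and their defaults.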
Site failure can prevent captain election
If the cluster is deployed across two sites and the site with a majority of members goes down or is otherwise inaccessible, the cluster cannot elect a new captain.
Therefore, in the case of a two-site cluster, it is vital that you put the majority of your members on the site that you consider primary.
Why the majority of members must be on the primary site
If you are deploying the cluster across two sites, put a majority of the cluster members on the site that you consider primary. This ensures that the cluster can continue to function as long as that site is running.
Under certain circumstances, such as when a member leaves or joins the cluster, the cluster holds an election in which it chooses a new captain. The success of this election process requires that a majority of all cluster members agree on the new captain. Therefore, the proper functioning of the cluster requires that a majority of members be running at all times. See "Captain election."
In the case of a cluster running across two sites, if one site fails, the remaining site can elect a new captain only if it holds a majority of members. Similarly, if there is a network disruption between the sites, only the site with a majority can elect a new captain. By assigning the majority of members to your primary site, you maximize its availability.
What happens when the site with the majority fails
If the site with a majority of members fails or is otherwise inaccessible, the remaining members on the minority site cannot elect a new captain. Captain election requires the votes of a majority of all members, but only a minority are running, so no election can succeed. The cluster does not function.
To remediate, you must either restore the failed majority site or shut down the remaining cluster members and reconfigure the cluster.
Consequences of a non-functioning cluster
If the cluster cannot elect a captain, its members continue to function as independent search heads. However, they can service only ad hoc searches. Scheduled searches and alerts do not run, because, in a cluster, the scheduling function is delegated to the captain. In addition, configurations and search artifacts are not replicated during this time.
What happens when there is a network interruption between sites
If the network between sites fails, the members on each site attempt to elect a captain. However, only a site that holds a majority of the total members can succeed. That site can continue to function as the cluster indefinitely. When the other sites reconnect, their members rejoin the cluster.
Clusters with more than two sites
If there are more than two sites, the cluster can function only if a majority of all members, counted across the sites, can still communicate and elect a captain. For example, if site1 has five members, site2 has eight, and site3 has four (17 in total), the cluster can survive the loss of any one site, because the two remaining sites always retain at least nine members, a majority of 17. However, if site1 has six members, site2 has two, and site3 has three (11 in total), the cluster can function only as long as site1 remains alive, because at least six members are needed to constitute a majority.
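The majority rule in these examples reduces to simple arithmetic. A minimal sketch (the function name is illustrative and not part of Splunk):

```python
def has_quorum(live_members: int, total_members: int) -> bool:
    """A captain election can succeed only if the members still able
    to communicate form a strict majority of the total membership."""
    return live_members > total_members // 2

# First example from the text: sites of 5, 8, and 4 members (17 total).
# Losing any one site leaves at least 9 members, still a majority.
assert all(has_quorum(17 - lost, 17) for lost in (5, 8, 4))

# Second example: sites of 6, 2, and 3 members (11 total). Only
# site1's 6 members reach the majority threshold on their own.
assert has_quorum(6, 11) and not has_quorum(2 + 3, 11)
```

Note that exactly half is not enough: with 17 total members, 8 surviving members do not form a majority, which is why the text requires at least nine.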
This documentation applies to the following versions of Splunk® Enterprise: 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.2.15