Job scheduling on search head clusters
Job scheduling on search head clusters requires an understanding of two aspects of job allocation:
- How the captain allocates saved search jobs to cluster members
- How the cluster enforces quotas for concurrent searches
How the captain allocates saved searches
The captain schedules saved search jobs, allocating them to the various cluster members according to load-based heuristics. Essentially, it attempts to assign each job to the member currently with the least search load.
The captain can allocate saved search jobs to itself. It does not, however, allocate scheduled real time searches to itself.
If a job fails on one member, the captain reassigns it to a different member. The captain reassigns the job only once, as multiple failures are unlikely to be resolvable without intervention on the part of the user. For example, a job with a bad search string will fail no matter how many times the cluster attempts to run it.
You can designate a member as "ad hoc only." In that case, the captain will not schedule jobs on it. You can also designate the captain functionality as "ad hoc only." The current captain then will never schedule jobs on itself. Since the role of captain can move among members, this setting ensures that captain functionality does not compete with scheduled searches. See Configure a cluster member to run ad hoc searches only.
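The allocation rules above can be sketched as follows. This is a minimal illustration, not the captain's actual heuristic: the `Member` class, the `pick_member` function, and the use of a simple count of active searches as the "search load" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    """Hypothetical view of a cluster member (not a Splunk API)."""
    name: str
    active_searches: int       # assumed stand-in for "search load"
    adhoc_only: bool = False   # member designated "ad hoc only"
    is_captain: bool = False


def pick_member(members: list[Member],
                captain_adhoc_only: bool = False,
                realtime: bool = False) -> Optional[Member]:
    """Return the eligible member with the least search load, or None."""
    eligible = []
    for m in members:
        if m.adhoc_only:
            continue  # "ad hoc only" members never receive scheduled jobs
        if m.is_captain and (captain_adhoc_only or realtime):
            continue  # captain skips itself for scheduled real-time
                      # searches, or when it is set to "ad hoc only"
        eligible.append(m)
    return min(eligible, key=lambda m: m.active_searches, default=None)


members = [
    Member("sh1", active_searches=5, is_captain=True),
    Member("sh2", active_searches=7),
    Member("sh3", active_searches=2, adhoc_only=True),
]
print(pick_member(members).name)                 # sh1
print(pick_member(members, realtime=True).name)  # sh2
```

Note that `sh3` is never chosen even though it has the lightest load, because it is designated "ad hoc only."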
Note: The captain does not have insight into the actual CPU load on each member's machine. It assumes that all machines in the cluster are provisioned homogeneously, with the same number and type of cores, and so forth.
For details of your cluster's scheduler delegation process, view the Search Head Clustering: Scheduler Delegation dashboard in the monitoring console. See Use the monitoring console to view search head cluster status.
How the cluster handles concurrent search quotas
The search head cluster, like non-clustered search heads, enforces several types of concurrent search limits:
- Scheduler concurrency limit. This limit is the maximum number of searches that the scheduler can run concurrently, across all members of the cluster. In search head clustering, a centralized scheduler on the captain handles scheduling for all cluster members. See How the cluster determines the scheduler concurrency limit.
- User/role search quotas. These quotas determine the maximum number of concurrent historical searches (combined scheduled and ad hoc) allowable for a specific user/role. These quotas are configured with srchJobsQuota and related settings in authorize.conf. See the authorize.conf spec file for details on all the settings that control these quotas.
- Overall search quota. This quota determines the maximum number of historical searches (combined scheduled and ad hoc) that the cluster can run concurrently. This quota is configured with max_searches_per_cpu and related settings in limits.conf. See the limits.conf spec file for details on all the settings that control these quotas. Also, see the discussion of max_hist_searches in How the cluster determines the scheduler concurrency limit.
The search head cluster enforces the scheduler concurrency limit on a cluster-wide basis. It enforces the user/role quotas and overall search quota on either a cluster-wide or a member-by-member basis.
How the cluster determines the scheduler concurrency limit
The scheduler for the entire cluster runs on the captain. To determine the scheduler concurrency limit, the captain takes the max_hist_searches value and multiplies that value by the number of members able to run scheduled searches. It then multiplies the result by max_searches_perc (default 50%) to determine the scheduler limit:
max_hist_searches * (# of members available to run scheduled searches) * max_searches_perc
The max_hist_searches value determines the maximum number of historical searches that can run concurrently on a search head. Historical searches include all searches that run against historical data; that is, both non-realtime scheduled searches and ad hoc searches.
max_hist_searches is not a setting but rather a calculated value:
max_hist_searches = max_searches_per_cpu * number_of_cpus + base_max_searches
- For more information on the method used to calculate max_hist_searches, see the entry for max_searches_per_cpu in the limits.conf spec file.
- The captain uses the number of CPUs on its own machine to calculate max_hist_searches, employing the assumption that all cluster members are provisioned identically.
- Members available to run scheduled searches are members that are both:
  - in the "Up" state
  - not configured as "ad hoc only"
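The max_hist_searches formula above can be expressed as a short function. This is only a sketch: the function name is hypothetical, and the sample inputs (1 search per CPU, 4 CPUs, a base of 6) are illustrative values chosen for the arithmetic, not a statement about your deployment's settings.

```python
def max_hist_searches(max_searches_per_cpu: int,
                      number_of_cpus: int,
                      base_max_searches: int) -> int:
    """Compute the per-search-head historical search limit.

    number_of_cpus is the CPU count of the captain's own machine,
    since the captain assumes all members are provisioned identically.
    """
    return max_searches_per_cpu * number_of_cpus + base_max_searches


# Illustrative inputs only: 1 * 4 + 6
print(max_hist_searches(max_searches_per_cpu=1,
                        number_of_cpus=4,
                        base_max_searches=6))  # 10
```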
The max_searches_perc setting in limits.conf embodies an assumption regarding the ratio of scheduled searches (which the scheduler handles) to ad hoc searches. The default value of 50%, for example, assumes a 1:1 ratio between the two types of historical searches.
For example, assume a seven-member cluster in which all seven members are "Up" but two members are configured as "ad hoc only," resulting in five available members. Assume also that the max_hist_searches value is 10 and that max_searches_perc is set to its default value of 50%. The calculation to derive the maximum number of scheduled searches that can run concurrently on the cluster is:
10 (maximum historical searches) * 5 (members available to run scheduled searches) * 50%
The result is that the captain can schedule up to 25 concurrent scheduled searches across the cluster.
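The worked example above can be reproduced with a few lines of Python. The `Member` class and `scheduler_concurrency_limit` function are hypothetical names used for illustration, and rounding down with integer division is an assumption; the text does not say how fractional results are handled.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical view of a cluster member (not a Splunk API)."""
    name: str
    state: str = "Up"
    adhoc_only: bool = False


def scheduler_concurrency_limit(max_hist_searches: int,
                                members: list[Member],
                                max_searches_perc: int = 50) -> int:
    """max_hist_searches * available members * max_searches_perc."""
    # A member is available only if it is "Up" and not "ad hoc only".
    available = sum(1 for m in members
                    if m.state == "Up" and not m.adhoc_only)
    # Integer division rounds down; this rounding is an assumption.
    return max_hist_searches * available * max_searches_perc // 100


# Seven "Up" members, two of them "ad hoc only".
members = [Member(f"sh{i}") for i in range(1, 6)]
members += [Member("sh6", adhoc_only=True), Member("sh7", adhoc_only=True)]

print(scheduler_concurrency_limit(10, members))  # 25
```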
For information on determining the state of a member, see Show cluster status.
For information on "ad hoc only" members, see Configure a cluster member to run ad hoc searches only.
For information on the scheduler concurrency limit, outside the context of a search head cluster specifically, see How the report scheduler determines the concurrent scheduled report limit.
How the cluster enforces quotas
Although each quota type (user/role or overall) has its own attribute for setting its enforcement behavior, the behavior itself works the same for each quota type.
If you configure the cluster to enforce quotas on a member-by-member basis, each individual member uses the base quota settings to determine whether to allow a search to run. No cluster-wide enforcement of searches occurs.
If you configure the cluster to enforce quotas on a cluster-wide basis, the captain determines the search quota by multiplying the base concurrent search quota by the total number of cluster members in the "Up" state. This number includes all "Up" members that are capable of running searches, including those configured as "ad hoc only."
The captain uses the computed cluster-wide quota to determine whether to allow a scheduled search to run. No member-specific enforcement of searches occurs, except in the case of ad hoc searches, as described in Cluster-wide search quotas and ad hoc searches.
In the case of user/role quotas, the captain multiplies the base concurrent search quota allocated to a user/role by the number of "Up" cluster members to determine the cluster-wide quota for that user/role. For example, in a seven-member cluster, it multiplies the value of srchJobsQuota by 7 to determine the number of concurrent historical searches for the user/role.
Note that a search running on a member will also fail if srchDiskQuota is exceeded for the user on that member.
Similarly, in the case of overall search quotas, the captain multiplies the base overall search quota by the number of "Up" members to determine the cluster-wide quota for all searches.
When determining the number of cluster-wide concurrent searches, the captain includes both scheduled searches and ad hoc searches running on all members. The captain stops a scheduled search from running if it will cause the number of concurrent searches to exceed the cluster-wide search quota. It does not control the initiation of ad hoc searches, however. For more details on this process, see Cluster-wide search quotas and ad hoc searches.
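The cluster-wide quota calculation described in this section can be sketched as below. The function name is hypothetical, and the base quota of 3 is an illustrative value rather than a recommendation. Unlike the scheduler concurrency limit, this count includes "ad hoc only" members, as long as they are "Up."

```python
def cluster_wide_quota(base_quota: int, member_states: list[str]) -> int:
    """Multiply the base quota by the number of members in the "Up" state.

    "Ad hoc only" members count here, provided they are "Up".
    """
    up_members = sum(1 for state in member_states if state == "Up")
    return base_quota * up_members


# Seven-member cluster, all "Up", illustrative srchJobsQuota of 3.
print(cluster_wide_quota(3, ["Up"] * 7))             # 21

# The same cluster with one member down.
print(cluster_wide_quota(3, ["Up"] * 6 + ["Down"]))  # 18
```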
For details of your cluster's search concurrency status, view the Search Head Clustering: Status and Configuration dashboard in the monitoring console. See Use the monitoring console to view search head cluster status.
How the captain determines whether to allow a scheduled search to run
When determining whether to allow a historical scheduled search to run, the scheduler on the captain follows this order:
1. Does the search exceed the scheduler concurrency limit? If so, the search does not run.
2. In the case of cluster-wide enforcement only, does the search exceed the cluster-wide user/role search quota for the user/role running the search? If so, the search does not run.
3. In the case of cluster-wide enforcement only, does the search exceed the overall search quota? If so, the search does not run.
Note: The captain only controls the running of scheduled searches. It has no control over whether ad hoc searches run. Instead, each individual member decides for its own ad hoc searches, based on the individual member search limits. However, the members feed information on their ad hoc searches to the captain, which includes those searches when comparing concurrent searches against the quotas. See Cluster-wide search quotas and ad hoc searches.
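The order of checks above might be modeled like this. It is a sketch under stated assumptions: `ClusterState` and `can_run` are hypothetical names, the counters are assumed to be already known to the captain, and "exceeds" is interpreted as "starting one more search would push the count past the limit."

```python
from dataclasses import dataclass, replace


@dataclass
class ClusterState:
    """Hypothetical snapshot of what the captain knows (not a Splunk API)."""
    scheduled_running: int      # scheduled searches currently running
    role_running: int           # scheduled + ad hoc for this user/role
    total_running: int          # scheduled + ad hoc across all members
    scheduler_limit: int        # scheduler concurrency limit
    role_quota: int             # cluster-wide user/role quota
    overall_quota: int          # cluster-wide overall quota
    role_cluster_wide: bool     # shc_role_quota_enforcement
    overall_cluster_wide: bool  # shc_syswide_quota_enforcement


def can_run(s: ClusterState) -> tuple[bool, str]:
    """Apply the three checks in order; return (allowed, reason)."""
    if s.scheduled_running + 1 > s.scheduler_limit:
        return False, "scheduler concurrency limit"
    if s.role_cluster_wide and s.role_running + 1 > s.role_quota:
        return False, "user/role quota"
    if s.overall_cluster_wide and s.total_running + 1 > s.overall_quota:
        return False, "overall search quota"
    return True, "ok"


state = ClusterState(scheduled_running=24, role_running=21, total_running=60,
                     scheduler_limit=25, role_quota=21, overall_quota=70,
                     role_cluster_wide=True, overall_cluster_wide=True)
print(can_run(state))                                    # (False, 'user/role quota')
print(can_run(replace(state, role_cluster_wide=False)))  # (True, 'ok')
```

With member-by-member enforcement (both flags false), only the first check applies on the captain; the quota checks are then left to the member that receives the search.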
Cluster-wide search quotas and ad hoc searches
Each search quota spans both scheduled searches and ad hoc searches. Because of the way that the captain learns about ad hoc searches, the number of cluster-wide concurrent searches can sometimes exceed the search quota. This is true for both types of search quotas, user/role quotas and overall quotas.
If, for example, you configure the cluster to enforce the overall search quota on a cluster-wide basis, the captain handles quota enforcement by comparing the total number of searches running across all members to the search quota.
So, to enforce quotas, the captain must know two values:
- The overall search quota
- The number of concurrent searches running across all members
The captain calculates the overall search quota by multiplying the base concurrent search quota by the number of "Up" cluster members, as described in How the cluster enforces quotas.
The captain calculates the number of concurrent searches running across all members by adding together the total number of scheduled and ad hoc searches in progress:
- For scheduled searches, it always knows the number of concurrent scheduled searches, because it controls the search scheduling operation.
- For ad hoc searches, it depends on reporting from the individual members. When a new ad hoc search starts, the member running the search informs the captain, and the captain adds that search to the total concurrent search number.
When the number of all searches, both scheduled and ad hoc, reaches the quota, the captain ceases initiating new scheduled searches until the number of searches falls below the quota.
A user always initiates an ad hoc search directly on a member. The member uses its own set of search quotas, without consideration or knowledge of the cluster-wide search quota, to decide whether to allow the search. The member then reports the new ad hoc search to the captain. If the captain has already reached the cluster-wide quota, then a new ad hoc search causes the cluster to temporarily exceed the quota. This results in the captain reporting more searches than the number allowable by the search quota.
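The following toy model shows how the reported search count can temporarily exceed the quota. The `Captain` class and its methods are hypothetical and greatly simplified; in particular, the sketch assumes that members report ad hoc searches instantly and ignores search completion.

```python
class Captain:
    """Toy model of cluster-wide quota bookkeeping (not a Splunk API)."""

    def __init__(self, quota: int) -> None:
        self.quota = quota
        self.scheduled = 0
        self.adhoc = 0

    @property
    def total(self) -> int:
        return self.scheduled + self.adhoc

    def try_start_scheduled(self) -> bool:
        """The captain refuses scheduled searches once the quota is reached."""
        if self.total >= self.quota:
            return False
        self.scheduled += 1
        return True

    def report_adhoc_started(self) -> None:
        """A member reports an ad hoc search it has already started.

        The captain can only record it; it cannot refuse it.
        """
        self.adhoc += 1


captain = Captain(quota=3)
print([captain.try_start_scheduled() for _ in range(4)])
# [True, True, True, False]

captain.report_adhoc_started()         # a member allowed an ad hoc search
print(captain.total, captain.quota)    # 4 3  (quota temporarily exceeded)
print(captain.try_start_scheduled())   # False
```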
Configure quota enforcement behavior
You configure user/role-based quota enforcement behavior separately from overall search quota enforcement behavior.
Configure user/role-based quota enforcement behavior
Configure user/role-based quota enforcement behavior with the shc_role_quota_enforcement setting, under the [scheduler] stanza in limits.conf.
To enforce these quotas on a member-by-member basis, leave this attribute set to false, its default value.
To enforce these quotas on a cluster-wide basis instead, set the attribute to true:
[scheduler]
shc_role_quota_enforcement = true
For details of this setting, see limits.conf.
Configure overall search quota enforcement behavior
Configure overall search quota enforcement behavior with the shc_syswide_quota_enforcement setting, under the [scheduler] stanza in limits.conf.
To enforce this quota on a member-by-member basis, leave this attribute set to false, its default value.
To enforce this quota on a cluster-wide basis instead, set the attribute to true:
[scheduler]
shc_syswide_quota_enforcement = true
For details of this setting, see limits.conf.
Change to the default behavior
With 6.5, there was a change in the default behavior for enforcing user/role-based concurrent search quotas.
Deciding which scope of quota enforcement to use
Each approach has its advantages.
The case for cluster-wide enforcement
The captain does not take into account the search user when it assigns a search to a member. Combined with member-enforced quotas, this could result in unwanted and unexpected behavior.
One consequence of the member-by-member behavior is this: If the captain happened to assign most of a particular user's searches to one cluster member, that member could quickly reach the quota for that user, even though other members had not yet reached their limit for the user. This could also occur in the case of role-based quotas.
For example, say you have a three-member cluster, and the search concurrency quota for role X is set to 4. At some point, two members are running four searches for X and one is running only two. The scheduler then dispatches a new search for X that lands on a member that is already running four searches. What happens next depends on whether the cluster is enforcing quotas on a member-by-member or cluster-wide basis:
- With member-by-member enforcement, the member sees that it has already reached the member-specific concurrency limit of 4 for role X. Therefore, it does not run the search. However, the consequences are usually minimal because, if one member cannot run a search, the captain retries the job on a different member. You can configure the number of retries with the
- With cluster-wide enforcement, the member sees that the cluster-wide concurrency limit for role X is 12 (4 * 3 members), but that, currently, there are only 10 (4 + 4 + 2) searches running for role X. Therefore, it runs the search.
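The two outcomes in this example can be checked with a short sketch. The function names are hypothetical, and the code assumes that all three members are "Up" and that a search is allowed only while the running count is below the applicable quota.

```python
def member_allows(running_on_member: int, role_quota: int) -> bool:
    """Member-by-member enforcement: compare against the base quota."""
    return running_on_member < role_quota


def cluster_allows(running_per_member: list[int], role_quota: int) -> bool:
    """Cluster-wide enforcement: base quota times the number of members.

    Assumes every member in the list is in the "Up" state.
    """
    cluster_quota = role_quota * len(running_per_member)
    return sum(running_per_member) < cluster_quota


role_x_quota = 4
running = [4, 4, 2]  # searches for role X on each of the three members

# The new search lands on the first member, which already runs four.
print(member_allows(running[0], role_x_quota))  # False (4 is not below 4)
print(cluster_allows(running, role_x_quota))    # True  (10 is below 12)
```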
The case for member-by-member enforcement
While cluster-wide enforcement has the advantage of allowing full utilization of the search concurrency quotas across the set of cluster members, it has the potential to cause miscalculations that result in oversubscribing or undersubscribing searches on the cluster.
When the captain enforces the cluster-wide search concurrency quotas, it includes both scheduled and ad hoc searches in its calculations.
This can lead to miscalculations due to network latency issues, because the captain must rely on each member to inform it of any ad hoc searches that it is running. If members are slow in responding to the captain, the captain might not be aware of some ad hoc searches, and thus oversubscribe the cluster.
Similarly, latency can cause members to be slow in informing the captain of completion of searches, scheduled or ad hoc, causing the captain to undersubscribe the cluster.
For these reasons, you might find that your needs are better met by using the member-by-member enforcement method.
This documentation applies to the following versions of Splunk® Enterprise: 8.1.0, 8.1.1, 8.1.2