What is distributed search?
This documentation does not apply to the most recent version of Splunk.
In distributed search, a Splunk instance called a search head sends search requests to a group of Splunk indexers, which perform the actual searches on their indexes. The search head then merges the results and returns them to the user. In a typical scenario, one search head manages searches on several indexers.
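To make the fan-out-and-merge flow concrete, here is a minimal Python sketch under stated assumptions: each search peer is modeled as a plain function that returns matching events, the peer names `idx1` and `idx2` are hypothetical, and merging is simplified to concatenating the partial results and sorting them newest first. It illustrates the pattern only and says nothing about how Splunk itself merges results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for an indexer: takes a query, returns matching events.
Peer = Callable[[str], List[dict]]

def distributed_search(query: str, peers: Dict[str, Peer]) -> List[dict]:
    """Send the query to every peer in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(peers))) as pool:
        futures = {name: pool.submit(peer, query) for name, peer in peers.items()}
        merged = []
        for name, future in futures.items():
            for event in future.result():
                # Record which peer produced each event.
                merged.append({**event, "splunk_server": name})
    # Simplified merge step: newest events first.
    merged.sort(key=lambda e: e["_time"], reverse=True)
    return merged

def make_peer(events: List[dict]) -> Peer:
    # A toy indexer that does substring matching on the raw event text.
    return lambda q: [e for e in events if q in e["_raw"]]

peers = {
    "idx1": make_peer([{"_time": 3, "_raw": "error disk"}]),
    "idx2": make_peer([{"_time": 5, "_raw": "error net"}, {"_time": 1, "_raw": "ok"}]),
}
results = distributed_search("error", peers)
```

Querying the peers in parallel is the point of the design: the search head waits roughly as long as the slowest peer, not the sum of all of them.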
These are some of the key use cases for distributed search:
- Horizontal scaling for enhanced performance. Distributed search provides horizontal scaling by distributing the indexing and searching loads across multiple indexers, making it possible to search and index large quantities of data.
- Access control. You can use distributed search to control access to indexed data. In a typical situation, some users, such as security personnel, might need access to data across the enterprise, while others need access only to data in their functional area.
- Managing geo-dispersed data. Distributed search allows local offices to access their own data, while maintaining centralized access at the corporate level. For example, offices in Chicago and San Francisco can search just their local data, while headquarters in New York can search its local data as well as the data in Chicago and San Francisco.
- Maximizing data availability. Distributed search, combined with load balancing and cloning from Splunk forwarders, is a key component of high availability solutions.
The Splunk instance that does the searching is referred to as the search head. The Splunk indexers that participate in a distributed search are called search peers or indexer nodes. Together, the search head and search peers constitute the nodes in a distributed search cluster.
A search head can also index and serve as a search peer. However, in performance-based use cases, such as horizontal scaling, it is recommended that the search head only search and not index. In that case, it is referred to as a dedicated search head.
A search head by default runs its searches across all search peers in its cluster. You can limit a search to one or more search peers by specifying the splunk_server field in your query. See "Search across one or more distributed servers" in the User manual.
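The peer-selection rule can be sketched in a few lines of Python. In a query it might look like `error splunk_server=idx-sf`, where `idx-sf` and the other peer names below are hypothetical. The sketch assumes shell-style wildcard matching via `fnmatch`; this is a simplification, not a description of Splunk's exact matching rules.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional

def select_peers(all_peers: Iterable[str], patterns: Optional[List[str]] = None) -> List[str]:
    """Return the peers a search runs on.

    With no splunk_server values, every peer is searched (the default).
    Otherwise, only peers whose name matches one of the values are kept.
    """
    peers = list(all_peers)
    if not patterns:
        return peers
    return [p for p in peers if any(fnmatchcase(p, pat) for pat in patterns)]

# Hypothetical peer names for illustration.
cluster = ["idx-chicago", "idx-sf", "idx-nyc"]
everything = select_peers(cluster)
only_sf = select_peers(cluster, ["idx-sf"])
two_sites = select_peers(cluster, ["idx-c*", "idx-sf"])
```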
You can run multiple search heads across a set of search peers. To coordinate the activity of multiple search heads (so that they share configuration settings, search artifacts, and job management), you need to enable search head pooling.
Some search scenarios
This diagram shows a simple distributed search scenario for horizontal scaling, with one search head searching across three peers:
In this diagram showing a distributed search scenario for access control, a "security" department search head has visibility into all the indexing search peers. Each search peer also has the ability to search its own data. In addition, the department A search peer has access to both its data and the data of department B:
Finally, this diagram shows the use of load balancing and distributed search to provide high availability access to data:
For more information on load balancing, see "Set up load balancing" in this manual.
For information on Splunk distributed searches and capacity planning, see "Dividing up indexing and searching" in the Installation manual.
What search heads send to search peers
When initiating a distributed search, the search head replicates and distributes its knowledge objects to its search peers. Knowledge objects include saved searches, event types, and other entities used in searching across indexes. The search head needs to distribute this material to its search peers so that they can properly execute queries on its behalf. The set of data that the search head distributes is called the knowledge bundle.
The indexers use the search head's knowledge bundle to execute queries on its behalf. When executing a distributed search, the indexers ignore any local knowledge objects; they have access only to the objects in the search head's knowledge bundle.
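The following Python sketch models that rule under simplifying assumptions: a bundle is reduced to a version number plus a dictionary of event type definitions, and the class names `KnowledgeBundle` and `SearchPeer` are hypothetical. The point it shows is that the peer resolves an event type only from the bundle sent by the requesting search head, even when a local definition with the same name exists.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class KnowledgeBundle:
    # Hypothetical, simplified bundle: event type name -> search definition.
    version: int
    eventtypes: Dict[str, str] = field(default_factory=dict)

@dataclass
class SearchPeer:
    name: str
    local_eventtypes: Dict[str, str] = field(default_factory=dict)
    # Bundles received from search heads, keyed by search head name.
    bundles: Dict[str, KnowledgeBundle] = field(default_factory=dict)

    def resolve_eventtype(self, search_head: str, eventtype: str) -> str:
        """Look up an event type for a distributed search.

        local_eventtypes is deliberately never consulted here.
        """
        bundle = self.bundles[search_head]
        if eventtype not in bundle.eventtypes:
            raise KeyError(f"{eventtype!r} is not in the bundle from {search_head}")
        return bundle.eventtypes[eventtype]

peer = SearchPeer("idx1", local_eventtypes={"failed_login": "sourcetype=local_auth fail*"})
peer.bundles["sh1"] = KnowledgeBundle(
    version=7,
    eventtypes={"failed_login": 'sourcetype=linux_secure "Failed password"'},
)
definition = peer.resolve_eventtype("sh1", "failed_login")
```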
The process of distributing knowledge bundles means that indexers by default receive nearly the entire contents of all the search head's apps. If an app contains large binaries that do not need to be shared with the indexers, you can reduce the size of the bundle by means of the [replicationBlacklist] stanza in distsearch.conf. See "Limit knowledge bundle size" in this manual.
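The exact [replicationBlacklist] syntax is defined by the distsearch.conf specification; the Python sketch below models only the idea behind it. Each blacklist entry names a pattern, and any app file whose path matches a pattern is left out of the bundle. The entry names, patterns, and file paths are hypothetical, and Python regular expressions stand in for whatever matching Splunk actually applies.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical blacklist: entry name -> regex over app-relative paths.
REPLICATION_BLACKLIST: Dict[str, str] = {
    "noBinaries": r"\.(exe|dll|so|jar)$",
    "bigLookups": r"^apps/[^/]+/lookups/huge_.*\.csv$",
}

def build_bundle(paths: Iterable[str], blacklist: Dict[str, str]) -> List[str]:
    """Return the paths that would be shipped, skipping any blacklisted path."""
    compiled = [re.compile(pattern) for pattern in blacklist.values()]
    return [p for p in paths if not any(rx.search(p) for rx in compiled)]

app_files = [
    "apps/search/local/savedsearches.conf",
    "apps/myapp/bin/collector.exe",
    "apps/myapp/lookups/huge_assets.csv",
    "apps/myapp/lookups/hosts.csv",
]
bundle = build_bundle(app_files, REPLICATION_BLACKLIST)
```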
The knowledge bundle gets distributed to the $SPLUNK_HOME/var/run/searchpeers/ directory on each search peer. Because the search head distributes its knowledge, search scripts should not hardcode paths to resources. The knowledge bundle will reside at a different location on the search peer's file system, so hardcoded paths will not work properly.
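One way for a Python search script to avoid hardcoded paths is to locate its resources relative to its own file. The sketch below assumes the script sits in an app's `bin` directory and that the resource lives elsewhere in the same app; the app name `myapp`, the script name, the lookup file, and the bundle directory name `sh1-1234` are hypothetical. Inside a real script you would pass `__file__` as the first argument.

```python
import os

def resource_path(script_file: str, *parts: str) -> str:
    """Resolve a resource relative to the app that contains the script.

    Assumes the layout <app>/bin/<script>, so the same call works on the
    search head and inside a replicated bundle on a search peer.
    """
    bin_dir = os.path.dirname(os.path.abspath(script_file))
    app_dir = os.path.dirname(bin_dir)
    return os.path.join(app_dir, *parts)

# Avoid: open("/opt/splunk/etc/apps/myapp/lookups/hosts.csv")
# Prefer (inside the script): open(resource_path(__file__, "lookups", "hosts.csv"))
on_head = resource_path(
    "/opt/splunk/etc/apps/myapp/bin/lookup_hosts.py", "lookups", "hosts.csv")
on_peer = resource_path(
    "/opt/splunk/var/run/searchpeers/sh1-1234/apps/myapp/bin/lookup_hosts.py",
    "lookups", "hosts.csv")
```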
By default, the search head replicates and distributes the knowledge bundle to each search peer. For greater efficiency, you can instead tell the search peers to mount the knowledge bundle's directory location, eliminating the need for bundle replication. When you mount a knowledge bundle, it's referred to as a mounted bundle. To learn how to mount bundles, read "Mount the knowledge bundle".
All authorization for a distributed search originates from the search head. At the time it sends the search request to its search peers, the search head also distributes the authorization information. It tells the search peers the name of the user running the search, the user's role, and the location of the distributed authorize.conf file containing the authorization information.
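As a rough Python illustration, the request below carries the user, the role, and the location of the distributed authorize.conf, and the peer checks index access using only that file. The `SearchRequest` field names and the `peer_can_search` helper are hypothetical. The check reads the role's srchIndexesAllowed setting as a semicolon-separated list of index patterns; role inheritance and every other authorize.conf setting are ignored, so treat this as a simplified sketch.

```python
import configparser
from dataclasses import dataclass
from fnmatch import fnmatchcase

@dataclass(frozen=True)
class SearchRequest:
    # Hypothetical fields the search head sends along with the query.
    query: str
    user: str
    role: str
    authorize_conf_path: str  # location of the distributed authorize.conf

def peer_can_search(request: SearchRequest, authorize_conf_text: str, index: str) -> bool:
    """Decide, from the search head's authorize.conf only, whether the role may search index."""
    parser = configparser.ConfigParser()
    parser.read_string(authorize_conf_text)
    stanza = f"role_{request.role}"
    if not parser.has_section(stanza):
        return False
    allowed = parser.get(stanza, "srchIndexesAllowed", fallback="")
    patterns = [p.strip() for p in allowed.split(";") if p.strip()]
    return any(fnmatchcase(index, pattern) for pattern in patterns)

AUTHORIZE_CONF = """
[role_security_analyst]
srchIndexesAllowed = security;main
"""

req = SearchRequest(
    query="search index=security failed",
    user="alice",
    role="security_analyst",
    authorize_conf_path="<bundle>/system/local/authorize.conf",
)
```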
Licenses for distributed deployments
Each indexer node in a distributed deployment must have access to a license pool.
Search heads performing no indexing or only summary indexing can use the forwarder license. If the search head performs any other type of indexing, it must have access to a license pool.
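The licensing rules above reduce to a small decision function. This Python sketch encodes only what this section states; the labels `"indexer"`, `"search_head"`, `"none"`, `"summary"`, and `"other"` are hypothetical names chosen for illustration, and the function returns the minimum license a node needs.

```python
from enum import Enum

class LicenseType(Enum):
    FORWARDER = "forwarder license"
    POOL = "license pool"

def minimum_license(node_role: str, indexing: str = "none") -> LicenseType:
    """Return the least a node needs under the rules in this section.

    node_role: "indexer" or "search_head" (hypothetical labels).
    indexing:  "none", "summary" (summary indexing only), or "other".
    """
    if indexing not in ("none", "summary", "other"):
        raise ValueError(f"unknown indexing kind: {indexing!r}")
    if node_role == "indexer":
        # Every indexer node must have access to a license pool.
        return LicenseType.POOL
    if node_role == "search_head":
        # No indexing or summary indexing only: the forwarder license suffices.
        if indexing in ("none", "summary"):
            return LicenseType.FORWARDER
        return LicenseType.POOL
    raise ValueError(f"unknown node role: {node_role!r}")
```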
See "Licenses for search heads" in the Installation manual for a detailed discussion of licensing issues.
All search nodes must be 4.x
All search nodes must be running Splunk 4.x to participate in the distributed search. Distributed search is not backwards compatible with Splunk 3.x.
Compatibility between 4.3 search heads and 4.2 search peers
You can run 4.3 search heads across 4.2 search peers, but some 4.3 search-related functionality will not be available. These are the main features that require 4.3 search peers:
- The spath search command.
- Bloom filters.
- Historical backfill for real-time data from the search peers.
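A deployment check along these lines can be sketched in Python. It assumes purely numeric version strings such as `"4.2.5"` and hypothetical peer names. For a 4.3 or later search head, it reports which peers cannot provide the 4.3-only features listed above, and it rejects any node older than 4.x, since distributed search is not backwards compatible with Splunk 3.x.

```python
from typing import Dict, List, Tuple

def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "4.2.5" into (4, 2, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

# The features listed above as requiring 4.3 search peers.
FEATURES_NEEDING_43_PEERS = ("spath", "Bloom filters", "historical backfill for real-time data")

def unavailable_features(search_head: str, peers: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each peer to the 4.3 features it cannot provide to a 4.3+ search head."""
    if parse_version(search_head)[:2] < (4, 3):
        return {}  # a pre-4.3 head does not offer these features anyway
    report: Dict[str, List[str]] = {}
    for name, version in peers.items():
        v = parse_version(version)
        if v[0] < 4:
            raise ValueError(f"{name} runs {version}; distributed search needs 4.x nodes")
        if v[:2] < (4, 3):
            report[name] = list(FEATURES_NEEDING_43_PEERS)
    return report
```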
Also, note that 4.3-specific stats/chart/timechart functionality is less efficient when used against 4.2.x search peers because the search peers can't provide map/reduce capability for that functionality. The functionality affected includes sparklines and the
Compatibility between 4.2.5+ search heads and pre-4.2.5 search peers
Because of certain feature incompatibilities, pre-4.2.5 search peers can consume 20-30% more CPU resources when deployed with a 4.2.5 or later search head. You might see error messages such as "ConfObjectManagerDB - Ignoring invalid database setting" in splunkd.log on the search peers.
Bundle replication warning when running a 4.2 search head against a 4.1.x search peer
Bundle replication is the process by which the search head distributes knowledge bundles, containing the search-time configurations, to its search peers. This ensures that all peers run searches using the same configurations, so that, for example, all peers use the same definition of an event type.
Starting with 4.2, bundle replication occurs asynchronously. The search head performs bundle replication in a non-blocking fashion that allows in-progress searches to continue on the search peers. When issuing searches, the search head specifies the bundle version that the peers must use to run those searches. The peers will not start using the newly replicated bundles until the search head confirms that all peers have the latest bundle version.
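The version-pinning scheme can be modeled with a small state machine. In this Python sketch, which is a simplification and not Splunk's implementation, the search head publishes new bundle versions without blocking, records which peers have confirmed each version, and keeps stamping new searches with the previous version until every peer has confirmed the new one. The class and method names are hypothetical, and the sketch assumes all peers already hold version 1.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class SearchHead:
    peers: Set[str]
    active_version: int = 1   # version new searches are told to use
    latest_version: int = 1   # newest version published so far
    acks: Dict[int, Set[str]] = field(default_factory=dict)

    def publish_bundle(self) -> int:
        """Start replicating a new bundle; running searches are not blocked."""
        self.latest_version += 1
        self.acks[self.latest_version] = set()
        return self.latest_version

    def peer_confirmed(self, peer: str, version: int) -> None:
        """Record that a peer holds a version; activate it once all peers do."""
        self.acks.setdefault(version, set()).add(peer)
        if version > self.active_version and self.peers <= self.acks[version]:
            self.active_version = version

    def dispatch(self, query: str) -> Dict[str, object]:
        # Each search names the bundle version the peers must use.
        return {"query": query, "bundle_version": self.active_version}

sh = SearchHead(peers={"idx1", "idx2"})
v = sh.publish_bundle()
sh.peer_confirmed("idx1", v)
during = sh.dispatch("error")["bundle_version"]   # idx2 has not confirmed yet
sh.peer_confirmed("idx2", v)
after = sh.dispatch("error")["bundle_version"]
```

Pinning the version per search is what keeps results consistent: a peer that has already received the new bundle still runs the search with the old one until the whole cluster can switch together.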
However, the new 4.2 search head behavior can cause pre-4.2 search peers to get out of sync and use different bundles when running their searches. If you run a 4.2 search head against 4.1.x search peers, you'll get this warning message: "Asynchronous bundle replication might cause (pre 4.2) search peers to run searches with different bundle/config versions. Results might not be correct."
Note: This issue goes away in 4.2.1 search heads. Starting with 4.2.1, the search head will revert to synchronized bundle replication if any of the search peers is pre-4.2.
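The version rules in this section can be summarized as a small Python function. It models only 4.2 and later search heads, assumes purely numeric version strings, and returns the replication mode together with a flag for the case that produces the "Asynchronous bundle replication might cause..." warning. The function name and return shape are hypothetical.

```python
from typing import List, Tuple

def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "4.2.1" into (4, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def bundle_replication_mode(search_head: str, peers: List[str]) -> Tuple[str, bool]:
    """Return (mode, warn) for a 4.2 or later search head.

    warn is True when a 4.2.0 head replicates asynchronously to pre-4.2 peers.
    """
    head = parse_version(search_head)
    if head[:2] < (4, 2):
        raise ValueError("this sketch only models 4.2 and later search heads")
    has_old_peer = any(parse_version(p)[:2] < (4, 2) for p in peers)
    if head < (4, 2, 1):
        # 4.2.0 always replicates asynchronously, even to pre-4.2 peers.
        return ("asynchronous", has_old_peer)
    if has_old_peer:
        # 4.2.1 and later fall back to synchronous replication.
        return ("synchronous", False)
    return ("asynchronous", False)
```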