About distributed search
In distributed search, a Splunk instance called a search head sends search requests to a group of Splunk indexers, which perform the actual searches on their indexes. The search head then merges the results and returns them to the user. In a typical scenario, one search head manages searches on several indexers.
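The request-and-merge flow described above can be pictured with a small, purely illustrative model. This is not Splunk code: the names make_peer and search_head, the in-memory "indexes", and the event layout are all assumptions made for the sketch. Each toy peer searches only its own events, and the toy search head merges the per-peer results into a single list ordered newest first.

```python
import heapq
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Peer = Callable[[str], List[Event]]


def make_peer(name: str, events: List[Event]) -> Peer:
    """Return a toy 'indexer' that searches only its own in-memory events."""
    def search(term: str) -> List[Event]:
        # Tag each hit with the (hypothetical) peer name that produced it.
        hits = [dict(e, splunk_server=name) for e in events if term in e["_raw"]]
        return sorted(hits, key=lambda e: e["_time"], reverse=True)
    return search


def search_head(term: str, peers: List[Peer]) -> List[Event]:
    """Send the same search to every peer, then merge the sorted results."""
    per_peer = [peer(term) for peer in peers]
    return list(heapq.merge(*per_peer, key=lambda e: e["_time"], reverse=True))


peers = [
    make_peer("idx1", [{"_time": 10, "_raw": "error disk full"},
                       {"_time": 30, "_raw": "ok"}]),
    make_peer("idx2", [{"_time": 20, "_raw": "error timeout"}]),
]
results = search_head("error", peers)
```

In this sketch the search head does no searching of its own, which mirrors the dedicated search head arrangement: the peers do the work and the head only combines what they return.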
These are some of the key use cases for distributed search:
- Horizontal scaling for enhanced performance. Distributed search facilitates horizontal scaling by providing a way to distribute the indexing and searching loads across multiple Splunk components, making it possible to search and index large quantities of data.
- Access control. You can use distributed search to control access to indexed data. In a typical situation, some users, such as security personnel, might need access to data across the enterprise, while others need access only to data in their functional area.
- Managing geo-dispersed data. Distributed search allows local offices to access their own data, while maintaining centralized access at the corporate level. Chicago and San Francisco can look just at their local data; headquarters in New York can search its local data, as well as the data in Chicago and San Francisco.
A search head can also index and serve as a search peer. However, in performance-based use cases, such as horizontal scaling, it is recommended that the search head only search and not index. In that case, it is referred to as a dedicated search head.
A search head by default runs its searches across all its search peers. You can limit a search to one or more search peers by specifying the splunk_server field in your query. See "Search across one or more distributed servers" in the Search manual.
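As a rough illustration of limiting a search by peer, the helper below builds a search string that adds splunk_server terms to a base search. The function name limit_to_peers and the peer names idx1 and idx2 are hypothetical; only the splunk_server field comes from the text above. Treat it as a sketch of how such a query might be assembled, and check the Search manual for the exact syntax your version supports.

```python
from typing import Iterable


def limit_to_peers(base_search: str, peers: Iterable[str]) -> str:
    """Append splunk_server terms so the search targets only the named peers."""
    clauses = [f'splunk_server="{name}"' for name in peers]
    if not clauses:
        # No restriction requested: the search runs across all peers.
        return base_search
    return f"{base_search} ({' OR '.join(clauses)})"


query = limit_to_peers("error", ["idx1", "idx2"])
```

With no peer names supplied, the helper returns the base search unchanged, which corresponds to the default behavior of searching every peer.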
You can run multiple search heads across a set of search peers. To coordinate the activity of multiple search heads (so that they share configuration settings, search artifacts, and job management), you need to enable search head pooling.
Some search scenarios
This diagram shows a simple distributed search scenario for horizontal scaling, with one search head searching across three peers:
In this diagram showing a distributed search scenario for access control, a "security" department search head has visibility into all the indexing search peers. Each search peer also has the ability to search its own data. In addition, the department A search peer has access to both its data and the data of department B:
Finally, this diagram shows load balancing with distributed search. There's a dedicated search head and a search head on each indexer. All the search heads can search across the entire set of indexers:
For more information on load balancing, see "Set up load balancing" in this manual.
For information on Splunk distributed searches and capacity planning, see "Dividing up indexing and searching" in the Installation manual.
Search heads and clusters
In index replication, clusters use search heads to search across the set of indexers, or peer nodes. You deploy and configure search heads very differently when they are part of a cluster. To learn more about search heads and clusters, read "Configure the search head" in the Managing Indexers and Clusters Manual.
What search heads send to search peers
When initiating a distributed search, the search head replicates and distributes its knowledge objects to its search peers. Knowledge objects include saved searches, event types, and other entities used in searching across indexes. The search head needs to distribute this material to its search peers so that they can properly execute queries on its behalf. The set of data that the search head distributes is called the knowledge bundle.
The indexers use the search head's knowledge bundle to execute queries on its behalf. When executing a distributed search, the indexers are ignorant of any local knowledge objects. They have access only to the objects in the search head's knowledge bundle.
The process of distributing knowledge bundles means that indexers by default receive nearly the entire contents of all the search head's apps. If an app contains large binaries that do not need to be shared with the indexers, you can reduce the size of the bundle by means of the [replicationBlacklist] stanza in distsearch.conf. See "Modify the knowledge bundle" in this manual.
The knowledge bundle gets distributed to the $SPLUNK_HOME/var/run/searchpeers/ directory on each search peer. Because the search head distributes its knowledge, search scripts should not hardcode paths to resources. The knowledge bundle will reside at a different location on the search peer's file system, so hardcoded paths will not work properly.
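One common way to avoid hardcoded paths is to resolve resources relative to the script's own location. The sketch below assumes a Python search script; the helper name resource_path, the file name lookup.csv, and both example directories are invented for illustration and do not reflect the actual layout of a knowledge bundle.

```python
from pathlib import Path


def resource_path(script_file: str, relative: str) -> Path:
    """Locate a resource next to the running script instead of at a fixed path."""
    return Path(script_file).parent / relative


# The same script, installed at two different (hypothetical) locations,
# finds its resource in each place without any change to the code.
on_head = resource_path("/head/apps/myapp/bin/enrich.py", "lookup.csv")
on_peer = resource_path("/peer/bundle/apps/myapp/bin/enrich.py", "lookup.csv")
```

Inside a real script you would typically pass `__file__` as the first argument, so the path follows the script wherever the bundle places it.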
By default, the search head replicates and distributes the knowledge bundle to each search peer. For greater efficiency, you can instead tell the search peers to mount the knowledge bundle's directory location, eliminating the need for bundle replication. When you mount a knowledge bundle, it's referred to as a mounted bundle. To learn how to mount bundles, read "Mount the knowledge bundle".
All authorization for a distributed search originates from the search head. At the time it sends the search request to its search peers, the search head also distributes the authorization information. It tells the search peers the name of the user running the search, the user's role, and the location of the distributed authorize.conf file containing the authorization information.
Licenses for distributed deployments
Each indexer node in a distributed deployment must have access to a license pool.
Search heads performing no indexing or only summary indexing can use the forwarder license. If the search head performs any other type of indexing, it must have access to a license pool.
See "Licenses for search heads" in the Installation manual for a detailed discussion of licensing issues.
Cross-version compatibility
It's recommended that you upgrade search heads and search peers to any new version at the same time to take full advantage of the latest search capabilities. This section describes the consequences of deploying multi-version distributed search for specific scenarios.
All search nodes must be 4.x or later
All search nodes must be running Splunk 4.x or 5.x to participate in the distributed search. Distributed search is not backwards compatible with Splunk 3.x.
Search nodes and 5.0 features
You need to upgrade both search heads and search peers to version 5.0 to take advantage of search capabilities that are new to 5.0, such as report acceleration.
Compatibility between 4.3 search heads and 4.2 search peers
You can run 4.3 search heads across 4.2 search peers, but some 4.3 search-related functionality will not be available. These are the main features that require 4.3 search peers:
- The spath search command.
- Bloom filters.
- Historical backfill for real-time data from the search peers.
Also, note that 4.3-specific stats/chart/timechart functionality is less efficient when used against 4.2.x search peers because the search peers can't provide map/reduce capability for that functionality. The functionality affected includes sparklines and the
Compatibility between 4.2.5+ search heads and pre-4.2.5 search peers
Because of certain feature incompatibilities, pre-4.2.5 search peers can consume 20-30% more CPU resources when deployed with a 4.2.5 or later search head. You might see error messages such as "ConfObjectManagerDB - Ignoring invalid database setting" in splunkd.log on the search peers.
Bundle replication warning when running a 4.2 search head against a 4.1.x search peer
Bundle replication is the process by which the search head distributes knowledge bundles, containing the search-time configurations, to its search peers. This ensures that all peers run searches using the same configurations, so that, for example, all peers use the same definition of an event type.
Starting with 4.2, bundle replication occurs asynchronously. The search head performs bundle replication in a non-blocking fashion that allows in-progress searches to continue on the search peers. When issuing searches, the search head specifies the bundle version that the peers must use to run those searches. The peers will not start using the newly replicated bundles until the search head confirms that all peers have the latest bundle version.
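The versioning idea behind asynchronous replication can be modeled in a few lines. This is a conceptual sketch only, not how Splunk implements it: the Peer and SearchHead classes, the integer version numbers, and the event type contents are assumptions. The toy search head always asks peers to use the newest bundle version that every peer already holds, so a partially replicated bundle is never used.

```python
from typing import Dict, List


class Peer:
    """Toy search peer that keeps every bundle version it has received."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.bundles: Dict[int, Dict[str, str]] = {}

    def receive(self, version: int, bundle: Dict[str, str]) -> None:
        self.bundles[version] = dict(bundle)

    def run(self, query: str, version: int) -> str:
        # The peer uses exactly the bundle version the search head asked for.
        definition = self.bundles[version]["eventtype:failed_login"]
        return f"{self.name} ran '{query}' with v{version} ({definition})"


class SearchHead:
    def __init__(self, peers: List[Peer]) -> None:
        self.peers = peers

    def confirmed_version(self) -> int:
        """Newest bundle version that every peer already holds."""
        held_by_all = set.intersection(*(set(p.bundles) for p in self.peers))
        return max(held_by_all)

    def search(self, query: str) -> List[str]:
        version = self.confirmed_version()
        return [p.run(query, version) for p in self.peers]


a, b = Peer("peerA"), Peer("peerB")
head = SearchHead([a, b])
v1 = {"eventtype:failed_login": "action=failure"}
v2 = {"eventtype:failed_login": "action=failure OR action=denied"}
for peer in (a, b):
    peer.receive(1, v1)

a.receive(2, v2)                      # replication of v2 is still in progress
version_during = head.confirmed_version()
during = head.search("eventtype=failed_login")

b.receive(2, v2)                      # now every peer holds v2
version_after = head.confirmed_version()
after = head.search("eventtype=failed_login")
```

While only one peer holds version 2, both peers keep answering with version 1, so all peers apply the same event type definition. The switch to version 2 happens only once every peer has it.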
However, the new 4.2 search head behavior can cause pre-4.2 search peers to get out of sync and use different bundles when running their searches. If you run a 4.2 search head against 4.1.x search peers, you'll get this warning message: "Asynchronous bundle replication might cause (pre 4.2) search peers to run searches with different bundle/config versions. Results might not be correct."
Note: This issue goes away in 4.2.1 search heads. Starting with 4.2.1, the search head reverts to synchronous bundle replication if any of its search peers is pre-4.2.
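The version rules in this section can be summarized as a small decision function. The function names parse and replication_mode are hypothetical, and the first branch (a pre-4.2 search head always replicating synchronously) is an inference from the statement that asynchronous replication starts with 4.2, not something stated outright.

```python
from typing import Iterable, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn a version string such as '4.2.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def replication_mode(head: str, peers: Iterable[str]) -> str:
    """Return the bundle replication mode implied by the rules above."""
    head_version = parse(head)
    if head_version < (4, 2):
        # Assumption: asynchronous replication did not exist before 4.2.
        return "synchronous"
    has_old_peer = any(parse(p) < (4, 2) for p in peers)
    if has_old_peer and head_version >= (4, 2, 1):
        # 4.2.1 and later fall back when any peer is pre-4.2.
        return "synchronous"
    # A 4.2 head stays asynchronous even with 4.1.x peers,
    # which is the case that triggers the warning message.
    return "asynchronous"


mode_42_with_old_peer = replication_mode("4.2", ["4.1.8", "4.2"])
mode_421_with_old_peer = replication_mode("4.2.1", ["4.1.8", "4.2"])
```

Note that a plain "4.2" head compares as older than "4.2.1" here, so it stays in asynchronous mode with 4.1.x peers, while a 4.2.1 head with the same peers falls back to synchronous replication.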
This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18