Overview of parallel reduce search processing

High-cardinality searches are searches that must match, filter, and aggregate extremely large numbers of unique field values. User IDs, session IDs, and telephone numbers are examples of fields that tend to be high in cardinality. Searches that compute aggregates over high-cardinality fields can be slow to complete. If high-cardinality searches in your Splunk platform deployment are slow, you can use parallel reduce search processing to help them complete faster.

In a typical distributed search process, there are two broad search processing phases: a map phase and a reduce phase. The map phase takes place across the indexers in your deployment. In the map phase, the indexers locate event data that matches the search query and sort it into field-value pairs. When the map phase is complete, the indexers send the results to the search head for the reduce phase. During the reduce phase, the search head processes the results through the commands in your search and aggregates them to produce a final result set.
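
As a rough illustration of the two-phase flow described above, the following Python sketch models each indexer as a function that filters its local events (map) and a single search head that aggregates everything it receives (reduce). This is a conceptual model only, not Splunk code: the event data, the field names, and the `map_phase` and `reduce_phase` functions are all hypothetical.

```python
from collections import Counter

# Hypothetical events held by two indexers (illustrative data only).
INDEXER_EVENTS = [
    [{"user_id": "u1", "status": 200}, {"user_id": "u2", "status": 500}],
    [{"user_id": "u1", "status": 200}, {"user_id": "u3", "status": 200}],
]

def map_phase(events, status):
    """Runs on each indexer: keep only the events that match the search."""
    return [event for event in events if event["status"] == status]

def reduce_phase(mapped_results):
    """Runs on the search head: aggregate every mapped result in one place."""
    counts = Counter()
    for results in mapped_results:
        for event in results:
            counts[event["user_id"]] += 1
    return dict(counts)

# Each indexer maps its own events; the search head does all of the reducing.
mapped = [map_phase(events, 200) for events in INDEXER_EVENTS]
final = reduce_phase(mapped)
print(final)
```

In this model, the cost of `reduce_phase` grows with the number of distinct `user_id` values, which is why a single reducer can become the bottleneck for high-cardinality searches.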

The following diagram illustrates the standard two-phase distributed search process.

The parallel reduce process inserts an intermediate reduce phase into the map-reduce paradigm, making it a three-phase map-reduce-reduce operation. In this intermediate reduce phase, a subset of your indexers serves as intermediate reducers. The intermediate reducers divide up the mapped results and perform reduce operations on those results for certain supported search commands. When the intermediate reducers complete their work, they send the results to the search head, where the final result reduction and aggregation operations take place. The parallel processing of reduction work that otherwise would be done entirely by the search head can result in faster completion times for high-cardinality searches that aggregate large numbers of search results.
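
The three-phase flow can be sketched in Python as follows. This is a conceptual model under stated assumptions, not a description of Splunk internals: the topic says only that the intermediate reducers divide up the mapped results, so routing by a hash of the field value is an assumption made here for illustration, and the function names and sample data are hypothetical.

```python
from collections import Counter
import zlib

def partition(mapped_results, num_reducers, key):
    """Route each mapped event to one reducer using a hash of its key value.
    (Hash routing is an assumption made for this sketch.)"""
    buckets = [[] for _ in range(num_reducers)]
    for results in mapped_results:
        for event in results:
            idx = zlib.crc32(event[key].encode()) % num_reducers
            buckets[idx].append(event)
    return buckets

def intermediate_reduce(bucket, key):
    """Runs on each intermediate reducer: count events per key value."""
    return Counter(event[key] for event in bucket)

def final_reduce(partials):
    """Runs on the search head: merge the partial aggregates."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)

# Hypothetical mapped results from three indexers.
mapped = [
    [{"user_id": "u1"}, {"user_id": "u2"}],
    [{"user_id": "u1"}, {"user_id": "u3"}],
    [{"user_id": "u2"}, {"user_id": "u1"}],
]
buckets = partition(mapped, num_reducers=2, key="user_id")
partials = [intermediate_reduce(bucket, "user_id") for bucket in buckets]
result = final_reduce(partials)
print(result)
```

Because every event for a given `user_id` lands on the same reducer in this sketch, each reducer can finish the aggregate for its share of the values independently, and the search head only has to merge partial results that do not overlap.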

The following diagram illustrates the three-phase parallel reduce search process.

Parallel reduce prerequisites

To enable parallel reduce search processing, you need the following prerequisites in place:

Prerequisite	Details	For more information, see
A distributed search environment.	Parallel reduce search processing requires a distributed search deployment architecture.	About distributed search
An environment where the indexers are at a single site.	Parallel reduce search processing is not site-aware. Do not use it if your indexers are in a multisite indexer cluster, or if you have non-clustered indexers spread across several sites.
Splunk platform version 7.1.0 or later for all participating machines.	Upgrade all Splunk instances that participate in the parallel reduce process to version 7.1.0 or later. Participating instances include all indexers and search heads.	How to upgrade Splunk Enterprise in the Installation Manual
Internal search head data forwarded to the indexer layer.	The parallel reduce search process ignores all data on the search head. If you plan to run parallel reduce searches, the best practice is to forward all search head data to the indexer layer.	Best Practice: Forward search head data to the indexer layer
A low to medium average indexer load.	Parallel reduce search processes add a significant amount of indexer load. If you attempt to run parallel reduce searches in an already overloaded indexer system, you might encounter slow performance. If you run an indexer cluster, you might see skipped heartbeats between peer nodes and the cluster master.	Use the monitoring console to view index and volume status in Managing Indexers and Clusters of Indexers
All indexers configured to allow secure communication with intermediate reducers.	Admins must set an identical `pass4SymmKey` security key in the `[parallelreduce]` stanza of `server.conf` for all indexers. This security key enables communication between indexers and intermediate reducers.	Configure your indexers to communicate with intermediate reducers
Users with roles that include the `run_multi_phased_searches` capability.	Users must have the `run_multi_phased_searches` capability to use the `redistribute` command. The `redistribute` command applies parallel reduce search processing to a search.	Apply parallel reduce processing to searches
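
As one way to check the version prerequisite in the table above, the following Python sketch flags instances that are older than 7.1.0. The `inventory` mapping of host names to version strings is hypothetical; how you collect versions from your own search heads and indexers is up to you and is not shown here.

```python
# Minimum version named in the prerequisites table.
REQUIRED = (7, 1, 0)

def parse_version(text):
    """Turn a dotted version string such as '7.0.13' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def outdated_instances(inventory):
    """Return the hosts whose version is earlier than REQUIRED, sorted by name."""
    return sorted(
        host
        for host, version in inventory.items()
        if parse_version(version) < REQUIRED
    )

# Hypothetical inventory of participating instances.
inventory = {"sh1": "7.1.0", "idx1": "7.0.13", "idx2": "8.2.4"}
outdated = outdated_instances(inventory)
print(outdated)
```

Comparing tuples of integers rather than raw strings keeps multi-digit components such as `13` in the right order.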

Next steps

Learn how to configure your deployment for parallel reduce search processing. See Configure parallel reduce search processing.
