Splunk® Enterprise

Distributed Search

Overview of parallel reduce search processing

High-cardinality searches are searches that must match, filter, and aggregate extremely large numbers of unique field values. User IDs, session IDs, and telephone numbers are examples of fields that tend to be high in cardinality. Searches that compute aggregates over high-cardinality fields can be slow to complete. If high-cardinality searches in your Splunk platform deployment are slow, you can use parallel reduce search processing to help them complete more quickly.
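Cardinality is the number of distinct values a field takes across your events. The following Python sketch uses a few made-up events (the field names and values are hypothetical) and counts the distinct values of each field. A field such as session_id, whose distinct count grows along with the event count, is high in cardinality, while a field such as status stays small no matter how many events you search. Aggregating by a high-cardinality field produces one result row per distinct value, which is why those aggregations are expensive. This illustrates the concept only; it is not how the Splunk platform computes cardinality.

```python
from collections import defaultdict

# Hypothetical sample events; real events carry many more fields.
events = [
    {"user_id": "u1001", "status": "200", "session_id": "s-01"},
    {"user_id": "u1002", "status": "200", "session_id": "s-02"},
    {"user_id": "u1003", "status": "404", "session_id": "s-03"},
    {"user_id": "u1001", "status": "200", "session_id": "s-04"},
]


def field_cardinality(events):
    """Return the number of distinct values observed for each field."""
    seen = defaultdict(set)
    for event in events:
        for field, value in event.items():
            seen[field].add(value)
    return {field: len(values) for field, values in seen.items()}


print(field_cardinality(events))  # {'user_id': 3, 'status': 2, 'session_id': 4}
```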

In a typical distributed search process, there are two broad search processing phases: a map phase and a reduce phase. The map phase takes place across the indexers in your deployment. In the map phase, the indexers locate event data that matches the search query and sort it into field-value pairs. When the map phase is complete, the indexers send their results to the search head for the reduce phase. During the reduce phase, the search head processes the results through the commands in your search and aggregates them to produce a final result set.

The following diagram illustrates the standard two-phase distributed search process.

Diagram: Standard Two-Phase Search Process
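The two phases can be modeled in a few lines of Python. This is a conceptual sketch only: the indexer names and events are hypothetical, both phases run in a single process, and the final count stands in for an aggregating command such as stats count by user. map_phase plays the role of an indexer, emitting field-value pairs for matching events, and reduce_phase plays the role of the search head, which aggregates every mapped result on its own.

```python
from collections import Counter

# Hypothetical events stored on three indexers.
INDEXER_DATA = {
    "idx1": [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}],
    "idx2": [{"user": "carol"}, {"user": "bob"}],
    "idx3": [{"user": "alice"}, {"user": "dave"}],
}


def map_phase(events, field):
    """Runs on each indexer: keep matching events and emit (field, value) pairs."""
    return [(field, event[field]) for event in events if field in event]


def reduce_phase(mapped_results):
    """Runs on the search head: aggregate every indexer's pairs into final counts."""
    counts = Counter()
    for pairs in mapped_results:
        for _field, value in pairs:
            counts[value] += 1
    return dict(counts)


mapped = [map_phase(events, "user") for events in INDEXER_DATA.values()]
print(reduce_phase(mapped))  # {'alice': 3, 'bob': 2, 'carol': 1, 'dave': 1}
```

Note that all of the aggregation work lands on the single reduce_phase call, which is the bottleneck that parallel reduce processing is designed to relieve.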

The parallel reduce process inserts an intermediate reduce phase into the map-reduce paradigm, making it a three-phase map-reduce-reduce operation. In this intermediate reduce phase, a subset of your indexers serve as intermediate reducers. The intermediate reducers divide up the mapped results and perform reduce operations on those results for certain supported search commands. When the intermediate reducers complete their work, they send the results to the search head, where the final result reduction and aggregation operations take place. The parallel processing of reduction work that otherwise would be done entirely by the search head can result in faster completion times for high-cardinality searches that aggregate large numbers of search results.

The following diagram illustrates the three-phase parallel reduce search process.

Diagram: Three-Phase Parallel Reduce Search Process
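The same model can be extended with an intermediate reduce phase. This sketch rests on assumptions that are not taken from Splunk internals: two of four hypothetical indexers act as intermediate reducers, and each mapped value is routed to a reducer by a stable hash (zlib.crc32) so that every occurrence of a value lands on the same reducer. Because the reducers then hold non-overlapping sets of values, the search head only has to merge small per-reducer totals instead of counting every mapped result itself. Real intermediate reducers would run in parallel on separate machines; here they run sequentially for simplicity.

```python
import zlib
from collections import Counter

# Hypothetical events stored on four indexers.
INDEXER_DATA = {
    "idx1": [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}],
    "idx2": [{"user": "carol"}, {"user": "bob"}],
    "idx3": [{"user": "alice"}, {"user": "dave"}],
    "idx4": [{"user": "erin"}, {"user": "carol"}],
}
# Assumption: a subset of the indexers also serves as intermediate reducers.
REDUCERS = ["idx1", "idx2"]


def map_phase(events, field):
    """Phase 1, on every indexer: emit the value of `field` for each matching event."""
    return [event[field] for event in events if field in event]


def partition(value, reducers):
    """Pick a reducer with a stable hash so every copy of a value goes to one reducer."""
    return reducers[zlib.crc32(value.encode("utf-8")) % len(reducers)]


def intermediate_reduce(mapped_results, reducers):
    """Phase 2, on the reducers: each reducer counts only the values routed to it."""
    partial = {reducer: Counter() for reducer in reducers}
    for values in mapped_results:
        for value in values:
            partial[partition(value, reducers)][value] += 1
    return partial


def final_reduce(partials):
    """Phase 3, on the search head: merge per-reducer counts (their keys do not overlap)."""
    total = Counter()
    for counts in partials.values():
        total.update(counts)
    return dict(total)


mapped = [map_phase(events, "user") for events in INDEXER_DATA.values()]
partials = intermediate_reduce(mapped, REDUCERS)
print(sorted(final_reduce(partials).items()))
# [('alice', 3), ('bob', 2), ('carol', 2), ('dave', 1), ('erin', 1)]
```

Sorting the output makes it independent of which reducer each value happened to be routed to.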

Parallel reduce prerequisites

To enable parallel reduce search processing, you need the following prerequisites in place:

Prerequisite: A distributed search environment.
Details: Parallel reduce search processing requires a distributed search deployment architecture.
For more information, see: About distributed search

Prerequisite: An environment where the indexers are at a single site.
Details: Parallel reduce search processing is not site-aware. Do not use it if your indexers are in a multisite indexer cluster, or if you have non-clustered indexers spread across several sites.

Prerequisite: Splunk platform version 7.1.0 or later for all participating machines.
Details: Upgrade all Splunk instances that participate in the parallel reduce process to version 7.1.0 or later. Participating instances include all indexers and search heads.
For more information, see: How to upgrade Splunk Enterprise in the Installation Manual

Prerequisite: Internal search head data forwarded to the indexer layer.
Details: The parallel reduce search process ignores all data on the search head. If you plan to run parallel reduce searches, the best practice is to forward all search head data to the indexer layer.
For more information, see: Best Practice: Forward search head data to the indexer layer

Prerequisite: A low to medium average indexer load.
Details: Parallel reduce search processes add a significant amount of indexer load. If you attempt to run parallel reduce searches in an already overloaded indexer system, you might encounter slow performance. If you run an indexer cluster, you might see skipped heartbeats between peer nodes and the cluster master.
For more information, see: Use the monitoring console to view index and volume status in Managing Indexers and Clusters of Indexers

Prerequisite: All indexers configured to allow secure communication with intermediate reducers.
Details: Admins must set an identical pass4SymmKey security key in the [parallelreduce] stanza of server.conf for all indexers. This security key enables communication between indexers and intermediate reducers.
For more information, see: Configure your indexers to communicate with intermediate reducers

Prerequisite: Users with roles that include the run_multi_phased_searches capability.
Details: Users must have the run_multi_phased_searches capability to use the redistribute command. The redistribute command applies parallel reduce search processing to a search.
For more information, see: Apply parallel reduce processing to searches
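Two of these prerequisites lend themselves to a quick pre-flight check. The following Python sketch uses a hypothetical inventory of instance names, roles, and versions. It flags participating instances older than 7.1.0 and renders the same [parallelreduce] stanza, with one shared pass4SymmKey value, for every indexer. The helper names and the placeholder key are illustrative assumptions, not part of any Splunk tool, and the sketch neither deploys configuration nor checks the remaining prerequisites.

```python
# Hypothetical inventory of the instances that take part in parallel reduce.
INSTANCES = {
    "sh1": {"role": "search_head", "version": "7.2.4"},
    "idx1": {"role": "indexer", "version": "7.1.2"},
    "idx2": {"role": "indexer", "version": "7.0.3"},
}
MIN_VERSION = (7, 1, 0)


def parse_version(text):
    """Turn a dotted version such as '7.2.4' into a comparable 3-tuple, padding with zeros."""
    parts = [int(part) for part in text.split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def instances_to_upgrade(instances):
    """Return the names of participating instances older than 7.1.0."""
    return sorted(
        name for name, info in instances.items()
        if parse_version(info["version"]) < MIN_VERSION
    )


def parallelreduce_stanza(shared_key):
    """Render the server.conf stanza; the same key must be used on every indexer."""
    return f"[parallelreduce]\npass4SymmKey = {shared_key}\n"


print(instances_to_upgrade(INSTANCES))  # ['idx2']
stanza = parallelreduce_stanza("placeholder-shared-key")  # hypothetical value
for name, info in INSTANCES.items():
    if info["role"] == "indexer":
        print(f"--- server.conf fragment for {name} ---")
        print(stanza)
```

Generating every indexer's stanza from one key value keeps the keys identical by construction, rather than relying on each file being edited by hand.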

Next steps

Learn how to configure your deployment for parallel reduce search processing. See Configure parallel reduce search processing.

This documentation applies to the following versions of Splunk® Enterprise: 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.3.0

