Distribute indexing and searching
This topic discusses the reasons to distribute the components of your Splunk platform deployment.
Concepts of distributed indexing and searching
Designing a scalable architecture for the Splunk platform requires knowledge of the Splunk instance roles, and how they were intended to scale.
The two most common roles are the search head and the indexer. They represent the roles that carry the burden of managing user objects, searching, parsing, and data storage.
A Search head is responsible for:
- Hosting users.
- Storing user created objects.
- Scheduling searches and alerting.
- Providing visual feedback through dashboards and views.
- Enforcing access controls.
An Indexer is responsible for:
- Accepting data streams from forwarders.
- Parsing the data.
- Writing the data into buckets.
- Maintaining the buckets.
- Accepting search requests from the search heads.
- Searching the buckets and streaming results back to the search head.
A search head's tasks are primarily CPU bound. As more users and more apps are added to a search head, the concurrent search load climbs quickly and hits a limit. The limit represents the aggregate search load across all users and apps to a search head's CPU cores.
Adding search heads to the deployment increases the aggregate CPU resources, increasing the aggregate search concurrency and the number of active users and apps supported in the environment.
An indexer's tasks are primarily I/O bound. As more forwarders are added to the network, there are more concurrent data streams to accept, and more data to parse before writing. In addition, the search requests require I/O access and processor time to analyze, collect, and return the requested data. As the data streams increase in volume and the concurrent search requests climb higher, the indexer hits a limit. The limit represents the aggregate search load from all search heads and indexing load from forwarders to an indexer's I/O capacity.
Adding indexers to the deployment increases the total aggregate I/O capacity and storage available to save data, reduces the data volume per indexer load, and reduces the impact of the search load by spreading it across more indexers.
Scaling the Splunk platform
A typical Splunk platform deployment plan is based upon 2 points: indexed data volume per day, and estimated search load. User counts are often used as a proxy for search load. For example, one active user with admin-level search concurrency can sustain the same load on the Splunk platform deployment as several users at a lower role level.
Most Splunk implementations are built around a handful of users and a few apps searching hundreds of gigabytes of data. In that scenario, adding indexers is the preferred method of scaling. The same rule applies when implementing a indexer cluster.
As the search head gains more users, the CPU limitations will become apparent as searches may skip, and users experience slower search result speed. Adding another search head to your distributed deployment does not guarantee improved search performance. As the user count increases, indexers must be added to maintain search performance. For a table with scaling guidelines, see Summary of performance recommendations in this manual.
When planning for a user count at 50 or more, consider implementing a search head cluster to absorb a high level of users while adding redundancy to the search tier.
As your indexers consume data, they store it in buckets, which are the individual elements of an index. As more data comes in, the number of buckets increases. As the number of buckets increases, the indexer must manage the buckets by "rolling" them to make room for new incoming data. This procedure takes up I/O cycles, which reduces the resources available to fetch events for search requests. The impact is noticeable for index buckets that hold smaller amounts of data.
Avoid configuring many indexes comprised of small buckets. For examples utilizing the
maxDataSize bucket setting, see
indexes.conf.example in the Splunk Enterprise Admin manual.
The number and types of search also impact indexer performance. Most search types leverage an indexer's disk subsystem, but a few will use more CPU. For information on simultaneous searches, see Accommodate concurrent users and searches in this manual.
If the hardware allocated for indexers exceeds the reference machine specifications, consider reviewing and implementing one of the Parallelization settings to improve the performance for specific use cases.
Use the Distributed Management console (DMC) to monitor and track resource usage across the Splunk platform environment. For more details, see About the distributed management console in Monitoring Splunk Enterprise.
Estimate your storage requirements
Accommodate many simultaneous searches
This documentation applies to the following versions of Splunk® Enterprise: 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.3.14, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.4.11