Hardware capacity planning for your Splunk deployment
Splunk is a flexible product that can meet almost any scale and redundancy requirement. Taking advantage of that flexibility requires careful planning. This chapter provides high-level hardware guidance for Splunk deployments and describes how Splunk uses hardware resources in various situations.
Before deciding on your hardware outlay for Splunk:
1. Be sure to review "Components of a Splunk Deployment" in this manual for a description of all of the elements of a Splunk installation.
2. Next, learn about the hardware that makes up a "single indexer" by reading "Reference hardware."
3. Finally, read the remaining topics in this chapter to learn how Splunk operations impact performance and how to maximize that performance.
Dimensions of a Splunk deployment
In some cases, a single indexer can handle the load of both searching and indexing.
In other scenarios, you must consider adding infrastructure to your Splunk deployment to maintain efficiency and performance. The following factors significantly impact Splunk's performance:
1. The amount of incoming data. The more data you send to Splunk, the more time Splunk needs to index it so that you can search it, report on it, and generate alerts from it.
2. The amount of indexed data. As the amount of data stored in Splunk's index goes up, the server that indexes that data requires additional bandwidth both to store the data and provide results for searches.
3. The number of concurrent users. If more than one person at a time uses an instance of Splunk, that instance requires more resources for those users to do searches and create reports and dashboards.
4. The number of saved searches. If you plan on running a lot of saved searches, Splunk needs capacity to perform those searches promptly and efficiently. The more saved searches you run in a given period of time, the more resources are required.
5. The types of search you employ. Almost as important as the number of saved searches are the types of search that you run against a Splunk system. There are several different types of search, each of which affects how the indexer responds to search requests.
6. Whether or not you run Splunk apps. Splunk apps and solutions can have unique performance, deployment, and configuration considerations. If you plan on running apps, make sure you consider the resource requirements of the app(s) you are using. Refer to the installation and deployment section of your app or solution's documentation for additional information. Additionally, read "Hardware capacity planning for a distributed Splunk deployment" to learn how to properly size your environment for an app's increased resource requirements.
How do these dimensions impact overall performance?
See the topic for each dimension, later in this chapter, to determine how that dimension impacts performance on a reference indexer.
While these factors impact the basic sizing requirements of your Splunk deployment on the whole, it's important to understand that addressing each of them individually does not guarantee peak efficiency for your Splunk deployment. You must discover how these factors correlate with one another in your specific application in order to realize maximum performance.
For example, if your Splunk deployment calls for a low amount of indexing but has a high number of concurrent users, it has significantly different resource needs than a setup with a low number of concurrent users and a high amount of daily indexing volume. Additionally, as both user count and amount of indexed data rise, you must distribute the environment across multiple servers to maintain a similar performance level. Search types complicate matters further, as some are bound by available CPU resources, and others are bound by the speed of the disk subsystem.
When should I scale my Splunk deployment?
To best answer this question, you must understand how the Splunk deployment dimensions above apply to your specific use case. Ask yourself these questions, then refer to the performance questionnaire later in this chapter to help determine when you should add more hardware resources:
- How much data do you expect to index daily?
- How much data do you need to retain?
- How many users do you expect to search through the data at any one time?
- Do you plan to use certain specific searches more than once?
- Do you want or need to use a Splunk app to present or manipulate your data?
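One way to keep the answers to these questions together is to record them in a small structure. The following Python sketch is only an illustration: the class name, field names, and sample values are hypothetical and are not part of Splunk. It assumes that retention is expressed in days, and the only calculation it performs is multiplying the daily volume by the retention period, which gives the raw data volume covered by the retention window rather than an on-disk size.

```python
from dataclasses import dataclass


@dataclass
class DeploymentProfile:
    """Answers to the planning questions (all names here are hypothetical)."""

    daily_index_gb: float   # data you expect to index per day, in GB
    retention_days: int     # assumption: retention is expressed in days
    concurrent_users: int   # users searching at any one time
    saved_searches: int     # searches you plan to run more than once
    uses_apps: bool         # whether a Splunk app is part of the plan

    def raw_retained_gb(self) -> float:
        """Raw data volume that falls inside the retention window.

        This is plain arithmetic on the incoming volume. It is not an
        on-disk size, which depends on how the data is stored.
        """
        return self.daily_index_gb * self.retention_days


# Sample values are illustrative only, not sizing recommendations.
profile = DeploymentProfile(
    daily_index_gb=50.0,
    retention_days=90,
    concurrent_users=8,
    saved_searches=25,
    uses_apps=True,
)
print(f"Raw data in retention window: {profile.raw_retained_gb():.0f} GB")
```

Keeping the answers in one place makes it easier to revisit them as the deployment grows and to compare them against the performance questionnaire later in this chapter.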
The key to a well-performing installation is to develop a plan early in the deployment cycle that accounts for both your initial outlay of hardware resources and the addition of resources when the deployment scales up.
You can read about capacity planning for a distributed deployment at "Hardware capacity planning for a distributed Splunk deployment" in the Distributed Deployment Manual.
This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18