Splunk® Enterprise

Installation Manual


Hardware capacity planning for your Splunk deployment

Splunk is a very flexible product that can be deployed to meet almost any scale and redundancy requirement, but that flexibility doesn't remove the need for care and planning. This topic discusses high-level considerations for Splunk deployments, including sizing.

After you've worked out the general layout of your Splunk search topology, the other sections in this document, along with the formal Splunk Admin guide, explain more thoroughly how to implement it.

Reference hardware

Let's consider a common, commodity hardware server as our standard:

  • Intel x86-64-bit chip architecture
  • Standard Linux or Windows 64-bit distribution
  • 2 CPUs, 4 cores per CPU, 2.5-3GHz per core
  • 8GB RAM
  • 4x300GB SAS hard disks at 10,000 rpm each in RAID 10
    • capable of 800 IO operations / second (IOPS)
  • standard 1Gb Ethernet NIC, optional 2nd NIC for a management network

For the purposes of this discussion, this will be our single server unit. Note that the only exceptional item here is the disk array. Splunk is most often constrained by disk I/O, so consider that first when selecting your hardware.

Performance checklist

The first step in deciding on a reference architecture is sizing: can your Splunk deployment handle the load? For the purposes of this guide, we consider managing forwarder connections and configurations (but not their data!) to be free. Therefore we need to look at index volume and search load.

Question 1: Do you need to index more than 2GB per day?

Question 2: Do you need more than 2 concurrent users?

If the answer to both questions is 'NO', then your Splunk instance can safely share one of the above servers with other services, with the caveat that Splunk be allowed sufficient disk I/O on the shared box. If you answered yes to either question, continue.

Question 3: Do you need to index more than 100GB per day?

Question 4: Do you need to have more than 4 concurrent users?

If the answer to both questions is 'NO', then a single dedicated Splunk server of our reference architecture should be able to handle your workload.

Question 5: Do you need more than 500GB of storage?

At a high level, total storage is calculated as follows:

  average daily indexing rate x retention period (in days) x 1/2

You can generally safely use this simple calculation method. If you want to base your calculation on the specific type(s) of data that you'll be feeding into Splunk, you can use the method described in "Estimate your storage requirements" in this manual.

Thanks to compression, Splunk can generally store raw data, including its indexes, at approximately half the original size. Given allowances for the operating system and disk partitioning, the reference server provides about 500GB of usable space. In practical terms, that's roughly 6 months of fast storage at 5GB/day, or 10 days at 100GB/day.
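
As a quick Python sketch of that calculation (the 50GB/day rate and 90-day retention below are hypothetical inputs, not recommendations):

  # Rough storage estimate: average daily rate x retention (days) x 1/2 compression
  daily_volume_gb = 50       # hypothetical average daily indexing volume
  retention_days = 90        # hypothetical retention policy
  compression_factor = 0.5   # raw data plus indexes stored at ~half original size

  required_storage_gb = daily_volume_gb * retention_days * compression_factor
  print(required_storage_gb)  # 2250.0 GB, before the ~15% OS and partitioning overhead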

If you need more storage, you can either opt for more local disks for fast access (required for frequent searching) or consider attached or network storage (acceptable for occasional searching). Low-latency connections over NFS or CIFS are acceptable for searches over long time periods where instant search returns can be compromised to lower cost per GB. Shares mounted over WAN connections and standby storage such as tape are never acceptable.

Beyond 100GB per day

If you have requirements greater than 100GB/day or more than 4 concurrent users, you'll want to leverage Splunk's scale-out capabilities. That involves using distributed search to run searches in parallel across multiple indexers at once, and possibly distributing the incoming data with load-balanced Splunk forwarders.

While Splunk does continue to scale linearly across multiple indexers, at this scale other considerations often become important. Larger user counts, more forwarders, and more scheduled searches begin to overshadow indexing throughput in your deployment design. Also, at this scale it is very likely that you will have high availability or redundancy requirements, which are covered in greater detail in "High availability reference architecture".

Dividing up indexing and searching

At daily volumes above 100GB/day, it makes sense to modify our reference hardware slightly to reflect the differing needs of indexers and search heads. Dedicated search heads need little disk I/O and not much local storage, but they are far more CPU bound than indexers. Therefore we can change our recommendations to:

Dedicated search head

  • Intel 64-bit chip architecture
  • Standard Linux or Windows 64-bit distribution
  • 4 CPUs, 4 cores per CPU, 2.5-3GHz per core
  • 4GB RAM
  • 2x300GB SAS hard disks at 10,000 rpm each in RAID 0
  • standard 1Gb Ethernet NIC, optional 2nd NIC for a management network

Given that a search head is CPU bound, if you prefer fewer, more powerful servers, adding more and faster CPU cores is the best investment.

Note: The guideline of 1 core per active user still applies. Don't forget to account for scheduled searches in your CPU allowance as well.
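
A minimal Python sketch of that allowance (the user and search counts below are hypothetical):

  # 1 core per active user, plus cores for scheduled searches running at peak
  active_users = 10               # hypothetical concurrent interactive users
  peak_scheduled_searches = 4     # hypothetical scheduled searches running at peak
  search_head_cores = active_users + peak_scheduled_searches
  print(search_head_cores)        # 14 cores; the 16-core search head above has headroom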

Indexer

  • Intel 64-bit chip architecture
  • Standard Linux or Windows 64-bit distribution
  • 2 CPUs, 4 cores per CPU, 2.5-3GHz per core
  • 8GB RAM
  • 8x300GB SAS hard disks at 10,000 rpm each in RAID 10
    • capable of 1200 IO operations / second (IOPS)
  • standard 1Gb Ethernet NIC, optional 2nd NIC for a management network

The indexers will be busy both writing new data and servicing the remote requests of search heads. Therefore disk I/O is the primary bottleneck.

At these daily volumes, local disk is unlikely to provide cost-effective storage for the time frames over which fast search is desired, which suggests fast attached storage or networked storage. While there are too many types of storage to be prescriptive, here are some guidelines to consider:

  • indexers do many bulk reads
  • indexers do many disk seeks

Therefore...

  • more disks (specifically, more spindles) are better
  • total throughput of the entire system is important, but...
  • disk to controller ratio should be higher, similar to a database

Ratio of indexers to search heads

Technically, there is no practical Splunk limit on the number of search heads an indexer can support, or the number of indexers a search head can search against. However, system limitations suggest a ratio of approximately 8 indexers to 1 search head for most use cases. That is only a rough guideline; if you have many searchers relative to your total data volume, for example, more search heads make sense. In general, the best use of a separate search head is to populate summary indexes. This search head then acts like an indexer to the primary search head that users log into.

Accommodating many simultaneous searches

A common question for a large deployment is: how do I account for many concurrent users? Let's take as an example a system that has 48 concurrent searches at peak times. The short answer is that we can accommodate 48 simultaneous searches on a cluster of indexers and search heads where each machine has enough RAM to prevent swapping. Assuming that each search takes 200MB of RAM per system, that is roughly 10GB of additional RAM (beyond indexing requirements). This is because CPU performance degrades gracefully with more concurrent jobs, but once the working set of memory for all processes exceeds physical RAM, performance drops catastrophically due to swapping.

The caveat here is that each search's run time grows in proportion to the number of concurrent searches divided by the number of cores that were free before the searches arrived. For example, suppose the indexers were doing nothing before the searches arrived and have 8 cores each, and the first of a set of identical searches takes 10s to complete. Then the first 8 searches will each take 10s, since there is no contention. However, since there are only 8 cores, if 48 searches are running, each search will take 48/8 = 6x longer than if only 1-8 searches were running. So now, every search takes ~1 minute to complete.

This leads to the observation that the most important thing to do here is add indexers. Indexers do the bulk of the work in search (reading data off disk, decompressing it, extracting knowledge, and reporting). If we want to return to the world of 10s searches, we use 6 indexers (one search head is probably still fine, though it may be appropriate to set aside a search head for summary index creation). Searches 1-8 now take 10/6 ≈ 1.7s, and with 48 searches, each takes 10s.

Unfortunately, the system isn't typically idle before searches arrive. If we are indexing 150GB/day, at peak times we are probably using 4 of the 8 cores for indexing. That means the first 4 searches take 10s each, and having 48 searches running takes 48/4 = 12x longer, or about 2 minutes each.

Now one might say: let me put 16 cores per indexer rather than 8 and avoid buying some machines. That makes a little bit of sense, but it is not the best choice. The extra cores don't help searches 1-16; they still take 10s. With 48 searches, each search will take 48/16 = 3x longer, which is indeed better than 6x. However, it's usually not much more expensive to buy two 8-core machines instead, which has advantages: the first few searches now take just 5s (which is the most common case), and we have more aggregate I/O capacity (doubling the number of cores does nothing for I/O; adding servers does).
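
The arithmetic in the preceding paragraphs can be condensed into a simple Python sketch. It is only a rough model of the reasoning above: it ignores disk I/O and memory contention, assumes identical searches, and the function and parameter names are illustrative.

  # Rough model: each search is spread across all indexers and occupies one
  # core per indexer, so contention begins once concurrent searches exceed
  # the free cores on each indexer.
  def search_run_time(base_seconds, indexers, free_cores_per_indexer, concurrent_searches):
      per_search = base_seconds / indexers   # distributed search scales roughly linearly
      contention = max(1.0, concurrent_searches / free_cores_per_indexer)
      return per_search * contention

  print(search_run_time(10, 1, 8, 48))    # 60s:  1 idle 8-core indexer, 48 searches
  print(search_run_time(10, 6, 8, 48))    # 10s:  6 idle 8-core indexers, 48 searches
  print(search_run_time(10, 1, 4, 48))    # 120s: 1 indexer with 4 cores busy indexing
  print(search_run_time(10, 2, 8, 8))     # 5s:   two 8-core indexers, light search load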

The lesson here is to add indexers. Doing so spreads the indexing load across more systems, freeing cores for search. Also, since the performance of almost all types of search scales with the number of indexers, searches will be faster, which mitigates the effect of slowness from resource sharing. Additionally, by making every search faster, we often avoid having searches from concurrent users overlap in the first place. In realistic situations with hundreds of users, each user runs a search every few minutes, though not at exactly the same time as other users. By reducing the search time by a factor of 6 (by adding more indexers), the concurrency factor is reduced (not necessarily by 6x, but by some meaningful factor). This, in turn, lowers the concurrency-related I/O and memory contention.

Summary of performance recommendations

  Daily Volume           Number of Search Users   Recommended Indexers   Recommended Search Heads
  < 2GB/day              < 2                      1, shared              N/A
  2GB/day to 100GB/day   up to 4                  1                      N/A
  200GB/day              up to 8                  2                      1
  300GB/day              up to 12                 3                      1
  400GB/day              up to 8                  4                      1
  500GB/day              up to 16                 5                      2
  1TB/day                up to 24                 10                     2
  20TB/day               up to 100                100                    24
  60TB/day               up to 100                300                    32

Note that these are approximate guidelines only. Feel free to adjust them for your specific use case based on the discussion in this topic, and contact Splunk for more guidance if needed.

Performance considerations

Splunk has three primary roles - indexer, searcher and forwarder. In many cases a single Splunk instance performs both indexing and searching. Although a Splunk indexer can also perform forwarding, in most cases, it makes more sense to use a separate Splunk instance, the universal forwarder, to handle forwarding. All roles have their own performance requirements and bottlenecks.

  • Indexing, while relatively resource inexpensive, is often disk I/O bound.
  • Searching can be both CPU and disk I/O bound.
  • Forwarding uses few resources and is rarely a bottleneck.

As you can see, disk I/O is frequently the limiting factor in Splunk performance. It deserves extra consideration in your planning. That also makes Splunk a poor virtualization candidate, unless dedicated disk access can be arranged.

CPU

  • Allow 1 CPU core for every 1MB/s of indexing volume
  • Allow 1 CPU core for Splunk's optimization routines for every 2MB/s of indexing volume
  • Allow 1 CPU core per active searcher (be sure to account for scheduled searches)

Disk I/O

  • Assume 50 IOPS per 1MB/s of indexing volume
  • Allow 50 IOPS for Splunk's optimization routines
  • Allow 100 IOPS per search, or an average of 200 IOPS per search user

Memory

  • Allow 200-300MB for indexing
  • Allow 500MB per concurrent search user
  • Allow 1GB for the operating system to accommodate OS caching

Total storage

  • Allow 15% overhead for the OS and disk partitioning
    • On the reference system, that leaves ~500GB of usable storage
  • Conservatively, Splunk can store raw data plus its indexes at ~50% of the original size
    • Compression rates vary based on the data

Based on these estimates, this machine will be disk I/O bound if there are too many active users or too many searches per user. That is the most likely limitation for this hardware, possibly followed by CPU if the searches are highly computational in nature, such as many uses of stats or eval commands in a single search.
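
As a back-of-the-envelope aid, the CPU, disk I/O, and memory rules of thumb above can be combined into a short Python sketch (the function name and example inputs are illustrative; this is not a substitute for testing with your own data):

  # Per-indexer sizing estimate, restating the rules of thumb above.
  def estimate_resources(indexing_mb_per_sec, concurrent_search_users):
      cpu_cores = (indexing_mb_per_sec           # 1 core per 1MB/s indexed
                   + indexing_mb_per_sec / 2     # 1 core per 2MB/s for optimization
                   + concurrent_search_users)    # 1 core per active searcher
      iops = (50 * indexing_mb_per_sec           # 50 IOPS per 1MB/s indexed
              + 50                               # Splunk's optimization routines
              + 200 * concurrent_search_users)   # ~200 IOPS per search user
      ram_mb = (300                              # indexing
                + 500 * concurrent_search_users  # concurrent search users
                + 1024)                          # OS and file system caching
      return cpu_cores, iops, ram_mb

  # Example: ~2MB/s of indexing (roughly 170GB/day) and 4 concurrent search users
  print(estimate_resources(2, 4))   # (7.0, 950, 3324)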

Applied performance

With the information above, it is possible to estimate required hardware for most Splunk use cases by considering the following:

  • The amount of daily indexed volume (disk I/O, CPU)
  • The required retention period (total storage)
  • The number of concurrent search users (disk I/O, CPU)

Although not all search users consume the same amount of resources, consider these very rough guidelines:

  • Dashboard-heavy users trigger many searches at once
  • Dashboards also suggest many scheduled searches
  • Searching for rare events across large datasets (for example, all time) is disk I/O intensive
  • Calculating summary information is CPU intensive
    • If done over long time intervals, can also be disk I/O intensive
  • Alerts and scheduled searches run even if no one sees their results

What does that mean in real life?

  • Executive users with many dashboards and summaries require both CPU and disk I/O
  • Operations users searching over recent, small datasets require fewer resources
  • Forensic and compliance users searching over long timeframes require disk I/O
  • Alerting and scheduled searches over short timeframes are inexpensive; over long timeframes they can be very expensive

