When you size the hardware for your Splunk environment, a reference machine helps you understand when it is time to scale and distribute the deployment. The following is an example of such a machine. Treat this configuration as the standard for the remainder of this chapter.
The reference machine described below produces the following index and search performance metrics for a given sample of data:
- Up to 5.8 megabytes per second (500 GB per day) of raw indexing performance, provided no other Splunk activity is occurring.
- Up to 50,000 events per second for dense searches
- Up to 5,000 events per second for sparse searches
- Up to 2 seconds per index bucket for super-sparse searches
- From 10 to 50 buckets per second for rare searches with bloom filters
To find out more about the types of searches and how they affect Splunk performance, read "How search types affect Splunk performance" in this manual.
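The indexing figure above pairs a per-second rate with a per-day volume. As a quick check, the following Python sketch converts one to the other. It assumes decimal units (1 GB = 1,000 MB) and a machine that indexes continuously for 24 hours; the function name is illustrative and is not part of Splunk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def mb_per_sec_to_gb_per_day(mb_per_sec: float) -> float:
    """Convert a sustained rate in MB/s to a daily volume in GB (decimal units)."""
    return mb_per_sec * SECONDS_PER_DAY / 1000


if __name__ == "__main__":
    # 5.8 MB/s is the raw indexing rate quoted for the reference machine.
    print(f"{mb_per_sec_to_gb_per_day(5.8):.0f} GB per day")
```

For 5.8 MB/s this works out to roughly 500 GB per day, which agrees with the figure quoted in the list.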
The reference machine has the following specifications:
- Intel x86 64-bit chip architecture
- 2 CPUs, 4 cores per CPU (8 cores total), at least 2.5 GHz per core
- 8 GB RAM
- Standard 1 Gb Ethernet NIC, optional 2nd NIC for a management network
- Standard 64-bit Linux or Windows distribution
The reference computer's disk subsystem should be capable of sustaining a high average number of input/output operations per second (IOPS).
IOPS measure how many read and write operations a hard drive can complete each second. Because a hard drive reads and writes at different speeds, there are separate IOPS figures for disk reads and disk writes. The average IOPS is a blend of those two figures.
The more average IOPS a hard drive can produce, the more data it can index and search in a given period of time. While many variables affect the number of IOPS that a hard drive can produce, the three most important are:
- its rotational speed (in revolutions per minute)
- its average latency (the amount of time it takes to spin its platters half a rotation)
- its average seek time (the amount of time it takes the drive head to reach the track that holds a requested block of data)
To get the most IOPS out of a hard drive, choose drives that have high rotational speeds and low average latency and seek times. Every drive manufacturer provides this information (and some provide much more).
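A common rule of thumb, which is not taken from this manual, estimates a single drive's random IOPS as the reciprocal of its average latency plus its average seek time. The Python sketch below applies that rule. The 3.0 ms seek time is an assumed, illustrative value, so substitute the figure from your drive's data sheet; the function names are hypothetical.

```python
def average_latency_ms(rpm: float) -> float:
    """Time for half a platter rotation, in milliseconds."""
    ms_per_rotation = 60_000 / rpm
    return 0.5 * ms_per_rotation


def estimate_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rule-of-thumb random IOPS: one operation per (latency + seek) interval."""
    service_time_ms = average_latency_ms(rpm) + avg_seek_ms
    return 1000 / service_time_ms


if __name__ == "__main__":
    rpm = 15_000
    assumed_seek_ms = 3.0  # assumption for illustration, not a measured value
    print(f"latency: {average_latency_ms(rpm):.1f} ms")
    print(f"estimated IOPS: {estimate_iops(rpm, assumed_seek_ms):.0f}")
```

With these inputs the latency is 2.0 ms and the estimate is 200 IOPS. Real drives vary with workload, caching, and queue depth, so treat the result as a starting point only.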
For additional information on IOPS and how to calculate them, review the following articles:
- "Getting the hang of IOPS" (http://www.symantec.com/connect/articles/getting-hang-iops-v13) on Symantec's Connect Community.
- "Analyzing I/O performance in Linux" (http://www.cmdln.org/2010/04/22/analyzing-io-performance-in-linux) on CMDLN.ORG (a sysadmin blog).
For the reference machine, we use eight 146-gigabyte, 15,000 RPM serial-attached SCSI (SAS) hard drives in a Redundant Array of Independent Disks (RAID) 1+0 fault tolerance scheme as the disk subsystem. Each hard drive is capable of about 200 average IOPS. The combined array produces a little over 800 IOPS.
Important: Splunk is often constrained by disk I/O first, so always consider disk infrastructure first when specifying your hardware.
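The manual does not state how the figure of a little over 800 IOPS for the eight-drive array is derived. One plausible reading, offered here as an assumption, is the usual RAID 1+0 write penalty: every logical write lands on both drives of a mirrored pair, so a write-only workload sees about half the raw IOPS of the drives. A minimal Python sketch of that rule of thumb follows (the function name is hypothetical):

```python
RAID10_WRITE_PENALTY = 2  # each logical write goes to both drives of a mirror


def raid10_effective_iops(drives: int, iops_per_drive: float,
                          write_fraction: float) -> float:
    """Estimate array IOPS for a workload with the given share of writes."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw_iops = drives * iops_per_drive
    read_fraction = 1.0 - write_fraction
    return raw_iops / (read_fraction + RAID10_WRITE_PENALTY * write_fraction)


if __name__ == "__main__":
    # Eight drives at about 200 IOPS each, as in the reference machine.
    for share in (0.0, 0.5, 1.0):
        iops = raid10_effective_iops(8, 200, share)
        print(f"write share {share:.0%}: {iops:.0f} IOPS")
```

Under this assumption a write-only workload yields 800 IOPS and a read-only workload yields 1,600 IOPS, with mixed workloads in between.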
Splunk performs fastest when deployed directly onto bare-metal hardware, as described above. However, Splunk also performs well on virtual hardware, and we fully support deploying Splunk on virtual hardware.
Using the bare-metal hardware as a baseline, Splunk generally indexes data about 30% slower on a virtual machine (VM) than it does on the reference machine. Search performance is on par with the physical hardware.
This is a best-case scenario that does not account for resource contention with other active VMs on the same physical server. It also does not take into account certain vendor-specific I/O enhancement techniques (such as Direct I/O or Raw Device Mapping).
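To put the 30% figure in concrete terms, the Python sketch below scales the reference machine's best-case indexing volume by a slowdown percentage. It treats the slowdown as a flat multiplier, which is a simplifying assumption, and the names are illustrative.

```python
REFERENCE_GB_PER_DAY = 500.0  # best-case bare-metal indexing volume from this chapter


def derated_gb_per_day(slowdown_percent: float,
                       baseline: float = REFERENCE_GB_PER_DAY) -> float:
    """Scale a baseline daily indexing volume by a percentage slowdown."""
    if not 0 <= slowdown_percent <= 100:
        raise ValueError("slowdown_percent must be between 0 and 100")
    return baseline * (100 - slowdown_percent) / 100


if __name__ == "__main__":
    # About 30% slower indexing on a VM, per the paragraph above.
    print(f"best-case VM indexing: {derated_gb_per_day(30):.0f} GB per day")
```

For a 30% slowdown the estimate is 350 GB per day. As noted above, contention with other VMs on the same host can lower this further.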
Splunk in the cloud
There are two different ways to run Splunk in the cloud: Splunk Enterprise (the version you download from Splunk's website) in your own cloud environment, or Splunk's new cloud-based solution, Splunk Storm.
Splunk Enterprise in the cloud
While you can run Splunk in the cloud, there are several concerns to be aware of when doing so. In addition to the security concerns of running Splunk in a public cloud, performance degrades significantly compared to bare-metal hardware. Using the reference machine as a baseline again, Splunk indexing performance on a cloud-based computer is roughly half that of a physical one. Searching suffers, too: results return 15 to 20 percent slower than on a physical machine.
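For rough planning, the halved indexing rate can be turned into an instance count. The Python sketch below assumes that each cloud instance matches the reference machine apart from the roughly 50% indexing penalty, and that the load spreads evenly across instances. Both are simplifications, and the function name is hypothetical.

```python
import math

REFERENCE_GB_PER_DAY = 500.0   # best-case bare-metal indexing volume
CLOUD_INDEXING_FACTOR = 0.5    # "roughly half" of bare-metal indexing performance


def cloud_indexers_needed(daily_volume_gb: float) -> int:
    """Estimate how many cloud instances are needed to index a daily volume."""
    if daily_volume_gb < 0:
        raise ValueError("daily_volume_gb must not be negative")
    per_instance_gb = REFERENCE_GB_PER_DAY * CLOUD_INDEXING_FACTOR
    return math.ceil(daily_volume_gb / per_instance_gb)


if __name__ == "__main__":
    for volume in (200, 600, 1000):
        print(f"{volume} GB per day: {cloud_indexers_needed(volume)} instance(s)")
```

The estimate covers indexing only. It leaves out search load, which the paragraph above notes is also slower in the cloud.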
Splunk Storm
Splunk Storm delivers the power of Splunk as a pay-as-you-go service. It is for users who prefer to consume services rather than install software and manage their own servers.
The tradeoff is that Splunk Storm isn't as full-featured as Splunk Enterprise and doesn't provide as much flexibility when it comes to user management. Advanced Splunk features such as alerting, summary indexing, custom sourcetypes and data preview, field extractions based on delimiters, lookups, apps, LDAP integration, and a fully functional REST API are not yet available in Splunk Storm.
This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18