Splunk® Enterprise

Managing Indexers and Clusters of Indexers


Splunk Enterprise version 5.0 reached its End of Life on December 1, 2017. Please see the migration information.

System requirements and other deployment considerations

Clusters are groups of Splunk indexers, so, for the most part, you just need to adhere to the system requirements for Splunk indexers. For detailed software and hardware requirements for Splunk indexers, read "System requirements" in the Installation Manual. The current topic notes additional requirements for clusters.

Hardware requirements

Each component of the cluster (master node, peer nodes, and search head) must run on its own, separate machine or VM. Other than that, the hardware requirements are basically the same as for any Splunk instance, as described in "Reference hardware" in the Installation Manual.

Note: The master node needs less storage than the "Reference hardware" topic specifies, since the master does not index external data.

Splunk version compatibility

You can implement clustering on any group of Splunk indexers, version 5.0 or above. All cluster components, including the search head, must be running the same version of Splunk.

Required Splunk instances

Each cluster node must reside on its own Splunk instance. Therefore, the cluster must consist of at least (replication factor + 2) Splunk instances: at least as many peer nodes as the replication factor, plus one master node and one or more search heads. For example, if you want to deploy a cluster with a replication factor of 3, you must set up at least five Splunk instances: three peers, one master, and one search head. To learn more about the replication factor, read "Replication factor" in this manual.
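The instance count described above can be sketched in a few lines of Python. This is an illustration only, not part of Splunk; the function name minimum_instances and its parameters are hypothetical.

```python
def minimum_instances(replication_factor: int, search_heads: int = 1) -> int:
    """Return the smallest number of Splunk instances a cluster needs.

    Hypothetical helper: it only encodes the rule stated in this topic
    (one peer per replicated copy, one master, at least one search head).
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if search_heads < 1:
        raise ValueError("a cluster needs at least one search head")

    peers = replication_factor  # minimum number of peer nodes
    masters = 1                 # exactly one master node
    return peers + masters + search_heads


# Replication factor of 3 with a single search head.
print(minimum_instances(3))  # prints 5
```

The result is a lower bound. The amount of data you index usually calls for more peers than this minimum.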

The size of your cluster depends on other factors besides the replication factor, such as the amount of data you need to index. See "Deployment overview" in this manual for details.

Network requirements

All nodes of your cluster should reside on a high-speed network where each node can access every other node. This includes the master node, the peer nodes, and the search head.

The nodes do not necessarily need to be on the same subnet, or even in the same data center (assuming you have an extremely fast connection between the data centers). You can adjust the various cluster timeout settings in server.conf. For help in configuring timeout settings, contact Splunk Professional Services.

Note: With sufficiently high-quality connections, it is possible to deploy the cluster across data centers. However, in the current version of Splunk, the cluster is not site-aware. For example, in a scenario where you have peer nodes spread across two data centers, you cannot specify that one replicated copy of the cluster data reside on nodes in one data center and a second copy reside on nodes in a second data center. When the master determines how some set of data gets replicated across the cluster, it does not take peer location into consideration.

Storage considerations

When considering storage requirements for your clustered indexes, there are two things that you need to look at differently compared to non-clustered indexes:

  • The increase in capacity, across the set of peer nodes, necessary to handle the multiple copies of data.
  • The type of storage hardware to use.

Clusters use the usual settings for managing index storage, as described in "Configure index storage".

Determine your storage requirements

It's important to ensure you have enough disk space to accommodate the volume of data your peer nodes will be processing. For a general discussion of Splunk data volume and how to estimate your storage needs, refer to "Estimating your storage requirements" in the Installation Manual. That topic provides information on how to estimate storage for non-clustered indexers, so you need to supplement its guidelines to account for the extra copies of data that a cluster stores.

With a cluster, in addition to considering the volume of incoming data, you must consider the replication factor and search factor to arrive at your total storage requirements across the set of peer nodes. With a replication factor of 3, you are storing three copies of your data. You will need extra storage space to accommodate these copies, but you will not need three times as much storage. Replicated copies of non-searchable data are smaller than copies of searchable data, because they include only the data and not the associated index files. So, for example, if your replication factor is 3 and your search factor is 2, you will need more than two, but less than three, times the storage capacity compared to storing the same data on non-clustered indexers.

Exactly how much less storage your non-searchable copies require takes some additional investigation on your part. The index files excluded by non-searchable copies can vary greatly in size, depending on factors described in "Estimating your storage requirements".
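To see why the total falls between the search factor and the replication factor, the relationship can be sketched as follows. This is a rough model, not a Splunk tool: the function storage_multiple is hypothetical, and the rawdata share of 0.3 is an assumed value that you would replace with a figure measured on your own data.

```python
def storage_multiple(replication_factor: int, search_factor: int,
                     rawdata_share: float) -> float:
    """Cluster storage as a multiple of non-clustered storage for the same data.

    rawdata_share is the fraction of one searchable copy taken up by rawdata;
    the remainder is index files. Every copy stores rawdata, but only
    searchable copies also store index files.
    """
    if not 0 < rawdata_share < 1:
        raise ValueError("rawdata_share must be between 0 and 1")
    if not 1 <= search_factor <= replication_factor:
        raise ValueError("search factor must be between 1 and the replication factor")

    index_share = 1 - rawdata_share
    return replication_factor * rawdata_share + search_factor * index_share


# Assumed split: rawdata is 30% of a searchable copy, index files are 70%.
multiple = storage_multiple(replication_factor=3, search_factor=2, rawdata_share=0.3)
print(f"{multiple:.1f}x the non-clustered storage")
```

The larger the index files are relative to the rawdata, the closer the multiple moves toward the search factor.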

Important: A master is not aware of the amount of storage on individual peer nodes, and therefore it does not take available storage into account when it makes decisions about which peer node should receive a particular set of replicated data. It also makes arbitrary decisions about which peer should make some set of replicated data searchable (in cases where the search factor is 2 or greater). Therefore, you must ensure that each peer node has sufficient storage not only for the data originating on that peer, but also for any replicated copies of data that might get streamed to it from other peers. You should continue to monitor storage usage throughout the life of the cluster.

Storage requirement examples

As a ballpark figure, incoming syslog data, once it has been compressed and indexed in Splunk, occupies approximately 50% of its original size:

  • 15% for the rawdata file.
  • 35% for associated index files.

In practice, this estimate can vary substantially, based on the factors described in "Estimating your storage requirements" in the Installation Manual.

Assume you have 100GB of syslog data coming into Splunk. In the case of a non-clustered indexer, that data would occupy approximately 50GB (50% of 100GB) of storage on Splunk. However, in the case of clusters, storage calculations must factor in the replication factor and search factor to arrive at total storage requirements across all the cluster peers. (As mentioned earlier, you cannot easily predict exactly how much storage will be required on any specific peer.)

Here are two examples of estimating cluster storage requirements, both assuming 100GB of incoming syslog data, resulting in 15GB for each set of rawdata and 35GB for each set of index files:

  • 3 peer nodes, with replication factor = 3; search factor = 2: This requires a total of 115GB across all peer nodes (averaging 38GB/peer), calculated as follows:
    • Total rawdata = (15GB * 3) = 45GB.
    • Total index files = (35GB * 2) = 70GB.
  • 5 peer nodes, with replication factor = 5; search factor = 3: This requires a total of 180GB across all peer nodes (averaging 36GB/peer), calculated as follows:
    • Total rawdata = (15GB * 5) = 75GB.
    • Total index files = (35GB * 3) = 105GB.
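The two calculations above can be reproduced with a short Python sketch. The function cluster_storage_gb is a hypothetical helper, and the default ratios of 15% and 35% are the ballpark syslog figures from this topic, not values that apply to all data.

```python
def cluster_storage_gb(incoming_gb: float, replication_factor: int,
                       search_factor: int, peers: int,
                       rawdata_ratio: float = 0.15,
                       index_ratio: float = 0.35) -> tuple[float, float]:
    """Estimate total and average per-peer storage for a cluster, in GB.

    Every replicated copy stores the rawdata; only searchable copies
    also store the index files.
    """
    rawdata_total = incoming_gb * rawdata_ratio * replication_factor
    index_total = incoming_gb * index_ratio * search_factor
    total = rawdata_total + index_total
    return total, total / peers


# The two examples from this topic, each with 100GB of incoming syslog data.
for rf, sf, peers in [(3, 2, 3), (5, 3, 5)]:
    total, per_peer = cluster_storage_gb(100, rf, sf, peers)
    print(f"RF={rf}, SF={sf}: {total:.0f}GB total, about {per_peer:.0f}GB/peer")
```

The per-peer figure is only an average. Because the master does not take available storage into account when it assigns replicated data, individual peers can end up holding more than this amount.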

Storage hardware

You designate the locations of hot/warm and cold buckets with the homePath and coldPath attributes, respectively, in indexes.conf. See "Configure index storage" for more information. Clusters have very different requirements for the type of storage used at these locations, compared to non-clustered indexers.

On a non-clustered indexer, by specifying separate partitions for hot/warm buckets and cold buckets, you can designate different types of storage for each. This is useful because cold buckets are typically accessed less frequently than hot/warm buckets and therefore can be located on slower disk arrays. Also, Splunk doesn't usually need to perform index processing on cold buckets. See "Use multiple partitions for index data" for details on this.

On a cluster, however, this approach is not recommended. The storage used for the coldPath location should have the same performance characteristics as that used for homePath storage. This is because all replicated copies of buckets reside in the peers' coldPath directories. It doesn't matter whether they're hot, warm, or cold. If you use slower storage for the coldPath location, it will slow the overall performance of your cluster.

Clusters require strongly performing storage for the coldPath location in order to handle the needs of cluster operations. For example, some of the buckets in the coldPath location will be replicated hot bucket copies still being written to. Other buckets will be replicated warm copies, and the search head might be accessing them frequently. In addition, depending on how the cluster is configured and what occurs subsequently (in terms of peers going offline, etc.), the peer might need to convert bucket copies from non-searchable to searchable, entailing a considerable amount of processing on the coldPath data.

You must also make sure to set aside enough coldPath storage on each peer to handle all replicated copies of hot and warm buckets, in addition to the usual cold bucket storage needs.

Note: It's only the hot/warm replicated bucket copies that reside in the coldPath location. The hot/warm original bucket copies reside in their peers' homePath location, as usual. In addition, naturally, all cold bucket copies reside in coldPath.

Licensing information

As with any Splunk deployment, your licensing requirements are driven by the volume of data your indexers process. Contact your Splunk sales representative to purchase additional license volume. Refer to "How licensing works" in the Admin Manual for more information about Splunk licensing.

There are just a few license issues that are specific to index replication:

  • All cluster members, including masters, peers, and search heads, need to be in an Enterprise license pool, even if they're not expected to index any data.
  • Cluster members must share the same licensing configuration.
  • Only incoming data counts against the license; replicated data does not.
  • You cannot use index replication with a free license.

Deployment server and clusters

Do not use deployment server with cluster peers.

The deployment server is not supported as a means to distribute configurations or apps to cluster peers. To distribute configurations across the set of cluster peers, use the configuration bundle method outlined in the topic "Update common peer configurations".


This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1

