Splunk® Enterprise

Managing Indexers and Clusters of Indexers

Download manual as PDF

Splunk Enterprise version 5.0 reached its End of Life on December 1, 2017. Please see the migration information.
This documentation does not apply to the most recent version of Splunk. Click here for the latest version.
Download topic as PDF

Replication factor

As part of setting up the cluster, you specify the number of copies of data that you want the cluster to maintain. The data is divided into buckets, and the cluster maintains multiple copies of each bucket. Each bucket copy is stored on a separate peer node. The number of data/bucket copies is called the cluster's replication factor.

The cluster can tolerate a failure of (replication factor - 1) peer nodes. So, for example, to ensure that your system can tolerate a failure of two peers, you must configure a replication factor of 3, which means that the cluster stores three identical copies of each bucket on separate nodes. With a replication factor of 3, you can be certain that all your data will be available if no more than two peer nodes in the cluster fail. With two nodes down, you still have one complete copy of your data available on the remaining peer(s).

By increasing the replication factor, you can tolerate more peer node failures. With a replication factor of 2, you can tolerate just one node failure; with a replication factor of 3, you can tolerate two concurrent failures, and so on.

The trade-off is that you need to store and process all those copies of data. Although the replicating activity doesn't consume much processing power, still, as the replication factor increases, you need to run more indexers and provision more storage for the indexed data. On the other hand, since data replication itself requires little processing power, you can take advantage of the multiple indexers in a cluster to ingest and index more data. Each indexer in the cluster can function as both originating indexer ("source peer") and replication target ("target peer"). It can index incoming data and also store copies of data from other indexers in the cluster.

In the following diagram, one peer is receiving data from a forwarder, which it processes and then streams to two other peers.The cluster will contain three complete copies of the peer's data, one copy on each peer.

Replication factor 3.png

Note: This diagram represents a highly simplified version of peer replication, where all data is entering the system through a single peer. There are a few issues that add complexity to a real-life scenario:

  • In most clusters, each of the peer nodes would be functioning as both source and target peer, receiving external data from a forwarder, as well as replicated data from other peers.
  • To accommodate horizontal scaling, a cluster with a replication factor of 3 could consist of many more peers than three. At any given time, each source peer would be streaming copies of its data to two target peers, but each time it started a new hot bucket, its set of target peers could potentially change.

Later topics in this chapter describe in detail how clusters process data.

PREVIOUS
Basic concepts for advanced users
  NEXT
Search factor

This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18


Was this documentation topic helpful?

Enter your email address, and someone from the documentation team will respond to you:

Please provide your comments here. Ask a question or make a suggestion.

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters