About clusters and index replication
Clusters are groups of Splunk indexers configured to replicate each other's data, so that the system keeps multiple copies of all data. This process is known as index replication. By maintaining multiple, identical copies of Splunk data, clusters prevent data loss while promoting data availability for searching.
Splunk clusters feature automatic failover from one indexer to the next. This means that, if one or more indexers fail, incoming data continues to get indexed and indexed data continues to be searchable.
The key benefits of index replication are:
- Data availability. An indexer is always available to handle incoming data, and the indexed data is available for searching.
- Data fidelity. You never lose any data. You have assurance that the data sent to Splunk is exactly the same data that gets stored in Splunk and that a search can later access.
- Data recovery. Your system can tolerate downed indexers without losing data or losing access to data.
The key trade-off in index replication is between the benefits of data availability/recovery and the costs of storage (and, to a minor degree, increased processing load). The more copies of data the cluster maintains, the more indexer failures it can recover from. But maintaining more copies of data means higher storage requirements. To manage this trade-off to match the needs of your enterprise, you can configure the number of copies of data you want the cluster to maintain. This is known as the replication factor.
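To make the trade-off concrete, here is a rough sketch in Python. It assumes that every copy takes the same space as the original and that each copy lives on a different peer, so a cluster with N copies can lose N - 1 peers and still hold one copy. The function name and the 100 GB/day figure are illustrative only, and real storage use depends on more than the replication factor.

```python
def replication_tradeoff(daily_gb: float, replication_factor: int) -> dict:
    """Estimate storage cost and failure tolerance for a replication factor.

    Simplification: each copy is assumed to need as much space as the
    original data, and each copy is assumed to live on a different peer.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return {
        "copies": replication_factor,
        "stored_gb_per_day": daily_gb * replication_factor,
        # With N copies on N different peers, N - 1 peers can fail
        # and one copy still survives.
        "peer_failures_tolerated": replication_factor - 1,
    }


if __name__ == "__main__":
    # Compare a few replication factors for a hypothetical 100 GB/day load.
    for rf in (1, 2, 3):
        print(replication_tradeoff(100, rf))
```

Under these assumptions, going from a replication factor of 2 to 3 buys tolerance for one more failed peer at the cost of another full copy of the data.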
You can also use clusters to scale indexing capacity, even in situations where index replication is not a requirement. See "Use clusters to scale indexing" for details.
Parts of a cluster
A cluster is a group of Splunk nodes that, working in concert, provide a redundant indexing and searching capability. There are three types of nodes in a cluster:
- A single master node to manage the cluster.
- Several peer nodes to index and maintain multiple copies of the data and to search the data later.
- One or more search heads to coordinate searches across the set of peer nodes.
The master node manages the cluster. It coordinates the replicating activities of the peer nodes and tells the search head where to find data. It also helps manage the configuration of peer nodes and orchestrates remedial activities if a peer goes down.
The peer nodes receive and index incoming data, just like non-clustered, stand-alone indexers. Unlike stand-alone indexers, however, peer nodes also replicate data from other nodes in the cluster. A peer node can index its own incoming data while simultaneously storing copies of data from other nodes. You must have at least as many peer nodes as the replication factor. That is, to support a replication factor of 3, you need three peer nodes.
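The following toy model illustrates the two points above: each peer keeps its own incoming data plus copies from other peers, and the cluster needs at least as many peers as the replication factor. It is only a sketch. The peer names are hypothetical, and the choice of target peers (the next peers in list order) is an assumption made for the example, not a description of how the cluster actually selects them.

```python
from collections import defaultdict


def replicate(peers, replication_factor, events):
    """Distribute each event to its origin peer plus (replication_factor - 1) others.

    peers:  ordered list of peer names
    events: list of (origin_peer, data) pairs
    Returns a dict mapping peer name -> list of (data, role) tuples.
    """
    if len(peers) < replication_factor:
        raise ValueError(
            f"need at least {replication_factor} peers, got {len(peers)}"
        )
    stored = defaultdict(list)
    for origin, data in events:
        # The origin peer indexes its own incoming data.
        stored[origin].append((data, "original"))
        start = peers.index(origin)
        # Assumption for this sketch: copies go to the next peers in list order.
        for step in range(1, replication_factor):
            target = peers[(start + step) % len(peers)]
            stored[target].append((data, "replica"))
    return dict(stored)


if __name__ == "__main__":
    peers = ["peer1", "peer2", "peer3"]
    events = [("peer1", "event-a"), ("peer2", "event-b")]
    for peer, held in sorted(replicate(peers, 3, events).items()):
        print(peer, held)
```

With three peers and a replication factor of 3, every peer ends up holding every event, either as the original or as a copy. Passing only two peers with the same replication factor raises an error, which mirrors the requirement stated above.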
The search head runs searches across the set of peer nodes. In a cluster, all searches must go through a search head. You enable the search head at the same time you enable the rest of the cluster.
For most purposes, it's recommended that you use forwarders to get data into the cluster.
Here's a diagram of a basic cluster, containing three peer nodes and supporting a replication factor of 3:
This shows a simple deployment, similar to a small-scale non-clustered deployment, with some forwarders sending load-balanced data to a group of indexers (peer nodes), and the indexers sending search results to a search head. There are two additions that you don't find in a non-clustered deployment:
- The indexers are streaming copies of their data to other indexers.
- The master node, while it doesn't participate in any data streaming, coordinates a range of activities involving the peer nodes and the search head.
How to set up a cluster
Clusters are easy to set up. The process is similar to what you do to set up a group of stand-alone indexers. Basically, you install the indexers and perform a bit of configuration.
The main difference is that you also need to identify and enable the cluster nodes. You designate one indexer as the master node and the other indexers as peer nodes. You need at least as many peer nodes as your replication factor. To increase indexing capacity for horizontal scaling, you just add more peer nodes.
You also need to set up a search head to manage searches across the peers and consolidate the results for the user.
You enable nodes and the search head in the same way that you configure any settings in Splunk: through Splunk Manager or the CLI, or directly, by editing configuration files.
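As a rough illustration of the configuration-file route, the Python sketch below prints the kind of stanzas each node type might carry. The setting names used here (a `[clustering]` stanza with `mode`, `replication_factor`, and `master_uri`, plus a `[replication_port://<port>]` stanza on peers) are not taken from this topic and should be treated as assumptions to confirm before use. The host name `master.example.com` and the ports are placeholders, and the helper `cluster_conf` is hypothetical.

```python
import configparser
import io


def cluster_conf(mode, master_uri=None, replication_factor=None,
                 replication_port=None):
    """Render illustrative clustering stanzas for one node as text.

    mode: "master", "slave" (peer node), or "searchhead".
    Setting names are assumptions; verify them before relying on them.
    """
    conf = configparser.ConfigParser()
    conf["clustering"] = {"mode": mode}
    if mode == "master":
        if replication_factor is None:
            raise ValueError("the master needs a replication factor")
        conf["clustering"]["replication_factor"] = str(replication_factor)
    else:
        if master_uri is None:
            raise ValueError("peers and search heads need the master's URI")
        conf["clustering"]["master_uri"] = master_uri
    if mode == "slave":
        if replication_port is None:
            raise ValueError("a peer needs a replication port")
        # Port on which the peer receives copies streamed by other peers.
        conf[f"replication_port://{replication_port}"] = {}
    out = io.StringIO()
    conf.write(out)
    return out.getvalue()


if __name__ == "__main__":
    uri = "https://master.example.com:8089"  # placeholder address
    print(cluster_conf("master", replication_factor=3))
    print(cluster_conf("slave", master_uri=uri, replication_port=9887))
    print(cluster_conf("searchhead", master_uri=uri))
```

The point of the sketch is the shape of the setup: one node carries the replication factor, and every other node only needs to know its role and where the master is.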
See the chapter in this manual called "Deploy clusters" for detailed information.
How to search a cluster
You search a cluster the same way you would search any non-clustered group of indexers. You submit your searches through a search head.
What happens behind the scenes is a bit different, though. Once you've submitted your search, the search head consults the master node to determine which peer nodes have the data that's needed to process the search. The search head then distributes the search tasks directly to those nodes. The nodes do their part and send their results back to the search head, which then consolidates the results and sends the results back to Splunk Web. From the user's standpoint, it's no different than searching any stand-alone indexer or non-clustered group of indexers.
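The flow described above can be sketched as a small Python model. All class and method names are hypothetical, and the model is deliberately simplified: each peer holds distinct events (replicated copies are ignored), and "searching" is just a substring match.

```python
class Master:
    """Knows which peers hold the data for each index."""

    def __init__(self, locations):
        self.locations = locations  # index name -> list of peer names

    def peers_for(self, index):
        return self.locations.get(index, [])


class Peer:
    """Holds events per index and answers searches over them."""

    def __init__(self, name, events):
        self.name = name
        self.events = events  # index name -> list of event strings

    def search(self, index, term):
        return [e for e in self.events.get(index, []) if term in e]


class SearchHead:
    """Asks the master where the data is, then queries those peers directly."""

    def __init__(self, master, peers):
        self.master = master
        self.peers = {p.name: p for p in peers}

    def search(self, index, term):
        results = []
        # Step 1: consult the master. Step 2: send the search to each peer.
        for name in self.master.peers_for(index):
            results.extend(self.peers[name].search(index, term))
        # Step 3: consolidate the partial results into one answer.
        return sorted(results)


if __name__ == "__main__":
    peers = [
        Peer("peer1", {"main": ["error disk full", "login ok"]}),
        Peer("peer2", {"main": ["error timeout"]}),
        Peer("peer3", {"web": ["error 500"]}),
    ]
    master = Master({"main": ["peer1", "peer2"], "web": ["peer3"]})
    head = SearchHead(master, peers)
    print(head.search("main", "error"))
```

In this model, the user only ever talks to the search head. The master is consulted for locations but never touches the search results themselves.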
Before you go any further
Clusters are easy to set up and use, but you need to have a good grounding in the basics of Splunk indexing and deployment first. Before you continue, make sure you understand the following:
- How to configure indexers. In particular, see "How Splunk stores indexes", along with the other topics in this manual that describe managing indexes.
- What a search head does. For an introduction to distributed search, see "About distributed search" in the Distributed Deployment manual.
- How to use a forwarder to get data into an indexer. See "Use forwarders" in the Getting Data In manual.
Migrating from a non-clustered Splunk deployment?
Clustered indexers have several requirements that differ from those of non-clustered Splunk indexers. It's important that you be aware of these differences before you migrate your indexers. For details, see "Key differences between clustered and non-clustered Splunk deployments". Once you've read that material, go to "Migrate non-clustered indexers to a clustered environment" for details on the actual migration process.
This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18