Splunk® Enterprise

Managing Indexers and Clusters of Indexers

This documentation does not apply to the most recent version of Splunk.

About clusters and index replication

Clusters are groups of Splunk indexers configured to replicate each other's data, so that the system keeps multiple copies of all data. This process is known as index replication. By maintaining multiple, identical copies of Splunk data, clusters prevent data loss while promoting data availability for searching.

Splunk clusters feature automatic failover from one indexer to the next. This means that, if one or more indexers fail, incoming data continues to get indexed and indexed data continues to be searchable.
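
As a rough illustration of failover on the ingest side, the following Python sketch delivers each event to the first indexer that is still up. It is a toy model under stated assumptions, not Splunk's implementation: the Indexer class, the send function, and the peer names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Indexer:
    """Hypothetical stand-in for an indexer; not a Splunk API."""
    name: str
    up: bool = True
    events: list = field(default_factory=list)


def send(event, indexers):
    """Deliver the event to the first indexer that is up and return its name."""
    for indexer in indexers:
        if indexer.up:
            indexer.events.append(event)
            return indexer.name
    raise RuntimeError("no indexer available")


indexers = [Indexer("peer1"), Indexer("peer2"), Indexer("peer3")]
first = send("event A", indexers)    # peer1 is up, so it takes the event
indexers[0].up = False               # simulate a failed indexer
second = send("event B", indexers)   # delivery fails over to peer2
```

In this sketch, incoming data keeps getting indexed as long as at least one indexer is up.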

The key benefits of index replication are:

  • Data availability. An indexer is always available to handle incoming data, and the indexed data is available for searching.
  • Data fidelity. You never lose any data. You have assurance that the data sent to Splunk is exactly the same data that gets stored in Splunk and that a search can later access.
  • Data recovery. Your system can tolerate downed indexers without losing data or losing access to data.

The key trade-off in index replication is between the benefits of data availability and recovery and the cost of storage (and, to a minor degree, increased processing load). The more copies of data the cluster maintains, the more failures it can recover from, but more copies also mean higher storage requirements. To manage this trade-off to match the needs of your enterprise, you can configure the number of copies of data you want the cluster to maintain. This number is known as the replication factor.
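
To make the trade-off concrete, here is a minimal Python sketch that estimates both sides of it for a given replication factor. It rests on two simplifying assumptions that this topic does not state: every copy is stored at full size, and a cluster that holds N copies can lose N - 1 peers without losing data. The function name and the 100 GB figure are illustrative only.

```python
def replication_tradeoff(replication_factor, daily_indexed_gb):
    """Return (tolerated_peer_failures, approx_daily_storage_gb).

    Assumptions (not from the source): every copy is stored at full size,
    and a cluster holding N copies survives the loss of N - 1 peers.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    tolerated_failures = replication_factor - 1
    storage_gb = replication_factor * daily_indexed_gb
    return tolerated_failures, storage_gb


for rf in (1, 2, 3):
    failures, storage = replication_tradeoff(rf, daily_indexed_gb=100)
    print(f"replication factor {rf}: tolerates {failures} failed peer(s), "
          f"stores about {storage} GB per day")
```

Actual storage use depends on how each copy is stored, so treat the storage figure as an upper-bound estimate rather than a sizing rule.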

You can also use clusters to scale indexing capacity, even in situations where index replication is not a requirement. See "Use clusters to scale indexing" for details.

Parts of a cluster

A cluster is a group of Splunk nodes that, working in concert, provide a redundant indexing and searching capability. There are three types of nodes in a cluster:

  • A single master node to manage the cluster.
  • Several peer nodes to index and maintain multiple copies of the data and to search the data later.
  • One or more search heads to coordinate searches across the set of peer nodes.

The master node manages the cluster. It coordinates the replicating activities of the peer nodes and tells the search head where to find data. It also helps manage the configuration of peer nodes and orchestrates remedial activities if a peer goes down.

The peer nodes receive and index incoming data, just like non-clustered, stand-alone indexers. Unlike stand-alone indexers, however, peer nodes also replicate data from other nodes in the cluster. A peer node can index its own incoming data while simultaneously storing copies of data from other nodes. You must have at least as many peer nodes as the replication factor. For example, to support a replication factor of 3, you need at least three peer nodes.
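
The peer-count requirement can be sketched in Python as follows. The function checks that there are enough peers for the replication factor, then picks which peers hold copies of the data that one peer indexes. How a real cluster chooses replication targets is not described in this topic; the round-robin choice below, along with the function and peer names, is an assumption made for illustration.

```python
def replication_targets(peers, origin, replication_factor):
    """Pick which peers hold copies of data that `origin` indexes.

    The origin keeps one copy; the other replication_factor - 1 copies go to
    the peers that follow it in the list (a round-robin choice assumed here
    for illustration only).
    """
    if len(peers) < replication_factor:
        raise ValueError(
            f"need at least {replication_factor} peers, have {len(peers)}"
        )
    start = peers.index(origin)
    others = [peers[(start + i) % len(peers)] for i in range(1, replication_factor)]
    return [origin] + others


peers = ["peer1", "peer2", "peer3", "peer4"]
holders = replication_targets(peers, "peer3", replication_factor=3)
# holders == ["peer3", "peer4", "peer1"]
```

With only two peers and a replication factor of 3, the function raises an error, because there would be nowhere to put the third copy.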

The search head runs searches across the set of peer nodes. You must use a search head to manage searches across the peer nodes. You enable the search head at the same time you enable the rest of the cluster.

For most purposes, it's recommended that you use forwarders to get data into the cluster.

Here's a diagram of a basic cluster, containing three peer nodes and supporting a replication factor of 3:

[Figure: simplified basic cluster]

This shows a simple deployment, similar to a small-scale non-clustered deployment, with some forwarders sending load-balanced data to a group of indexers (peer nodes), and the indexers sending search results to a search head. There are two additions that you don't find in a non-clustered deployment:

  • The indexers are streaming copies of their data to other indexers.
  • The master node, while it doesn't participate in any data streaming, coordinates a range of activities involving the peer nodes and the search head.

How to set up a cluster

Clusters are easy to set up. The process is similar to what you do to set up a group of stand-alone indexers. Basically, you install the indexers and perform a bit of configuration.

The main difference is that you also need to identify and enable the cluster nodes. You designate one indexer as the master node and the other indexers as peer nodes. You need at least as many peer nodes as your replication factor. To scale indexing capacity horizontally, you just add more peer nodes.

You also need to set up a search head to manage searches across the peers and consolidate the results for the user.

You enable nodes and the search head in the same way that you configure any settings in Splunk: through Splunk Manager or the CLI, or directly, by editing configuration files.
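
For the configuration-file route, the following Python sketch renders the kind of server.conf stanzas that distinguish the three node types. The stanza and setting names ([clustering], mode, replication_factor, master_uri, and the replication_port stanza) are recalled from general knowledge of server.conf rather than taken from this topic, so verify them against the "Deploy clusters" chapter for your version. The host name and port numbers are hypothetical, and render_server_conf is a helper written for this example.

```python
import configparser
import io


def render_server_conf(stanzas):
    """Render a dict of {stanza: {setting: value}} as .conf-style text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep setting names exactly as written
    for stanza, settings in stanzas.items():
        parser[stanza] = settings
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical host name and port numbers; adjust to your environment.
MASTER_URI = "https://master.example.com:8089"

master_conf = render_server_conf(
    {"clustering": {"mode": "master", "replication_factor": "3"}}
)
peer_conf = render_server_conf(
    {
        "replication_port://9887": {},
        "clustering": {"mode": "slave", "master_uri": MASTER_URI},
    }
)
search_head_conf = render_server_conf(
    {"clustering": {"mode": "searchhead", "master_uri": MASTER_URI}}
)

print(master_conf)
```

Each node gets its own rendered text; the only differences between node types in this sketch are the mode value and whether the node points at the master.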

See the chapter in this manual called "Deploy clusters" for detailed information.

How to search a cluster

You search a cluster the same way you would search any non-clustered group of indexers. You submit your searches through a search head.

What happens behind the scenes is a bit different, though. Once you've submitted your search, the search head consults the master node to determine which peer nodes have the data that's needed to process the search. The search head then distributes the search tasks directly to those nodes. The nodes do their part and send their results back to the search head, which consolidates them and returns them to Splunk Web. From the user's standpoint, it's no different from searching any stand-alone indexer or non-clustered group of indexers.
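
The search flow just described can be sketched in Python as a toy model. The Master and Peer classes, the run_search function, and the sample events are all hypothetical; they only mirror the three steps (ask the master, fan out to the peers, merge the results) and ignore details such as how a real cluster handles multiple copies of the same data.

```python
class Master:
    """Hypothetical master: knows which peers hold data for each index."""

    def __init__(self, locations):
        self.locations = locations  # e.g. {"web": ["peer1", "peer2"]}

    def peers_for(self, index):
        return self.locations.get(index, [])


class Peer:
    """Hypothetical peer: holds events per index and filters them by keyword."""

    def __init__(self, events):
        self.events = events  # e.g. {"web": ["GET /", "error 500"]}

    def search(self, index, keyword):
        return [e for e in self.events.get(index, []) if keyword in e]


def run_search(master, peers, index, keyword):
    """Search-head role: ask the master, fan out to peers, merge the results."""
    results = []
    for name in master.peers_for(index):
        results.extend(peers[name].search(index, keyword))
    return results


peers = {
    "peer1": Peer({"web": ["GET /", "error 500"]}),
    "peer2": Peer({"web": ["error 404"]}),
    "peer3": Peer({"mail": ["error: bounce"]}),
}
master = Master({"web": ["peer1", "peer2"], "mail": ["peer3"]})
hits = run_search(master, peers, "web", "error")
# hits == ["error 500", "error 404"]
```

Note that peer3 is never contacted for this search, because the master reports that it holds no data for the "web" index.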

Before you go any further

Clusters are easy to set up and use, but you need to have a good grounding in the basics of Splunk indexing and deployment first. Before you continue, make sure you know this stuff:

  • How to configure indexers. In particular, see "How Splunk stores indexes", along with the other topics in this manual that describe managing indexes.
  • What a search head does. For an introduction to distributed search, see "About distributed search" in the Distributed Deployment manual.
  • How to use a forwarder to get data into an indexer. See "Use forwarders" in the Getting Data In manual.

Migrating from a non-clustered Splunk deployment?

Clustered indexers have several requirements that differ from those of non-clustered Splunk indexers. It's important that you be aware of these differences before you migrate your indexers. For details, see "Key differences between clustered and non-clustered Splunk deployments". Once you've read that material, go to "Migrate non-clustered indexers to a clustered environment" for details on the actual migration process.

This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17


Comments

Sorry for the duplicate and the incomplete last line. Here is the last item with the missing part.
  • Only incoming data counts against the license; replicated data does not.
  • You cannot use index replication with a free license.

Fg4
October 30, 2012

From the PDF:
  • All cluster members, including masters, peers, and search heads, need to be in an Enterprise license pool, even if they're not expected to index any data.
  • Cluster members must share the same licensing configuration.
  • Only incoming data counts against the license; replica

Fg4
October 30, 2012

Dustinudy - Please look here: http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Systemrequirements#Licensing_information

Sgoodman, Splunker
October 30, 2012

How does this apply to licensing?

Dustinudy
October 30, 2012
