This topic describes the main steps in deploying clusters. Subsequent topics describe these steps in detail.
Before you attempt to deploy a cluster, you must be familiar with several areas of Splunk administration:
- How to configure indexers. In particular, see "How Splunk stores indexes", along with the other topics in this manual that describe managing indexes.
- What a search head does. For an introduction to distributed search, see "About distributed search" in the Distributed Deployment Manual. Note, however, that cluster search heads are configured a bit differently from how they're described in that topic. Those differences are outlined later in this manual, in the topic "Configure the search head".
- How to use a forwarder to get data into an indexer. See "Use forwarders" in the Getting Data In Manual.
Migrating from a non-clustered Splunk deployment?
Clustered indexers have several requirements that differ from those of non-clustered Splunk indexers. It's important that you be aware of these differences before you migrate your indexers. For details, see "Key differences between clustered and non-clustered Splunk deployments". Once you've read that material, go to "Migrate non-clustered indexers to a clustered environment" for details on the actual migration process.
Important: Before migrating an indexer from non-clustered to clustered, be certain of your needs. The process goes in one direction only. There is no supported procedure for converting an indexer from clustered to non-clustered.
Deploy a cluster
When you deploy a cluster, you enable and configure the cluster master and the peer nodes that perform the indexing. You also enable a search head to search data in the cluster. In addition, you usually set up forwarders to send data to nodes in the cluster.
[Diagram: a small cluster, showing the various components that you deploy.]
These are the key steps in deploying clusters:
1. Identify your requirements:
a. Understand your data availability and failover needs. See "About clusters".
b. Decide what replication factor you want to implement. The replication factor is the number of copies of raw data that the cluster maintains. Your optimal replication factor depends on factors specific to your environment, but essentially involves a trade-off between failure tolerance and storage capacity. A higher replication factor means that more copies of the data will reside on more peer nodes, so your cluster can tolerate more node failures without loss of data availability. But it also means that you'll need more nodes and more storage to handle the additional data. For more information, see "Replication factor".
Warning: Make sure you start by choosing the right replication factor for your needs. It is inadvisable to increase the replication factor once the cluster contains a significant amount of data. The cluster would then need to perform a large amount of bucket copying to match the increased replication factor, significantly slowing the overall performance of your cluster while the copying is occurring.
c. Decide what search factor you want to implement. The search factor tells the cluster how many searchable copies of indexed data to maintain. This helps determine the speed with which a cluster can recover from a downed node. A higher search factor allows the cluster to recover more quickly, but it also requires more storage space and processing power. For most environments, the default search factor value of 2 represents the right trade-off, allowing searches to usually continue with little interruption when a node goes down. For more information, see "Search factor".
Warning: Make sure you start by choosing the right search factor for your needs. It is inadvisable to increase the search factor once the cluster contains a significant amount of data. The cluster would then need to perform a large amount of processing (transforming non-searchable bucket copies into searchable copies) to match the increased search factor, severely degrading the overall performance of your cluster while the processing is occurring.
d. Identify other factors that also determine the size of your cluster; for example, the quantity of data you'll be indexing. It usually makes sense to keep all your indexers in a single cluster, so for horizontal scaling, you'll need to add peer nodes beyond those required by the replication factor. Similarly, depending on the anticipated search load, you might need to configure more than one search head.
e. Study the topic "System requirements and other deployment considerations" for information on other key issues.
2. Install the Splunk cluster instances on your network. At a minimum, you'll need (replication factor + 2) Splunk instances:
- You need at least the replication factor number of peer nodes, but you might want to add more peers to boost indexing capacity, as mentioned in step 1d.
- You also need two more Splunk instances, one for the master node and the other for the search head.
For information on how to install Splunk, read the Installation Manual.
3. Enable clustering on the Splunk instances:
a. Enable the master node. See "Enable the master node".
Important: When the master starts up for the first time, it will block indexing on the peers until you have enabled and restarted the replication factor number of peers.
b. Enable the peer nodes. See "Enable the peer nodes".
c. Enable the cluster search head. It's easier to set up a search head for a cluster than for a non-clustered group of indexers. See "Enable the search head".
4. Complete the peer node configuration:
a. Configure the peers' index settings. This step is necessary only if you need to augment the set of default indexes or apps. In general, all the peers must use the same set of indexes, so if you add indexes (or apps that define indexes) to one peer, you must add them to all peers, using a special cluster-specific distribution method. There might also be other configurations that you need to coordinate across the set of peers. See "Prepare the peers for index replication" for information on how to do this.
b. Configure the peers' data inputs. For most purposes, it's best to use forwarders to send data to the peers, as discussed in "Use forwarders to get your data". As described in that topic, you will usually want to use load-balancing forwarders with indexer acknowledgment enabled.
Once you enable the nodes and set up data inputs for the peers, the cluster automatically begins indexing and replicating the data.
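The sizing rule in step 2 and the startup behavior noted in step 3 can be sketched in Python. This is only an illustration, not a Splunk tool or API: the `ClusterPlan` class, its field names, and the `indexing_blocked` function are hypothetical. The sketch also assumes two points that this topic does not state outright: that a cluster can lose up to (replication factor - 1) peers without losing data availability, and that the search factor cannot exceed the replication factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterPlan:
    """Hypothetical planning record; not part of any Splunk API."""
    replication_factor: int   # copies of raw data the cluster maintains
    search_factor: int = 2    # searchable copies; 2 is the default cited in step 1c
    extra_peers: int = 0      # peers beyond the replication factor, for indexing capacity
    search_heads: int = 1     # more than one may be needed under heavy search load

    def __post_init__(self):
        if self.replication_factor < 1:
            raise ValueError("replication factor must be at least 1")
        # Assumption: the search factor may not exceed the replication factor.
        if not 1 <= self.search_factor <= self.replication_factor:
            raise ValueError("search factor must be between 1 and the replication factor")
        if self.extra_peers < 0 or self.search_heads < 1:
            raise ValueError("extra_peers must be >= 0 and search_heads >= 1")

    @property
    def peer_nodes(self) -> int:
        # At least the replication factor number of peers (step 2).
        return self.replication_factor + self.extra_peers

    @property
    def total_instances(self) -> int:
        # Peers, plus one master node, plus the search head(s).
        return self.peer_nodes + 1 + self.search_heads

    @property
    def tolerated_peer_failures(self) -> int:
        # Assumption: with N copies on N distinct peers, N - 1 peers can fail.
        return self.replication_factor - 1


def indexing_blocked(plan: ClusterPlan, peers_enabled: int) -> bool:
    """Model of the first-startup note in step 3a: the master blocks indexing
    until the replication factor number of peers are enabled and restarted."""
    return peers_enabled < plan.replication_factor


if __name__ == "__main__":
    plan = ClusterPlan(replication_factor=3)
    print("peer nodes:", plan.peer_nodes)
    print("total instances:", plan.total_instances)
    print("tolerated peer failures:", plan.tolerated_peer_failures)
    print("indexing blocked with 2 peers:", indexing_blocked(plan, 2))
```

For a replication factor of 3 with no extra peers, the sketch gives 3 peers plus a master and a search head, for 5 instances in all, which matches the (replication factor + 2) minimum in step 2.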
Other deployment scenarios
This chapter also provides guidance on a few other cluster deployment scenarios:
- Add indexers with existing data to a cluster. See "Migrate non-clustered indexers to a clustered environment".
- Employ clusters purely for index scalability, where index replication is not a requirement. See "Use clusters to scale indexing".
This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18