High availability reference architecture
Splunk is flexible and capable enough to handle machine data in any type of computing environment, including those with the stringent requirements of medium and large enterprises. In some environments, maintaining data integrity and high availability is critically important.
How you define high availability, and the approach you take to implement it, will vary greatly according to the needs of your particular business and the state of your existing system. This topic will help you make the right decisions about how best to deploy Splunk to promote a highly available, highly reliable system. It does not attempt to dictate any single approach to high availability. Rather, it offers a starting point for planning an approach that suits your enterprise.
As part of planning a highly available Splunk deployment, you must also take into account all aspects of your existing system: not only its components and topology, but also its overall reliability and availability. The specifics of your current system will determine how you integrate Splunk into it.
Before reading this topic, you should already be familiar with Splunk deployments and components, as described in "Distributed Splunk overview".
Note: This topic is intended for planning purposes only. It is not meant to serve as a detailed implementation guide. If you want to implement a high availability Splunk deployment, contact Splunk Professional Services for guidance.
The elements of a high availability architecture
Splunk both collects data and queries it. To implement end-to-end Splunk availability, you need to consider both functions.
If you are using Splunk forwarders in a load-balanced configuration, in which each forwarder distributes its data across a group of Splunk indexers, then you already have high availability on the data collection side. If one indexer goes down, the forwarders automatically start sending their data to the other indexers in the load-balanced group.
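As a minimal sketch of this configuration, a forwarder's outputs.conf might define a load-balanced output group like the following (the hostnames and ports are placeholders, not part of this topic):

```
# outputs.conf on each forwarder (example hostnames are placeholders)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# The forwarder automatically load-balances across these indexers.
# If one goes down, it keeps sending data to the remaining servers.
server = idx1.example.com:9997, idx2.example.com:9997
# How often (in seconds) the forwarder switches indexers
autoLBFrequency = 30
```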
To provide high availability for Splunk's data querying capability, you must maintain availability for:
- The indexer(s)
- The indexed data
The rest of this topic describes ways to maintain high availability for querying Splunk data.
Depending on your requirements, you might also need to consider availability of other components of your Splunk deployment, such as search heads and forwarders. You must also provide high availability for non-Splunk (but Splunk-dependent) aspects of your system, such as your data sources, hardware, and network.
There are two basic choices for implementing high availability for Splunk indexers and data:
- Use a highly reliable storage system
- Use a mirrored cluster of Splunk indexers
High reliability storage
There are a number of ways that you can use an underlying storage system to promote high availability for Splunk. The exact architecture you implement will depend on your existing environment and specific needs.
For example, in a typical SAN-based architecture, you could install your Splunk indexers directly on the SAN and then mount the Splunk volumes on server nodes. If a node goes down, you can remount its volume on another node. The new node takes on the identity of the failed node, with the same configurations and access to the same set of indexed data. You just need to point your search head at that node in place of the old one. You can further configure your SAN to attain the level of redundancy you require.
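For example, after remounting the failed node's volume on a standby node, you might repoint the search head by editing its distsearch.conf along these lines (the hostnames are hypothetical; 8089 is the default management port):

```
# distsearch.conf on the search head (example hostnames are placeholders)
[distributedSearch]
# Replace the failed node's entry with the standby node
# that remounted its volume and took on its identity.
servers = standby-node.example.com:8089, idx2.example.com:8089
```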
Data replication across indexer clusters
Another way to achieve high availability of both the indexed data and the indexing/searching capabilities is to create primary and secondary clusters of mirrored indexers. If an indexer in the primary cluster fails, you can reconfigure forwarders and search heads to point to its mirror on the secondary cluster.
Here's an example of this strategy. To begin, two forwarders use load balancing to distribute data to the indexers in the primary cluster.
The primary indexers index the data locally and also forward the raw (unindexed) data onwards to secondary indexers, which then index the data a second time.
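One way to sketch this step on each primary indexer, assuming placeholder hostnames, is an outputs.conf that indexes locally while forwarding the unindexed data to its mirror:

```
# outputs.conf on a primary indexer (example hostnames are placeholders)
[indexAndForward]
# Keep a local indexed copy in addition to forwarding
index = true

[tcpout]
defaultGroup = secondary_mirror

[tcpout:secondary_mirror]
# Forward the raw, unindexed data so the secondary
# indexer indexes it a second time
sendCookedData = false
server = idx1-secondary.example.com:9997
```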
You now have copies of the indexed data in two places. Each indexer in the secondary cluster contains an exact copy of the data on its corresponding indexer in the primary cluster. You can search against either the primary or the secondary cluster.
If one of the indexers in the primary cluster goes down, the forwarders' load-balancing capability means that they will automatically start sending all their data to the remaining indexer(s) in that cluster. Those indexers will continue to send copies of their data on to their mirrored instances in the secondary cluster.
You can continue to search across the full set of data. You just redirect the search head(s) to point to the secondary instance of the downed indexer. At the same time, the search head can continue to point to the remaining indexer in the primary cluster. Alternatively, you can redirect the search head to point exclusively to the secondary indexers. In either case, searching continues across the entire set of data.
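A sketch of the first option, again with hypothetical hostnames, is a distsearch.conf on the search head that mixes the downed indexer's mirror with the surviving primary indexer:

```
# distsearch.conf on the search head, after primary idx1 fails
# (example hostnames are placeholders; 8089 is the default management port)
[distributedSearch]
# idx1's secondary mirror substitutes for the downed primary;
# idx2 in the primary cluster remains in the search.
servers = idx1-secondary.example.com:8089, idx2-primary.example.com:8089
```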
There are many ways you can implement specific aspects of this architecture. For guidance, talk with Splunk Professional Services.
This second solution has the advantage that it depends less on the capabilities of your underlying storage system. On the downside, it requires double the hardware (since you're doubling the indexers), as well as a license for twice the indexing volume (since you're indexing everything twice).
This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7