Distributed Splunk Enterprise overview
This manual describes how to distribute various components of Splunk Enterprise functionality across multiple machines. By distributing Splunk Enterprise, you can scale its functionality to handle the data needs for enterprises of any size and complexity.
In single-machine deployments, one instance of Splunk Enterprise handles the entire end-to-end process, from data input through indexing to search. A single-machine deployment can be useful for testing and evaluation purposes and might serve the needs of department-sized environments. For larger environments, however, where data originates on many machines and where many users need to search the data, you'll want to distribute functionality across multiple Splunk Enterprise instances. This manual describes how to deploy and use Splunk Enterprise in such a distributed environment.
How Splunk Enterprise scales
Splunk Enterprise performs three key functions as it moves data through the data pipeline. First, it consumes data from files, the network, or elsewhere. Then it indexes the data. (Actually, it first parses and then indexes the data, but for purposes of this discussion, we consider parsing to be part of the indexing process.) Finally, it runs interactive or scheduled searches on the indexed data.
You can split this functionality across multiple specialized instances of Splunk Enterprise, ranging in number from just a few to thousands, depending on the quantity of data you're dealing with and other variables in your environment. You might, for example, create a deployment with many instances that only consume data, several other instances that index the data, and one or more instances that handle search requests. These specialized instances are known collectively as components. There are several types of components.
For a typical mid-size deployment, for example, you can deploy lightweight versions of Splunk Enterprise, called forwarders, on the machines where the data originates. The forwarders consume data locally and then forward the data across the network to another Splunk Enterprise component, called the indexer. The indexer does the heavy lifting; it indexes the data and runs searches. It should reside on a machine by itself. The forwarders, on the other hand, can easily co-exist on the machines generating the data, because the data-consuming function has minimal impact on machine performance. This diagram shows several forwarders sending data to a single indexer:
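As a concrete sketch of this topology (the hostnames and the receiving port 9997 here are illustrative, not prescribed), each forwarder is pointed at the indexer in its outputs.conf, and the indexer opens a receiving port in its inputs.conf:

```ini
# outputs.conf on each forwarder -- send all data to one indexer
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer1.example.com:9997

# inputs.conf on the indexer -- listen for forwarded data
[splunktcp://9997]
disabled = 0
```

On the indexer, you can also enable receiving with the CLI (`splunk enable listen 9997`) or through Splunk Web instead of editing inputs.conf directly.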
As you scale up, you can add more forwarders and indexers. For a larger deployment, you might have hundreds of forwarders sending data to a number of indexers. You can use load balancing on the forwarders, so that they distribute their data across some or all of the indexers. Not only does load balancing help with scaling, but it also provides a fail-over capability if one of the indexers goes down. The forwarders automatically switch to sending their data to any indexers that remain alive. In this diagram, each forwarder load-balances its data across two indexers:
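Load balancing falls out of the same outputs.conf mechanism: list several indexers in one output group, and the forwarder distributes data across them and fails over automatically if one becomes unreachable. A minimal sketch, with illustrative hostnames:

```ini
# outputs.conf on each forwarder -- load-balance across two indexers
[tcpout:lb_indexers]
server = indexer1.example.com:9997, indexer2.example.com:9997
# How often, in seconds, the forwarder switches to another indexer
# in the list (30 is the default)
autoLBFrequency = 30
```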
To coordinate and consolidate search activities across multiple indexers, you can also separate out the functions of indexing and searching. In this type of deployment, called distributed search, each indexer just indexes data and performs searches across its own indexes. A Splunk Enterprise instance dedicated to search management, called the search head, coordinates searches across the set of indexers, consolidating the results and presenting them to the user:
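On the search head, the set of indexers to search across (its "search peers") can be declared in distsearch.conf. A minimal sketch, assuming indexers listening on the default management port 8089:

```ini
# distsearch.conf on the search head -- the indexers to search across
[distributedSearch]
servers = indexer1.example.com:8089, indexer2.example.com:8089
```

In practice, search peers are often added with the CLI instead (`splunk add search-server`), which also handles the authentication handshake with each peer.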
For larger environments, you can deploy a search head cluster, consisting of several search heads sharing configurations, job scheduling, and search artifacts. Here is a diagram of a small search head cluster, with three search heads:
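Each cluster member carries a small amount of search head cluster configuration in server.conf: an [shclustering] stanza plus a replication port that members use to share search artifacts. A sketch with illustrative host, port, and key values:

```ini
# server.conf on each search head cluster member
[shclustering]
disabled = 0
mgmt_uri = https://sh1.example.com:8089
pass4SymmKey = changeme

# Port the members use to replicate search artifacts to one another
[replication_port://34567]
```

These settings are normally written for you by `splunk init shcluster-config` on each member, after which `splunk bootstrap shcluster-captain` elects the initial captain.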
These diagrams illustrate a few basic deployment topologies. You can actually combine the functions of data input, indexing, and search in a great variety of ways. For example, you can set up the forwarders so that they route data to multiple indexers, based on specified criteria. You can also configure forwarders to process data locally before sending the data on to an indexer for storage. In another scenario, you can deploy a single instance that serves as both search head and indexer, searching across not only its own indexes but the indexes on other indexers as well. You can mix and match Splunk Enterprise components as needed. The possible scenarios are nearly limitless.
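As one sketch of criteria-based routing: a heavy forwarder (one that parses data locally) can use props.conf and transforms.conf to set the _TCP_ROUTING key per event, steering matching events to a particular output group. The sourcetype, regex, and group names below are hypothetical:

```ini
# props.conf on a heavy forwarder -- apply a routing transform to syslog data
[syslog]
TRANSFORMS-routing = route_errors

# transforms.conf -- events matching ERROR go to the errorGroup output group
[route_errors]
REGEX = ERROR
DEST_KEY = _TCP_ROUTING
FORMAT = errorGroup

# outputs.conf -- define the target group
[tcpout:errorGroup]
server = indexer2.example.com:9997
```

Event-level routing like this requires a heavy forwarder, because the routing decision depends on parsing, which lightweight (universal) forwarders do not perform.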
This manual describes how to scale a deployment to fit your exact needs, whether you're managing data for a single department, a global enterprise, or anything in between.
Use indexer clusters for data availability
Indexer clusters are groups of Splunk Enterprise indexers configured to replicate each other's data, so that the system keeps multiple copies of all data. This process is known as index replication. By maintaining multiple, identical copies of Splunk Enterprise data, clusters prevent data loss while promoting data availability for searching.
Splunk Enterprise clusters feature automatic failover from one indexer to the next. This means that, if one or more indexers fail, incoming data continues to get indexed and indexed data continues to be searchable.
Besides enhancing data availability, clusters have other key features that you should consider when you're scaling a deployment. For example, they include a capability to coordinate configuration updates easily across all indexers in the cluster. They also include a built-in distributed search capability. For more information on indexer clusters, see "About clusters and index replication" in the Managing Indexers and Clusters of Indexers manual.
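At the configuration level, an indexer cluster is defined by [clustering] stanzas in server.conf: one node acts as the master, and each indexer enrolls as a peer. A minimal sketch, with illustrative hostnames, key, and factors (the two stanzas below live on different machines):

```ini
# server.conf on the cluster master node
[clustering]
mode = master
replication_factor = 3
search_factor = 2
pass4SymmKey = changeme

# server.conf on each peer (indexer) node
[clustering]
mode = slave
master_uri = https://master.example.com:8089
pass4SymmKey = changeme

# Port the peers use to replicate data to one another
[replication_port://9887]
```

Here replication_factor is the number of copies of each bucket of data the cluster maintains, and search_factor is how many of those copies are kept immediately searchable.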
Manage your Splunk Enterprise deployment
Splunk Enterprise provides a few key tools to help manage a distributed deployment:
- Deployment server. This component provides a way to centrally manage configurations and content updates across your entire deployment. See "About deployment server" in the Updating Splunk Enterprise Instances manual.
- Distributed management console. This feature can help you manage and troubleshoot your deployment. Read "Configure the distributed management console" in the Admin Manual.
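To connect an instance to a deployment server, you point the instance (the "deployment client") at the server in deploymentclient.conf. A minimal sketch, assuming a deployment server at an illustrative hostname on the default management port:

```ini
# deploymentclient.conf on each managed instance
[target-broker:deploymentServer]
targetUri = deploy.example.com:8089
```

Equivalently, you can run `splunk set deploy-poll deploy.example.com:8089` on the client, which writes the same setting.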
What comes next
The rest of this Overview section covers:
- How data moves through Splunk Enterprise: the data pipeline
- Scale your deployment: Splunk Enterprise components
- Components and roles
It starts by describing the data pipeline, from the point that the data enters Splunk Enterprise to when it becomes available for users to search on. Next, the overview describes how Splunk Enterprise functionality can be split into modular components. It then correlates the available Splunk Enterprise components with their roles in facilitating the data pipeline.
The remaining sections of this manual describe the Splunk Enterprise components in detail, explaining how to use them to create a distributed Splunk Enterprise deployment.
For information on capacity planning based on the scale of your deployment, read the Capacity Planning manual.
How data moves through Splunk Enterprise: the data pipeline
This documentation applies to the following versions of Splunk® Enterprise: 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.2.15