Scale your deployment: Splunk components
To accommodate your deployment topology and performance requirements, you can allocate the different Splunk roles, such as data input and indexing, to separate Splunk instances. For example, you can have instances that just gather data inputs, which they then forward to another, central instance for indexing. Or you can distribute indexing across several instances that coordinate with a separate instance that processes all search requests. To facilitate the distribution of roles, Splunk can be configured into a range of separate component types, each mapping to one or more of the roles. You create most components by enabling or disabling specific functions of the full Splunk instance.
These are the Splunk component types available for use in a distributed environment:
- Indexer
- Forwarder (universal or heavy)
- Search head
- Deployment server
All components are variations of the full Splunk instance, with certain features either enabled or disabled, except for the universal forwarder, which is its own executable.
The indexer is the Splunk component that creates and manages indexes. The primary functions of an indexer are:
- Indexing incoming data.
- Searching the indexed data.
For larger-scale needs, indexing is split out from the data input function and sometimes from the search management function as well. In these larger, distributed deployments, the Splunk indexer might reside on its own machine and handle only indexing (usually along with parsing) and searching of its indexed data. In those cases, other Splunk components take over the non-indexing roles. Forwarders consume the data, indexers index and search the data, and search heads coordinate searches across the set of indexers.
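Indexes themselves are defined in indexes.conf on the indexer. The stanza below is a minimal sketch, not a recommended configuration; the index name web_logs and the size cap are hypothetical examples, and the three path attributes are required when you define a custom index:

```ini
# indexes.conf on the indexer -- a minimal sketch; the index name
# "web_logs" and the size cap are hypothetical examples.
[web_logs]
homePath   = $SPLUNK_DB/web_logs/db
coldPath   = $SPLUNK_DB/web_logs/colddb
thawedPath = $SPLUNK_DB/web_logs/thaweddb
# Cap the total size of this index (value is in MB).
maxTotalDataSizeMB = 102400
```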
For information on indexers, see the Managing Indexers and Clusters manual, starting with the topic "About indexes and indexers".
One role that's typically split off from the indexer is the data input function. For instance, you might have a group of Windows and Linux machines generating data that needs to go to a central Splunk indexer for consolidation. Usually the best way to do this is to install a lightweight instance of Splunk, known as a forwarder, on each of the data-generating machines. These forwarders manage the data input and send the resulting data streams across the network to a Splunk indexer, which resides on its own machine. There are two types of forwarders:
- Universal forwarders. These have a very light footprint and forward only unparsed data.
- Heavy forwarders. These have a larger footprint but can parse, and even index, data before forwarding it.
Note: There is also a third type of forwarder, the light forwarder. The light forwarder is essentially obsolete, having been replaced in release 4.2 by the universal forwarder, which provides similar functionality in a smaller footprint.
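Forwarding is typically configured in outputs.conf on each forwarder. The fragment below is a minimal sketch with hypothetical hostnames; it assumes the indexers have been set up to receive on port 9997 (for example, with the CLI command splunk enable listen 9997):

```ini
# outputs.conf on a forwarder -- a minimal sketch; the group name
# and indexer hostnames are hypothetical examples.
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# Data is load-balanced across the listed receivers.
server = indexer1.example.com:9997, indexer2.example.com:9997
```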
For information on forwarders, start with the topic "About forwarding and receiving".
In situations where you have a large amount of indexed data and numerous users concurrently searching on it, it can make sense to distribute the indexing load across several indexers, while offloading the search query function to a separate machine. In this type of scenario, known as distributed search, one or more Splunk components called search heads distribute search requests across multiple indexers.
For information on search heads, see "About distributed search".
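On the search head, the set of indexers to search across (its search peers) can be listed in distsearch.conf, although peers are more commonly added through Splunk Manager or the CLI, and authentication keys must also be distributed to each peer. A minimal sketch, with hypothetical indexer hostnames and the default management port:

```ini
# distsearch.conf on the search head -- a minimal sketch; the
# indexer hostnames are hypothetical examples.
[distributedSearch]
# Search peers, identified by host and management port (8089).
servers = indexer1.example.com:8089, indexer2.example.com:8089
```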
To update a distributed deployment, you can use Splunk's deployment server. The deployment server lets you push out configurations and content to sets of Splunk instances (referred to, in this context, as deployment clients), grouped according to any useful criteria, such as OS, machine type, application area, location, and so on. The deployment clients are usually forwarders or indexers. For example, once you've made and tested an updated configuration on a local Linux forwarder, you can push the changes to all the Linux forwarders in your deployment.
The deployment server can share a Splunk instance with another Splunk component, either a search head or an indexer, if your deployment is small (fewer than around 30 deployment clients). In larger deployments, it should run on its own Splunk instance. For more information, see this tech note on the Community Wiki.
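Server classes, and the apps each class receives, are defined in serverclass.conf on the deployment server. The fragment below is a minimal sketch; the class name linux_forwarders, the whitelist pattern, and the app name my_inputs_app are hypothetical examples:

```ini
# serverclass.conf on the deployment server -- a minimal sketch;
# class name, whitelist pattern, and app name are hypothetical.
[serverClass:linux_forwarders]
# Match deployment clients by client name or hostname pattern.
whitelist.0 = linuxfwd-*

[serverClass:linux_forwarders:app:my_inputs_app]
# Restart splunkd on the client after the app is deployed.
restartSplunkd = true
```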
For detailed information on the deployment server, see "About deployment server".
Although it's actually an app, not a Splunk component, the deployment monitor has an important role to play in distributed environments. Distributed deployments can scale to forwarders numbering into the thousands, sending data to many indexers, which feed multiple search heads. To view and troubleshoot these distributed deployments, you can use the deployment monitor, which provides numerous views into the state of your forwarders and indexers.
For detailed information on the deployment monitor, read the Deploy and Use Splunk Deployment Monitor App manual.
Where to go next
While the fundamental issues of indexing and event processing remain the same no matter what the size or nature of your distributed deployment, it is important to take into account deployment needs when planning your indexing strategy. To do that effectively, you must also understand how components map to Splunk roles.
For information on hardware requirements for scaling your deployment, see "Hardware capacity planning for your Splunk deployment".
- How data moves through Splunk Enterprise: the data pipeline
- Components and roles
This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18