Splunk® Enterprise

Admin Manual

Splunk version 4.x reached its End of Life on October 1, 2013.
This documentation does not apply to the most recent version of Splunk.

Advanced indexing strategy

In a single-machine deployment consisting of just one Splunk instance, the Splunk indexer also handles data input and search requests. However, for mid-to-enterprise scale needs, indexing is typically split out from the data input function and sometimes from the search function as well. In these larger, distributed deployments, the Splunk indexer might reside on its own machine and handle only indexing.

For instance, you can have a set of Windows and Linux machines generating interesting events, which need to go to a central Splunk indexer for consolidation. Usually the best way to do this is to install a lightweight instance of Splunk, known as a forwarder, on each of the event-generating machines. These forwarders handle data input and send the data across the network to the Splunk indexer residing on its own machine.

Similarly, in cases where you have a large amount of indexed data and numerous concurrent users searching on it, it can make sense to split off the search function from indexing. In this type of scenario, known as distributed search, one or more search heads distribute search requests across multiple indexers.

To manage a distributed deployment, Splunk's deployment server lets you push out configurations and content to sets of Splunk instances, grouped according to any arbitrary criteria, such as OS, machine type, application area, location, and so on.

While the fundamental issues of indexing and event processing remain the same for distributed deployments, it is important to take into account deployment needs when planning your indexing strategy.

Forward data to an indexer

This type of deployment involves the use of forwarders, which are Splunk instances that receive data inputs and then consolidate and send the data to a Splunk indexer. Forwarders come in two flavors:

  • Universal forwarders. These maintain a small footprint on their host machine. They perform minimal processing on the incoming data streams before forwarding them on to an indexer, also known as the receiver.
  • Heavy forwarders. These retain most of the functionality of a full Splunk instance. They can parse data before forwarding it to the receiving indexer. (See "How data moves through Splunk" for the distinction between parsing and indexing.) They can also index data locally while forwarding the parsed data to the receiver, which then performs the final indexing on its own machine.

Both types of forwarders tag data with metadata such as host, source, and source type, before forwarding it on to the indexer.
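The tagging step described above can be pictured with a short Python sketch. This is only an illustrative model, not Splunk code or configuration: the `Event` class, the `tag_events` function, and the sample host, source, and source type values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical model of a forwarded event: raw text plus metadata."""
    raw: str
    host: str
    source: str
    sourcetype: str


def tag_events(lines, host, source, sourcetype):
    """Attach the same host/source/sourcetype to every raw line from one input."""
    return [
        Event(raw=line, host=host, source=source, sourcetype=sourcetype)
        for line in lines
    ]


# Sample values below are made up for illustration.
events = tag_events(
    ["Jan 1 00:00:01 sshd[42]: session opened"],
    host="web01",
    source="/var/log/auth.log",
    sourcetype="linux_secure",
)
```

Because the metadata travels with each event, the indexer can tell where the data originated without inspecting the forwarder itself.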

Note: There is also a third type of forwarder, the light forwarder. The light forwarder is essentially obsolete, having been replaced in release 4.2 by the universal forwarder, which provides similar functionality in a smaller footprint.

Forwarders allow you to use resources efficiently while processing large quantities or disparate types of data. They also enable a number of interesting deployment topologies, by offering capabilities for load balancing, data filtering, and routing.

For an extended discussion of forwarders, including configuration and detailed use cases, see "About forwarding and receiving".

Search across multiple indexers

In distributed search, Splunk servers send search requests to other Splunk servers and merge the results before returning them to the user. This is useful for a number of purposes, including horizontal scaling, access control, and managing geo-dispersed data.

The Splunk instance that manages search requests is called the search head. The instances that maintain the indexes and perform the actual searching are called search peers or indexer nodes.
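The division of labor between a search head and its search peers can be sketched in Python as a simple fan-out and merge. This is a conceptual model only, under the assumption that each peer holds a list of `(timestamp, text)` events; the function names and sample data are hypothetical and do not reflect how Splunk implements distributed search.

```python
import heapq


def search_peer(index, term):
    """A peer searches only its own index and returns matches, newest first."""
    hits = [event for event in index if term in event[1]]
    return sorted(hits, key=lambda event: event[0], reverse=True)


def search_head(peers, term):
    """The search head sends the same request to every peer and merges the results."""
    partial_results = [search_peer(index, term) for index in peers]
    # Each partial result is already sorted newest first, so a k-way merge suffices.
    return list(heapq.merge(*partial_results, key=lambda event: event[0], reverse=True))


# Two hypothetical peers, each holding (timestamp, text) events.
peers = [
    [(100, "error disk full"), (105, "login ok")],
    [(103, "error timeout"), (101, "error retry")],
]
results = search_head(peers, "error")
```

In this sketch the peers do the filtering and the search head only merges, which is the reason adding peers scales the search workload horizontally.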

For an extended discussion of distributed search, including configuration and detailed use cases, see "What is distributed search?".

Manage distributed deployments

When a distributed deployment includes multiple forwarders, indexers, and search heads, the Splunk deployment server greatly eases the process of configuring and updating all Splunk instances. With the deployment server, you can group the distributed Splunk instances (referred to as deployment clients in this context) into server classes.

A server class is a set of Splunk instances that share configurations. Server classes are typically grouped by OS, machine type, application area, location, or other useful criteria. A single deployment client can belong to multiple server classes, so a Linux forwarder residing in the UK, for example, might belong to a Linux server class and a UK server class, and receive configuration settings appropriate to each.
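The Linux and UK example above can be modeled with a small Python sketch. It is not the deployment server's configuration syntax: the `SERVER_CLASSES` table, the matching rules, and the app names are hypothetical, chosen only to show how one client can match several server classes and receive the configurations of each.

```python
# Hypothetical server classes: a membership rule plus the apps it delivers.
SERVER_CLASSES = {
    "linux": {"match": lambda client: client["os"] == "linux", "apps": ["linux_inputs"]},
    "windows": {"match": lambda client: client["os"] == "windows", "apps": ["windows_inputs"]},
    "uk": {"match": lambda client: client["location"] == "uk", "apps": ["uk_outputs"]},
}


def apps_for(client):
    """Collect the apps from every server class whose rule matches the client."""
    apps = []
    for server_class in SERVER_CLASSES.values():
        if server_class["match"](client):
            apps.extend(server_class["apps"])
    return apps


# A Linux forwarder in the UK matches both the "linux" and "uk" server classes.
uk_linux_forwarder = {"os": "linux", "location": "uk"}
assigned = apps_for(uk_linux_forwarder)
```

Here `assigned` contains the apps from both matching classes, mirroring how a single deployment client receives settings appropriate to each server class it belongs to.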

For an extended discussion of deployment management, see "About deployment server".

This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7