Use forwarders to get your data into the indexer cluster
Cluster peer nodes can get their data directly from any of the same sources as a non-clustered indexer. However, if data fidelity matters to you, use forwarders to consume the data first and then forward it to the peer nodes, rather than ingesting the data directly into the nodes. The node that receives the data from the forwarder is called the receiver or the receiving node.
There are two key reasons for using forwarders to send data to your cluster:
- To ensure that all incoming data gets indexed. By activating the forwarder's optional indexer acknowledgment feature, you can ensure that all incoming data gets indexed and stored on the cluster. It works like this: When a peer receives a block of data from a forwarder, it sends the forwarder an acknowledgment after it successfully indexes the data. If the forwarder does not receive an acknowledgment from the peer, it resends the data. The forwarder continues to resend the data until it gets the acknowledgment. Indexer acknowledgment is the only way to ensure end-to-end data fidelity. See "How indexer acknowledgment works" later in this topic.
- To handle potential node failure. With forwarder load balancing, if one receiving node in the load-balanced group goes down, the forwarder continues to send its data to the remaining peers in the group. Without load balancing, the forwarder has no way to continue sending data if its receiving node goes down. See "How load balancing works" later in this topic.
Important: Before continuing, you must be familiar with forwarders and how to use them to get data into Splunk. For an introduction to forwarders, read "About forwarding and receiving" in the Forwarding Data manual. Subsequent topics in that manual describe all aspects of deploying and configuring forwarders.
To use forwarders to get data into clusters, you must perform two types of configuration:
- Configure the connection from forwarder to peer node.
- Configure the forwarder's data inputs.
Note: This topic assumes you are deploying universal forwarders, but the steps are basically the same for light or heavy forwarders.
Configure the connection from forwarder to peer node
There are three steps to setting up connections between forwarders and peer nodes:
1. Configure the peer nodes to receive data from forwarders.
2. Configure the forwarders to send data to the peer nodes.
3. Enable indexer acknowledgment for each forwarder. This step is required to ensure end-to-end data fidelity. If that is not a requirement for your deployment, you can skip this step.
Once you are finished setting up the connection, you need to configure the data inputs that control the data that streams into the forwarders (and onwards to the cluster). How to do this is the subject of a later section in this topic, "Configure the forwarder's data inputs".
1. Configure the peer nodes to receive data from forwarders
In order for a peer to receive data from forwarders, you must configure the peer's receiving port. For information on how to configure the receiving port, read "Enable a receiver" in the Forwarding Data manual.
Important: One of the ways you can specify the receiving port is by editing the peer's inputs.conf file in $SPLUNK_HOME/etc/system/local/. For many clusters, you can simplify peer input configuration by deploying a single, identical inputs.conf file across all the peers. In that case, the receiving port you specify in the common copy of inputs.conf will supersede any ports you enable on each individual peer. For details on how to create and deploy a common inputs.conf across all peers, read "Update common peer configurations".
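A minimal sketch of a receiving-port stanza in a peer's inputs.conf, assuming the peers listen on port 9997 (the port used in the examples later in this topic; your port may differ):

```ini
# inputs.conf on a peer node (or in the common copy deployed to all peers)
# Open TCP port 9997 for data from forwarders; the port number is an assumption
[splunktcp://9997]
disabled=0
```

The same stanza works whether you edit each peer individually or deploy one common file to all peers.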
2. Configure the forwarders to send data to the peer nodes
When you set up a forwarder, you specify its receiving peer by providing the peer's IP address and receiving port number, for example 10.10.10.1:9997. You do this in the forwarder's outputs.conf file, as described in "Configure forwarders with outputs.conf" in the Forwarding Data manual. To specify the receiving peer, set the server attribute, like this:
server=10.10.10.1:9997
The receiving port that you specify here is the port configured in step 1.
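For context, here is a minimal sketch of where the server attribute sits in outputs.conf. The target group name my_peers is hypothetical; the address is the example address used above:

```ini
# outputs.conf on the forwarder
# "my_peers" is a hypothetical target group name
[tcpout]
defaultGroup=my_peers

[tcpout:my_peers]
# A single receiving peer: IP address and the receiving port from step 1
server=10.10.10.1:9997
```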
To set up the forwarder to use load balancing, so that the data goes to multiple peer nodes in sequence, you configure a load-balanced group of receiving peers by listing them, separated by commas, in the server attribute. For example, this attribute/value pair in outputs.conf specifies a load-balanced group of three peers:
server=10.10.10.1:9997,10.10.10.2:9997,10.10.10.3:9997
To learn more about configuring load balancing, read "Set up load balancing" in the Forwarding Data manual.
Note: There are several other ways that you can specify a forwarder's receiving peer(s). For example:
- You can specify the receiving peer during forwarder deployment (for Windows forwarders only), as described in "Deploy a Windows forwarder manually" in the Forwarding Data manual.
- You can specify the receiver with the CLI command add forward-server, as described in "Deploy a *nix forwarder manually" in the Forwarding Data manual.
Both of these methods work by modifying the underlying outputs.conf file. No matter what method you use to specify the receiving peers, you still need to directly edit the underlying outputs.conf file if you want to turn on indexer acknowledgment, as described in the next step.
3. Enable indexer acknowledgment for each forwarder
Note: This step is required to ensure end-to-end data fidelity. If that is not a requirement for your deployment, you can skip this step.
To ensure that the cluster receives and indexes all incoming data, you must turn on indexer acknowledgment for each forwarder.
To configure indexer acknowledgment, set the useACK attribute to true in each forwarder's outputs.conf file.
For detailed information on configuring indexer acknowledgment, read "Protect against loss of in-flight data" in the Forwarding Data manual.
Important: For indexer acknowledgment to work properly, the forwarders' wait queues must be configured to the optimal size. For forwarders at version 5.0.4 or above, the system handles this automatically. For forwarders at earlier versions, follow the instructions in the version of the "Protect against loss of in-flight data" topic for that forwarder version. Specifically, read the subtopic on adjusting the wait queue size.
Example: A load-balancing forwarder with indexer acknowledgment
Here's a sample outputs.conf configuration for a forwarder that's using load balancing to send data in sequence to three peers in a cluster. It assumes that each of the peers has previously been configured to use 9997 for its receiving port:
[tcpout]
defaultGroup=my_LB_peers

[tcpout:my_LB_peers]
autoLBFrequency=40
server=10.10.10.1:9997,10.10.10.2:9997,10.10.10.3:9997
useACK=true
The forwarder starts by sending data to one of the peers listed for the server attribute. After 40 seconds, it switches to another peer, and so on. If, at any time, it doesn't receive acknowledgment from the current receiving node, it resends the data, this time to the next available node.
Configure the forwarder's data inputs
Once you've specified the connection between the forwarder and the receiving peer(s), you must specify the data inputs to the forwarder, so that the forwarder has data to send to the cluster. You usually do this by editing the forwarder's inputs.conf file. Read the Getting Data In manual, starting with "What Splunk can index" for detailed information on configuring data inputs. The topic in that manual entitled "Use forwarders" provides an introduction to specifying data inputs on forwarders.
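As an illustration only, a forwarder's inputs.conf might define a file monitor input like the one below. The path, sourcetype, and index are hypothetical and depend on your data; the index you name must exist on the peer nodes:

```ini
# inputs.conf on the forwarder
# Path, sourcetype, and index below are hypothetical examples
[monitor:///var/log/messages]
sourcetype=syslog
index=main
disabled=0
```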
How indexer acknowledgment works
In brief, indexer acknowledgment works like this: The forwarder sends data continuously to the receiving peer, in blocks of approximately 64kB. The forwarder maintains a copy of each block in memory until it gets an acknowledgment from the peer. While waiting, it continues to send more data blocks.
If all goes well, the receiving peer:
1. receives the block of data, parses and indexes it, and writes the data (raw data and index data) to the file system.
2. streams copies of the raw data to each of its target peers.
3. sends an acknowledgment back to the forwarder.
The acknowledgment assures the forwarder that the data was successfully written to the cluster. Upon receiving the acknowledgment, the forwarder releases the block from memory.
If the forwarder does not receive the acknowledgment, that means there was a failure along the way. Either the receiving peer went down or that peer was unable to contact its set of target peers. The forwarder then automatically resends the block of data. If the forwarder is using load-balancing, it sends the block to another receiving node in the load-balanced group. If the forwarder is not set up for load-balancing, it attempts to resend data to the same node as before.
Important: To ensure end-to-end data fidelity, you must explicitly enable indexer acknowledgment for each forwarder that's sending data to the cluster, as described earlier in this topic. If end-to-end data fidelity is not a requirement for your deployment, you can skip this step.
For more information on how indexer acknowledgment works, read "Protect against loss of in-flight data" in the Forwarding Data manual.
How load balancing works
In load balancing, the forwarder distributes incoming data across several receiving peer nodes. Each node gets a portion of the total data, and together the receiving nodes get all the data.
Splunk forwarders perform "automatic load balancing". The forwarder routes data to different nodes based on a specified time interval. For example, assume you have a load-balanced group consisting of three peer nodes: A, B, and C. At some specified interval, such as every 30 seconds, the forwarder switches the data stream to another node in the group, selected at random. So, the forwarder might switch from node B to node A to node C, and so on. If one node is down, the forwarder immediately switches to another.
Note: To expand on this a bit, each of the forwarder's inputs has its own data stream. At the specified interval, the forwarder switches the data stream to the newly selected node, if it's safe to do so. If it cannot safely switch the data stream to the new node, it keeps the connection to the previous node open and continues to send the data stream to that node until it has been safely sent.
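As a rough sketch of the three-node example above, the switching interval corresponds to the autoLBFrequency attribute in the forwarder's outputs.conf. The addresses standing in for nodes A, B, and C are assumptions, reused from the sample configuration earlier in this topic:

```ini
# outputs.conf on the forwarder (addresses are assumed stand-ins for nodes A, B, and C)
[tcpout]
defaultGroup=my_LB_peers

[tcpout:my_LB_peers]
# Nodes A, B, and C, each listening on receiving port 9997
server=10.10.10.1:9997,10.10.10.2:9997,10.10.10.3:9997
# Switch the data stream to another node in the group every 30 seconds
autoLBFrequency=30
```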
Load balancing, in conjunction with indexer acknowledgment, is of key importance in a clustered deployment because it helps ensure that you don't lose any data in case of node failure. If a forwarder does not receive indexer acknowledgment from the node it is sending data to, it resends the data to the next available node in the load-balanced group.
For more information on forwarder load balancing, read "Set up load balancing" in the Forwarding Data manual. For information on how load balancing works with indexer acknowledgment, read "Protect against loss of in-flight data" in the Forwarding Data manual.
Configure inputs directly on the peers
If you decide not to use forwarders to handle your data inputs, you can set up inputs on each peer in the usual fashion; for example, by editing inputs.conf. For information on configuring inputs, read "Configure your inputs" in the Getting Data In manual.
This documentation applies to the following versions of Splunk® Enterprise: 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.1.14, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.2.15