Use forwarders to get your data
Cluster peer nodes can get their data directly from any of the same sources as a non-clustered indexer. However, if data fidelity matters to you, you should use load-balancing forwarders to consume the data initially and then forward it to the peer nodes, rather than ingesting the data directly into the nodes. The node that receives the data from the forwarder is called the receiver or the receiving node.
There are two key reasons for using forwarders, and particularly load-balancing forwarders, to send data to your cluster:
- To ensure that all incoming data gets indexed. By activating the forwarder's optional indexer acknowledgment feature, you can ensure that all incoming data gets indexed and stored on the cluster. It works like this: When a peer receives a block of data from a forwarder, it sends the forwarder an acknowledgment after it successfully indexes the data. If the forwarder does not receive an acknowledgment from the peer, it resends the data. The forwarder continues to resend the data until it gets the acknowledgment. Indexer acknowledgment is the only way to ensure end-to-end data fidelity. See "How indexer acknowledgment works" later in this topic.
- To handle potential node failure. With forwarder load balancing, if one receiving node in the load-balanced group goes down, the forwarder continues to send its data to the remaining peers in the group. Without load balancing, the forwarder has no way to continue sending data if its receiving node goes down. See "How load balancing works" later in this topic.
Important: Before continuing, you must be familiar with forwarders and how to use them to get data into Splunk. For an introduction to forwarders, read "About forwarding and receiving" in the Distributed Deployment Manual. Subsequent topics in that manual describe all aspects of deploying and configuring forwarders.
To use forwarders to get data into clusters, you must perform two types of configuration:
1. Configure the connection from forwarder to peer node.
2. Configure the forwarder's data inputs.
Note: This topic assumes you're using universal forwarders, but the steps are basically the same for light or heavy forwarders.
Configure the connection from forwarder to peer node
There are three steps to setting up connections between forwarders and peer nodes:
1. Configure the peer nodes to receive data from forwarders.
2. Configure the forwarders to send data to the peer nodes.
3. Enable indexer acknowledgment on each forwarder. This step is required to ensure end-to-end data fidelity. If that is not a requirement for your deployment, you can skip this step. However, you will receive warnings that you are not using indexer acknowledgment.
Once you're finished setting up the connection, you then need to configure the data inputs that control the data that streams into the forwarders (and onwards to the cluster). How to do this is the subject of a later section in this topic, "Configure the forwarder's data inputs".
1. Configure the peer nodes to receive data from forwarders
In order for a peer to receive data from forwarders, you must configure the peer's receiving port. For information on how to configure the receiving port, read "Enable a receiver" in the Distributed Deployment Manual.
Important: One of the ways you can specify the receiving port is by editing the peer's inputs.conf file in $SPLUNK_HOME/etc/system/local/. For many clusters, you can simplify peer input configuration by deploying a single, identical inputs.conf file across all the peers. In that case, the receiving port you specify in the common copy of inputs.conf will supersede any ports you enable on each individual peer. For details on how to create and deploy a common inputs.conf across all peers, read "Update common peer configurations".
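When you roll out a common inputs.conf, it can help to confirm which receiving port it actually enables. The minimal Python sketch below reads the file's text and lists the enabled receiving ports. It assumes the receiving port is declared with a `[splunktcp://<port>]` (or `[splunktcp://:<port>]`) stanza, that `disabled = 1` or `disabled = true` switches a stanza off, and that the file is simple enough for Python's standard `configparser`. The function name `receiving_ports` is hypothetical.

```python
import configparser


def receiving_ports(inputs_conf_text: str) -> list[int]:
    """Return the enabled splunktcp receiving ports found in inputs.conf text.

    Hypothetical helper: assumes stanzas such as [splunktcp://9997] or
    [splunktcp://:9997] and treats disabled = 1/true as switched off.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep attribute names case-sensitive
    parser.read_string(inputs_conf_text)

    ports = []
    for section in parser.sections():
        if not section.startswith("splunktcp://"):
            continue  # not a receiving stanza (e.g. a [monitor://...] input)
        disabled = parser.get(section, "disabled", fallback="0").strip().lower()
        if disabled in ("1", "true"):
            continue
        # The port is the last colon-separated field: "9997" or ":9997" -> "9997"
        port_text = section[len("splunktcp://"):].rsplit(":", 1)[-1]
        ports.append(int(port_text))
    return sorted(ports)


common_inputs = """
[splunktcp://9997]
disabled = 0

[splunktcp://9998]
disabled = 1
"""
print(receiving_ports(common_inputs))  # [9997]
```

Running the same check against each peer's local copy is one way to spot a peer whose local port differs from the common file.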
2. Configure the forwarders to send data to the peer nodes
When you set up a forwarder, you specify its receiving peer by providing the peer's IP address and receiving port number, for example 10.10.10.1:9997. You do this in the forwarder's outputs.conf file, as described in "Configure forwarders with outputs.conf" in the Distributed Deployment Manual. To specify the receiving peer, set the server attribute, like this:
server=10.10.10.1:9997
The receiving port that you specify here is the port that you configured on the peer in step 1.
To set up the forwarder to use load balancing, so that the data goes to multiple peer nodes in sequence, specify each receiving peer in the load-balanced group. For example, this attribute/value pair in outputs.conf specifies a load-balanced group of three peers:
server=10.10.10.1:9997,10.10.10.2:9997,10.10.10.3:9997
To learn more about configuring load balancing, read "Set up load balancing" in the Distributed Deployment Manual.
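The server value for a load-balanced group is a comma-separated list of `<IP address>:<port>` entries, one per receiving peer. The minimal Python sketch below splits such a value into (host, port) pairs, which can catch a missing or mistyped port before the forwarder is restarted. The helper name `parse_server_list` is hypothetical, and it only handles IPv4 addresses or host names, not bracketed IPv6 addresses.

```python
def parse_server_list(value: str) -> list[tuple[str, int]]:
    """Split an outputs.conf server value into (host, port) pairs.

    Hypothetical helper; assumes "host:port" entries separated by commas
    (IPv4 addresses or host names, no IPv6).
    """
    peers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # this sketch ignores empty entries such as a trailing comma
        host, sep, port_text = entry.rpartition(":")
        if not sep or not host:
            raise ValueError(f"missing receiving port in {entry!r}")
        port = int(port_text)  # raises ValueError if the port is not a number
        if not 0 < port < 65536:
            raise ValueError(f"port out of range in {entry!r}")
        peers.append((host, port))
    return peers


print(parse_server_list("10.10.10.1:9997,10.10.10.2:9997,10.10.10.3:9997"))
# [('10.10.10.1', 9997), ('10.10.10.2', 9997), ('10.10.10.3', 9997)]
```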
Note: There are several other ways that you can specify a forwarder's receiving peer(s). For example:
- You can specify the receiving peer during forwarder deployment (for Windows forwarders only), as described in "Deploy a Windows forwarder manually" in the Distributed Deployment Manual.
- You can specify the receiver with the CLI command add forward-server, as described in "Deploy a *nix forwarder manually" in the Distributed Deployment Manual.
Both of these methods work by modifying the underlying outputs.conf file. No matter what method you use to specify the receiving peers, you still need to directly edit the underlying outputs.conf file to turn on indexer acknowledgment, as described in the next step.
3. Enable indexer acknowledgment on each forwarder
Note: This step is required to ensure end-to-end data fidelity. If that is not a requirement for your deployment, you can skip this step. However, you will receive warnings that you are not using indexer acknowledgment.
To ensure that the cluster receives and indexes all incoming data, you must turn on indexer acknowledgment on each forwarder. You configure this in outputs.conf by setting the useACK attribute to true:
useACK=true
You should also set the forwarder's maxQueueSize to 7MB to ensure that the forwarder does not get blocked while waiting for acknowledgment from a peer node:
[tcpout]
maxQueueSize = 7MB
For detailed information on configuring indexer acknowledgment and the maxQueueSize setting, read "Protect against loss of in-flight data" in the Distributed Deployment Manual.
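Because useACK must be turned on in every forwarder's outputs.conf, you might want to check deployed files automatically. The minimal Python sketch below reads outputs.conf text and reports a `[tcpout]` stanza without maxQueueSize = 7MB, and any `[tcpout:<group>]` target-group stanza whose useACK is not true. The function name `ack_problems`, the accepted boolean spellings, and the rule that a group-level useACK overrides a value in `[tcpout]` are all assumptions of this sketch, not documented Splunk behavior.

```python
import configparser

TRUE_VALUES = {"1", "true"}  # assumption: the boolean spellings this sketch accepts


def ack_problems(outputs_conf_text: str) -> list[str]:
    """Report indexer-acknowledgment settings that differ from this topic's advice.

    Hypothetical checker: expects maxQueueSize = 7MB in [tcpout] and an
    effective useACK of true in every [tcpout:<group>] stanza.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep attribute names such as useACK case-sensitive
    parser.read_string(outputs_conf_text)

    if not parser.has_section("tcpout"):
        return ["no [tcpout] stanza"]

    problems = []
    if parser.get("tcpout", "maxQueueSize", fallback="").strip() != "7MB":
        problems.append("[tcpout]: maxQueueSize is not 7MB")

    global_ack = parser.get("tcpout", "useACK", fallback="false")
    for section in parser.sections():
        if not section.startswith("tcpout:"):
            continue  # only check target-group stanzas
        # Assumption: a group-level useACK overrides any value in [tcpout].
        ack = parser.get(section, "useACK", fallback=global_ack)
        if ack.strip().lower() not in TRUE_VALUES:
            problems.append(f"[{section}]: useACK is not true")
    return problems


example = """
[tcpout]
defaultGroup = my_LB_peers
maxQueueSize = 7MB

[tcpout:my_LB_peers]
autoLBFrequency = 40
server = 10.10.10.1:9997,10.10.10.2:9997,10.10.10.3:9997
useACK = true
"""
print(ack_problems(example))  # []
```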
Example: A load-balancing forwarder with indexer acknowledgment
Here's a sample outputs.conf configuration for a forwarder that's using load balancing to send data in sequence to three peers in a cluster. It assumes that each of the peers has previously been configured to use 9997 for its receiving port:
[tcpout]
defaultGroup=my_LB_peers
maxQueueSize = 7MB

[tcpout:my_LB_peers]
autoLBFrequency=40
server=10.10.10.1:9997,10.10.10.2:9997,10.10.10.3:9997
useACK=true
The forwarder starts by sending data to one of the peers listed for the server attribute. After 40 seconds, it switches to another peer, and so on. If, at any time, it doesn't receive acknowledgment from the current receiving node, it resends the data, this time to the next available node.
Configure the forwarder's data inputs
Once you've specified the connection between the forwarder and the receiving peer(s), you must specify the data inputs to the forwarder, so that the forwarder has data to send to the cluster. You usually do this by editing the forwarder's inputs.conf file. For detailed information on configuring data inputs, read the Getting Data In manual, starting with "What Splunk can index". The topic in that manual entitled "Use forwarders" provides an introduction to specifying data inputs on forwarders.
How indexer acknowledgment works
In brief, indexer acknowledgment works like this: The forwarder sends data continuously to the receiving peer, in blocks of approximately 64kB. The forwarder maintains a copy of each block in memory until it gets an acknowledgment from the peer. While waiting, it continues to send more data blocks.
If all goes well, the receiving peer:
1. receives the block of data, parses and indexes it, and writes the data (raw data and index data) to the file system.
2. replicates copies of the raw data to its target peers.
3. sends an acknowledgment back to the forwarder.
The acknowledgment assures the forwarder that the data was successfully written to the cluster. Upon receiving the acknowledgment, the forwarder releases the block from memory.
Note: The receiving peer sends the acknowledgment to the forwarder after it attempts to replicate the data to its target peers. It sends the acknowledgment whether or not it was successful in replicating the data. What matters is that the receiving peer attempted to replicate the data. Once it has made that attempt, the cluster will be able to recover the data if there's a node failure of either the receiving peer or one of the target peers.
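The forwarder-side bookkeeping described above can be pictured with a small Python model: the forwarder cuts its stream into blocks of roughly 64KB, keeps a copy of each sent block in memory, keeps sending while acknowledgments are outstanding, and frees a block only when the peer acknowledges it. This is a toy sketch under those assumptions, not Splunk's implementation. The class and method names are hypothetical, and the block size constant simply mirrors the approximate figure given above.

```python
BLOCK_SIZE = 64 * 1024  # "approximately 64kB" per block, as described above


class AckTrackingForwarder:
    """Toy model of forwarder-side bookkeeping with useACK=true (hypothetical)."""

    def __init__(self) -> None:
        self.next_id = 0
        self.unacked: dict[int, bytes] = {}  # copies held in memory until acknowledged

    def send(self, data: bytes) -> list[int]:
        """Cut data into blocks, send each one, and keep a copy of every block."""
        sent = []
        for start in range(0, len(data), BLOCK_SIZE):
            block_id = self.next_id
            self.next_id += 1
            self.unacked[block_id] = data[start:start + BLOCK_SIZE]
            sent.append(block_id)  # keep sending; earlier blocks may still be unacknowledged
        return sent

    def on_ack(self, block_id: int) -> None:
        """Peer indexed the block and attempted replication: release the copy."""
        self.unacked.pop(block_id, None)

    def pending(self) -> list[int]:
        """Blocks that would have to be resent if their acknowledgment never arrives."""
        return list(self.unacked)

    def bytes_held(self) -> int:
        return sum(len(block) for block in self.unacked.values())


fwd = AckTrackingForwarder()
ids = fwd.send(b"x" * (150 * 1024))  # 150 KB -> blocks of 64, 64 and 22 KB
fwd.on_ack(ids[0])
print(ids, fwd.pending(), fwd.bytes_held())  # [0, 1, 2] [1, 2] 88064
```

The memory held by unacknowledged blocks is what the maxQueueSize setting discussed earlier has to accommodate while the forwarder waits for acknowledgments.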
If there's a failure along the way, the forwarder does not receive the acknowledgment. It then automatically resends the block of data. If the forwarder is configured for load-balancing, it sends the block to another receiving node in the load-balanced group. If the forwarder is not set up for load-balancing, it attempts to resend data to the same node as before.
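The resend behavior can be sketched as a small Python function that returns the sequence of peers a single block is sent to until one of them acknowledges it. With load balancing, a missing acknowledgment moves the block to another peer in the group; without it, the block goes back to the same peer. The function name `deliver_block` and its `acknowledges` callback are hypothetical. As simplifications, the sketch tries the next peer in list order rather than another group member chosen by the forwarder, and it stops after `max_attempts` tries, whereas a real forwarder keeps resending.

```python
from typing import Callable


def deliver_block(
    peers: list[str],
    start: int,
    acknowledges: Callable[[str], bool],
    load_balanced: bool,
    max_attempts: int = 10,
) -> list[str]:
    """Return the peers a block is sent to, in order, until one acknowledges it.

    Toy model (hypothetical): acknowledges(peer) says whether that peer
    indexes the block and returns an acknowledgment.
    """
    attempts = []
    index = start
    for _ in range(max_attempts):
        peer = peers[index]
        attempts.append(peer)
        if acknowledges(peer):
            return attempts  # acknowledgment received: the block can be released
        if load_balanced:
            # Simplification: move to the next peer in list order.
            index = (index + 1) % len(peers)
        # Without load balancing, index is unchanged: resend to the same peer.
    # Only this model gives up; the text says a forwarder keeps resending.
    raise RuntimeError(f"no acknowledgment after {max_attempts} attempts: {attempts}")


peers = ["10.10.10.1:9997", "10.10.10.2:9997", "10.10.10.3:9997"]
down = {"10.10.10.1:9997"}
print(deliver_block(peers, 0, lambda p: p not in down, load_balanced=True))
# ['10.10.10.1:9997', '10.10.10.2:9997']
```

Calling the same function with `load_balanced=False` keeps retrying the failed peer, which is why load balancing matters when a receiving node goes down.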
Important: To ensure end-to-end data fidelity, you must explicitly enable indexer acknowledgment on each forwarder that's sending data to the cluster. You configure this in the forwarder's outputs.conf file, by setting the useACK attribute to true. If end-to-end data fidelity is not a requirement for your deployment, you can skip this step. However, you will receive warnings that you are not using indexer acknowledgment.
For more information on how indexer acknowledgment works, read "Protect against loss of in-flight data" in the Distributed Deployment Manual.
How load balancing works
In load balancing, the forwarder distributes incoming data across several receiving peer nodes. Each node gets a portion of the total data, and together the receiving nodes get all the data.
Splunk forwarders perform "automatic load balancing". The forwarder routes data to different nodes based on a specified time interval. For example, assume you have a load-balanced group consisting of three peer nodes: A, B, and C. At some specified interval, such as every 30 seconds, the forwarder switches the data stream to another node in the group, selected at random. So, the forwarder might switch from node B to node A to node C, and so on. If one node is down, the forwarder immediately switches to another.
Note: To expand on this a bit, there is a data stream for each of the inputs that the forwarder is configured to monitor. The forwarder determines if it is safe for a data stream to switch to another node. Then, at the specified interval, it switches the data stream to the newly selected node. If it cannot switch the data stream to the new node safely, it keeps the connection to the previous node open and continues to send the data stream until it has been safely sent.
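To make the timing concrete, the following Python sketch models automatic load balancing as a list of (second, peer) routing decisions. At every interval the stream moves to another peer chosen at random, and if the current peer goes down the stream moves immediately. The function name `lb_timeline` and its `fails_at` failure schedule are hypothetical. The model tracks a single stream and does not represent the per-input "safe to switch" check described in the note above.

```python
import math
import random
from typing import Optional


def lb_timeline(
    peers: list[str],
    interval: int,
    duration: int,
    fails_at: dict[str, int],
    rng: random.Random,
) -> list[tuple[int, str]]:
    """Toy model of automatic load balancing (hypothetical, not Splunk's code).

    Returns (second, peer) routing decisions. Every `interval` seconds the
    stream moves to a randomly chosen other peer; if the current peer goes
    down (fails_at[peer] <= t), the stream moves immediately.
    """

    def is_up(peer: str, t: int) -> bool:
        return fails_at.get(peer, math.inf) > t

    def pick(t: int, exclude: Optional[str]) -> str:
        live = [p for p in peers if is_up(p, t)]
        if not live:
            raise RuntimeError(f"no receiving peer is up at t={t}s")
        others = [p for p in live if p != exclude]
        return rng.choice(others or live)  # stay put only if no other peer is up

    current = pick(0, exclude=None)
    decisions = [(0, current)]
    for t in range(1, duration):
        if not is_up(current, t):
            current = pick(t, exclude=current)  # receiving node failed: switch now
            decisions.append((t, current))
        elif t % interval == 0:
            current = pick(t, exclude=current)  # scheduled switch
            decisions.append((t, current))
    return decisions


# Three peers, a 30-second interval, and peer A failing at second 45.
for second, peer in lb_timeline(["A", "B", "C"], 30, 100, {"A": 45}, random.Random(7)):
    print(second, peer)
```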
Load balancing is of key importance in a clustered deployment because it helps ensure that you don't lose any data in case of node failure. If a forwarder does not receive indexer acknowledgment from the node it is sending data to, it resends the data to the next available node in the load-balanced group.
For more information on forwarder load balancing, read "Set up load balancing" in the Distributed Deployment Manual. For information on how load balancing works with indexer acknowledgment, read "Protect against loss of in-flight data" in the Distributed Deployment Manual.
Configure inputs directly on the peers
If you decide not to use forwarders to handle your data inputs, you can set up inputs on each peer in the usual fashion; for example, by editing inputs.conf. For information on configuring inputs, read "Configure your inputs" in the Getting Data In Manual.
This documentation applies to the following versions of Splunk® Enterprise: 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18