Protect against loss of in-flight data
To guard against loss of data when forwarding to an indexer, you can use Splunk's indexer acknowledgment capability. With indexer acknowledgment, the forwarder will resend any data not acknowledged as "received" by the indexer.
This feature is disabled by default, because it can affect performance. You enable it in outputs.conf, as described later.
Indexer acknowledgment is available for all varieties of forwarders: universal, light, or heavy.
Note: Both forwarders and indexers must be at version 4.2 or higher for acknowledgment to function. Otherwise, the transmission between forwarder and indexer will proceed without acknowledgment.
Indexer acknowledgment and clusters
When using forwarders to send data to peer nodes in a cluster, you should ordinarily enable indexer acknowledgment. To learn more about forwarders and clusters, read "Use forwarders to get your data" in the Managing Indexers and Clusters Manual.
How indexer acknowledgment works when everything goes well
The forwarder sends data continuously to the indexer, in blocks of approximately 64KB. The forwarder maintains a copy of each block in memory, in its wait queue, until it gets an acknowledgment from the indexer. While waiting, it continues to send more data blocks.
If all goes well, the indexer:
1. Receives the block of data.
2. Parses the data.
3. Writes the data to the file system as events (raw data and index data).
4. Sends an acknowledgment to the forwarder.
The acknowledgment tells the forwarder that the indexer received the data and successfully wrote it to the file system. Upon receiving the acknowledgment, the forwarder releases the block from memory.
If the wait queue is of sufficient size, it doesn't fill up while waiting for acknowledgments to arrive. For possible issues and ways to address them, including how to increase the wait queue size, see "Indexer acknowledgment influence on forwarded data throughput" later in this topic.
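The happy path described above can be sketched in a few lines of Python. This is an illustrative model only, not Splunk's implementation; the `Forwarder` class, its method names, and the integer block IDs are hypothetical.

```python
class Forwarder:
    """Toy model of a forwarder that keeps unacknowledged blocks in memory."""

    def __init__(self):
        self.wait_queue = {}   # block_id -> copy of the data awaiting acknowledgment
        self.next_id = 0

    def send(self, data):
        """Send a block and keep a copy until the indexer acknowledges it."""
        block_id = self.next_id
        self.next_id += 1
        self.wait_queue[block_id] = data
        return block_id

    def on_ack(self, block_id):
        """The indexer wrote the block to the file system, so release the copy."""
        self.wait_queue.pop(block_id, None)


fwd = Forwarder()
first = fwd.send(b"block A")
second = fwd.send(b"block B")   # sent without waiting for the first acknowledgment
fwd.on_ack(first)
print(sorted(fwd.wait_queue))   # only the second block is still held
```

The point of the sketch is that sending and acknowledging are decoupled: the forwarder keeps sending while earlier blocks are still waiting in the queue.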
How indexer acknowledgment works when there's a failure
When there's a failure in the round-trip process, the forwarder does not receive an acknowledgment. It will then attempt to resend the block of data.
Why no acknowledgment?
These are the reasons that a forwarder might not receive acknowledgment:
- Indexer goes down after receiving the data -- for instance, due to machine failure.
- Indexer is unable to write to the file system -- for instance, because the disk is full.
- Network goes down while acknowledgment is en route to the forwarder.
How the forwarder deals with failure
After sending a data block, the forwarder maintains a copy of the data in its wait queue until it receives an acknowledgment. In the meantime, it continues to send additional blocks as usual. If the forwarder doesn't get acknowledgment for a block within 300 seconds (by default), it closes the connection. You can change the wait time by setting the readTimeout attribute.
If the forwarder is set up for auto load balancing, it then opens a connection to the next indexer in the group (if one is available) and sends the data to it. If the forwarder is not set up for auto load balancing, it attempts to open a connection to the same indexer as before and resend the data.
The forwarder maintains the data block in the wait queue until acknowledgment is received. Once the wait queue fills up, the forwarder stops sending additional blocks until it receives an acknowledgment for one of the blocks, at which point it can free up space in the queue.
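As a rough illustration of the read timeout, the following Python sketch finds blocks that have waited longer than the timeout for an acknowledgment. The function name, the `sent_at` mapping, and the use of plain seconds as timestamps are assumptions made for the example; only the 300-second default comes from the text.

```python
READ_TIMEOUT_SECONDS = 300  # default wait time described in the text


def timed_out_blocks(sent_at, now, read_timeout=READ_TIMEOUT_SECONDS):
    """Return IDs of blocks that have waited longer than read_timeout for an ack.

    sent_at maps block_id -> time (in seconds) at which the block was sent.
    """
    return sorted(b for b, t in sent_at.items() if now - t > read_timeout)


sent_at = {1: 0.0, 2: 120.0, 3: 290.0}
print(timed_out_blocks(sent_at, now=310.0))   # block 1 has waited 310 seconds
```

In this model, any non-empty result would be the trigger for closing the connection and resending the affected blocks.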
Other reasons the forwarder might close a connection
There are actually three conditions that can cause the forwarder to close the network connection:
- Read timeout. The forwarder doesn't receive acknowledgment within 300 seconds (the default). This is the condition described above.
- Write timeout. The forwarder is not able to finish a network write within 300 seconds (the default). This value is configurable.
- Read/write failure. Typical causes include the indexer's machine crashing or the network going down.
In all these cases, the forwarder will then attempt to open a connection to the next indexer in the load-balanced group, or to the same indexer again if load-balancing is not enabled.
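The reconnection rule can be sketched as a small Python function. The function name and the server addresses are hypothetical, and picking the next server in list order is an assumption; the text only says that the forwarder moves to "the next indexer" in the group.

```python
def next_target(servers, current, auto_lb):
    """Pick the indexer to reconnect to after the connection is closed."""
    if not auto_lb or len(servers) == 1:
        return current                        # retry the same indexer
    idx = servers.index(current)
    return servers[(idx + 1) % len(servers)]  # move on to the next in the group


servers = ["idx1:9997", "idx2:9997", "idx3:9997"]   # hypothetical addresses
print(next_target(servers, "idx1:9997", auto_lb=True))
print(next_target(servers, "idx1:9997", auto_lb=False))
```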
The possibility of duplicates
It's possible for the indexer to index the same data block twice. This can happen if there's a network problem that prevents an acknowledgment from reaching the forwarder. For instance, assume the indexer receives a data block, parses it, and writes it to the file system. It then generates the acknowledgment. However, on the round-trip to the forwarder, the network goes down, so the forwarder never receives the acknowledgment. When the network comes back up, the forwarder then resends the data block, which the indexer will parse and write as if it were new data.
To deal with such a possibility, every time the forwarder resends a data block, it writes an event to its splunkd.log noting that it's a possible duplicate. The admin is responsible for using the log information to track down the duplicate data on the indexer.
Here's an example of a duplicate warning:
10-18-2010 17:32:36.941 WARN TcpOutputProc - Possible duplication of events with channel=source::/home/jkerai/splunk/current-install/etc/apps/sample_app /logs/maillog.1|host::MrT|sendmail|, streamId=5941229245963076846, offset=131072 subOffset=219 on host=10.1.42.2:9992
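When tracking down duplicates, it can help to pull the fields out of these warnings programmatically. The following Python sketch assumes that every warning follows the layout of the single example above; the regular expression and the field names are inferred from that one line, not from a documented log format.

```python
import re

# Pattern inferred from the example warning; treat it as an assumption.
DUPLICATE_WARNING = re.compile(
    r"Possible duplication of events with channel=(?P<channel>.*), "
    r"streamId=(?P<stream_id>\d+), offset=(?P<offset>\d+) "
    r"subOffset=(?P<sub_offset>\d+) on host=(?P<host>\S+)"
)


def parse_duplicate_warning(line):
    """Return the fields of a duplicate warning, or None if the line is not one."""
    match = DUPLICATE_WARNING.search(line)
    return match.groupdict() if match else None


sample = (
    "10-18-2010 17:32:36.941 WARN TcpOutputProc - Possible duplication of events "
    "with channel=source::/home/jkerai/splunk/current-install/etc/apps/sample_app "
    "/logs/maillog.1|host::MrT|sendmail|, streamId=5941229245963076846, "
    "offset=131072 subOffset=219 on host=10.1.42.2:9992"
)
fields = parse_duplicate_warning(sample)
print(fields["host"], fields["stream_id"], fields["offset"])
```

The host value identifies the indexer that might hold the duplicate, and the channel identifies the source, host, and sourcetype to search for.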
Enable indexer acknowledgment
You enable indexer acknowledgment solely on the forwarder. You do not set any attribute on the indexer side; it will send acknowledgments if the forwarder tells it to. (But remember, both the forwarder and the indexer must be at version 4.2 or greater.)
To enable indexer acknowledgment, set the useACK attribute to true in the forwarder's outputs.conf:
[tcpout:<target_group>]
server=<server1>, <server2>, ...
useACK=true
...
A value of useACK=true enables indexer acknowledgment. By default, this feature is disabled.
Note: You can set useACK either globally or by target group, at the [tcpout] or [tcpout:<target_group>] stanza levels. You cannot set it for individual servers at the [tcpout-server: ...] stanza level.
Indexer acknowledgment influence on forwarded data throughput
Indexer acknowledgment can limit and/or reduce forwarder throughput in some scenarios. Here, we describe how this can occur and steps that you should take to avoid this.
If you have enabled indexer acknowledgment on the forwarder through useACK and the receiving indexer is at version 4.2+, the forwarder will use a wait queue to manage the acknowledgment process. Otherwise, it won't have a wait queue. This section describes how to manage the wait queue for performance.
Because the forwarder sends data blocks continuously and does not wait for acknowledgment before sending the next block, its wait queue will typically maintain many blocks, each waiting for its acknowledgment. The forwarder will continue to send blocks until its wait queue is full, at which point it will stop forwarding. The forwarder then waits until it receives an acknowledgment, which allows it to release a block from its queue and thus resume forwarding.
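The stall described above can be modeled with a bounded queue. This Python sketch is a simplified illustration, not Splunk's implementation; the `BoundedWaitQueue` class and its methods are hypothetical, and the 64KB block size is the approximate figure given earlier.

```python
class BoundedWaitQueue:
    """Toy wait queue that refuses new blocks once its byte budget is used up."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.pending = {}          # block_id -> size in bytes

    def used(self):
        return sum(self.pending.values())

    def try_send(self, block_id, size):
        """Return True if the block was sent, False if forwarding must pause."""
        if self.used() + size > self.max_bytes:
            return False
        self.pending[block_id] = size
        return True

    def ack(self, block_id):
        """An acknowledgment arrived, so release the block's space."""
        self.pending.pop(block_id, None)


BLOCK = 64 * 1024                        # approximate block size from the text
queue = BoundedWaitQueue(max_bytes=3 * BLOCK)
results = [queue.try_send(i, BLOCK) for i in range(4)]
print(results)                           # the fourth send is refused
queue.ack(0)                             # an acknowledgment frees one slot
print(queue.try_send(3, BLOCK))          # forwarding can resume
```

A larger `max_bytes` lets the forwarder ride out slower acknowledgments at the cost of more memory, which is the trade-off the rest of this section discusses.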
A wait queue can fill up when something is wrong with the network or indexer; however, it can also fill up even though the indexer is functioning normally. This is because the indexer only sends the acknowledgment after it has written the data to the file system. Any delay in writing to the file system will slow the pace of acknowledgment, leading to a full wait queue.
There are a few reasons that a normally functioning indexer might delay writing data to the file system (and so delay its sending of acknowledgments):
- The indexer is very busy. For example, at the time the data arrives, the indexer might be dealing with multiple search requests or with data coming from a large number of forwarders.
- The indexer is receiving too little data. For efficiency, an indexer only writes to the file system periodically -- either when a write queue fills up or after a timeout of a few seconds. If a write queue is slow to fill up, the indexer will wait until the timeout to write. If data is coming from only a few forwarders, the indexer can end up in the timeout condition, even if each of those forwarders is sending a normal quantity of data. Since write queues exist on a per hot bucket basis, the condition occurs when some particular bucket is getting a small amount of data. Usually this means that a particular index is getting a small amount of data.
To ensure that throughput does not degrade because the forwarder is waiting on the indexer for acknowledgment, you might need to use a larger wait queue size, ensuring it has sufficient space to maintain all blocks in memory while waiting for acknowledgments to arrive. You'll need to experiment with the queue size that's right for your forwarder's specific environment. On the other hand, if you have many forwarders feeding a single indexer, and a moderate number of data sources per forwarder, you may be able to conserve a few megabytes of memory by using a smaller size.
Note: You cannot configure the size of the wait queue directly. Its size is always relative to the size of the in-memory output queue, as described below.
Configure the wait queue size
Important: To optimize the wait queue size for indexer acknowledgment, Splunk recommends that you increase the maxQueueSize attribute.
The maximum wait queue size is 3x the size of the in-memory output queue, which you set with the maxQueueSize attribute:
maxQueueSize = [<integer>|<integer>[KB|MB|GB]]
For example, if you set maxQueueSize to 7MB, the maximum wait queue size will be 21MB.
Note the following:
- This attribute sets the maximum size of the forwarder's in-memory (RAM) output queue. It also determines the maximum size of the wait queue, which is 3x the setting for the output queue.
- If specified as a lone integer (for example, maxQueueSize=100), it determines the maximum number of queued events (for parsed data) or blocks of data (for unparsed data). A block of data is approximately 64KB. For forwarders sending unparsed data (mainly universal forwarders), maxQueueSize is the maximum number of data blocks. For heavy forwarders sending parsed data, maxQueueSize is the maximum number of events. Since events are typically much shorter than data blocks, the memory consumed by the output and wait queues on a parsing forwarder will likely be much smaller than on a non-parsing forwarder, if you use this version of the setting.
- If specified as an integer followed by KB, MB, or GB (for example, maxQueueSize=100MB), it determines the maximum RAM allocated to the output queue and, indirectly, to the wait queue. If configured as maxQueueSize=100MB, the maximum size of the output queue will be 100MB and the maximum size of the wait queue, if any, will be 300MB.
maxQueueSize defaults to 500KB. The default wait queue size is 3x that amount: 1500KB.
Although the wait queue and the output queues are configured by the same attribute, they are separate queues.
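To make the two forms of the setting concrete, the following Python sketch parses a maxQueueSize value and derives the maximum wait queue size from it. The function names are hypothetical, and two details are assumptions rather than statements from the text: that 1KB means 1024 bytes, and that the 3x factor applies to the lone-integer form in the same way as to the suffixed form.

```python
import re

WAIT_QUEUE_FACTOR = 3  # the wait queue is 3x the output queue, per the text


def parse_max_queue_size(value):
    """Parse a maxQueueSize value into (amount, unit).

    unit is "bytes" for values with a KB/MB/GB suffix and "items" for a lone
    integer (events for parsed data, blocks for unparsed data).
    """
    match = re.fullmatch(r"\s*(\d+)\s*(KB|MB|GB)?\s*", value)
    if match is None:
        raise ValueError(f"unrecognized maxQueueSize value: {value!r}")
    number, suffix = int(match.group(1)), match.group(2)
    if suffix is None:
        return number, "items"
    # Assumption: binary multiples (1KB = 1024 bytes).
    multiplier = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}[suffix]
    return number * multiplier, "bytes"


def max_wait_queue_size(value):
    """Return the maximum wait queue size implied by a maxQueueSize value."""
    amount, unit = parse_max_queue_size(value)
    return amount * WAIT_QUEUE_FACTOR, unit


for setting in ("500KB", "7MB", "100"):
    print(setting, "->", max_wait_queue_size(setting))
```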
Important: To ensure that the forwarder does not get blocked while waiting for acknowledgment from an indexer, you should set maxQueueSize to 7MB:
[tcpout]
maxQueueSize = 7MB
Note the following points regarding this recommendation:
- This assumes that no thruput limit has been set on the forwarder.
- This configuration will cause memory consumption to go up by about 28MB.
Important: If you're enabling indexer acknowledgment, be careful to take into account your system's available memory when setting maxQueueSize. You'll need to accommodate 4x the maxQueueSize setting (1x for the output queue + 3x for the wait queue).
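The 4x rule is simple arithmetic, shown here as a short Python sketch. The function name is hypothetical, and the calculation covers only the two queues, not any other memory the forwarder uses.

```python
def ack_memory_budget_mb(max_queue_size_mb):
    """Memory to reserve, in MB: 1x for the output queue plus 3x for the wait queue."""
    output_queue = max_queue_size_mb
    wait_queue = 3 * max_queue_size_mb
    return output_queue + wait_queue


print(ack_memory_budget_mb(7))   # 28, matching the figure given for the 7MB setting
```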
When the receiver is a forwarder, not an indexer
You can also use indexer acknowledgment when the receiving instance is an intermediate forwarder, instead of an indexer.
Assume you have an originating forwarder that sends data to an intermediate forwarder, which in turn forwards that data to an indexer. There are two main possibilities to consider:
- The originating forwarder and the intermediate forwarder both have acknowledgment enabled. In this case, the intermediate forwarder waits until it receives acknowledgment from the indexer and then sends acknowledgment back to the originating forwarder.
- The originating forwarder has acknowledgment enabled; the intermediate forwarder does not. In this case, the intermediate forwarder sends acknowledgment back to the originating forwarder as soon as it sends the data on to the indexer. It relies on TCP to safely deliver the data to the indexer. Because it doesn't itself have useACK enabled, the intermediate forwarder cannot verify delivery of the data to the indexer. This use case has limited value and is not recommended. If you use indexer acknowledgment, you should generally enable it on all forwarding tiers. That is the only way to ensure that data gets delivered all the way from the originating forwarder to the indexer.
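The two cases can be summarized in a small Python sketch. The function names and the returned strings are hypothetical; the logic restates the rule above that delivery is verified end to end only when every forwarding tier has useACK enabled.

```python
def intermediate_ack_point(intermediate_use_ack):
    """Describe when the intermediate forwarder acknowledges the originating one."""
    if intermediate_use_ack:
        return "after the indexer acknowledges the data"
    return "as soon as the data is sent on to the indexer"


def delivery_verified_end_to_end(use_ack_by_tier):
    """Return True only if every forwarding tier has acknowledgment enabled.

    use_ack_by_tier lists the useACK setting of each forwarder on the path,
    from the originating forwarder to the last forwarder before the indexer.
    """
    return all(use_ack_by_tier)


print(intermediate_ack_point(True), "|", delivery_verified_end_to_end([True, True]))
print(intermediate_ack_point(False), "|", delivery_verified_end_to_end([True, False]))
```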