Splunk® Enterprise

Distributed Deployment Manual

Splunk Enterprise version 5.0 reached its End of Life on December 1, 2017. Please see the migration information.
This documentation does not apply to the most recent version of Splunk.

Protect against loss of in-flight data

To guard against loss of data when forwarding to an indexer, you can use Splunk's indexer acknowledgment capability. With indexer acknowledgment, the forwarder will resend any data not acknowledged as "received" by the indexer.

This feature is disabled by default, because it can affect performance. You enable it in outputs.conf, as described later.

Indexer acknowledgment is available for all varieties of forwarders: universal, light, or heavy.

Note: Both forwarders and indexers must be at version 4.2 or higher for acknowledgment to function. Otherwise, the transmission between forwarder and indexer will proceed without acknowledgment.

Indexer acknowledgment and clusters

When using forwarders to send data to peer nodes in a cluster, you should ordinarily enable indexer acknowledgment. To learn more about forwarders and clusters, read "Use forwarders to get your data" in the Managing Indexers and Clusters Manual.

How indexer acknowledgment works when everything goes well

The forwarder sends data continuously to the indexer, in blocks of approximately 64KB. The forwarder maintains a copy of each block in memory, in its wait queue, until it gets an acknowledgment from the indexer. While waiting, it continues to send more data blocks.

If all goes well, the indexer:

1. Receives the block of data.

2. Parses the data.

3. Writes the data to the file system as events (raw data and index data).

4. Sends an acknowledgment to the forwarder.

The acknowledgment tells the forwarder that the indexer received the data and successfully wrote it to the file system. Upon receiving the acknowledgment, the forwarder releases the block from memory.

If the wait queue is of sufficient size, it doesn't fill up while waiting for acknowledgments to arrive. However, see "Indexer acknowledgment influence on forwarded data throughput" later in this topic for possible issues and ways to address them, including how to increase the wait queue size.

How indexer acknowledgment works when there's a failure

When there's a failure in the round-trip process, the forwarder does not receive an acknowledgment. It will then attempt to resend the block of data.

Why no acknowledgment?

These are the reasons that a forwarder might not receive acknowledgment:

  • Indexer goes down after receiving the data -- for instance, due to machine failure.
  • Indexer is unable to write to the file system -- for instance, because the disk is full.
  • Network goes down while acknowledgment is en route to the forwarder.

How the forwarder deals with failure

After sending a data block, the forwarder maintains a copy of the data in its wait queue until it receives an acknowledgment. In the meantime, it continues to send additional blocks as usual. If the forwarder doesn't get acknowledgment for a block within 300 seconds (by default), it closes the connection. You can change the wait time by setting the readTimeout attribute in outputs.conf.

If the forwarder is set up for auto load balancing, it then opens a connection to the next indexer in the group (if one is available) and sends the data to it. If the forwarder is not set up for auto load balancing, it attempts to open a connection to the same indexer as before and resend the data.

The forwarder maintains the data block in the wait queue until acknowledgment is received. Once the wait queue fills up, the forwarder stops sending additional blocks until it receives an acknowledgment for one of the blocks, at which point it can free up space in the queue.
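As a minimal sketch of the behavior described above, the following outputs.conf fragment defines a target group with two receivers and shortens the acknowledgment wait time. The group name my_indexers, the host names, the port, and the 120-second value are all illustrative assumptions, as is the choice of stanza for readTimeout; check outputs.conf.spec for your version before relying on it.

```ini
# outputs.conf on the forwarder
# (group name, addresses, and the 120-second value are hypothetical)
[tcpout:my_indexers]
# Two receivers, so that a load-balancing forwarder has another indexer
# to try after it closes a connection
server = idx1.example.com:9997, idx2.example.com:9997
useACK = true
# Close the connection if a block is not acknowledged within 120 seconds
# (the default is 300)
readTimeout = 120
```

A shorter timeout makes the forwarder give up on an unresponsive indexer sooner, but it can also trigger more resends (and so more possible duplicates) when the indexer is merely slow.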

Other reasons the forwarder might close a connection

There are actually three conditions that can cause the forwarder to close the network connection:

  • Read timeout. The forwarder doesn't receive acknowledgment within 300 (default) seconds. This is the condition described above.
  • Write timeout. The forwarder is not able to finish a network write within 300 (default) seconds. The value is configurable in outputs.conf by setting writeTimeout.
  • Read/write failure. Typical causes include the indexer's machine crashing or the network going down.

In all these cases, the forwarder will then attempt to open a connection to the next indexer in the load-balanced group, or to the same indexer again if load-balancing is not enabled.
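The two configurable timeouts can be set side by side. This is only a sketch: the values shown are illustrative rather than recommended, and placing the attributes in the global [tcpout] stanza is an assumption.

```ini
# outputs.conf on the forwarder (values are illustrative, not recommendations)
[tcpout]
useACK = true
# Read timeout: seconds to wait for an acknowledgment (default: 300)
readTimeout = 180
# Write timeout: seconds allowed for a network write to finish (default: 300)
writeTimeout = 180
```

The third condition, read/write failure, has no attribute of its own; it is detected when the connection itself breaks.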

The possibility of duplicates

It's possible for the indexer to index the same data block twice. This can happen if there's a network problem that prevents an acknowledgment from reaching the forwarder. For instance, assume the indexer receives a data block, parses it, and writes it to the file system. It then generates the acknowledgment. However, while the acknowledgment is on its way back to the forwarder, the network goes down, so the forwarder never receives it. When the network comes back up, the forwarder resends the data block, which the indexer parses and writes as if it were new data.

To deal with such a possibility, every time the forwarder resends a data block, it writes an event to its splunkd.log noting that it's a possible duplicate. The admin is responsible for using the log information to track down the duplicate data on the indexer.

Here's an example of a duplicate warning:

10-18-2010 17:32:36.941 WARN TcpOutputProc - Possible duplication of events with 
channel=source::/home/jkerai/splunk/current-install/etc/apps/sample_app
/logs/maillog.1|host::MrT|sendmail|, streamId=5941229245963076846, offset=131072 
subOffset=219 on host=10.1.42.2:9992

Enable indexer acknowledgment

You enable indexer acknowledgment solely on the forwarder. You do not set any attribute on the indexer side; it will send acknowledgments if the forwarder tells it to. (But remember, both the forwarder and the indexer must be at version 4.2 or greater.)

To enable indexer acknowledgment, set the useACK attribute to true in the forwarder's outputs.conf:

[tcpout:<target_group>]
server=<server1>, <server2>, ...
useACK=true
...

A value of useACK=true enables indexer acknowledgment.

By default, this feature is disabled: useACK=false

Note: You can set useACK either globally or by target group, at the [tcpout] or [tcpout:<target_group>] stanza levels. You cannot set it for individual servers at the [tcpout-server: ...] stanza level.
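The following sketch shows the two supported stanza levels together. The group names and addresses are hypothetical. The comments about a group-level value taking precedence over the global one reflect the usual layering of Splunk configuration stanzas and should be treated as an assumption to verify in your environment.

```ini
# outputs.conf on the forwarder (group names and addresses are hypothetical)
[tcpout]
# Global setting
useACK = true

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
# No useACK here, so this group is assumed to pick up the global value

[tcpout:best_effort]
server = idx3.example.com:9997
# Group-level setting, assumed to override the global value for this group only
useACK = false
```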

Indexer acknowledgment influence on forwarded data throughput

Indexer acknowledgment can reduce forwarder throughput in some scenarios. The following paragraphs describe how this can occur and what you can do to avoid it.

If you have enabled indexer acknowledgment on the forwarder through useACK and the receiving indexer is at version 4.2+, the forwarder will use a wait queue to manage the acknowledgment process. Otherwise, it won't have a wait queue. This section describes how to manage the wait queue for performance.

Because the forwarder sends data blocks continuously and does not wait for acknowledgment before sending the next block, its wait queue will typically maintain many blocks, each waiting for its acknowledgment. The forwarder will continue to send blocks until its wait queue is full, at which point it will stop forwarding. The forwarder then waits until it receives an acknowledgment, which allows it to release a block from its queue and thus resume forwarding.

A wait queue can fill up when something is wrong with the network or indexer; however, it can also fill up even though the indexer is functioning normally. This is because the indexer only sends the acknowledgment after it has written the data to the file system. Any delay in writing to the file system will slow the pace of acknowledgment, leading to a full wait queue.

There are a few reasons that a normally functioning indexer might delay writing data to the file system (and so delay its sending of acknowledgments):

  • The indexer is very busy. For example, at the time the data arrives, the indexer might be dealing with multiple search requests or with data coming from a large number of forwarders.
  • The indexer is receiving too little data. For efficiency, an indexer only writes to the file system periodically -- either when a write queue fills up or after a timeout of a few seconds. If a write queue is slow to fill up, the indexer will wait until the timeout to write. If data is coming from only a few forwarders, the indexer can end up in the timeout condition, even if each of those forwarders is sending a normal quantity of data. Since write queues exist on a per-hot-bucket basis, the condition occurs when some particular bucket is getting a small amount of data. Usually this means that a particular index is getting a small amount of data.

To ensure that throughput does not degrade because the forwarder is waiting on the indexer for acknowledgment, you might need to use a larger wait queue size, ensuring it has sufficient space to maintain all blocks in memory while waiting for acknowledgments to arrive. You'll need to experiment with the queue size that's right for your forwarder's specific environment. On the other hand, if you have many forwarders feeding a single indexer, and a moderate number of data sources per forwarder, you may be able to conserve a few megabytes of memory by using a smaller size.

Note: You cannot configure the size of the wait queue directly. Its size is always relative to the size of the in-memory output queue, as described below.

Configure the wait queue size

Important: To optimize the wait queue size for indexer acknowledgment, Splunk recommends that you increase the maxQueueSize attribute.

The maximum wait queue size is 3x the size of the in-memory output queue, which you set with the maxQueueSize attribute in outputs.conf:

maxQueueSize = [<integer>|<integer>[KB|MB|GB]]

For example, if you set maxQueueSize to 7MB, the maximum wait queue size will be 21MB.

Note the following:

  • This attribute sets the maximum size of the forwarder's in-memory (RAM) output queue. It also determines the maximum size of the wait queue, which is 3x the setting for the output queue.
  • If specified as a lone integer (for example, maxQueueSize=100), it determines the maximum number of queued events (for parsed data) or blocks of data (for unparsed data). A block of data is approximately 64KB. For forwarders sending unparsed data (mainly universal forwarders), maxQueueSize is the maximum number of data blocks. For heavy forwarders sending parsed data, maxQueueSize is the maximum number of events. Since events are typically much shorter than data blocks, the memory consumed by the output and wait queues on a parsing forwarder will likely be much smaller than on a non-parsing forwarder, if you use this version of the setting.
  • If specified as an integer followed by KB, MB, or GB (for example, maxQueueSize=100MB), it determines the maximum RAM allocated to the output queue and, indirectly, to the wait queue. If configured as maxQueueSize=100MB, the maximum size of the output queue will be 100MB and the maximum size of the wait queue, if any, will be 300MB.
  • maxQueueSize defaults to 500KB. The default wait queue size is 3x that amount: 1500KB.

Although the wait queue and the output queues are configured by the same attribute, they are separate queues.
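To make the sizing rules above concrete, here is a sketch of the alternative forms of maxQueueSize, with the resulting queue sizes worked out in comments. Only one form would be active at a time. The figures for the count form are rough estimates based on the approximate 64KB block size.

```ini
# outputs.conf on the forwarder: alternative forms of maxQueueSize (use one)
[tcpout]
useACK = true

# Size form: output queue up to 7MB, wait queue up to 3 x 7MB = 21MB
maxQueueSize = 7MB

# Default, for comparison: output queue 500KB, wait queue 3 x 500KB = 1500KB
# maxQueueSize = 500KB

# Count form on a forwarder sending unparsed data: up to 100 blocks of
# roughly 64KB each, so about 6400KB in the output queue and up to about
# 3 x 6400KB = 19200KB in the wait queue
# maxQueueSize = 100
```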

Important: In forwarder versions prior to 5.0.4, to ensure that the forwarder does not get blocked while waiting for acknowledgment from an indexer, you should set the maxQueueSize to 7MB:

[tcpout]
maxQueueSize = 7MB

Forwarders at version 5.0.4 and later use a maxQueueSize setting of 'auto', which changes to 7MB when useACK is enabled.

Note the following points regarding this recommendation:

  • This assumes that no thruput limit has been set in the forwarder's limits.conf file.
  • This configuration will cause memory consumption to go up by about 28MB (roughly 7MB for the output queue plus 21MB for the wait queue).
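For reference, a thruput limit is set in the [thruput] stanza of limits.conf. The fragment below is a sketch of what "no thruput limit" might look like on the forwarder; the reading of 0 as "no limit" should be confirmed against limits.conf.spec for your version.

```ini
# limits.conf on the forwarder
[thruput]
# A non-zero value caps forwarding at that many kilobytes per second;
# 0 is assumed here to mean that no limit is applied
maxKBps = 0
```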

Important: If you're enabling indexer acknowledgment, be careful to take into account your system's available memory when setting maxQueueSize. You'll need to accommodate 4x the maxQueueSize setting (1x for the output queue + 3x for the wait queue).
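As a worked example of the 4x rule, consider an illustrative setting of 20MB (the value is an assumption chosen only to show the arithmetic):

```ini
# outputs.conf on the forwarder (20MB is an illustrative value)
[tcpout]
useACK = true
# Output queue: up to 20MB
# Wait queue:   up to 3 x 20MB = 60MB
# Total memory to budget for: 4 x 20MB = 80MB
maxQueueSize = 20MB
```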

When the receiver is a forwarder, not an indexer

You can also use indexer acknowledgment when the receiving instance is an intermediate forwarder, instead of an indexer.

Assume you have an originating forwarder that sends data to an intermediate forwarder, which in turn forwards that data to an indexer. There are two main possibilities to consider:

  • The originating forwarder and the intermediate forwarder both have acknowledgment enabled. In this case, the intermediate forwarder waits until it receives acknowledgment from the indexer and then sends acknowledgment back to the originating forwarder.
  • The originating forwarder has acknowledgment enabled; the intermediate forwarder does not. In this case, the intermediate forwarder sends acknowledgment back to the originating forwarder as soon as it sends the data on to the indexer. It relies on TCP to safely deliver the data to the indexer. Because it doesn't itself have useACK enabled, the intermediate forwarder cannot verify delivery of the data to the indexer. This use case has limited value and is not recommended. If you use indexer acknowledgment, you should generally enable it on all forwarding tiers. That is the only way to ensure that data gets delivered all the way from the originating forwarder to the indexer.
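The recommended arrangement, with acknowledgment enabled on every forwarding tier, might be sketched as follows. The group names, host names, and port 9997 are hypothetical, and the three fragments belong to separate files on separate machines.

```ini
# --- Originating forwarder: outputs.conf ---
[tcpout:intermediate_tier]
server = intermediate.example.com:9997
useACK = true

# --- Intermediate forwarder: inputs.conf ---
# Listens for data from the originating forwarder
[splunktcp://9997]

# --- Intermediate forwarder: outputs.conf ---
[tcpout:indexer_tier]
server = idx1.example.com:9997
# With useACK enabled here as well, the intermediate forwarder waits for the
# indexer's acknowledgment before acknowledging the originating forwarder
useACK = true
```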

This documentation applies to the following versions of Splunk® Enterprise: 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18

