Configure automatic load balancing
This documentation does not apply to the most recent version of Splunk.
You can configure Splunk so that forwarders sending data to indexers are automatically load balanced across multiple Splunk indexers. This means that if an indexer goes down or is otherwise unavailable (network or power issues, for example), a forwarder needing to send data to an indexer will be directed to the next indexer that is available.
You can either define a static list of available indexers in outputs.conf on each forwarder, or specify a single indexer name in outputs.conf on each forwarder and define the list of indexers that respond to that name in DNS. Each option has its benefits:
- Keeping the list of available indexers in your DNS record makes it easier to add or remove indexers as your indexing bandwidth needs change.
- Providing a static list of indexing hosts on each forwarder allows you to specify a different port for each indexer; if you are running multiple Splunk indexers on a single host, you can distribute the forwarding load across them by specifying a different port for each instance.
How it works
Output settings for forwarders are configured in outputs.conf. Since there is no default copy of outputs.conf, you can create a standard outputs.conf that applies to all your forwarders and then copy it into $SPLUNK_HOME/etc/system/local/ on each forwarding instance. You can use a copy of outputs.conf.spec as a starting point, but be sure to thoroughly review and edit it to meet your deployment's needs. The information in outputs.conf.spec tells you what the default values are for the options.
To set up automatic load balancing for a given forwarder, you specify a global configuration stanza in outputs.conf (described below), and then define one or more target groups of indexers. Any configuration you set in the global stanza applies to all receiving indexers unless you specify a different configuration for an individual indexer or target group of indexers. Configurations you define for specific indexers or groups of indexers override anything you've defined at a higher level. You can then enable and configure automatic load-balancing in either the global stanza or on a per-target group basis.
Why multiple target groups? If you have more than one target group of indexers, a given forwarder sends a copy of its data to every target group. This can be useful in situations where you have two different sets of Splunk indexers. For example, if you are planning to upgrade your Splunk deployment to a newer version and want to test the new version alongside your existing deployment, this configuration allows you to send the same data to both sets of indexers. Or maybe you just want a backup instance of your indexed data.
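As a rough illustration of the two-group case, an outputs.conf might look like the sketch below. The group names current_indexers and test_indexers, the host names, and the ports are hypothetical. Listing both groups in defaultGroup as a comma-separated value is an assumption; check outputs.conf.spec for your version before relying on it.

```ini
# Sketch only: group names, hosts, and ports are placeholders.
[tcpout]
# Assumption: defaultGroup accepts a comma-separated list of target groups.
defaultGroup = current_indexers, test_indexers
disabled = false

# Existing deployment
[tcpout:current_indexers]
server = indexer-old.example.com:9995

# New version under test; receives a copy of the same data
[tcpout:test_indexers]
server = indexer-new.example.com:9995
```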
Note: If you drop a copy of outputs.conf into $SPLUNK_HOME/etc/system/local/ and don't specify anything but the stanza headings, Splunk will write some default values into the file. The values in the examples below don't necessarily reflect those defaults.
Global configuration
The global configuration is set in the [tcpout] stanza. The global configuration applies unless it is overridden by the configurations for the specific target group(s) or individual servers:
[tcpout]
defaultGroup = my_indexers
disabled = false
prependKeyToRaw = <key>
Things to know:
- If you copy an outputs.conf that includes just this stanza into $SPLUNK_HOME/etc/system/local/ on a forwarder, you've defined a default target group of indexers called my_indexers to which this forwarder sends its data. You can now define automatic load balancing and other behaviors for that target group as explained below.
- prependKeyToRaw is optional. If you set a key here, Splunk looks in every event for that key. If an event contains this key, the value is prepended to the raw data that is sent out to the indexing server. This ONLY works if sendCookedData = false. The key/value pair and how it is derived are set in props.conf and transforms.conf. You might want to use this to prepend <priority> to a syslog event that has been obtained by monitoring a syslog file and is being sent out to a syslog server. If you don't have a specific need to identify events in this way, don't include this.
Target group configuration
Set target group-specific details in a stanza named for that target group:
[tcpout:my_indexers]
disabled = false
autoLB = true
autoLBFrequency = 10
server = fflanda-lb.fflanda.com:9995
- autoLB
  - If this is set to true, automatic load balancing is enabled.
- autoLBFrequency
  - This is the interval in seconds after which a new indexer is randomly chosen from the list of available indexers.
- server
  - This is either a single indexer, or a list of indexers.
  - Splunk recommends that you specify the fully qualified domain name here. Splunk resolves names to IP addresses and remembers them so that it can load balance among them.
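As an alternative to pointing server at a single DNS name, a static list might be written as in the sketch below. The host name and ports are hypothetical, and the comma-separated form of the list is an assumption based on server accepting "a list of indexers"; confirm the exact syntax in outputs.conf.spec.

```ini
[tcpout:my_indexers]
disabled = false
autoLB = true
autoLBFrequency = 10
# Assumption: multiple indexers are listed as comma-separated host:port pairs.
# Two Splunk instances on the same (hypothetical) host, each on its own port.
server = splunk-host.example.com:9995, splunk-host.example.com:9996
```

With a list like this, the forwarder would pick among the listed host:port pairs rather than among the addresses returned by DNS.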
What the DNS record looks like
If you use fully qualified domain names, and have three indexers for forwarders to distribute their data to, your DNS record would look like this:
fflanda-lb A 10.10.10.1
fflanda-lb A 10.10.10.2
fflanda-lb A 10.10.10.3
and the nslookup output would look like this:
$ nslookup fflanda-lb
Server: 127.0.0.1
Address: 127.0.0.1#53

Name: fflanda-lb.fflanda.com
Address: 10.10.10.2
Name: fflanda-lb.fflanda.com
Address: 10.10.10.3
Name: fflanda-lb.fflanda.com
Address: 10.10.10.1
and after the time specified in autoLBFrequency passes, the nslookup output would look like this (the order changes):
$ nslookup fflanda-lb
Server: 127.0.0.1
Address: 127.0.0.1#53

Name: fflanda-lb.fflanda.com
Address: 10.10.10.3
Name: fflanda-lb.fflanda.com
Address: 10.10.10.1
Name: fflanda-lb.fflanda.com
Address: 10.10.10.2
- If an indexer is unavailable, Splunk keeps monitoring the indexer. When it does come up, it is added to the list of available indexers again.
- If the connection to an indexer is disrupted while it is in use, the behavior depends: if a chunk of data was already partially sent to the indexer, Splunk will try to send the rest of the chunk to the same indexer. If the connection is disrupted and restored within 5 seconds, subsequent chunks of data are sent over the restored connection. If the connection becomes unavailable for 5 seconds or more, the rest of the data is sent to a newly chosen available indexer.
Optional attributes for target groups
These attributes are optional. You can set these attributes in the global stanza, or on a per-target group basis:
- sendCookedData=true/false
  - If true, events are cooked (have been processed by Splunk and are not raw)
  - If false, events are raw and untouched prior to sending
  - Defaults to true
- heartbeatFrequency=60
  - How often in seconds to send a heartbeat packet to the receiver
  - Heartbeats are only sent if sendCookedData=true
  - Defaults to 30 seconds
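To illustrate how a per-target-group setting overrides the global stanza, here is a minimal sketch. The values are arbitrary and the server name reuses the example from this page; it is not a recommended configuration.

```ini
[tcpout]
defaultGroup = my_indexers
# Global values: apply to every target group unless overridden.
sendCookedData = true
heartbeatFrequency = 30

[tcpout:my_indexers]
server = fflanda-lb.fflanda.com:9995
# Overrides the global heartbeatFrequency for this group only.
heartbeatFrequency = 60
```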
Queue settings for target groups
When forwarders send data, the data enters a queue as it leaves the forwarder. If no indexers are available to receive the data, these settings determine what is done with the queued data. You can set these attributes in the global stanza, or on a per-target group basis:
- maxQueueSize=20000
  - The maximum number of queued events (queue size)
  - Defaults to 1000
- dropEventsOnQueueFull=10
  - Wait N * 5 seconds before throwing out all new events until the queue has space.
  - Setting this to -1 or 0 causes the queue to block when it gets full, which blocks up the processor chain.
  - When any target group's queue is blocked, no more data will reach any other target group.
  - Using load-balanced groups is the best way to alleviate this condition, because multiple receivers must be down (or jammed up) before queue blocking occurs.
  - Defaults to -1 (do not drop events)
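The queue settings above could be combined as in the following sketch, which simply reuses the example values and server name from this page. The wait time in the comment follows from the N * 5 rule stated above.

```ini
[tcpout:my_indexers]
server = fflanda-lb.fflanda.com:9995
# Hold up to 20000 events while no indexer is reachable.
maxQueueSize = 20000
# With N = 10, wait 10 * 5 = 50 seconds on a full queue before dropping new events.
dropEventsOnQueueFull = 10
```

Leaving dropEventsOnQueueFull at its default of -1 instead would keep every event but block the queue when it fills.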
Backoff settings for target groups
If an indexer in a target group becomes unreachable, you can configure the forwarder to retry the connection. When a connection must be retried, the forwarder uses backoffAtStartup or initialBackoff as the number of seconds to wait. After this time expires, the forwarder doubles the number of seconds on each subsequent retry until it reaches maxBackoff. At that point, the forwarder stops doubling the number of seconds between retries and waits maxBackoff seconds each time. It retries at this frequency maxNumberOfRetriesAtHighestBackoff times, or forever if that value is -1.
So if initialBackoff is set to 2 seconds, maxBackoff is set to 20 seconds, and maxNumberOfRetriesAtHighestBackoff is set to -1, then the forwarder waits 2, 4, 8, and 16 seconds between successive retries. The next wait is capped at 20 seconds, and the forwarder keeps retrying at this frequency indefinitely until the connection is made.
- backoffAtStartup=N
  - Takes effect only at Splunk startup.
  - Defines how many seconds to wait until retrying the first time a retry is needed.
  - Defaults to 5 seconds
- initialBackoff=N
  - Takes effect for a running Splunk forwarder.
  - Defines how many seconds to wait until retrying every time other than the first time a retry is needed.
  - Defaults to 2 seconds
- maxBackoff=N
  - Specifies the number of seconds before reaching the maximum backoff frequency.
  - Defaults to 20
- maxNumberOfRetriesAtHighestBackoff=N
  - Specifies the number of times the system should retry after reaching the highest backoff period before stopping completely.
  - -1 means to try forever.
  - It is suggested that you never change this from the default, or the forwarder will completely stop forwarding to a downed URI at some point.
  - Defaults to -1 (forever)
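Put together, the backoff attributes might appear in a target group stanza as sketched below. The values are the defaults listed above, the server name reuses the example from this page, and the schedule in the comments restates the description in this section rather than observed forwarder behavior.

```ini
[tcpout:my_indexers]
server = fflanda-lb.fflanda.com:9995
backoffAtStartup = 5
initialBackoff = 2
maxBackoff = 20
maxNumberOfRetriesAtHighestBackoff = -1
# Waits for a running forwarder, per the description above:
#   2, 4, 8, 16 seconds (doubling each time), then 20 seconds
#   for every further retry, indefinitely because of -1.
```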
This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10, 4.0.11