Use indexer discovery to connect forwarders to peer nodes
Indexer discovery streamlines the process of connecting forwarders to peer nodes in indexer clusters. It simplifies the set-up and maintenance of an indexer cluster. See Advantages of the indexer discovery method. Indexer discovery is available only for forwarding to indexer clusters.
Each forwarder queries the manager node for a list of all peer nodes in the cluster. It then uses load balancing to forward data to the set of peer nodes. In the case of a multisite cluster, a forwarder can query the manager for a list of all peers on a single site.
How indexer discovery works
Briefly, the process works like this:
1. The peer nodes provide the manager node with information on their receiving ports.
2. The forwarders poll the manager at regular intervals for the list of available peer nodes. You can adjust this interval. See Adjust the frequency of polling.
3. The manager transmits the peer nodes' URIs and receiving ports to the forwarders.
4. The forwarders send data to the set of nodes provided by the manager.
In this way, the forwarders stay current with the state of the cluster, learning of any peers that have joined or left the cluster and updating their set of receiving peers accordingly.
In the case of a multisite cluster, each forwarder can identify itself as a member of a site. In that case, the manager node transmits a list of all peer nodes for that site only, and the forwarder limits itself to load balancing across that site. See Use indexer discovery in a multisite cluster.
In addition, the forwarders can use weighted load balancing to adjust the amount of data they send to each peer based on that peer's relative disk capacity. See Use weighted load balancing.
Note: If the manager node goes down, the forwarders will rely on their most recent list of available peer nodes. However, the list does not persist through a forwarder restart. Therefore, if a forwarder restarts while the manager is down, it will not have a list of peer nodes and will not be able to forward data, resulting in potential data loss. Similarly, if a forwarder starts up for the first time, it must wait for the manager to return before it can get a list of peers.
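The polling behavior described above, including the fallback to the most recently received peer list when the manager is unreachable, can be sketched as a small model in Python. This is an illustration of the protocol only, not Splunk's implementation; the class, method, and peer names are hypothetical.

```python
import random

class DiscoveryForwarder:
    """Illustrative model of a forwarder using indexer discovery.

    Hypothetical sketch: names and structure do not reflect
    Splunk's actual implementation.
    """

    def __init__(self, query_manager):
        # query_manager: callable returning a list of (uri, port) peers,
        # or None if the manager node is down.
        self.query_manager = query_manager
        # Most recent peer list; not persisted across a forwarder restart.
        self.peers = []

    def poll_once(self):
        # Steps 2-3: poll the manager for the current set of peers.
        result = self.query_manager()
        if result is not None:
            self.peers = result          # refresh the cached list
        # If the manager is down, keep using the most recent list.

    def pick_peer(self):
        # Step 4: load balance across the known peers.
        if not self.peers:
            return None                  # restart while manager is down: cannot forward
        return random.choice(self.peers)
```

A forwarder that restarts while the manager is down starts with an empty `peers` list, which is why the note above warns of potential data loss in that window.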
Configure indexer discovery
These are the main steps for setting up connections between forwarders and peer nodes, using indexer discovery:
1. Configure the peer nodes to receive data from forwarders.
2. Configure the manager node to enable indexer discovery.
3. Configure the forwarders.
After you set up the connection, you must configure the data inputs on the forwarders. See Configure the data inputs to each forwarder.
1. Configure the peer nodes to receive data from forwarders
In order for a peer to receive data from forwarders, you must configure the peer's receiving port. One way to specify the receiving port is to edit the peer's inputs.conf file. For example, this setting in inputs.conf sets the receiving port to 9997:

[splunktcp://9997]
disabled = 0
Restart the peer node after making the change.
See Enable a receiver in the Forwarding Data manual.
Caution: When using indexer discovery, each peer node can have only a single configured receiving port. The port can be configured for either splunktcp or splunktcp-ssl, but not for both. You must use the same method for all peer nodes in the cluster: splunktcp or splunktcp-ssl.
You can simplify peer input configuration by deploying a single, identical inputs.conf file across all the peers. The receiving port that you specify in the common copy of inputs.conf supersedes any ports you enable on each individual peer. For details on how to create and deploy a common inputs.conf across all peers, see Update common peer configurations and apps.
When forwarding to a multisite cluster, you can configure the forwarder to send data only to peers in a specified site. See Use indexer discovery in a multisite cluster.
2. Configure the manager node to enable indexer discovery
In server.conf on the manager node, add this stanza:

[indexer_discovery]
pass4SymmKey = <string>
polling_rate = <integer>
indexerWeightByDiskCapacity = <bool>
Note the following:
- The pass4SymmKey attribute specifies the security key used for communication between the manager node and the forwarders. Its value must be the same for all forwarders and the manager node. The pass4SymmKey attribute used for indexer_discovery should have a different value from the pass4SymmKey attribute used for communication between the manager and the cluster nodes, which is set in the [clustering] stanza, as described in Configure the security key.
- The polling_rate attribute (optional) adjusts the rate at which the forwarders poll the manager for the latest list of peer nodes. Its value must be an integer between 1 and 10. The default is 10. See Adjust the frequency of polling.
- The indexerWeightByDiskCapacity attribute (optional) determines whether indexer discovery uses weighted load balancing. The default is false. See Use weighted load balancing.
3. Configure the forwarders
a. Configure the forwarders to use indexer discovery
On each forwarder, add these settings to the outputs.conf file:

[indexer_discovery:<name>]
pass4SymmKey = <string>
manager_uri = <uri>

[tcpout:<target_group>]
indexerDiscovery = <name>

[tcpout]
defaultGroup = <target_group>
Note the following:
- In the [indexer_discovery:<name>] stanza, the <name> references the <name> set in the indexerDiscovery attribute in the [tcpout:<target_group>] stanza.
- The pass4SymmKey attribute specifies the security key used for communication between the manager and the forwarders. Its value must be the same for all forwarders and the manager node. You must explicitly set this value for each forwarder.
- The <manager_uri> is the URI and management port for the manager node. For example: "https://10.152.31.202:8089".
- In the [tcpout:<target_group>] stanza, set the indexerDiscovery attribute, instead of the server attribute that you would use to specify the receiving peer nodes if you were not enabling indexer discovery. With indexer discovery, the forwarders get their list of receiving peer nodes from the manager, not from the server attribute. If both attributes are set, indexerDiscovery takes precedence.
b. Enable indexer acknowledgment for each forwarder
Note: This step is required to ensure end-to-end data fidelity. If that is not a requirement for your deployment, you can skip this step.
To ensure that the cluster receives and indexes all incoming data, you must turn on indexer acknowledgment for each forwarder.
To configure indexer acknowledgment, set the useACK attribute in each forwarder's outputs.conf, in the same stanza where you set the indexerDiscovery attribute:

[tcpout:<target_group>]
indexerDiscovery = <name>
useACK = true
For detailed information on configuring indexer acknowledgment, read Protect against loss of in-flight data in the Forwarding Data manual.
Example
In this example:
- The manager node enables indexer discovery.
- The manager and forwarders share a security key.
- Forwarders will send data to peer nodes weighted by the total disk capacity of the peer nodes' disks.
- The forwarders use indexer acknowledgment to ensure end-to-end fidelity of data.
In the manager node's server.conf:

[indexer_discovery]
pass4SymmKey = my_secret
indexerWeightByDiskCapacity = true
In each forwarder's outputs.conf:

[indexer_discovery:manager1]
pass4SymmKey = my_secret
manager_uri = https://10.152.31.202:8089

[tcpout:group1]
autoLBFrequency = 30
forceTimebasedAutoLB = true
indexerDiscovery = manager1
useACK = true

[tcpout]
defaultGroup = group1
Use indexer discovery in a multisite cluster
In multisite clustering, the cluster is partitioned into sites, typically based on the location of the cluster nodes. See Multisite indexer clusters. When using indexer discovery with multisite clustering, you can configure each forwarder to be site-aware, so that it forwards data to peer nodes only on a single specified site.
When you use indexer discovery with multisite clustering, you must assign a site-id to all forwarders, whether or not you want the forwarders to be site-aware:
- If you want a forwarder to be site-aware, assign it a site-id for a site in the cluster, such as "site1," "site2," and so on.
- If you do not want a forwarder to be site-aware, assign it the special site-id of "site0". When a forwarder is assigned "site0", it forwards to peers across all sites in the cluster.
Assign a site-id to each forwarder
To assign a site-id, add this stanza to the forwarder's server.conf file:

[general]
site = <site-id>
Note the following:
- You must assign a <site-id> to each forwarder sending data to a multisite cluster. This must either be a valid site in the cluster or the special value "site0".
- If you want the forwarder to send data only to peers at a specific site, assign the id for that site, such as "site1."
- If you want the forwarder to send data to all peers, across all sites, assign a value of "site0".
- If you do not assign any id, the forwarder will not send data to any peer nodes.
- See also Site values.
Configure the forwarder site failover capability
If you assign a forwarder to a specific site and that site goes down, the forwarder, by default, will not fail over to another site. Instead, it will stop forwarding data if there are no peers available on its assigned site. To avoid this issue, you must configure the forwarder site failover capability.
To configure the forwarder site failover capability, set the forwarder_site_failover attribute in the manager node's server.conf file. For example:

[clustering]
forwarder_site_failover = site1:site2, site2:site3
This example configures failover sites for site1 and site2. If site1 fails, all forwarders configured to send data to peers on site1 will instead send data to peers on site2. Similarly, if site2 fails, all forwarders explicitly configured to send data to peers on site2 will instead send data to peers on site3.
Note: The failover capability does not relay from site to site. In other words, in the previous example, if a forwarder is set to site1 and site1 goes down, the forwarder will then start forwarding to peers on site2. However, if site2 subsequently goes down, the site1 forwarder will not then failover to site3. Only forwarders explicitly set to site2 will failover to site3. Each forwarder can have only a single failover site.
The forwarders revert to their assigned site, as soon as any peer on that site returns to the cluster. For example, assume that the manager node includes this configuration:
[clustering]
forwarder_site_failover = site1:site2
When site1 goes down, such that there are no peers running on site1, the forwarders assigned to site1 start sending data to peers on site2 instead. This failover condition continues until a site1 peer returns to the cluster. At that point, the forwarders assigned to site1 start forwarding to that peer. They no longer forward to peers on site2.
Use weighted load balancing
When you enable indexer discovery, the forwarders always stream the incoming data across the set of peer nodes, using load balancing to switch the data stream from node to node. This operates in a similar way to how forwarders without indexer discovery use load balancing, but with some key differences. In particular, you can enable weighted load balancing.
In weighted load balancing, the forwarders take each peer's disk capacity into account when they load balance the data. For example, a peer with a 400GB disk receives approximately twice the data of a peer with a 200GB disk.
Important: The disk capacity refers to the total amount of local disk space on the peer, not the amount of free space.
How weighted load balancing works
Weighted load balancing behaves similarly to normal forwarder load balancing. The autoLBFrequency attribute in the forwarder's outputs.conf file still determines how often the data stream switches to a different indexer. However, when the forwarder selects the next indexer, it does so based on the relative disk capacities. The selection itself is random but weighted towards indexers with larger disk capacities.
In other words, the forwarder uses weighted picking. So, if the forwarder has an autoLBFrequency set to 60, then every 60 seconds, the forwarder switches the data stream to a new indexer. If the load balancing is taking place across two indexers, one with a 500GB disk and the other with a 100GB disk, the indexer with the larger disk is five times as likely to be picked at each switching point.
The overall traffic sent to each indexer is based on this ratio:

indexer_disk_capacity / total_disk_capacity_of_indexers_combined
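As a sketch of this weighted picking, the following Python snippet simulates many switching points across two indexers with 500GB and 100GB disks. It illustrates the scheme described above and is not Splunk's code; the indexer names are made up.

```python
import random

def pick_indexer(capacities_gb):
    """Pick the next indexer, with probability weighted by disk capacity."""
    names = list(capacities_gb)
    weights = [capacities_gb[n] for n in names]
    return random.choices(names, weights=weights, k=1)[0]

# Simulate 100,000 switching points for a 500GB and a 100GB indexer.
random.seed(0)
capacities = {"idx_large": 500, "idx_small": 100}
picks = [pick_indexer(capacities) for _ in range(100_000)]

# Expected share for idx_large: 500 / (500 + 100) ≈ 0.833
share_large = picks.count("idx_large") / len(picks)
```

Over many switching points, the observed share converges on the capacity ratio, matching the 5:1 picking odds described above.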
For a general discussion of load balancing in indexer clusters, see How load balancing works.
Enable weighted load balancing
The indexerWeightByDiskCapacity attribute in the manager node's server.conf file controls weighted load balancing:

[indexer_discovery]
indexerWeightByDiskCapacity = <bool>
Note the following:
- The indexerWeightByDiskCapacity attribute is set to false by default. To enable weighted load balancing, you must set it to true.
Change the advertised disk capacity for an indexer
In some cases, you might want weighted load balancing to treat an indexer as though it has a lower disk capacity than it actually has. You can use the advertised_disk_capacity attribute to accomplish this. For example, if you set that attribute to 50 (signifying 50%) on an indexer with a 500GB disk, weighted load balancing proceeds as though the actual disk capacity were 250GB.
You set the advertised_disk_capacity attribute in the indexer's server.conf file:

[clustering]
advertised_disk_capacity = <integer>
Note the following:
- The advertised_disk_capacity attribute indicates the percentage that is applied to the indexer's actual disk capacity before it sends the capacity to the manager node. For example, if set to 50 on an indexer with a 500GB disk, the indexer tells the manager that the disk capacity is 250GB.
- The value can vary from 10 to 100.
- The default is 100.
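To see how advertised_disk_capacity interacts with weighted load balancing, this small sketch computes the capacity each indexer reports to the manager and the resulting traffic share. It is an illustration of the arithmetic only, with made-up indexer names, not Splunk code.

```python
def advertised_capacity_gb(actual_gb, advertised_disk_capacity=100):
    """Capacity the indexer reports to the manager, after applying the percentage."""
    return actual_gb * advertised_disk_capacity / 100

def traffic_shares(reported_gb):
    """Per-indexer traffic share: capacity / total capacity of all indexers."""
    total = sum(reported_gb.values())
    return {name: cap / total for name, cap in reported_gb.items()}

# Two 500GB indexers; idx_a advertises only 50% of its capacity.
reported = {
    "idx_a": advertised_capacity_gb(500, 50),  # reports 250.0 GB
    "idx_b": advertised_capacity_gb(500),      # reports 500.0 GB (default 100)
}
shares = traffic_shares(reported)
# idx_a now receives 250/750 = 1/3 of the traffic, idx_b the remaining 2/3.
```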
Adjust the frequency of polling
Forwarders poll the manager node at regular intervals to receive the most recent list of peers. In this way, they become aware of any changes to the set of available peers and can modify their forwarding accordingly. You can adjust the rate of polling.
The frequency of polling is based on the number of forwarders and the value of the polling_rate attribute, configured in the manager's server.conf file. The polling interval for each forwarder follows this formula:

(number_of_forwarders / polling_rate + 30 seconds) * 1000 = polling interval, in milliseconds
Here are some examples:
# 100 forwarders, with the default polling_rate of 10
(100/10 + 30) * 1000 = 40,000 ms, or 40 seconds

# 10,000 forwarders, with the default polling_rate of 10
(10000/10 + 30) * 1000 = 1,030,000 ms, or 1030 seconds, or about 17 minutes

# 10,000 forwarders, with the minimum polling_rate of 1
(10000/1 + 30) * 1000 = 10,030,000 ms, or 10,030 seconds, or a bit under three hours
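For capacity planning, the formula above is easy to wrap in a small helper. A minimal Python sketch (the function name is ours, not a Splunk API):

```python
def polling_interval_ms(num_forwarders, polling_rate=10):
    """Polling interval per forwarder, in milliseconds.

    Implements: (number_of_forwarders / polling_rate + 30 seconds) * 1000
    polling_rate must be an integer between 1 and 10 (default 10).
    """
    if not 1 <= polling_rate <= 10:
        raise ValueError("polling_rate must be between 1 and 10")
    return (num_forwarders / polling_rate + 30) * 1000

# The three examples above:
polling_interval_ms(100)        # 40,000 ms: 40 seconds
polling_interval_ms(10_000)     # 1,030,000 ms: about 17 minutes
polling_interval_ms(10_000, 1)  # 10,030,000 ms: a bit under three hours
```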
To configure polling_rate, add the attribute to the [indexer_discovery] stanza in server.conf on the manager node:

[indexer_discovery]
polling_rate = <integer>
Note the following:
- The polling_rate attribute must be an integer between 1 and 10.
- The default is 10.
Configure indexer discovery with SSL
You can configure indexer discovery with SSL. The process is nearly the same as configuring without SSL, with just a few additions and changes:
1. Configure the peer nodes to receive data from forwarders over SSL.
2. Configure the manager node to enable indexer discovery.
3. Configure the forwarders for SSL.
The steps below provide basic configuration information only, focusing on the differences when configuring for SSL. For full details on indexer discovery configuration, see Configure indexer discovery.
1. Configure the peer nodes to receive data from forwarders over SSL
Edit each peer's inputs.conf file to specify the receiving port and to configure the necessary SSL settings:

[splunktcp-ssl://9997]
disabled = 0

[SSL]
serverCert = <path to server certificate>
sslPassword = <certificate password>
Note: When using indexer discovery, each peer node can have only a single receiving port. For SSL, you must configure a port for splunktcp-ssl only. Do not configure a splunktcp stanza.
In addition, confirm that sslRootCAPath is set in each peer's server.conf file.
2. Configure the manager node to enable indexer discovery
In server.conf on the manager node, add this stanza:

[indexer_discovery]
pass4SymmKey = <string>
polling_rate = <integer>
indexerWeightByDiskCapacity = <bool>
This is the same as for configuring a non-SSL set-up.
3. Configure the forwarders for SSL
On each forwarder, add these settings to the outputs.conf file:

[indexer_discovery:<name>]
pass4SymmKey = <string>
manager_uri = <uri>

[tcpout:<target_group>]
indexerDiscovery = <name>
useACK = true
clientCert = <path to client certificate>
sslPassword = <CAcert password>

[tcpout]
defaultGroup = <target_group>

In addition, confirm that sslRootCAPath is set in each forwarder's server.conf file.
This documentation applies to the following versions of Splunk® Enterprise: 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.1.7, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.2.4, 9.3.0, 9.3.1, 9.3.2, 9.4.0