Configure High Availability on top of Heavy Forwarders
Since version 4.0.0
Splunk DB Connect implements high availability features. This is possible by using an etcd cluster to replicate configuration changes and coordinate tasks across instances. This feature is experimental and still under development. However, it is already functional, and we encourage you to use it, provide feedback, and help refine future versions.
Requirements for High Availability
- etcd up and running. Review the hardware recommendations guide.
- etcd is configured to work as a cluster. We recommend a cluster of at least 3 nodes.
- Splunk DB Connect up and running.
- Splunk DB Connect is configured to work as a cluster. We recommend at least 3 instances running as a cluster.
- Each Splunk DB Connect instance has the required JDBC Add-ons installed. JDBC Add-ons are not replicated.
To ensure the security of data stored by Splunk DB Connect, we recommend enabling authentication and TLS on the etcd side. This guide provides step-by-step instructions for configuring both.
Install and configure the etcd cluster
etcd is a lightweight distributed key-value store. It allows configuration changes to be replicated reliably. The installation process is simple and does not require prior etcd expertise. We recommend installing etcd on the same instances as Splunk DB Connect so you do not increase infrastructure costs.
Download and install etcd
We recommend reviewing the official etcd installation documentation in Install etcd. The following procedure describes how to install etcd on Linux instances (AMD x64), but be aware these steps are subject to change.
1. Download etcd
$ ETCD_VERSION=v3.4.34
$ DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
$ curl -L ${DOWNLOAD_URL}/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz -o /opt/etcd-${ETCD_VERSION}-linux-amd64.tar.gz
2. Unpack etcd
$ mkdir /opt/etcd-${ETCD_VERSION}
$ tar xzvf /opt/etcd-${ETCD_VERSION}-linux-amd64.tar.gz -C /opt/etcd-${ETCD_VERSION} --strip-components=1
3. Verify etcd version
$ /opt/etcd-${ETCD_VERSION}/etcd --version
$ /opt/etcd-${ETCD_VERSION}/etcdctl version
Configure etcd to work as a cluster
Review the official documentation related to etcd clustering in etcd clustering. The following procedure describes how to configure etcd as a cluster on Linux instances (AMD x64), but be aware these steps are subject to change. etcd runs as a systemd service so that it can restart after an unexpected event. During the configuration, provide the IP address of each instance that will join the cluster. You will create the service on each instance.
1. Update the IP address to hostname mapping
$ sudo nano /etc/hosts
<node-1-ip> etcd-node-1
<node-2-ip> etcd-node-2
<node-3-ip> etcd-node-3
2. Create etcd service
$ sudo nano /etc/systemd/system/etcd.service
[Unit]
Description=etcd cluster for Splunk DB Connect
After=network.target

[Service]
User=root
Type=notify
ExecStart=/opt/etcd-v3.4.34/etcd \
  --name etcd-node-<1..3> \
  --initial-advertise-peer-urls http://<node-ip>:<peer-communication-port | 2380> \
  --listen-peer-urls http://<node-ip>:<peer-communication-port | 2380> \
  --listen-client-urls http://<node-ip>:<client-request-port | 2379>,http://<loopback-ip | 127.0.0.1>:<client-request-port | 2379> \
  --advertise-client-urls http://<node-ip>:<client-request-port | 2379> \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster etcd-node-1=http://<node-1-ip>:<peer-communication-port | 2380>,etcd-node-2=http://<node-2-ip>:<peer-communication-port | 2380>,etcd-node-3=http://<node-3-ip>:<peer-communication-port | 2380> \
  --initial-cluster-state new \
  --data-dir /var/lib/etcd
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
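For illustration only, assuming example node IPs 10.0.0.1, 10.0.0.2, and 10.0.0.3 and the default ports, the ExecStart section for etcd-node-1 would look like the following; adapt the values to your environment.
ExecStart=/opt/etcd-v3.4.34/etcd \
  --name etcd-node-1 \
  --initial-advertise-peer-urls http://10.0.0.1:2380 \
  --listen-peer-urls http://10.0.0.1:2380 \
  --listen-client-urls http://10.0.0.1:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.0.1:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster etcd-node-1=http://10.0.0.1:2380,etcd-node-2=http://10.0.0.2:2380,etcd-node-3=http://10.0.0.3:2380 \
  --initial-cluster-state new \
  --data-dir /var/lib/etcd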
3. Start etcd service
$ sudo systemctl daemon-reload
$ sudo systemctl enable etcd
$ sudo systemctl start etcd
$ systemctl status etcd.service
4. Verify cluster status
$ ETCDCTL_API=3
$ /opt/etcd-v3.4.34/etcdctl --endpoints=http://<node-1-ip>:<client-request-port | 2379>,http://<node-2-ip>:<client-request-port | 2379>,http://<node-3-ip>:<client-request-port | 2379> endpoint health
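If all members are reachable, the output looks similar to the following (timings will vary):
http://<node-1-ip>:2379 is healthy: successfully committed proposal: took = 2.345ms
http://<node-2-ip>:2379 is healthy: successfully committed proposal: took = 1.871ms
http://<node-3-ip>:2379 is healthy: successfully committed proposal: took = 2.012ms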
Configure authentication in etcd
Review the official documentation related to etcd authentication in Authentication Guides. We will define the ENDPOINTS environment variable to avoid verbosity and allow reuse.
$ ENDPOINTS=http://<node-1-ip>:<client-request-port | 2379>,http://<node-2-ip>:<client-request-port | 2379>,http://<node-3-ip>:<client-request-port | 2379>
1. Create user/role root for administrative purposes
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} role add root
When adding a new user, you will be asked for the password.
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} user add root
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} user grant-role root root
2. Enable authentication
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} auth enable
You can validate the authentication using the health API
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user root:<root-password> endpoint health
Create custom role with read/write access
Previously we created the root user and role, but they are intended for administrative purposes. For security reasons, we will create a new role and user with more specific access.
1. Create role dbx
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> role add dbx
2. Add privileges to the role. In this case we will give read/write access to all keys with the prefix dbx, which is the prefix Splunk DB Connect uses.
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> role grant-permission dbx --prefix=true readwrite dbx
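To confirm the permission was granted as expected, you can inspect the role:
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> role get dbx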
3. Create user dbx and grant the role
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> user add dbx
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> user grant-role dbx dbx
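As a quick sanity check, you can verify that the dbx user can write and read a key under its prefix; the key name dbx/healthcheck below is only an example.
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=dbx:<dbx-password> put dbx/healthcheck ok
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=dbx:<dbx-password> get dbx/healthcheck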
Configure TLS in etcd
We recommend reviewing the official documentation related to TLS in Transport security model. The following procedure describes how to configure TLS in etcd, but be aware these steps are subject to change.
1. Obtain/generate certificates
You will need to obtain or generate TLS certificates for each etcd member. You must be able to provide the private key, the certificate, and the CA. Make sure the certificates contain the Subject Alternative Name (SAN) field, specifying the relevant domain names and IP addresses.
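If you do not have a PKI in place, one way to generate a self-signed certificate with a SAN for testing is the sketch below (requires OpenSSL 1.1.1 or later for the -addext flag; the names and IPs are placeholders). For production, prefer certificates signed by your CA.
$ openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
    -keyout etcd-node-1-key.pem -out etcd-node-1-cert.pem \
    -subj "/CN=etcd-node-1" \
    -addext "subjectAltName=DNS:etcd-node-1,IP:<node-1-ip>"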
2. Enable TLS for client (DB Connect) to etcd communication
Note that the protocol for advertise-client-urls and listen-client-urls is now HTTPS instead of HTTP. We also add the cert-file and key-file attributes, with paths to the certificate and the private key, respectively.
/opt/etcd-v3.4.34/etcd \
  --name etcd-node-<1..3> \
  --cert-file=/path-to-certs/etcd-node-<1..3>-cert.pem \
  --key-file=/path-to-certs/etcd-node-<1..3>-key.pem \
  --initial-advertise-peer-urls http://<node-ip>:<peer-communication-port | 2380> \
  --listen-peer-urls http://<node-ip>:<peer-communication-port | 2380> \
  --listen-client-urls https://<node-ip>:<client-request-port | 2379>,https://<loopback-ip | 127.0.0.1>:<client-request-port | 2379> \
  --advertise-client-urls https://<node-ip>:<client-request-port | 2379> \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster etcd-node-1=http://<node-1-ip>:<peer-communication-port | 2380>,etcd-node-2=http://<node-2-ip>:<peer-communication-port | 2380>,etcd-node-3=http://<node-3-ip>:<peer-communication-port | 2380> \
  --initial-cluster-state new \
  --data-dir /var/lib/etcd
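After restarting etcd with client TLS enabled, etcdctl must also trust the server certificate. A minimal verification sketch, assuming the CA file is at the hypothetical path /path-to-certs/ca.pem:
$ /opt/etcd-v3.4.34/etcdctl \
    --endpoints=https://<node-1-ip>:<client-request-port | 2379>,https://<node-2-ip>:<client-request-port | 2379>,https://<node-3-ip>:<client-request-port | 2379> \
    --cacert=/path-to-certs/ca.pem \
    --user root:<root-password> endpoint health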
3. Enable TLS for peer (etcd) to peer (etcd) communication
Note that the protocol for initial-advertise-peer-urls, listen-peer-urls, and initial-cluster is now HTTPS instead of HTTP. The peer-cert-file and peer-key-file attributes, with paths to the certificate and the private key, respectively, have been added.
/opt/etcd-v3.4.34/etcd \
  --name etcd-node-<1..3> \
  --cert-file=/path-to-certs/etcd-node-<1..3>-cert.pem \
  --key-file=/path-to-certs/etcd-node-<1..3>-key.pem \
  --peer-cert-file=/path-to-certs/etcd-node-<1..3>-cert.pem \
  --peer-key-file=/path-to-certs/etcd-node-<1..3>-key.pem \
  --initial-advertise-peer-urls https://<node-ip>:<peer-communication-port | 2380> \
  --listen-peer-urls https://<node-ip>:<peer-communication-port | 2380> \
  --listen-client-urls https://<node-ip>:<client-request-port | 2379>,https://<loopback-ip | 127.0.0.1>:<client-request-port | 2379> \
  --advertise-client-urls https://<node-ip>:<client-request-port | 2379> \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster etcd-node-1=https://<node-1-ip>:<peer-communication-port | 2380>,etcd-node-2=https://<node-2-ip>:<peer-communication-port | 2380>,etcd-node-3=https://<node-3-ip>:<peer-communication-port | 2380> \
  --initial-cluster-state new \
  --data-dir /var/lib/etcd
Maintenance
To keep your etcd cluster running at its optimal capacity, you may need to apply some specific configurations described in the maintenance guide.
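For example, you may want to check the database size and defragment members periodically to reclaim disk space; the sketch below assumes authentication is enabled as configured above (add --cacert if TLS is enabled). History compaction can also be automated with the etcd server flag --auto-compaction-retention.
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> endpoint status --write-out=table
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> defrag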
Configure DB Connect to work as a cluster
To make Splunk DB Connect work as a cluster, add the etcd cluster member information in Splunk DB Connect > Configuration > Settings > High Availability Cluster.
- Go to Splunk DB Connect > Configuration > Settings > High Availability Cluster
- Specify user and password if authentication is enabled for the etcd cluster.
- Click Add and enter the host and port information for each etcd cluster member.
- Click Save.
After saving, the configuration is validated and status information is shown for each etcd member. You will see text similar to This is an Active-Passive cluster, at this moment this node is configured as Passive.
Be aware that the High Availability cluster supported by Splunk DB Connect works in Active-Passive mode. This means the workload (data ingestion) runs on a single node: the Active one.
Note: manual changes to *.conf files won't be replicated automatically.
Enable TLS
To allow DB Connect to communicate with etcd using TLS, switch on the TLS Enabled option. Be aware that if you use a self-signed certificate, you will need to add the CA or the certificate itself to the KeyStore in Splunk DB Connect > Configuration > Settings > Keystore.
Reconciliation Options
At this moment Splunk DB Connect contains minimal reconciliation features. This means that if new configurations are added to Splunk DB Connect while a cluster node is down, the node won't contain the new configuration once it comes back up, and you need to reconcile it manually. However, there are configurations that are always reconciled:
- Checkpoint
- KeyStore
Import configurations
This reconciliation option allows you to synchronize one specific instance with the other cluster members; any local configuration that has not been replicated may be lost.
Export configurations
This reconciliation option allows you to replicate local configuration to other cluster members; any configuration made on other members may be lost.
Use Cases for Splunk DB Connect High Availability Cluster
At this moment we provide High Availability, but not workload distribution / load balancing.
High Availability will benefit you if:
- You have configured redundancy across multiple servers to cover Splunk DB Connect downtimes.
- The server where you run Splunk DB Connect has a high downtime rate.
- You want to minimize the risk of having delays or losing your data.
High Availability won't benefit you if:
- You have split Splunk DB Connect into multiple servers to handle a high volume of data ingestion and to improve performance. At this moment we do not support workload distribution / load balancing.
Upgrade scenarios for High Availability
If you have Splunk DB Connect installed on multiple instances and you want them to form a cluster to provide High Availability, those instances are either redundant (same configuration) or have different configurations. In either case, follow these steps:
- Install and configure the etcd cluster.
- Configure Splunk DB Connect to work as a cluster.
- Review the requirements section and make sure you meet them.
- Choose the instance with the configurations that you want to replicate to the other instances. Then go to Configurations > Settings > High Availability Cluster and click Export configurations.
- Go to the other instances. Then go to Configurations > Settings > High Availability Cluster and click Import configurations.
- Done
Scaling the High Availability cluster
For more resiliency or to scale, set up a new server with a Heavy Forwarder and Splunk DB Connect.
- Add information about the new member to the cluster. Make sure you run this command on one of the nodes that already belongs to the cluster (see the verification sketch after these steps).
$ sudo /opt/etcd-${ETCD_VERSION}/etcdctl member add etcd-node-<1..3> --peer-urls=http://<node-ip>:<peer-communication-port | 2380>
- Follow the steps in Install and configure the etcd cluster to add the new etcd member to the cluster. Make sure you replace --initial-cluster-state new with --initial-cluster-state existing and include the new member in --initial-cluster; this is only necessary for the new member.
- Follow the steps in Configure DB Connect to work as a cluster to configure the new Splunk DB Connect instance.
- Go to Configurations > Settings > High Availability Cluster and click Import configurations.
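As referenced above, after the new member is up you can verify cluster membership and health from any node; the sketch below assumes ENDPOINTS has been updated to include the new member (add --user and --cacert if authentication and TLS are enabled).
$ /opt/etcd-${ETCD_VERSION}/etcdctl --endpoints=${ENDPOINTS} member list --write-out=table
$ /opt/etcd-${ETCD_VERSION}/etcdctl --endpoints=${ENDPOINTS} endpoint health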
Replicating configurations
What is replicated
- Identities
- Connections
- Inputs
- Checkpoints (for Inputs)
- Certificates (stored in the KeyStore)
What is not replicated
- Outputs and Lookups (as these features are not supported on Heavy Forwarders).
- HTTP Event Collector configuration
- Logging configuration
- General settings
- JDBC drivers
Troubleshooting High Availability
Data is not being ingested after configuring the Splunk Cloud HEC
HTTP Event Collector configurations are not replicated. By default, the local HEC is used; if you need to configure an external HEC, you must do it on each Splunk DB Connect instance.
Configurations are not the same on each Splunk DB Connect instance
Make sure the etcd cluster is up and running; you can review the status of the members in Splunk DB Connect > Configuration > Settings > High Availability Cluster. Data reconciliation is limited for now. If you see configurations that were not replicated, use the manual reconciliation option: go to Splunk DB Connect > Configuration > Settings > High Availability Cluster and click Import configurations.
High availability logs follow the pattern feature=<feature-name> component=<component-name> action=<action-name>. While the feature will always be cluster, there are several components defined. The main components to look for are related to data replication (feature=cluster component=*_listener and feature=cluster component=*_publisher). These logs correspond to the events that are sent to etcd and received by other DB Connect instances, and they give a full overview of what is happening in the cluster. However, most of these logs are at DEBUG level. Other useful components are leader_election and client_provider.
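For example, assuming the DB Connect log files live under $SPLUNK_HOME/var/log/splunk/ (the file name glob below is an assumption and may vary by version), you can inspect cluster activity directly on a Heavy Forwarder:
$ grep "feature=cluster component=leader_election" $SPLUNK_HOME/var/log/splunk/splunk_app_db_connect*.log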