Configure High Availability on top of Heavy Forwarders
Since version 4.0.0
Splunk DB Connect implements high availability features. This is possible by using an etcd cluster to replicate configuration changes and coordinate tasks across instances. This feature is experimental and still under development. However, it is already functional, and we encourage you to use it, provide feedback, and help refine future versions.
Requirements for High Availability
- etcd up and running. Review the hardware recommendations guide.
- etcd is configured to work as a cluster. We recommend a cluster of at least 3 nodes.
- Splunk DB Connect up and running.
- Splunk DB Connect is configured to work as a cluster. We recommend at least 3 instances running as a cluster.
- Each Splunk DB Connect instance has the required JDBC Add-ons installed. JDBC Add-ons are not replicated.
To ensure the security of data stored by Splunk DB Connect, we recommend enabling authentication and TLS on the etcd side. This guide provides step-by-step instructions for configuring both.
Install and configure the etcd cluster
etcd is a lightweight distributed key-value store. It allows configuration changes to be replicated reliably. The installation process is simple and does not require prior etcd expertise. We recommend installing etcd on the same instances as Splunk DB Connect so you do not increase infrastructure costs.
Download and install etcd
We recommend reviewing the official etcd installation documentation in Install etcd. The following procedure describes how to install etcd on Linux instances (AMD x64), but be aware these steps are subject to change.
1. Download etcd
$ ETCD_VERSION=v3.4.34
$ DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
$ curl -L ${DOWNLOAD_URL}/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz -o /opt/etcd-${ETCD_VERSION}-linux-amd64.tar.gz
2. Unpack etcd
$ mkdir /opt/etcd-${ETCD_VERSION}
$ tar xzvf /opt/etcd-${ETCD_VERSION}-linux-amd64.tar.gz -C /opt/etcd-${ETCD_VERSION} --strip-components=1
3. Verify etcd version
$ /opt/etcd-${ETCD_VERSION}/etcd --version
$ /opt/etcd-${ETCD_VERSION}/etcdctl version
Configure etcd to work as a cluster
Review the official documentation related to etcd clustering in etcd clustering. The following procedure describes how to configure etcd as a cluster on Linux instances (AMD x64), but be aware these steps are subject to change. etcd runs as a systemd service so that it can restart after an unexpected event. During the configuration, provide the IP address of each instance that will join the cluster. You will create the service on each instance.
1. Update the IP address to hostname mapping
$ sudo nano /etc/hosts
<node-1-ip> etcd-node-1
<node-2-ip> etcd-node-2
<node-3-ip> etcd-node-3
2. Create etcd service
$ sudo nano /etc/systemd/system/etcd.service
[Unit]
Description=etcd cluster for Splunk DB Connect
After=network.target

[Service]
User=root
Type=notify
ExecStart=/opt/etcd-v3.4.34/etcd \
  --name etcd-node-<1..3> \
  --initial-advertise-peer-urls http://<node-ip>:<peer-communication-port | 2380> \
  --listen-peer-urls http://<node-ip>:<peer-communication-port | 2380> \
  --listen-client-urls http://<node-ip>:<client-request-port | 2379>,http://<loopback-ip | 127.0.0.1>:<client-request-port | 2379> \
  --advertise-client-urls http://<node-ip>:<client-request-port | 2379> \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster etcd-node-1=http://<node-1-ip>:<peer-communication-port | 2380>,etcd-node-2=http://<node-2-ip>:<peer-communication-port | 2380>,etcd-node-3=http://<node-3-ip>:<peer-communication-port | 2380> \
  --initial-cluster-state new \
  --data-dir /var/lib/etcd
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
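For illustration only, assuming example node IPs 10.0.0.1, 10.0.0.2, and 10.0.0.3 and the default ports, the ExecStart section for etcd-node-1 would look like the following; adapt the values to your environment.
ExecStart=/opt/etcd-v3.4.34/etcd \
  --name etcd-node-1 \
  --initial-advertise-peer-urls http://10.0.0.1:2380 \
  --listen-peer-urls http://10.0.0.1:2380 \
  --listen-client-urls http://10.0.0.1:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.0.1:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster etcd-node-1=http://10.0.0.1:2380,etcd-node-2=http://10.0.0.2:2380,etcd-node-3=http://10.0.0.3:2380 \
  --initial-cluster-state new \
  --data-dir /var/lib/etcd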
3. Start etcd service
$ sudo systemctl daemon-reload
$ sudo systemctl enable etcd
$ sudo systemctl start etcd
$ systemctl status etcd.service
4. Verify cluster status
$ ETCDCTL_API=3
$ /opt/etcd-v3.4.34/etcdctl --endpoints=http://<node-1-ip>:<client-request-port | 2379>,http://<node-2-ip>:<client-request-port | 2379>,http://<node-3-ip>:<client-request-port | 2379> endpoint health
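If all members are reachable, the output looks similar to the following (timings will vary):
http://<node-1-ip>:2379 is healthy: successfully committed proposal: took = 2.345ms
http://<node-2-ip>:2379 is healthy: successfully committed proposal: took = 1.871ms
http://<node-3-ip>:2379 is healthy: successfully committed proposal: took = 2.012ms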
Configure authentication in etcd
Review the official documentation related to etcd authentication in Authentication Guides. We will define the ENDPOINTS environment variable to avoid verbosity and allow reuse.
$ ENDPOINTS=http://<node-1-ip>:<client-request-port | 2379>,http://<node-2-ip>:<client-request-port | 2379>,http://<node-3-ip>:<client-request-port | 2379>
1. Create user/role root for administrative purposes
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} role add root
When adding a new user, you will be asked for the password.
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} user add root
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} user grant-role root root
2. Enable authentication
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} auth enable
You can validate the authentication using the health API
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user root:<root-password> endpoint health
Create custom role with read/write access
Previously we created the root user and role, but they are intended for administrative purposes. For security reasons, we will create a new role and user with more specific access.
1. Create role dbx
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> role add dbx
2. Add privileges to the role. In this case we will give read/write access to all keys with the prefix dbx, which is the prefix Splunk DB Connect uses.
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> role grant-permission dbx --prefix=true readwrite dbx
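To confirm the permission was granted as expected, you can inspect the role:
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> role get dbx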
3. Create user dbx and grant the role
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> user add dbx
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> user grant-role dbx dbx
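As a quick sanity check, you can verify that the dbx user can write and read a key under its prefix; the key name dbx/healthcheck below is only an example.
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=dbx:<dbx-password> put dbx/healthcheck ok
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=dbx:<dbx-password> get dbx/healthcheck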
Configure TLS in etcd
We recommend reviewing the official documentation related to TLS in Transport security model. The following procedure describes how to configure TLS in etcd, but be aware these steps are subject to change.
1. Obtain/generate certificates
You will need to obtain or generate TLS certificates for each etcd member. You must be able to provide the private key, the certificate, and the CA. Make sure the certificates contain the Subject Alternative Name (SAN) field, specifying the relevant domain names and IP addresses.
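If you do not have a PKI in place, one way to generate a self-signed certificate with a SAN for testing is the sketch below (requires OpenSSL 1.1.1 or later for the -addext flag; the names and IPs are placeholders). For production, prefer certificates signed by your CA.
$ openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
    -keyout etcd-node-1-key.pem -out etcd-node-1-cert.pem \
    -subj "/CN=etcd-node-1" \
    -addext "subjectAltName=DNS:etcd-node-1,IP:<node-1-ip>"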
2. Enable TLS for client (DB Connect) to etcd communication
Note that the protocol for advertise-client-urls and listen-client-urls is now HTTPS instead of HTTP. We also add the cert-file and key-file attributes, with paths to the certificate and the private key, respectively.
/opt/etcd-v3.4.34/etcd \
  --name etcd-node-<1..3> \
  --cert-file=/path-to-certs/etcd-node-<1..3>-cert.pem \
  --key-file=/path-to-certs/etcd-node-<1..3>-key.pem \
  --initial-advertise-peer-urls http://<node-ip>:<peer-communication-port | 2380> \
  --listen-peer-urls http://<node-ip>:<peer-communication-port | 2380> \
  --listen-client-urls https://<node-ip>:<client-request-port | 2379>,https://<loopback-ip | 127.0.0.1>:<client-request-port | 2379> \
  --advertise-client-urls https://<node-ip>:<client-request-port | 2379> \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster etcd-node-1=http://<node-1-ip>:<peer-communication-port | 2380>,etcd-node-2=http://<node-2-ip>:<peer-communication-port | 2380>,etcd-node-3=http://<node-3-ip>:<peer-communication-port | 2380> \
  --initial-cluster-state new \
  --data-dir /var/lib/etcd
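After restarting etcd with client TLS enabled, etcdctl must also trust the server certificate. A minimal verification sketch, assuming the CA file is at the hypothetical path /path-to-certs/ca.pem:
$ /opt/etcd-v3.4.34/etcdctl \
    --endpoints=https://<node-1-ip>:<client-request-port | 2379>,https://<node-2-ip>:<client-request-port | 2379>,https://<node-3-ip>:<client-request-port | 2379> \
    --cacert=/path-to-certs/ca.pem \
    --user root:<root-password> endpoint health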
3. Enable TLS for peer (etcd) to peer (etcd) communication
Note that the protocol for initial-advertise-peer-urls, listen-peer-urls, and initial-cluster is now HTTPS instead of HTTP. The peer-cert-file and peer-key-file attributes, with paths to the certificate and the private key, respectively, have been added.
/opt/etcd-v3.4.34/etcd \
  --name etcd-node-<1..3> \
  --cert-file=/path-to-certs/etcd-node-<1..3>-cert.pem \
  --key-file=/path-to-certs/etcd-node-<1..3>-key.pem \
  --peer-cert-file=/path-to-certs/etcd-node-<1..3>-cert.pem \
  --peer-key-file=/path-to-certs/etcd-node-<1..3>-key.pem \
  --initial-advertise-peer-urls https://<node-ip>:<peer-communication-port | 2380> \
  --listen-peer-urls https://<node-ip>:<peer-communication-port | 2380> \
  --listen-client-urls https://<node-ip>:<client-request-port | 2379>,https://<loopback-ip | 127.0.0.1>:<client-request-port | 2379> \
  --advertise-client-urls https://<node-ip>:<client-request-port | 2379> \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster etcd-node-1=https://<node-1-ip>:<peer-communication-port | 2380>,etcd-node-2=https://<node-2-ip>:<peer-communication-port | 2380>,etcd-node-3=https://<node-3-ip>:<peer-communication-port | 2380> \
  --initial-cluster-state new \
  --data-dir /var/lib/etcd
Maintenance
To keep your etcd cluster running at its optimal capacity, you may need to apply some specific configurations described in the maintenance guide.
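For example, you may want to check the database size and defragment members periodically to reclaim disk space; the sketch below assumes authentication is enabled as configured above (add --cacert if TLS is enabled). History compaction can also be automated with the etcd server flag --auto-compaction-retention.
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> endpoint status --write-out=table
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> defrag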
Configure DB Connect to work as a cluster
To make Splunk DB Connect work as a cluster, add the etcd cluster member information in Splunk DB Connect > Configuration > Settings > High Availability Cluster.
- Go to Splunk DB Connect > Configuration > Settings > High Availability Cluster
- Specify user and password if authentication is enabled for the etcd cluster.
- Click Add and enter the host and port information for each etcd cluster member.
- Click Save.
After saving, the configuration is validated and status information is shown for each etcd member. You will see text similar to This is an Active-Passive cluster, at this moment this node is configured as Passive.
Be aware that the High Availability cluster supported by Splunk DB Connect works in Active-Passive mode. This means the workload (data ingestion) runs on a single node: the Active one.
Note: manual changes to *.conf files won't be replicated automatically.
Enable TLS
To allow DB Connect to communicate with etcd using TLS, switch on the TLS Enabled option. Be aware that if you use a self-signed certificate, you will need to add the CA or the certificate itself to the KeyStore in Splunk DB Connect > Configuration > Settings > Keystore.
Reconciliation Options
At this moment Splunk DB Connect contains minimal reconciliation features. This means that if new configurations are added to Splunk DB Connect while a cluster node is down, the node won't contain the new configuration once it comes back up, and you need to reconcile it manually. However, there are configurations that are always reconciled:
- Checkpoint
- KeyStore
Import configurations
This reconciliation option allows you to synchronize one specific instance with the other cluster members; any local configuration that has not been replicated may be lost.
Export configurations
This reconciliation option allows you to replicate local configuration to other cluster members; any configuration made on other members may be lost.
Use Cases for Splunk DB Connect High Availability Cluster
At this moment we provide High Availability, but not workload distribution / load balancing.
High Availability will benefit you if:
- You have configured redundancy across multiple servers to cover Splunk DB Connect downtimes.
- The server where you run Splunk DB Connect has a high downtime rate.
- You want to minimize the risk of having delays or losing your data.
High Availability won't benefit you if:
- You have split Splunk DB Connect into multiple servers to handle a high volume of data ingestion and to improve performance. At this moment we do not support workload distribution / load balancing.
Upgrade scenarios for High Availability
If you have Splunk DB Connect installed on multiple instances and you want them to form a cluster to provide High Availability, those instances are either redundant (same configuration) or have different configurations. In either case, follow these steps:
- Install and configure the etcd cluster.
- Configure Splunk DB Connect to work as a cluster.
- Review the requirements section and make sure you meet them.
- Choose the instance with the configurations that you want to replicate to the other instances. Then go to Configurations > Settings > High Availability Cluster and click Export configurations.
- Go to the other instances. Then go to Configurations > Settings > High Availability Cluster and click Import configurations.
- Done
Scaling the High Availability cluster
For more resiliency or to scale, set up a new server with a Heavy Forwarder and Splunk DB Connect.
- Add information about the new member to the cluster. Make sure you run this command on one of the nodes that already belongs to the cluster (see the verification sketch after these steps).
$ sudo /opt/etcd-${ETCD_VERSION}/etcdctl member add etcd-node-<1..3> --peer-urls=http://<node-ip>:<peer-communication-port | 2380>
- Follow the steps in Install and configure the etcd cluster to add the new etcd member to the cluster. Make sure you replace --initial-cluster-state new with --initial-cluster-state existing and include the new member in --initial-cluster; this is only necessary for the new member.
- Follow the steps in Configure DB Connect to work as a cluster to configure the new Splunk DB Connect instance.
- Go to Configurations > Settings > High Availability Cluster and click Import configurations.
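As referenced above, after the new member is up you can verify cluster membership and health from any node; the sketch below assumes ENDPOINTS has been updated to include the new member (add --user and --cacert if authentication and TLS are enabled).
$ /opt/etcd-${ETCD_VERSION}/etcdctl --endpoints=${ENDPOINTS} member list --write-out=table
$ /opt/etcd-${ETCD_VERSION}/etcdctl --endpoints=${ENDPOINTS} endpoint health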
Replicating configurations
What is replicated
- Identities
- Connections
- Inputs
- Checkpoints (for Inputs)
- Certificates (stored in the KeyStore)
What is not replicated
- Outputs and Lookups (as these features are not supported on Heavy Forwarders).
- HTTP Event Collector configuration
- Logging configuration
- General settings
- JDBC drivers
Troubleshooting High Availability
Data is not being ingested after configuring the Splunk Cloud HEC
HTTP Event Collector configurations are not replicated. By default, the local HEC is used; if you need to configure an external HEC, you must do it on each Splunk DB Connect instance.
Configurations are not the same on each Splunk DB Connect instance
Make sure the etcd cluster is up and running; you can review the status of the members in Splunk DB Connect > Configuration > Settings > High Availability Cluster. Data reconciliation is limited for now. If you see configurations that were not replicated, use the manual reconciliation option: go to Splunk DB Connect > Configuration > Settings > High Availability Cluster and click Import configurations.
High availability logs follow the pattern feature=<feature-name> component=<component-name> action=<action-name>. While the feature will always be cluster, there are several components defined. The main components to look for are related to data replication (feature=cluster component=*_listener and feature=cluster component=*_publisher). These logs correspond to the events that are sent to etcd and received by other DB Connect instances, and they give a full overview of what is happening in the cluster. However, most of these logs are at DEBUG level. Other useful components are leader_election and client_provider.
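For example, assuming the DB Connect log files live under $SPLUNK_HOME/var/log/splunk/ (the file name glob below is an assumption and may vary by version), you can inspect cluster activity directly on a Heavy Forwarder:
$ grep "feature=cluster component=leader_election" $SPLUNK_HOME/var/log/splunk/splunk_app_db_connect*.log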