AWS BYOL high availability

Initial publication: June 26, 2024
Last reviewed: June 7, 2024

AWS offers a broad cloud computing platform with high availability and service at scale. Splunk administrators can take advantage of the flexibility of AWS to modify, scale, and migrate their deployment on demand and as their business requirements change. Splunk uses the term BYOL (bring your own license) to refer to customers who manage their own deployments in a cloud service provider, such as AWS, using their Splunk Enterprise license.

Architecture overview

The following diagram represents a high-level architecture of a Splunk Enterprise AWS BYOL deployment leveraging native cloud capabilities for high availability and scale.

Indexers are spread across three different availability zones (in a single region) to help ensure high availability using Splunk multisite clustering.
SHC instances are also spread across different availability zones (in a single region) and are fronted by a load balancer so users can use a single endpoint for UI access.
Splunk SmartStore allows the separation of compute and storage resources, leveraging S3 for cost-effective and performant long-term data retention.
Cluster manager redundancy is achieved with a pair of instances in separate availability zones to cover a zone loss or outage.

Benefits and descriptions

All existing SVA patterns can be implemented within AWS.
Data created within AWS (or already in AWS) can be locally ingested, saving network egress costs.
Indexers and Search Heads can be scaled quickly and easily through AWS automation services external to the Splunk platform.
Instance specifications can be adjusted as needed for changes in business needs and performance.

Search tier

SHC (Search Head Cluster) provides high availability for the Splunk search tier by clustering Splunk search heads and replicating search and user objects as needed. A single member, selected during startup through an election process, acts as the captain. The captain maintains replication state and handles scheduled search jobs. The search head deployer (SHC-D) is an instance that exists outside of the cluster and contains the apps and configurations needed for the search head cluster. The SHC-D is not mission critical: a functioning cluster does not depend on it, and it does not require redundancy. https://docs.splunk.com/Documentation/Splunk/latest/DistSearch/AboutSHC
ELB (Elastic Load Balancer) is an AWS service that may be applied to balance user sessions across a search head cluster. You should enable session affinity (sticky sessions) and use application-controlled session affinity. https://aws.amazon.com/elasticloadbalancing/
Autoscaling may be applied to handle instance failures or instances in an unhealthy state. AWS can relaunch and replace these instances automatically, reducing the need for manual intervention. This feature can also help protect against availability zone failures and support disaster recovery: if an instance is lost, a replacement can be provisioned automatically. https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html
Federated Search can be leveraged to execute unified search across multiple Splunk environments. This ability allows users to search across multiple, separate, complete Splunk software deployments without the complexity of distributed search. These separate Splunk deployments can exist in a public cloud, private cloud, on-premises, etc. https://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutfederatedsearch

Indexing tier

Indexer Clustering prevents data loss while promoting data availability for searching. Splunk Enterprise indexes multiple copies of the data based on the configured search and replication factors. Because these copies are spread across the multisite cluster, the failure of individual indexers does not cause data loss and causes only minimal service disruption. https://docs.splunk.com/Documentation/Splunk/latest/Indexer/Aboutclusters
SmartStore allows for scaling storage and compute resources separately on your Splunk Indexers. SmartStore relies on AWS S3 for bulk data storage while leveraging instance types with high-performance local storage for data caching and search. https://docs.splunk.com/Documentation/Splunk/latest/Indexer/AboutSmartStore
HA Cluster Manager (CM) adds resiliency to the cluster manager instance by providing a mechanism to run multiple cluster managers in an active/standby design. Failover can be configured as automatic or manual, and all activity is synced between members, allowing a quick and easy transition. https://docs.splunk.com/Documentation/Splunk/latest/Indexer/CMredundancy
Autoscaling may be applied to handle instance failures or instances in an unhealthy state. AWS can relaunch and replace these instances automatically, reducing the need for manual intervention. This feature can also help protect against availability zone failures and support disaster recovery. https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html
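
As a rough illustration of the indexing tier settings described above, the following Python sketch renders two Splunk-style configuration fragments: a multisite stanza for server.conf on the cluster manager and a SmartStore remote volume for indexes.conf. The helper render_conf, the site names, the factor values, and the S3 bucket name are assumptions made for this example and are not taken from this architecture. Security settings such as pass4SymmKey are omitted. Check every value against the clustering and SmartStore documentation before use.

```python
import configparser
import io


def render_conf(stanzas):
    """Render {stanza: {setting: value}} as Splunk-style .conf text.

    Hypothetical helper for this example; not part of any Splunk tooling.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep setting names case-sensitive
    for stanza, settings in stanzas.items():
        parser[stanza] = settings
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


# server.conf on the cluster manager: one site per availability zone.
# Site names and factor values are illustrative assumptions.
manager_server_conf = {
    "general": {"site": "site1"},
    "clustering": {
        "mode": "manager",
        "multisite": "true",
        "available_sites": "site1,site2,site3",
        "site_replication_factor": "origin:2,total:3",
        "site_search_factor": "origin:2,total:3",
    },
}

# indexes.conf on the peers: SmartStore remote volume backed by S3.
# The bucket name is a placeholder.
smartstore_indexes_conf = {
    "volume:remote_store": {
        "storageType": "remote",
        "path": "s3://example-smartstore-bucket/indexes",
    },
    "default": {"remotePath": "volume:remote_store/$_index_name"},
}

server_conf_text = render_conf(manager_server_conf)
indexes_conf_text = render_conf(smartstore_indexes_conf)
print(server_conf_text)
print(indexes_conf_text)
```

Generating the fragments from a dictionary keeps the per-zone values in one place, which can be convenient when instances are provisioned through automation. The same text can equally be maintained by hand.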

Data ingestion tier

Indexer Discovery is a capability that simplifies forwarding configuration for Splunk forwarders. Each forwarder queries the manager node for a list of all peer nodes in the cluster. It then uses load balancing to forward data to the set of peer nodes. This works well with an AWS-based Splunk deployment where peer information can change as instances are redeployed. https://docs.splunk.com/Documentation/Splunk/latest/Indexer/indexerdiscovery
ELB (Elastic Load Balancer) may be applied to balance HEC and httpout data connections between the forwarding tier and the indexing tier. https://aws.amazon.com/elasticloadbalancing/
Splunk Forwarders send application and system data to Splunk Enterprise securely, efficiently, and scalably. https://docs.splunk.com/Documentation/Splunk/latest/Forwarding/Typesofforwarders
Splunk HTTP Event Collector can also be leveraged to send data over HTTP(s) when a Splunk forwarder is not used. https://docs.splunk.com/Documentation/Splunk/latest/Data/UsetheHTTPEventCollector
Ingest Actions allow you to route, filter, and mask data easily and quickly. https://docs.splunk.com/Documentation/Splunk/latest/Data/DataIngest
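
To make the HTTP Event Collector path more concrete, the following Python sketch builds (but does not send) a single HEC request aimed at a load balancer endpoint. The host name, port, token, and sourcetype are placeholders assumed for this example, and build_hec_request is a hypothetical helper. The /services/collector/event path and the "Splunk <token>" authorization header follow the HEC documentation linked above.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype="_json"):
    """Build an HTTP Event Collector request without sending it.

    Hypothetical helper for this example.
    """
    body = json.dumps({"event": event, "sourcetype": sourcetype}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=body,
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder load balancer endpoint and token; replace with real values.
hec_request = build_hec_request(
    "https://hec.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    {"message": "hello from AWS"},
)

# Sending is left commented out so the sketch runs without network access:
# urllib.request.urlopen(hec_request, timeout=10)
print(hec_request.get_method(), hec_request.full_url)
```

Pointing clients at the load balancer address rather than at individual indexers means the client configuration does not need to change when instances behind it are replaced.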

Limitations

SmartStore
- Multisite cluster across AWS Regions is currently unsupported. https://docs.splunk.com/Documentation/Splunk/latest/Indexer/MultisiteSmartStore#Public_cloud_provider_hosted.2C_within_a_single_region
AWS Graviton Processors
- This processor architecture is currently unsupported by Splunk Enterprise.
