AWS BYOL high availability
Initial publication: June 26, 2024
Last reviewed: June 7, 2024
AWS offers a broad cloud computing platform with high availability and service at scale. Splunk administrators can use the flexibility of AWS to modify, scale, and migrate their deployment on demand as business requirements change. Splunk uses the term BYOL (bring your own license) for customers who manage their own deployments on a cloud service provider, such as AWS, using their Splunk Enterprise license.
Architecture overview
The following diagram represents a high-level architecture of a Splunk Enterprise AWS BYOL deployment leveraging native cloud capabilities for high availability and scale.
- Indexers are spread across three different availability zones (in a single region) to help ensure high availability using Splunk multisite clustering.
- Search head cluster (SHC) members are also spread across availability zones in the same region. They are fronted by a load balancer, so users have a single endpoint for UI access.
- Splunk SmartStore allows the separation of compute and storage resources, leveraging S3 for cost-effective and performant long-term data retention.
- Cluster manager redundancy is provided by a pair of cluster manager instances in separate availability zones. The loss or outage of one zone therefore does not take down the cluster manager.
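Multisite clustering places copies of indexed data across Splunk "sites." The sketch below shows one way to map the three availability zones onto sites and what the resulting cluster manager settings look like.

It makes several assumptions:
- Each AZ becomes exactly one site.
- The AZ names and the replication and search factor values are hypothetical placeholders, not recommendations.
- The helper names `az_site_map` and `manager_server_conf` are illustrative.
- `mode = manager` is the Splunk Enterprise 9.x spelling; older releases use `master`.
- Required security settings such as `pass4SymmKey` are omitted.

```python
# Hypothetical sketch: map each AWS availability zone to one Splunk site and
# render an illustrative cluster manager server.conf fragment for multisite
# clustering. AZ names and factor values are placeholders, not recommendations.
from typing import Dict, List, Tuple

MAX_SITES = 63  # valid Splunk site names run from site1 through site63


def az_site_map(azs: List[str]) -> Dict[str, str]:
    """Assign each distinct availability zone its own site name, in order."""
    if not 1 <= len(azs) <= MAX_SITES:
        raise ValueError("expected between 1 and 63 availability zones")
    if len(set(azs)) != len(azs):
        raise ValueError("availability zones must be distinct")
    return {az: f"site{i}" for i, az in enumerate(azs, start=1)}


def manager_server_conf(site_map: Dict[str, str], manager_az: str,
                        rf: Tuple[int, int], sf: Tuple[int, int]) -> str:
    """Render [general] and [clustering] stanzas for the cluster manager.

    rf and sf are (origin, total) pairs for site_replication_factor and
    site_search_factor. Only basic sanity checks are applied here.
    """
    (rf_origin, rf_total), (sf_origin, sf_total) = rf, sf
    if not (1 <= rf_origin <= rf_total and 1 <= sf_origin <= sf_total):
        raise ValueError("origin must be between 1 and total")
    if sf_origin > rf_origin or sf_total > rf_total:
        raise ValueError("search factor cannot exceed replication factor")
    sites = ",".join(site_map.values())  # dicts keep insertion order
    return "\n".join([
        "[general]",
        f"site = {site_map[manager_az]}",
        "",
        "[clustering]",
        "mode = manager",
        "multisite = true",
        f"available_sites = {sites}",
        f"site_replication_factor = origin:{rf_origin},total:{rf_total}",
        f"site_search_factor = origin:{sf_origin},total:{sf_total}",
    ])


if __name__ == "__main__":
    site_map = az_site_map(["us-east-1a", "us-east-1b", "us-east-1c"])
    print(manager_server_conf(site_map, "us-east-1a", rf=(2, 3), sf=(1, 2)))
```

Keeping the AZ-to-site mapping explicit makes it easy to see which zone each copy lands in when reasoning about zone loss.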
Benefits and descriptions
- All existing Splunk Validated Architecture (SVA) patterns can be implemented within AWS.
- Data created within AWS (or already in AWS) can be locally ingested, saving network egress costs.
- Indexers and search heads can be scaled quickly and easily through AWS automation services that run outside the Splunk platform.
- Instance types and sizes can be adjusted as business needs and performance requirements change.
Search tier
- SHC (Search Head Cluster) provides high availability for the Splunk search tier. It clusters Splunk search heads and replicates search artifacts and user objects among members as needed. One member acts as the captain. The captain is chosen through an election when the cluster starts and again when needed, for example when the current captain fails. The captain maintains replication state and manages scheduled search jobs. The search head deployer (SHC-D) is an instance outside the cluster that holds the apps and configurations distributed to the search head cluster. The SHC-D is not needed for a running cluster to function, so it is not mission critical and does not require redundancy. https://docs.splunk.com/Documentation/Splunk/latest/DistSearch/AboutSHC
- ELB (Elastic Load Balancer) is an AWS service that can balance user sessions across the members of a search head cluster. Enable session affinity (sticky sessions) and use application-controlled session affinity. https://aws.amazon.com/elasticloadbalancing/
- Autoscaling may be applied to handle instance failures or instances in an unhealthy state. AWS can relaunch and replace these instances automatically, reducing the need for manual intervention. This feature can also help with availability zone failures and disaster recovery: if an instance is lost, Autoscaling can provision a replacement automatically. https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html
- Federated Search runs a unified search across multiple Splunk environments. Users can search across multiple separate, complete Splunk software deployments without the complexity of distributed search. These deployments can run in a public cloud, in a private cloud, or on premises. https://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutfederatedsearch
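Spreading search head cluster members across availability zones only helps if the survivors of a zone outage can still elect a captain. The sketch below checks that for a given member layout. It assumes a captain election needs a strict majority of all cluster members, as Splunk documents for dynamic captain election. The AZ names, member counts, and function names are hypothetical.

```python
# Hypothetical sketch: can a search head cluster still elect a captain after
# losing any one availability zone? Assumes an election needs a strict
# majority of ALL cluster members. AZ names and counts are placeholders.
from typing import Dict


def captain_electable_after_az_loss(members_by_az: Dict[str, int]) -> Dict[str, bool]:
    """Map each AZ to whether the members left after losing it form a majority."""
    total = sum(members_by_az.values())
    return {
        # strict majority of the full membership: remaining > total / 2
        az: (total - lost) * 2 > total
        for az, lost in members_by_az.items()
    }


def tolerates_single_az_loss(members_by_az: Dict[str, int]) -> bool:
    """True if losing any single AZ still leaves enough members to elect a captain."""
    return all(captain_electable_after_az_loss(members_by_az).values())


if __name__ == "__main__":
    balanced = {"us-east-1a": 1, "us-east-1b": 1, "us-east-1c": 1}
    lopsided = {"us-east-1a": 2, "us-east-1b": 1, "us-east-1c": 1}
    print(tolerates_single_az_loss(balanced))  # True: 2 of 3 members remain
    print(tolerates_single_az_loss(lopsided))  # False: losing us-east-1a leaves 2 of 4
```

Under this majority assumption, a two-zone layout can never survive the loss of a zone. One of the two zones must hold at least half the members, and losing it leaves no strict majority. An even spread across three zones avoids this.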
Indexing tier
- Indexer Clustering prevents data loss while keeping data available for searching. Splunk Enterprise maintains multiple copies of the data according to the configured replication factor and search factor. Because these copies are spread across the sites of the multisite cluster, failed indexers cause no data loss and minimal service disruption, as long as the failures stay within what the replication factor covers. https://docs.splunk.com/Documentation/Splunk/latest/Indexer/Aboutclusters
- SmartStore lets you scale storage and compute resources separately on your Splunk indexers. SmartStore relies on Amazon S3 for bulk data storage and uses instance types with high-performance local storage for data caching and search. https://docs.splunk.com/Documentation/Splunk/latest/Indexer/AboutSmartStore
- HA Cluster Manager (CM) adds resiliency to the cluster manager by running multiple cluster managers in an active/standby design. Failover can be configured as automatic or manual, and the managers stay synchronized, allowing a quick and easy transition between them. https://docs.splunk.com/Documentation/Splunk/latest/Indexer/CMredundancy
- Autoscaling may be applied to handle instance failures or instances in an unhealthy state. AWS can relaunch and replace these instances automatically, reducing the need for manual intervention. This feature can also help with availability zone failures and disaster recovery. https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html
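SmartStore is configured in indexes.conf: a remote volume points at S3, and each index references that volume through `remotePath`. The sketch below renders such a fragment so its shape is concrete.

It makes several assumptions:
- The bucket, prefix, endpoint, volume name, and index names are hypothetical placeholders.
- The helper name is illustrative.
- The stanza and setting names follow the Splunk indexes.conf documentation for SmartStore.

In an indexer cluster, a fragment like this would typically be distributed from the cluster manager in the configuration bundle rather than edited on each indexer.

```python
# Hypothetical sketch: render an illustrative SmartStore indexes.conf fragment.
# Bucket, prefix, endpoint, volume, and index names are placeholders.
from typing import List, Optional


def smartstore_indexes_conf(bucket: str, prefix: str, indexes: List[str],
                            volume: str = "remote_store",
                            s3_endpoint: Optional[str] = None) -> str:
    """Return a remote volume stanza on S3 plus one stanza per index."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a bare S3 bucket name")
    key_prefix = prefix.strip("/")
    path = f"s3://{bucket}/{key_prefix}" if key_prefix else f"s3://{bucket}"
    lines = [f"[volume:{volume}]", "storageType = remote", f"path = {path}"]
    if s3_endpoint:
        lines.append(f"remote.s3.endpoint = {s3_endpoint}")
    for name in indexes:
        lines += [
            "",
            f"[{name}]",
            # local paths are still required; they hold the local cache
            f"homePath = $SPLUNK_DB/{name}/db",
            f"coldPath = $SPLUNK_DB/{name}/colddb",
            f"thawedPath = $SPLUNK_DB/{name}/thaweddb",
            # Splunk expands $_index_name to each index's own name
            f"remotePath = volume:{volume}/$_index_name",
            # lets the indexer cluster replicate this index's buckets
            "repFactor = auto",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(smartstore_indexes_conf("example-smartstore-bucket", "indexes", ["web"]))
```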
Data ingestion tier
- Indexer Discovery is a capability that simplifies forwarding configuration for Splunk forwarders. Each forwarder queries the manager node for a list of all peer nodes in the cluster. It then uses load balancing to forward data to the set of peer nodes. This works well with an AWS-based Splunk deployment where peer information can change as instances are redeployed. https://docs.splunk.com/Documentation/Splunk/latest/Indexer/indexerdiscovery
- ELB (Elastic Load Balancer) can balance HTTP Event Collector (HEC) and httpout data connections between the forwarding tier and the indexing tier. https://aws.amazon.com/elasticloadbalancing/
- Splunk Forwarders send application and system data to Splunk Enterprise securely, efficiently, and scalably. https://docs.splunk.com/Documentation/Splunk/latest/Forwarding/Typesofforwarders
- Splunk HTTP Event Collector can also be leveraged to send data over HTTP(s) when a Splunk forwarder is not used. https://docs.splunk.com/Documentation/Splunk/latest/Data/UsetheHTTPEventCollector
- Ingest Actions allow you to route, filter, and mask data easily and quickly. https://docs.splunk.com/Documentation/Splunk/latest/Data/DataIngest
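Two forwarding paths appear in this tier:
- Indexer Discovery, where forwarders ask the cluster manager for the current peer list.
- httpout to HEC, where traffic goes through an ELB endpoint.

The sketch below renders minimal outputs.conf fragments for each path. It makes several assumptions:
- Group names, URIs, the token, and the key are hypothetical placeholders.
- The helper names are illustrative.
- `manager_uri` is the Splunk Enterprise 9.x setting name; older releases use `master_uri`.
- Requiring an HTTPS endpoint is a choice made by this sketch, not a Splunk rule.

Check the outputs.conf specification for your version before relying on these settings.

```python
# Hypothetical sketch: render forwarder outputs.conf fragments for
# (1) Indexer Discovery over the Splunk-to-Splunk protocol and
# (2) httpout to HEC behind a load balancer.
# All names, URIs, keys, and tokens are placeholders.


def indexer_discovery_outputs(name: str, manager_uri: str, pass4symmkey: str,
                              group: str = "discovered_indexers") -> str:
    """Forwarder asks the cluster manager for peers, then load-balances to them."""
    return "\n".join([
        f"[indexer_discovery:{name}]",
        f"pass4SymmKey = {pass4symmkey}",  # must match the manager's setting
        f"manager_uri = {manager_uri}",
        "",
        f"[tcpout:{group}]",
        f"indexerDiscovery = {name}",
        "useACK = true",  # ask indexers to acknowledge received data
        "",
        "[tcpout]",
        f"defaultGroup = {group}",
        "",
    ])


def httpout_outputs(elb_uri: str, hec_token: str) -> str:
    """Forwarder sends over HTTP to an HEC endpoint fronted by a load balancer."""
    if not elb_uri.startswith("https://"):
        raise ValueError("this sketch only accepts an HTTPS endpoint")
    return "\n".join([
        "[httpout]",
        f"httpEventCollectorToken = {hec_token}",
        f"uri = {elb_uri}",
        "",
    ])


if __name__ == "__main__":
    print(indexer_discovery_outputs("cluster1", "https://cm.example.internal:8089", "changeme"))
    print(httpout_outputs("https://hec.example.internal:8088", "<hec-token>"))
```

With Indexer Discovery, replaced indexers are picked up from the manager's peer list without editing forwarder configuration. With httpout, the ELB endpoint stays stable while the instances behind it change.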
Limitations
- SmartStore
  - A multisite indexer cluster that spans AWS Regions is currently unsupported. https://docs.splunk.com/Documentation/Splunk/latest/Indexer/MultisiteSmartStore#Public_cloud_provider_hosted.2C_within_a_single_region
- AWS Graviton processors
  - Splunk Enterprise does not currently support this processor architecture.
This documentation applies to the following versions of Splunk® Validated Architectures: current