Best practices for configuring your Splunk Cloud Platform environment for disaster recovery
Cross-Region Disaster Recovery is in the Early Access release phase. In the Early Access release phase, Splunk products might have limitations on customer access, features, maturity, and regional availability. Additionally, its documentation might receive frequent updates, or be incomplete or incorrect. For additional information on Early Access, contact your Splunk representative.
You and Splunk share responsibility for disaster recovery. Splunk does not perform disaster recovery for components that exist outside of Splunk Cloud Platform. Ensuring the continuity of those external components is your responsibility. You must evaluate the disaster recovery of your data and event forwarding, network egress, and firewall infrastructure that resides outside of Splunk Cloud Platform. For example, if a universal forwarder runs in a cloud service provider (CSP) region, you must properly implement its failover to achieve end-to-end system resiliency.
To minimize disruption in the event of a disaster, follow these best practices when you configure and manage your Cross-Region Disaster Recovery-enabled Splunk Cloud Platform (SCP) deployment.
Refresh Splunk Cloud Platform IP address cache after a failover
When Splunk declares a qualified regional disaster and begins failing over your Splunk Cloud Platform deployment to a secondary cloud service provider region, it updates the DNS records for your deployment to point to the secondary site. Your browser, an application, or a network component might have cached the original IP address of your deployment. To ensure access to your SCP environment after it has failed over, refresh these caches as quickly as possible so that data ingestion and searches route to the new set of IP addresses.
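One way to tell whether a cached address has gone stale is to compare it with what DNS currently returns. The following Python sketch is illustrative only and is not a Splunk tool: the function names are hypothetical, and it assumes that the machine running it queries a resolver that already serves the updated records.

```python
import socket


def resolve_ips(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the set of IP addresses that DNS currently reports for hostname.

    The resolver parameter is a hypothetical hook for testing without a network.
    """
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def stale_addresses(cached_ips, current_ips):
    """Return the cached addresses that DNS no longer reports."""
    return set(cached_ips) - set(current_ips)
```

Calling stale_addresses with the addresses your application has cached and the result of resolve_ips for your deployment hostname returns the entries to flush. An empty result means that the cache already matches DNS, at least as seen from that machine.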
Use indexer acknowledgment on forwarders
Indexer acknowledgment is active on the ingestion path for Splunk Cloud Platform instances that use Cross-Region Disaster Recovery. Where applicable, use indexer acknowledgment on forwarders to buffer incoming data at the forwarders. This acknowledgment ensures that the forwarding tier saves the data if Splunk Cloud Platform cannot accept it due to failure before the ingestion is redirected to the secondary site. If possible, do not use intermediate universal forwarders (IUF) if you want to buffer data on forwarders, as IUFs are not good candidates for indexer acknowledgment.
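On forwarders, indexer acknowledgment is controlled by the useACK setting in outputs.conf. As a rough aid for auditing forwarder configurations, the following Python sketch lists the tcpout target groups that do not turn it on. It is a simplified check under stated assumptions: the function name is hypothetical, it reads one file in isolation rather than the merged configuration that the forwarder actually uses, it expects every setting to appear under a stanza header, and it treats only true and 1 as enabled.

```python
import configparser


def groups_without_ack(conf_text):
    """Return tcpout target-group stanzas that do not enable useACK.

    Simplification: a value set in the global [tcpout] stanza is treated as
    the inherited default for every [tcpout:<group>] stanza in the same file.
    """
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",)
    )
    parser.optionxform = str  # keep setting names exactly as written
    parser.read_string(conf_text)

    inherited = parser.get("tcpout", "useACK", fallback="false")
    missing = []
    for section in parser.sections():
        if section.startswith("tcpout:"):
            value = parser.get(section, "useACK", fallback=inherited)
            if value.strip().lower() not in ("true", "1"):
                missing.append(section)
    return missing
```

Passing the text of an outputs.conf file returns the stanza names that need attention. An empty list suggests that every target group in that file acknowledges data, subject to the assumptions above.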
Buffer incoming data during disaster recovery operations
During a failover, there is a period of time when Splunk Cloud Platform cannot ingest data. During that time, Splunk Cloud Platform does not perform indexer acknowledgment of the event data. You must configure your data collection and forwarding tiers to buffer this data. Allow for up to 4 hours of storage buffering at the data collection tier to ensure that you don't lose data.
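To translate the 4-hour guidance into a storage figure, you can scale your daily ingest volume to the buffering window. The following Python sketch shows the arithmetic. The function name and the peak_factor parameter are hypothetical, and the calculation assumes that ingestion is spread evenly across the day unless you raise peak_factor to allow for bursts.

```python
def buffer_gb_needed(daily_ingest_gb, outage_hours=4.0, peak_factor=1.0):
    """Estimate the storage, in GB, needed to buffer data for outage_hours.

    daily_ingest_gb: average volume sent to Splunk Cloud Platform per day.
    peak_factor: multiplier (1.0 or greater) for periods busier than average.
    """
    if daily_ingest_gb < 0 or outage_hours < 0 or peak_factor < 1.0:
        raise ValueError("volumes and hours must be non-negative; peak_factor >= 1.0")
    # Scale the daily volume to the outage window, then apply headroom.
    return daily_ingest_gb * outage_hours / 24.0 * peak_factor
```

For example, a deployment that ingests 240 GB per day needs about 40 GB of buffer capacity across its collection tier for a 4-hour window, or about 60 GB with a peak_factor of 1.5.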
If you use the AWS Data Firehose data-streaming service to send data into Splunk Cloud Platform through HTTP Event Collector (HEC), confirm that you have turned on indexer acknowledgment for HEC data inputs. Also confirm that you have turned on persistent input queues on the forwarders that send that data. See the following topics for more information:
- Configure Amazon Kinesis Firehose to send data to the Splunk platform
- Use persistent queues to help prevent data loss
Repopulate dashboard configurations
Splunk does not replicate the results of previously run searches to the secondary CSP region. As a result, any dashboards that use the results of previously run saved searches do not populate after a failover until the next scheduled run of the saved search.
Where applicable, when you design dashboards in your Splunk Cloud Platform environment, use the ref reference attribute for the search Simple XML dashboard element rather than the loadjob search command. If you use the ref reference attribute, the search runs and populates the dashboard until the next scheduled run of the saved search.
This documentation applies to the following versions of Splunk Cloud Platform™: 9.2.2403, 9.2.2406 (latest FedRAMP release), 9.3.2408