Splunk Cloud Platform

Recover from a Disaster with Cross-region Disaster Recovery (Early Access)

Best practices for configuring your Splunk Cloud Platform environment for disaster recovery

Cross-region disaster recovery is in the Early Access release phase. In the Early Access release phase, Splunk products might have limitations on customer access, features, maturity, and regional availability. Additionally, their documentation might receive frequent updates or be incomplete or incorrect. For more information on Early Access, contact your Splunk representative.

Both you and Splunk share responsibility for disaster recovery. Splunk does not perform disaster recovery of components that exist outside of Splunk Cloud Platform. Ensuring the continuity of external Splunk components is your responsibility. You must evaluate the disaster recovery of your data and event forwarding, network egress, and firewall infrastructure that resides outside of Splunk Cloud. For example, if a universal forwarder runs in a cloud service provider (CSP) region, you must properly implement its failover to achieve end-to-end system resiliency.

To minimize disruption in the event of a disaster, follow these best practices when configuring and managing your cross-region disaster recovery (CRDR)-enabled Splunk Cloud Platform (SCP) deployment.

Refresh Splunk Cloud Platform IP address cache after a failover

When Splunk declares a qualified regional disaster and begins failing over your Splunk Cloud Platform deployment to a secondary cloud service provider region, it updates the DNS records for your deployment to point to the secondary site. Your browser, applications, or network components might have cached the original IP addresses for your deployment. To ensure access to your SCP environment after it fails over, refresh these caches as quickly as possible so that data ingestion and searches route to the new set of IP addresses.
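
For example, from a host that sends data to or searches your deployment, you can run a fresh DNS lookup to confirm that the hostname resolves to the secondary-region addresses. The commands below are an illustrative sketch; the deployment URL example.splunkcloud.com is a placeholder, and the cache-flush command depends on the operating system and resolver in use.

    # Confirm that the deployment hostname resolves to the secondary-region addresses
    nslookup example.splunkcloud.com

    # Flush a local resolver cache so that new lookups return the updated addresses
    ipconfig /flushdns                                              # Windows
    sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder   # macOS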

Use indexer acknowledgment on forwarders

Indexer acknowledgment is enabled on the ingestion path for Splunk Cloud Platform instances with the CRDR service. Where applicable, use indexer acknowledgment on forwarders to buffer incoming data at the forwarders. This acknowledgment ensures that the forwarding tier retains data that Splunk Cloud Platform cannot accept due to a failure, until ingestion is redirected to the secondary site. If possible, do not use intermediate universal forwarders (IUFs) if you want to buffer data on forwarders, because IUFs are not good candidates for indexer acknowledgment.
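
Where indexer acknowledgment applies, you typically enable it in outputs.conf on the forwarder. The following is a minimal sketch; the output group name and server URI are placeholders for your deployment's values.

    # outputs.conf on the forwarder (group name and server are placeholders)
    [tcpout:splunkcloud]
    server = inputs.example.splunkcloud.com:9997
    # Ask the indexing tier to acknowledge received data; the forwarder keeps
    # unacknowledged data in its wait queue and resends it if needed
    useACK = true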

Buffer incoming data during disaster recovery operations

During a failover, there is a period during which Splunk Cloud Platform cannot ingest data. During that time, Splunk Cloud Platform does not perform indexer acknowledgment of event data. You must configure your data collection and forwarding tiers to buffer this data. Allow for up to 4 hours of buffering at the data collection tier to ensure that you don't lose data.
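
One way to increase buffering capacity on a forwarder is to raise the output queue size in outputs.conf, as in the sketch below. The value shown is illustrative only; size the queue to roughly 4 hours of your actual ingest volume, and note that with useACK enabled the wait queue grows to 3 times maxQueueSize.

    # outputs.conf on the forwarder (value is illustrative; size to ~4 hours of ingest)
    [tcpout:splunkcloud]
    useACK = true
    # Output queue size; with useACK enabled, the wait queue is 3x this value
    maxQueueSize = 512MB

Persistent queues on network and scripted inputs (the persistentQueueSize setting in inputs.conf) are another option for buffering data that does not originate from monitored files.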

Repopulate dashboard configurations

Splunk does not replicate the results of previously run searches to the secondary CSP region. As a result, any dashboards that use the results of previously run saved searches do not populate after a failover until the next scheduled run of the saved search.

Where applicable, when you design dashboards in your Splunk Cloud Platform environment, use the ref attribute of the Simple XML search element rather than the loadjob search command. If you use the ref attribute, the search runs and populates the dashboard until the next scheduled run of the saved search occurs.
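
The following Simple XML sketch shows the difference; the dashboard structure and the saved search name are placeholders.

    <!-- Preferred: reference the saved search by name with the ref attribute.
         The search can run and populate the panel even when earlier job
         results are no longer available after a failover. -->
    <dashboard>
      <row>
        <panel>
          <table>
            <search ref="my_saved_search"/>
          </table>
        </panel>
      </row>
    </dashboard>

    <!-- Avoid: loadjob depends on the artifacts of a previously run job,
         which are not replicated to the secondary region. -->
    <search>
      <query>| loadjob savedsearch="admin:search:my_saved_search"</query>
    </search>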

Confirm size of new indexes

If you create new indexes in your SCP environment, wait at least 30 minutes after you create an index before you send data to it. If you begin sending data to the new index sooner, that data might end up in the "last chance" index instead of the index you specify. The "last chance" index is the index of last resort that Splunk configures for Splunk Cloud Platform instances.

After Splunk fails over your SCP environment, it enforces the maximum size limits for indexes. Splunk deletes any historical data that exceeds the index size limits, oldest data first. Where possible, confirm that the size you configured for each index meets your use case requirements.
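
One way to review an index's configured size limit, assuming you use the Admin Config Service (ACS) API, is a GET request like the sketch below. The stack name, token, and index name are placeholders; verify the endpoint and the returned field names against the ACS documentation for your environment.

    # List the configuration of a single index, including its maximum size settings
    curl -s -H "Authorization: Bearer {token}" \
      https://admin.splunk.com/{stack}/adminconfig/v2/indexes/my_index

You can also review and adjust index sizes on the Indexes page in Splunk Web.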

Last modified on 12 April, 2024