Best practices for configuring your Splunk Cloud Platform environment for disaster recovery

Cross-Region Disaster Recovery is in the Early Access release phase. In the Early Access release phase, Splunk products might have limitations on customer access, features, maturity, and regional availability. Additionally, its documentation might receive frequent updates, or be incomplete or incorrect. For additional information on Early Access, contact your Splunk representative.

You and Splunk share responsibility for disaster recovery. Splunk does not perform disaster recovery for components that exist outside of Splunk Cloud Platform; ensuring the continuity of those external components is your responsibility. You must evaluate the disaster recovery readiness of your data and event forwarding, network egress, and firewall infrastructure that resides outside of Splunk Cloud Platform. For example, if a universal forwarder runs in a cloud service provider (CSP) region, you must implement its failover properly to achieve end-to-end system resiliency.

To minimize disruption in the event of a disaster, follow these best practices when you configure and manage your Cross-Region Disaster Recovery-enabled Splunk Cloud Platform (SCP) deployment.

Refresh Splunk Cloud Platform IP address cache after a failover

When Splunk declares a qualified regional disaster and begins failing over your Splunk Cloud Platform deployment to a secondary CSP region, it updates the DNS records for your deployment to point to the secondary site. Your browser, an application, or a network component might still hold a cached copy of the original IP address of your deployment. To keep access to your SCP environment after it has failed over, refresh these caches as quickly as possible so that data ingestion and searches route to the new set of IP addresses.
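One way to spot a stale cache is to compare the addresses a client still holds against what DNS currently returns. The following is a minimal Python sketch of that check, not a Splunk-provided tool. The function names are hypothetical, and the resolver is passed in as a parameter so that the comparison can be exercised without network access.

```python
import socket
from typing import Callable, Iterable, Set


def resolve_ips(hostname: str, port: int = 443) -> Set[str]:
    """Return the set of IP addresses that DNS currently reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def stale_addresses(
    cached_ips: Iterable[str],
    hostname: str,
    resolver: Callable[[str], Set[str]] = resolve_ips,
) -> Set[str]:
    """Return the cached addresses that DNS no longer lists for hostname."""
    current = resolver(hostname)
    return set(cached_ips) - current
```

A non-empty result from `stale_addresses` for your deployment's host name suggests that the cache holding those addresses needs to be flushed. Note that the local resolver can itself cache answers, so the result is only as fresh as the resolver it queries.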

Use indexer acknowledgment on forwarders

Indexer acknowledgment is active on the ingestion path for Splunk Cloud Platform instances that use Cross-Region Disaster Recovery. Where applicable, use indexer acknowledgment on forwarders to buffer incoming data at the forwarders. With acknowledgment turned on, the forwarding tier keeps the data if Splunk Cloud Platform cannot accept it because of a failure that occurs before ingestion is redirected to the secondary site. If you want to buffer data on forwarders, avoid intermediate universal forwarders (IUF) where possible, as IUFs are not good candidates for indexer acknowledgment.
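To illustrate why acknowledgment protects data, the following Python sketch models the general idea: the sender keeps a copy of each event until the receiver confirms it, and resends whatever is still unconfirmed. This is a conceptual toy, not the forwarder's actual implementation. The class name `AckBuffer` and the `send` callback are hypothetical. On Splunk forwarders, the real behavior is typically turned on with the `useACK` setting in outputs.conf.

```python
from collections import OrderedDict
from typing import Callable, List


class AckBuffer:
    """Toy model of a forwarder wait queue: hold each event until it is acknowledged."""

    def __init__(self, send: Callable[[int, str], None]) -> None:
        self._send = send  # hypothetical transport callback
        self._pending: "OrderedDict[int, str]" = OrderedDict()
        self._next_id = 0

    def forward(self, event: str) -> int:
        """Send an event and keep a copy until the receiver acknowledges it."""
        event_id = self._next_id
        self._next_id += 1
        self._pending[event_id] = event
        self._send(event_id, event)
        return event_id

    def acknowledge(self, event_id: int) -> None:
        """Drop the held copy once the receiver confirms the event."""
        self._pending.pop(event_id, None)

    def resend_pending(self) -> List[int]:
        """Resend every unacknowledged event, for example after a reconnect."""
        for event_id, event in self._pending.items():
            self._send(event_id, event)
        return list(self._pending)
```

In this model, an event sent just before a failure stays in the pending set and goes out again once the connection to the secondary site is available.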

Buffer incoming data during disaster recovery operations

During a failover, there is a period of time when Splunk Cloud Platform cannot ingest data. During that time, Splunk Cloud Platform does not perform indexer acknowledgment of the event data. You must configure your data collection and forwarding tiers to buffer this data. Allow for up to 4 hours of storage buffering at the data collection tier to ensure that you don't lose data.
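The 4-hour figure can be turned into a rough storage estimate by multiplying it by your sustained ingest rate. The following Python sketch shows that arithmetic. The function name and the 25 percent headroom default are assumptions made for illustration, not Splunk guidance.

```python
import math


def required_buffer_bytes(
    ingest_bytes_per_sec: float,
    outage_hours: float = 4.0,  # buffering window recommended in the text above
    headroom: float = 1.25,     # assumed safety margin, not a Splunk figure
) -> int:
    """Estimate the storage needed to buffer data for the whole outage window."""
    if ingest_bytes_per_sec < 0 or outage_hours < 0 or headroom < 1:
        raise ValueError("rates and hours must be non-negative; headroom must be >= 1")
    return math.ceil(ingest_bytes_per_sec * outage_hours * 3600 * headroom)
```

For example, a collection tier that sustains 1 MB per second needs 1,000,000 x 4 x 3600 = 14.4 GB for the 4-hour window before headroom, or 18 GB with the assumed 25 percent margin.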

If you use the AWS Data Firehose data-streaming service to send data into Splunk Cloud Platform through HTTP Event Collector (HEC), confirm that you have turned on indexer acknowledgment for HEC data inputs. Also confirm that you have turned on persistent input queues on the forwarders that send that data. For more information, see the Splunk documentation on HEC indexer acknowledgment and on persistent queues.
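For readers unfamiliar with HEC indexer acknowledgment, the following Python sketch outlines the exchange as it is commonly described: the client sends events on a channel, receives an ack ID, and later asks which ack IDs have been confirmed. A service such as Firehose performs this exchange itself, so the sketch is for understanding only. The function names, base URL, and token are placeholders, and the sketch only builds requests and parses a response without sending anything.

```python
import json
import uuid
from typing import Dict, List, Optional


def build_event_request(
    base_url: str, token: str, event: object, channel: Optional[str] = None
) -> Dict[str, object]:
    """Describe a HEC event POST that carries a request channel for acknowledgment."""
    channel = channel or str(uuid.uuid4())
    return {
        "url": f"{base_url}/services/collector/event",
        "headers": {
            "Authorization": f"Splunk {token}",
            "X-Splunk-Request-Channel": channel,
        },
        "body": json.dumps({"event": event}),
    }


def build_ack_query(
    base_url: str, token: str, channel: str, ack_ids: List[int]
) -> Dict[str, object]:
    """Describe the follow-up POST that asks whether the given ack IDs are confirmed."""
    return {
        "url": f"{base_url}/services/collector/ack?channel={channel}",
        "headers": {
            "Authorization": f"Splunk {token}",
            "X-Splunk-Request-Channel": channel,
        },
        "body": json.dumps({"acks": ack_ids}),
    }


def unacknowledged(ack_response_body: str) -> List[int]:
    """Return the ack IDs that the response reports as not yet confirmed."""
    acks = json.loads(ack_response_body)["acks"]
    return sorted(int(ack_id) for ack_id, done in acks.items() if not done)
```

A sender that follows this pattern keeps any event whose ack ID is still listed by `unacknowledged` and resends it after a timeout, which is what allows data to survive the window in which Splunk Cloud Platform is not acknowledging events.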

Repopulate dashboard configurations

Splunk does not replicate the results of previously run searches to the secondary CSP region. As a result, any dashboards that use the results of previously run saved searches do not populate after a failover until the next scheduled run of the saved search.

Where applicable, when you design dashboards in your Splunk Cloud Platform environment, use the ref attribute of the search Simple XML dashboard element rather than the loadjob search command. With the ref attribute, when no results from a previous run are available, the search runs on demand and populates the dashboard, so the dashboard is not empty while it waits for the next scheduled run of the saved search.
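To find dashboards that still depend on loadjob, you could scan their Simple XML source. The following Python sketch shows one possible check. The sample dashboard, the saved search name "Daily Errors", and the function name are made up for illustration; the sample also shows the two forms side by side.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical dashboard: the first panel uses loadjob, the second uses ref.
SAMPLE_DASHBOARD = """
<dashboard>
  <row>
    <panel>
      <table>
        <search>
          <query>| loadjob savedsearch="admin:search:Daily Errors"</query>
        </search>
      </table>
    </panel>
    <panel>
      <table>
        <search ref="Daily Errors"></search>
      </table>
    </panel>
  </row>
</dashboard>
"""


def find_loadjob_queries(dashboard_xml: str) -> List[str]:
    """Return the text of every <query> element that calls the loadjob command."""
    root = ET.fromstring(dashboard_xml.strip())
    hits = []
    for query in root.iter("query"):
        text = (query.text or "").strip()
        if "loadjob" in text.lower():
            hits.append(text)
    return hits
```

Each query that `find_loadjob_queries` returns is a candidate for replacement with a search element that references the saved search through ref, as in the second panel of the sample.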
