Store expired Splunk Cloud Platform data in a Splunk-managed archive
Dynamic Data Active Archive (DDAA) lets you move your data from your Splunk Cloud Platform indexes to a Splunk-managed archive. You can use DDAA to maintain access to older data for compliance purposes. You specify archiving at the index level by creating an archiving rule for a specified index. This gives you the flexibility to archive only the specific data that you need to maintain.
You can configure Splunk Cloud Platform to automatically archive the data from an index when the data either reaches a specified maximum size or the end of the Splunk Cloud Platform searchable retention period for an index. You can restore archived data to your Splunk Cloud Platform environment for searching within the configured archival retention time period.
You can manually clear restored data or let it auto-expire from searchable storage after 30 days. You can also track archived and restored data storage consumption, as well as the growth and expiration of your archived data.
Dynamic Data Active Archive moves data from your Splunk Index to a Splunk-maintained archive, and subsequently back from the Splunk-maintained archive to the Splunk Index in a secure and tamper-resistant manner.
How Dynamic Data Active Archive works
Data is moved to the archive when the index meets a configured size or time threshold. When that threshold is met, Splunk Cloud Platform attempts to move the data to the archive location. If an error occurs, such as a connection issue, Splunk Cloud Platform retries the move every 15 minutes until it succeeds.
It can take up to 48 hours from the archive initiation for the archiving process to complete.
If an error occurs, the error is logged to splunkd.log. Splunk Cloud Platform does not delete data from the Splunk Cloud Platform environment until it has successfully moved the data to the archive. If you need to restore the data so that it is searchable, you can restore the data to your Splunk Cloud Platform environment. You can then search the data and delete it when you have finished.
When you restore archived data to Splunk Cloud Platform, it does not count against the indexing license volume for the Splunk Cloud Platform deployment.
Dynamic Data Active Archive Performance
Restoring large amounts of archived data can impact performance. Splunk Cloud Platform has checks in place to help you determine if the amount of data you want to restore is too large, and it provides a warning when the data size may impact performance. Splunk Cloud Platform will block you from restoring large amounts of data that could potentially have an extremely negative impact on performance. If this occurs, select a smaller time range.
Configure archive settings for an index
This section shows you how to configure archive settings for a specific index.
Managing archive settings requires the indexes_edit capability. All archive changes appear in the audit.log file.
Setting incorrect or inadequate data retention values can result in a loss of data. If you have any questions about correctly setting the searchable and archive retention values for your Splunk Cloud Platform deployment, contact your Splunk account representative.
For more information on Splunk Cloud Platform data retention settings and policies and the DDAS and DDAA subscription options, see:
- Manage data retention settings
- Storage section in the Splunk Cloud Platform Service Description
Configure archiving for an index
- In Splunk Cloud, go to Settings > Indexes.
- Click New Index to create a new index or click Edit in the Actions column for an existing index.
- In the Max raw data size field, specify the maximum amount of raw data allowed before data is removed from the index and archived.
- In the Dynamic Data Storage field, select Splunk Archive.
- Set the Searchable retention (days) and Archive Retention Period values. Note the following:
- Searchable retention (days) holds the Dynamic Data Active Searchable (DDAS) or searchable storage value. This is the searchable retention period, and is considered warm storage.
- Dynamic Data Storage > Splunk Archive > Archive Retention Period holds the Dynamic Data Active Archive (DDAA), or archive storage value, and is considered cold storage. You can specify this value in years, months, or days. The maximum archive retention period is 3650 days (10 years). Specify a value within this range.
- The archive retention period is the total amount of time that Splunk retains your data. The archive retention period includes the searchable retention period. For example, if you want Splunk Cloud Platform to retain your data for a total of 365 days, but you want that data searchable for the first 90 days, set the searchable retention period to 90 days and the archive retention period to 365 days (not 365-90 days).
- When specifying the archive retention period value, you must specify a value that is greater than the searchable retention period. For example, if you set Searchable retention (days) to 90 days, you must set the Archive Retention Period to a value greater than 90 days, such as 180 days.
- Click Save.
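The retention rules in the steps above can be checked before you enter values in Splunk Web. The following is a minimal sketch, not part of Splunk Cloud Platform: the function name `validate_retention` is hypothetical, and the requirement that searchable retention be a positive number is an assumption. It applies the two documented rules (archive retention greater than searchable retention, and no more than 3650 days) and returns how long data stays in the archive after it leaves searchable storage.

```python
MAX_ARCHIVE_RETENTION_DAYS = 3650  # 10-year maximum stated in the steps above


def validate_retention(searchable_days: int, archive_days: int) -> int:
    """Return the number of days data spends in the archive only.

    Hypothetical helper: raises ValueError if the values break the
    documented rules.
    """
    # Assumption: searchable retention is a positive number of days.
    if searchable_days <= 0:
        raise ValueError("searchable retention must be a positive number of days")
    if archive_days <= searchable_days:
        raise ValueError("archive retention must be greater than searchable retention")
    if archive_days > MAX_ARCHIVE_RETENTION_DAYS:
        raise ValueError("archive retention cannot exceed 3650 days")
    # The archive retention period includes the searchable period,
    # so the archive-only portion is the difference between the two.
    return archive_days - searchable_days


# 90 days searchable, 365 days total: 275 days are spent in the archive only.
print(validate_retention(90, 365))
```

The value you enter in the Archive Retention Period field is still the total (365 in this example), not the difference.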
You cannot enable both DDAA and DDSS at the same time for the same index. If you enable DDAA for an index, then later decide to change the index settings to use either DDSS or no storage, you must contact Splunk Support if you want to retain the archived data.
Disable archiving for an index
- Go to Settings > Indexes.
- Click Edit in the Actions column for the index you want to manage.
- In the Dynamic Data Storage field, select Self Storage to move data to a self-storage location when it expires, or select No Additional Storage to delete data as it expires.
- Click Save. When data in this index expires, it is deleted.
Disabling archiving for an index marks the existing archived data with a status of delete. Deleted archive data is permanently erased 30 days after the deletion date. Be aware that disabling archiving for an index does not affect the time or size of the data retention policy for the index.
If you disable archiving for an index in error, contact Splunk Support as soon as possible. If you have a support contract, file a new case using the Splunk Support Portal. Otherwise, contact Splunk Customer Support.
Restore archived data to Splunk Cloud Platform
DDAA lets you restore indexed data from the Splunk archive. Data in DDAA can be restored to Dynamic Data Active Searchable (DDAS) to be searched. After restoring data, you can search it like any other data.
You restore data based on the time period for the data you want to search. For example, you might want to restore data for a period of one day. When you pick a date from the date picker, DDAA treats it as 12 AM UTC of the selected date. So, if you want to restore one day's worth of archived data (for example, 07/10/2018), you must specify 07/10/2018 in the 'from' field and 07/11/2018 in the 'to' field.
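Because each selected date is treated as 12 AM UTC, the 'to' date is always one day after the last day you want restored. The following sketch shows that arithmetic. The function name `restore_range` is hypothetical and is not a Splunk API; it only computes the two values you would enter in the date picker.

```python
from datetime import date, datetime, timedelta, timezone


def restore_range(first_day: date, num_days: int = 1) -> tuple[datetime, datetime]:
    """Return the (from, to) instants that cover num_days starting at first_day."""
    if num_days < 1:
        raise ValueError("num_days must be at least 1")
    # Each picked date is interpreted as 12 AM (midnight) UTC of that date.
    start = datetime(first_day.year, first_day.month, first_day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=num_days)
    return start, end


# One day's worth of data for 07/10/2018:
start, end = restore_range(date(2018, 7, 10))
print(start.date(), "->", end.date())  # 2018-07-10 -> 2018-07-11
```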
Restored data is searchable for 30 days. Splunk automatically removes the restored data from searchable storage after this period. Splunk does not remove data from the archive.
The archival process can take up to 48 hours and the restoration process can take up to 24 hours. Because the complete archival and restoration cycle can take up to 72 hours, plan any data restoration accordingly.
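For planning purposes, the worst case can be estimated by adding both upper bounds to the time archiving began. This is a sketch under the assumption that you request the restore as soon as archiving finishes; the function name `worst_case_searchable` is hypothetical, and actual durations may be shorter.

```python
from datetime import datetime, timedelta, timezone

ARCHIVE_MAX = timedelta(hours=48)  # upper bound for the archival process
RESTORE_MAX = timedelta(hours=24)  # upper bound for the restoration process


def worst_case_searchable(archive_started: datetime) -> datetime:
    """Latest time at which restored data should be searchable again.

    Assumes the restore is requested the moment archiving completes.
    """
    return archive_started + ARCHIVE_MAX + RESTORE_MAX


started = datetime(2018, 7, 10, tzinfo=timezone.utc)
print(worst_case_searchable(started).isoformat())
```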
How restoring data works
When you restore data to Splunk Cloud Platform from the archive, a copy of the archived data is moved back to the Splunk Cloud Platform environment. To ensure your data is safe, Splunk Cloud Platform never moves or deletes the original archived data. This method of temporary data restoration ensures that you can never mistakenly delete your archived data.
When you restore archived data to an index in your Splunk Cloud Platform instance, it does not count against the retention periods configured for data in your index. Restored data exists outside of the constraints of retention periods and size limits and does not affect the retention of your existing index data.
When you restore data, Splunk Cloud Platform checks several conditions to ensure that you do not experience performance issues and that you do not duplicate data and cause your queries to return incorrect results:
- Check for overlapping data. Splunk Cloud Platform does not restore data if you have already restored data in that same time range. This is to ensure you do not restore duplicate data, which would cause inaccurate search results. For example, if you specify that you want to restore data from 07/01/2018-07/03/2018, but you have already restored data from 07/01/2018-07/02/2018, Splunk Cloud Platform will prevent your data restore. In this case, it is recommended you restore the data that falls outside of the range of the data you have already restored. In this example, you would restore data from 07/03/2018-07/04/2018.
- Check to ensure data is not likely to cause performance issues. Splunk Cloud Platform checks the size of the data you want to restore and presents you with a warning if the size of the data may cause performance issues. If the size of that data is very likely to cause performance issues, Splunk Cloud Platform will prevent you from restoring the data.
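The overlap check described above can be illustrated with a standard interval comparison. This is only a sketch of the idea, not the check that Splunk Cloud Platform runs internally. It assumes that each range is half-open, meaning the 'from' date is included and the 'to' date is excluded, and the function name `overlaps` is hypothetical.

```python
from datetime import date


def overlaps(requested: tuple[date, date], restored: tuple[date, date]) -> bool:
    """Return True if two half-open [from, to) date ranges share any time."""
    req_from, req_to = requested
    res_from, res_to = restored
    # Two ranges overlap when each one starts before the other one ends.
    return req_from < res_to and res_from < req_to


already_restored = (date(2018, 7, 1), date(2018, 7, 2))

# Requesting 07/01-07/03 overlaps the existing restore, so it would be blocked.
print(overlaps((date(2018, 7, 1), date(2018, 7, 3)), already_restored))  # True

# Requesting 07/03-07/04 falls outside the restored range.
print(overlaps((date(2018, 7, 3), date(2018, 7, 4)), already_restored))  # False
```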
During the data restoration process, the Splunk platform retrieves all buckets that contain events necessary for the specified search period. Because a bucket can also contain events from outside that period, the total size of the restored data can be much greater than the size of the events in the time range you requested. This behavior is normal and to be expected.
After you have restored data, you may notice that events appear in your index that are older than your configured retention period specifies. This restored data will remain in your index for 30 days or until you clear it.
If your attempt to restore archived data fails, verify that the data was not recently archived. Because there is a time period during which data is being transitioned from Splunk Cloud Platform to the archive, you will not be able to restore that data during the processing period. Generally, data moved to the archive is available in approximately 48 hours.
If you want to restore data archived within the last 48 hours, you must explicitly disable the default "Exclude" option for Recently Archived Data in the Restore Archive modal. When set to "Exclude", DDAA skips restoration of data archived within the last 48 hours. See Steps to restore archived data to Splunk Cloud Platform.
In addition, ensure that the data is fully archived and the timestamps are correct, or data restoration will fail with the following error message: "You cannot restore data that was archived less than 48 hours ago. Please try again later." If you receive this message, you must set the Recently Archived Data mode to "Exclude" to proceed with data restoration. For more information, see Troubleshoot Dynamic Data Active Archive.
What happens when you are finished searching the restored data
After the data is temporarily restored to your Splunk Cloud Platform environment, it is available for searching for 30 days. Restored data is a copy of the archived data, so you never need to move the data back to the archive. For best performance, however, remove the temporarily restored data when you have finished searching it.
Temporarily restored data is available for 30 days only. You cannot reduce or extend this 30-day period. The restriction applies to all temporarily restored data, regardless of the configuration settings for your deployment's indexers.
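To keep track of when restored data will disappear from searchable storage, you can compute the expiration date from the restore time. This sketch assumes that the 30-day clock starts when the restoration completes, which this page does not state explicitly. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RESTORED_DATA_LIFETIME = timedelta(days=30)  # fixed period described above


def restored_data_expiry(restored_at: datetime) -> datetime:
    """Return when temporarily restored data is expected to expire.

    Assumption: the 30 days are counted from the time the restore completed.
    """
    return restored_at + RESTORED_DATA_LIFETIME


def days_left(restored_at: datetime, now: datetime) -> int:
    """Return the whole days remaining before expiry, never below zero."""
    remaining = restored_data_expiry(restored_at) - now
    return max(remaining.days, 0)


restored = datetime(2018, 7, 10, tzinfo=timezone.utc)
print(restored_data_expiry(restored).date())  # 2018-08-09
print(days_left(restored, datetime(2018, 7, 20, tzinfo=timezone.utc)))  # 20
```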
Steps to restore archived data to Splunk Cloud Platform
- In Splunk Cloud, go to Settings > Indexes.
- For the index where you want to restore data, click Restore. The menu displays the restore history for the specified index. You can see the history of data restoration and file size for the data restored.
- Use the date picker to select a time range to retrieve.
- Click Check size. Splunk Cloud Platform checks to see if the size of the file might impact performance. If the file size is too large, Splunk Cloud Platform blocks you from restoring data. If there is a potential performance impact, Splunk Cloud Platform displays a warning. Splunk Cloud Platform also prevents you from restoring data that overlaps with existing restored data.
- Enter an email address to send job status notifications. Splunk Cloud Platform will notify you when the restoration is complete.
- (Optional) If your time range includes data archived within the last 48 hours, toggle the Recently Archived Data switch to disable the default Exclude mode. When set to "Exclude" mode, DDAA skips restoration of data archived within the last 48 hours. Note that attempting to restore data that is not fully archived can cause data restoration to fail. For more information, see Troubleshoot Dynamic Data Active Archive.
- Click Restore when you have refined the file size or date range to acceptable limits.
After you initiate data restoration, it can take up to 24 hours before data is restored. If it takes longer than 24 hours, contact Splunk Technical Support.
- To check the status of your data restoration, click Splunk Archive in the Storage Type field to open the Archive page. To view the restore status, click the Restore tab. In the JobStatus field, you can see the status of your job:
- Pending: The job has been submitted, but has not begun processing.
- In progress: The job has been started, and is progressing.
- Success: The data has been successfully restored to your index.
- Cleared: You've successfully deleted the temporary archive from your index.
- Expired: The restored data has passed the 30-day retention period and has been deleted from the index.
- Failed: If you receive a Failed status, click the > button for the archive to display more details about why the restoration failed.
Steps to remove restored data from Splunk Cloud Platform
Splunk recommends you manually remove restored data when you are finished searching it.
Restored data is a copy of the archived data, so you never need to move the data back to the archive, but for best performance, you should remove the temporarily restored data when you are done searching it.
To remove restored data:
- In Splunk Cloud, go to Settings > Indexes.
- Select the index with data you want to remove and click Restore to open the Restore Archive page.
- For the range of data you want to remove, select Clear in the Actions column.
When the data is successfully removed, the Jobstatus column displays a Cleared status.
Monitor logs during archiving
Splunk generates logs when you archive data and when you restore archived data. You may want to monitor these logs to check for errors during these processes.
Archiving logs
To check for error messages that occur when you are archiving data, you can view the coldstoragearchiver entries in the splunkd.log. You can find these entries by running the following search:
index=_internal source=*/splunkd.log component=coldstoragearchiver
Data restoration logs
To check for error messages that occur when you restore archived data, you can view entries in the splunk_archiver_restoration.log, restoration.log, and python.log. You can find these entries by running the following searches:
index=_internal source=*/splunk_archiver_restoration.log
index=_internal source=*/restoration.log
index=_internal source=*/python.log
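If you prefer to review all three restoration logs at once, the searches above can be combined with OR into a single search string. The following sketch only builds that string; the function name `restoration_log_search` is hypothetical, and how you submit the search (for example, by pasting it into the Search app) is up to you.

```python
# Log file names taken from the searches listed above.
RESTORATION_LOGS = (
    "splunk_archiver_restoration.log",
    "restoration.log",
    "python.log",
)


def restoration_log_search(logs: tuple[str, ...] = RESTORATION_LOGS) -> str:
    """Build one search that matches any of the given log sources."""
    sources = " OR ".join(f"source=*/{name}" for name in logs)
    return f"index=_internal ({sources})"


print(restoration_log_search())
```

The result is a single search with the three `source=` terms joined by OR inside parentheses, which returns the same events as running the three searches separately.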
Manage your archives
You might want to review the status of your archived indexes or understand how much of your entitlement has been used. You can review the status of your archived indexes on the Archived Indexes page.
Steps to review the overall status of your restore requests for the last 90 days
- From Splunk Web, go to Settings > Indexes.
- From the Indexes page, click on a value in the Archive Retention column.
- Click the Restore tab to open the Restore page.
- Review the Restore Summary (90 days) table to see the overall status of your restored data.
Field | Description |
---|---|
Total Restored Data (GB) | The total amount of raw data (uncompressed) that has been restored. This value is updated nightly. |
Total Cleared Data (GB) | The total amount of raw data (uncompressed) that has been deleted from the restored archive. This value is updated nightly. |
Total Expired Data (GB) | The total amount of raw data (uncompressed) that has expired from the restored archive. This value is updated nightly. |
You can view the details for restored archived data from the last 90 days in the table below. For each index, you can see the following details:
Field | Description |
---|---|
Index Name | The name of the restored index. |
Restored Count | The total number of restoration requests, including both successful and failed restore requests. This value also includes cleared and expired restore requests. |
Restored Size (GB) | The total amount of raw data (uncompressed) that has been restored. |
Cleared Count | The total number of restored index requests that have been manually deleted. |
Cleared Size (GB) | The total amount of raw data (uncompressed) that has been manually deleted. |
Expired Count | The total number of restored index requests that have aged out. |
Expired Size | The total amount of restored raw data (uncompressed) that has aged out. |
Steps to review the status of individual restore requests
- From Splunk Web, go to Settings > Indexes.
- From the Indexes page, click on a value in the Archive Retention column.
- Click the Restore tab to open the Restore page.
- Go to the Restore Request History (Last 50 requests) table.
From here, you can see the start time, end time, time of the request, data volume in GB, and the expiration date. To understand the status for each job, check the Job Status field for each index. The following table shows the possible values.
Job Status | Description |
---|---|
Pending | The request for restoration has been initiated, but has not yet begun. |
In progress | The restoration process has started, but it has not been completed. |
Success | The data has been successfully restored to your index. |
Failure | The restoration failed. Click the > button next to the archive to display more details about the failure. |
Cleared | You have successfully cleared the temporarily restored data. |
Expired | The restored data has passed the 30-day retention threshold. |
After you have reviewed the archived indexes, you can determine what actions you want to take for each archived or restored index. You may want to clear archived data or stop archiving an index. Or you may see that a restoration or archive operation failed and choose to troubleshoot the issue.
Steps to review the overall size and growth of your archived indexes
You might want to review the size and growth of your archived indexes to better understand how much of your entitlement you are consuming. This can help you predict usage and expenses for your archived data.
- From Splunk Web, go to Settings > Indexes.
- From the Indexes page, click on a value in the Archive Retention column.
The Archive Summary page displays the following information:
Field | Description |
---|---|
Total Archive Usage | The total amount of raw data (uncompressed) that is stored in the archive. This number turns red when total archive usage exceeds the total entitlement. This value is updated nightly. |
Total Entitlement | Your total entitlement as determined in your service agreement. |
Total Archive Data Growth (90 Days) | The total amount of raw data (uncompressed) that has been added to the archive in the past 90 days. This value is updated nightly. |
Total Archive Data Expiration (90 Days) | The total amount of raw data (uncompressed) that has aged out of the archive within the past 90-day window. This value is updated nightly. Note that each index has an archive retention setting and the data ages out over time. For example, index A has 2-year archive retention. Every night for that index, Splunk ages out the data that is older than 2 years. |
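The four values in the table above are enough for a rough projection of when archive usage might reach your entitlement. The following is a simple linear sketch, not a Splunk feature. It assumes that the net change over the last 90 days (growth minus expiration) continues at the same rate, and the function name `days_until_entitlement` is hypothetical.

```python
from typing import Optional


def days_until_entitlement(
    usage_gb: float,
    entitlement_gb: float,
    growth_90d_gb: float,
    expiration_90d_gb: float,
) -> Optional[float]:
    """Estimate the days until archive usage reaches the entitlement.

    Returns 0.0 if usage already meets or exceeds the entitlement, and
    None if the archive is not growing on balance.
    """
    if usage_gb >= entitlement_gb:
        return 0.0
    # Assumption: the average net change per day over the last 90 days continues.
    net_per_day = (growth_90d_gb - expiration_90d_gb) / 90
    if net_per_day <= 0:
        return None
    return (entitlement_gb - usage_gb) / net_per_day


# 800 GB used of 1000 GB; 120 GB added and 30 GB expired in the last 90 days.
print(days_until_entitlement(800, 1000, 120, 30))  # 200.0
```

Because the summary values are updated nightly, treat the result as an estimate rather than an exact date.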
Steps to review the size and growth of each archived index
You might want to review the size and growth of each index to understand how much it grows over time.
- From Splunk Web, go to Settings > Indexes.
- From the Indexes page, click on a value in the Archive Retention column.
The Archive Summary page displays the following information:
Field | Description |
---|---|
Index Name | Name of the index. |
Current Size (GB) | The current amount of raw data (uncompressed) that is stored in the archive for each index. |
Earliest Event | The earliest event in the archived index. |
Latest Event | The latest event in the archived index. |
90-Day Data Growth (GB) | The amount of raw data (uncompressed) that has been added to the archive in the past 90 days for each index. |
90-Day Data Expiration (GB) | The amount of raw data (uncompressed) that has aged out of the archive in the past 90 days for each index. |
Troubleshoot Dynamic Data Active Archive
I received an error when attempting to restore data
If an error occurs, the error is logged to splunkd.log. If you experience errors when you review the Archive page, review splunkd.log and filter on the coldstoragearchiver component by running the following search:
index=_internal source=*/splunkd.log component=coldstoragearchiver
I clicked the Check Size button and nothing happened
When restoring data, I clicked the Check Size button multiple times and nothing happened.
Diagnosis
When restoring a large amount of data, it may take some time for Splunk to verify that the size of the data can be restored without causing performance issues. If you click the Check Size button multiple times, it may trigger AWS to block the check process.
Solution
Do not click the Check Size button multiple times if you don't immediately receive feedback.
I archived some of my data. When I attempted to restore it a few hours later, an error message appeared.
When I archived data and attempted to restore it soon after, I received an error message.
Diagnosis
Data can take up to 48 hours to archive. If you attempt to restore the data before this time period completes, the restoration will fail and the following error message appears: You cannot restore data that was archived less than 48 hours ago. Please try again later.
Solution
Make sure the Recently Archived Data switch in the Restore Data modal is set to Exclude. When set to Exclude, DDAA skips restoration of all data archived within the last 48 hours. See Steps to restore archived data to Splunk Cloud Platform.
Alternatively, wait until the 48-hour threshold has been met, and then attempt to restore the data. You can check the status of the archival process as follows:
- Run the following search against the internal log to determine when Splunk Cloud Platform started the archival process:
index=_internal component=ColdStorageArchiver "Successfully executed archiving script"
I'm trying to restore fully archived data, but I'm still receiving an error message about data archival.
I'm trying to restore data and the archival process completed more than 48 hours ago, but I'm receiving the following error message: You cannot restore data that was archived less than 48 hours ago. Please try again later.
Diagnosis
Receiving this error message after the archival process is complete indicates that there are incorrect timestamps in the data.
Solution
Contact Splunk Technical Support for help with correcting the timestamp data.
This documentation applies to the following versions of Splunk Cloud Platform™: 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403 (latest FedRAMP release), 9.2.2406