All DSP releases prior to DSP 1.4.0 use Gravity, a Kubernetes orchestrator, which has been announced end-of-life. We have replaced Gravity with an alternative component in DSP 1.4.0. Therefore, we will no longer provide support for versions of DSP prior to DSP 1.4.0 after July 1, 2023. We advise all of our customers to upgrade to DSP 1.4.0 in order to continue to receive full product support from Splunk.
Data retention policies
Data retention policies are sets of rules that determine how long data remains available for consumption from a message queue. Typically, streamed data is retained in an available state until a specified interval of time has passed or a maximum amount of retained data is exceeded. If a DSP pipeline is down for longer than this data retention period, data loss can occur.
The retention policies that apply to the data being ingested into DSP pipelines vary depending on the source of the data. Different message queues may be used to handle data from different sources, and as a result, different data retention policies apply.
- If the data comes from a source that is supported by the Splunk DSP Firehose or one of its subset ingestion methods (Ingest service, Forwarders service, Collect service, DSP HEC, and Syslog), then the data is subject to the retention policies configured in the Apache Pulsar message bus used by the Splunk DSP Firehose.
- If the data comes from another source, then the data is subject to the retention policies configured in that data source.
The following table describes how retention policies are determined for each type of data source, and where to find more information about each policy:
Type of data source | What determines the data retention policy | For more information |
---|---|---|
Data sources supported by the Splunk DSP Firehose, which include the Ingest service, Forwarders service, Collect service, DSP HEC, and Syslog | The configuration of the Apache Pulsar message bus in DSP. | See the Splunk DSP Firehose retention policies section on this page. |
Amazon Kinesis Data Streams | The configuration of the Kinesis data stream. | Search for "Changing the Data Retention Period" in the Amazon Kinesis Data Streams Developer Guide. |
Apache or Confluent Kafka | The configuration of the Kafka topic. If retention policies are not configured on the topic, then default policies on the broker are used instead. | Search for "Topic-Level Configs" in the Apache Kafka documentation, or "Topic Configurations" in the Confluent Kafka documentation. |
Apache Pulsar | The configuration of the namespace that the Pulsar topic belongs to. | Search for "Message retention and expiry" in the Apache Pulsar documentation. |
Google Cloud Pub/Sub | The configuration of the subscription. | Search for "Managing Subscriptions" in the Google Cloud Pub/Sub documentation. |
Microsoft Azure Event Hubs | The configuration of the event hub. | Search for "Azure Event Hubs quotas and limits" and "Create an event hub" in the Event Hubs documentation. |
Splunk DSP Firehose retention policies
Starting in DSP 1.1.0, DSP uses Apache Pulsar as the message bus for all data sent to the Splunk DSP Firehose. This means that all data ingestion methods that are a subset of the Splunk DSP Firehose (Ingest service, Forwarders service, Collect service, DSP HEC, Syslog) use Pulsar as the message bus.
By default, all data received by Splunk DSP Firehose is stored in a Pulsar topic for 24 hours. The oldest data in the topic gets deleted first. You can adjust the data retention policy by following the steps described in the Set the Splunk DSP Firehose retention policy section.
For more information about Apache Pulsar and its data retention policies, search for "Message retention and expiry" in the Apache Pulsar documentation.
Set the Splunk DSP Firehose retention policy
- From the DSP directory of a master node, log in to the Pulsar broker pod.
  $ kubectl exec -it broker-0 -n pulsar /bin/bash
- Navigate to the pulsar/ directory.
  $ cd /streamlio/pulsar/
- Update the retention policy.
  $ ./bin/pulsar-admin namespaces set-retention --time <TIME> --size <SIZE> DSP/default-ingest
Flag | Description | Examples |
---|---|---|
--time | The retention time in minutes, hours, days, or weeks. Set to 0 for no retention and -1 for infinite time retention. Defaults to 24 hours. | 100m, 3h, 2d, 5w |
--size | The retention size limit. Set to 0 for no retention or -1 for infinite size retention. Defaults to 0. | 10M, 16G, 3T |
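Because a mistyped unit is easy to miss, it can help to check the values before running the command in the broker pod. The following is a minimal sketch, not part of DSP: the helper name build_set_retention_cmd is hypothetical, and it accepts only the value forms listed in the flag descriptions above (0, -1, or a number followed by a documented unit), even though pulsar-admin itself may accept more. It prints the command for review instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of DSP or Pulsar): validates the two values
# and prints the set-retention command without executing it.
build_set_retention_cmd() {
  time_arg=$1
  size_arg=$2

  # Assumption: allow only 0, -1, or a number followed by m, h, d, or w,
  # matching the examples documented for --time.
  if ! printf '%s\n' "$time_arg" | grep -Eq '^(-1|0|[0-9]+[mhdw])$'; then
    echo "invalid --time value: $time_arg" >&2
    return 1
  fi

  # Assumption: allow only 0, -1, or a number followed by M, G, or T,
  # matching the examples documented for --size.
  if ! printf '%s\n' "$size_arg" | grep -Eq '^(-1|0|[0-9]+[MGT])$'; then
    echo "invalid --size value: $size_arg" >&2
    return 1
  fi

  printf './bin/pulsar-admin namespaces set-retention --time %s --size %s DSP/default-ingest\n' \
    "$time_arg" "$size_arg"
}

# Example: keep data for 3 days, up to 16 GB.
build_set_retention_cmd 3d 16G
```

After reviewing the printed command, run it from the /streamlio/pulsar/ directory inside the broker pod, as described in the steps above.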
Get retention policy
You can get the retention policy for a namespace by specifying the namespace. The output is a JSON object with two keys: retentionTimeInMinutes and retentionSizeInMB.
To see the current retention policy:
- From the DSP directory of a master node, log in to the Pulsar broker pod.
  $ kubectl exec -it broker-0 -n pulsar /bin/bash
- Navigate to the pulsar/ directory.
  $ cd /streamlio/pulsar/
- Run the following command to see the current retention policy.
  $ ./bin/pulsar-admin namespaces get-retention DSP/default-ingest
  A response containing retentionTimeInMinutes and retentionSizeInMB is returned.
  { "retentionTimeInMinutes" : 1440, "retentionSizeInMB" : 0 }
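To relate this response to the data loss risk described earlier on this page, you can compare the retention time against how long a pipeline is expected to be down. The following is a rough sketch under stated assumptions: it uses the sample response above rather than live output, the json_int helper and the outage_minutes value are hypothetical, and it checks only the time limit, not the size limit.

```shell
#!/bin/sh
# Sample response copied from the example above. In practice, this would be
# the output of the get-retention command.
response='{ "retentionTimeInMinutes" : 1440, "retentionSizeInMB" : 0 }'

# Hypothetical helper: extracts the integer value of a key from a flat
# JSON object. This is a sed-based sketch, not a general JSON parser.
json_int() {
  printf '%s\n' "$1" |
    sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p"
}

retention_minutes=$(json_int "$response" retentionTimeInMinutes)
retention_hours=$((retention_minutes / 60))
echo "Retention time: ${retention_minutes} minutes (${retention_hours} hours)"

# Hypothetical planned outage length, in minutes (30 hours).
outage_minutes=1800

# A value of -1 means infinite time retention, so only compare when the
# retention time is 0 or greater.
if [ "$retention_minutes" -ge 0 ] && [ "$outage_minutes" -gt "$retention_minutes" ]; then
  echo "Outage exceeds the retention time; data loss can occur."
else
  echo "Outage is within the retention time."
fi
```

With the default 24-hour policy shown in the sample response, a 30-hour outage exceeds the retention time, so the retention policy would need to be raised before the outage to avoid losing data.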
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5, 1.3.0, 1.3.1