Splunk® Data Stream Processor

Install and administer the Data Stream Processor



On April 3, 2023, Splunk Data Stream Processor will reach its end of sale, and will reach its end of life on February 28, 2025. If you are an existing DSP customer, please reach out to your account team for more information.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Data retention policies

Starting in DSP 1.1.0, DSP uses Apache Pulsar as the message bus for all data sent to the Splunk Firehose. As a result, every data ingestion method that sends data through the Splunk Firehose (Ingest service, Forwarders service, Collect service, DSP HEC, and Syslog) uses Pulsar as its message bus.

All data received by the Splunk Firehose is stored in a Pulsar topic for 24 hours by default. The oldest data in the topic is deleted first. You can adjust the data retention policy by following the steps in the "Set retention policy" section of this topic.

For more information about Apache Pulsar and its data retention policies, see Message retention and expiry in the Apache Pulsar documentation.

Data loss can occur if a pipeline is down for longer than the retention period, because data that arrives during the outage can be deleted before the pipeline reads it.
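To make this risk concrete, the following sketch compares the length of a pipeline outage with the retention time. The function name `outage_exceeds_retention` and the sample durations are illustrative assumptions and are not part of DSP. Both values are in minutes, and -1 is treated as infinite time retention.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if data could be lost, that is, if the
# outage lasted longer than the retention time. Both arguments are in minutes.
outage_exceeds_retention() {
  local outage_min="$1" retention_min="$2"
  # -1 means infinite time retention, so no outage can outlast it.
  if [ "$retention_min" -eq -1 ]; then
    return 1
  fi
  [ "$outage_min" -gt "$retention_min" ]
}

# Example: the default 24-hour policy is 1440 minutes; the outage is 1500 minutes.
if outage_exceeds_retention 1500 1440; then
  echo "outage longer than retention: data loss possible"
else
  echo "outage within retention window"
fi
```

If planned maintenance is likely to exceed the current retention time, consider raising the retention policy before you stop the pipeline.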

Set retention policy

  1. From the DSP directory of a master node, log in to the Pulsar broker pod.
    $ kubectl exec -it broker-0 -n pulsar -- /bin/bash
  2. Navigate to the pulsar/ directory.
    $ cd /streamlio/pulsar/
  3. Update the retention policy.
    $ ./bin/pulsar-admin namespaces set-retention --time <TIME> --size <SIZE> DSP/default-ingest
    Flag    Description                                                      Examples
    --time  The retention time in minutes, hours, days, or weeks.            100m, 3h, 2d, 5w
            Set to 0 for no retention and -1 for infinite time retention.
            Defaults to 24 hours.
    --size  The retention size limit. Set to 0 for no retention or -1        10M, 16G, 3T
            for infinite size retention. Defaults to 0.
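The three steps above can also be sketched as a single non-interactive call. This is an assumption-laden sketch, not a documented DSP procedure: it assumes that pulsar-admin can be invoked by its absolute path from outside the pod, and the function name `build_set_retention_cmd` and the values 2d and 16G are illustrative. The function only prints the command so that you can review it before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the set-retention command for a given time and size.
# The pod name, Kubernetes namespace, path, and Pulsar namespace come from the
# steps above; the one-line kubectl form is an assumption.
build_set_retention_cmd() {
  local time="$1" size="$2"
  printf 'kubectl exec broker-0 -n pulsar -- /streamlio/pulsar/bin/pulsar-admin namespaces set-retention --time %s --size %s DSP/default-ingest\n' "$time" "$size"
}

# Illustrative values taken from the Examples column: keep data for 2 days, up to 16 GB.
build_set_retention_cmd 2d 16G
```

Printing the command first, rather than executing it, makes it easier to confirm the --time and --size values before they change the policy.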

Get retention policy

You can get the retention policy for a namespace by specifying the namespace. The output will be a JSON object with two keys: retentionTimeInMinutes and retentionSizeInMB.
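Because the time is reported in minutes, it can help to convert a --time value to minutes before comparing it with the output. The sketch below assumes a plain unit conversion (60 minutes per hour, 1440 per day, 10080 per week) and passes the special values 0 and -1 through unchanged. The function name `retention_to_minutes` is hypothetical and not part of pulsar-admin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a --time value such as 100m, 3h, 2d, or 5w
# into minutes. Assumes plain unit conversion; not taken from Pulsar itself.
retention_to_minutes() {
  local value="$1"
  # Special values: 0 (no retention) and -1 (infinite) pass through unchanged.
  case "$value" in
    0|-1) printf '%s\n' "$value"; return 0 ;;
  esac
  local number="${value%[mhdw]}"    # strip one trailing unit letter
  local unit="${value#"$number"}"   # the letter that was stripped, if any
  # Reject values without a unit or with a non-numeric amount.
  case "$number" in
    ''|*[!0-9]*) echo "unsupported --time value: $value" >&2; return 1 ;;
  esac
  case "$unit" in
    m) printf '%s\n' "$number" ;;
    h) printf '%s\n' "$(( number * 60 ))" ;;
    d) printf '%s\n' "$(( number * 1440 ))" ;;
    w) printf '%s\n' "$(( number * 10080 ))" ;;
    *) echo "unsupported --time value: $value" >&2; return 1 ;;
  esac
}

# Convert the example values from the --time flag description.
for t in 100m 3h 2d 5w; do
  echo "$t = $(retention_to_minutes "$t") minutes"
done
```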

To see the current retention policy:

  1. From the DSP directory of a master node, log in to the Pulsar broker pod.
    $ kubectl exec -it broker-0 -n pulsar -- /bin/bash
  2. Navigate to the pulsar/ directory.
    $ cd /streamlio/pulsar/
  3. Run the following command to see the current retention policy.
    $ ./bin/pulsar-admin namespaces get-retention DSP/default-ingest

The command returns a response like the following. In this example, 1440 minutes corresponds to the default retention time of 24 hours.

{
  "retentionTimeInMinutes" : 1440,
  "retentionSizeInMB" : 0
}
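If you want to use these values in a script, the following sketch extracts them with sed. The function name `retention_field` is hypothetical, and the sample response is copied from the output above rather than fetched from a live cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read get-retention JSON on stdin and print the integer
# value of the given key (retentionTimeInMinutes or retentionSizeInMB).
retention_field() {
  local key="$1"
  sed -n 's/.*"'"$key"'"[[:space:]]*:[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p'
}

# Sample response copied from the example output.
response='{
  "retentionTimeInMinutes" : 1440,
  "retentionSizeInMB" : 0
}'

minutes=$(printf '%s\n' "$response" | retention_field retentionTimeInMinutes)
size_mb=$(printf '%s\n' "$response" | retention_field retentionSizeInMB)
echo "retention time: $(( minutes / 60 )) hours, retentionSizeInMB: ${size_mb}"
```

Against a live cluster, the same function could read the output of the get-retention command through a pipe instead of the sample string, assuming the command prints only the JSON object.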
Last modified on 07 October, 2020

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.1.0

