Resiliency and queueing in Ingest Processor
Ingest Processor is designed so that deployments with the persistent queue feature enabled retain their data safely. If the Ingest Processor service is down, incoming data is automatically stored in a queue. The retention period for this queue is 14 days for both the Essentials and Premier subscriptions. Queued data is processed automatically when the service comes back up.
Queued data is stored on the hard drive of the Ingest Processor instance. By default, the queue is configured to hold up to 10,000 batches of events. Depending on which receiver you use, each batch can contain anywhere from 1 to 128 events. How much data the queue holds and how quickly it fills up depend on the rate at which Ingest Processor receives data.
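As a rough, worked example, the following sketch estimates how long the default queue could buffer data during an outage. The queue limit and batch size range come from the description above; the ingest rate and average batch size are hypothetical values that you would replace with your own measurements.

```python
# Back-of-the-envelope estimate of how long the persistent queue can buffer
# data during an outage. The 10,000-batch limit and 1-128 events per batch
# come from the documentation above; the other values are assumptions.

MAX_QUEUED_BATCHES = 10_000          # default queue capacity, in batches
avg_events_per_batch = 64            # assumption: depends on your receiver
ingest_rate_events_per_sec = 2_000   # assumption: your average incoming rate

max_queued_events = MAX_QUEUED_BATCHES * avg_events_per_batch
seconds_until_full = max_queued_events / ingest_rate_events_per_sec

print(f"Queue holds roughly {max_queued_events:,} events")
print(f"At {ingest_rate_events_per_sec:,} events/sec, it fills in "
      f"~{seconds_until_full / 60:.1f} minutes")
```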
If your pipeline uses either the branch or route command and the queue for one of your destinations is full, then data might be delivered more than once to the other, healthy destinations, causing data duplication.
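The following minimal Python sketch, which is not the product's actual delivery logic, illustrates why this duplication can happen: if a whole batch is retried whenever any one destination rejects it, the destinations that already accepted the batch receive it again on each retry. The Destination class and its capacities are illustrative assumptions.

```python
class Destination:
    """Toy destination with a bounded queue (an assumption for illustration)."""
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.queue = []

    def enqueue(self, batch):
        if len(self.queue) >= self.capacity:
            return False          # queue full, batch rejected
        self.queue.append(batch)  # accepted (possibly again, on a retry)
        return True


def deliver(batch, destinations, max_attempts=3):
    """Fan one batch out to every branch destination, retrying the whole
    batch if any destination rejects it."""
    for _ in range(max_attempts):
        accepted = [d.enqueue(batch) for d in destinations]
        if all(accepted):
            return True
    return False


healthy = Destination("amazon_s3", capacity=10)
blocked = Destination("splunkd", capacity=0)   # simulates a full queue
deliver({"events": ["e1", "e2"]}, [healthy, blocked])
print(len(healthy.queue))  # 3: the healthy destination got the batch once per attempt
```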
Persistent queues
Ingest Processor supports persistent queues (PQ) alongside in-memory queues to prevent data loss during system congestion. When congestion occurs, Ingest Processor temporarily stores data by writing it to disk. Once the system returns to normal operation, Ingest Processor automatically forwards the stored data from these persistent queues.
Data ingested by Ingest Processor is stored in memory while it passes through the processing pipeline. Once an event has left the processor and is delivered to the exporter, it is queued on disk. If there is an error in processing the data or routing it to a destination (for example, Splunk Observability Cloud, Amazon S3, or splunkd), processing and routing are retried a few times. If the errors persist, the data is stored in a dead letter queue (DLQ). The event remains in the disk-backed queue until the exporter successfully sends it to the destination.
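The following Python sketch illustrates that export path conceptually. The file layout, retry count, and send() callable are assumptions made for illustration, not the Ingest Processor internals.

```python
import json, os, time

QUEUE_DIR = "queue"     # assumed disk-backed export queue
DLQ_DIR = "dlq"         # assumed dead letter queue location
MAX_RETRIES = 3         # assumed retry budget before giving up

def persist(event, directory=QUEUE_DIR):
    """Write one event to disk so it survives a restart."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{time.time_ns()}.json")
    with open(path, "w") as f:
        json.dump(event, f)
    return path

def export(path, send):
    """Try to deliver one queued event; move it to the DLQ if it keeps failing."""
    with open(path) as f:
        event = json.load(f)
    for attempt in range(MAX_RETRIES):
        try:
            send(event)               # e.g. a call to the destination's API
            os.remove(path)           # delivered: drop it from the disk queue
            return True
        except Exception:
            time.sleep(2 ** attempt)  # simple backoff between retries
    persist(event, DLQ_DIR)           # repeated failures: park it in the DLQ
    os.remove(path)
    return False
```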
Data for a given partition may be processed by multiple pipelines at the same time. Ingest Processor pipelines are unaware of other Ingest Processor pipelines and data is never synchronized or otherwise reconciled between pipelines.
Potential data retention challenges
The persistent queue feature helps to reduce potential data retention challenges in the following scenarios:
- Extended service downtime. If the service remains down for several days and so much data accumulates that the data volume exceeds processing capacity before the end of the retention period, data could be dropped.
- Sudden bursts of incoming data.
- Processing errors due to issues with dependent services.
- Errors due to data schema mismatches.
- Slow or temporarily unavailable destinations (for example, Splunk Observability Cloud or splunkd).
- Errors writing to a data destination due to incorrect or expired tokens or certificates.
Potential error processing path
If the Ingest Processor service runs into errors while processing events or while sending the results to external destinations, the events that encountered errors are stored in a dead letter queue (DLQ). The retention period for the DLQ is 3 days for Essentials and 14 days for Premier. The data in the DLQ is reprocessed every 24 hours.
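The sketch below illustrates that DLQ lifecycle: queued failures are retried on a fixed interval and discarded once they exceed the retention period. The directory layout, the resend() callable, and the use of the Premier retention value are illustrative assumptions, not the service's actual implementation.

```python
import os, time

DLQ_DIR = "dlq"                       # assumed DLQ location on disk
REPROCESS_INTERVAL = 24 * 3600        # reprocess every 24 hours
RETENTION = 14 * 24 * 3600            # 14 days (Premier subscription)

def reprocess_dlq(resend):
    """Retry every queued failure; drop anything older than the retention period."""
    if not os.path.isdir(DLQ_DIR):
        return
    now = time.time()
    for name in os.listdir(DLQ_DIR):
        path = os.path.join(DLQ_DIR, name)
        if now - os.path.getmtime(path) > RETENTION:
            os.remove(path)           # past retention: the data is dropped
            continue
        try:
            with open(path) as f:
                resend(f.read())      # retry delivery to the destination
            os.remove(path)           # success: remove it from the DLQ
        except Exception:
            pass                      # still failing: keep it for the next cycle
```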