All DSP releases prior to DSP 1.4.0 use Gravity, a Kubernetes orchestrator, which has been announced end-of-life. We have replaced Gravity with an alternative component in DSP 1.4.0. Therefore, we will no longer provide support for versions of DSP prior to DSP 1.4.0 after July 1, 2023. We advise all of our customers to upgrade to DSP 1.4.0 in order to continue to receive full product support from Splunk.
Batch Records
This topic describes how to use the function in the Splunk Data Stream Processor.
Description
The Batch Records function batches records by count or by a time interval in milliseconds. Batching records, as opposed to sending each record individually, can increase throughput by reducing per-record sending overhead. However, batching records can also increase latency, because records must be held until the batch is ready to be sent.
There are two functions for batching records: Batch Records and Batch Bytes. Use Batch Records when you do not want to serialize your data or you want to perform serialization after batching. Use Batch Bytes when you want to serialize your data before batching.
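The ordering difference between the two functions can be sketched in a few lines. The following Python snippet is a hypothetical illustration, not DSP code; the JSON serialization and newline-delimited framing are assumptions chosen only to make the contrast concrete.

```python
import json

records = [{"name": "record1"}, {"name": "record2"}]

# Batch Records style: group the raw records first, and serialize the
# whole batch afterward (serialization happens after batching).
batch_then_serialize = json.dumps(records)

# Batch Bytes style: serialize each record to bytes first, then batch
# the resulting byte strings (serialization happens before batching).
serialized = [json.dumps(r).encode() for r in records]
batch_of_bytes = b"\n".join(serialized)
```

With Batch Records, downstream functions still see structured records inside each batch; with Batch Bytes, each batch is an opaque blob of pre-serialized bytes.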
Function Input/Output Schema
- Function Input
- collection<record<R>>
- This function takes in collections of records with schema R.
- Function Output
- collection<record<schema<batch: collection<map<string,any>>>>>
Syntax
- batch_records
- num_events=<long>
- millis=<long>
Required arguments
- num_events
- Syntax: expression<long>
- Description: The maximum number of records to send per batch.
- Default: 100,000,000
- Example: 2000
- millis
- Syntax: expression<long>
- Description: The interval, in milliseconds, at which to send batched records. Cannot exceed 8000 milliseconds (8 seconds).
- Default: 2000 milliseconds (2 seconds).
- Example: 2000
Usage
The following is an example of batched data. Assume that your data looks something like the following snippet, and you've configured your function with the arguments as shown in the SPL2 example.
[ {"name": "record1", "timestamp": "1s"}, {"name": "record2", "timestamp": "2s"}, {"name": "record3", "timestamp": "2s"}, {"name": "record4", "timestamp": "5s"}, {"name": "record5", "timestamp": "5s"}, ... ]
The batch_records function sends your records in batches, as follows.
[ [ {"name": "record1", "timestamp": "1s"}, {"name": "record2", "timestamp": "2s"} ], [ {"name": "record3", "timestamp": "2s"} ], [ {"name": "record4", "timestamp": "5s"}, {"name": "record5", "timestamp": "5s"} ], ... ]
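The grouping above can be modeled with a short simulation. This is a hypothetical Python sketch of count-or-time batching written for illustration, not the DSP implementation (which flushes on wall-clock timers in a streaming engine); the `batch_records` function, the `(timestamp_ms, record)` input shape, and the flush logic are assumptions.

```python
def batch_records(events, num_events, millis):
    """Simulate count-or-time batching.

    events: (timestamp_ms, record) pairs in time order.
    A batch is flushed when it holds num_events records, or when millis
    milliseconds have elapsed since the batch opened, whichever comes first.
    """
    batches = []
    current = []
    batch_start = None
    for ts, record in events:
        if batch_start is None:
            batch_start = ts
        elif ts - batch_start >= millis:
            # Time limit reached: flush the pending batch, open a new one.
            batches.append(current)
            current = []
            batch_start = ts
        current.append(record)
        if len(current) >= num_events:
            # Count limit reached: flush immediately.
            batches.append(current)
            current = []
            batch_start = None
    if current:
        batches.append(current)
    return batches

# The five records from the snippet above, with num_events=2 and
# millis=2000, split into [record1, record2], [record3], [record4, record5].
events = [(1000, "record1"), (2000, "record2"), (2000, "record3"),
          (5000, "record4"), (5000, "record5")]
print(batch_records(events, num_events=2, millis=2000))
```

Records 1 and 2 fill a batch by count, record 3 is flushed alone when the 2-second interval expires, and records 4 and 5 again fill a batch by count.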
Example
An example of a common use case follows. This example assumes that you have added the function to your pipeline.
SPL2 Example: Group records into batches of 2 records, or emit a batch after 2 seconds have passed
This example assumes that you are in the SPL View.
... | batch_records num_events=2L millis=2000L |...;
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5, 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5, 1.4.6