Batch Events
Batches events by count or by a time interval in milliseconds. Batching events, as opposed to sending each event individually, can increase throughput by reducing the number of send operations to your destination. However, batching can also increase latency, because events must be held until the batch is ready to be sent. Use the Batch Events function just before your sink function to optimize performance.
Because Batch Events sends events in batches, and the Write to Index and Write to Splunk Enterprise sink functions set the index per event, the index that you specify in your sink function is applied to the entire batch of events. If you want to route your data to different indexes while batching events, you must create a separate branch for each index that you want to send data to. See Optimize performance for more information.
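For example, the following is a minimal sketch of a pipeline with Batch Events placed just before the sink. The connection ID "my-conx-id" and the index "main" are placeholder values for illustration, and the batch-events arguments match the UI examples shown later in this topic (at most 2000 events per batch, sent at least every 2000 milliseconds):
events = read-splunk-firehose();
batched-events = batch-events(events, 2000, 2000);
write-splunk-enterprise(batched-events, "my-conx-id", "main", {});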
- Function Input: collection<record<R>>
  This function takes in a collection of records with schema R.
- Function Output: collection<map<string,any>>
  This function outputs your events grouped into batches.
Arguments
| Argument | Input | Description | UI example |
| --- | --- | --- | --- |
| num-events | expression<long> | The maximum number of events to send per batch. | 2000 |
| millis | expression<long> | The interval, in milliseconds, at which to send batched events. | 2000 |
DSL examples
1. Group events into batches of 2 events each, or send a partial batch after 2 seconds have passed:
batch-events(input, 2, 2000);
The following is an example of batched data. Assume that your data looks something like the following snippet, and that you've configured the function with the same arguments shown previously.
[ <"name": "record1", "timestamp": "1s">, <"name": "record2", "timestamp": "2s">, <"name": "record3", "timestamp": "2s">, <"name": "record4", "timestamp": "5s">, <"name": "record5", "timestamp": "5s">, ... ]
The batch-events function sends your events in the following batches. Records 1 and 2 fill a complete batch of 2 events and are sent immediately. Record 3 is sent in a partial batch when the 2000-millisecond interval elapses before another event arrives, and records 4 and 5 fill another complete batch.
[ [ {"name": "record1", "timestamp": "1s"}, {"name": "record2", "timestamp": "2s"} ], [ {"name": "record3", "timestamp": "2s"} ], [ {"name": "record4", "timestamp": "5s"}, {"name": "record5", "timestamp": "5s"} ], ... ]
2. Batch events that need to be routed to two different indexes:
events = read-splunk-firehose();
processed-events = [do some processing];
events-for-idx1 = filter(processed-events, eq("sourcetype1", get("source_type")));
events-for-idx2 = filter(processed-events, eq("sourcetype2", get("source_type")));
batched-events1 = batch-events(events-for-idx1, 500, 10000);
batched-events2 = batch-events(events-for-idx2, 500, 10000);
write-splunk-enterprise(batched-events1, "my-conx-id", "my-index-1", {});
write-splunk-enterprise(batched-events2, "my-conx-id", "my-index-2", {});
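Each branch gets its own batch-events call so that every event in a given batch is bound for the same index. Because the index argument of the sink function applies to the entire batch, batching before branching would mix events destined for different indexes into a single batch.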
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.0.0