Batch Bytes
This topic describes how to use the Batch Bytes function in the Splunk Data Stream Processor.
Description
Batches incoming byte arrays by size, count, or time (in milliseconds) and outputs a single byte array of the payloads concatenated by an optional user-defined separator.
There are two functions for batching records: Batch Bytes and Batch Records. Use Batch Bytes when you need to serialize your data before batching, and use Batch Records when you want to serialize (if at all) after batching. Batch Bytes concatenates the batched byte array payloads, delimited by an optional user-defined separator. This makes Batch Bytes best suited for row-based payload formats, such as CSV or concatenated JSON records (JSON streaming), but not useful for bulk or columnar formats.
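For example, here is a minimal sketch of this serialize-then-batch ordering, reusing the to_splunk_json and to_bytes functions from the SPL2 example later in this topic. The literal index name "main" is illustrative only.
... | to_splunk_json index="main" | batch_bytes bytes=to_bytes(json) |...;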
Function Input/Output Schema
- Function Input
- byte[]
- This function takes in byte arrays to be batched.
- Function Output
- byte[]
- This function outputs a byte array that is a concatenation of all byte[] payloads.
Syntax
The required syntax is in bold.
- batch_bytes
- bytes: <byte array>
- separator: <string>
- size: <string> B | KB | MB
- num_events: <long>
- millis: <long>
Required arguments
- bytes
- Syntax: byte[]
- Description: The byte array payload to be batched.
Optional arguments
- separator
- Syntax: <string>
- Description: A delimiter that separates the byte payloads.
- Example in Canvas View: \n
- size
- Syntax: <string> B | KB | MB
- Description: The maximum size, in bytes, of the emitted batched byte[]. The size of your emitted batched bytes cannot exceed 100 MB.
- Default: 10MB
- Example in Canvas View: 1024B
- num_events
- Syntax: <long>
- Description: The maximum number of payloads per batch.
- Default: 100,000,000
- Example in Canvas View: 2000
- millis
- Syntax: <long>
- Description: The interval, in milliseconds, at which to send batched records.
- Default: 10000 milliseconds (10 seconds).
- Example in Canvas View: 2000
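As a sketch of how these optional arguments combine, the following pipeline emits a batch of at most 1MB, containing at most 2000 payloads, at least every 5 seconds. The literal values are illustrative, and the argument quoting follows the SPL2 example at the end of this topic.
... | batch_bytes bytes=to_bytes(json) size="1MB" num_events=2000 millis="5000" |...;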
Usage
The following is an example of batched data. Assume that your data looks something like the following snippet, and you've configured your function with the arguments shown in the SPL2 example.
{"event": "my data 1", "index": "syslog", "sourcetype": "syslog"} {"event": "my data 2", "index": "secindex", "sourcetype": "security"} {"event": "my data 3", "index": "proxy", "sourcetype": "squid"} ....
When the batch trigger fires, Batch Bytes outputs a single record containing a byte array.
Record { "bytes": byte[] }
The output byte array is the following string encoded as UTF-8 bytes. Because no separator was configured, the payloads are concatenated directly:
{"event": "my data 1", "index": "syslog", "sourcetype": "syslog"}{"event": "my data 2", "index": "secindex", "sourcetype": "security"}{"event": "my data 3", "index": "proxy", "sourcetype": "squid"}
SPL2 examples
The examples in this section assume that you are in the SPL View.
Suppose you have Splunk JSON events output by the to_splunk_json function. To batch the bytes of those Splunk JSON events and send payloads only once they reach 5MB or 2 minutes have passed:
... | to_splunk_json index=cast(map_get(attributes, "index"), "string") | batch_bytes bytes=to_bytes(json) size="5MB" millis="120000" |...;