Splunk® Data Stream Processor

Function Reference

On April 3, 2023, Splunk Data Stream Processor will reach its end of sale, and will reach its end of life on February 28, 2025. If you are an existing DSP customer, please reach out to your account team for more information.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Aggregate

Applies one or more aggregation functions on a stream of events in a specified time window. The events must be grouped by one or more fields. This function returns a single value. Best practices are to limit window sizes to 24 hours or less and have a slide that is no smaller than 1/6th of your window size. For example, for a window size of 1 minute, make your window size at least 10 seconds. This function accepts a variable number of arguments..

View configurations for the aggregate function by highlighting the function in the Data Pipelines UI and clicking View Configurations. In the View Configurations tab, you can check what the original fields are for the data coming in to the aggregate function in the left sidebar, edit the function's arguments in the UI form, and see the outputted fields for data coming out of the aggregate function in the right sidebar.

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs the same collection of records but with a different schema S.

Note: The aggregation function has no concept of wall clock time, and the passage of time is based on the timestamps of incoming records. The aggregation function tracks the latest timestamp it received in the stream as the "current" time, and it determines the start and end of windows using this timestamp. Once the difference between the current timestamp and the start timestamp of the current window is greater than the window length, that window is closed and a new window starts. However, since events may arrive out of order, the grace period argument allows the previous window W to remain "open" for a certain period G after its closing timestamp T. Until we receive a record with a timestamp C where C > T + G, any incoming events with timestamp less than T are counted towards the previous window W.

Arguments

Argument Input Description UI example
by collection<expression<any>> Choose field values by which to group events. field/expression: host, output field: host
timestamp expression<long> The field name where your record's timestamps are located. timestamp
size long The window length in milliseconds to group events. 60 seconds
slide long The amount of time, in milliseconds, to wait before starting a new window. 60 seconds
grace-period long The amount of time, in milliseconds, to wait for late-arriving events. 10 seconds
aggregations collection<expression<any>> Apply one or more aggregation functions. New Aggregation: count , field/expression: source, output field: number_events_per_source.

DSL example

Return the status code for each host

aggregate(events, by: as(get("status_code"), "status_code"), timestamp: get("ts"), size: 50L, slide: 10L, grace-period: 0L, aggregations: count(get("host"), "HostPerStatus"));
Last modified on 02 January, 2020
How to use the Function Reference   Aggregate with Trigger

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.0.1


Was this topic useful?







You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters