Splunk® Data Stream Processor

Function Reference

On April 3, 2023, Splunk Data Stream Processor will reach its end of sale, and will reach its end of life on February 28, 2025. If you are an existing DSP customer, please reach out to your account team for more information.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Stats

This topic describes how to use the Stats function in the Splunk Data Stream Processor.

Description

Applies one or more aggregation functions on a stream of events in a specified time window. The events must be grouped by one or more fields. This function returns a single value. As a best practice, limit window sizes to 24 hours or less and use a slide that is no smaller than 1/6th of the window size. For example, for a window size of 1 minute, use a window slide of at least 10 seconds. This function accepts a variable number of arguments.
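The sizing guidance above can be checked mechanically. The following Python sketch is illustrative only: check_window and its millisecond arguments are hypothetical names chosen for this example and are not part of the Data Stream Processor.

```python
from typing import List

DAY_MS = 24 * 60 * 60 * 1000  # 24 hours in milliseconds


def check_window(size_ms: int, slide_ms: int) -> List[str]:
    """Return warnings for a (size, slide) pair; an empty list means the
    pair follows the best-practice guidance quoted above."""
    warnings = []
    if size_ms > DAY_MS:
        warnings.append("window size exceeds 24 hours")
    # The slide should be no smaller than 1/6 of the window size.
    # Multiplying instead of dividing keeps the comparison in integers.
    if slide_ms * 6 < size_ms:
        warnings.append("slide is smaller than 1/6 of the window size")
    return warnings


print(check_window(60_000, 10_000))  # 1-minute window, 10-second slide
print(check_window(60_000, 5_000))   # 1-minute window, 5-second slide
```

The first call follows the guidance and yields no warnings; the second yields a warning about the slide.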

View configurations for the stats function by highlighting the function in the UI and clicking View Configurations. In the View Configurations tab, you can check the original fields of the data coming into the stats function in the left sidebar, edit the function's arguments in the UI form, and see the output fields of the data coming out of the stats function in the right sidebar.

Function Input/Output Schema

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs the same collection of records but with a different schema S.

Syntax

The required syntax is in bold.

stats
aggregations(field) [AS field]
[BY field-name], [span(timestamp, size, slide, grace period)]

Required arguments

by
Syntax: collection<expression<any>>
Description: The field values by which to group events.
Example in Canvas View: body
timestamp
Syntax: timestamp=expression<long>
Description: The field name where your record's timestamps are located.
Example in Canvas View: timestamp
size
Syntax: size=<long>
Description: The window length, in milliseconds, to group events. For a list of valid timescales, see the "Timescales" section.
Example in Canvas View: 60 seconds
aggregations
Syntax: aggregations=collection<expression<any>>
Description: An aggregation function to apply on your events.

Optional arguments

window
Syntax: sliding | tumbling
Description: A windowing method. You do not need to specify this argument in the SPL2 Builder. Instead, the Data Stream Processor infers the windowing method based on the size and slide fields. If the size and slide fields are equal to each other, then the windowing method is tumbling. If the size is larger than the slide, then the windowing method is sliding. See the following descriptions for more details on the two windowing methods.
Example in Canvas View: tumbling
Windowing methods:
tumbling
Description: A tumbling window slices up time into segments based on the length of the provided window size. With the tumbling option, data in one window will not overlap with data in another window. At the start of each window, any aggregations are restarted.

The window does not include the right-most edge. For example, starting at Timestamp=1:00PM for a window size W=5 minutes, the windows would be [1:00PM - 1:05PM), [1:05PM - 1:10PM), [1:10PM - 1:15PM), ..., etc.

sliding
Description: Similar to a tumbling window, a sliding window slices up time into segments based on a provided window size but also uses an additional window slide parameter to control how frequently a sliding window is started. Therefore, sliding windows can be overlapping if the slide is smaller than the window size.

For example, starting at Timestamp=1:00PM for a window size W=5 minutes and window slide S=2 minutes, the windows would be [1:00PM - 1:05PM), [1:02PM - 1:07PM), [1:04PM - 1:09PM), ..., etc.

slide
Syntax: slide=<long>
Description: The amount of time, in milliseconds, to wait before starting a new window. For a list of valid timescales, see the "Timescales" section.
Example in Canvas View: 60 seconds.
grace-period
Syntax: grace-period=<long>
Description: The amount of time, in milliseconds, to wait for late-arriving events. In some cases, you may have some events that arrive after the latest time window boundary. This setting allows you to specify an amount of time to wait for any late-arriving events for the time window. If specified, this argument affects when the windows close. For example, if you have a window of 1 hour (10:00:00AM - 11:00:00AM) with a grace period of 5 minutes, then the window won't close until the pipeline receives an event with timestamp >= 11:05:00AM. When you assign a grace period, the pipeline tolerates out of order events. See the How are time windows calculated? section for a more detailed explanation about how time windows are determined and how the grace period argument is used.
Example in Canvas View: 10 seconds.
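The way the windowing method follows from the size and slide arguments, and the window layouts given for the window argument, can be sketched in Python. This is a minimal illustration under stated assumptions: infer_method and first_windows are hypothetical helper names, the window end is treated as exclusive for both methods, and rejecting a slide larger than the size is this sketch's own choice because the source does not describe that case.

```python
from datetime import datetime, timedelta


def infer_method(size: timedelta, slide: timedelta) -> str:
    """Mirror the rule described for the window argument."""
    if size == slide:
        return "tumbling"
    if size > slide:
        return "sliding"
    # Assumption: the source does not cover slide > size, so reject it here.
    raise ValueError("slide larger than size is not covered by this sketch")


def first_windows(start, size, slide, count):
    """Return the first `count` windows as (begin, end) pairs; end is exclusive."""
    return [(start + k * slide, start + k * slide + size) for k in range(count)]


start = datetime(2023, 1, 1, 13, 0)  # 1:00 PM on an arbitrary date
size, slide = timedelta(minutes=5), timedelta(minutes=2)
print(infer_method(size, slide))
for begin, end in first_windows(start, size, slide, 3):
    print(f"[{begin:%H:%M} - {end:%H:%M})")
```

Setting slide equal to size in the same call produces the non-overlapping tumbling layout.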

Usage

This section contains additional usage information about the Stats function.

How are time windows calculated?

The Stats function has no concept of wall clock time, and the passage of time is based on the timestamps of incoming records. The Stats function tracks the latest timestamp it received in the stream as the "current" time, and it determines the start and end of windows using this timestamp. Once the difference between the current timestamp and the start timestamp of the current window is greater than the window length, that window is closed and a new window starts.

However, since records may arrive out of order, the grace period argument allows the previous window W to remain "open" for a certain period G after its closing timestamp T. Let C = T + G. Until the Stats function receives a record with a timestamp >= C, any incoming records with a timestamp less than T are counted towards the previous window W, while records with later timestamps are counted towards the next window. Once a record with a timestamp >= C is received, the window W is closed and outputs its results. If the Stats function never receives a record with a timestamp >= C, then the window remains open.

To illustrate this, consider the following example where you have a sliding window with a window size of 1 hour (10:00:00AM - 11:00:00AM) and two events that have timestamps 11:01:00AM and 10:59:00AM respectively. The event with timestamp 10:59:00AM arrives after the event with timestamp 11:01:00AM.

  • If the grace-period is set to 0, then the pipeline does not tolerate out of order data. When the event with timestamp 11:01:00AM arrives, the window will close and output results. The event with timestamp 10:59:00AM is dropped.
  • If the grace-period is set to 5 minutes, then the pipeline tolerates out of order data. Events with timestamps up to 5 minutes past the window are permitted. When the event with timestamp 11:01:00AM arrives, the window remains open and the event with timestamp 11:01:00AM is added to the next window: 11:00:00AM - 12:00:00PM. The event with timestamp 10:59:00AM then arrives and is added to the window 10:00:00AM - 11:00:00AM. The 10:00:00AM - 11:00:00AM window closes and outputs results when an event with timestamp >=11:05:00 arrives.
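The two cases above can be replayed with a small Python sketch. It models a single window and tracks the latest timestamp seen as the "current" time, as described earlier in this section. The function name replay and the date used are hypothetical, and the code is an illustration of the described behavior, not the Stats function's implementation.

```python
from datetime import datetime, timedelta


def replay(arrivals, start, end, grace):
    """Replay arrivals against one window [start, end).

    Returns (accepted, dropped, closed). The window closes once the latest
    timestamp seen so far is >= end + grace.
    """
    accepted, dropped = [], []
    latest = None  # latest timestamp seen; stands in for the "current" time
    closed = False
    for ts in arrivals:
        latest = ts if latest is None else max(latest, ts)
        if not closed and latest >= end + grace:
            closed = True
        if start <= ts < end:
            # An event for this window is dropped if the window already closed.
            (dropped if closed else accepted).append(ts)
    return accepted, dropped, closed


day = datetime(2023, 1, 1)  # arbitrary date for the example
start, end = day.replace(hour=10), day.replace(hour=11)
arrivals = [day.replace(hour=11, minute=1), day.replace(hour=10, minute=59)]

for grace in (timedelta(0), timedelta(minutes=5)):
    accepted, dropped, closed = replay(arrivals, start, end, grace)
    print(grace, len(accepted), len(dropped), closed)
```

With a grace period of 0 the 10:59:00AM event is dropped; with a grace period of 5 minutes it is accepted and the window stays open until an event with timestamp >= 11:05:00AM arrives.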

When previewing data on a stats function, you will only see data once a window has closed.

Windowing methods example

In the following example, we'll take a look at how your records appear in the Data Stream Processor using different windowing methods and function configurations. Assume that you have seven records with the following timestamps entering the stats function in this order.

{"body": "Event 1", "timestamp": Jul 26 9:00 AM}
{"body": "Event 2", "timestamp": Jul 26 9:30 AM}
{"body": "Event 3", "timestamp": Jul 27 9:00 AM}
{"body": "Event 4", "timestamp": Jul 26 1:00 PM}
{"body": "Event 5", "timestamp": Jul 26 1:30 PM}
{"body": "Event 6", "timestamp": Jul 26 10:15 AM}
{"body": "Event 7", "timestamp": Jul 27 10:00 AM}

Using a tumbling window method

Consider a tumbling window with window size of 1 hour and grace period of 30 minutes.

Event 1: A window for Jul 26 9:00 AM - Jul 26 10:00 AM is started.

Event 2: The event is included in the currently open window, Jul 26 9:00 AM - Jul 26 10:00 AM.

Event 3: Timestamp Jul 27 9:00 AM is later than Jul 26 9:00 AM - Jul 26 10:00 AM, so the window is closed. It is also greater than Jul 26 10:00 AM + 30min so the grace period is also closed. A new window for Jul 27 9:00 AM - Jul 27 10:00 AM is started.

Event 4: Ignored, falls outside of current window Jul 27 9:00 AM - Jul 27 10:00 AM and there are no windows with open grace periods.

Event 5: Ignored, falls outside of current window Jul 27 9:00 AM - Jul 27 10:00 AM and there are no windows with open grace periods.

Event 6: Ignored, falls outside of current window Jul 27 9:00 AM - Jul 27 10:00 AM and there are no windows with open grace periods.

Event 7: A new window Jul 27 10:00 AM - Jul 27 11:00 AM is started. The grace period of the Jul 27 9:00 AM - Jul 27 10:00 AM window remains open until a timestamp of Jul 27 10:30 AM or later is observed.
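The tumbling walkthrough above can be reproduced with a short Python sketch. It assumes that windows are aligned to the hour (computed here relative to the Unix epoch), that an event is counted only while the grace period of its window has not elapsed, and that the events occur in 2021 because the source gives no year. The name tumbling_trace is hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def tumbling_trace(events, size, grace):
    """Return (name, window_start or None) per event, in arrival order.

    None means the event is ignored because its window, including the
    grace period, is already closed.
    """
    latest = None  # latest timestamp seen so far
    trace = []
    for name, ts in events:
        latest = ts if latest is None else max(latest, ts)
        start = ts - (ts - EPOCH) % size  # assumption: epoch-aligned windows
        end = start + size
        counted = latest < end + grace
        trace.append((name, start if counted else None))
    return trace


YEAR = 2021  # assumption: the source does not state a year
events = [
    ("Event 1", datetime(YEAR, 7, 26, 9, 0)),
    ("Event 2", datetime(YEAR, 7, 26, 9, 30)),
    ("Event 3", datetime(YEAR, 7, 27, 9, 0)),
    ("Event 4", datetime(YEAR, 7, 26, 13, 0)),
    ("Event 5", datetime(YEAR, 7, 26, 13, 30)),
    ("Event 6", datetime(YEAR, 7, 26, 10, 15)),
    ("Event 7", datetime(YEAR, 7, 27, 10, 0)),
]

for name, start in tumbling_trace(events, timedelta(hours=1), timedelta(minutes=30)):
    print(name, "ignored" if start is None else f"window starting {start:%b %d %H:%M}")
```

Under these assumptions, events 4, 5, and 6 are ignored and the remaining events land in the windows listed in the walkthrough.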

Using a sliding window method

Consider a sliding window with a window size of 1 hour, window slide of 30 minutes and grace period of 24 hours.

Event 1: A window for Jul 26 9:00 AM - Jul 26 10:00 AM is started.

Event 2: Counted towards Jul 26 9:00 AM - Jul 26 10:00 AM. A new window for Jul 26 9:30 AM - Jul 26 10:30 AM is also started.

Event 3: The timestamp is greater than the two open windows above, but their grace periods remain open until Jul 27 10:00 AM and Jul 27 10:30 AM respectively. This event contributes towards a new window Jul 27 9:00 AM - Jul 27 10:00 AM.

Event 4: Current maximum timestamp is Jul 27 9:00 AM which is less than Jul 26 2:00 PM + 24 hours, so this event counts towards the Jul 26 1:00 PM - Jul 26 2:00 PM window.

Event 5: Counted towards the Jul 26 1:00 PM - Jul 26 2:00 PM window and Jul 26 1:30 PM - Jul 26 2:30 PM. Same reason as event 4.

Event 6: Counted towards the window Jul 26 9:30 AM - Jul 26 10:30 AM, since its grace period is still open.

Event 7: Counted towards a new window Jul 27 10:00 AM - Jul 27 11:00 AM. The grace period of the Jul 26 9:00 AM - Jul 26 10:00 AM window is now closed, because Jul 27 10:00 AM is 24 hours after the end of that window.
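The grace-period arithmetic in the sliding walkthrough can be checked with a few lines of Python. The sketch only tests whether a named window is still accepting events at a given point in the stream; it does not enumerate windows. The helper grace_open is a hypothetical name, and the year 2021 is an assumption because the source gives none.

```python
from datetime import datetime, timedelta


def grace_open(window_end, grace, latest):
    """True while the latest observed timestamp is before window_end + grace."""
    return latest < window_end + grace


YEAR = 2021  # assumption: the source does not state a year
GRACE = timedelta(hours=24)

latest_after_event_3 = datetime(YEAR, 7, 27, 9, 0)
latest_after_event_7 = datetime(YEAR, 7, 27, 10, 0)

end_9_to_10 = datetime(YEAR, 7, 26, 10, 0)      # Jul 26 9:00 AM - 10:00 AM
end_930_to_1030 = datetime(YEAR, 7, 26, 10, 30)  # Jul 26 9:30 AM - 10:30 AM
end_1_to_2 = datetime(YEAR, 7, 26, 14, 0)        # Jul 26 1:00 PM - 2:00 PM

# Events 4 to 6 arrive while the latest timestamp is still Jul 27 9:00 AM.
print(grace_open(end_1_to_2, GRACE, latest_after_event_3))
print(grace_open(end_930_to_1030, GRACE, latest_after_event_3))
# Event 7 moves the latest timestamp to Jul 27 10:00 AM.
print(grace_open(end_9_to_10, GRACE, latest_after_event_7))
print(grace_open(end_930_to_1030, GRACE, latest_after_event_7))
```

The third check is the only one that reports a closed grace period, which matches the note on Event 7.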

Timescales

The following are valid timescales for the size, slide, and grace period arguments.

Time scale      Syntax                                  Description
<sec>           s | sec | secs | second | seconds       Time scale in seconds.
<min>           m | min | mins | minute | minutes       Time scale in minutes.
<hr>            h | hr | hrs | hour | hours             Time scale in hours.
<day>           d | day | days                          Time scale in days.
<subseconds>    ms | cs | ds                            Time scale in milliseconds (ms), centiseconds (cs), or deciseconds (ds).
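Because the size, slide, and grace period arguments are described in milliseconds, it can help to see how a timescale string maps to a millisecond count. The following Python sketch is a hypothetical helper, not the Data Stream Processor's parser. It assumes an integer followed by one of the unit spellings listed above, with optional whitespace between them.

```python
import re

# Milliseconds per unit, built from the unit spellings listed above.
UNIT_MS = {}
for names, factor in [
    (("ms",), 1),
    (("cs",), 10),
    (("ds",), 100),
    (("s", "sec", "secs", "second", "seconds"), 1_000),
    (("m", "min", "mins", "minute", "minutes"), 60_000),
    (("h", "hr", "hrs", "hour", "hours"), 3_600_000),
    (("d", "day", "days"), 86_400_000),
]:
    for unit_name in names:
        UNIT_MS[unit_name] = factor

_PATTERN = re.compile(r"\s*(\d+)\s*([a-z]+)\s*")


def to_millis(text: str) -> int:
    """Convert a string such as '50s' or '60 seconds' to milliseconds."""
    match = _PATTERN.fullmatch(text.lower())
    if match is None or match.group(2) not in UNIT_MS:
        raise ValueError(f"unrecognized timescale: {text!r}")
    return int(match.group(1)) * UNIT_MS[match.group(2)]


print(to_millis("50s"))
print(to_millis("60 seconds"))
print(to_millis("5ds"))
```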

SPL2 example

When working in the SPL View, you can write the function by providing the arguments in this exact order.

Count the hosts for each status code, using a 50-second window with a 10-second slide

... | stats count(host) AS HostPerStatus BY status_code, span(timestamp, 50s, 10s);
Last modified on 25 March, 2022

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.1.0, 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

