Splunk® Data Stream Processor

Function Reference

Stats

This topic describes how to use the stats function in the Splunk Data Stream Processor (DSP).

Description

Applies one or more aggregation functions to a stream of events within a specified time window. The events must be grouped by one or more fields. Each aggregation returns a single value for each group. As a best practice, limit window sizes to 24 hours or less, and use a slide that is no smaller than 1/6 of the window size. For example, for a window size of 1 minute, use a window slide of at least 10 seconds. This function accepts a variable number of arguments.
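The best-practice limits above can be expressed as a quick sanity check. This is a minimal Python sketch: `check_window_config` is a hypothetical helper, not part of DSP, and it only encodes the two rules of thumb stated in this section.

```python
from datetime import timedelta

# Rules of thumb from the Description section (not limits enforced by DSP).
MAX_WINDOW_SIZE = timedelta(hours=24)
MIN_SLIDE_FRACTION = 6  # slide should be at least 1/6 of the window size


def check_window_config(size: timedelta, slide: timedelta) -> list[str]:
    """Return best-practice warnings for a window size and slide (hypothetical helper)."""
    warnings = []
    if size > MAX_WINDOW_SIZE:
        warnings.append("window size is larger than 24 hours")
    if slide < size / MIN_SLIDE_FRACTION:
        warnings.append("slide is smaller than 1/6 of the window size")
    return warnings


# A 1-minute window with a 10-second slide follows both rules.
print(check_window_config(timedelta(minutes=1), timedelta(seconds=10)))
```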

To view the configuration of the stats function, highlight the function in the Data Pipelines UI and click View Configurations. In the View Configurations tab, the left sidebar shows the fields of the data coming into the stats function, the UI form lets you edit the function's arguments, and the right sidebar shows the fields of the data coming out of the stats function.

Function Input/Output Schema

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs a collection of aggregated records with a different schema S.

Syntax

The required arguments are described in the "Required arguments" section.

stats
aggregations(field) [AS field]
[BY field-name], [span(timestamp, window, size, slide, grace period)]

Required arguments

By
Syntax: collection<expression<any>>
Description: The field values by which to group events.
UI Example: body
Timestamp
Syntax: timestamp=expression<long>
Description: The field name where your record's timestamps are located.
UI Example: timestamp or get("timestamp");
Size
Syntax: size=<long>
Description: The window length, in milliseconds, to group events. For a list of valid timescales, see the "Timescales" section.
UI Example: 60 seconds
Aggregations
Syntax: aggregations=collection<expression<any>>
Description: One or more aggregation functions to apply to your events.
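Conceptually, the By and Aggregations arguments work like a group-by over the records in one window: records that share the same By values form a group, and each aggregation produces one value per group. The following Python sketch illustrates that idea for a single window. `aggregate_window` and the sample field names are hypothetical and do not reflect DSP internals.

```python
from collections import defaultdict
from typing import Any, Callable


def aggregate_window(
    records: list[dict[str, Any]],
    by: list[str],
    aggregations: dict[str, Callable[[list[dict[str, Any]]], Any]],
) -> list[dict[str, Any]]:
    """Group one window's records by the `by` fields and apply each aggregation."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record.get(field) for field in by)
        groups[key].append(record)

    results = []
    for key, rows in groups.items():  # groups keep first-seen order
        result = dict(zip(by, key))   # the By fields appear in the output record
        for name, aggregate in aggregations.items():
            result[name] = aggregate(rows)
        results.append(result)
    return results


window = [
    {"host": "web1", "status_code": 200},
    {"host": "web2", "status_code": 200},
    {"host": "web1", "status_code": 404},
]
print(aggregate_window(
    window,
    by=["status_code"],
    aggregations={
        "count": len,
        "hosts": lambda rows: sorted({r["host"] for r in rows}),
    },
))
```

Each output record carries the By field values plus one field per aggregation, which is why the output schema S differs from the input schema R.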

Optional arguments

Window
Syntax: sliding | tumbling
Description: A windowing method. See the table for more details on the two windowing methods.
Default: Tumbling
UI Example: Tumbling
Windowing method Description
Tumbling A tumbling window slices up time into segments based on the length of the provided window size. With the tumbling option, data in one window will not overlap with data in another window. At the start of each window, any aggregations are restarted.

The window does not include the right-most edge. For example, starting at Timestamp=1:00PM for a window size W=5 minutes, the windows would be [1:00PM - 1:05PM), [1:05PM - 1:10PM), [1:10PM - 1:15PM), and so on.

Sliding Similar to a tumbling window, a sliding window slices up time into segments based on a provided window size, but it also uses an additional window slide parameter to control how frequently a new sliding window is started. Therefore, sliding windows overlap when the slide is smaller than the window size.

For example, starting at Timestamp=1:00PM for a window size W=5 minutes and window slide S=2 minutes, the windows would be [1:00PM - 1:05PM), [1:02PM - 1:07PM), [1:04PM - 1:09PM), and so on.

Slide
Syntax: slide=<long>
Description: The amount of time, in milliseconds, to wait before starting a new window. For a list of valid timescales, see the "Timescales" section.
UI Example: 60 seconds.
Grace Period
Syntax: grace-period=<long>
Description: The amount of time, in milliseconds, to wait for late-arriving events. Some events might arrive after the end of the time window they belong to has already passed. For example, with a window size of 1 hour (10:00:00AM - 11:00:00AM), an event with timestamp 10:59:00 might arrive only after records with timestamps up to 11:01:00 have already been received. This setting lets you specify how long to keep a time window open for such late-arriving events. For a list of valid timescales, see the "Timescales" section.
UI Example: 10 seconds.
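The Window, Size, and Slide arguments determine which windows a record belongs to. The following Python sketch computes window boundaries from a record's timestamp for both methods, using the half-open [start, end) convention from the tumbling example. It assumes window boundaries are aligned to a reference timestamp (1:00 PM, as in the Window examples); the helper names and the date are hypothetical, and DSP's actual window alignment may differ.

```python
from datetime import datetime, timedelta


def tumbling_window(ts: datetime, size: timedelta, origin: datetime) -> tuple[datetime, datetime]:
    """Return the single tumbling window [start, end) that contains ts."""
    start = origin + ((ts - origin) // size) * size  # timedelta // timedelta -> int
    return start, start + size


def sliding_windows(ts: datetime, size: timedelta, slide: timedelta,
                    origin: datetime) -> list[tuple[datetime, datetime]]:
    """Return every sliding window [start, end) that contains ts.

    Window starts are assumed to be origin + k * slide for integer k.
    """
    start = origin + ((ts - origin) // slide) * slide  # latest start at or before ts
    windows = []
    while start > ts - size:  # equivalent to ts < start + size
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


one_pm = datetime(2020, 7, 26, 13, 0)  # arbitrary date, 1:00 PM
# A record at exactly 1:05 PM falls in [1:05PM - 1:10PM), not [1:00PM - 1:05PM).
print(tumbling_window(datetime(2020, 7, 26, 13, 5), timedelta(minutes=5), one_pm))
for window in sliding_windows(datetime(2020, 7, 26, 13, 4), timedelta(minutes=5),
                              timedelta(minutes=2), one_pm):
    print(window)
```

With a 5-minute size and 2-minute slide, a record at 1:04 PM lands in three overlapping windows, matching the sliding example in the table.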

Usage

This section contains additional usage information about the Stats function.

How are time windows calculated?

The stats function has no concept of wall-clock time; the passage of time is based on the timestamps of incoming records. The stats function tracks the latest timestamp it has received in the stream as the "current" time and uses this timestamp to determine the start and end of windows. Once the difference between the current timestamp and the start timestamp of the current window reaches the window length, that window is closed and a new window starts.

However, because records can arrive out of order, the grace period argument allows the previous window W to remain "open" for an additional period G after its closing timestamp T. Until the stats function receives a record with a timestamp greater than T + G, any incoming record with a timestamp earlier than T that falls within W is counted toward W. Once a record with a timestamp greater than T + G is received, W is closed. If the stats function never receives such a record, W remains open.
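The closing rule above can be summarized as a small state check. In this Python sketch, `max_ts` stands for the latest record timestamp the function has seen; the function name and the three state labels are hypothetical, chosen only to illustrate the comparisons against T and T + G.

```python
from datetime import datetime, timedelta


def window_state(window_end: datetime, grace: timedelta, max_ts: datetime) -> str:
    """Classify a window W that ends at T (window_end), given the latest timestamp seen.

    "open":   max_ts is still before T.
    "grace":  T has passed, but late records for W are still accepted.
    "closed": a record later than T + G has arrived, so W is final.
    """
    if max_ts < window_end:
        return "open"
    if max_ts <= window_end + grace:
        return "grace"
    return "closed"


t = datetime(2020, 7, 26, 10, 0)  # T: end of a 9:00-10:00 AM window (arbitrary date)
g = timedelta(minutes=30)         # G: grace period
for minute in (59, 60, 90, 91):
    seen = datetime(2020, 7, 26, 9, 0) + timedelta(minutes=minute)
    print(seen.time(), window_state(t, g, seen))
```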

When previewing data on a stats function, you will only see data once a window has closed.

Windowing methods example

The following example shows how the stats function handles the same records under different windowing methods and function configurations. Assume that seven records with the following timestamps enter the stats function in this order.

{"body": "Event 1", "timestamp": "Jul 26 9:00 AM"}
{"body": "Event 2", "timestamp": "Jul 26 9:30 AM"}
{"body": "Event 3", "timestamp": "Jul 27 9:00 AM"}
{"body": "Event 4", "timestamp": "Jul 26 1:00 PM"}
{"body": "Event 5", "timestamp": "Jul 26 1:30 PM"}
{"body": "Event 6", "timestamp": "Jul 26 10:15 AM"}
{"body": "Event 7", "timestamp": "Jul 27 10:00 AM"}

Using a tumbling window method

Consider a tumbling window with window size of 1 hour and grace period of 30 minutes.

Event 1: A window for Jul 26 9:00 AM - Jul 26 10:00 AM is started.

Event 2: The event is included in the currently open window, Jul 26 9:00 AM - Jul 26 10:00 AM.

Event 3: Timestamp Jul 27 9:00 AM is later than the end of the Jul 26 9:00 AM - Jul 26 10:00 AM window. It is also later than Jul 26 10:00 AM + 30 minutes, so the window's grace period has expired and the window is closed. A new window for Jul 27 9:00 AM - Jul 27 10:00 AM is started.

Event 4: Ignored, because it falls outside the current window, Jul 27 9:00 AM - Jul 27 10:00 AM, and no earlier window has an open grace period.

Event 5: Ignored, for the same reason as Event 4.

Event 6: Ignored, for the same reason as Event 4.

Event 7: A new window for Jul 27 10:00 AM - Jul 27 11:00 AM is started. The grace period of the Jul 27 9:00 AM - Jul 27 10:00 AM window stays open until a record with a timestamp later than Jul 27 10:30 AM is received.
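The tumbling walkthrough can be reproduced with a short simulation. This Python sketch is a simplified model, not DSP's implementation: it tracks the latest timestamp seen, aligns windows to the first record's timestamp, drops a record when its window's grace period has already expired, and only counts records. The helper names are hypothetical, and the year 2020 is a placeholder because the example omits it.

```python
from datetime import datetime, timedelta


def jul(day: int, hour: int, minute: int = 0) -> datetime:
    return datetime(2020, 7, day, hour, minute)  # year is a placeholder


EVENTS = [  # (body, timestamp) in arrival order
    ("Event 1", jul(26, 9)), ("Event 2", jul(26, 9, 30)), ("Event 3", jul(27, 9)),
    ("Event 4", jul(26, 13)), ("Event 5", jul(26, 13, 30)),
    ("Event 6", jul(26, 10, 15)), ("Event 7", jul(27, 10)),
]


def tumbling_count(events, size: timedelta, grace: timedelta):
    """Count records per tumbling window, using event time only.

    Returns (emitted, dropped, still_open):
      emitted    - {window start: count} for windows whose grace period expired
      dropped    - bodies of records that arrived too late for their window
      still_open - {window start: count} for windows not yet closed
    """
    origin = events[0][1]         # assumption: windows align to the first record
    max_ts = origin
    open_windows, emitted, dropped = {}, {}, []
    for body, ts in events:
        max_ts = max(max_ts, ts)  # "current" time = latest timestamp seen
        start = origin + ((ts - origin) // size) * size
        if max_ts > start + size + grace:
            dropped.append(body)  # this record's window is already past T + G
        else:
            open_windows[start] = open_windows.get(start, 0) + 1
        for s in sorted(open_windows):  # close every window now past T + G
            if max_ts > s + size + grace:
                emitted[s] = open_windows.pop(s)
    return emitted, dropped, open_windows


emitted, dropped, still_open = tumbling_count(EVENTS, timedelta(hours=1), timedelta(minutes=30))
print(emitted)
print(dropped)
print(still_open)
```

Under this model, the Jul 26 9:00 AM window is emitted with a count of 2, Events 4 to 6 are dropped, and the Jul 27 9:00 AM and 10:00 AM windows remain open, which matches the walkthrough.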

Using a sliding window method

Consider a sliding window with a window size of 1 hour, a window slide of 30 minutes, and a grace period of 24 hours.

Event 1: A window for Jul 26 9:00 AM - Jul 26 10:00 AM is started.

Event 2: Counted toward the Jul 26 9:00 AM - Jul 26 10:00 AM window. A new window for Jul 26 9:30 AM - Jul 26 10:30 AM is also started.

Event 3: The timestamp is later than both open windows above, but their grace periods remain open until Jul 27 10:00 AM and Jul 27 10:30 AM, respectively. This event is counted toward a new window, Jul 27 9:00 AM - Jul 27 10:00 AM.

Event 4: The current maximum timestamp is Jul 27 9:00 AM, which is earlier than Jul 26 2:00 PM + 24 hours, so this event is counted toward the Jul 26 1:00 PM - Jul 26 2:00 PM window.

Event 5: Counted toward the Jul 26 1:00 PM - Jul 26 2:00 PM and Jul 26 1:30 PM - Jul 26 2:30 PM windows, for the same reason as Event 4.

Event 6: Counted toward the Jul 26 9:30 AM - Jul 26 10:30 AM window, because its grace period is still open.

Event 7: Counted toward a new window, Jul 27 10:00 AM - Jul 27 11:00 AM. The grace period of the Jul 26 9:00 AM - Jul 26 10:00 AM window is closed.

Timescales

The following are valid timescales for the size, slide, and grace period arguments.

Time scale Syntax Description
<sec> s | sec | secs | second | seconds Time scale in seconds.
<min> m | min | mins | minute | minutes Time scale in minutes.
<hr> h | hr | hrs | hour | hours Time scale in hours.
<day> d | day | days Time scale in days.
<month> mon | month | months Time scale in months.
<subseconds> ms | cs | ds Time scale in milliseconds (ms), centiseconds (cs), or deciseconds (ds).
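Because the size, slide, and grace period arguments are expressed in milliseconds, a timescale such as 50s corresponds to 50,000 ms. The following Python sketch converts the timescales in the table to milliseconds. `to_millis` is a hypothetical helper, accepting an optional space between number and unit is an assumption, and months are deliberately left out because a month has no fixed length in milliseconds.

```python
import re

# Milliseconds per unit, following the Timescales table (months omitted).
_UNIT_MS = {
    "ms": 1,
    "cs": 10,
    "ds": 100,
    **dict.fromkeys(["s", "sec", "secs", "second", "seconds"], 1_000),
    **dict.fromkeys(["m", "min", "mins", "minute", "minutes"], 60_000),
    **dict.fromkeys(["h", "hr", "hrs", "hour", "hours"], 3_600_000),
    **dict.fromkeys(["d", "day", "days"], 86_400_000),
}


def to_millis(timescale: str) -> int:
    """Convert a timescale such as "50s" or "10 seconds" to milliseconds."""
    match = re.fullmatch(r"(\d+)\s*([a-z]+)", timescale.strip().lower())
    if match is None or match.group(2) not in _UNIT_MS:
        raise ValueError(f"unsupported timescale: {timescale!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_MS[unit]


print(to_millis("50s"), to_millis("10 seconds"), to_millis("1h"))
```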

SPL2 example

Count the hosts for each status code

... | stats count(host) AS HostPerStatus BY status_code, span(timestamp, 50s, 10s);
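Reading span(timestamp, 50s, 10s) as a 50-second window size with a 10-second slide, each record can be counted in up to five overlapping windows. The following Python sketch models what this query computes. It assumes integer-second timestamps, windows aligned to multiples of the slide, and that count(host) counts only records that have a host value; `window_starts`, `host_per_status`, and the sample records are hypothetical.

```python
from collections import Counter

SIZE, SLIDE = 50, 10  # seconds; assumed meaning of span(timestamp, 50s, 10s)


def window_starts(ts: int, size: int = SIZE, slide: int = SLIDE) -> range:
    """Starts of every sliding window [start, start + size) that contains ts.

    Assumes windows are aligned to multiples of the slide.
    """
    latest = (ts // slide) * slide
    return range(latest, ts - size, -slide)


def host_per_status(records: list[dict]) -> Counter:
    """Model of: stats count(host) AS HostPerStatus BY status_code, span(...)."""
    counts: Counter = Counter()
    for record in records:
        if record.get("host") is None:
            continue  # count(host) skips records without a host value
        for start in window_starts(record["timestamp"]):
            counts[(start, record["status_code"])] += 1
    return counts


records = [
    {"host": "web1", "status_code": 200, "timestamp": 100},
    {"host": "web2", "status_code": 200, "timestamp": 105},
    {"host": "web1", "status_code": 404, "timestamp": 112},
    {"status_code": 500, "timestamp": 103},  # no host field
]
counts = host_per_status(records)
print(counts[(100, 200)], counts[(110, 404)], counts[(110, 200)])
```

Each key in the result pairs a window start with a status code, and its value plays the role of HostPerStatus for that window.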
Last modified on 06 November, 2020

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.1.0, 1.2.0

