Splunk Stream

User Manual


This documentation does not apply to the most recent version of Splunk Stream. For documentation on the most recent version, go to the latest release.
Stream aggregation methods

Splunk Stream lets you apply aggregation to network data at capture time, on the collection endpoint, before the data is sent to indexers. You can use aggregation to enrich your data with statistics that provide additional insight into activity on your network.

When you apply aggregation to a stream, only the aggregated data is sent to indexers. Using aggregation can thus help you decrease both storage requirements and license usage.

Stream aggregate functions

Splunk Stream supports a subset of the aggregate functions provided by the SPL (Splunk Processing Language) stats command to calculate statistics based on fields in your network event data. You can apply aggregate functions to your data when you configure a stream in the Configure Streams UI.

Splunk Stream supports these aggregate functions:

  • sum
  • sum squared
  • max
  • min
  • mean
  • median
  • mode
  • sample standard deviation
  • population standard deviation
  • sample variance
  • population variance
  • distinct count
  • distinct values

For more information on aggregate functions, see Statistical and charting functions in the Splunk Enterprise Search Reference.
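
As a rough illustration of what these functions compute, the following Python sketch evaluates each of them over a small list of numbers using only the standard library. The dictionary keys borrow the SPL stats function names (sumsq, stdev, stdevp, var, varp, dc, values); apart from sumsq and values, this topic does not confirm that Splunk Stream uses the same names in its aggregated fields, so treat them as labels for this example only. The summarize helper and the sample data are hypothetical.

```python
import statistics


def summarize(values):
    """Compute the listed aggregates for a non-empty list of numbers."""
    return {
        "sum": sum(values),
        "sumsq": sum(v * v for v in values),      # sum squared (sum of squares)
        "max": max(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),        # sample standard deviation (n - 1)
        "stdevp": statistics.pstdev(values),      # population standard deviation (n)
        "var": statistics.variance(values),       # sample variance (n - 1)
        "varp": statistics.pvariance(values),     # population variance (n)
        "dc": len(set(values)),                   # distinct count
        "values": sorted(set(values)),            # distinct values
    }


# Hypothetical sample, for example eight bytes_in observations.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
result = summarize(sample)
for name, value in result.items():
    print(f"{name}: {value}")
```

The sample and population variants differ only in the divisor (n - 1 versus n), so they diverge most when a bucket contains few events.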

How aggregates work

You apply aggregate functions to stream events over a user-defined time interval. When Stream calculates the selected aggregates, it groups events into aggregation buckets, with one bucket allocated for each unique value of the "Key" field (or unique combination of values if there are multiple "Key" fields). At the end of the time interval, the app emits an object that represents each bucket.

For example, to gain more insight into the amount of inbound http traffic, you might select src_ip as a Key field and apply aggregate functions such as max, mean, and std dev (standard deviation) to the bytes_in field of an http stream over a 60-second time interval.

Stream calculates these aggregates for the bytes_in field for each unique value of src_ip that appears in the http stream, over the specified time interval. Search results for these aggregates might appear as follows:

Steam aggregate functions.png
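
The bucketing behavior described above can be sketched in a few lines of Python. This is a simplified model, not Splunk Stream's implementation: the aggregate function, the timestamp and window_start field names, and the sample events are all assumptions made for illustration. Only the function(field_name) naming and the count field follow this topic.

```python
from collections import defaultdict


def aggregate(events, key_fields, target, interval=60):
    """Group events into (time window, key) buckets and summarize `target`.

    One bucket is created per unique combination of key field values
    within each time window; one output row is emitted per bucket.
    """
    buckets = defaultdict(list)
    for event in events:
        # Align the event to the start of its time window (hypothetical field).
        window = int(event["timestamp"] // interval) * interval
        key = tuple(event[field] for field in key_fields)
        buckets[(window, key)].append(event[target])

    rows = []
    for (window, key), values in sorted(buckets.items()):
        row = {"window_start": window, "count": len(values)}
        row.update(zip(key_fields, key))
        row[f"sum({target})"] = sum(values)
        row[f"max({target})"] = max(values)
        row[f"mean({target})"] = sum(values) / len(values)
        rows.append(row)
    return rows


# Hypothetical http events: three in the first 60 seconds, one in the next.
events = [
    {"timestamp": 0, "src_ip": "10.0.0.1", "bytes_in": 100},
    {"timestamp": 10, "src_ip": "10.0.0.1", "bytes_in": 300},
    {"timestamp": 20, "src_ip": "10.0.0.2", "bytes_in": 50},
    {"timestamp": 70, "src_ip": "10.0.0.1", "bytes_in": 500},
]
rows = aggregate(events, ["src_ip"], "bytes_in")
for row in rows:
    print(row)
```

Four raw events collapse into three aggregated rows here, which is the mechanism behind the reduced storage and license usage mentioned earlier in this topic.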

Aggregated field syntax

Aggregated fields in Splunk Stream version 6.6.0 and later have the following syntax:

function(field_name) 

This is a change from version 6.5.x and earlier, where each aggregated field kept the original field name (such as bytes_in) even though it actually contained the sum aggregate. To access the latest field aggregation capabilities in Splunk Stream, upgrade to Splunk Stream version 7.0.0. See Upgrade to Splunk Stream 7.0.0 in the Splunk Stream Installation and Configuration Manual.

To upgrade aggregated streams from earlier versions of the app to the new syntax in 6.6.0, Splunk Stream provides a migration script that runs automatically when you upgrade to version 6.6.0. For more information,

About the count field

Each aggregated event has a single count field that reflects the total number of raw events that were aggregated into it. For example, an aggregated event that displays count: 73 summarizes 73 raw events, as shown:

Stream aggregate count.png

About the values aggregate

The values aggregate function produces a list (JSON array) of distinct values of the target field, even if the list contains a single entry. The values in the array are sorted in alphabetical order for text fields and in ascending order for numeric fields.

For example, you might apply the values aggregate to the time_taken field in an http stream to get a list of values for the number of microseconds it took to complete each flow event over the selected time interval. Search results for the values(time_taken) aggregate appear as follows:

Stream values aggregate.png
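
A minimal Python sketch of the behavior described above, assuming a hypothetical values_aggregate helper: duplicates are dropped, the result is sorted, and the output is always a list, even for a single distinct value. Python sorts strings by code point, so uppercase letters sort before lowercase ones; whether Splunk Stream's alphabetical order is case-sensitive is not stated in this topic.

```python
def values_aggregate(observations):
    """Return the distinct observations as a sorted list."""
    return sorted(set(observations))


# Hypothetical time_taken values (microseconds) seen during one interval.
print(values_aggregate([300, 120, 300, 45]))

# A single distinct value still yields a list.
print(values_aggregate(["GET", "GET"]))
```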

About the sum of squares aggregate

In version 6.5.x and earlier, each aggregated field X had a corresponding field, psrsvd_ss_X, that contained the sum of squares of X. This field did not appear in the stream configuration; Stream generated it automatically. As of version 6.6.0, the corresponding field is named sumsq(X), and you can select it in the same way as any other aggregation method. (See Configure Streams UI.)
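
If you maintain searches or dashboards written against the older field names, a small helper can make the renaming rules explicit. The following Python sketch is not the migration script that ships with Splunk Stream; the new_field_name function and its arguments are hypothetical, and it encodes only the two rules stated in this topic: psrsvd_ss_X becomes sumsq(X), and an aggregated field that kept its original name becomes sum(field_name).

```python
LEGACY_SUMSQ_PREFIX = "psrsvd_ss_"


def new_field_name(legacy_name, aggregated_fields):
    """Map a 6.5.x-style field name to the 6.6.0+ function(field_name) form.

    `aggregated_fields` is the set of original field names that were being
    aggregated (and therefore held the sum) in the older stream.
    """
    if legacy_name.startswith(LEGACY_SUMSQ_PREFIX):
        original = legacy_name[len(LEGACY_SUMSQ_PREFIX):]
        return f"sumsq({original})"
    if legacy_name in aggregated_fields:
        return f"sum({legacy_name})"
    # Key fields and other non-aggregated fields keep their names.
    return legacy_name


aggregated = {"bytes_in"}
for name in ["bytes_in", "psrsvd_ss_bytes_in", "src_ip"]:
    print(name, "->", new_field_name(name, aggregated))
```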

Last modified on 26 March, 2017

This documentation applies to the following versions of Splunk Stream: 7.1.0, 7.1.1

