Splunk® Data Stream Processor

Function Reference

On April 3, 2023, Splunk Data Stream Processor reached its end of sale, and will reach its end of life on February 28, 2025. If you are an existing DSP customer, please reach out to your account team for more information.

All DSP releases prior to DSP 1.4.0 use Gravity, a Kubernetes orchestrator, which has been announced end-of-life. We have replaced Gravity with an alternative component in DSP 1.4.0. Therefore, we will no longer provide support for versions of DSP prior to DSP 1.4.0 after July 1, 2023. We advise all of our customers to upgrade to DSP 1.4.0 in order to continue to receive full product support from Splunk.

Bin

This topic describes how to use the Bin function in the Splunk Data Stream Processor.

Description

The Bin function puts continuous numerical values into discrete sets, or bins, by adjusting the value of <field> so that all of the items in a particular set have the same value.

Function Input/Output schema

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs the same collection of records but with a different schema S.

Syntax

bin
<span-options>
<field> [AS <result>]

How the Bin function works

Use the Bin function to group records by the numerical values in a field. Suppose your incoming data looks like the following:

Body  Timestamp                Hour and minute  Minutes from first timestamp
1     2019-08-22 01:56:37.000  01:56
2     2019-08-22 01:58:21.000  01:58            2 minutes
3     2019-08-22 01:59:59.000  01:59            3 minutes
4     2019-08-22 02:03:16.000  02:03            7 minutes
5     2019-08-22 02:05:43.000  02:05            9 minutes
6     2019-08-22 02:09:38.000  02:09            13 minutes
7     2019-08-22 02:12:31.000  02:12            16 minutes

You decide to add a Bin function to your pipeline that bins the streaming data using a 5-minute time span on the timestamp field.

...| bin span=5m timestamp;

The Bin function groups the timestamps in the timestamp field into 5-minute intervals. The groups are:

Group  Timestamp values         Timestamp span range for each bin
1      2019-08-22 01:56:37.000  2019-08-22 01:56:37.000 --- 2019-08-22 02:01:36.000
       2019-08-22 01:58:21.000
       2019-08-22 01:59:59.000
2      2019-08-22 02:03:16.000  2019-08-22 02:01:37.000 --- 2019-08-22 02:06:36.000
       2019-08-22 02:05:43.000
3      2019-08-22 02:09:38.000  2019-08-22 02:06:37.000 --- 2019-08-22 02:11:36.000
4      2019-08-22 02:12:31.000  2019-08-22 02:11:37.000 --- 2019-08-22 02:16:36.000
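
The grouping shown in the table above can be reproduced with a short Python sketch. This is not DSP code. It assumes, as the table does, that the first bin starts at the first timestamp, and the helper name group_by_span is hypothetical.

```python
from datetime import datetime, timedelta


def group_by_span(timestamps, span):
    """Return a 1-based group number for each timestamp.

    Assumption: bins are counted from the first timestamp in the list,
    which matches the span ranges shown in the table above.
    """
    start = timestamps[0]
    # Integer division of two timedeltas gives the zero-based bin index.
    return [(ts - start) // span + 1 for ts in timestamps]


raw = [
    "2019-08-22 01:56:37.000",
    "2019-08-22 01:58:21.000",
    "2019-08-22 01:59:59.000",
    "2019-08-22 02:03:16.000",
    "2019-08-22 02:05:43.000",
    "2019-08-22 02:09:38.000",
    "2019-08-22 02:12:31.000",
]
timestamps = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in raw]

groups = group_by_span(timestamps, timedelta(minutes=5))
print(groups)  # [1, 1, 1, 2, 2, 3, 4]
```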

Required arguments

field
Syntax: string
Description: The name of the field to bin. The value of the field must be a numerical data type.
Example: timestamp
span
Syntax: <time-specifier>
Description: Sets the size of each bin, using either a time-based span length or a logarithm-based span.
Example: 5m

Span options

log-span
Syntax: <num>log<num>
Description: Sets a logarithm-based span. The first number is a coefficient and the second number is the base. The coefficient must be a real number greater than or equal to 1.0 and less than the base.
Example: span=2log10
span-length
Syntax: <int><timescale>
Description: The span of each bin. If the function discretizes based on the timestamp field, or if the span is used with a timescale, the span is treated as a time range. Otherwise, it is an absolute bin length.
timescale
Syntax: <subseconds> | <sec> | <min> | <hr> | <day> | <month> | <year>
Description: Time scale units. Used when discretizing based on the timestamp field.
Default: sec
Time scale    Syntax                                 Description
<subseconds>  us | ms | cs | ds                      Time scale in microseconds (us), milliseconds (ms), centiseconds (cs), or deciseconds (ds).
<sec>         s | sec | secs | second | seconds      Time scale in seconds.
<min>         m | min | mins | minute | minutes      Time scale in minutes.
<hr>          h | hr | hrs | hour                    Time scale in hours.
<day>         d | day | days                         Time scale in days.
<month>       mon | month | months                   Time scale in months.
<year>        y | year | years                       Time scale in years.
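
To make the span options more concrete, the following Python sketch converts a span-length string to microseconds using the timescale table, and finds the bounds of a logarithm-based bin. It is an illustration only, not the DSP implementation. The names span_to_microseconds and log_bin are hypothetical, and the sketch assumes that log-span bin boundaries fall at coefficient * base^k. Months and years are omitted because they do not have a fixed length.

```python
import math
import re

# Microseconds per unit, taken from the timescale table above.
_UNIT_US = {}
for _names, _us in [
    (("us",), 1),
    (("ms",), 1_000),
    (("cs",), 10_000),
    (("ds",), 100_000),
    (("s", "sec", "secs", "second", "seconds"), 1_000_000),
    (("m", "min", "mins", "minute", "minutes"), 60_000_000),
    (("h", "hr", "hrs", "hour"), 3_600_000_000),
    (("d", "day", "days"), 86_400_000_000),
]:
    for _name in _names:
        _UNIT_US[_name] = _us


def span_to_microseconds(span):
    """Convert a span such as '5m' to microseconds (time-based spans only)."""
    match = re.fullmatch(r"(\d+)([a-z]*)", span)
    if match is None:
        raise ValueError(f"not a span-length: {span!r}")
    count, unit = match.groups()
    unit = unit or "sec"  # documented default timescale
    if unit not in _UNIT_US:
        raise ValueError(f"unsupported timescale in this sketch: {unit!r}")
    return int(count) * _UNIT_US[unit]


def log_bin(value, coefficient, base):
    """Return (lower, upper) bounds of the log-span bin holding a positive value.

    Assumption: boundaries are coefficient * base**k for integer k.
    """
    if not 1.0 <= coefficient < base:
        raise ValueError("coefficient must be >= 1.0 and < base")
    if value <= 0:
        raise ValueError("value must be positive")
    k = math.floor(math.log(value / coefficient, base))
    # Correct for floating-point rounding near a boundary.
    if coefficient * base ** k > value:
        k -= 1
    elif coefficient * base ** (k + 1) <= value:
        k += 1
    return coefficient * base ** k, coefficient * base ** (k + 1)


print(span_to_microseconds("5m"))  # 300000000
print(log_bin(150, 2, 10))         # (20, 200)
```

Under these assumptions, span=2log10 places the value 150 in the bin that runs from 20 up to, but not including, 200.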

Optional arguments

aligntime
Syntax: aligntime=<time-specifier>
Description: Aligns the bin times to something other than base UTC time (epoch 0). The aligntime option is valid only for time-based discretization and is ignored if the span is in days, months, or years. The aligntime values earliest and latest are not supported.
Example: 4h
result
Syntax: AS <string>
Description: A new name for the field.
Example: time
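
The effect of aligntime can be pictured as shifting the origin from which bin boundaries are counted. The Python sketch below is a rough illustration, not the DSP implementation. It assumes that aligntime acts as an offset added to epoch 0, and the function name bin_start is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_start(ts, span, aligntime=timedelta(0)):
    """Return the start of the bin that contains ts.

    Assumption: bin boundaries fall at EPOCH + aligntime + k * span.
    """
    origin = EPOCH + aligntime
    return origin + ((ts - origin) // span) * span


ts = datetime(2019, 8, 22, 1, 56, 37, tzinfo=timezone.utc)

# 2-hour bins counted from epoch 0 start on even hours.
default_start = bin_start(ts, timedelta(hours=2))
# With a 1-hour alignment offset, the same bins start on odd hours.
aligned_start = bin_start(ts, timedelta(hours=2), aligntime=timedelta(hours=1))

print(default_start.isoformat())  # 2019-08-22T00:00:00+00:00
print(aligned_start.isoformat())  # 2019-08-22T01:00:00+00:00
```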

Example

An example of a common use case follows. This example assumes that you have added the function to your pipeline.

SPL2 Example: Align the bins to 3 hours and set the span to 1 hour intervals from that time

This example assumes that you are in the SPL View.

... | bin aligntime=3h span=1h timestamp | ...;
Last modified on 09 February, 2022

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5, 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5, 1.4.6

