Splunk® Data Stream Processor

Function Reference

Drift Detection

Drift Detection identifies large-scale shifts and abrupt changes in a time-series data stream. It is useful for finding the point in time at which the distribution of the data changes. This function is also referred to as "changepoint detection."

The Drift Detection function identifies distributional change in a time series, such as a metric or KPI. Examples of sudden changes that Drift Detection can identify include:

  • Shift in mean or trend of a signal
  • Increase or decrease in variance or noise of observed data
  • Change in periodicity, such as the interval between observed data points
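
To make the first kind of change concrete, the following Python sketch locates a shift in the mean of a small synthetic signal by comparing the averages of two adjacent windows. It is an offline illustration only: the function name `largest_mean_shift`, the window size, and the signal are all hypothetical, and this is not the algorithm that the Drift Detection function uses internally.

```python
from statistics import mean

def largest_mean_shift(values, window):
    """Return (index, shift) for the position where the mean of the `window`
    points starting at index differs most from the mean of the `window`
    points immediately before it. Illustrative only, not DSP's algorithm."""
    best_index, best_shift = None, 0.0
    for i in range(window, len(values) - window + 1):
        before = mean(values[i - window:i])
        after = mean(values[i:i + window])
        shift = abs(after - before)
        if shift > best_shift:
            best_index, best_shift = i, shift
    return best_index, best_shift

# Hypothetical signal: level 10 for 20 points, then level 15 for 20 points.
signal = [10.0] * 20 + [15.0] * 20
index, shift = largest_mean_shift(signal, window=5)
print(index, shift)
```

On this signal the largest difference falls at the first point of the new level. A change in variance or periodicity would need a different statistic, such as the window variance or the gaps between timestamps.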

Function Input/Output Schema

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs collections of records with schema S.

Syntax

The required fields are in bold.

| detect_drift value="input"

Required arguments

timestamp
Syntax: long
Description: The timestamp associated with the value.
Example: cast(div(cast(get("time"), "long"), 1000000), "long")
value
Syntax: double
Description: The value to detect drift on.
Example: "input"

Usage

Drift Detection monitors the time series for drift. For each observed data point, Drift Detection outputs two values:

  • Label
  • Output

Label is returned as True or False and indicates whether a data point represents a changepoint. A value of True means that the algorithm has detected drift and that the data point is the observed changepoint.

Output is a measure of confidence, expressed as a probability score between 0 and 1.0. The closer Output is to 1.0, the more confident the algorithm is in its predicted label.

Generally, confidence is high when Label is True. For some noisy signals this may not hold, in which case you can filter the output of the algorithm with the following condition:

| where output > threshold and label=true

A threshold between 0.7 and 0.9 is typically applied to select the high-confidence changepoints.
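
The effect of this filter can be sketched outside DSP in Python. The record layout below (dictionaries with `timestamp`, `input`, `label`, and `output` keys) and the helper name `high_confidence_changepoints` are assumptions made for illustration. Only the condition itself, a True label with an output above the threshold, comes from the text above.

```python
def high_confidence_changepoints(records, threshold=0.8):
    """Keep records flagged as changepoints whose confidence exceeds threshold.
    Mirrors the condition `output > threshold and label=true`."""
    return [r for r in records if r["label"] and r["output"] > threshold]

# Hypothetical function output for three data points.
records = [
    {"timestamp": 1, "input": 10.0, "label": False, "output": 0.95},
    {"timestamp": 2, "input": 42.0, "label": True, "output": 0.91},
    {"timestamp": 3, "input": 11.0, "label": True, "output": 0.55},
]

selected = high_confidence_changepoints(records, threshold=0.8)
print([r["timestamp"] for r in selected])
```

Only the second record survives. The first has high confidence in a False label, and the third is a flagged changepoint whose confidence falls below the threshold.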

SPL2 example

The following example uses Drift Detection on Bytes Sent by Source Address:

| from splunk_firehose()
| eval json=cast(body, "string"),
       'json-map'=from_json_object(json),
       input=parse_double(ucast(map_get('json-map', "Bytes Sent"), "string", null)),
       key=ucast(map_get('json-map', "Source Address"), "string", null),
       time=ucast(map_get('json-map', "Start Time"), "string", null),
       timestamp=cast(div(cast(get("time"), "long"), 1000000), "long")
| detect_drift value="input";
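
For readers who want to check the field preparation offline, the eval step above can be approximated in Python. This is a sketch under stated assumptions: the sample event body is invented, the helper name `to_drift_input` is hypothetical, and integer division stands in for the div-then-cast-to-long expression. The source does not state the unit of Start Time, so the sketch shows only the arithmetic.

```python
import json

def to_drift_input(body):
    """Parse a JSON event body and derive the input, key, and timestamp
    fields, loosely mirroring the eval step of the pipeline."""
    record = json.loads(body)
    return {
        "input": float(record["Bytes Sent"]),          # parse_double(...)
        "key": str(record["Source Address"]),          # ucast(..., "string", null)
        "timestamp": int(record["Start Time"]) // 1_000_000,  # div by 1000000, cast to long
    }

# Hypothetical event body; real events may carry other fields and formats.
sample_body = (
    '{"Bytes Sent": "1024", "Source Address": "10.0.0.5", '
    '"Start Time": "1604016000000000"}'
)
row = to_drift_input(sample_body)
print(row)
```

Unlike `ucast` with a null default, this sketch raises an error when a field is missing, which makes malformed sample events easy to spot.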
 
Last modified on 30 October, 2020

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0

