
# Drift Detection

Drift Detection identifies large-scale shifts and abrupt changes in a time-series data stream. It is useful for finding the point in time at which the distribution of the data changes. This function is also referred to as "changepoint detection."

The Drift Detection function identifies distributional change in a time series, such as a metric or KPI. Examples of sudden changes that Drift Detection can identify include:

• Shift in mean or trend of a signal
• Increase or decrease in variance or noise of observed data
• Change in periodicity such as the interval between observed data points

## Function Input/Output Schema

Function Input
`collection<record<R>>`
This function takes in collections of records with schema R.
Function Output
`collection<record<S>>`
This function outputs collections of records with schema S.

## Syntax

`| detect_drift value="input"`

## Required arguments

timestamp
Syntax: long
Description: The timestamp associated with the value.
Example: cast(div(cast(get("time")

value
Syntax: double
Description: The value to detect drift on.
Example: "input"

## Usage

Drift Detection monitors the time series for drift. For each observed data point, Drift Detection outputs two values:

• Label
• Output

Label is returned as True or False and indicates whether a data point represents a change point. A value of True means that the algorithm has detected drift and that the data point is the observed change point.

Output acts as a measure of confidence. Output is a probability score between 0 and 1.0. The closer Output is to 1.0, the more confident the algorithm is in its predicted label.

Generally, when `Label = True`, the confidence is high. In some noisy signals, this may not be the case. In those scenarios, you can filter the output of the algorithm by the following condition:

`| where output > threshold and label=true`

A threshold between 0.7 and 0.9 is typically used to select the high-confidence change points.
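
As a minimal sketch, the filter can be appended directly after the drift detection step. The threshold of 0.8 is an assumed value chosen from the 0.7 to 0.9 range, and the `detect_drift` call with `value="input"` is taken from the SPL2 example in this topic. Tune the threshold for your own data.

```
| detect_drift value="input"
| where output > 0.8 and label=true;
```

A higher threshold keeps fewer, more confident change points. A lower threshold keeps more candidates in noisy signals.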

## SPL2 example

The following example uses Drift Detection on `Bytes Sent` by `Source Address`:

```
| from splunk_firehose()
| eval json=cast(body, "string"),
'json-map'=from_json_object(json),
input=parse_double(ucast(map_get('json-map', "Bytes Sent"), "string", null)),
time=ucast(map_get('json-map', "Start Time"), "string", null),
timestamp=cast(div(cast(get("time"), "long"), 1000000), "long")
| detect_drift value="input";
```