Drift Detection identifies large-scale shifts and abrupt changes in a time-series data stream. It is useful for understanding trends in data and for finding the point in time at which the distribution of the data changes. This function is also referred to as "changepoint detection."
The Drift Detection function identifies distributional change in a time series, such as a metric or KPI. Examples of sudden changes that Drift Detection can identify include:
- Shift in mean or trend of a signal
- Increase or decrease in variance or noise of observed data
- Change in periodicity, such as the interval between observed data points
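To make the first kind of change concrete, the following is a minimal Python sketch that locates a shift in the mean of a synthetic signal by comparing adjacent windows. It is an illustration only and is not the algorithm that Drift Detection uses; the function name `find_mean_shift`, the window size, and the score threshold are assumptions made for this example.

```python
import math
import random
import statistics


def find_mean_shift(values, window=20, z_threshold=4.0):
    """Return the index where the mean most likely shifts, or None.

    For every candidate index, compare the mean of the `window` points
    before it with the mean of the `window` points after it, scaled by
    the standard error of the difference. (Hypothetical helper, not a
    Splunk API.)
    """
    best_index, best_score = None, 0.0
    for i in range(window, len(values) - window + 1):
        before = values[i - window:i]
        after = values[i:i + window]
        pooled_var = (statistics.pvariance(before) + statistics.pvariance(after)) / 2
        std_err = math.sqrt(2 * pooled_var / window)
        diff = abs(statistics.fmean(after) - statistics.fmean(before))
        if std_err == 0:
            # Noise-free windows: any difference in means is a clear shift.
            score = math.inf if diff > 0 else 0.0
        else:
            score = diff / std_err
        if score > best_score:
            best_index, best_score = i, score
    return best_index if best_score >= z_threshold else None


# Synthetic signal: the mean jumps from 10 to 15 at index 100.
random.seed(0)
signal = [random.gauss(10.0, 1.0) for _ in range(100)]
signal += [random.gauss(15.0, 1.0) for _ in range(100)]
shift_at = find_mean_shift(signal)
print(shift_at)
```

The same windowed comparison could be applied to the variance instead of the mean to illustrate the second kind of change.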
Function Input/Output Schema
- Function Input: This function takes in collections of records with schema R.
- Function Output: This function outputs collections of records with schema S.
The required fields are in bold.
| detect_drift value="input"
timestamp
- Syntax: long
- Description: The timestamp that comes with the value.
- Example: cast(div(cast(get("time")
value
- Syntax: double
- Description: The value to detect drift on.
- Example: "input"
Drift Detection monitors the time series for drift. For each observed data point, Drift Detection outputs two values:
- Label: Returned as True or False. Indicates whether a data point represents a change point. A value of True means that the algorithm has detected drift and that the data point is the observed change point.
- Output: A probability score between 0 and 1.0 that acts as a measure of confidence. The closer the output is to 1.0, the more confident the algorithm is in its predicted label.
When Label = True, the confidence is typically high. In some noisy signals, this may not be the case. In those scenarios, you can filter the output of the algorithm with the following condition:
| where output > threshold and label=true
A threshold between 0.7 and 0.9 is typically applied to select the high-confidence change points.
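The effect of this filter can be sketched in Python as follows. The field names `label` and `output` follow the description above; the helper name `high_confidence_changepoints`, the default threshold of 0.8, and the sample records are hypothetical and exist only for illustration.

```python
def high_confidence_changepoints(records, threshold=0.8):
    """Keep only records flagged as change points with confidence above threshold.

    Mirrors the condition `output > threshold and label=true`.
    (Hypothetical helper, not part of the product.)
    """
    return [r for r in records if r["label"] and r["output"] > threshold]


# Made-up records shaped like the function's output.
records = [
    {"timestamp": 1, "label": False, "output": 0.10},
    {"timestamp": 2, "label": True, "output": 0.55},  # flagged, but low confidence
    {"timestamp": 3, "label": True, "output": 0.93},  # flagged, high confidence
]
selected = high_confidence_changepoints(records)
print([r["timestamp"] for r in selected])
```

Raising the threshold toward 0.9 trades missed change points for fewer false alarms, so the right value depends on how noisy the signal is.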
The following example applies Drift Detection to the "Bytes Sent" field of incoming records:
| from splunk_firehose() | eval json=cast(body, "string"), 'json-map'=from_json_object(json), input=parse_double(ucast(map_get('json-map', "Bytes Sent"), "string", null)), key=ucast(map_get('json-map', "Source Address"), "string", null), time=ucast(map_get('json-map', "Start Time"), "string", null), timestamp=cast(div(cast(get("time"), "long"), 1000000), "long") | detect_drift value="input";
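For readers less familiar with the eval step in this pipeline, the field extraction can be approximated in Python as shown below. This is a rough sketch under several assumptions: the sample event is invented, the unit of "Start Time" is not stated in this documentation, and the sketch raises an error on malformed input where the pipeline's ucast calls fall back to null.

```python
import json


def extract_fields(body):
    """Approximate the eval step of the example pipeline.

    `body` is the event payload as a JSON string. (Hypothetical helper
    for illustration only.)
    """
    json_map = json.loads(body)
    time = str(json_map["Start Time"])
    return {
        "input": float(json_map["Bytes Sent"]),   # value to detect drift on
        "key": str(json_map["Source Address"]),
        "time": time,
        # Same scaling as the pipeline: divide the raw time by 1,000,000.
        "timestamp": int(time) // 1_000_000,
    }


# Invented sample event with the three fields the pipeline reads.
sample = (
    '{"Bytes Sent": "1024", "Source Address": "10.0.0.1", '
    '"Start Time": "1600000000000000"}'
)
fields = extract_fields(sample)
print(fields)
```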
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0