Sequential Outlier Detection
Sequential Outlier Detection identifies anomalous events in time-series sequence data. It can be applied to online streams for real-time monitoring, to determine whether sequences or patterns of events in a time series are anomalous.
Sequential Outlier Detection outputs a predicted value that indicates whether an observed sequence is anomalous. Lower predicted values indicate that an observed sequence is more likely to be anomalous. Higher predicted values indicate that an observed sequence is expected (not anomalous) based on previously observed data.
Function Input/Output Schema
- Function Input
- This function takes in collections of records with schema R.
- Function Output
- This function outputs collections of records with schema S.
The required fields are in bold.
| detect_sequential_outliers value="input";
- Syntax: string
- Description: An ordered string where each subsequent letter represents the following event. For example, if your categorical time series has 10 different types of events, you can encode the first event as "1", second event as "2", third event as "3", and so forth.
- Example: ucast(map_get('json-map', "eventCode"), "string", null)
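The encoding described above can be sketched outside the pipeline. This is a minimal illustration in Python, not part of the Data Stream Processor: the helper name `build_event_encoder` and the sample event names are hypothetical, and the only idea taken from the text is that each distinct event type is assigned a string code such as "1", "2", "3".

```python
def build_event_encoder():
    """Return a function that maps each distinct event type to a stable string code."""
    codes = {}

    def encode(event_type):
        # Assign the next integer code the first time an event type is seen.
        if event_type not in codes:
            codes[event_type] = str(len(codes) + 1)
        return codes[event_type]

    return encode


# Hypothetical event names, used only to show the mapping.
encode = build_event_encoder()
events = ["login", "list_files", "login", "delete_file"]
encoded = [encode(e) for e in events]
print(encoded)
```

A repeated event type always receives the same code, so the encoded stream preserves the order and identity of events while satisfying the string type required by the value argument.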
No optional arguments.
This algorithm contains three pre-tuned parameters: Markov Order, Prune Threshold, and Prune Trigger Count. These parameters do not require manual input and can be ignored by most users.
For users who want to know more about them, the pre-tuned parameters are as follows:
val markovOrder: Int = DEFAULT_MARKOV_ORDER
val pruneThreshold: Int = DEFAULT_PRUNE_THRESHOLD
val pruneTriggerCount: Int = DEFAULT_TRIGGER_COUNT
The input to Sequential Outlier Detection is a time series of events, such as logs of commands executed over time. The model predicts whether the observed sequence in a stream is anomalous in real time.
For each data point observed, Sequential Outlier Detection outputs a score between 0 and 1 that represents the probability that the observed sequence is normal. The lower the predicted output, the less likely it is that the past sequence has been observed before, which corresponds to a higher probability of anomaly. Predicted outputs closer to 1 correspond to a lower probability of anomaly (that is, the sequence is more likely to be normal).
The maximum length of the past sequence for which the algorithm computes a probability is set by the Markov order of the algorithm, which defaults to 4 observations.
For example, a security analyst may want to identify suspicious activity from shell commands. To do so, they can use Sequential Outlier Detection to identify anomalous sequences of command logs executed over time. Each command may seem normal in isolation, but the sequence of commands can be used to identify suspicious activity. This approach to anomaly detection provides more context about the events and time series that are being monitored, improving the ability to detect abnormal sequences among event data.
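To make the scoring idea concrete, here is a toy sketch of a Markov-style sequence scorer. It is an assumption about the general technique, not the Data Stream Processor implementation: this page does not describe the algorithm's internals beyond the Markov order, so the class name `MarkovSequenceScorer`, the method `score_and_update`, and the counting scheme are hypothetical, and the pruning parameters are not modeled. The sketch scores each event by how often it has previously followed the same context of up to 4 preceding events.

```python
from collections import defaultdict, deque


class MarkovSequenceScorer:
    """Toy Markov-style scorer (illustrative only, not the Splunk implementation)."""

    def __init__(self, markov_order=4):
        # counts[context][event] = number of times `event` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))
        # Keep only the most recent `markov_order` events as context.
        self.history = deque(maxlen=markov_order)

    def score_and_update(self, event):
        """Score `event` against what has been seen so far, then learn from it."""
        context = tuple(self.history)
        seen = self.counts[context]
        total = sum(seen.values())
        # A context that was never observed yields 0.0 (most anomalous).
        score = seen[event] / total if total else 0.0
        seen[event] += 1
        self.history.append(event)
        return score


scorer = MarkovSequenceScorer(markov_order=4)
# A repeating pattern of encoded events followed by one unexpected event.
stream = ["1", "2", "3"] * 5 + ["9"]
scores = [scorer.score_and_update(e) for e in stream]
print(scores[-2], scores[-1])
```

In this sketch, events that continue the learned pattern score close to 1, and the unexpected final event scores 0 because it has never followed that context, which matches the interpretation of the output described above.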
The following example uses Sequential Outlier Detection to identify anomalies in event codes:
| from splunk_firehose() | eval json=cast(body, "string"), 'json-map'=from_json_object(json), input=ucast(map_get('json-map', "eventCode"), "string", null), time=ucast(map_get('json-map', "timestamp"), "string", null), key=ucast(map_get('json-map', "user"), "string", null), timestamp=cast(time, "long") | detect_sequential_outliers value="input";
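For readers less familiar with the eval step in this pipeline, the following Python sketch mirrors the field extraction it performs: parse the record body as JSON, read eventCode, timestamp, and user, and cast them to the types the pipeline uses. The function name `extract_fields` and the sample payload are hypothetical, and the sketch approximates the behavior of the casts rather than reproducing it exactly.

```python
import json


def extract_fields(body):
    """Approximate the eval step: parse the JSON body and extract three fields."""
    record = json.loads(body)
    event_code = record.get("eventCode")
    time = record.get("timestamp")
    user = record.get("user")
    return {
        # `input` is the string value passed to detect_sequential_outliers.
        "input": None if event_code is None else str(event_code),
        "key": None if user is None else str(user),
        # The pipeline casts the timestamp string to a long.
        "timestamp": None if time is None else int(time),
    }


# Hypothetical payload; the values are made up for illustration.
sample_body = '{"eventCode": 4624, "timestamp": "1600000000000", "user": "alice"}'
fields = extract_fields(sample_body)
print(fields)
```

Missing fields map to None here, in the same spirit as the null default passed to ucast in the pipeline.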
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0