Splunk® Data Stream Processor

Function Reference


Pairwise Categorical Outlier Detection (beta)

Pairwise Categorical Outlier Detection detects anomalous combinations of values from two categorical attributes that may depend on each other (for example: given "A", is "B" anomalous?). Input variables should be categorical, or should otherwise take values from a finite set. Examples include application name, IP address, username, port, and hour of day.

Pairwise Categorical Outlier Detection computes a rarity score for each observation that indicates whether the observation is anomalous. Lower rarity scores indicate that an observation is more likely to be anomalous.

This function requires two categorical fields in the input dataset. If the dataset contains columns titled "conditional" and "target", the function uses them by default. You can override these defaults and specify custom input fields. For more information, see the Optional arguments section.

Function Input/Output Schema

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs collections of records with schema S.

Syntax

conditional_anomaly
conditional="conditional"
target="target"

Required arguments

If the dataset contains columns labelled "conditional" and "target", the data in those columns is applied by default. To learn how to override the default inputs and specify different fields for "conditional" and "target" inputs, see the Optional arguments section.

Optional arguments

conditional
Syntax: string
Description: This field corresponds to the categorical attribute of the dataset that represents the condition on which to measure the rarity of the target variable. For example, when assessing the rarity of actions executed by users, the "conditional" field may be username or location. In the mathematical notation P(A|B), this field represents B. If the dataset contains a column titled "conditional", the function uses it by default. To override this default, set the "conditional" argument to the name of a different field.
target
Syntax: string
Description: This field corresponds to the categorical attribute of the dataset for which you want to compute the rarity. In the mathematical notation P(A|B), this field represents A. If the dataset contains a column titled "target", the function uses it by default. To override this default, set the "target" argument to the name of a different field.

Usage

This algorithm detects anomalies in categorical data. For each observed data point, Pairwise Categorical Outlier Detection outputs a rarity score. The rarity score is a positive real value. The closer the rarity score is to 0, the more likely the data point is anomalous. Higher values indicate that an observed data point is less rare (that is, not anomalous).
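To make the P(A|B) notation concrete, the following Python sketch scores each record by the empirical frequency of its target value among records that share the same conditional value. This is an illustration only: the scoring method used internally by conditional_anomaly is not documented here, and the function name rarity_scores and the sample records are hypothetical.

```python
from collections import Counter

def rarity_scores(records, conditional="conditional", target="target"):
    """Return one empirical P(target | conditional) estimate per record.

    Assumption: plain relative frequency stands in for the rarity score.
    The real function may compute its score differently.
    """
    # Count how often each (conditional, target) pair occurs.
    pair_counts = Counter((r[conditional], r[target]) for r in records)
    # Count how often each conditional value occurs.
    cond_counts = Counter(r[conditional] for r in records)
    return [
        pair_counts[(r[conditional], r[target])] / cond_counts[r[conditional]]
        for r in records
    ]

# Hypothetical data: alice almost always logs in, bob routinely deletes.
records = (
    [{"conditional": "alice", "target": "login"}] * 9
    + [{"conditional": "alice", "target": "delete_db"}]
    + [{"conditional": "bob", "target": "delete_db"}] * 5
)

scores = rarity_scores(records)
print(scores[0])   # alice/login     -> 0.9
print(scores[9])   # alice/delete_db -> 0.1
print(scores[10])  # bob/delete_db   -> 1.0
```

In this sketch the same target value ("delete_db") scores low for alice and high for bob, which matches the description above: a score closer to 0 marks the more anomalous observation.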

Pairwise Categorical Outlier Detection is useful for identifying anomalies in non-numeric data, or in use cases where contextual information about an observation is important for identifying an anomaly.

For example, transaction data can be monitored for fraud. Standard numeric outlier detection algorithms may erroneously flag all high-value transactions as suspicious. Instead, Pairwise Categorical Outlier Detection identifies anomalous transactions that are conditional on context, like User ID, Location, or Application. Using conditional fields establishes the baseline for normal observations within the reference population.

Security users may use Pairwise Categorical Outlier Detection to find anomalous access to a port by an application. To do so, set conditional="app" and target="port".
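The app and port use case can be pictured with a small Python sketch that groups events by application and computes how often each port appears within that group. The event data and the helper port_frequencies_by_app are hypothetical, and relative frequency is an assumed stand-in for the rarity score rather than the function's documented method.

```python
from collections import Counter, defaultdict

# Hypothetical events: sshd always uses port 22; nginx almost always uses 443.
events = (
    [{"app": "sshd", "port": "22"}] * 20
    + [{"app": "nginx", "port": "443"}] * 19
    + [{"app": "nginx", "port": "22"}]
)

def port_frequencies_by_app(events):
    """Return {app: {port: share of that app's events using the port}}."""
    per_app = defaultdict(Counter)
    for e in events:
        per_app[e["app"]][e["port"]] += 1
    return {
        app: {port: n / sum(ports.values()) for port, n in ports.items()}
        for app, ports in per_app.items()
    }

freqs = port_frequencies_by_app(events)
print(freqs["sshd"]["22"])   # 1.0
print(freqs["nginx"]["22"])  # 0.05

# The lowest-frequency (app, port) pair is the most anomalous candidate.
rarest = min(
    ((app, port) for app, ports in freqs.items() for port in ports),
    key=lambda pair: freqs[pair[0]][pair[1]],
)
print(rarest)  # ('nginx', '22')
```

Port 22 is the most common port overall in this sample, yet it stands out once it is conditioned on the application. This is the contextual effect that a conditional field is meant to capture.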

SPL2 example

The following example uses Pairwise Categorical Outlier Detection on a test set:

| from splunk_firehose()
| eval json=cast(body, "string"),
    'json-map'=from_json_object(json),
    conditional=ucast(map_get('json-map', "conditional"), "string", null),
    target=ucast(map_get('json-map', "target"), "string", null),
    key=""
| conditional_anomaly conditional="conditional" target="target";
Last modified on 13 January, 2021

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1

