Splunk® Data Stream Processor

Function Reference

Pairwise Categorical Outlier Detection

Pairwise Categorical Outlier Detection detects anomalous combinations of values from two categorical attributes that may depend on each other. Input variables should be categorical, or non-categorical fields that take a finite set of values, such as application name, IP address, username, port, and hour of day.

Pairwise Categorical Outlier Detection predicts a rarity score that indicates whether an observation is anomalous. Smaller rarity scores indicate that an observation is more likely to be anomalous.


Function Input/Output Schema

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs collections of records with schema S.

Syntax

Both arguments, conditional and target, are required.

| conditional_anomaly conditional="conditional" target="target"

In an example use case where you want to find anomalous access to a port by an application, you would set conditional="app" and target="port".

Required arguments

conditional
Syntax: string
Description: Name of column with first categorical attribute.
Example: ucast(map_get('json-map', "conditional"), "string", null)
target
Syntax: string
Description: Name of column with second categorical attribute.
Example: ucast(map_get('json-map', "target"), "string", null)

Usage

This algorithm detects anomalies in categorical data. For each observed data point, Pairwise Categorical Outlier Detection outputs a rarity score, which is a positive real value. The closer the rarity score is to 0, the more likely the data point is anomalous. Higher values indicate that the combination of values is more common, and therefore that the data point is less likely to be anomalous.
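This topic does not describe how the rarity score is computed. As a rough illustration of how such a score behaves, the following Python sketch uses the empirical conditional frequency of the target value, given the conditional value, as a stand-in. This is an assumption made for illustration only and is not the algorithm used by conditional_anomaly. The rarity_scores helper and the sample events are hypothetical.

```python
from collections import Counter


def rarity_scores(records, conditional, target):
    """Illustrative stand-in score, NOT the conditional_anomaly algorithm.

    Each record is scored by how often its target value appears among
    records that share the same conditional value. Scores fall in (0, 1],
    and values closer to 0 mark rarer combinations.
    """
    pair_counts = Counter((r[conditional], r[target]) for r in records)
    cond_counts = Counter(r[conditional] for r in records)
    return [
        pair_counts[(r[conditional], r[target])] / cond_counts[r[conditional]]
        for r in records
    ]


# Hypothetical events: "web" normally uses port 443, "db" normally uses 5432.
events = (
    [{"app": "web", "port": "443"}] * 9
    + [{"app": "web", "port": "5432"}]
    + [{"app": "db", "port": "5432"}] * 10
)

scores = rarity_scores(events, conditional="app", target="port")

# Index of the record with the smallest (most anomalous) score.
rarest = min(range(len(events)), key=scores.__getitem__)
print(events[rarest], scores[rarest])
```

In this sample, port 5432 is the most common port overall, but it is rare for the "web" application, so that single record receives the lowest score. A score based on the target field alone would not single it out.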

Pairwise Categorical Outlier Detection is useful in identifying anomalies on non-numeric data, or in use cases where contextual information about an observation is important in identifying an anomaly.

For example, transaction data can be monitored for fraud. Standard numeric outlier detection algorithms may erroneously flag all high-value transactions as suspicious. Instead, Pairwise Categorical Outlier Detection identifies transactions that are anomalous given their context, such as User ID, Location, or Application. The conditional field establishes the baseline for normal observations within the reference population.

Security users may use Pairwise Categorical Outlier Detection to find anomalous access to a port by an application. To do so, set conditional="app" and target="port".

SPL2 example

The following example uses Pairwise Categorical Outlier Detection on a test set:

| from splunk_firehose()
| eval json=cast(body, "string"),
    'json-map'=from_json_object(json),
    conditional=ucast(map_get('json-map', "conditional"), "string", null),
    target=ucast(map_get('json-map', "target"), "string", null),
    key=""
| conditional_anomaly conditional="conditional" target="target";

Last modified on 30 October, 2020

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0

