Detect Categorical Outliers
The Detect Categorical Outliers assistant identifies data that indicates interesting or unusual events. This assistant accepts non-numeric and multi-dimensional data, such as string identifiers and IP addresses. To detect categorical outliers, input data and select the fields in which to look for unusual combinations or a coincidence of rare values. An event is flagged as an outlier when several of its selected fields hold rare values at the same time. The image below illustrates results from the showcase example in the Splunk Machine Learning Toolkit with Bitcoin data.
Algorithm
The Detect Categorical Outliers assistant uses the probabilistic measures algorithm.
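The page does not describe the probabilistic measure in detail. As a rough illustration only, the Python sketch below assumes a simple stand-in: it estimates how often each value occurs in each selected field, multiplies those frequencies (treating the fields as independent), and flags events whose combined score falls below a cutoff. The function names, the sample events, and the `threshold` value are all hypothetical and are not taken from the toolkit.

```python
from collections import Counter
from math import prod


def score_events(events, fields):
    """Return one joint-frequency score per event (lower means rarer)."""
    total = len(events)
    # Empirical frequency of every value, counted separately for each field.
    counts = {f: Counter(e[f] for e in events) for f in fields}
    # Assumption: fields are treated as independent, so frequencies multiply.
    return [prod(counts[f][e[f]] / total for f in fields) for e in events]


def flag_outliers(events, fields, threshold=0.05):
    """Flag events whose score is below the hypothetical threshold."""
    return [score < threshold for score in score_events(events, fields)]


# Made-up sample data: two common combinations and one rare one.
events = (
    [{"src_ip": "10.0.0.1", "user": "alice"}] * 6
    + [{"src_ip": "10.0.0.2", "user": "bob"}] * 3
    + [{"src_ip": "203.0.113.9", "user": "mallory"}]
)
flags = flag_outliers(events, ["src_ip", "user"])
print(sum(flags), "of", len(events), "events flagged")
```

In this sample, only the last event combines two rare values, so it is the only one that falls below the cutoff. The assistant's actual scoring may differ from this sketch.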
Workflow
To detect categorical outliers, input data and select the fields to analyze.
- Run a search.
- Select the fields you want to analyze.
- Click Detect Outliers.
The list of available fields repopulates every time you run a search.
Interpret and validate
After you detect outliers, review your results and the corresponding tables. Results often have a few outliers.
Result | Definition |
---|---|
Outliers | This result shows the number of events flagged as outliers. |
Total Events | This result shows the total number of events that were evaluated. |
Data and Outliers | This table lists the events that are marked as outliers and the reason each event is marked as an outlier. |
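
To make the relationship between the three results concrete, the following Python sketch derives them from a list of already-scored events. It assumes each event is a dictionary with a hypothetical `reason` key that is empty for normal events and non-empty for outliers. The key name and the sample rows are illustrative and do not reflect the assistant's actual output fields.

```python
def summarize(rows):
    """Build the two counts and the outlier table from scored rows."""
    # Assumption: a non-empty "reason" marks the row as an outlier.
    outlier_rows = [row for row in rows if row.get("reason")]
    return {
        "Outliers": len(outlier_rows),
        "Total Events": len(rows),
        "Data and Outliers": outlier_rows,
    }


# Made-up scored events for illustration.
rows = [
    {"user": "alice", "src_ip": "10.0.0.1", "reason": ""},
    {"user": "bob", "src_ip": "10.0.0.2", "reason": ""},
    {"user": "mallory", "src_ip": "203.0.113.9", "reason": "rare src_ip"},
]
summary = summarize(rows)
print(summary["Outliers"], "outliers out of", summary["Total Events"], "events")
for row in summary["Data and Outliers"]:
    print(row["user"], row["src_ip"], row["reason"])
```

A low ratio of outliers to total events is the usual pattern. If most events are flagged, the selected fields may be too fine-grained to separate rare values from common ones.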
Deploy categorical outlier detection
- Next to Detect Outliers, click Open in Search.
- Next to Open in Search, click Show SPL to see the search query that was used to detect outliers.
- Click the Schedule Alert button in a panel to set up an alert when the number of outliers outside both the upper and lower thresholds exceeds a specified value.
- After you save the alert, you can access it from the Scheduled Jobs > Alerts menu.
- Click any title to go to a new Search tab.
A search query opens using all of the data, not just the training set data.
You can use this same query on a different data set.
The search bar contains a search query to replicate the outlier detection calculations.
This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 2.4.0, 3.0.0, 3.1.0