Detect Categorical Outliers
The Detect Categorical Outliers assistant identifies data that indicate interesting or unusual events. This assistant accepts non-numeric and multi-dimensional data, such as string identifiers and IP addresses.
To detect categorical outliers, input data and select the fields in which to look for unusual combinations or a coincidence of rare values. An event in which multiple fields have rare values is flagged as an outlier.
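As a rough illustration of the "coincidence of rare values" idea, the following Python sketch flags an event when at least two of the selected fields hold a value that is rare across the whole data set. This is not the toolkit's implementation. The function name, the `rare_fraction` and `min_rare_fields` parameters, the field names, and the sample events are all hypothetical.

```python
from collections import Counter

def find_rare_coincidences(events, fields, rare_fraction=0.05, min_rare_fields=2):
    """Return (index, rare_fields) for each event in which at least
    `min_rare_fields` of the selected fields hold a rare value.
    Hypothetical helper, for illustration only."""
    total = len(events)
    # Count how often each value occurs, separately for each selected field.
    counts = {f: Counter(e.get(f) for e in events) for f in fields}
    flagged = []
    for i, event in enumerate(events):
        # A value is "rare" if it occurs in less than rare_fraction of all events.
        rare = [f for f in fields if counts[f][event.get(f)] / total < rare_fraction]
        if len(rare) >= min_rare_fields:
            flagged.append((i, rare))
    return flagged

# Hypothetical sample data: 18 common events, one event with a single rare
# value, and one event in which two rare values coincide.
events = (
    [{"src_ip": "10.0.0.1", "user": "alice"} for _ in range(18)]
    + [{"src_ip": "203.0.113.9", "user": "alice"},
       {"src_ip": "198.51.100.7", "user": "mallory"}]
)

result = find_rare_coincidences(events, ["src_ip", "user"], rare_fraction=0.1)
# result == [(19, ["src_ip", "user"])]
```

Event 18 has only one rare value, so it is not flagged with the default `min_rare_fields=2`; setting `min_rare_fields=1` would flag it as well.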
The visualization below illustrates results from the Splunk Machine Learning Toolkit showcase example that uses Bitcoin data.
Algorithm
The Detect Categorical Outliers assistant uses the probabilistic measures algorithm.
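This page does not describe the toolkit's probabilistic measures in detail. The following Python sketch shows one common probabilistic approach, under the assumption that the selected fields are treated as independent: each event is scored by the sum of the log empirical probabilities of its field values, and events scoring below a threshold are flagged. The function names, the `log_threshold` parameter, and the sample data are hypothetical, and the toolkit's own algorithm may differ.

```python
import math
from collections import Counter

def log_probability_scores(events, fields):
    """Score each event by the sum of log empirical probabilities of its
    values in the selected fields (fields assumed independent)."""
    total = len(events)
    # Empirical frequency of every value, per field.
    counts = {f: Counter(e.get(f) for e in events) for f in fields}
    return [
        sum(math.log(counts[f][e.get(f)] / total) for f in fields)
        for e in events
    ]

def flag_outliers(events, fields, log_threshold):
    """Return indices of events whose combined log-probability is below the threshold."""
    scores = log_probability_scores(events, fields)
    return [i for i, s in enumerate(scores) if s < log_threshold]

# Hypothetical sample data: 18 common events, one with a single rare value,
# and one in which two rare values coincide.
events = (
    [{"src_ip": "10.0.0.1", "user": "alice"} for _ in range(18)]
    + [{"src_ip": "203.0.113.9", "user": "alice"},
       {"src_ip": "198.51.100.7", "user": "mallory"}]
)
fields = ["src_ip", "user"]

strict = flag_outliers(events, fields, math.log(0.01))  # [19]
loose = flag_outliers(events, fields, math.log(0.1))    # [18, 19]
```

Summing logarithms instead of multiplying raw probabilities avoids numeric underflow when many fields are combined. Raising the threshold (the `loose` case) also flags the event that has only one rare value.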
Workflow
- Create a new Detect Categorical Outliers experiment and give it a name.
- On the resulting page, run a search.
- Select the fields you want to analyze. This field list populates every time you run a search.
- Click Detect Outliers.
Interpret and validate
After you detect outliers, review your results and the corresponding tables. Results often contain only a few outliers.
Result | Definition |
---|---|
Outliers | This result shows the number of events flagged as outliers. |
Total Events | This result shows the total number of events that were evaluated. |
Data and Outliers | This table lists the events that are marked as outliers and the reason each event is marked as an outlier. |
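To make the relationship between the three result items concrete, the following Python sketch assembles them from a detector's output. It assumes the detector returns (event index, rare fields) pairs; the function name, the reason wording, and the sample events are illustrative assumptions, and the toolkit's actual reason text may differ.

```python
def summarize_results(events, flagged):
    """Build the Outliers count, Total Events count, and a Data and Outliers
    table from hypothetical detector output: (event_index, rare_fields) pairs."""
    rows = [
        # Copy the event's fields and attach a human-readable reason.
        {**events[i], "reason": "rare value in " + ", ".join(rare_fields)}
        for i, rare_fields in flagged
    ]
    return {
        "Outliers": len(flagged),
        "Total Events": len(events),
        "Data and Outliers": rows,
    }

# Hypothetical sample: three events, one flagged by a detector.
events = [
    {"src_ip": "10.0.0.1", "user": "alice"},
    {"src_ip": "10.0.0.1", "user": "alice"},
    {"src_ip": "198.51.100.7", "user": "mallory"},
]
summary = summarize_results(events, [(2, ["src_ip", "user"])])
```

The Outliers count is always the number of rows in the Data and Outliers table, and it can never exceed Total Events.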
Deploy
After you interpret, validate, and refine your results, deploy the categorical outlier detection:
- Click the Save button in the top right corner of the page. You can edit the title and add or edit an associated description. Click Save when ready.
- Next to Detect Outliers, click Open in Search. This shows you the search query that uses all the data, not just the training set data.
- Click Show SPL to see the search query that was used to detect outliers. For example, you could use this same query on a different data set.
- Under the Experiments tab, you can see experiments grouped by assistant. From the Manage menu, you can choose to:
  - Create Alert
  - Edit Title and Description
- Click Create Alert to set up an alert that is triggered when the predicted value meets a threshold you specify. Once at least one alert is present, the bell icon is highlighted in blue.
Changes to a saved experiment may affect its associated alerts. Re-validate your alerts after you finish changing the experiment.
For more information about alerts, see Getting started with alerts in the Splunk Enterprise Alerting Manual.
This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 3.2.0, 3.3.0