Splunk® Machine Learning Toolkit

User Guide

This documentation does not apply to the most recent version of Splunk® Machine Learning Toolkit. For documentation on the most recent version, go to the latest release.

Smart Outlier Detection Assistant

The Smart Outlier Detection Assistant enables machine learning outcomes for users with little to no SPL knowledge. Introduced in version 5.0.0 of the Machine Learning Toolkit, this new Assistant is built on the backbone of the Experiment Management Framework (EMF), offering enhanced outlier detection abilities. The Smart Outlier Detection Assistant offers a segmented, guided workflow with an updated user interface. Move through the stages of Define, Learn, Review, and Operationalize to load data, build your model, and put that model into production. Each stage offers a data preview and visualization panel.

This Assistant uses the DensityFunction algorithm. The fit command persists a DensityFunction model, which you can then use with the apply command. DensityFunction creates and stores density functions for use in anomaly detection. It groups the data, then fits and stores a separate density function for each group.
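The per-group fitting described above can be illustrated with a minimal Python sketch. This is not the MLTK implementation: it assumes a Normal distribution for every group, treats the threshold as the total probability mass in the two tails, and uses hypothetical field names (`shop_id`, `quantity`) and made-up values.

```python
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def fit_density_by_group(rows, field, split_by):
    """Fit one Normal distribution per group and return {group: NormalDist}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[split_by]].append(float(row[field]))
    # One stored density function per group, mirroring the description above.
    return {key: NormalDist(mean(vals), stdev(vals)) for key, vals in groups.items()}


def is_outlier(model, row, field, split_by, threshold=0.01):
    """Flag a value that falls in the low-density tails of its group's distribution.

    Assumption: `threshold` is the total probability mass treated as outlying,
    split evenly between the two tails.
    """
    dist = model[row[split_by]]
    lower = dist.inv_cdf(threshold / 2)
    upper = dist.inv_cdf(1 - threshold / 2)
    return not (lower <= float(row[field]) <= upper)


# Hypothetical training rows: two shops with very different typical quantities.
rows = (
    [{"shop_id": "A", "quantity": q} for q in (10, 11, 9, 10, 12, 10, 11, 9)]
    + [{"shop_id": "B", "quantity": q} for q in (100, 105, 95, 102, 98, 101, 99, 100)]
)
model = fit_density_by_group(rows, field="quantity", split_by="shop_id")

# The same quantity is judged against the density of its own group.
print(is_outlier(model, {"shop_id": "B", "quantity": 100}, "quantity", "shop_id"))
print(is_outlier(model, {"shop_id": "A", "quantity": 100}, "quantity", "shop_id"))
```

Fitting a separate distribution per group is what lets a value count as normal in one group and anomalous in another.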

To learn more about the Smart Outlier Detection Assistant algorithm, see DensityFunction algorithm.

The accuracy of the anomaly detection for DensityFunction depends on the quality and the size of the training dataset, how accurately the fitted distribution models the underlying process that generates the data, and the value chosen for the threshold parameter.

Smart Outlier Detection Assistant Showcase

You can become familiar with this new Assistant through the MLTK Showcase, accessed under its own tab. The Smart Outlier Detection Showcase examples include:

  • Find Anomalies in Hard Drive Metrics
  • Find Anomalies in Supermarket Purchases

This image shows the landing page for the Machine Learning Toolkit Showcase page. The Detect Outliers option is highlighted and pointing to the available examples for the Smart Outlier Detection Assistant.

Click the name of any Smart Outlier Detection Showcase to see this new Assistant and its updated interface using pre-loaded test data and pre-selected outlier detection parameters.

Smart Outlier Detection Assistant Showcases require you to click through to continue the demonstration. Showcases do not include the final stage of the Assistant workflow to Operationalize the model.

Smart Outlier Detection Assistant workflow

Move through the stages of Define, Learn, Review, and Operationalize to draw in data, build your model, and put that model into production.

This example workflow uses the supermarket.csv dataset that ships with the MLTK. You can use this dataset or another of your choice to explore the Smart Outlier Detection Assistant and its features before building a model with your own data.

To begin, select Smart Outlier Detection from the Experiments landing page and click the Create New Experiment button in the top right.

This image shows the Machine Learning Toolkit and the view under the Experiments tab. The Experiment types are displayed from which a user can create a new Experiment of that type. The new Experiment type of Smart Outlier Detection Assistant is highlighted.

Enter an Experiment Title, and optionally add a Description. Click Create to move into the Assistant interface.

This image shows the modal window that appears after you click the Create New Experiment button. Fields are filled in for Experiment Title and Description and a button labeled Create is highlighted in the bottom right corner of the modal window.

Define

Use the Define stage to select and preview the data you want to use for the outlier detection. You have three options to pull data into the Assistant and you can pull data in from anywhere in the Splunk platform.

You can choose the Search option to search for a stored dataset. Use the Search bar to modify your data before using it in the Learn stage.

This image shows the Define stage of the Assistant. The Search bar is highlighted and contains an inputlookup for the supermarket dataset.

You can choose the Datasets option. Under Datasets, you can find any data you have ingested into Splunk, as well as any datasets that ship with Splunk Enterprise and the Machine Learning Toolkit. You can filter by type to find your preferred data faster.

This image shows the Define stage of the Assistant. This view shows Datasets, the alternate option for getting data into the Assistant. The View Datasets menu is open with the supermarket dataset selected from the list.

You can choose the Metrics option. Under Metrics, you can find any metrics data you have gathered and stored as a custom index type, without the need to write any SPL. This index might include data from systems such as hosts, network devices, web servers, and SaaS systems. To learn more, see About the Splunk Metrics Workspace.

This image shows the Define stage of the Assistant. The Metrics tab is highlighted.

As with other Experiment Assistants, the Smart Outlier Detection Assistant includes a time-range picker to narrow down the data timeframe to a particular date or date range. The default setting of All time can be changed to suit your needs. Once data is selected, the Data Preview and Visualization tabs populate.

This image shows the Define stage of the Assistant. The menu option to change the default time range for the data from All time to another preset time frame or a custom time frame is open.

When you are finished selecting your data, click Next in the top right, or Learn from the left-hand menu to move on to the next stage of the Assistant.

This image shows the Define stage of the Assistant. The left-hand side menu option of Learn is highlighted. The green button labeled Next in the top right corner of the page is also highlighted.

Learn

Use the Learn stage to build your outlier detection model. Under Initial data you can review the data selected in the Define stage.

To extract time features from your data, choose the +Add preprocessing step. This step extracts four fields: atf_hour_of_day, atf_day_of_week, atf_day_of_month, and atf_month.

The optional preprocessing step only works on data that includes a valid _time field.
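As a rough illustration of what such a preprocessing step produces, the following Python sketch derives the four fields from an epoch timestamp like the one held in _time. The field names come from the text above; the value encodings (UTC, Monday as day 0, months numbered 1 to 12) are assumptions of this sketch and may not match what the Assistant generates.

```python
from datetime import datetime, timezone


def extract_time_features(epoch_seconds):
    """Derive the four time features from an epoch timestamp (interpreted as UTC)."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return {
        "atf_hour_of_day": ts.hour,
        "atf_day_of_week": ts.weekday(),  # assumption: 0 = Monday, 6 = Sunday
        "atf_day_of_month": ts.day,
        "atf_month": ts.month,
    }


# Example event time: Friday 5 August 2022, 13:30 UTC.
event_time = datetime(2022, 8, 5, 13, 30, tzinfo=timezone.utc).timestamp()
features = extract_time_features(event_time)
print(features)
```

Features like these can then be chosen as split by fields so that, for example, each hour of the day gets its own baseline.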

Under Detect Outliers make field selections to drive the outlier detection results.

This image shows the Learn stage of the Assistant. The sections for Initial data and Detect Outliers are highlighted. Under Initial data there is a button to add a preprocessing step for time data. Under Detect Outliers are several fields that help form the outlier model.

Refer to the following table for information on each available Detect Outliers field. Hover over the question mark helper icons beside fields to view in-app field descriptions.

Field name | Description
Field to analyze | Required field. From the drop-down menu populated from your dataset, select the field on which to perform outlier detection.
Split by fields | Optional field. Select up to five fields. Use split by fields if what counts as an anomaly might differ based on the data in a particular field. The cardinality histogram is not generated unless you select at least one split by field.
Distribution type | Required field. Choose the distribution type based on the statistical behavior of the data. Leave the default selection of Auto if unsure.
Outlier tolerance threshold | Required field. Adjust as needed based on the number of expected outliers.
Notes | Optional field. Use this free-form text block to track the selections made in the parameter fields. Refer back to your notes to review which parameter combinations yield the best results.

Complete your field selections and click Detect Outliers to view results. Clicking Detect Outliers creates the model using the fit command and produces a written summary of the chosen model parameters at the top of the page. The Experiment is now in a Draft state, and the View History option is available. View History allows you to track any changes you make in the Learn stage.

This image shows the Learn stage with field selections set under the Detect Outliers menu. A visualization of the data is displayed on the Evaluate screen. Arrows point to where a plain English summary of the field selection appears at the top of the page, as well as the now available View history button.

Click the SPL button to review the Splunk Search Processing Language (SPL) that the Assistant generates in the background as you work.

This image shows the resulting modal window from clicking the button labeled SPL. The window displays the Splunk Search Processing Language generated for you as you work through the Assistant.

Within the ellipsis menu, view the underlying SPL from the SPL option, or run the SPL in a new browser tab from the Open in Search option.

This image shows the Evaluate tab on the Learn stage of the Assistant. The ellipsis button has been clicked, showing the resulting drop-down options of Open in Search and SPL.

Adjusting the fields in the Detect Outliers section and clicking Detect Outliers retrains the model using the fit command, which can be compute intensive. The Smart Outlier Detection Assistant also lets you change some settings on the Evaluate tab without using the Detect Outliers section. Changes made there tune the model parameters using the apply command. Because the model is not retrained, this option can be less compute intensive.
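The difference between retraining and re-applying can be sketched in Python. This is a conceptual illustration only, not the behavior of the SPL fit and apply commands: `fit_model` and `apply_model` are hypothetical helpers, the Normal distribution and two-tailed threshold are assumptions, and the data values are made up.

```python
from statistics import NormalDist, mean, stdev


def fit_model(training_values):
    """Costly step: scan the training data once and store the fitted parameters."""
    return {"mean": mean(training_values), "std": stdev(training_values)}


def apply_model(model, values, threshold):
    """Cheap step: reuse the stored parameters; only the outlier boundaries move."""
    dist = NormalDist(model["mean"], model["std"])
    lower = dist.inv_cdf(threshold / 2)
    upper = dist.inv_cdf(1 - threshold / 2)
    return [v for v in values if v < lower or v > upper]


training = [98, 101, 100, 99, 102, 100, 97, 103, 100, 100]
model = fit_model(training)  # done once

new_values = [100, 104, 96, 110]
# Trying different thresholds never touches the training data again.
loose = apply_model(model, new_values, threshold=0.1)
strict = apply_model(model, new_values, threshold=0.001)
print(loose, strict)
```

In this sketch a smaller threshold narrows the tails, so fewer values are flagged, while the stored mean and standard deviation stay unchanged.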

Fields and information views on the Evaluate tab include the following items that correspond in number to the screenshot:

  1. Choose to toggle the distribution mode from Automatic to Manual.
  2. Change the tolerance threshold for outliers. Click Apply to update the model parameters.
  3. View the total number of outliers based on current settings.
  4. Choose to view the top three, top five, or top ten groups with the most outliers. This option only appears if at least one field is selected to split by.
  5. Use this drop-down to get a quick view of the top outlier group values, based on the selection made at item #4.
  6. Opt for a split view of charts or combined view. This option only appears if at least one field is selected to split by.
  7. Toggle the view to show or hide the histogram, outlier area, and outliers. All view options are available in both the combined and split chart views.
  8. Hover over points within the visualization for additional details.
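To make item 4 concrete, here is a small Python sketch of ranking groups by outlier count. It assumes the outliers have already been flagged and that each row carries a hypothetical `shop_id` split by field; it is not how the Assistant computes the ranking.

```python
from collections import Counter


def top_outlier_groups(outlier_rows, split_by, n=3):
    """Return the n groups with the most outliers as (group, count) pairs."""
    counts = Counter(row[split_by] for row in outlier_rows)
    return counts.most_common(n)


# Hypothetical rows that were already flagged as outliers.
outliers = [{"shop_id": s} for s in ("A", "B", "A", "C", "A", "B")]
top_two = top_outlier_groups(outliers, split_by="shop_id", n=2)
print(top_two)
```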

This image shows the Learn stage of the Assistant. Numbers have been added to the screen that correspond to the list of items one to eight. Highlighted items here include the option to toggle the distribution mode, to view the charts as a combined or split view, and a slider to change the outlier tolerance threshold.

When you are happy with your results, click Next in the top right, or Review from the left-hand menu to move on to the next stage of the Assistant.

This image shows the Learn stage of the Assistant. The button labeled Next in the top right of the screen is highlighted, as is the icon labeled Review in the left-hand menu. Either option moves users forward to the next stage of this Assistant.

Outlier tolerance threshold settings

If you change the Outlier tolerance threshold parameter without retraining the model and then try to move on to the Review stage, a modal window appears. In this example, the model has been trained with an outlier tolerance threshold of 0.1, but the parameter has been adjusted to 0.001. Choosing Keep Original Threshold uses the outlier tolerance threshold of 0.1 and ignores the adjusted parameter setting. Choosing Update Threshold retrains the model using the adjusted parameter setting of 0.001 and updates this setting in the Detect Outliers menu.

This image shows the modal window that appears when you update the outlier tolerance threshold using the apply command and try to move to the next stage of the Assistant. In the modal window you can keep the original threshold used when training the model, or update the threshold to the adjusted parameter value.

Review

Use the Review stage to explore the resulting model based on the fields selected at the Learn stage. The Review panels give you the opportunity to assess your outlier detection results prior to putting the model into production.

There are four panels in this stage as seen in the following image and described in the table:

This image shows the initial view when navigating to the Review stage. The Model Summary view is on display. Other view options on screen include Cardinality Histogram, Distribution Properties, and Outlier Analysis.

Panel name | Description
Model Summary | This is the default view showing a summary table based on selections made in the Learn stage.
Cardinality Histogram | View a histogram of groups by number of data points. Groups are based on the fields chosen in the split by fields section of the Learn stage. Fewer groups with a low number of data points can indicate a more reliable model. To see different results, increase the amount of data in the search or change the fields selected in the split by fields section. No results are displayed if no split by fields were selected in the Learn stage.
Distribution Properties | Use this view to get a sense of how groups with a similar type of distribution relate to one another. A histogram of mean and standard deviation that is sharp and narrow can mean that most of those groups have similar statistical behavior. Two distinct peaks in the histograms can signal two distinct characteristics in the groups that are worth further investigation. No results are displayed if no split by fields were selected in the Learn stage.
Outlier Analysis | View outliers by the fields selected in the split by fields section of the Learn stage. Use this breakdown to gain insight into individual dimensions that could require further investigation. No results are displayed if no split by fields were selected in the Learn stage.
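The idea behind the Cardinality Histogram panel can be sketched in Python: count the data points in each group, then count how many groups share each size. The field names and rows below are hypothetical, and the Assistant may bin group sizes into ranges rather than exact counts.

```python
from collections import Counter


def cardinality_histogram(rows, split_by):
    """Map each group size to the number of groups that have that many data points."""
    # Step 1: number of data points per group (a group is one combination of split by values).
    group_sizes = Counter(tuple(row[field] for field in split_by) for row in rows)
    # Step 2: number of groups per group size.
    return dict(Counter(group_sizes.values()))


rows = [{"shop_id": s} for s in ("A", "A", "A", "B", "B", "B", "C")]
histogram = cardinality_histogram(rows, split_by=["shop_id"])
print(histogram)
```

Here shop C has a single data point, so any density fitted for it rests on very little evidence; groups like this are the low-cardinality cases the panel helps you spot.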

Navigate back to the Learn stage to make outlier detection adjustments or click Save and Next to continue. Clicking Save and Next generates a modal window that offers the opportunity to update the Experiment name or description. When ready, click Save.

This image shows the Review stage of the Assistant. A modal window appears after clicking the Save and Next button. From this modal window you can rename the Experiment and save it from its draft state.

Operationalize

The Operationalize stage provides publishing, alerting, and scheduled training in one place. Click Done to move to the Experiments listings page.

This image shows the Operationalize stage of the Assistant. Options on this page include Publish Outlier Models, Create Alert, Manage Alerts, Schedule Model Training, and View Scheduled Training Jobs. A green button labeled Done in the top right of the page is highlighted.

The Experiments listing page provides a place to publish, set up alerts, and schedule training for any of your saved Experiments across all Assistant types including Smart Outlier Detection.

This image shows the Experiment list view page. The outlier detection Experiment created in this document is listed, and options to Manage and Publish the Experiment are highlighted.

Learn more

To learn about implementing analytics and data science projects using Splunk's statistics, machine learning, built-in and custom visualization capabilities, see the Splunk for Analytics and Data Science course.

Last modified on 05 August, 2022

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 5.2.0, 5.2.1, 5.2.2, 5.3.0, 5.3.1