Splunk® Machine Learning Toolkit

User Guide

This documentation does not apply to the most recent version of Splunk® Machine Learning Toolkit. For documentation on the most recent version, go to the latest release.

Experiment Assistants overview

The Experiment Management Framework (EMF) brings all aspects of a monitored machine learning pipeline into one interface and automatically saves model settings.

Experiments manage the data source, algorithm, and the parameters to configure that algorithm within one framework. An Experiment is an exclusive knowledge object in the Splunk platform that keeps track of its settings and history, as well as its affiliated alerts and scheduled trainings. Experiment Assistants enable machine learning through a guided user interface.

Experiment workflow

The Experiment workflow begins with the creation of a new machine learning pipeline, based on the selected MLTK guided modeling interface or Assistant. You can select from six Experiment Assistants: Predict Numeric Fields, Predict Categorical Fields, Detect Numeric Outliers, Detect Categorical Outliers, Forecast Time Series, and Cluster Numeric Events.

Once you select and apply Experiment parameters to your data and generate results, the workflow continues through the available visualizations and statistical analysis. In this way, Experiments are similar to the Classic Assistant workflow. Through the guided Experiment Assistant you make selections including:

  • Specifying your data sources.
  • Selecting an algorithm and algorithm parameters.
  • Selecting the fields for the algorithm to analyze.
  • Setting training/test data splits.

Every step of an Experiment has tool-tips as additional guides, the option to see the SPL as written by the Experiment with explanations for the commands, and an option to open a clone of the SPL in a new search window for further customization.

Saved Experiments

Once you save an Experiment, a new exclusive knowledge object is created in the Splunk platform that keeps track of all the settings for that pipeline, as well as its affiliated alerts and scheduled trainings.

Save your work prior to scheduling a training job for the Experiment, managing alerts for an Experiment, or deploying an Experiment.

This saved knowledge object enables you to:

  • Organize your Experiment around solving a business problem with machine learning.
  • Keep all of your modeling history and experimentation in one place.

Experiments are knowledge objects that are bound to the user who creates them. Experiment-built models cannot be shared in the GUI. Use the publish or export options to share models generated in an Experiment with another app or user.

Users with admin permissions can access stored MLTK model data in the following .conf file: SPLUNK_HOME/etc/users/username/Splunk_ML_Toolkit/local/experiments.conf. To learn more about .conf files, see About configuration files in the Splunk Enterprise Admin Manual.

Operationalize models

You can make your persisted models available to other SPL workflows in the Splunk platform through the publish functionality, and you can create alerts for any Experiment saved within the framework. When creating alerts, select from standard Trigger Conditions, or from Machine Learning Conditions that are specific to your Experiment and the Experiment Assistant.

This image shows the Save As Alert window as generated from the Manage option of a saved Experiment in the toolkit. Under Trigger Condition, the drop-down menu is expanded to show the Machine Learning Conditions options. These particular conditions relate to Predict Numeric Fields Experiments.

The Machine Learning trigger conditions available for each Experiment Assistant are as follows.

Smart Forecasting Assistant

  • Triggers based on a value of the predicted field during a scheduled search.

Smart Outlier Detection Assistant

  • Triggers based on the number of outliers during a scheduled search.

Smart Clustering Assistant

  • Triggers based on a value of cluster_distance during a scheduled search.

Smart Prediction Assistant

  • Triggers based on the numeric value of a predicted field during a scheduled search.
  • Triggers based on the categorical value of a predicted field during a scheduled search.
  • Triggers based on whether the predicted value matches the actual value during a scheduled search.

Predict Numeric Fields Experiment Assistant

  • Triggers based on a value of the predicted field during a scheduled search.
  • Triggers based on the residual value during a scheduled search.
  • Triggers based on the R-squared value during a scheduled search.

Predict Categorical Fields Experiment Assistant

  • Triggers based on a value of the predicted field during a scheduled search.
  • Triggers based on whether the predicted categorical value matches the actual value during a scheduled search.

Detect Numeric Outliers Experiment Assistant

  • Triggers based on the number of outliers being greater than a threshold during a scheduled search.

Detect Categorical Outliers Experiment Assistant

  • Triggers based on the number of outliers during a scheduled search.

Forecast Time Series Experiment Assistant

  • Triggers based on a value of the predicted field during a scheduled search.

Cluster Numeric Events Experiment Assistant

  • Triggers based on the number of clusters during a scheduled search.
  • Triggers based on a value of cluster_distance during a scheduled search.
  • Triggers based on the range of cluster ID during a scheduled search.
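
As a rough sketch of what a residual-based condition amounts to, the following hand-written search applies a saved model and keeps only the events whose prediction error exceeds a limit. It is not the SPL that the Machine Learning Conditions generate. The lookup my_new_data.csv, the model name example_price_model, the fields target_field and predicted_value, and the limit of 10 are all hypothetical placeholders.

```spl
| inputlookup my_new_data.csv
| apply example_price_model as predicted_value
| eval residual = target_field - predicted_value
| where abs(residual) > 10
```

Saved as a scheduled alert with a standard trigger condition, such as the number of results being greater than 0, a search along these lines would fire whenever at least one event has a large residual.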

Experiment composition

Each Experiment contains the following sections. These vary slightly depending on the type of machine learning analytic being performed.

  • Create or Detect: Follow the workflow laid out in the Experiment to create a new model or forecast, or detect outliers. The workflow depends on the type of analytic but usually includes performing a lookup on a dataset, selecting a field to predict or analyze, and selecting fields or values to use for performing different types of analysis.
  • Raw Data Preview: This section is displayed for predictions and forecasts to show you the data that is being used.
  • Validate: Use the tables and visualizations to determine how well the model was fitted, how well outliers were detected, or how well a forecast performed.
  • Deploy: Click the buttons beneath the visualizations and tables to see different ways to use the analysis. For example, you can open the search in the Search app, show the SPL, or create an alert. Experiments that persist a model include the option to publish from within the EMF.
  • Experiment History tab: Each time you use an Experiment, a history is captured of the settings used. Compare the effects of different searches, algorithms, and parameters, and identify the best choices for your use. As a model learns over time, the Experiment monitors the model to assess whether its accuracy increases or decreases. The average amount of error within the mathematical model is also captured within this history tab.

Experiment commands

Experiment Assistants use machine learning SPL commands. Command use varies depending on which Experiment Assistant is selected:

  • Predict Numeric and Predict Categorical Fields use the fit and apply commands
  • Forecast Time Series can use the predict command
  • Cluster Numeric Events uses the fit and apply commands
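
To make the commands above more concrete, the searches below sketch how they might be written by hand in the Search app. The lookup names, the field names (target_field, feature_a, feature_b), the index example_index, and the model name example_model are hypothetical placeholders, the choice of LinearRegression is only an example, and the SPL that an Experiment generates for you might look different. The first search trains a model with fit and saves it:

```spl
| inputlookup my_training_data.csv
| fit LinearRegression target_field from feature_a feature_b into example_model
```

A second search then uses apply to score new events with the saved model:

```spl
| inputlookup my_new_data.csv
| apply example_model
```

For forecasting, the predict command works on a time series directly. Here it forecasts a daily event count seven days ahead:

```spl
index=example_index
| timechart span=1d count as daily_events
| predict daily_events as forecast future_timespan=7
```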

Choose the Experiment Assistant to suit your needs

Experiments cover machine learning areas including Predict Numeric Fields, Predict Categorical Fields, Detect Numeric Outliers, Detect Categorical Outliers, Forecast Time Series, and Cluster Numeric Events. Choose an Experiment Assistant based on the type of machine learning you wish to perform on your data.

This image shows the landing page a user sees when they first select Experiments. Each area of machine learning covered by Experiment Assistants is displayed, along with a brief description of each one.
  • The Smart Forecasting Assistant offers an updated guided interface to forecast future numeric time-series data. This Assistant is built on the backbone of the Experiment Management Framework but offers an improved look and feel as well as the option to bring in data from different sources to build your model.
  • The Smart Outlier Detection Assistant offers an updated guided interface and leverages a density function algorithm to segment data in advance of an anomaly search. This Assistant is built on the backbone of the Experiment Management Framework but offers an improved look and feel as well as the option to bring in data from different sources to build your model.
  • The Smart Clustering Assistant offers an updated guided interface and leverages the K-means algorithm which persists a model using the fit command that can be used with the apply command. This new Assistant is built on the backbone of the Experiment Management Framework but offers an improved look and feel as well as the option to bring in data from different sources to build your model.
  • The Smart Prediction Assistant offers an updated guided interface and leverages the AutoPrediction algorithm to determine the data type as categorical or numeric before carrying out the prediction. This Assistant is built on the backbone of the Experiment Management Framework but offers an improved look and feel as well as the option to bring in data from different sources to build your model.
  • The Predict Numeric Fields Experiment Assistant uses regression algorithms to predict or estimate numeric values. Such models are useful for determining to what extent certain peripheral factors contribute to a particular metric result. After the regression model is computed, use these peripheral values to make a prediction on the metric result.
  • The Predict Categorical Fields Experiment Assistant uses a type of learning known as classification. A classification algorithm learns the tendency for data to belong to one category or another based on related data.
  • The Detect Numeric Outliers Experiment Assistant determines values that appear to be extraordinarily higher or lower than the rest of the data. Identified outliers are indicative of interesting, unusual, and possibly dangerous events. This Assistant is restricted to one numeric data field.
  • The Detect Categorical Outliers Experiment Assistant identifies data that indicate interesting or unusual events. This Assistant allows non-numeric and multi-dimensional data, such as string identifiers and IP addresses. To detect categorical outliers, input your data and select the fields from which to look for unusual combinations or a coincidence of rare values. When multiple fields have rare values, the result is an outlier.
  • The Forecast Time Series Experiment Assistant forecasts the next values in a sequence for a single time series. Forecasting makes use of past time series data trends to make a prediction about likely future values. The result includes both the forecasted value and a measure of the uncertainty of that forecast.
  • The Cluster Numeric Events Experiment Assistant partitions events into groups based on the values of selected numeric fields. As the groupings are not known in advance, this is considered unsupervised learning.
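
As an illustration of the clustering described in the last bullet, the following sketch groups events with the K-means algorithm and then keeps the events that sit far from their cluster center. The lookup my_events.csv, the fields feature_a and feature_b, the model name example_cluster_model, the choice of k=3, and the distance limit of 5 are hypothetical placeholders, not values taken from an Experiment.

```spl
| inputlookup my_events.csv
| fit KMeans k=3 feature_a feature_b into example_cluster_model
| where cluster_distance > 5
```

The sketch assumes that fit KMeans adds a cluster field and a cluster_distance field to each event, the latter being the field that the clustering trigger conditions refer to.
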
Last modified on 24 January, 2023

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 5.2.0, 5.2.1, 5.2.2, 5.3.0, 5.3.1

