Splunk® Machine Learning Toolkit

User Guide

This documentation does not apply to the most recent version of Splunk® Machine Learning Toolkit. For documentation on the most recent version, go to the latest release.

Experiments overview

Introduced in version 3.2 of the Splunk Machine Learning Toolkit (MLTK), the Experiment Management Framework (EMF) brings all aspects of a monitored machine learning pipeline into one interface with automated model versioning and lineage baked in.

An EMF workflow begins with the creation of a new machine learning pipeline based on a selected Assistant (MLTK's guided modeling interface). Splunk then creates a dedicated knowledge object that tracks all of the settings for that pipeline, as well as its associated alerts and scheduled training jobs.

This knowledge object lets you:

  • Organize your experimentation around solving a business problem with machine learning.
  • Keep all of your modeling history and experimentation in one place.

When you are ready to operationalize, publish the models you created for use in other SPL workflows in Splunk, or take action from within the EMF's alerting system.

In the EMF workflow, you specify which Assistant to use and then populate that Assistant as you would outside the EMF, including the following steps:

  1. Specify data sources.
  2. Select an algorithm and algorithm parameters.
  3. Select the fields for the algorithms to analyze.
  4. Set training/test data splits.
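The Assistant handles the training/test split for you. As a conceptual illustration only, the Python sketch below shows what such a split does: it shuffles the events reproducibly and holds back a portion that the model never sees during fitting, so that the held-out portion can be used for validation. The function name, the 70/30 default, and the seed are assumptions chosen for illustration, not MLTK parameters.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly and split them into training and test sets.

    train_fraction and seed are illustrative parameters, not MLTK settings.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded RNG makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


events = list(range(10))
train, test = train_test_split(events, train_fraction=0.7)
print(len(train), len(test))  # 7 3
```

Using a fixed seed means that rerunning the same settings produces the same split, which makes runs with different algorithms or parameters easier to compare.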

After you instruct the EMF to fit the algorithm to the selected training data and generate results, the workflow continues through the available visualizations and statistical analysis, exactly as in the Classic Assistant workflow. Every step of the EMF offers tooltips as additional guidance, an option to view the SPL that the EMF generates along with an explanation of its commands, and an option to open a copy of that SPL in a new search window so that you can customize it.

After you create your first model in an Experiment, a green Save button appears so that you can save that model to the Experiment's knowledge object.

Save your work before you schedule a training job for the Experiment, manage its alerts, or deploy it.

The following image shows the screen you see on your first visit to the Experiments tab in the MLTK:

[Image: the six areas covered by the Experiment Assistants, with a brief description of each.]

Choose the Experiment Assistant to suit your needs

Experiments cover six areas, each with its own Experiment Assistant and detailed workflow:

  • The Predict Numeric Fields Experiment Assistant uses regression algorithms to predict or estimate numeric values. Such models are useful for determining how much certain contributing factors affect a particular metric. After the regression model is fitted, you can use the values of those factors to predict the metric.
  • The Predict Categorical Fields Experiment Assistant uses a type of learning known as classification. A classification algorithm learns the tendency for data to belong to one category or another based on related data.
  • The Detect Numeric Outliers Experiment Assistant identifies values that are unusually high or low compared with the rest of the data. Identified outliers can indicate interesting, unusual, and possibly dangerous events. This Assistant is restricted to one numeric data field.
  • The Detect Categorical Outliers Experiment Assistant identifies data that indicate interesting or unusual events. This Assistant accepts non-numeric and multidimensional data, such as string identifiers and IP addresses. To detect categorical outliers, provide input data and select the fields in which to look for unusual combinations or a coincidence of rare values. When multiple fields in an event have rare values, that event is an outlier.
  • The Forecast Time Series Experiment Assistant forecasts the next values in a sequence for a single time series. The result includes both the forecasted value and a measure of the uncertainty of that forecast. Forecasting refers to the use of past time series data trends to make a prediction about likely future values.
  • The Cluster Numeric Events Experiment Assistant partitions events with multiple numeric fields into groups of events based on the values of those fields. The groupings aren't known in advance, and the algorithms used are often referred to as unsupervised learning.
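The Detect Numeric Outliers description above can be illustrated with one common approach: flag values that lie more than a chosen number of standard deviations from the mean of a single numeric field. The Python sketch below assumes that method and a multiplier of 2; it is not necessarily the method or default that MLTK uses, and the function name and sample response times are hypothetical.

```python
import statistics


def stdev_outliers(values, multiplier=2.0):
    """Return (index, value) pairs outside mean +/- multiplier * stdev.

    Works on a single numeric field; the multiplier is a hypothetical
    tuning knob chosen for illustration.
    """
    mean = statistics.mean(values)
    spread = statistics.stdev(values)  # sample standard deviation
    lower = mean - multiplier * spread
    upper = mean + multiplier * spread
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


response_times = [101, 98, 103, 99, 102, 100, 97, 250]
print(stdev_outliers(response_times))  # [(7, 250)]
```

A larger multiplier flags fewer, more extreme values; a smaller one flags more values and produces more false alarms.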

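The categorical rule described above, where an event is an outlier when several of its fields hold rare values at the same time, can be sketched in a few lines of Python. This is a simplified illustration of that rule, not the algorithm MLTK runs. The rarity threshold, the minimum number of rare fields, the function name, and the sample events are all assumptions.

```python
from collections import Counter


def categorical_outliers(events, fields, rare_fraction=0.2, min_rare_fields=2):
    """Flag events in which at least min_rare_fields fields hold rare values.

    A value counts as "rare" here if it appears in fewer than rare_fraction
    of all events for that field; both thresholds are illustrative.
    """
    total = len(events)
    # Count how often each value appears, separately for every field.
    counts = {f: Counter(e.get(f) for e in events) for f in fields}
    flagged = []
    for event in events:
        rare = [f for f in fields if counts[f][event.get(f)] / total < rare_fraction]
        if len(rare) >= min_rare_fields:
            flagged.append((event, rare))
    return flagged


events = [
    {"src_ip": "10.0.0.1", "user": "alice"},
    {"src_ip": "10.0.0.1", "user": "alice"},
    {"src_ip": "10.0.0.1", "user": "bob"},
    {"src_ip": "10.0.0.2", "user": "bob"},
    {"src_ip": "10.0.0.2", "user": "alice"},
    {"src_ip": "198.51.100.7", "user": "bob"},      # one rare field: not flagged
    {"src_ip": "203.0.113.9", "user": "mallory"},   # two rare fields: flagged
]
for event, rare_fields in categorical_outliers(events, ["src_ip", "user"]):
    print(event["src_ip"], rare_fields)  # 203.0.113.9 ['src_ip', 'user']
```

Requiring a coincidence of rare values, rather than a single rare value, keeps events such as a new but otherwise ordinary IP address from being reported.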
Experiment composition

Each Experiment contains the following sections, which vary slightly depending on the type of machine learning analytic being performed:

  • Create or Detect: Follow the workflow laid out in the Experiment to create a new model or forecast, or detect outliers. The workflow depends on the type of analytic but usually includes performing a lookup on a dataset, selecting a field to predict or analyze, and selecting fields or values to use for performing different types of analysis.
  • Raw Data Preview: This section is displayed for predictions and forecasts to show you the data that is being used.
  • Validate: Use the tables and visualizations to determine how well the model was fitted, how well outliers were detected, or how well a forecast performed.
  • Deploy: Click the buttons beneath the visualizations and tables to see different ways to use the analysis. For example, you can open the search in the Search app, show the SPL, or create an alert. Experiments that persist a model include the option to publish from within the EMF.
  • Experiment History tab: Each time you use an Experiment, the settings you used are recorded in its history. Compare the effects of different searches, algorithms, and parameters, and identify the best choices for your use case. As a model learns over time, the Experiment monitors it to assess whether its accuracy increases or decreases. The model's average error is also captured in the History tab.
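Validating a numeric prediction and comparing runs in the history both come down to error measures. As an illustration, the Python sketch below computes two standard regression metrics, R² (the share of variance the model explains) and RMSE (root mean squared error, in the units of the predicted field). It uses them to compare two hypothetical runs. The data and run names are invented, and the exact metrics an Experiment reports depend on the Assistant type.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the predicted field."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


actual = [3.0, 5.0, 7.0, 9.0]
runs = {  # predictions from two hypothetical runs with different settings
    "run A": [2.5, 5.0, 7.5, 9.0],
    "run B": [4.0, 4.0, 8.0, 8.0],
}
for name, preds in runs.items():
    print(name, round(r_squared(actual, preds), 3), round(rmse(actual, preds), 3))
# run A 0.975 0.354
# run B 0.8 1.0
```

Higher R² and lower RMSE both point to run A as the better fit on this data.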

Experiment commands

The SPL commands an Experiment uses depend on the selected Assistant:

  • Predict Numeric and Predict Categorical Fields use the fit and apply commands
  • Forecast Time Series can use the predict command
  • Cluster Numeric Events uses the fit and apply commands
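The fit and apply commands divide the work into two steps: fit learns parameters from training data and saves them as a named model, and apply later reuses that saved model to score new events without retraining. The Python sketch below mimics that two-step pattern with ordinary least-squares linear regression. The function names mirror the SPL commands only for readability. The in-memory MODELS dictionary is a hypothetical stand-in for model persistence and is not how MLTK actually stores models.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    slope: float
    intercept: float


# Hypothetical stand-in for saved models, keyed by model name.
MODELS: dict[str, LinearModel] = {}


def fit(xs, ys, into):
    """Learn y = slope * x + intercept by ordinary least squares and save it."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    model = LinearModel(slope, mean_y - slope * mean_x)
    MODELS[into] = model
    return model


def apply(model_name, xs):
    """Score new data with a previously saved model, without retraining."""
    model = MODELS[model_name]
    return [model.slope * x + model.intercept for x in xs]


fit([1, 2, 3, 4], [3, 5, 7, 9], into="demo_model")
print(apply("demo_model", [5, 6]))  # [11.0, 13.0]
```

Keeping training and scoring separate is what allows a model to be fitted once, on a schedule, and then applied repeatedly in other searches or alerts.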
Last modified on 02 November, 2021

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 4.0.0

