Splunk® Machine Learning Toolkit

User Guide



Splunk Machine Learning Toolkit workflow

Overview

Machine learning is a process for generalizing from examples. These generalizations, typically called models, are used to perform a variety of tasks such as predicting the value of a field, forecasting future values, identifying patterns in data, and detecting anomalies in new (never-before-seen) data. Machine learning isn't magic; it's a process that starts with a question. For example:

  • Am I being hacked?
  • How hot are the servers?
  • How many visits to my site do I expect in the next hour?
  • What is the price range of houses in a particular neighborhood?

The Splunk Machine Learning Toolkit lets users create analytics in six areas:

  • Predict Numeric Fields
  • Predict Categorical Fields
  • Detect Numeric Outliers
  • Detect Categorical Outliers
  • Forecast Time Series
  • Cluster Numeric Events

The toolkit includes over 30 common algorithms, and gives you access to over 300 popular open source algorithms through the Python for Scientific Computing library.

Get started by exploring interactive examples that step you through the entire process for IT, security, business, and IoT use cases. When ready, choose a guided modeling assistant to step you through creating your own custom-built model.

You also have complete access to the underlying SPL commands generated by the toolkit. This gives you the freedom to further customize your model and to operationalize it in any way desired. 

Workflow options

The MLTK's guided modeling assistants live within the six areas listed above. There are two main workflows for these areas: Classic and Experiment.

Both the Classic and Experiment workflows follow the same major steps:

[Image: the intertwined steps for gaining insights from your machine data, including collecting, exploring, and deploying data.]

  1. Specify a data source via the search bar.
  2. Select an algorithm and algorithm parameters.
    • See the list of all supported algorithms.
    • Some of the assistants provide the ability to apply multiple sequential transformations to your data. See preprocessing methods for information.
  3. Select the fields for the algorithm to analyze, and set the training/test data split.
  4. Instruct the assistant to fit the algorithm to the selected training data and generate results, including visualizations and statistical analysis. The commands used vary depending on which Assistant is selected:
    • Predict Numeric and Predict Categorical Fields use the fit and apply commands
    • Forecast Time Series can use the predict command
    • Cluster Numeric Events uses the fit and apply commands
  5. Once complete, optionally schedule regular re-training of the model and deploy it in your production environment as a scheduled alert.
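Steps 3 and 4 can be illustrated outside of Splunk with a small conceptual sketch. The Python below is not MLTK code and does not use the SPL fit and apply commands; it only mirrors the idea of splitting data, fitting a model on the training portion, and applying it to held-out data. The function names, the 70/30 split, the simple least-squares algorithm, and the sample data are all assumptions made for illustration.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_model(rows):
    """Fit y = slope * x + intercept to (x, y) pairs by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def apply_model(model, xs):
    """Use a fitted model to predict y for new x values."""
    return [model["slope"] * x + model["intercept"] for x in xs]


# Hypothetical, noise-free data that follows y = 2x + 1.
rows = [(x, 2 * x + 1) for x in range(20)]

train, test = train_test_split(rows)                     # step 3: set the split
model = fit_model(train)                                 # step 4: fit on training data
predictions = apply_model(model, [x for x, _ in test])   # apply to held-out data
```

Comparing the predictions against the held-out values is what makes the test split useful: it shows how well the model generalizes to data it was not fitted on.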

Not all of the Assistants, nor all of the algorithms, result in a model being created.

Training data examples

The MLTK Assistants, regardless of Experiment or Classic framework, support both supervised and unsupervised learning:

  • In supervised learning, the model learns from labeled examples through prediction, regression and forecasting methods.
  • In unsupervised learning, the model learns from unlabeled examples through clustering methods.
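The difference between the two learning styles can be sketched in a few lines of Python. This is a conceptual illustration only, not how the MLTK implements its algorithms: the nearest-centroid classifier, the two-cluster k-means routine, the function names, and the server temperature readings are all assumptions chosen to keep the example small.

```python
def fit_supervised(examples):
    """Learn one centroid (mean value) per label from labeled (value, label) pairs."""
    grouped = {}
    for value, label in examples:
        grouped.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in grouped.items()}


def predict_label(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def cluster_unsupervised(values, iterations=10):
    """Two-cluster, one-dimensional k-means on unlabeled values; returns the centers."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Supervised: every hypothetical temperature reading comes with a label.
labeled = [(35, "normal"), (40, "normal"), (85, "hot"), (90, "hot")]
centroids = fit_supervised(labeled)
label_for_80 = predict_label(centroids, 80)

# Unsupervised: the same kind of readings, but with no labels at all.
unlabeled = [35, 40, 38, 85, 90, 88]
centers = cluster_unsupervised(unlabeled)
```

In the supervised case the labels tell the model what "hot" means; in the unsupervised case the routine only discovers that the readings fall into two groups, and naming those groups is left to the analyst.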

The MLTK guided modeling Assistants

Select from the following links to see specific workflows for each guided modeling Assistant within both the experiment and classic frameworks:

Experiment framework

Classic framework

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 4.0.0, 4.1.0, 4.2.0, 4.3.0

