Splunk® Machine Learning Toolkit

User Guide


This documentation does not apply to the most recent version of Splunk® Machine Learning Toolkit. For documentation on the most recent version, go to the latest release.
Splunk Machine Learning Toolkit workflow

Overview

Machine learning is a process in which a program trains a model from input data instead of being explicitly programmed with rules. The trained model is then used to predict outcomes, categorize like things, identify patterns, and detect the unexpected in new (never-before-seen) data. Machine learning isn't magic; it's a process that starts with a question. For example:

  • Am I being hacked?
  • How hot are the servers?
  • How many visits to my site do I expect in the next hour?
  • What is the price range of houses in a particular neighborhood?

Splunk’s Machine Learning Toolkit (MLTK) provides guided modeling assistants that help you build models and gain meaningful insights from your machine data in real time. The toolkit lets you create these analytics in six areas:

  • Predict Numeric Fields
  • Predict Categorical Fields
  • Detect Numeric Outliers
  • Detect Categorical Outliers
  • Forecast Time Series
  • Cluster Numeric Events

The toolkit includes over 30 common algorithms, and gives you access to over 300 popular open source algorithms through the Python for Scientific Computing library.

Get started by exploring interactive examples that step you through the entire process for IT, security, business, and IoT use cases. When you are ready, choose a guided modeling assistant to step you through creating your own custom-built model.

You also have complete access to the underlying SPL commands generated by the toolkit. This gives you the freedom to further customize your model and to operationalize it in any way desired. 

Workflow options

The MLTK’s guided modeling assistants live within the six areas listed above. There are two main workflows for these areas:

Experiment workflows are recommended because they manage the data source, the algorithm used, and any additional parameters that configure that algorithm within one framework. An Experiment is an exclusive knowledge object in Splunk that keeps track of its settings and history, as well as its affiliated alerts and scheduled trainings.

Classic workflows are valuable if you want to quickly generate some SPL but are not working on a longer-term project.

Both the classic and experiment workflows follow the same major steps:

  1. Specify a data source via a search bar.
  2. Select an algorithm and algorithm parameters.
    • See the list of all supported algorithms.
    • Some of the assistants provide the ability to apply multiple sequential transformations to your data. See preprocessing methods for information.
  3. Select the fields for the algorithms to analyze, and set training/test data splits.
  4. Instruct the assistant to fit the algorithm to the selected training data and generate results, including visualizations and statistical analysis. The composition of this step will depend on the assistant in use:
    • Predict Numeric and Predict Categorical Fields use the fit model
    • Detect Numeric Outliers and Detect Categorical Outliers use detect outliers
    • Forecast Time Series uses forecast
    • Cluster Numeric Events uses cluster
  5. Once complete, you can schedule regular re-training of the model and deploy it in your production environment as a scheduled alert.
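
As a rough illustration of steps 3 and 4, the sketch below splits a small dataset into training and test sets, fits a one-variable least-squares line to the training portion, and scores it on the held-out portion. It uses plain Python rather than MLTK, and the function names (`train_test_split`, `fit_line`, `r_squared`), the 70/30 split, and the sample data are all assumptions made for this example, not part of the toolkit.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(model, rows):
    """Coefficient of determination of the fitted line on the given rows."""
    slope, intercept = model
    mean_y = sum(y for _, y in rows) / len(rows)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in rows)
    ss_tot = sum((y - mean_y) ** 2 for _, y in rows)
    return 1 - ss_res / ss_tot


# Hypothetical data: (cpu_load_percent, server_temperature_celsius)
data = [(x, 0.5 * x + 30.0) for x in range(0, 100, 5)]

train, test = train_test_split(data)   # step 3: set the training/test split
model = fit_line(train)                # step 4: fit on the training data only
score = r_squared(model, test)         # step 4: evaluate on held-out data
```

Scoring on rows the model never saw during fitting is what makes the split useful: a high score on the test set suggests the model generalizes rather than memorizes.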

Not all of the assistants, nor all of the algorithms, result in a model being created.

Training data examples

The MLTK assistants, regardless of experiment or classic framework, support both supervised and unsupervised learning:

  • In supervised learning, the model learns from labeled examples through prediction, regression and forecasting methods.
  • In unsupervised learning, the model learns from unlabeled examples through clustering methods.
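
The difference can be sketched in plain Python. In the supervised half, each training example carries a label and a nearest-centroid classifier learns one centroid per label. In the unsupervised half, a small one-dimensional k-means groups the same values with no labels provided. Both are simplified stand-ins chosen for illustration, not the algorithms that MLTK ships, and the data and function names are hypothetical.

```python
def fit_centroids(labeled):
    """Supervised: average the values seen for each label."""
    groups = {}
    for value, label in labeled:
        groups.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def predict(centroids, value):
    """Assign the label whose centroid is closest to value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def kmeans_1d(values, k=2, iterations=10):
    """Unsupervised: group unlabeled values around k centers."""
    ordered = sorted(values)
    # Spread the initial centers evenly across the sorted values.
    step = max(k - 1, 1)
    centers = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(centers[i] - v))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical response times in milliseconds, with labels.
labeled = [(200, "normal"), (220, "normal"), (240, "normal"),
           (900, "slow"), (950, "slow"), (1000, "slow")]
centroids = fit_centroids(labeled)
label_for_300 = predict(centroids, 300)

# The same values without labels: the groups must be discovered.
unlabeled = [value for value, _ in labeled]
centers, clusters = kmeans_1d(unlabeled, k=2)
```

The supervised model can name its answer ("normal" or "slow") because the names came with the training data. The clustering result only reports that two groups exist; deciding what each group means is left to you.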

The MLTK guided modeling assistants

Select from the links below to see specific workflows for each guided modeling assistant within both the experiment and classic frameworks:

Experiment framework

Classic framework

Last modified on 21 August, 2018

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 3.4.0

