Splunk® Machine Learning Toolkit

User Guide

Welcome to the Splunk Machine Learning Toolkit

The Splunk Machine Learning Toolkit (MLTK) is an app available for both Splunk Enterprise and Splunk Cloud Platform users through Splunkbase. The Splunk Machine Learning Toolkit acts like an extension to the Splunk platform and includes machine learning Search Processing Language (SPL) search commands, macros, and visualizations.

On top of the platform extensions meant for machine learning, MLTK has guided modeling dashboards called Assistants. Assistants walk you through the process of performing particular analytics.

The Splunk Machine Learning Toolkit is not a default solution, but a way to create custom machine learning. You must have have domain knowledge, SPL knowledge, Splunk platform experience, and data science skills or experience to be successful using MLTK.

Types of machine learning

Machine learning is a process for generalizing from examples. You can use these generalizations, typically called models, to perform a variety of tasks, such as predicting the value of a field, forecasting future values, identifying patterns in data, and detecting anomalies from new data. Without data and correct examples, it is difficult for machine learning to work at all.

The machine learning process starts with a question. You might ask one of these questions:

  • Am I being hacked?
  • How hot are the servers?
  • How many visits to my site do I expect in the next hour?
  • What is the price range of houses in a particular neighborhood?

There are different types of machine learning, including:

Regression

Regression modeling predicts a number from a variety of contributing factors. Regression is a predictive analytic. For example, you might have utilization metrics on a machine, such as CPU percentage and amount of disk reads and writes. You can use regression modeling to predict the amount of power that that machine is likely to draw both now and in the future.

Example

... | fit DecisionTreeRegressor temperature from date_month date_hour into temperature_model

Classification

Classification modeling predicts a category or a class from a number of contributing factors. Classification is a predictive analytic. For example, you might have data on user behavior on a website or within a software product. You can use classification modeling to predict whether that customer is going to churn.

Example

... | fit RandomForestClassifier SLA_violation from * into sla_model

Forecasting

Forecasting is a predictive analytic that predicts the future of a single value moving through time. Forecasting looks at past measurements for a single value, like profit per day or CPU usage per minute, to predict future values. For example, you might have sales results by quarter for the past 5 years. Use forecast modeling to predict sales for the upcoming quarter.

Example

... | fit ARIMA Voltage order=4-0-1 holdback=10 forecast_k=10

Clustering

Clustering groups similar points of data together. For example, you might want to group customers together based on their buying behaviors, such as how much they tend to spend or how many items they buy at one time. Use cluster modeling to group together the features you specify.

Example

... | fit XMeans * | stats count by cluster

Anomaly detection

Anomaly detection finds outliers in your data by computing an expectation based on one of the machine learning types, comparing it with reality, and triggering an alert when the discrepancies between the two values is large.

Example

... | fit OneClassSVM * kernel="poly" nu=0.5 coef0=0.5 gamma=0.5 tol=1 degree=3 shrinking=f into Model_Name

How MLTK fits into the Splunk platform

Different machine learning options exist in the Splunk platform.

This image shows a graphic representing the three parts of the Splunk platform that include machine learning: Core Platform, Packaged Solutions, and MLTK.

  • Core Platform Search: The Splunk platform has machine learning capabilities integrated in the SPL.
  • Splunk Machine Learning Toolkit: The Splunk Machine Learning Toolkit is an app in the Splunkbase ecosystem that allows you to build custom machine learning solutions for any use case.

The machine learning process

Machine learning is a process for generalizing from examples. Ideally, the machine learning process follows a series of steps, beginning collecting data and ending with deploying your machine learning model.

This image shows a graphic representing six steps in the machine learning process, including collect data, clean/ transform, explore/ visualize, model, evaluate, and deploy. There is one arrow leading from step to step in order.

  1. Collect available data like CPU percentages, memory utilization, server temperatures, disk space, or sales values.
  2. Clean and transform that data. All machine learning expects a matrix of numbers as input. If you're collecting data that is missing values, then you need to clean and transform that data until it's in the form machine learning requires.
  3. Explore and visualize the data to make sure it is encoding what you expect it to encode.
  4. Build a model on training data.
  5. Evaluate the model test data.
  6. Deploy the model on un-seen data.

The machine learning process follows those steps in theory, but in practice it's rarely linear:

This image shows the graphic of the six steps in the machine learning process but now arrows are criss-crossed between all steps. There is no longer a clear, linear path from a first step to a final step.

You might evaluate your model, discover the performance is not generating the results your expect, and go back to further clean and transform the data. Maybe the data has too many missing values, lacks completeness, is improperly weighted, or has unit disagreement. Iterate around the machine learning process until the model is providing desired machine learning outcomes.

The machine learning process is potentially time consuming and could mean a need for different tools, different team members, and context switching. But with the Splunk Machine Learning Toolkit, this entire process, from ingesting the data to building reports, can all occur inside the Splunk platform.

Layers that make up the Splunk Machine Learning Toolkit

The Splunk Machine Learning Toolkit is comprised of several layers, each of which aid in the process of building a generalization from your data.

Python for Scientific Computing Library

Access to over 300 open source algorithms.

MLTK is built on top of the Python for Scientific Computing Library. This ecosystem includes the most popular machine learning library called sci-kit learn, as well as other supporting libraries like NumPy and Statsmodels.

ML-SPL API

Extensibility to easily support any algorithm, whether proprietary or open source.

MLTK includes access to an extensibility API that not only allows you to expose the 300+ algorithms from the Python app, but also custom algorithms you can write yourself. You can share and reuse your own custom algorithms in the Splunk Community for MLTK on GitHub.

ML Commands

New SPL commands to fit, test, score, and operationalize models.

MLTK uses the PSC app to expose new SPL commands that let you do machine learning. These are SPL commands for building (fit), testing (apply), and validating (score) models and more.

Algorithms

Over 30 standard algorithms to accommodate both supervised and unsupervised learning.

The SPL commands expose different algorithms. There are more than 30 standard algorithms, which are the most commonly used machine learning algorithms.

Showcases

Interactive examples for typical IT, security, business, and IoT use cases.

MLTK includes many interactive examples from different domains like IoT and business analytics. This is called the Showcase. The Showcase is the initial page you land on when you open MLTK, where you see different examples from end-to-end.

Experiments and Assistants

Guided model building, model testing, and deployment for common objectives.

On top of the tools included for machine learning are guided modeling dashboards called Assistants. Built on an Experiment Management Framework (EMF), Assistants offer a walk-through interface of the process for performing particular analytics with your own data. Smart Assistants and Experiment Assistants bring all aspects of a monitored machine learning pipeline into one interface, with automated model versioning and lineage.

What is included in MLTK

MLTK includes several platform extensions. This includes a set of macros that allow you to compute validation statistics as well as a set of custom visualizations. The Machine Learning Toolkit provides the key commands of fit and apply:

  • fit builds the generalization or model
  • apply lets you use that generalization later

Commands

  • apply
  • deletemodel
  • fit
  • listmodels
  • sample
  • score
  • summary

Macros

  • classificationreport
  • classificationstatistics
  • confusionmatrix
  • forecastviz
  • histogram
  • modvizpredict
  • regressionstatistics
  • splitby (1-5)

Visualizations

  • 3D Scatterplot
  • Boxplot Chart
  • Distribution Plot
  • Downsampled Line Chart
  • Forecast Chart
  • Heatmap Plot
  • Histogram Chart
  • Outliers Chart
  • Scatter Line Chart
  • Scatterplot Matrix

Other commands, including summary, listmodels, and deletemodel, manage or inspect the models you've created. MLTK also offers the sample command that lets you take random partitions of your data. The score command is available to run statistical tests to validate model outcomes.

This graphic shows the fit, apply, summary, and score commands with their basic syntax as well as an example.

To learn more about the fit and apply commands, see Understanding the fit and apply commands.
To learn more about the score command see, Using the score command.
To learn more about the available visualizations, see Custom visualizations in the Machine Learning Toolkit.

Introduction to the Assistants

The Assistants guide you through the process of performing an analytic in MLTK. Assistants use all tools and extensions, including both SPL and ML-SPL. You can click the button labeled SPL in any Smart Assistant, or the button labeled Show SPL in any Experiment Assistant to see the underlying SPL.

This is a screen shot of the Smart Clustering Assistant at the Learn stage. The button labeled SPL is highlighted.

The underlying SPL can also be viewed in a new window using the pop-out button.

This screenshot shows the detailed view a user will see after clicking the SPL button. Details of the underlying SPL display in their own modal window. And option to open this SPL in a new window is highlighted.

The Assistants walk you through the entire machine learning process including:

  • Preparing your data
  • Building the model
  • Validating the model
  • Deploying the model

Each type of machine learning has an accompanying Assistant:

Type of machine learning Assistant
Regression Predict Numeric Fields Experiment Assistant

Smart Prediction Assistant

Classification Predict Categorical Fields Experiment Assistant

Smart Prediction Assistant

Anomaly detection Detect Numeric Outliers Experiment Assistant

Detect Categorical Outliers Experiment Assistant
Smart Outlier Detection Assistant

Forecasting Forecast Time Series Experiment Assistant

Smart Forecasting Assistant

Clustering Cluster Numeric Events Experiment Assistant

Smart Clustering Assistant

To learn more about the variety of Assistants within the Splunk Machine Learning Toolkit, see MLTK guided workflows.

Learn more

Continue learning about MLTK by working through this user guide or through the following links:

Last modified on 14 November, 2023
About the Splunk Machine Learning Toolkit   What is the MLTK process?

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 5.3.3, 5.4.0, 5.4.1, 5.4.2


Was this topic useful?







You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters