Splunk Machine Learning Toolkit workflow

Overview

Machine learning is the process of training a model from input data rather than programming its behavior explicitly. The trained model is used to predict outcomes, categorize like things, identify patterns, and detect the unexpected in new (never-before-seen) data. Machine learning isn't magic; it's a process that starts with a question. For example:

Am I being hacked?
How hot are the servers?
How many visits to my site do I expect in the next hour?
What is the price range of houses in a particular neighborhood?

Splunk’s Machine Learning Toolkit (MLTK) provides guided modeling assistants that help users build models and gain meaningful insights from their machine data in real time. The toolkit lets you create these analytics in six areas:

Predict Numeric Fields
Predict Categorical Fields
Detect Numeric Outliers
Detect Categorical Outliers
Forecast Time Series
Cluster Numeric Events

The toolkit includes over 30 common algorithms, and gives you access to over 300 popular open source algorithms through the Python for Scientific Computing library.

Get started by exploring interactive examples that step you through the entire process for IT, security, business, and IoT use cases. When you are ready, choose a guided modeling assistant to step you through creating your own custom-built model.

You also have complete access to the underlying SPL commands generated by the toolkit. This gives you the freedom to further customize your model and to operationalize it in any way desired.

Workflow options

The MLTK’s guided modeling assistants are organized into the six areas listed above. There are two main workflows for these areas:

Experiment workflows are recommended because they manage the data source, the algorithm, and any additional parameters that configure that algorithm within one framework. An Experiment is an exclusive knowledge object in Splunk that keeps track of its settings and history, as well as its affiliated alerts and scheduled trainings.

Classic workflows are valuable if you want to quickly generate some SPL but are not working on a longer-term project.

Both the classic and experiment workflows follow the same major steps:

Specify a data source via a search bar.
Select an algorithm and algorithm parameters.

For the available choices, see the list of all supported algorithms.
Some of the assistants provide the ability to apply multiple sequential transformations to your data. See preprocessing methods for information.

Select the fields for the algorithm to analyze, and set the training/test data split.
Instruct the assistant to fit the algorithm to the selected training data and generate results, including visualizations and statistical analysis. The action that performs this step depends on the assistant in use:

Predict Numeric Fields and Predict Categorical Fields use fit model
Detect Numeric Outliers and Detect Categorical Outliers use detect outliers
Forecast Time Series uses forecast
Cluster Numeric Events uses cluster

Once complete, you can schedule regular re-training of the model and deploy it in your production environment as a scheduled alert.

Not every assistant or algorithm produces a model.
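The split, fit, and evaluate steps above can be sketched outside Splunk in plain Python. This is a conceptual illustration only, not the toolkit's implementation or its SPL: the data, the field meanings, and the helper names (train_test_split, fit_line, r_squared) are all hypothetical, and a single-feature least-squares line stands in for whichever algorithm an assistant would apply.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_line(rows):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(model, rows):
    """Share of the variance in y that the fitted line explains on these rows."""
    slope, intercept = model
    mean_y = sum(y for _, y in rows) / len(rows)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in rows)
    ss_tot = sum((y - mean_y) ** 2 for _, y in rows)
    return 1 - ss_res / ss_tot

# Hypothetical events: (cpu_load_percent, server_temperature_celsius).
noise = random.Random(0)
events = [(load, 30 + 0.4 * load + noise.uniform(-1, 1)) for load in range(0, 100, 2)]

train, test = train_test_split(events)   # set the training/test split
model = fit_line(train)                  # fit on the training data only
score = r_squared(model, test)           # evaluate on data the model has not seen
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} test R^2={score:.3f}")
```

Scoring on the held-out test rows rather than the training rows is what shows whether the fitted model generalizes to new data.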

Training data examples

The MLTK assistants support both supervised and unsupervised learning in the experiment and classic frameworks alike:

In supervised learning, the model learns from labeled examples through prediction, regression and forecasting methods.
In unsupervised learning, the model learns from unlabeled examples through clustering methods.
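The difference between the two kinds of training data can be shown with a small Python sketch. It is an illustration under stated assumptions, not how the toolkit works internally: the response-time values and labels are invented, and the helper names (fit_centroids, predict, kmeans_1d) are hypothetical. A nearest-centroid classifier stands in for a supervised method, and a one-dimensional k-means stands in for a clustering method.

```python
from statistics import mean

def fit_centroids(examples):
    """Supervised: learn one centroid per label from (value, label) pairs."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Return the label whose centroid lies closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def kmeans_1d(values, k=2, iterations=10):
    """Unsupervised: group unlabeled values into k clusters (requires k >= 2)."""
    ordered = sorted(values)
    # Seed the centers with evenly spaced points from the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(centers[i] - v))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical response times in milliseconds.
labeled = [(110, "normal"), (120, "normal"), (130, "normal"),
           (900, "slow"), (950, "slow"), (1000, "slow")]
unlabeled = [115, 125, 135, 905, 960, 990]

centroids = fit_centroids(labeled)        # training data includes the answers
print(predict(centroids, 140), predict(centroids, 850))

centers, groups = kmeans_1d(unlabeled)    # training data has no labels at all
print(centers, groups)
```

The supervised model can name its prediction because the labels were supplied during training. The clustering step only discovers that the values fall into two groups; deciding what each group means is left to the analyst.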

The MLTK guided modeling assistants

Select from the links below to see specific workflows for each guided modeling assistant within both the experiment and classic frameworks:

Experiment framework

Classic framework
