Splunk Machine Learning Toolkit workflow

Overview

Machine learning is a process for generalizing from examples. These generalizations, typically called models, are used to perform a variety of tasks, such as predicting the value of a field, forecasting future values, identifying patterns in data, and detecting anomalies in new (never-before-seen) data. Machine learning isn't magic; it's a process that starts with a question. For example:

Am I being hacked?
How hot are the servers?
How many visits to my site do I expect in the next hour?
What is the price range of houses in a particular neighborhood?

The Splunk Machine Learning Toolkit lets users create analytics in six useful areas:

Predict Numeric Fields
Predict Categorical Fields
Detect Numeric Outliers
Detect Categorical Outliers
Forecast Time Series
Cluster Numeric Events

The toolkit includes over 30 common algorithms, and gives you access to over 300 popular open source algorithms through the Python for Scientific Computing library.

Get started by exploring interactive examples that step you through the entire process for IT, security, business, and IoT use cases. When you are ready, choose a guided modeling Assistant to step you through creating your own custom-built model.

You also have complete access to the underlying SPL commands generated by the toolkit. This gives you the freedom to further customize your model and to operationalize it in any way desired.

Workflow options

The MLTK's guided modeling Assistants are organized into the six areas listed above. There are two main workflows for these areas: Classic and Experiment.

Both the Classic and Experiment workflows follow the same major steps:

Specify a data source via the search bar.
Select an algorithm and algorithm parameters.

For the available choices, see the list of all supported algorithms.
Some of the assistants provide the ability to apply multiple sequential transformations to your data. See preprocessing methods for information.

Select the fields for the algorithm to analyze, and set the training/test data split.
Instruct the Assistant to fit the algorithm to the selected training data and generate results, including visualizations and statistical analysis. The commands used depend on which Assistant is selected:

Predict Numeric and Predict Categorical Fields use the fit and apply commands
Forecast Time Series can use the predict command
Cluster Numeric Events uses the fit and apply commands

Once complete, optionally schedule regular re-training of the model and deploy it in your production environment as a scheduled alert.

Not every Assistant or algorithm results in a model being created.
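
To make the steps above concrete, here is a minimal Python sketch of the split, fit, and apply pattern. It is not the MLTK implementation and it does not use the toolkit's SPL commands. The function names train_test_split, fit_model, and apply_model, the 70/30 split, and the CPU load and temperature data are all hypothetical, and a one-variable least-squares line stands in for whichever algorithm you would select in an Assistant.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly, then split them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_model(rows):
    """Fit y = slope * x + intercept by ordinary least squares.

    The returned dict plays the role of a saved model (illustration only).
    """
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def apply_model(model, xs):
    """Use a fitted model to predict y for new x values."""
    return [model["slope"] * x + model["intercept"] for x in xs]


# Hypothetical data: CPU load (%) and server temperature (degrees C),
# generated from an exact line so the result is easy to check.
rows = [(load, 30.0 + 0.5 * load) for load in range(0, 100, 5)]

train, test = train_test_split(rows)           # set the training/test split
model = fit_model(train)                       # fit on the training data only
predictions = apply_model(model, [x for x, _ in test])  # apply to held-out data

# Largest prediction error on rows the model never saw during fitting.
max_error = max(abs(p - y) for p, (_, y) in zip(predictions, test))
```

Holding back test rows lets you judge the model on data it did not see during training. In the toolkit, the fit and apply commands fill roughly the roles that fit_model and apply_model fill in this sketch.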

Training data examples

The MLTK Assistants, in both the Experiment and Classic frameworks, support supervised and unsupervised learning:

In supervised learning, the model learns from labeled examples through prediction, regression and forecasting methods.
In unsupervised learning, the model learns from unlabeled examples through clustering methods.
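
The difference between the two kinds of learning can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the MLTK works internally: the response-time values, the labels "normal" and "slow", and the function names fit_supervised, predict_label, and fit_unsupervised are all hypothetical. A nearest-centroid classifier stands in for a supervised method, and a one-dimensional k-means with two clusters stands in for a clustering method.

```python
from statistics import mean


def fit_supervised(labeled):
    """Learn one centroid per label from (value, label) pairs."""
    by_label = {}
    for value, label in labeled:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict_label(centroids, value):
    """Predict the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def fit_unsupervised(values, iterations=10):
    """1-D k-means with k=2; centroids are seeded at the min and max value."""
    centroids = [min(values), max(values)]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            nearest = min((0, 1), key=lambda i: abs(centroids[i] - v))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


# Hypothetical response times in milliseconds.
labeled = [(110, "normal"), (120, "normal"), (130, "normal"), (900, "slow"), (950, "slow")]

# Supervised: the labels are given, and the model learns to reproduce them.
centroids_by_label = fit_supervised(labeled)
label = predict_label(centroids_by_label, 870)

# Unsupervised: only the values are given, and the model finds the groups itself.
unlabeled = [value for value, _ in labeled]
cluster_centroids = fit_unsupervised(unlabeled)
```

In this sketch both approaches recover the same two groups, but only the supervised model can name them, because only it was shown the labels.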

The MLTK guided modeling Assistants

Select from the following links to see specific workflows for each guided modeling Assistant within both the Experiment and Classic frameworks:
