Splunk Machine Learning Toolkit workflow
Overview
Machine learning is a process for generalizing from examples. These generalizations, typically called models, are used to perform a variety of tasks such as predicting the value of a field, forecasting future values, identifying patterns in data, and detecting anomalies in new (never-before-seen) data. Machine learning isn't magic; it's a process that starts with a question. For example:
- Am I being hacked?
- How hot are the servers?
- How many visits to my site do I expect in the next hour?
- What is the price range of houses in a particular neighborhood?
The Splunk Machine Learning Toolkit lets users create analytics in six useful areas:
- Predict Numeric Fields
- Predict Categorical Fields
- Detect Numeric Outliers
- Detect Categorical Outliers
- Forecast Time Series
- Cluster Numeric Events
The toolkit includes over 30 common algorithms, and gives you access to over 300 popular open source algorithms through the Python for Scientific Computing library.
Get started by exploring interactive examples that step you through the entire process for IT, security, business, and IoT use cases. When ready, choose a guided modeling assistant to step you through creating your own custom-built model.
You also have complete access to the underlying SPL commands generated by the toolkit. This gives you the freedom to further customize your model and to operationalize it in any way desired.
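To make one of the six areas concrete, here is a minimal Python sketch of the idea behind detecting numeric outliers. It is not the toolkit's implementation or its SPL: the function name `find_outliers`, the `multiplier` parameter, the median-absolute-deviation rule, and the sample temperatures are all assumptions chosen for illustration.

```python
import statistics

def find_outliers(values, multiplier=3.0):
    """Return values farther than multiplier * MAD from the median.

    MAD is the median absolute deviation. The threshold rule is an
    illustrative assumption, not a documented MLTK default.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    lower = median - multiplier * mad
    upper = median + multiplier * mad
    return [v for v in values if v < lower or v > upper]

# Hypothetical server temperatures in degrees Celsius.
server_temps = [61, 63, 62, 60, 64, 62, 95, 61]
outliers = find_outliers(server_temps)
print(outliers)
```

With this sample, the median is 62 and the median absolute deviation is 1, so only readings outside the range 59 to 65 are flagged.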
Workflow options
The MLTK's guided modeling assistants live within the six areas listed above. There are two main workflows for these areas: Classic and Experiment.
Both the Classic and Experiment workflows follow the same major steps:
- Specify a data source via the search bar.
- Select an algorithm and algorithm parameters.
  - See the list of all supported algorithms.
  - Some of the assistants provide the ability to apply multiple sequential transformations to your data. See preprocessing methods for information.
- Select the fields for the algorithm to analyze, and set training/test data splits.
- Instruct the assistant to fit the algorithm to the selected training data and generate results, including visualizations and statistical analysis. Commands can vary depending upon which Assistant is selected:
  - Predict Numeric Fields and Predict Categorical Fields use the fit and apply commands.
  - Forecast Time Series can use the predict command.
  - Cluster Numeric Events uses the fit and apply commands.
- Once complete, optionally schedule regular re-training of the model and deploy it in your production environment as a scheduled alert.
Not all of the Assistants, nor all of the algorithms, result in a model being created.
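The major steps above (choose data, split it into training and test sets, fit an algorithm, then apply the resulting model) can be sketched conceptually in plain Python. This is an analogy, not the toolkit's SPL or its internals: the functions `train_test_split`, `fit`, and `apply`, the 70/30 split, the least-squares line, and the sample data are all hypothetical and exist only to show the shape of the workflow.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly and split them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit(rows):
    """Fit y = slope * x + intercept by ordinary least squares.

    The returned dict stands in for a saved model.
    """
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def apply(model, xs):
    """Use a fitted model to predict y for new x values."""
    return [model["slope"] * x + model["intercept"] for x in xs]

# Hypothetical data: (cpu_load_percent, temperature_celsius), exactly linear.
rows = [(load, 30.0 + 0.5 * load) for load in range(0, 100, 5)]

train, test = train_test_split(rows)
model = fit(train)                                 # learn from training data only
predictions = apply(model, [x for x, _ in test])   # score held-out data
max_error = max(abs(p - y) for p, (_, y) in zip(predictions, test))
print(model, max_error)
```

Holding back the test rows is what lets you judge how well the model generalizes to data it has not seen, which is the purpose of the training/test split step.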
Training data examples
The MLTK Assistants, regardless of Experiment or Classic framework, support both supervised and unsupervised learning:
- In supervised learning, the model learns from labeled examples through prediction, regression and forecasting methods.
- In unsupervised learning, the model learns from unlabeled examples through clustering methods.
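The difference between the two kinds of learning can be illustrated with a small Python sketch. It assumes a nearest-neighbor rule for the supervised case and a basic one-dimensional k-means for the unsupervised case; the function names, the "fast"/"slow" labels, and the response times are hypothetical, and nothing here reflects how the MLTK implements its algorithms.

```python
def predict_label(labeled, value):
    """Supervised: return the label of the closest labeled example."""
    _, nearest_label = min(labeled, key=lambda pair: abs(pair[0] - value))
    return nearest_label

def kmeans_1d(values, k, iterations=20):
    """Unsupervised: group unlabeled numbers into k clusters."""
    ordered = sorted(values)
    # Deterministic start: k evenly spaced values from the sorted data.
    step = max(k - 1, 1)
    centers = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Move each center to its group's mean; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
    return centers, labels

# Supervised: each example carries a label supplied in advance.
labeled = [(12, "fast"), (15, "fast"), (210, "slow"), (198, "slow")]
guess = predict_label(labeled, 190)

# Unsupervised: only the raw response times (ms) are available.
response_times_ms = [12, 15, 14, 210, 198, 205, 13, 202]
centers, labels = kmeans_1d(response_times_ms, k=2)
print(guess, centers, labels)
```

In the supervised case the answer is one of the labels provided with the examples; in the unsupervised case the algorithm only returns group numbers, and it is up to you to interpret what each group means.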
The MLTK guided modeling Assistants
Select from the following links to see specific workflows for each guided modeling Assistant within both the experiment and classic frameworks:
Experiment framework
- Predict Numeric Fields Experiment Assistant
- Predict Categorical Fields Experiment Assistant
- Detect Numeric Outliers Experiment Assistant
- Detect Categorical Outliers Experiment Assistant
- Forecast Time Series Experiment Assistant
- Cluster Numeric Events Experiment Assistant
Classic framework
This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 4.0.0, 4.1.0, 4.2.0, 4.3.0