How does the MLTK process work?
Machine learning is a process for gaining insights from your data and creating actionable models from it. This process consists of a series of steps, from data ingestion to model operationalization. The Machine Learning Toolkit (MLTK) operates like an extension to the Splunk platform and enables users to complete this process.
The machine learning process
The machine learning process consists of several generally accepted steps. The following image shows a common machine learning process:
Although the process typically begins with data collection and ends with model deployment, it is rarely linear, and this outline does not account for the time spent finding and cleaning data. Data scientists and analysts also spend time experimenting with the collected and cleaned data, evaluating the results, and adjusting experiment settings until the results are good enough to put into operation.
With the Machine Learning Toolkit, this entire process, from data ingestion to model deployment, can occur inside the Splunk platform.
The machine learning process within the MLTK
The Machine Learning Toolkit is a way to create custom machine learning outcomes. The MLTK enables a machine learning workflow through a suite of guided modeling Assistants. You can also use the MLTK outside of the guided framework, through a series of machine learning-specific Search Processing Language (SPL) commands and over 300 algorithms.
Once data is ingested, you must explore it to make sure it is suitable for, and ready to be used in, a machine learning process. You can visualize data you ingest into the MLTK in both tables and graphics. The Splunk platform and the MLTK also offer several methods to clean and transform your data and to address common data issues, including identifying and removing errors, handling missing values, and converting categorical values into numeric values where needed.
- To learn about the options and methods to clean and transform your machine data, see Preparing your data for machine learning.
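As a rough illustration of two of the cleaning steps named above (handling missing values and converting categorical values into numeric values), here is a minimal sketch in plain Python. It is not MLTK code. The helper names `fill_missing` and `one_hot`, the field names, and the sample events are all hypothetical and chosen only for this example.

```python
from statistics import mean


def fill_missing(rows, field):
    """Replace None in a numeric field with the mean of the observed values."""
    observed = [row[field] for row in rows if row[field] is not None]
    fill_value = mean(observed)
    return [
        {**row, field: fill_value if row[field] is None else row[field]}
        for row in rows
    ]


def one_hot(rows, field):
    """Replace a categorical field with one 0/1 column per distinct value."""
    categories = sorted({row[field] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != field}
        for category in categories:
            new_row[f"{field}_{category}"] = 1 if row[field] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical events: one is missing its "bytes" value.
events = [
    {"bytes": 200, "status": "ok"},
    {"bytes": None, "status": "error"},
    {"bytes": 400, "status": "ok"},
]

cleaned = one_hot(fill_missing(events, "bytes"), "status")
for row in cleaned:
    print(row)
```

Filling with the mean is only one of several reasonable choices. Dropping incomplete events or filling with a constant may suit other data better.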
Data experimentation is the process of training a machine learning model on your data until you have a working model. The Machine Learning Toolkit offers several machine learning commands and built-in algorithms through which you can perform data experimentation. The MLTK also offers guided machine learning workflows through a series of Smart Assistants and Experiment Assistants.
- To learn about the supported machine learning Search Processing Language (SPL) commands, see Search commands for machine learning.
- To learn about the algorithms available in the MLTK, see Algorithms in the Machine Learning Toolkit.
- To learn about the available suite of Smart Assistants, see Smart Assistant guided workflows.
- To learn about the available suite of Experiment Assistants, see Experiment Assistant guided workflows.
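To make the idea of training a model and then applying it to new values concrete, the following plain-Python sketch fits a one-variable linear regression by ordinary least squares. It is an illustration of the concept only and does not show how the MLTK implements any algorithm. The function names `train` and `predict` and the sample data are assumptions made for this example.

```python
def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, xs):
    """Apply a trained model to new input values."""
    return [model["slope"] * x + model["intercept"] for x in xs]


# Hypothetical training data that follows y = 2x + 1 exactly.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = train(train_x, train_y)
predictions = predict(model, [5.0, 6.0])
print(model, predictions)
```

Keeping training and prediction as separate steps mirrors the workflow described here: a model is trained once on prepared data and can then be reused on data it has not seen.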
Evaluating results as you experiment with your data is an important part of getting a useful machine learning model. Are you asking the right question of your data? Do you have enough data, sufficiently cleaned data, or the right data to conduct your experiment? Do your experiment outcomes give you the results you expected? The MLTK guided modeling Assistants all include data visualizations through which you can quickly assess experiment results. You can also choose from a range of scoring metrics to measure your machine learning results.
- To learn about the available data visualizations in the MLTK, see Custom visualizations in the Machine Learning Toolkit.
- To learn about using the score command in the MLTK, see Scoring metrics in the Machine Learning Toolkit.
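For readers new to scoring metrics, the sketch below computes three common classification metrics (accuracy, precision, and recall) from actual and predicted labels in plain Python. It does not reproduce the MLTK score command. The function names and the label data are hypothetical.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(actual, predicted, positive):
    """Return accuracy, precision, and recall as a dictionary."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels from a held-back portion of the data.
actual = ["fraud", "ok", "ok", "fraud", "ok", "fraud"]
predicted = ["fraud", "ok", "fraud", "fraud", "ok", "ok"]

result = scores(actual, predicted, positive="fraud")
print(result)
```

Looking at more than one metric matters when classes are imbalanced, because a model can reach high accuracy while still missing most of the rare cases.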
Tune and iterate
As part of evaluating experiment results, you can tune and iterate on the machine learning model. Adjusting model settings helps you get the desired machine learning results before you apply the model to unseen data and put it into operation. The MLTK guided Assistants make it simple to adjust model settings and gauge improvements in model performance.
Repeat the steps of experimenting, evaluating, and tuning until you are ready to put your trained model into production.
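The experiment, evaluate, and tune loop can be sketched in plain Python as a search over candidate settings, keeping whichever setting scores best on held-back data. This is a simplified illustration under stated assumptions, not an MLTK feature: the "model" is a single anomaly threshold, and the `accuracy` helper, the candidate values, and the validation samples are all hypothetical.

```python
def accuracy(threshold, samples):
    """Share of samples where (value > threshold) matches the known label."""
    hits = sum((value > threshold) == is_anomaly for value, is_anomaly in samples)
    return hits / len(samples)


# Hypothetical validation data: (response_time_ms, is_anomaly).
validation = [
    (120, False),
    (150, False),
    (180, False),
    (200, False),
    (260, True),
    (290, True),
    (310, True),
]

best_threshold, best_score = None, -1.0
for candidate in (100, 150, 200, 250, 300):
    score = accuracy(candidate, validation)  # evaluate this setting
    if score > best_score:                   # keep the first best setting seen
        best_threshold, best_score = candidate, score

print(best_threshold, best_score)
```

Scoring each candidate on data that was not used for training is what makes the comparison meaningful. A setting tuned and scored on the same data can look better than it will perform in production.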
A trained machine learning model is ready for deployment and application on new, never-before-seen data. As a best practice, regularly check your model outcomes and the sources of the new data, and make adjustments to your machine learning model settings as needed.
In the MLTK guided modeling Assistants, you can schedule model retraining, get alerted about models, and publish models.
Learn to use the MLTK by working through this User Guide or with the following links:
- To learn about implementing analytics and data science projects using Splunk's statistics, machine learning, and built-in and custom visualization capabilities, see the Splunk for Analytics and Data Science course.
- To learn more about adding and searching your data in the Splunk platform, begin with the Search Tutorial.
- To see a series of MLTK use cases based on different machine learning goals, see the MLTK Showcases in the Machine Learning Toolkit.
- To read more about the available data sets within the MLTK you can use to explore machine learning without needing to ingest your own data, see MLTK data set credits.
- To learn about installing the Machine Learning Toolkit, see Installing the MLTK.
- To learn about companion apps, cheat sheets, videos, and courses, see Learn more about the Machine Learning Toolkit.
- To learn about further support available for the Machine Learning Toolkit, see Support for the Machine Learning Toolkit.
This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 5.0.0, 5.1.0, 5.2.0, 5.2.1, 5.2.2, 5.3.0, 5.3.1