Splunk® Machine Learning Toolkit

User Guide

What is the MLTK process?

Machine learning is a process for gaining insights from your data and creating actionable machine learning models. This process consists of a series of steps, from data ingestion to model operationalization. The Splunk Machine Learning Toolkit (MLTK) operates as an extension to the Splunk platform and enables you to complete this process.

The machine learning process

The machine learning process consists of several generally accepted steps. The following image shows a common machine learning process:

This image shows a graphic representation of the machine learning process. Steps displayed include Collect Data, Explore Data, Experiment, Evaluate Results, Tune and Iterate, and Deploy Model. Arrows connect these steps in order.

Although the process typically begins with data collection and ends with model deployment, it is rarely linear, and the diagram does not account for the time spent finding and cleaning data. Data scientists and analysts also spend time experimenting with the collected and cleaned data, evaluating experiment results, and then adjusting experiment settings to produce machine learning results that are suitable to put into operation.

With the Splunk Machine Learning Toolkit, this entire process, from data ingestion to model deployment, can occur inside the Splunk platform.

The machine learning process within MLTK

The Splunk Machine Learning Toolkit provides a way to create custom machine learning outcomes. MLTK supports a machine learning workflow through a suite of guided modeling Assistants. You can also use MLTK outside of the guided framework, with a series of machine-learning-specific Search Processing Language (SPL) commands and over 30 algorithms.

Explore Data

After data is ingested, explore it to confirm that it is suitable for, and ready to be used in, a machine learning process. You can visualize the data you ingest into MLTK in both tables and graphics. The Splunk platform and MLTK also offer several methods to clean and transform your data and to address common data issues, including identifying and removing errors, handling missing values, and converting categorical values into numeric values where needed.
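To make two of these cleaning steps concrete, here is a minimal Python sketch of filling a missing numeric value and converting a categorical field into numeric columns. It is an illustration of the concepts only, not how MLTK or the Splunk platform implements them. The event data, field names, and the `fill_missing` and `one_hot` helpers are all hypothetical.

```python
from statistics import mean

# Hypothetical events; None marks a missing value.
events = [
    {"host": "web01", "status": "ok", "response_ms": 120.0},
    {"host": "web02", "status": "error", "response_ms": None},
    {"host": "web03", "status": "ok", "response_ms": 180.0},
]


def fill_missing(rows, field):
    """Replace missing values of a numeric field with the mean of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fallback = mean(observed)
    return [{**r, field: fallback if r[field] is None else r[field]} for r in rows]


def one_hot(rows, field):
    """Replace a categorical field with one 0/1 column per distinct category."""
    categories = sorted({r[field] for r in rows})
    encoded = []
    for r in rows:
        row = {k: v for k, v in r.items() if k != field}
        for category in categories:
            row[f"{field}_{category}"] = 1 if r[field] == category else 0
        encoded.append(row)
    return encoded


cleaned = one_hot(fill_missing(events, "response_ms"), "status")
for row in cleaned:
    print(row)
```

Filling with the mean is only one possible strategy. Depending on the data, dropping incomplete events or filling with a constant might be more appropriate.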

Experiment

Data experimentation is the process of training a machine learning model on your data until you have a working model. The Splunk Machine Learning Toolkit offers several machine learning commands and built-in algorithms through which you can perform data experimentation. MLTK also offers guided machine learning workflows through a series of Smart Assistants and Experiment Assistants.
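The train-then-apply pattern behind experimentation can be sketched in a few lines of Python. The example below fits a one-variable least-squares line to training data and then applies the fitted model to new inputs. It is a conceptual sketch under stated assumptions, not an MLTK command or algorithm: the data values and the `fit_line` and `apply_line` functions are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_line(model, xs):
    """Use a fitted (slope, intercept) pair to predict values for new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# Hypothetical training data: requests per minute and CPU utilization (%).
train_x = [10, 20, 30, 40]
train_y = [15.0, 25.0, 35.0, 45.0]

model = fit_line(train_x, train_y)         # training step
predictions = apply_line(model, [50, 60])  # applying the model to unseen inputs
print(model, predictions)
```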

Evaluate Results

Evaluating results as you experiment with your data is an important part of getting a useful machine learning model. Are you asking the right question of your data? Do you have enough data, sufficiently cleaned data, or the right data to conduct your experiment? Do your experiment outcomes give you the results you expected? All MLTK guided modeling Assistants include data visualizations through which you can quickly assess experiment results. You can also choose from a range of scoring metrics to measure your machine learning results.
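As an illustration of what scoring metrics measure, the following Python sketch computes accuracy, precision, and recall for a binary classifier from actual and predicted labels. These three metrics are chosen as common examples. The labels and the `score_binary` function are hypothetical, and the sketch does not represent the scoring implementation in MLTK.

```python
def score_binary(actual, predicted, positive):
    """Compute accuracy, precision, and recall for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    tn = sum(a != positive and p != positive for a, p in pairs)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels from a held-out test set.
actual = ["fraud", "ok", "ok", "fraud", "ok", "fraud"]
predicted = ["fraud", "ok", "fraud", "ok", "ok", "fraud"]

scores = score_binary(actual, predicted, positive="fraud")
print(scores)
```

Accuracy alone can be misleading when one class is rare, which is one reason to look at more than one metric.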

Tune and Iterate

As part of evaluating experiment results, you can tune and iterate on the machine learning model. Adjusting model settings helps you get the desired machine learning results before you apply the model to unseen data and put the model into operation. MLTK guided Assistants make it simple to adjust model settings and gauge improvements in model performance.

Repeat the steps of experimenting, evaluating, and tuning until you are ready to put your trained model into production.
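The experiment, evaluate, and tune loop can be sketched as trying several candidate settings and keeping the one that scores best on validation data. The Python example below tunes a single threshold for flagging anomalies. It is a simplified illustration: the validation data, the candidate thresholds, and the `accuracy` and `tune` functions are hypothetical, and a real model usually has several settings to adjust.

```python
# Hypothetical validation data: (response time in ms, is the event an anomaly?).
validation = [
    (90, False), (110, False), (150, False),
    (210, True), (260, True), (300, True),
]


def accuracy(threshold, rows):
    """Fraction of rows where 'value > threshold' matches the known label."""
    correct = sum((value > threshold) == label for value, label in rows)
    return correct / len(rows)


def tune(candidates, rows):
    """Evaluate every candidate setting and return the best one with all scores."""
    results = {threshold: accuracy(threshold, rows) for threshold in candidates}
    best = max(results, key=results.get)
    return best, results


best_threshold, results = tune([100, 200, 250], validation)
print(best_threshold, results)
```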

Deploy Model

A trained machine learning model is ready for deployment and application on new, never-before-seen data. As a best practice, regularly check your model outcomes and the sources of the new data, and make adjustments to your machine learning model settings as needed.

In the MLTK guided modeling Assistants, you can schedule model retraining, get alerted about models, and publish models.
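One simple way to think about checking model outcomes after deployment is to compare the model's error on recent data against the error measured when the model was trained. The Python sketch below flags a model for retraining when the recent error grows past a tolerance. It is an assumption-laden illustration, not an MLTK feature: the data, the 25% tolerance, and the `mean_absolute_error` and `needs_retraining` functions are all hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def needs_retraining(baseline_error, recent_error, tolerance=0.25):
    """Return True when recent error exceeds the baseline by more than the tolerance.

    The default tolerance of 25% is an arbitrary value chosen for illustration.
    """
    return recent_error > baseline_error * (1 + tolerance)


# Hypothetical errors: at training time, and on data seen after deployment.
baseline = mean_absolute_error([10, 20, 30], [11, 19, 31])
recent = mean_absolute_error([10, 20, 30], [14, 17, 33])

retrain = needs_retraining(baseline, recent)
print(baseline, recent, retrain)
```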

Learn more

Learn to use MLTK by working through this User Guide.

Last modified on 22 November, 2024

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 5.0.0, 5.1.0, 5.2.0, 5.2.1, 5.2.2, 5.3.0, 5.3.1, 5.3.3, 5.4.0, 5.4.1, 5.4.2, 5.5.0

