Splunk® Machine Learning Toolkit

User Guide

What's new in MLTK

Here's what's new in each version of the Splunk Machine Learning Toolkit (MLTK).

Version 5.5.0

MLTK version 5.5.0 requires version 3.2.2 or 4.2.2 of the PSC add-on, and version 3.9 or higher of Python.

Make sure you use compatible versions of the MLTK app and PSC add-on. See Splunk Machine Learning Toolkit version dependencies.
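
The version requirements stated in these release notes can be captured in a small lookup. The following Python sketch is illustrative only: the REQUIRED_PSC table, the is_compatible helper, and the python_meets_minimum helper are hypothetical names that are not part of MLTK or PSC, and the table covers only the pairings listed on this page. The version dependencies topic remains the authoritative matrix.

```python
# PSC add-on versions required by each MLTK release, copied from these release notes.
# Hypothetical lookup table for illustration; not shipped with MLTK or PSC.
REQUIRED_PSC = {
    "5.5.0": {"3.2.2", "4.2.2"},
    "5.4.2": {"3.2.1", "4.2.1"},
    "5.4.1": {"3.1.0", "4.1.0", "4.1.2", "4.2.0"},
    "5.4.0": {"3.1.0", "4.1.0", "4.1.2"},
}

# MLTK 5.5.0 requires Python 3.9 or higher.
MIN_PYTHON_FOR_MLTK_5_5_0 = (3, 9)


def is_compatible(mltk_version: str, psc_version: str) -> bool:
    """Return True if the MLTK/PSC pair is listed in REQUIRED_PSC."""
    allowed = REQUIRED_PSC.get(mltk_version)
    if allowed is None:
        raise KeyError(f"No PSC requirement recorded for MLTK {mltk_version}")
    return psc_version in allowed


def python_meets_minimum(version: tuple) -> bool:
    """Compare a (major, minor, ...) version tuple against the 5.5.0 minimum."""
    return tuple(version[:2]) >= MIN_PYTHON_FOR_MLTK_5_5_0


if __name__ == "__main__":
    print(is_compatible("5.5.0", "4.2.2"))
    print(python_meets_minimum((3, 9, 0)))
```

A check like this might run before an upgrade to catch a mismatched pair early. It does not replace the published dependency matrix.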

Features and improvements

  • Enhancements to the DensityFunction algorithm. The new supervise_split_by parameter can be set to true or false.
    • When set to true, the fields entered in the by clause are used by a decision tree algorithm to automatically generate groups in the dataset.
  • The anonymized data that the Machine Learning Toolkit, as deployed on Splunk Enterprise, sends to Splunk Inc. has changed. For details, see Share data in the Machine Learning Toolkit.
  • Patches for security vulnerabilities, including an upgrade of the OpenSSL library in PSC versions 3.2.2 and 4.2.2.
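
As an illustration of the supervise_split_by parameter described in this list, the following Python sketch assembles a fit search string. It rests on stated assumptions: the helper name and the field names response_time, host, and region are hypothetical, and the quoted, comma-separated form of the by clause is assumed from typical DensityFunction usage. Check the DensityFunction documentation for the exact syntax.

```python
def density_function_search(field, by_fields, supervise_split_by=True):
    """Build an SPL fragment that fits DensityFunction with a by clause.

    Hypothetical helper for illustration. The option name supervise_split_by
    and its true/false values come from the release notes; the quoting of
    the by clause is an assumption.
    """
    if not by_fields:
        raise ValueError("supervise_split_by applies to fields in the by clause")
    by_clause = ",".join(by_fields)
    flag = "true" if supervise_split_by else "false"
    return f'| fit DensityFunction {field} by "{by_clause}" supervise_split_by={flag}'


if __name__ == "__main__":
    # Hypothetical field names.
    print(density_function_search("response_time", ["host", "region"]))
```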

Version 5.4.2

MLTK version 5.4.2 is a maintenance and patch release.

Make sure you use compatible versions of the MLTK app and PSC add-on. See Splunk Machine Learning Toolkit version dependencies.

Features and improvements

This release has the following upgrade requirements:

  • If you choose to upgrade to MLTK version 5.4.2 and you have trained models that are built using certain algorithms, then you will need to retrain those models. For details, see Upgrade the Splunk Machine Learning Toolkit.
    • This incompatibility for certain models with lower versions of MLTK is the result of the PSC add-on upgrading scikit-learn from version 0.24.2 to version 1.5.1.
  • If you use the ML-SPL API, additional steps might also be required to validate any custom algorithms. See Adding a custom algorithm to the Splunk Machine Learning Toolkit in the ML-SPL API Guide for more information.
  • If you choose to upgrade to MLTK version 5.4.2, you must use Python for Scientific Computing (PSC) add-on version 3.2.1 or 4.2.1. Both MLTK and PSC must be updated.

For ITSI customers

ITSI customers using Predictive Analytics must upgrade both MLTK and PSC. After upgrading, if you use the Predictive Analytics functionality, your existing models will break and each service using Predictive Analytics will require retraining of the models.

For every service using Predictive Analytics, retrain the models using the following steps:

  1. Select Train in Predictive Analytics to retrain the model.
  2. Then, select Save to save the retrained model to the service definition.

For more information, see Retrain a predictive model in ITSI in the ITSI Service Insights Manual.

All correlation search alerts must be set up again. This is because the model names change once the models are retrained in ITSI. Existing correlation searches have the old model name embedded in them, so they do not function until you delete and recreate them.

If you have correlation searches set up for alerting, they can fail silently because the apply command is run by a periodic saved search. This means you must recreate correlation searches after retraining, even if no errors are observed. For more information, see Create an alert for potential service degradation in ITSI in the ITSI Service Insights Manual.

All glass tables using the old models must be updated and saved to use the retrained models.

Version 5.4.1

MLTK version 5.4.1 requires version 3.1.0, 4.1.0, 4.1.2, or 4.2.0 of the PSC add-on. These PSC versions include the ONNX library required for the ONNX model upload feature.

Features and improvements

  • MLTK version 5.4.1 is a maintenance and patch release.
  • This version addresses the Experiments page not properly loading for some users. For more information, see Fixed issues.
  • This version adds support for the as keyword when inferencing ONNX models. To learn more, see Upload and inference pre-trained ONNX models in MLTK.
  • A new deep dive topic is available. See Deep dive: Inference externally trained ONNX models with MLTK.
  • PSC version 4.2.0 is available. This version includes the following features:
    • Python libraries that monitor access and file transfers to the AWS S3 buckets: boto3, botocore, and s3transfer.
    • Scalable library for matrix profiling and time series data mining tasks: stumpy.
    • Library with Dynamic Time Warping (DTW) algorithms for optimal alignment with O(N) time and memory complexity: fastdtw.
    • New versions of math libraries for Intel and Intel-compatible processors.
    • Packages that implement a Pythonic file system spec and thread-safe HTTP connection pooling: fsspec, urllib3, and requests.
    • This version includes the utility packages chardet, future, idna, and six.
  • The anonymized data that the Machine Learning Toolkit, as deployed on Splunk Enterprise, sends to Splunk Inc. has changed. For details, see Share data in the Machine Learning Toolkit.

Version 5.4.0

MLTK version 5.4.0 requires version 3.1.0, 4.1.0, or 4.1.2 of the PSC add-on. These PSC versions include the ONNX library required for the new ONNX model upload feature in MLTK version 5.4.0.

Features and improvements

Version 5.3.3

Features and improvements

  • In version 5.3.3, the StateSpaceForecast algorithm scoring metric values are based on the holdback period data.
  • The Classic tab of the MLTK app is removed in version 5.3.3. The features of the Classic tab are available in the Experiments tab.
  • Support for Internet Explorer is deprecated.
  • Version 4.0.0 of the PSC add-on is released. This version updates the package and adds several libraries, in particular PyTorch, cpuonly, transformers, onnxruntime, pydantic, and watchdog.

Version 4.0.0 of the PSC add-on is only available for MLTK version 5.3.3. Users upgrading to version 4.0.0 of the PSC add-on must follow some additional installation steps. See Install version 4.0.0 of the Python for Scientific Computing add-on.

The build size of PSC version 4.0.0 might exceed the default value of max_upload_size, which can prevent you from installing the package using the "Install app from file" option under Manage Apps. To install PSC 4.0.0, you must create a web.conf file, update max_upload_size to a higher value, and restart Splunk from your terminal.
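
The web.conf change described above can be sketched as follows. This Python example only renders the text of a stanza and does not touch a Splunk installation. The [settings] stanza name and the value 1024 are assumptions chosen for illustration, and render_web_conf is a hypothetical helper. Confirm the stanza, units, and file location against the web.conf specification for your Splunk version before applying anything.

```python
import configparser
import io


def render_web_conf(max_upload_size: int) -> str:
    """Render a web.conf fragment that raises max_upload_size.

    Assumption: the setting belongs in the [settings] stanza.
    """
    parser = configparser.ConfigParser()
    parser["settings"] = {"max_upload_size": str(max_upload_size)}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # 1024 is an illustrative value, not a recommendation from the release notes.
    print(render_web_conf(1024))
```

After saving a fragment like this to a local web.conf file, restart Splunk so the new limit takes effect, as the paragraph above requires.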

Version 5.3.1

Features and improvements

  • The new exclude_dist parameter is available for the Density Function algorithm. Use this parameter when dist=auto to exclude a minimum of 1 and a maximum of 3 of the available distribution types (norm, expon, gaussian_kde, beta). For more information, see Density Function.
  • Version 3.0.2 of the PSC add-on is now available. This version of PSC is compatible with both version 5.3.0 and 5.3.1 of the MLTK app. This PSC version contains bug fixes only and does not include any new features.
  • The streaming_apply feature was deprecated in version 5.0.0 of MLTK, and has now been removed from the app. This feature has been removed to prevent bundle replication performance issues on index clusters.
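
The constraints on exclude_dist noted in this list (usable only with dist=auto, and between 1 and 3 of the four distribution types) can be expressed as a small validation routine. The Python sketch below is illustrative: exclude_dist_option is a hypothetical helper, and the quoted, comma-separated value format it emits is an assumption to verify against the Density Function documentation.

```python
# Distribution types listed in the release notes for dist=auto.
AVAILABLE_DISTS = ("norm", "expon", "gaussian_kde", "beta")


def exclude_dist_option(excluded):
    """Validate the excluded distributions and return an SPL option fragment.

    Hypothetical helper; the output format is an assumption.
    """
    unique = list(dict.fromkeys(excluded))  # drop duplicates, keep order
    unknown = [name for name in unique if name not in AVAILABLE_DISTS]
    if unknown:
        raise ValueError(f"Unknown distribution types: {unknown}")
    if not 1 <= len(unique) <= 3:
        raise ValueError("Exclude at least 1 and at most 3 distribution types")
    return 'dist=auto exclude_dist="' + ",".join(unique) + '"'


if __name__ == "__main__":
    print(exclude_dist_option(["beta", "expon"]))
```

Capping the exclusions at 3 leaves at least one distribution type for dist=auto to choose from.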

Users upgrading to MLTK version 5.3.0 or higher must retrain models created in lower versions of MLTK.

Version 5.3.0

Features and improvements

On November 16, 2021, version 3.0.1 of the Python for Scientific Computing (PSC) add-on was released.

This version is limited to configuration updates for deployment on Splunk Cloud Platform.

MLTK versions 5.2.2 or lower are not compatible with this version of PSC. This incompatibility is due to library upgrades in PSC version 3.0.0. In particular, NumPy, SciPy, scikit-learn, statsmodels, and NetworkX were upgraded to their latest available versions with PSC 3.0.0.

If you are upgrading to version 3.0.0 or higher of PSC, you must also upgrade your installation of MLTK to version 5.3.0 or higher.

Users upgrading to MLTK version 5.3.0 or higher must retrain models created in lower versions of MLTK.

Version 5.3.0

Features and improvements

This version of MLTK requires version 3.0.0 of the Python for Scientific Computing (PSC) add-on. This release of PSC brings updates to several libraries in the package, including NumPy, SciPy, scikit-learn, statsmodels, and NetworkX.

Users upgrading to MLTK version 5.3.0 must retrain models created in lower versions of MLTK.

Version 5.2.2

Features and improvements

  • There are no new features in MLTK version 5.2.2.
  • There is a minor update to MLTK version 5.2.2. The jQuery dependency has been upgraded from jQuery2 to jQuery3 to address vulnerability concerns of jQuery2.
  • To confirm you are using a compatible version of MLTK, see the Machine Learning Toolkit version dependencies matrix.


Version 5.2.1

Features and improvements

  • There are no new features in MLTK version 5.2.1.
  • The minor update to MLTK 5.2.1 is that changes to any MLTK configuration (conf) files do not trigger a search head restart.
  • To confirm you are using a compatible version of MLTK, see the Machine Learning Toolkit version dependencies matrix.

Version 5.2.0

Features and improvements

  • Introduction of the Smart Prediction Assistant. This Assistant uses the AutoPrediction algorithm to determine the data type as categorical or numeric and then carries out the prediction.
  • Gain familiarity with the new Smart Prediction Assistant with two new Showcase examples. Use these Showcases to review the step-by-step user interface prior to working with your own data.
  • The Density Function algorithm now supports partial_fit for incremental learning.
  • The Density Function algorithm now supports the continuous probability density function of Beta distribution.
  • Introduction of the AutoPrediction algorithm. Use AutoPrediction to predict the value of a categorical field or to predict the value of a numeric field.
  • Introduction of the G-means algorithm for clustering.
  • All MLTK Smart Assistants now offer the option to pull Metrics data stored in the Splunk platform into the Assistant without the need to write any SPL.
  • The custom visualization of Heatmap Plot is now available. Use the Heatmap Plot to show data values as colors in a table matrix.
  • The Learn step of the Smart Outlier Detection Assistant now offers two new features:
    • Option to use a preprocessing step by which you can extract indexed time features from your data
    • Option to choose the data distribution type of Beta
  • This version of MLTK offers the option to use version 2.0.0 or 2.0.1 of the PSC add-on. Version 2.0.1 differs from version 2.0.0 only by a minor library upgrade, with no differences in functionality.
  • The anonymized data that the Machine Learning Toolkit, as deployed on Splunk Enterprise, sends to Splunk Inc. has changed. For details, see Share data in the Machine Learning Toolkit.

Version 5.1.0

Features and improvements

  • Introduction of the Smart Clustering Assistant. This Assistant offers enhanced event partition for users with little to no SPL knowledge, and leverages the K-Means algorithm.
  • Gain familiarity with the new Smart Clustering Assistant with two new Showcase examples. Use these Showcases to review the updated user interface prior to working with your own data.
  • The Gaussian KDE density function option within the Density Function algorithm now supports the lower_threshold parameter.

Version 5.0.0

Features and improvements

  • Python 3 now serves as the basis for the Python for Scientific Computing (PSC) add-on. This version is required in order to use MLTK version 5.0.0. Use our quick reference document to ensure you are running the correct version combinations of MLTK, the PSC add-on, and Splunk Enterprise. See Machine Learning Toolkit version dependencies.
  • Introduction of the Smart Outlier Detection Assistant. This Assistant offers enhanced numeric anomaly detection for users with little to no SPL knowledge, and leverages the DensityFunction algorithm.
  • Gain familiarity with the new Smart Outlier Detection Assistant with two new Showcase examples. Use these Showcases to click through the updated user interface and view the outlier detection parameter options prior to working with your own data.
  • The Showcase of end-to-end MLTK Assistant examples has been updated with an improved interface and new filtering options. See Machine Learning Toolkit Showcase.
  • Energy distance is now available as a statstest scoring method.
  • The regression algorithm System Identification is now available.
  • A new custom visualization called Distribution Plot is now available. Use Distribution Plot to show the output of the DensityFunction algorithm.
  • The Density Function algorithm now supports the random_state parameter.
  • The StateSpaceForecast algorithm now supports the wildcard (*) character.
  • Users may begin seeing a security warning dialog box when calling the fit command. For more details including how to turn off this warning, see Why am I seeing a security warning with the fit command.
  • A quick reference document has been created so you can easily see which MLTK algorithms support the ML-SPL commands of fit, apply, summary, and partial fit. See Algorithm support of key ML-SPL commands quick reference.
  • The handy cheat-sheet for all ML-SPL commands and machine learning algorithms available in MLTK has been updated. View and download the Machine Learning Toolkit Quick Reference Guide in English or Japanese.

Version 4.5.0

Features and improvements

  • This version of MLTK offers the majority of the features of MLTK version 5.0.0, but is compatible with Python versions 2.x and Splunk Enterprise versions 7.x. For specific version dependencies, see Machine Learning Toolkit version dependencies.

The random_state parameter of the DensityFunction anomaly detection algorithm is only available in MLTK version 5.0.0 and above. This parameter is not supported in version 4.5.0 of MLTK.

Last modified on 17 October, 2024

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 5.5.0

