What's new
Here's what's new in each version of the Splunk Machine Learning Toolkit.
To ensure you are using compatible versions of the MLTK app and PSC add-on, see the Machine Learning Toolkit version dependencies matrix.
Version 5.3.3
Features and improvements
- In version 5.3.3, the StateSpaceForecast algorithm scoring metric values are based on the holdback period data.
- The Classic tab of the MLTK app is removed in version 5.3.3. The features of the Classic tab are available in the Experiments tab.
- Support for Internet Explorer is deprecated.
- Version 4.0.0 of the PSC add-on is released. This version provides updates and adds several libraries to the package, in particular PyTorch, cpuonly, transformers, onnxruntime, pydantic, and watchdog.
Version 4.0.0 of the PSC add-on is only available for MLTK version 5.3.3. Users upgrading to version 4.0.0 of the PSC add-on must follow some additional installation steps. See Install version 4.0.0 of the Python for Scientific Computing add-on.
The build size of PSC version 4.0.0 might exceed the default value of max_upload_size, which can prevent you from installing the package using the "Install app from file" option under Manage Apps. To install PSC 4.0.0 you must create a web.conf file, update max_upload_size to a higher value, and restart Splunk from your terminal.
Version 5.3.1
Features and improvements
- The new exclude_dist parameter is available for the Density Function algorithm. Use this parameter when dist=auto to exclude a minimum of 1 and a maximum of 3 of the available distribution types (norm, expon, gaussian_kde, beta). For more information, see Density Function.
- Version 3.0.2 of the PSC add-on is now available. This version of PSC is compatible with both version 5.3.0 and 5.3.1 of the MLTK app. This PSC version does not include any new features and addresses bug fixes only.
- The streaming_apply feature was deprecated in version 5.0.0 of MLTK, and has now been removed from the app. This feature has been removed to prevent bundle replication performance issues on index clusters.
Users upgrading to MLTK version 5.3.0 or higher must retrain models created in lower versions of MLTK.
Version 5.3.0
Features and improvements
On November 16, 2021 version 3.0.1 of the Python for Scientific Computing (PSC) add-on was released.
This version is limited to configuration updates for deployment on Splunk Cloud Platform.
MLTK versions 5.2.2 or lower are not compatible with this version of PSC. Incompatibility with lower MLTK versions is due to the introduction of several libraries in PSC version 3.0.0. In particular, Numpy, Scipy, scikit-learn, Statsmodels, and Networkx were upgraded to their latest available versions with PSC 3.0.0.
If you are upgrading to version 3.0.0 or higher of PSC, you must also upgrade your installation of MLTK to version 5.3.0 or higher.
Users upgrading to MLTK version 5.3.0 or higher must retrain models created in lower versions of MLTK.
Version 5.3.0
Features and improvements
This version of the MLTK requires version 3.0.0 of the Python for Scientific Computing (PSC) add-on. This release of PSC brings updates to several libraries in the package including Numpy, Scipy, scikit-learn, Statsmodels, and Networkx.
Users upgrading to MLTK version 5.3.0 must retrain models created in lower versions of MLTK.
Version 5.2.2
Features and improvements
- There are no new features in MLTK version 5.2.2.
- There is a minor update to MLTK version 5.2.2. The jQuery dependency has been upgraded from jQuery2 to jQuery3 to address vulnerability concerns of jQuery2.
- To confirm you are using a compatible version of the MLTK, see the Machine Learning Toolkit version dependencies matrix.
Version 5.2.1
Features and improvements
- There are no new features in MLTK version 5.2.1.
- The minor update to MLTK 5.2.1 is that changes to any MLTK configuration (conf) files do not trigger a search head restart.
- To confirm you are using a compatible version of the MLTK, see the Machine Learning Toolkit version dependencies matrix.
Version 5.2.0
Features and improvements
- Introduction of the Smart Prediction Assistant. This Assistant uses the AutoPrediction algorithm to determine the data type as categorical or numeric and then carries out the prediction.
- Gain familiarity with the new Smart Prediction Assistant with two new Showcase examples. Use these Showcases to review the step-by-step user interface prior to working with your own data.
- The Density Function algorithm now supports partial_fit for incremental learning.
- The Density Function algorithm now supports the continuous probability density function of Beta distribution.
- Introduction of the AutoPrediction algorithm. Use AutoPrediction to predict the value of a categorical field or to predict the value of a numeric field.
- Introduction of the G-means algorithm for clustering.
- All MLTK Smart Assistants now offer the option to pull Metrics data stored in the Splunk platform into the Assistant without the need to write any SPL.
- The custom visualization of Heatmap Plot is now available. Use the Heatmap Plot to show data values as colors in a table matrix.
- The Learn step of the Smart Outlier Detection Assistant now offers two new features:
- Option to use a preprocessing step by which you can extract indexed time features from your data
- Option to choose the data distribution type of Beta
- This version of the MLTK offers the option to use version 2.0.0 or 2.0.1 of the PSC add-on. Version 2.0.1 is only different in a minor library upgrade with no differences in functionality to version 2.0.0.
- Changes have been made to what anonymized data the Machine Learning Toolkit as deployed on Splunk Enterprise sends Splunk Inc. For details, see Share data in the Machine Learning Toolkit.
Version 5.1.0
Features and improvements
- Introduction of the Smart Clustering Assistant. This Assistant offers enhanced event partition for users with little to no SPL knowledge, and leverages the K-Means algorithm.
- Gain familiarity with the new Smart Clustering Assistant with two new Showcase examples. Use these Showcases to review the updated user interface prior to working with your own data.
- The Gaussian KDE density function option within the Density Function algorithm now supports the lower_threshold parameter.
Version 5.0.0
Features and improvements
- Python 3 now serves as the basis for the Python for Scientific Computing (PSC) add-on. This version is required in order to use MLTK version 5.0.0. Use our quick reference document to ensure you are running the correct version combinations of MLTK, the PSC add-on, and Splunk Enterprise. See Machine Learning Toolkit version dependencies.
- Introduction of the Smart Outlier Detection Assistant. This Assistant offers enhanced numeric anomaly detection for users with little to no SPL knowledge, and leverages the DensityFunction algorithm.
- Gain familiarity with the new Smart Outlier Detection Assistant with two new Showcase examples. Use these Showcases to click through the updated user interface and view the outlier detection parameter options prior to working with your own data.
- The Showcase of end-to-end MLTK Assistant examples has been updated with an improved interface and new filtering options. See Machine Learning Toolkit Showcase.
- Energy distance is now available as a statstest scoring method.
- The regression algorithm System Identification is now available.
- A new custom visualization called Distribution Plot is now available. Use Distribution Plot to show the output of the DensityFunction algorithm.
- The Density Function algorithm now supports the random_state parameter.
- The StateSpaceForecast algorithm now supports the wildcard (*) character.
- Users may begin seeing a security warning dialog box when calling the fit command. For more details including how to turn off this warning, see Why am I seeing a security warning with the fit command.
- A quick reference document has been created so you can easily see which MLTK algorithms support the ML-SPL commands of fit, apply, summary, and partial fit. See Algorithm support of key ML-SPL commands quick reference.
- The handy cheat-sheet for all ML-SPL commands and machine learning algorithms available in the MLTK has been updated. View and download the Machine Learning Toolkit Quick Reference Guide in English or Japanese.
Version 4.5.0
Features and improvements
- This version of the MLTK offers the majority of the features of MLTK version 5.0.0, but is compatible with Python versions 2.x and Splunk Enterprise versions 7.x. For specific version dependencies, see Machine Learning Toolkit version dependencies.
The random_state parameter of the DensityFunction anomaly detection algorithm is only available in MLTK version 5.0.0 and above. This parameter is not supported in version 4.5.0 of the MLTK.
Version 4.4.2
Features and improvements
- This version of the MLTK is compatible with Python versions 2.x and Splunk Enterprise versions 7.x. For specific version dependencies, see Machine Learning Toolkit version dependencies.
Version 4.4.1
Features and improvements
- Addressed an issue preventing models created in version 4.3.0 of the MLTK using the DensityFunction algorithm from loading into version 4.4.0 of the MLTK.
Version 4.4.0
Features and improvements
- The Smart Forecasting Assistant now supports multivariate forecasting. For highlights of this enhancement, see the Smart Forecasting Assistant document.
- A new Smart Forecasting Showcase example steps you through the forecasting of app expenses from multiple variables.
- Analysis of Variance (Anova) is now available as a statstest score command option.
- The Density Function algorithm now supports multiple thresholds. Multiple thresholds enable you to run your different threshold values all at once rather than one by one, getting all your outliers returned faster.
- The Density Function algorithm now supports min and max values in the summary command.
- The full_sample parameter is now available for use with the Density Function algorithm.
- The show_options parameter is now available for use with the Density Function algorithm.
- New Experiments created using either the Predict Categorical Fields or Predict Numeric Fields Assistants now default to a 70-30 training and testing data split. The previous default split was 50-50.
- MLTK dashboards now support dark theme. For more information, see Dashboards and Visualizations.
- To increase the ease of use and clarity of content, version 4.4.0 of the MLTK documentation has an improved chapter and topic order, as well as updated chapter and topic naming. Reach out to an MLTK support resource in the event you are unable to find the content you're looking for. For support options, see Support for the Machine Learning Toolkit.
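The 70-30 default split noted above for the Predict Categorical Fields and Predict Numeric Fields Assistants can be pictured with a short sketch. This is plain Python for illustration only, not MLTK code: the function name split_events, the train_fraction argument, and the fixed seed are assumptions made for this example.

```python
import random

def split_events(events, train_fraction=0.7, seed=42):
    """Shuffle events and split them into training and testing partitions.

    Hypothetical helper for illustration; it is not part of the MLTK.
    """
    shuffled = list(events)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = split_events(range(100))
print(len(train), len(test))  # prints: 70 30
```

Passing train_fraction=0.5 would correspond to the previous 50-50 default.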
Version 4.3.0
Features and improvements
- Introduction of the Smart Forecasting Assistant. This Assistant offers enhanced time-series analysis for users with little to no SPL knowledge, and leverages the StateSpaceForecasting algorithm.
- Gain familiarity with the new Smart Forecasting Assistant with three new Showcase examples. Use these Showcases to click through the updated user interface and view forecast parameter options prior to working with your own data.
- Introduction of the NPR algorithm for feature extraction.
- A new document covering methods for preparing your data for machine learning is now available.
- The sample parameter is now available for use with the DensityFunction algorithm.
- Time-saving MLTK macros are now documented for your review and use.
In order to save models, users need the upload_lookup_files capability included in their role.
Version 4.2.0
Features and improvements
- Pairwise distances scoring now fully supports the wildcard (*) character.
- Pairwise distances scoring now supports the Kolmogorov-Smirnov (2 samples) and Wasserstein distance metrics.
- Wasserstein distance is now available as a statstest scoring method.
- The following score commands now support the wildcard (*) character in 1-to-n cases: Silhouette score, Accuracy score, F1-score, Precision score, Recall score, R2 score, Explained variance score, Mean absolute error score, Mean squared error score, T-test (2 independent samples) score, T-test (2 related samples) score.
- Introduction of a 3D Scatter Plot visualization. Use this visualization to see clusters of similar data points, or to drill down to see singular data points.
- Introduction of the StateSpaceForecast algorithm for time series analysis.
- Introduction of the ICA algorithm for preprocessing.
- Introduction of the DensityFunction algorithm for anomaly detection.
- Models created using the PCA algorithm can be inspected with the summary command.
- Models created using the FieldSelector algorithm can be inspected with the summary command.
- Time-saving MLTK macros are now documented for your review and use.
- Experiment Assistant alerts have been updated. When creating an alert, you now go directly to the Save As Alert window. Use this space to select from standard Trigger Conditions, as well as new Machine Learning Conditions.
Version 4.1.0
Features and improvements
- The wildcard character (*) is now enabled for single array scoring methods. For more information, see Using the score command.
- Introduction of the Hashing vectorizer algorithm.
- Introduction of the Pairwise distances scoring method.
- Introduction of the Imputer preprocessing algorithm.
- When using the BernoulliNB algorithm, GaussianNB algorithm, and MLPClassifier algorithm you can now inspect trained models using the summary command.
- The variance parameter is now available when using the PCA algorithm.
- The anomaly_score parameter is now available when using the LocalOutlierFactor algorithm.
- The fit_intercept and normalize parameters are now available when using the Lasso algorithm.
- The Box plot visualization option has been updated and improved.
- The Machine Learning Toolkit as deployed on Splunk Enterprise now sends anonymized data to Splunk Inc. Learn more here.
- An updated version of the Machine Learning Toolkit Quick Reference Guide is now available. Use it as a handy cheat sheet of ML-SPL commands and machine learning algorithms in the Splunk Machine Learning Toolkit.
- We've also published a new document to capture the most Frequently Asked Questions regarding the Machine Learning Toolkit.
Version 4.0.0
Features and improvements
- A number of new demos have been added to the Showcase tab based on customer requests.
- The Experiment History tab now captures the model history from scheduled retraining.
- Introduction of the LocalOutlierFactor algorithm. Accessing this algorithm requires an upgrade to PSC version 1.3.
- The MLPClassifier algorithm now supports partial_fit for incremental learning.
- Introduction of the score command to validate models and statistical tests for any use case.
- Introduction of k-fold cross validation as a method to help test for model overfitting.
- A new tab called Settings is now part of the MLTK nav bar. Users with admin access can work within this interface to configure the toolkit settings of the fit and apply commands. Ensure you know the impact of setting changes by adding the ML-SPL Performance App for the Machine Learning Toolkit to your setup.
- Customers can now share and reuse custom algorithms in the Splunk Community for MLTK on GitHub.
- The Splunk MLTK Connector for Apache Spark™ allows users to leverage their own Spark clusters for large data sets. This is a limited availability release. Reach out to Splunk's ML Spark team for LAR application access.
- The Splunk MLTK Container for TensorFlow™ is an add-on Docker container, allowing multiple local GPU/CPU acceleration. This is available via Splunk's Professional Services department.
- For bug fixes, see Fixed issues.
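The k-fold cross validation introduced in this version is a general technique: the data is divided into k folds, and each fold takes one turn as the test set while the remaining folds are used for training. The sketch below shows only that general idea in plain Python. It is not the MLTK implementation, and the function name k_fold_indices is an assumption made for this example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    Hypothetical helper for illustration; it is not part of the MLTK.
    """
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("test fold:", test)
```

A model that scores well on its training folds but poorly on the held-out folds is a likely sign of overfitting.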
Version 3.4.0
Features and improvements
- Version 1.3 of the Python for Scientific Computing add-on is now available in Splunkbase. Upgrading to version 3.4 of the MLTK requires upgrading to PSC version 1.3.
- Introduction of the MLPClassifier algorithm. Accessing this algorithm requires an upgrade to PSC version 1.3.
- Introduction of Boxplot Chart to the search visualization options.
- Models created within the Experiments framework can now be published and more easily used outside of the MLTK environment.
- For bug fixes, see Fixed issues.
Upgrading to version 3.4.0 of the MLTK requires upgrading to version 1.3 of the Python for Scientific Computing add-on. Two previous versions of the MLTK (3.2.0 and 3.3.0) will successfully operate on version 1.2 or 1.3 of the PSC add-on. However, users cannot access new features in the 3.4.0 MLTK without updating to that version.
Version 3.3.0
Features and improvements
- Introduction of another preprocessing method - TFIDF (term frequency-inverse document frequency).
- Addition of the RobustScaler algorithm.
- For bug fixes, see Fixed issues.
Version 3.4.0 of the MLTK will require an update to Version 1.3 of Python for Scientific Computing. The release for the MLTK 3.4.0 will coincide with the availability of PSC 1.3 in Splunkbase.
Version 3.2.0
Features and improvements
- Introduction of the Experiment Management Framework. This framework ties the experiment, along with any alerts or scheduled trainings, together. Users can now see which alerts or scheduled trainings are assigned to any experiment, and which experiment has or has not undergone preprocessing steps.
- Relocation within the Machine Learning Toolkit of the previously free-standing Assistant module. This version of Assistants now lives under the Legacy tab of the MLTK bar. It is recommended that you do not create Models via this version of Assistants, and instead create Models via the Experiments Management Framework. Doing so will ensure that you can both:
- Create Alerts and Scheduled Trainings on the saved Experiment
- See Alerts and Scheduled Trainings organized by Experiment
- The Splunk Machine Learning Toolkit version 3.2 does not support Splunk Enterprise version 6.4 or earlier.
- For bug fixes, see Fixed issues.
Version 3.1.0
Features and improvements
- The FieldSelector algorithm can now be used in the preprocessing panel. See FieldSelector in the Machine Learning Toolkit User Guide.
- The maximum number of distinct values supported for categorical fields, formerly 100, can now be configured for both features fields and target fields.
- Use max_distinct_cat_values to change the setting for feature fields, the input to your ML algorithm.
- Use max_distinct_cat_values_for_classifiers to change the setting for a target field, the field you are trying to predict, in classifier algorithms.
- The Splunk Machine Learning Toolkit has a new clustering algorithm:
- For bug fixes, see Fixed issues.
Version 3.0.0
Features and improvements
Introduced a new interface for managing models. You can now easily see what types of models you have, inspect the settings of each model (such as which variables were used to train it), and view or update each model's sharing settings. For more information, see Manage models.
Version 2.4.0
Features and improvements
- You must update any custom algorithms from earlier releases (before 2.4.0) to use the algos.conf file. See Register an algorithm in the ML-SPL API Guide.
- You can now package and distribute custom algorithms as apps. See Package an algorithm for Splunkbase in the ML-SPL API Guide.
- The Splunk Machine Learning Toolkit has two new algorithms.
- For bug fixes, see Fixed issues.
Version 2.3.0
Features and improvements
- The Splunk Machine Learning Toolkit's custom search commands are now fully integrated with Splunk's knowledge object permissions. For more details, see Models.
- Entries in the "Load Existing Settings" tab are now unique per-user instead of being shared with all users. Entries created prior to version 2.3 will continue to be accessible by all users.
- Two new algorithms have been added:
- The Forecast Time Series Assistant now allows for the selection of the ARIMA forecasting algorithm. Additional panels have been added for inspecting properties unique to ARIMA models.
Version 2.2.1
This version contains bug fixes. See Fixed issues for details.
Version 2.2.0
Features and improvements
- The preprocessing feature has been redesigned and is offered in the Predict Numeric Fields, Predict Categorical Fields, and Clustering Numeric Events assistants. See Preprocessing for information.
- The ML-SPL API has been updated to make it easier for developers and partners to import custom algorithms in order to extend the capabilities of the Splunk Machine Learning Toolkit. See ML-SPL API Guide for information.
- A new video overview of what's new in versions 2.1.0 and 2.2.0 of the Splunk Machine Learning Toolkit is available.
Version 2.1.0
Features and improvements
Enhancements to the Detect Numeric Outliers assistant:
- You can now specify one or more fields to split by (up to 5). Specifying one or more split by fields enables you to see the values of the field you are analyzing grouped by the values of the split by fields in visualizations.
- Enhanced visualizations including a new Data Distribution histogram that shows the number of data points within the threshold and the number of data points outside the threshold.
For more information, see Detect Numeric Outliers.
Version 2.0.1
The Downsampled Line Chart custom visualization now supports the same drilldown actions as the built-in Line Chart visualization.
Version 2.0.0
Features and improvements
- The app has been renamed to "Machine Learning Toolkit."
- New Cluster Numeric Events assistant that steps you through how to perform clustering on your own data. This assistant includes the ability to preprocess data by applying StandardScaler, PCA, or KernelPCA methods. See Cluster Numeric Events.
- Updated examples for the Cluster Numeric Events showcase.
- A streaming_apply setting has been added to the mlspl.conf file, which allows you to run the apply command on your indexers. For details, see Use your indexers to apply models.
- The Predict Numeric Fields and Predict Categorical Fields assistants now support multiple algorithms.
- A new visualization type has been added: Scatterplot matrix. This visualization is available in the Cluster Numeric Events assistant.
- The Machine Learning Toolkit app has a walk-through tour and each assistant has its own walk-through tour.
- A link to machine learning video tutorials has been added to the top menu bar and the Showcase page.
- Tooltips have been added for the fields in each of the assistants.
Algorithms
- The SGDClassifier algorithm is now supported. For details, see Algorithms.
- The SGDRegressor algorithm is now supported. For details, see Algorithms.
- The ARIMA algorithm is now supported. For details, see Algorithms.
- The LogisticRegression algorithm supports a new parameter probabilities=<true|false>. For details, see Algorithms.
- Summary support has been added to the RandomForestClassifier and RandomForestRegressor algorithms. For details, see Algorithms.
- The BernoulliNB, GaussianNB, Birch, and StandardScaler algorithms support a new parameter partial_fit=<true|false>. For details, see Algorithms.
Version 1.3.0
Features and improvements
- You can now create alerts within the Machine Learning Toolkit from some of the panels in the assistants. Alerts can be viewed under Scheduled Jobs > Alerts.
- You can now schedule model training in the Predict Numeric Fields and Predict Categorical Fields assistants by clicking the icon on the right side of the Fit Model button.
- The Training/Test split can now be set to a 100/0 split (no split).
Version 1.2.0
Features and improvements
- The DecisionTreeClassifier and DecisionTreeRegressor algorithms are now supported. For details, see Algorithms.
- The Detect Numeric Outliers assistant now includes an Include current point checkbox to support the "current" parameter of the streamstats command.
- The Predict Numeric Fields assistant has an improved Actual vs. Predicted Line Chart, which replaces the Actual vs. Predicted Overlay.
- Two macros in the Forecast Time Series assistant have been merged into one macro.
- The max_features parameter of the RandomForestClassifier and RandomForestRegressor algorithms now accepts values with the float data type.
- The Remove from history confirmation dialog box has been improved.
- A basic framework has been implemented for displaying Bootstrap's modal dialog boxes in the Machine Learning Toolkit and Showcase UI.
Version 1.1.0
Features and improvements
- The visualizations in the Cluster Events showcase have been updated.
- The Predict Numeric Fields and Predict Categorical Fields assistants now allow you to enter wildcards in Fields to use for predicting. For example, to specify both the Packets Received and Packets Sent fields, enter "Packets*". Wildcards are case sensitive.
- The Select All and Select None buttons on the Predict Numeric Fields and Predict Categorical Fields assistants have been moved inside the dropdown list.
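The case-sensitive wildcard behavior described above for Fields to use for predicting can be illustrated with a short sketch. This is plain Python using the standard fnmatch module, not the assistants' own matching code. The select_fields helper and the sample field names other than Packets Received and Packets Sent are assumptions made for this example, and fnmatchcase also accepts ? and [] patterns that the assistants might not.

```python
from fnmatch import fnmatchcase

def select_fields(pattern, available_fields):
    """Return the field names that match a case-sensitive wildcard pattern.

    Hypothetical helper for illustration; it is not part of the MLTK.
    """
    return [name for name in available_fields if fnmatchcase(name, pattern)]

fields = ["Packets Received", "Packets Sent", "packets dropped", "Bytes Sent"]
selected = select_fields("Packets*", fields)
print(selected)  # prints: ['Packets Received', 'Packets Sent']
```

The lowercase field packets dropped is not selected, because the match is case sensitive.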
Algorithms
- The KernelRidge regression algorithm is now supported. For details, see Algorithms.
Bug fixes
The following bugs were fixed:
- Changing the time range or search mode on assistant search bars will now re-run the search in the search bar, the same as the default Search page in Splunk Enterprise.
- Custom visualizations will now display time stamps correctly when the event time differs from browser time.
- Caching issues have been fixed, and the app no longer loads old versions of resources after an update.
- Exit points in assistants now correctly have the same time range as that assistant's search bar.
Version 1.0.0
This is the first release of the Machine Learning Toolkit and Showcase app.
This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 5.3.3