Configure the fit and apply commands
You can configure the fit and apply commands by setting properties in the mlspl.conf configuration file, located in the app's default directory:
$SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/default/mlspl.conf
In this file, you can specify default settings for all algorithms or for an individual algorithm. To apply global settings, use the [default] stanza. To apply algorithm-specific settings, use a stanza named for the algorithm, for example, [LinearRegression] for the LinearRegression algorithm. Be aware that not all global settings can be set or overridden in an algorithm-specific stanza. For details, see How to copy and edit a configuration file.
To avoid losing your configuration file changes when you upgrade the app, create a copy of the mlspl.conf file with only the modified stanzas and settings, then save it to the $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/local/ directory.
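As a minimal sketch of what such a local file might contain, the following Python snippet renders only the overridden stanzas as mlspl.conf text. The helper name `render_local_mlspl` and the values are hypothetical and are not recommendations. The example also assumes that max_fit_time can be overridden in an algorithm-specific stanza, which you should verify for your version.

```python
import configparser
import io


def render_local_mlspl(overrides):
    """Render only the overridden stanzas as mlspl.conf text.

    `overrides` maps a stanza name to a dict of setting name -> value.
    This is a hypothetical helper, not part of the toolkit.
    """
    parser = configparser.ConfigParser()
    # Keep setting names exactly as given instead of lowercasing them.
    parser.optionxform = str
    for stanza, settings in overrides.items():
        parser[stanza] = {key: str(value) for key, value in settings.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Illustrative values only.
local_conf = render_local_mlspl({
    "default": {"max_inputs": 200000, "use_sampling": "true"},
    # Assumption: max_fit_time may be set per algorithm.
    "LinearRegression": {"max_fit_time": 1200},
})
print(local_conf)
```

The printed text could then be saved as mlspl.conf in the local directory named above, leaving the file in the default directory untouched.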
Setting | Default | Description |
---|---|---|
max_inputs | 100000 | The maximum number of events an algorithm considers when fitting a model. If this limit is exceeded and use_sampling is true, the fit command downsamples its input using the Reservoir Sampling algorithm before fitting a model. If use_sampling is false and this limit is exceeded, the fit command throws an error. |
use_sampling | true | Indicates whether to use Reservoir Sampling for data sets that exceed max_inputs or to instead throw an error. |
max_fit_time | 600 | The maximum time, in seconds, to spend in the "fit" phase of an algorithm. This setting does not relate to the other phases of a search such as retrieving events from an index. |
max_memory_usage_mb | 1000 | The maximum allowed memory usage, in megabytes, by the fit command while fitting a model. |
max_model_size_mb | 15 | The maximum allowed size of a model, in megabytes, created by the fit command. Some algorithms (e.g. SVM and RandomForest) might create unusually large models, which can lead to performance problems with bundle replication. |
max_distinct_cat_values | 100 | The maximum number of distinct values in a categorical feature field, or input field, that will be used in one-hot encoding. One-hot encoding is when you convert categorical values to numeric values. If the number of distinct values exceeds this limit, the field will be dropped, or excluded from analysis, and a warning appears. |
max_distinct_cat_values_for_classifiers | 100 | The maximum number of distinct values in a categorical field that is the target, or output, variable in a classifier algorithm. |
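To illustrate how max_inputs and use_sampling interact, here is a minimal sketch of reservoir sampling in its common "Algorithm R" form. It is not the toolkit's actual implementation. The function name `limit_inputs` and the `seed` parameter are hypothetical and exist only for this example.

```python
import random


def limit_inputs(events, max_inputs=100000, use_sampling=True, seed=None):
    """Return at most max_inputs events, mimicking the documented behavior.

    Hypothetical helper: downsamples with reservoir sampling when
    use_sampling is true, otherwise raises once the limit is exceeded.
    """
    rng = random.Random(seed)
    reservoir = []
    for index, event in enumerate(events):
        if index < max_inputs:
            # The first max_inputs events always fill the reservoir.
            reservoir.append(event)
        elif not use_sampling:
            raise ValueError("input exceeds max_inputs and use_sampling is false")
        else:
            # Keep the new event with probability max_inputs / (index + 1).
            slot = rng.randint(0, index)
            if slot < max_inputs:
                reservoir[slot] = event
    return reservoir


sample = limit_inputs(range(1000), max_inputs=10, seed=0)
print(len(sample))
```

Because the reservoir is updated one event at a time, the input never has to be held in memory in full, and every event ends up in the sample with equal probability.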
This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 3.1.0, 3.2.0, 3.3.0, 3.4.0