Configure the fit and apply commands

You can configure the fit and apply commands by setting properties in the mlspl.conf configuration file, which is located in the app's default directory.


In this file, you can specify default settings for all algorithms or settings for an individual algorithm. To apply global settings, use the [default] stanza. To apply algorithm-specific settings, use a stanza named for the algorithm, for example, [LinearRegression] for the LinearRegression algorithm. Not all global settings can be set or overridden in an algorithm-specific stanza. For details on editing configuration files, see How to copy and edit a configuration file.

To avoid losing your configuration changes when you upgrade the app, create a copy of the mlspl.conf file that contains only the modified stanzas and settings, then save it in the $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/local/ directory.
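
As a rough sketch of such a copy, a local mlspl.conf might contain only the stanzas and settings being changed. The values below are illustrative, not recommendations, and the example assumes that max_fit_time is among the settings that can be overridden in an algorithm-specific stanza.

```ini
# $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/local/mlspl.conf
# Illustrative values only; settings omitted here keep their defaults.

[default]
# Global setting: applies to all algorithms.
max_inputs = 200000

[LinearRegression]
# Algorithm-specific setting
# (assumes this setting can be overridden per algorithm).
max_fit_time = 1200
```

Keeping only the changed settings in the local copy means any setting not listed there continues to follow the values shipped with the app.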

| Setting | Default | Description |
| --- | --- | --- |
| max_inputs | 100000 | The maximum number of events an algorithm considers when fitting a model. If this limit is exceeded and use_sampling is true, the fit command downsamples its input using the Reservoir Sampling algorithm before fitting a model. If use_sampling is false and this limit is exceeded, the fit command throws an error. |
| use_sampling | true | Indicates whether to use Reservoir Sampling for data sets that exceed max_inputs or to instead throw an error. |
| max_fit_time | 600 | The maximum time, in seconds, to spend in the "fit" phase of an algorithm. This setting does not relate to the other phases of a search, such as retrieving events from an index. |
| max_memory_usage_mb | 1000 | The maximum allowed memory usage, in megabytes, by the fit command while fitting a model. |
| max_model_size_mb | 15 | The maximum allowed size, in megabytes, of a model created by the fit command. Some algorithms (for example, SVM and RandomForest) might create unusually large models, which can lead to performance problems with bundle replication. |
| max_distinct_cat_values | 100 | The maximum number of distinct values in a categorical feature field, or input field, that will be used in one-hot encoding. One-hot encoding converts categorical values to numeric values. If the number of distinct values exceeds this limit, the field is dropped, or excluded from analysis, and a warning appears. |
| max_distinct_cat_values_for_classifiers | 100 | The maximum number of distinct values in a categorical field that is the target, or output, variable in a classifier algorithm. |
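
To illustrate how max_inputs and use_sampling work together, the following sketch of a local override makes the fit command return an error rather than downsample when its input exceeds the limit. The values are illustrative assumptions, not recommendations.

```ini
[default]
# Fail instead of sampling when more than max_inputs events reach fit.
use_sampling = false
max_inputs = 50000
```

With these settings, a search that passes more than 50000 events to the fit command would stop with an error. With use_sampling left at its default of true, the fit command would instead downsample its input before fitting the model.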
