Configure the fit and apply commands

You can configure the fit and apply commands by setting properties in the mlspl.conf configuration file located in the default directory:

$SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/default/mlspl.conf

In this file, you can specify default settings for all algorithms or for an individual algorithm. Put global settings in the [default] stanza, and put algorithm-specific settings in a stanza named for the algorithm, for example, [LinearRegression] for the LinearRegression algorithm. Be aware that not all global settings can be set or overridden in an algorithm-specific stanza. For details, see How to copy and edit a configuration file.

To avoid losing your configuration file changes when you upgrade the app, create a copy of the mlspl.conf file that contains only the modified stanzas and settings, then save it in the local directory:

$SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/local/
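
As an illustration, a local copy that changes two settings might look like the following sketch. The values 900 and 200000 are arbitrary examples, and the sketch assumes that `max_inputs` is one of the settings that can be overridden in an algorithm-specific stanza, which you should confirm for your version of the app.

```
# $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/local/mlspl.conf
# Only the stanzas and settings that differ from the defaults are listed.

[default]
# Example value: raise the global fit time limit from 600 to 900 seconds.
max_fit_time = 900

[LinearRegression]
# Assumption: max_inputs can be overridden for an individual algorithm.
# Example value: allow LinearRegression to fit on up to 200000 events.
max_inputs = 200000
```

Settings that are not listed in the local file keep the values from the default file.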

Setting	Default	Description
`max_inputs`	100000	The maximum number of events an algorithm considers when fitting a model. If this limit is exceeded and `use_sampling` is true, the `fit` command downsamples its input using the Reservoir Sampling algorithm before fitting a model. If `use_sampling` is false and this limit is exceeded, the `fit` command throws an error.
`use_sampling`	true	Indicates whether to use Reservoir Sampling for data sets that exceed `max_inputs` or to instead throw an error.
`max_fit_time`	600	The maximum time, in seconds, to spend in the "fit" phase of an algorithm. This setting does not relate to the other phases of a search such as retrieving events from an index.
`max_memory_usage_mb`	1000	The maximum allowed memory usage, in megabytes, by the `fit` command while fitting a model.
`max_model_size_mb`	15	The maximum allowed size of a model, in megabytes, created by the `fit` command. Some algorithms, for example SVM and RandomForest, might create unusually large models, which can lead to performance problems with bundle replication.
`max_distinct_cat_values`	100	The maximum number of distinct values in a categorical feature field, or input field, that will be used in one-hot encoding. One-hot encoding converts each distinct categorical value into its own binary (0 or 1) field. If the number of distinct values exceeds this limit, the field is dropped, or excluded from analysis, and a warning appears.
`max_distinct_cat_values_for_classifiers`	100	The maximum number of distinct values in a categorical field that is the target, or output, variable in a classifier algorithm.
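
The interaction between `max_inputs` and `use_sampling` can be sketched as a stanza. This is a minimal example, assuming you want the `fit` command to fail rather than silently downsample when a search returns more events than the limit:

```
[default]
# Keep the default limit on the number of events used to fit a model.
max_inputs = 100000
# Throw an error instead of applying Reservoir Sampling
# when the input exceeds max_inputs.
use_sampling = false
```

With `use_sampling` left at its default of true, the same search would instead fit the model on a sample of 100000 events.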
