# Search commands for machine learning

The Splunk Machine Learning Toolkit contains several custom search commands, referred to as ML-SPL commands, that implement classic machine learning and statistical learning tasks:

`fit`

: Fit and apply a machine learning model to search results.

`apply`

: Apply a machine learning model that was learned using the `fit` command.

`summary`

: Return a summary of a machine learning model that was learned using the `fit` command.

`listmodels`

: Return a list of machine learning models that were learned using the `fit` command.

`deletemodel`

: Delete a machine learning model that was learned using the `fit` command.

`sample`

: Randomly sample or partition events.

`score`

: Run statistical tests to validate model outcomes.

You can use these custom search commands on any Splunk platform instance on which the Splunk Machine Learning Toolkit is installed.

Download the Machine Learning Toolkit Quick Reference Guide (also available in Japanese) for a handy cheat sheet of ML-SPL commands and machine learning algorithms used in the Splunk Machine Learning Toolkit.

## fit

Use the `fit` command to fit and apply a machine learning model to search results.

**Supervised Syntax**

fit <algorithm> (<option_name>=<option_value>)* (<response-field>) from (<explanatory-field>)+

**Unsupervised Syntax**

fit <algorithm> (<option_name>=<option_value>)* (from)? (<explanatory-field>)+

The first argument is required and names the algorithm to use. The available algorithms are documented in Algorithms.

All algorithms require a list of fields to use when learning a model. For classification and regression algorithms, list the response field first, followed by the `from` keyword and then the explanatory fields, which are the fields used to make predictions. Unsupervised algorithms have no response field, so the `from` keyword is optional and every listed field is treated as an explanatory field when training the model (for clustering, for example, these are the fields to cluster over).
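The field-list rule above can be sketched in a few lines of Python. This is only an illustration of how the `from` keyword divides the fields, not the toolkit's actual parser; the function name `split_fit_fields` is hypothetical, and the input is assumed to be the field list that follows the algorithm name and its options.

```python
def split_fit_fields(tokens):
    """Split a fit field list into (response_field, explanatory_fields).

    `tokens` is the whitespace-split field list, for example
    ["errors", "from", "_time"]. Hypothetical helper for illustration only.
    """
    if "from" in tokens:
        idx = tokens.index("from")
        before, after = tokens[:idx], tokens[idx + 1:]
        if len(before) > 1:
            raise ValueError("at most one response field may precede 'from'")
        if not after:
            raise ValueError("at least one explanatory field is required")
        # Supervised form has one response field; unsupervised form has none.
        response = before[0] if before else None
        return response, after
    if not tokens:
        raise ValueError("at least one explanatory field is required")
    # No 'from' keyword: unsupervised form, every field is explanatory.
    return None, list(tokens)


print(split_fit_fields(["errors", "from", "_time"]))
print(split_fit_fields(["petal_length", "petal_width"]))
```

The first call mirrors the supervised form, where `errors` is the response field. The second call mirrors the unsupervised form, where no response field exists.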

Use the `as` keyword to rename the field added to search results by the model.

Use the `into` keyword to store the learned model in an artifact that can later be applied to new search results with the `apply` command. Not all algorithms support saved models.

Some algorithms support options that can be given as *name*=*value* arguments. For example, KMeans and PCA both support a `k` option that specifies how many clusters or how many principal components to learn.

You can also configure the `fit` command. See Configure the fit and apply commands.

**Examples**

Fit a LinearRegression model to predict `errors` using `_time`:

... | fit LinearRegression errors from _time

Fit a LinearRegression model to predict `errors` using `_time` and save it into a model named `errors_over_time`:

... | fit LinearRegression errors from _time into errors_over_time

Fit a LogisticRegression model to predict a categorical response from numerical measurements:

... | fit LogisticRegression species from petal_length petal_width sepal_length sepal_width

## apply

Use the `apply` command to compute predictions for the current search results based on a model that was learned by the `fit` command. The `apply` command can be used on different search results than those used when fitting the model, but the results should have an identical list of fields.

You can also configure the `apply` command. See Configure the fit and apply commands.

**Syntax**

apply <model_name> (as <output_field>)?

Use the `as` keyword to rename the field added to search results by the model.

**Examples**

Apply a learned LinearRegression model, "errors_over_time":

... | apply errors_over_time

Rename the output of the model to "predicted_errors":

... | apply errors_over_time as predicted_errors
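To make the relationship between fitting, saving, and applying a model concrete, here is a conceptual Python sketch. It is not how the toolkit is implemented: it uses ordinary single-variable least squares, and the names `models`, `fit_linear`, and `apply_model` are hypothetical. The default output field name `predicted(<response>)` is an assumption inferred from the `predicted(species)` field used in the `score` example.

```python
models = {}  # hypothetical stand-in for the store of saved models


def fit_linear(rows, response, explanatory, into=None):
    """Least-squares fit of `response` on one `explanatory` field."""
    xs = [float(r[explanatory]) for r in rows]
    ys = [float(r[response]) for r in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    model = {
        "response": response,
        "explanatory": explanatory,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }
    if into is not None:
        models[into] = model  # analogous to the `into` keyword
    return model


def apply_model(name, rows, as_field=None):
    """Add a prediction field to each row using a previously saved model."""
    model = models[name]
    # Assumed default field name; `as_field` plays the role of `as`.
    out_field = as_field or "predicted({})".format(model["response"])
    result = []
    for r in rows:
        row = dict(r)
        x = float(r[model["explanatory"]])
        row[out_field] = model["intercept"] + model["slope"] * x
        result.append(row)
    return result


training = [{"_time": t, "errors": 2 * t + 1} for t in range(5)]
fit_linear(training, "errors", "_time", into="errors_over_time")
print(apply_model("errors_over_time", [{"_time": 10}], as_field="predicted_errors"))
```

The new rows passed to `apply_model` need only the explanatory field, which corresponds to the requirement that applied results carry the same fields the model was trained on.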

## summary

Use the `summary` command to return a summary of a machine learning model that was learned using the `fit` command. The summary is algorithm-specific. For example, the summary for the LinearRegression algorithm is a list of coefficients. The summary for the LogisticRegression algorithm is a list of coefficients for each class.

**Syntax**

summary <model_name>

**Examples**

Inspect a learned LinearRegression model "errors_over_time":

| summary errors_over_time

## listmodels

Use the `listmodels` command to return a list of machine learning models that were learned using the `fit` command. The algorithm and arguments given when `fit` was invoked are displayed for each model.

**Syntax**

listmodels

**Examples**

List all models:

| listmodels

## deletemodel

Use the `deletemodel` command to delete a machine learning model learned using the `fit` command.

**Syntax**

deletemodel <model_name>

**Examples**

Delete the "errors_over_time" model:

| deletemodel errors_over_time

## sample

Use the `sample` command to randomly sample or partition events.

Sampling modes:

`ratio`

: A float between 0 and 1 indicating the probability that each event has of being included in the result set. For example, a ratio of 0.01 means that each event has a 1% probability of being included in the results. Use `ratio` when you want an approximation.

`count`

: A number that indicates the exact number of randomly chosen events to return. If the sample count exceeds the total number of events in the search, all events are returned.

`proportional`

: The name of a numeric field to use to determine the sampling probability of each event, which yields a biased sampling. Each event is sampled with a probability specified by this field value.

You can omit the `ratio` keyword. For example, use `| sample ratio=0.01` or `| sample 0.01`.

You can omit the `count` keyword. For example, use `| sample count=10` or `| sample 10`.
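The difference between the `ratio` and `count` modes can be sketched in Python. This is a minimal illustration of the semantics described above, not the command's implementation; the function names are hypothetical, and the optional `seed` argument only makes the sketch reproducible.

```python
import random


def sample_ratio(events, ratio, seed=None):
    """Keep each event independently with probability `ratio`.

    The number of events returned is therefore only approximate.
    """
    rng = random.Random(seed)
    return [e for e in events if rng.random() < ratio]


def sample_count(events, count, seed=None):
    """Return exactly `count` randomly chosen events.

    If `count` exceeds the number of events, all events are returned.
    """
    rng = random.Random(seed)
    if count >= len(events):
        return list(events)
    return rng.sample(events, count)


events = list(range(1000))
print(len(sample_ratio(events, 0.01, seed=42)))  # roughly 10, varies by seed
print(len(sample_count(events, 20, seed=42)))    # exactly 20
```

Because `ratio` makes an independent decision per event, it needs no knowledge of the total event count, whereas `count` guarantees the size of the result.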

Partitioning mode:

`partitions`

: The number of partitions into which to randomly divide events, approximately evenly. Use `partitions` when you want to divide your results into groups for different purposes, such as using results for testing and training.
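As a rough illustration of partitioning, the following Python sketch tags every event with a random partition number and then uses that number to separate training and testing data. The function name is hypothetical, and numbering the partitions from 0 is an assumption made for the sketch.

```python
import random


def sample_partitions(events, partitions, fieldname="partition_number", seed=None):
    """Assign each event a random partition number in [0, partitions).

    Every event is kept; only a new field is added. Numbering from 0
    is an assumption of this sketch.
    """
    rng = random.Random(seed)
    tagged = []
    for e in events:
        row = dict(e)
        row[fieldname] = rng.randrange(partitions)
        tagged.append(row)
    return tagged


rows = sample_partitions([{"id": i} for i in range(1000)], 10, seed=1)
# Use about 70% of the events for training and the rest for testing.
train = [r for r in rows if r["partition_number"] < 7]
test = [r for r in rows if r["partition_number"] >= 7]
print(len(train), len(test))
```

Since each event is assigned independently, the partitions are only approximately equal in size.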

Additional options:

`seed`

: A number that specifies a random seed. Using `seed` ensures reproducible results. If unspecified, a pseudorandom value is used.

`by <field>`

: Used with `count`. Specifies a field by which to split events, returning the `count` number of events for each value of the specified field. If a field value has fewer events than `count`, all events for that value are included in the results.

`inverse`

: Used with `proportional`. Inverts the probability, returning samples with one minus the probability specified in the proportional field.

`fieldname`

: Used with `partitions`. The name of the field in which to store the partition number. Defaults to `partition_number`.
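The `proportional`, `inverse`, and `by <field>` options can be sketched in Python as follows. This is an illustration under stated assumptions rather than the command's implementation: the function names are hypothetical, the proportional field is assumed to already hold probabilities between 0 and 1, and a group with fewer events than `count` is assumed to be returned whole, mirroring the behavior described for `count`.

```python
import random
from collections import defaultdict


def sample_proportional(events, field, inverse=False, seed=None):
    """Keep each event with the probability stored in `field`.

    With inverse=True the probability becomes one minus the field value.
    """
    rng = random.Random(seed)
    kept = []
    for e in events:
        p = float(e[field])
        if inverse:
            p = 1.0 - p
        if rng.random() < p:
            kept.append(e)
    return kept


def sample_count_by(events, count, by, seed=None):
    """Return up to `count` random events for each value of the `by` field."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for e in events:
        groups[e[by]].append(e)
    kept = []
    for group in groups.values():
        # Assumption: small groups are returned in full.
        kept.extend(group if len(group) <= count else rng.sample(group, count))
    return kept


events = [{"host": "a", "weight": 1.0}] * 5 + [{"host": "b", "weight": 0.0}] * 2
print(len(sample_proportional(events, "weight", seed=0)))
print(len(sample_proportional(events, "weight", inverse=True, seed=0)))
print(len(sample_count_by(events, 3, "host", seed=0)))
```

With weights of exactly 1.0 and 0.0 the outcome is deterministic: the first call keeps only the host `a` events, and the inverse call keeps only the host `b` events.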

This `sample` command is not identical to using sampling options on the **Event Sampling** menu on the Search page in Splunk Web:

- Options from the **Event Sampling** menu perform sampling before the data is collected from indexes, at the beginning of the search pipeline.
- The `sample` command is applied after data is collected, accessing everything in the search pipeline.

Using the **Event Sampling** menu options is faster, but the `sample` command can be used anywhere in the search and provides several modes that are not available to the **Event Sampling** feature. For example, the `sample` command supports partitioning, biased sampling, and the ability to retrieve an exact number of results.

**Syntax**

sample (ratio=<float between 0 and 1>)? (count=<positive integer>)? (proportional=<name of numeric field> (inverse)?)? (partitions=<natural number greater than 1> (fieldname=<string>)?)? (seed=<number>)? (by <split_by_field>)?

**Examples**

Retrieve approximately 1% of events at random:

... | sample ratio=0.01

... | sample 0.01

Retrieve exactly 20 events at random:

... | sample count=20

... | sample 20

Retrieve exactly 20 events at random from each host:

... | sample count=20 by host

Return each event with a probability determined by the value of "some_field":

... | sample proportional="some_field"

Partition events into 7 groups, with each event's group number returned in a field called "partition_number":

... | sample partitions=7 fieldname="partition_number"

## score

The `score` command runs statistical tests to validate model outcomes. Use the `score` command to validate models and statistical tests for any use case.

**Syntax**

... | score <scoring_method> (<field> | <selector>=<field>)* [as <outputfield>] [by <splitfield>] [<arg> | <param>=<value>]* | ...

**Example**

... | score confusion_matrix true="species" pred="predicted(species)"
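For readers unfamiliar with what a confusion matrix contains, the following Python sketch counts how often each true label was paired with each predicted label. It illustrates the general technique only and makes no claim about the layout of the command's actual output; the function name is hypothetical.

```python
from collections import Counter


def confusion_matrix(rows, true_field, pred_field):
    """Return (labels, matrix) where matrix[i][j] counts rows whose true
    label is labels[i] and whose predicted label is labels[j]."""
    counts = Counter((r[true_field], r[pred_field]) for r in rows)
    labels = sorted({r[true_field] for r in rows} | {r[pred_field] for r in rows})
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


rows = [
    {"species": "setosa", "predicted(species)": "setosa"},
    {"species": "setosa", "predicted(species)": "virginica"},
    {"species": "virginica", "predicted(species)": "virginica"},
]
labels, matrix = confusion_matrix(rows, "species", "predicted(species)")
print(labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Values on the diagonal are correct predictions, and off-diagonal values show which classes are confused with each other.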

The Splunk Machine Learning Toolkit includes the following classes of the `score` command, each with its own set of methods (for example, Accuracy, F1-score, and T-test):

- Classification
- Clustering scoring
- Pairwise distances scoring
- Regression scoring
- Statistical functions (statsfunctions)
- Statistical testing (statstest)

Score commands are not customizable within the Splunk Machine Learning Toolkit.

The Splunk Machine Learning Toolkit also helps you test for model overfitting through the K-fold scoring option.
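This page does not describe how the K-fold option is invoked, so the following is only a generic Python sketch of the idea behind k-fold cross-validation: the data is divided into k folds, and each fold takes one turn as the held-out test set while the remaining folds are used for training. The function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_events, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_events))
    # Deal indices into k folds round-robin so sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_splits(10, 3):
    print(len(train), len(test))
```

A model that scores well on its training folds but poorly on the held-out folds is likely overfitting.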

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 4.1.0, 4.2.0, 4.3.0