The basic process of machine learning

Machine learning is a process for generalizing from examples. You can use these generalizations, typically called models, to perform a variety of tasks such as predicting the value of a field, forecasting future values, or detecting anomalies.

1. Clean and transform your data

Many analytics have explicit requirements, such as timestamps or fields with numeric values. Other analytics perform better when the data meets certain criteria, such as a field having no arbitrarily large values. Before applying machine learning, clean or transform your data to meet these explicit requirements. Later, after you fit and validate a machine learning model, you might need to apply additional transformations to your data to improve the quality of the model.
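
As a rough illustration of this step, the following plain-Python sketch enforces two example requirements: a parseable timestamp and a finite numeric value. It also clips values to a fixed bound so none is arbitrarily large. It is not how the Machine Learning Toolkit cleans data. The function name clean_events, the dict-per-event layout, the ISO-8601 format of the _time field, and the symmetric clipping bound are all assumptions made for the example.

```python
import math
from datetime import datetime

def clean_events(events, field, cap):
    """Hypothetical cleaning step: keep events that have a parseable ISO-8601
    _time and a finite numeric `field`, then clip that value to [-cap, cap]."""
    cleaned = []
    for event in events:
        try:
            ts = datetime.fromisoformat(event["_time"])  # assumed timestamp format
            value = float(event[field])
        except (KeyError, TypeError, ValueError):
            continue  # drop events that fail the explicit requirements
        if not math.isfinite(value):
            continue  # NaN and infinity are not usable numeric values
        clipped = max(-cap, min(cap, value))  # keep values from being arbitrarily large
        cleaned.append({"_time": ts, field: clipped})
    return cleaned

events = [
    {"_time": "2024-01-01T00:00:00", "bytes": "512"},
    {"_time": "2024-01-01T00:01:00", "bytes": "n/a"},  # non-numeric: dropped
    {"_time": "bad", "bytes": "3"},                     # bad timestamp: dropped
    {"_time": "2024-01-01T00:02:00", "bytes": "1e9"},   # clipped to the cap
]
print(clean_events(events, "bytes", cap=10_000.0))
```

Clipping is only one possible transformation. Depending on the analytic, a log transform or simply dropping extreme values might suit better.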

2. Fit the model

Using training data, fit a model that is appropriate for the task at hand. The examples in the Machine Learning Toolkit use the following algorithms:

The Linear regression algorithm predicts a numeric field.
The Logistic regression algorithm predicts a categorical field.
The Distribution statistics algorithm finds values that are far from previous values.
The Probabilistic measures algorithm finds events that contain unusual combinations of values.
The State-space Method using Kalman Filter algorithm predicts likely future values from past values of a numerical time series.
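
To show what fitting means for the first algorithm in this list, the following plain-Python sketch fits a one-feature linear regression by ordinary least squares. It is not the Machine Learning Toolkit's implementation. The function names fit_linear and predict_linear are hypothetical, and the sketch assumes a single numeric explanatory field.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_linear(model, x):
    """Apply a fitted (slope, intercept) model to a new x value."""
    slope, intercept = model
    return slope * x + intercept

model = fit_linear([1, 2, 3, 4], [3, 5, 7, 9])  # training data lies on y = 2x + 1
print(model, predict_linear(model, 10))
```

The fitted (slope, intercept) pair is the model. It generalizes from the training examples, so it can predict the numeric field for x values it has never seen.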

For a list of all supported algorithms, see Algorithms.

3. Validate the model

Validation involves training a model with one portion of your data (the training set) and then testing it with a different portion (the test set). For prediction tasks, validation often involves randomly assigning each event to one set or the other. For forecasting tasks, the training set is a prefix of the time-ordered data, and the test set is the remaining suffix, which is withheld so it can be compared against the forecasts.
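
The two splitting strategies can be sketched in plain Python as follows. The function names, the 30% default test fraction, and the fixed random seed are illustrative assumptions. The assistants perform the split for you, and this is not their code. The random split shuffles a copy of the events and slices it, which produces a random partition with an exact test-set size.

```python
import random

def random_split(events, test_fraction=0.3, seed=0):
    """Prediction tasks: randomly partition events into (train, test)."""
    rng = random.Random(seed)  # seeded so the partition is reproducible
    shuffled = list(events)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def prefix_suffix_split(series, test_size):
    """Forecasting tasks: train on a prefix, withhold the last test_size points."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must leave a non-empty prefix and suffix")
    return series[:-test_size], series[-test_size:]

train, test = random_split(range(10))
print(len(train), len(test))
print(prefix_suffix_split([1, 2, 3, 4, 5, 6], 2))
```

A random split suits forecasting poorly. It lets the model train on points that come after the ones it is tested on, which is why the forecasting split keeps time order.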

You can validate a trained model against the test set in several ways, depending on the type of model. Each assistant provides a few validation methods in its Validate section, which appears after you train a model.

For example, when predicting a numeric field, we are primarily interested in how far the predictions deviate from the actual values. The Predict Numeric Fields assistant provides several validation methods, including statistical methods such as root mean squared error (RMSE) and visual methods such as a scatterplot. For other analytics, or for other applications of those analytics, other methods might be more appropriate.
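
Root mean squared error is the square root of the mean of the squared differences between actual and predicted values. It is expressed in the same units as the predicted field. A minimal plain-Python version follows. The function name rmse is illustrative and is not an assistant API.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted) ** 2))."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

print(rmse([0, 0], [3, -3]))  # both predictions miss by 3
```

Squaring the differences makes large misses count more than small ones. A model with a few large errors can therefore have a higher RMSE than one with many small errors.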

4. Refine the model

If the validation step reveals weaknesses in your model or if the error is too high, you can adjust the parameters to try to improve the relevant metrics. To review a history of the parameter combinations you have tried along with their corresponding validation metrics, click the Load Existing Settings tab in any assistant.
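
The refine loop can be sketched as trying several parameter values, validating each one, and keeping a history of (parameters, metric) rows, which loosely mirrors what the Load Existing Settings tab lets you review. The sketch uses a hypothetical moving-average forecaster whose only parameter is a window size. That forecaster is a stand-in chosen for brevity and is not one of the Machine Learning Toolkit algorithms. The function names are also hypothetical.

```python
import math

def moving_average_forecast(train, window, horizon):
    """Forecast each of the next `horizon` points as the mean of the last `window` points."""
    level = sum(train[-window:]) / window
    return [level] * horizon

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def tune_window(series, test_size, windows):
    """Validate each window on a withheld suffix; return the best row and the full history."""
    train, test = series[:-test_size], series[-test_size:]
    history = []
    for window in windows:
        if window < 1 or window > len(train):
            continue  # skip windows the training prefix cannot support
        forecast = moving_average_forecast(train, window, len(test))
        history.append({"window": window, "rmse": rmse(test, forecast)})
    if not history:
        raise ValueError("no usable window sizes")
    best = min(history, key=lambda row: row["rmse"])
    return best, history

best, history = tune_window([1, 2, 3, 4, 5, 6, 10, 10], test_size=2, windows=[1, 3, 6, 10])
print(best)
for row in history:
    print(row)
```

Keeping the whole history, and not only the best row, makes it possible to see how sensitive the metric is to each parameter before you commit to a setting.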

5. Deploy the model

A model is ready to be deployed after passing the validation step. You can deploy a model in different ways, depending on its type and the intended application. The Deploy section of each assistant suggests common deployment actions for that type of model.

Generally, these deployment actions fall into the following categories:

Make predictions or forecasts.

The prediction or forecast made by a model might be useful directly or as the input to another analytic.

Detect outliers and anomalies.

A model encodes expectations. When a model's predictions or forecasts fail to match reality, particularly when they usually do match, the mismatch is an anomaly. Detecting outliers and other anomalies in data is an extremely common use case for machine learning.
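
One simple way to turn expectations into anomaly detection is sketched below. First, measure how closely the model usually matches reality on held-out data. Then flag new events whose error is several times larger than that. The choice of RMSE as the scale, the multiplier k=3, and the function names are illustrative assumptions. This is not the detection logic of any particular assistant.

```python
import math

def anomaly_threshold(val_actual, val_predicted, k=3.0):
    """k times the RMSE the model achieved on held-out data where it usually matches reality."""
    squared = [(a - p) ** 2 for a, p in zip(val_actual, val_predicted)]
    return k * math.sqrt(sum(squared) / len(squared))

def flag_anomalies(actual, predicted, threshold):
    """Return indices where the prediction misses the actual value by more than threshold."""
    return [i for i, (a, p) in enumerate(zip(actual, predicted)) if abs(a - p) > threshold]

threshold = anomaly_threshold([10, 11, 9, 10], [10, 10, 10, 10])  # typical error is small
print(threshold)
print(flag_anomalies([10, 12, 25], [10, 10, 10], threshold))  # the 25 stands out
```

The threshold is derived from data where the model is known to perform well, so a single extreme event cannot inflate the threshold and hide itself.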

Trigger or inform an action.

Both of the preceding applications ultimately aim to support action. For example, an anomaly might trigger an alert, which leads to a response, which leads to a fix. Similarly, a forecast, such as an earnings forecast, might inform a strategic business discussion.
