Predict Categorical Fields
The Predict Categorical Fields assistant uses a type of learning known as classification. A classification algorithm learns the tendency for data to belong to one category or another based on related data. The classification table below shows the actual state of the field versus the predicted state of the field. The yellow bar highlights an incorrect prediction.
The Predict Categorical Fields assistant uses the following classification algorithms:
Fit a model to predict a categorical field
- For information about preprocessing, see Preprocessing in the Splunk Machine Learning Toolkit ML-SPL API Guide.
- If you are not sure which algorithm to choose, start with the default algorithm, Logistic Regression, or see Algorithms.
- Run a search.
- (Optional) Add preprocessing steps.
- Select the algorithm to use to predict field values.
- Select the categorical field you want to predict. This list of fields is populated by the search you just ran.
- Select a combination of fields you want to use to predict the categorical field. This list contains all of the fields from your search except for the field you selected to predict.
- Specify how much of your data to use for training (fitting the model) versus testing (validating the model afterwards). The data is divided randomly into two groups. The default split is 50/50.
- Fill out any additional fields required by the algorithm you selected. To get information about a field, hover over it to see a tooltip.
- Enter a name in the Save the model as field. The model is saved when you click outside the field. You must specify a name for the model in order to fit the model on a schedule or to schedule an alert. You can find your model in the saved history.
- Click Fit Model.
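The fitting steps above (random training/test split, then fitting the default algorithm) can be sketched outside Splunk in plain Python. This is a minimal sketch under several assumptions: each row is already a tuple of numeric feature values paired with a 0/1 label (so the categorical field has only two classes), the names `train_test_split`, `fit_logistic_regression`, and `predict` are hypothetical, and the model is an unregularized logistic regression trained by batch gradient descent. It illustrates the idea only; it is not how the toolkit itself implements Logistic Regression.

```python
import math
import random


def train_test_split(rows, train_fraction=0.5, seed=None):
    """Randomly divide rows into a training set and a test set.

    train_fraction=0.5 mirrors the assistant's default 50/50 split; seed is a
    hypothetical parameter that makes the shuffle reproducible.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, learning_rate=0.1, epochs=500):
    """Fit a two-class (0/1) logistic regression with batch gradient descent.

    No regularization is applied, unlike many library implementations.
    """
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label  # predicted probability minus true label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient of the log loss.
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias


def predict(model, features):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    weights, bias = model
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if z >= 0 else 0
```

For example, `train, test = train_test_split(rows)` followed by `fit_logistic_regression` on the training features and labels, then `predict` on each test row, reproduces the fit-then-validate flow. Library implementations such as scikit-learn's add regularization, faster solvers, and multi-class support.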
Interpret and validate
After you fit the model, review the prediction results and visualizations to see how well the model predicted the categorical field. The metrics in this analysis measure how often the model misclassifies the field, and they are calculated from the counts of true positives, true negatives, false positives, and false negatives.
| Statistic | Description |
|---|---|
| Precision | The percentage of the time that a predicted class is the correct class. |
| Recall | The percentage of the time that the correct class is predicted. |
| Accuracy | The overall percentage of correct predictions. |
| F1 | The harmonic mean of precision and recall, on a scale from zero to one. The closer the statistic is to one, the better the fit of the model. |
| Classification Results (Confusion Matrix) | A table that charts the number of actual results against predicted results. The shaded diagonal cells, where the actual class equals the predicted class, should be high (closer to 100%), while the other cells should be closer to 0. |
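To make the definitions in the table concrete, the following sketch computes the same statistics from lists of actual and predicted values. The function names are hypothetical, precision, recall, and F1 are computed for a single class treated as "positive", and the results are fractions from 0 to 1 rather than percentages. For a field with more than two classes, tools typically average the per-class values; how the assistant averages them is not covered here.

```python
def confusion_matrix(actual, predicted, labels):
    """Return counts[i][j]: rows whose actual class is labels[i]
    and whose predicted class is labels[j]. The diagonal holds correct predictions."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        counts[index[a]][index[p]] += 1
    return counts


def accuracy(actual, predicted):
    """Fraction of rows where the prediction matches the actual class."""
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)


def class_metrics(actual, predicted, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With `actual = ["yes", "yes", "no", "no", "yes"]` and `predicted = ["yes", "no", "no", "yes", "yes"]`, `confusion_matrix(actual, predicted, ["yes", "no"])` returns `[[2, 1], [1, 1]]`, accuracy is 0.6, and precision, recall, and F1 for "yes" are each 2/3.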
Refine the model
After you validate the model, you can refine the model by adjusting which fields you use to predict the categorical field and fit the model again:
- Remove fields that might add noise rather than useful information for the prediction.
- Try adding more fields. In the Load Existing Settings tab, which displays a history of models you have fitted, sort by the statistics to see which combination of fields yielded the best results.
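Comparing field combinations by their statistics, as described above, can also be done systematically by scoring every combination of candidate fields. In this sketch, `rank_field_combinations` is a hypothetical helper, and `score` is an assumed caller-supplied function (not part of the toolkit) that fits a model on the given fields and returns a statistic such as F1, where higher is better.

```python
from itertools import combinations


def rank_field_combinations(fields, score, min_size=1):
    """Score every combination of candidate fields and return them best-first.

    score(combo) is a hypothetical callback that fits a model on the fields in
    combo and returns a statistic such as F1 (higher is better).
    """
    results = []
    for size in range(min_size, len(fields) + 1):
        for combo in combinations(fields, size):
            results.append((score(combo), combo))
    # Highest score first; the sort is stable, so ties keep enumeration order.
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

Because n fields produce 2^n - 1 non-empty combinations, exhaustive search is practical only for a handful of fields; with more fields, adding or removing one field at a time and refitting is the cheaper approach the steps above describe.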
Deploy the model
After you validate and refine the model, deploy it.
- Click the icon to the right of Fit Model to schedule model training. You can set up a regular interval to fit the model, such as every week.
- (Optional) To access the scheduled training job, click Scheduled Jobs > Scheduled Training in the menu.
- Click Open in Search to open a new Search tab. This shows you the search query that uses all data, not just the training set.
- Click Show SPL to see the search query that was used to fit the model. For example, you could use this same query on a different data set.
- Click Schedule Alert to set up an alert that is triggered when the predicted value meets a threshold you specify.
- After you save the alert, you can access it from the Scheduled Jobs > Alerts menu. For more information about alerts, see Getting started with alerts in the Splunk Enterprise Alerting Manual.
This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 2.4.0, 3.0.0, 3.1.0