Test a predictive model in ITSI

Always evaluate a model to determine whether it will do a good job predicting future health scores. Assessing the quality of a predictive model tells you what to expect when you use that model to make predictions, that is, how close its predictions are likely to be to the eventual outcomes.

ITSI provides several industry-standard metrics and insights to review the predictive accuracy of your model. Although ITSI provides a recommendation based on the best-performing algorithm, it is important to review the visualizations to decide if your model is performing well enough to match your business requirements.

Select a model to test under Test a Model. A checkmark indicates which regression model had the best overall performance. For more information about the difference between regression and classification algorithms, see Choose an algorithm type.

Use these questions to help you choose the best model:

Which model performed best on the test set (the "recommended" model)?
Does the model perform well across various performance metrics?

Do not change the Test Period unless you want to test on all of the available data. Keeping the data in separate train and test partitions, as configured in the training/test split, provides an honest assessment of the model's performance, because the model is then evaluated on data it did not see during training.
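ITSI performs the training/test split for you. To illustrate why the held-out partition matters, the following Python sketch splits a time-ordered series of health scores without shuffling, so the test partition contains only later scores that the model never saw during training. The function name `chronological_split`, the 70/30 default, and the sample scores are hypothetical, and this sketch does not claim to reproduce how ITSI partitions data internally.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    scores: Sequence[float], train_fraction: float = 0.7
) -> Tuple[List[float], List[float]]:
    """Split time-ordered scores into train and test partitions without shuffling.

    The earliest scores go to the train partition and the latest to the test
    partition, so the model is evaluated only on data it never saw in training.
    The 0.7 default is a hypothetical value, not an ITSI setting.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(scores) * train_fraction)
    return list(scores[:cut]), list(scores[cut:])


# Hypothetical health scores, ordered from oldest to newest.
health_scores = [100, 98, 97, 95, 90, 85, 88, 92, 96, 99]
train, test = chronological_split(health_scores)
print(len(train), len(test))
```

Testing on all of the available data would amount to scoring the model on `train + test`, which includes the scores it learned from, so the resulting metrics tend to look better than the model's real predictive ability.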

After you select the best model for your needs, click Save to save the model into your service. You can now use the model to perform root cause analysis in the Predictive Analytics dashboard.

When you create models, save the model for only one service at a time. Saving multiple models in separate windows might cause an error.

Test a regression model

Use the following metrics to test and evaluate a regression model:

Metric	Description
R²	R² represents the proportion of the variability in the result that the model explains. A value of 1 (100 percent) means the model fits perfectly. The closer the value is to 1, the better the result.
RMSE	Root Mean Squared Error (RMSE) measures the difference between the values predicted by the model and the values observed. RMSE is essentially the standard deviation of the residuals, expressed in the same units as the health score. It gives you an idea of how close or far the model is from completely accurate predictions: the lower the value, the better. Because RMSE depends on the scale of the data, its values only make sense within one dataset. Do not compare them across datasets.
Actual vs. Predicted Service Health Score	A sequential overlay that compares the actual service health score with the score the model predicted 30 minutes into the future.
Residual Error Histogram	A histogram of the residual error of each service health score during the test period. The residual error is the difference between the observed service health score and the score the model predicted. A histogram that is approximately normally distributed and tightly centered on 0 indicates the most accurate model.
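As a rough illustration of what the regression metrics measure, the following Python sketch computes residuals, RMSE, and R² with their standard textbook formulas, and bins the residuals the way a residual error histogram does. The function names, the 5-point bin width, and the sample scores are hypothetical, and ITSI's internal calculation might differ in detail.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def residuals(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Residual error for each test point: observed score minus predicted score."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    return [a - p for a, p in zip(actual, predicted)]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the health score."""
    res = residuals(actual, predicted)
    return math.sqrt(sum(r * r for r in res) / len(res))


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    res = residuals(actual, predicted)
    mean = sum(actual) / len(actual)
    ss_res = sum(r * r for r in res)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R² is undefined when every actual score is identical")
    return 1.0 - ss_res / ss_tot


def residual_histogram(res: Sequence[float], bin_width: float = 5.0) -> Dict[float, int]:
    """Count residuals per bin; each key is the lower edge of a bin."""
    counts = Counter(math.floor(r / bin_width) * bin_width for r in res)
    return dict(sorted(counts.items()))


# Hypothetical observed and predicted health scores from a test period.
actual = [90, 85, 70, 95]
predicted = [88, 86, 74, 95]
print(rmse(actual, predicted))       # about 2.29
print(r_squared(actual, predicted))  # about 0.94
print(residual_histogram(residuals(actual, predicted)))  # {-5.0: 2, 0.0: 2}
```

With this formula, R² computed on a test set can drop below 0 when the model predicts worse than simply using the mean of the actual scores, so a negative value is a strong sign that the model is not useful.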

Test a classification model

Because classification divides your data into separate categories, your data must be highly variable for logistic regression to work. If your data is fairly stable and usually hovers around the same health score, logistic regression cannot divide it into three distinct categories.
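To illustrate why stable data is a problem, the following Python sketch maps health scores to severity categories using fixed boundaries and counts how many scores land in each category. The boundary values and the "Medium" label are hypothetical assumptions for illustration, not ITSI's actual classification thresholds. When every score falls into the same category, a classifier has only one class to learn from.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# Hypothetical severity boundaries, checked from highest to lowest.
# These values and the "Medium" label are illustrative assumptions,
# not ITSI's actual classification thresholds.
HYPOTHETICAL_BOUNDARIES: Tuple[Tuple[float, str], ...] = (
    (80.0, "Normal"),
    (40.0, "Medium"),
    (float("-inf"), "Severe"),
)


def to_severity(score: float) -> str:
    """Map a health score to the first severity whose lower bound it meets."""
    for lower_bound, label in HYPOTHETICAL_BOUNDARIES:
        if score >= lower_bound:
            return label
    raise ValueError(f"cannot classify score {score!r}")


def class_counts(scores: Sequence[float]) -> Dict[str, int]:
    """How many scores fall into each severity category."""
    return dict(Counter(to_severity(s) for s in scores))


stable_scores = [97, 95, 99, 96, 98]
variable_scores = [97, 62, 15, 88, 35, 71]
print(class_counts(stable_scores))    # {'Normal': 5}
print(class_counts(variable_scores))  # {'Normal': 2, 'Medium': 2, 'Severe': 2}
```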

Use the following metrics to test and evaluate a classification model:

Metric	Description
Precision	The percentage of a model's predictions for a given severity that are correct. For example, of all data points predicted as Severe, Precision measures how many are actually Severe. Precision is a good metric to use when the cost of a false positive is high. A false positive occurs when a health score that is actually Normal is predicted to be Severe. In this case, a user might waste important time investigating a predicted outage that never occurs.
Recall	The percentage of actual occurrences of a given severity that the model predicts correctly. That is, of all data points that are actually Severe, Recall measures how many the model predicted as Severe. Recall is a good metric to use when there is a high cost associated with a false negative. For example, if a Severe health score is predicted as Normal, your model might not detect a potential outage, and the consequence can be bad for your business.
Accuracy	The overall percentage of correct predictions out of all predictions made. Accuracy can be misleading when one severity is much more common than the others. For example, if most health scores are Normal, a model that always predicts Normal has high accuracy but never detects a Severe score. In these cases, it can be better to select a model with lower accuracy but higher Precision, Recall, or F1.
F1	The harmonic mean of Precision and Recall, on a scale from zero to one. The closer the statistic is to one, the better the fit of the model. The F1 score is a better measure to use if you need to balance Precision and Recall.
Classification Results (Confusion Matrix)	The number of actual results against predicted results. The actual severities form the columns, and the predicted severities form the rows. Each cell, at the intersection of a row and a column, shows how many data points had that combination of predicted and actual severity. The main diagonal (which generally has larger numbers) gives the correct predictions, that is, the cases where the actual value and the model prediction are the same. Ideally, the diagonal accounts for close to 100 percent of the total values, and the other cells are close to 0.
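The following Python sketch shows how the classification metrics relate to one another, using their standard definitions. It builds a confusion matrix with predicted severities as rows and actual severities as columns, and computes Accuracy plus per-severity Precision, Recall, and F1. The function names, the sample labels, and the "Medium" label are hypothetical, and ITSI might aggregate these metrics across severities differently.

```python
from typing import List, Sequence, Tuple

# Hypothetical label set; "Medium" is an assumption for illustration.
LABELS = ("Normal", "Medium", "Severe")


def confusion_matrix(
    actual: Sequence[str], predicted: Sequence[str], labels: Sequence[str] = LABELS
) -> List[List[int]]:
    """Rows are predicted severities and columns are actual severities."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of all predictions that match the actual severity."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def precision_recall_f1(
    actual: Sequence[str], predicted: Sequence[str], positive: str
) -> Tuple[float, float, float]:
    """Precision, Recall, and F1 for one severity, treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of Precision and Recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


actual = ["Normal", "Normal", "Normal", "Severe", "Severe", "Medium"]
predicted = ["Normal", "Severe", "Normal", "Severe", "Normal", "Medium"]
print(confusion_matrix(actual, predicted))  # [[2, 0, 1], [0, 1, 0], [1, 0, 1]]
print(precision_recall_f1(actual, predicted, "Severe"))  # (0.5, 0.5, 0.5)
```

For instance, with nine Normal scores and one Severe score, a model that always predicts Normal reaches an accuracy of 0.9 but a Severe Recall of 0, which illustrates why a model with lower accuracy can still be the better choice.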
