Splunk® Machine Learning Toolkit

User Guide

This documentation does not apply to the most recent version of Splunk® Machine Learning Toolkit. For documentation on the most recent version, go to the latest release.

Predict Categorical Fields

[Screenshot: the Predict Categorical Fields assistant]

The Predict Categorical Fields assistant demonstrates a type of learning commonly known as classification, using the Logistic Regression algorithm. A classification algorithm learns the tendency of data to belong to one category or another, based on related data.
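To make the idea of learning a tendency concrete, here is a minimal, hand-rolled Python sketch of binary logistic regression trained by batch gradient descent. It is not the toolkit's implementation. The function names, the learning rate, the epoch count, and the toy data are all illustrative assumptions. The model turns a weighted sum of the input fields into a probability with the logistic (sigmoid) function and predicts class 1 when that probability reaches 0.5.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(rows, labels, learning_rate=0.1, epochs=1000):
    """Fit weights and a bias by batch gradient descent on the log-loss.

    rows: list of numeric feature lists (one per event).
    labels: list of 0/1 class labels, one per row.
    """
    n_rows, n_features = len(rows), len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Predicted probability of class 1 minus the true label
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step against the average gradient
        weights = [w - learning_rate * g / n_rows for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_rows
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Return 1 if the predicted probability of class 1 reaches the threshold, else 0."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if p >= threshold else 0


# Hypothetical one-feature data: low values are class 0, high values are class 1
rows = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic_regression(rows, labels)
print(predict(weights, bias, [-1.8]), predict(weights, bias, [1.8]))  # prints: 0 1
```

The toy feature is centered on zero only to keep the example small and stable. Real fields usually benefit from scaling before gradient descent.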

Algorithm

  • Logistic Regression

Workflow

To predict categorical fields, you must fit (train) a model. The basic steps are as follows:

  1. Enter a search to retrieve your data, then click the search button to run it.
  2. Select the categorical field you want to predict. This list of fields is populated by the search you just ran.
  3. Select a combination of fields you want to use for predicting the categorical field. This list contains all of the fields from your search except for the field you selected to predict.
  4. Specify how much of your data to use for training (fitting the data model) versus testing (validating the model afterwards). The data is divided randomly into two groups. The default split is 50/50.
  5. Name the model. This name and the settings you select are saved in the history.
  6. Click Fit Model.
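The random division in step 4 can be pictured with the short Python sketch below. It shows one common way to make a random split and is not a description of how the assistant divides the data internally. The function name split_train_test and its seed parameter are hypothetical.

```python
import random


def split_train_test(records, train_fraction=0.5, seed=None):
    """Randomly divide records into a training set and a test set.

    train_fraction=0.5 mirrors the assistant's default 50/50 split.
    A fixed seed makes the split reproducible between runs.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


events = list(range(10))  # stand-ins for ten search results
train, test = split_train_test(events, train_fraction=0.5, seed=7)
print(len(train), len(test))  # prints: 5 5
```

Because every record lands in exactly one group, the test set contains only events the model never saw during fitting, which is what makes the validation metrics meaningful.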

Interpret and validate

After you fit the model, review the prediction results and visualizations to see how well the model predicted the categorical field. The metrics in this analysis measure how often the model misclassifies the field, and are computed from the counts of true positives, true negatives, false positives, and false negatives.

  • Precision: Displays the percentage of the time a predicted class is the correct class.
  • Recall: Displays the percentage of the time that an actual class is correctly predicted.
  • Accuracy: Displays the overall percentage of correct predictions.
  • F1: Displays the harmonic mean of precision and recall, where 1 is best and 0 is worst.
  • Classification Results: Displays a chart of actual results against predicted results, also known as a confusion matrix.
    Interpretation: The shaded diagonal numbers should be high (closer to 100%), while the other numbers are better when closer to 0.
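The following Python sketch shows how these metrics follow from true and false positives and negatives, using a small, made-up set of test results for a yes/no field. It is an illustration rather than the toolkit's code. In particular, it computes per-class precision, recall, and F1 (the assistant may aggregate them across classes differently), and it assumes the confusion-matrix percentages are taken per actual class, so each row sums to 100%. All function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) pair; pairs with actual == predicted form the diagonal."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of all predictions that match the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def class_metrics(actual, predicted, positive):
    """Precision, recall, and F1 for one class, treating it as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def row_percentages(actual, predicted):
    """Express each confusion-matrix cell as a percentage of its actual class (row)."""
    row_totals = Counter(actual)
    return {
        (a, p): 100.0 * n / row_totals[a]
        for (a, p), n in confusion_matrix(actual, predicted).items()
    }


# Hypothetical test-set results for a yes/no field
actual = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
predicted = ["yes", "no", "yes", "no", "no", "no", "yes", "no"]

print(accuracy(actual, predicted))  # prints: 0.625
print([round(m, 3) for m in class_metrics(actual, predicted, "yes")])  # prints: [0.667, 0.5, 0.571]
print(row_percentages(actual, predicted)[("no", "no")])  # prints: 75.0
```

Here the "no" row has 75% on the diagonal while the "yes" row has only 50%, which is the pattern the Classification Results chart is meant to surface: this hypothetical model misses half of the actual "yes" events.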

Refine the model

After you validate the model, refine it by adjusting which fields you use to predict the categorical field, then fit the model again:

  • Remove fields that might add noise rather than useful information.
  • Try adding more fields. In the Load Existing Settings tab, which displays a history of models you have fitted, sort by the statistics to see which combination of fields yielded the best results.
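Comparing field combinations by their statistics can be sketched in Python as an exhaustive search that scores every subset of candidate fields and sorts the results best-first, much like sorting the model history. This is an illustrative assumption, not a toolkit feature: the score callable stands in for fitting and testing a model on the given fields, and the field names and scoring rule below are hypothetical.

```python
from itertools import combinations


def rank_field_combinations(candidate_fields, score, min_size=1, max_size=None):
    """Score every combination of predictor fields and return them best-first.

    score: callable that takes a tuple of field names and returns a metric
    (for example F1) from fitting and testing a model on those fields.
    """
    if max_size is None:
        max_size = len(candidate_fields)
    results = []
    for size in range(min_size, max_size + 1):
        for combo in combinations(candidate_fields, size):
            results.append((score(combo), combo))
    # Highest score first, like sorting the history by a statistic
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Hypothetical scorer: each useful field adds 1.0, the "noise" field costs 1.5
def toy_score(fields):
    return sum(-1.5 if f == "noise" else 1.0 for f in fields)


ranking = rank_field_combinations(["bytes", "status", "noise"], toy_score)
print(ranking[0])  # prints: (2.0, ('bytes', 'status'))
```

With n candidate fields there are 2^n - 1 non-empty combinations, so exhaustive search is only practical for a handful of fields. For larger sets, adding or removing one field at a time is the more realistic approach.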

Deploy the model

Once you have validated and refined a model and are satisfied with it, you can take the following actions:

  • Click the icon on the right side of the Fit Model button to schedule model training. You can set up a regular interval to fit the model, such as every week. After you save the schedule, you can access it from the Scheduled Jobs > Scheduled Training menu.
  • Click the Open in Search button next to the Fit Model button to open a new Search tab, filled out with a search query that uses all data (not just the training set).
  • Click the Show SPL button next to the Open in Search button to see the search query that was used to fit the model. For example, you could use this same query on a different data set.
  • Click the Schedule Alert button beneath the Prediction Results table to trigger an alert when the predicted value meets a threshold you specify. After you save the alert, you can access it from the Scheduled Jobs > Alerts menu.
Last modified on 01 September, 2016

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 1.3.0

