Splunk® Machine Learning Toolkit

User Guide

This documentation does not apply to the most recent version of Splunk® Machine Learning Toolkit. For documentation on the most recent version, go to the latest release.

Predict Categorical Fields Classic Assistant

Classic Assistants enable machine learning through a guided user interface. The Predict Categorical Fields Classic Assistant applies a type of machine learning known as classification. A classification algorithm learns the tendency for data to belong to one category or another based on related data.

The following classification table shows the actual state of the field versus the predicted state of the field. The yellow bar highlights an incorrect prediction.


Algorithms

The Predict Categorical Fields assistant uses the following classification algorithms to predict fields:

Create a model to predict a categorical field

Before you begin

  • The Predict Categorical Fields Assistant offers the option to preprocess your data. For details on the available preprocessing algorithms, see Preprocessing machine data.
  • By default, the toolkit selects the Logistic Regression algorithm. Use this default if you aren't sure which algorithm is best for you. For details on the other algorithm options, see Algorithms.

Workflow

Follow these steps for the Predict Categorical Fields Classic Assistant.

  1. From the MLTK navigation bar, select Classic > Assistants > Predict Categorical Fields.
  2. Run a search, and be sure to select a date range.
  3. (Optional) Click + Add a step to add preprocessing steps.
  4. Select an algorithm from the Algorithm drop-down menu.
  5. Select a target field from the drop-down menu Field to predict.
    When you select the Field to predict, the Fields to use for predicting drop-down populates with available fields to include in your model.
  6. Select a combination of fields from the drop-down menu Fields to use for predicting.
  7. Split your data into training and testing data. The default split is 50/50, and the data is divided randomly into two groups.

    The algorithm selected determines the fields available to build your model. Hover over any field name to get more information about that field.

  8. Type a name for the model in the Save the model as field.
    You must name the model before you can fit it on a schedule or schedule an alert.
  9. Click Fit Model.
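
To make the random split in step 7 concrete, the following Python sketch divides a list of events into a training group and a testing group. It is an illustration only: `split_events`, its parameters, and the sample events are hypothetical, and the sketch does not claim to reproduce how the assistant performs the split internally.

```python
import random


def split_events(events, train_fraction=0.5, seed=None):
    """Randomly divide events into a training group and a testing group.

    Hypothetical helper for illustration; not part of the MLTK.
    """
    rng = random.Random(seed)      # seeded generator for repeatable splits
    shuffled = list(events)        # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up events, each identified by a number.
events = [{"id": i} for i in range(10)]
train, test = split_events(events, train_fraction=0.5, seed=42)
print(len(train), len(test))
```

Every event lands in exactly one group, so the model is later tested on data it did not see during training.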

Interpret and validate

After you fit the model, review the prediction results and visualizations to see how well the model predicted the categorical field. The metrics in this analysis measure how often the field is misclassified, and are based on counts of true positives, true negatives, false positives, and false negatives.

Result: Application
Precision: The percentage of the time a predicted class is the correct class.
Recall: The percentage of the time that the correct class is predicted.
Accuracy: The overall percentage of correct predictions.
F1: The harmonic mean of precision and recall, on a scale from zero to one. The closer the statistic is to one, the better the fit of the model.
Classification Results (Confusion Matrix): This table charts the number of actual results against predicted results, also known as a Confusion Matrix. The shaded diagonal numbers should be high (closer to 100%), while the other numbers should be closer to 0.
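
As a rough illustration of how these statistics relate to the confusion matrix, the following Python sketch computes them for a two-class case from the four counts. The function name `classification_metrics` and the sample counts are hypothetical, and F1 is computed with its standard definition, the harmonic mean of precision and recall. For fields with more than two categories, the statistics are typically computed per class and then averaged. How the assistant averages them is not covered here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, accuracy, and F1 from two-class counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Hypothetical helper for illustration only.
    """
    total = tp + fp + fn + tn
    # Guard each ratio against division by zero when a count is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Made-up counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```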

Refine the model

After you validate the model, refine the model and run the fit command again.

Consider trying the following:

  1. Reduce the number of fields selected in the Fields to use for predicting drop-down menu. Too many fields can introduce noise that distracts the model.
  2. Bring in new data sources to enrich your modeling space.
  3. Build features from the raw data using SPL commands such as streamstats and eventstats, so that you model on behaviors of the data instead of raw data points.
  4. Check your fields. Are you using categorical values correctly? For example, are you using DayOfWeek as a number (0 to 6) instead of "Monday", "Tuesday", and so on? Make sure you have the right type of value.
  5. Bring in context through lookups, such as holidays and external anomalies.
  6. Increase the number of fields selected in the Fields to use for predicting drop-down menu, for example by adding data sources or building features as described above.
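
The DayOfWeek check above can be sketched in Python. A numeric day of week (0 to 6) can be read as an ordered quantity, while a day name is read as a category. The following hypothetical helper, `label_day_of_week`, replaces the number with a name. The mapping assumes 0 means Sunday, which may not match your data, so verify the convention before reusing it.

```python
# Assumption: 0 = Sunday. Adjust the order if your data uses 0 = Monday.
DAY_NAMES = [
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday",
]


def label_day_of_week(events, field="DayOfWeek"):
    """Return copies of the events with a numeric day replaced by its name.

    Hypothetical helper for illustration; not part of the MLTK.
    """
    labeled = []
    for event in events:
        updated = dict(event)                  # leave the input untouched
        day_number = int(event[field])
        if not 0 <= day_number <= 6:
            raise ValueError(f"{field} out of range: {day_number}")
        updated[field] = DAY_NAMES[day_number]
        labeled.append(updated)
    return labeled


# Made-up events with a numeric DayOfWeek field.
events = [{"DayOfWeek": 1, "status": "ok"}, {"DayOfWeek": 6, "status": "error"}]
labeled = label_day_of_week(events)
print([e["DayOfWeek"] for e in labeled])
```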

Deploy the model

After you validate and refine the model, you can deploy the model.

Within the Classic Assistant framework

  1. Click the Schedule Training button to the right of Fit Model to schedule model training.

This opens a modal window with fields to fill out, including Report title, time range, and trigger actions. You can set up a regular interval to fit the model.

Outside the Classic Assistant framework

  1. Click Open in Search to generate a New Search tab for this same dataset. This new search opens in a new browser tab, away from the Classic Assistant.
    This search query uses all data, not just the training set. You can adjust the SPL directly and see results immediately. You can also save the query as a Report, Dashboard Panel, or Alert.
  2. Click Show SPL to generate a new window showing the search query that was used to fit the model. Copy the SPL here for use in other aspects of your Splunk instance.
  3. Click Schedule Alert to set up an alert that is triggered when the predicted value meets a threshold you specify.

Once you navigate away from the Classic Assistant page, you cannot return to it through the Classic or Models tabs. Classic Assistants are great for generating SPL, but may not be ideal for longer-term projects.

For more information about alerts, see Getting started with alerts in the Splunk Enterprise Alerting Manual.

Last modified on 04 October, 2018

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 3.4.0, 4.0.0, 4.1.0, 4.2.0, 4.3.0

