Predict Categorical Fields Experiment workflow
Experiments manage the data source, algorithm, and the parameters to configure that algorithm within one framework. An Experiment is an exclusive knowledge object in the Splunk platform that keeps track of its settings and history, as well as its affiliated alerts and scheduled trainings. Experiment Assistants enable machine learning through a guided user interface.
The Predict Categorical Fields Assistant uses a type of learning known as classification. A classification algorithm learns the tendency for data to belong to one category or another based on related data.
The following classification table shows the actual state of the field versus predicted state of the field. The yellow bar highlights an incorrect prediction.
Algorithms
The Predict Categorical Fields assistant uses the following classification algorithms:
Create a model to predict a categorical field
Before you begin
- The Predict Categorical Fields Assistant offers the option to preprocess your data. Read up on the preprocessing algorithms available here: Preprocessing machine data.
- The toolkit default selects the Logistic Regression algorithm. Use this default if you aren't sure which one is best for you. Read up on the other algorithm options here: Algorithms.
Workflow
Follow these steps to create a Predict Categorical Fields Experiment.
- From the MLTK navigation bar, click Experiments.
- If this is the first experiment in the toolkit, you will land on a display screen of all six Assistants. Select the Predict Categorical Fields block.
- If you have at least one experiment in the toolkit, you will land on a list view of experiments. Click the Create New Experiment button.
- Fill in an Experiment Title, and (optionally) add a description. Both the name and description can be edited later if needed.
- Click Create.
- Run a search and be sure to select a date range.
- (Optional) Click + Add a step to add preprocessing steps.
- Select an algorithm from the Algorithm drop-down menu.
- Select a target field from the Field to Predict drop-down menu.
When you select the Field to Predict, the Fields to use for predicting drop-down menu populates with available fields to include in your model.
- Select a combination of fields from the Fields to use for predicting drop-down menu.
- Use the slider bar to split your data into training and testing data. The default split is 50/50, and the data is divided randomly into two groups.
- (Optional) Add notes to this experiment. This free form block of text can be used to track the selections made in the fields above. Refer back to notes to review which parameter combinations yield the best results.
The algorithm you select determines the fields available to build your model. Hover over any field name to get more information about that field.
- Click Fit Model. The experiment is now in a Draft state.
Draft versions allow you to alter settings without committing or overwriting a saved Experiment. An Experiment is not stored to Splunk until it is saved.
The following table explains the differences between a draft and a saved Experiment.
Action | Draft Experiment | Saved Experiment |
---|---|---|
Create new record in Experiment history | Yes | No |
Run Experiment search jobs | Yes | No |
(As applicable) Save and update Experiment model | No | Yes |
(As applicable) Update all Experiment alerts | No | Yes |
(As applicable) Update Experiment scheduled trainings | No | Yes |
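The random split into training and testing data, set with the slider bar in the workflow above, can be pictured with a short Python sketch. This is only an illustration of the idea, not the toolkit's implementation: the function name `split_events`, the `train_fraction` parameter, and the sample events are all hypothetical.

```python
import random


def split_events(events, train_fraction=0.5, seed=None):
    """Randomly partition events into a training list and a testing list.

    Hypothetical helper: train_fraction plays the role of the slider bar,
    with 0.5 matching the default 50/50 split.
    """
    rng = random.Random(seed)          # seeded so the split is repeatable
    shuffled = list(events)            # copy so the input is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up events standing in for search results.
events = [{"id": i} for i in range(10)]
train, test = split_events(events, train_fraction=0.5, seed=42)
print(len(train), len(test))
```

The model is fit on the training group only, and the testing group is held back to check how well the predictions generalize.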
Interpret and validate
After you fit the model, review the prediction results and visualizations to see how well the model predicted the categorical field. In this analysis, metrics are related to mis-classifying the field, and are based on false positives and negatives, and true positives and negatives. You can use the following methods to evaluate your predictions:
Result | Application |
---|---|
Precision | This statistic is the percentage of the time a predicted class is the correct class. |
Recall | This statistic is the percentage of time that the correct class is predicted. |
Accuracy | This statistic is the overall percentage of correct predictions. |
F1 | This statistic is the weighted average of precision and recall, based on a scale from zero to one. The closer the statistic is to one, the better the fit of the model. |
Classification Results (Confusion Matrix) | This table charts the number of actual results against predicted results, also known as a Confusion Matrix. The shaded diagonal numbers should be high (closer to 100%), while the other numbers should be closer to 0. |
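To make the statistics in the table above concrete, the following Python sketch tallies actual versus predicted values and derives precision, recall, F1, and accuracy from the counts. It is an illustration under stated assumptions: the function names and sample data are hypothetical, the per-class F1 uses the standard harmonic mean of precision and recall, and the way the Assistant averages these values across classes is not shown here.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair, i.e. the cells of a confusion matrix."""
    return Counter(zip(actual, predicted))


def class_metrics(actual, predicted, label):
    """Return (precision, recall, f1) for one class label."""
    pairs = confusion_counts(actual, predicted)
    tp = pairs[(label, label)]  # true positives
    fp = sum(n for (a, p), n in pairs.items() if p == label and a != label)
    fn = sum(n for (a, p), n in pairs.items() if a == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def accuracy(actual, predicted):
    """Overall fraction of predictions that match the actual value."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up field values for illustration only.
actual    = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
predicted = ["yes", "yes", "no",  "no", "no", "no", "yes", "no"]

precision, recall, f1 = class_metrics(actual, predicted, "yes")
print(confusion_counts(actual, predicted))
print(precision, recall, f1, accuracy(actual, predicted))
```

In this sample, two of the three predicted "yes" values are correct and two of the three actual "yes" values are found, so precision and recall for "yes" are both two thirds, while six of the eight predictions overall are correct.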
Refine the model
After you validate the model, refine the model and run the fit command again. Optionally choose to track your changes in the Notes text field.
Consider trying the following:
- Reduce the number of fields selected in the Fields to use for predicting drop-down menu. Having too many fields can generate a distraction.
- Increase the number of fields selected in the Fields to use for predicting drop-down menu.
Use the Experiment History tab to review settings and changes made as you refine the model.
Save the model
Once you are happy with the results of your Experiment, save it. Saving your Experiment will result in the following:
- Assistant settings saved as an Experiment knowledge object.
- (As applicable) Draft model updated to an Experiment model.
- (As applicable) Affiliated scheduled trainings and alerts update to synchronize with the search SPL and trigger conditions.
You can load a saved Experiment by clicking the Experiment name.
Deploy the model and manage Experiments
After you validate, refine and save the model, you can deploy the model.
Within the Experiment framework:
From within the framework, you can both manage and publish your Experiments. To manage your Experiment, perform the following steps:
- From the MLTK navigation bar, choose Experiments. A list of your saved experiments populates.
- Click the Manage button available under the Actions column.
The toolkit supports the following Experiment management options:
- Create Experiment-level alerts.
If you make changes to the saved Experiment you may impact affiliated alerts. Re-validate your alerts once you complete the changes.
For more information about alerts, see Getting started with alerts in the Splunk Enterprise Alerting Manual.
- Edit the title (name) and description of the Experiment.
- (As applicable) Manage alerts for a single Experiment.
- (As applicable) Schedule a training job for an Experiment.
- Delete an Experiment.
To publish your Experiment, perform the following steps:
- From the MLTK navigation bar, choose Experiments. A list of saved Experiments populates.
- Click the Publish button available under the Actions column.
Publishing an Experiment model copies the main model and any associated preprocessing models as lookup files in the user's namespace within a selected destination app. Published models can be used to create alerts or schedule model trainings.
The Publish link only shows if you have created the Experiment and fit the model.
- Give the model a title. The title must start with a letter or underscore, and can contain only letters, numbers, and underscores.
- Select the destination app.
- Click Save.
- A message will let you know whether the model has been published, or why the action was not completed.
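The naming rule for a published model in the steps above (start with a letter or underscore, then only letters, numbers, and underscores) can be checked ahead of time with a regular expression. The sketch below is a hypothetical helper, not part of the toolkit, and it assumes that "letters" means ASCII letters.

```python
import re

# Assumption: ASCII letters only. The toolkit's own validation is not shown here.
MODEL_TITLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_model_title(title):
    """Return True if the title follows the publishing rule described above."""
    return MODEL_TITLE.fullmatch(title) is not None


for candidate in ["churn_model_v2", "_draft", "2nd_model", "my-model"]:
    print(candidate, is_valid_model_title(candidate))
```

Titles such as `2nd_model` (leading digit) and `my-model` (hyphen) fail the check, while `churn_model_v2` and `_draft` pass.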
Experiments are always stored under the user's namespace, so changing sharing settings and permissions on Experiments is not supported.
Outside the Experiment framework:
- Click Open in Search to generate a New Search tab for this same dataset. This new search opens in a new browser tab, away from the Assistant.
This search query uses all data, not just the training set. You can adjust the SPL directly and see the results immediately. You can also save the query as a Report, Dashboard Panel or Alert.
- Click Show SPL to generate a new modal window or overlay showing the search query you used to fit the model. Copy the SPL to use in other aspects of your Splunk instance.
This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 3.4.0