Splunk® Machine Learning Toolkit

User Guide

This documentation does not apply to the most recent version of Splunk® Machine Learning Toolkit. For documentation on the most recent version, go to the latest release.

Share data in the Machine Learning Toolkit

When the Machine Learning Toolkit (MLTK) is deployed on Splunk Enterprise, the Splunk platform sends anonymized usage data to Splunk Inc. ("Splunk") to help improve the MLTK in future releases. For information about how to opt in or out, and how the data is collected, stored, and governed, see Share data in Splunk Enterprise.

What data is collected

The Splunk Machine Learning Toolkit collects the following basic usage information:

Component Description Example
algo_name Name of the algorithm used in fit or apply.
{
  "algo_name": "StandardScaler"
}
apply_time Amount of time it took to run the apply command.
{
  "apply_time": 0.005
}
app_context Name of the app from which the search is run.
{
 "app_context": "Splunk_ML_Toolkit"
}
columns Number of columns passed to the fit command.
{
 "columns": 10
}
command Command that was run, either fit or apply.
{
  "command": "apply"
}
csv_parse_time CSV parse time.
{
  "csv_parse_time": 0.019296
}
csv_read_time CSV read time.
{
  "csv_read_time": 0.019296
}
csv_render_time CSV render time.
{
  "csv_render_time" : 0.01162
}
example_name Name of the Showcase example being run.
{
  "example_name": "'Predict Server Power Consumption'"
}
experiment_id ID of the fit and apply run on the Experiments page. All preprocessing steps and the final fit share the same ID.
{
  "experiment_id": "6c47bca2776d4b6cb82685461d918180"
}
fit_time Amount of time it took to run the fit command.
{
  "fit_time": 39.87447
}
full_punct Punctuation pattern (punct) of the data during fit or apply.
{
 "full_punct": [
...s-s-s[//:::.s-]s"s/-/////.s/."sss"://:/-//@:///-."s"/.s(;sssss)s/.s(,ss)s/...s/."s-ss
]
}
handle_time Amount of time the handler took to process the data.
{
  "handle_time": 0.274072
}
num_fields Total number of fields.
{
  "num_fields": 4
}
num_fields_fs Number of fields that have the fs (Field Selector) prefix.
{
  "num_fields_fs": 9
}
num_fields_PC Number of fields that have the PC (preprocessed) prefix.
{
  "num_fields_PC": 70
}
num_fields_prefixed Total number of preprocessed fields.
{
  "num_fields_prefixed": 28
}
num_fields_RS Number of fields that have the RS (Robust Scaler) prefix.
{
  "num_fields_RS": 17
}
num_fields_SS Number of fields that have the SS (Standard Scaler) prefix.
{
  "num_fields_SS": 30
}
num_fields_tfidf Number of fields preprocessed with term frequency-inverse document frequency (TF-IDF).
{
  "num_fields_tfidf": 9
}
orig_sourcetype The original sourcetype of the machine data.
{
  "orig_sourcetype" : "access_combined_wcookie"
}
params Optional parameters used in the fit step.
{
 "params": "{{\"with_std\": \"true\", \"with_mean\": \"true\"}}"
}
PID Process identifier associated with the command.
{
 "PID" : 63654
}
pipeline_stage Each preprocessing step on the Experiments page is assigned a number starting from 0. This number helps determine the order of the preprocessing steps and the length of the pipeline.
{
  "pipeline_stage": 0
}
rows Number of rows passed to the fit command.
{
  "rows": 15627
}
UUID Universally unique identifier associated with the command. This 128-bit value keeps each fit or apply run unique.
{
 "UUID": "7e0828e7-3059-4a43-8419-acc0e81f2f2d"
}
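
As a rough illustration of how experiment_id and pipeline_stage relate, the following Python sketch groups a few hand-written records by experiment_id and orders them by pipeline_stage. The records, the helper names group_pipelines and uuid_bits, the algorithm names other than StandardScaler, and the assumption that the final fit carries the highest stage number are all illustrative. They are not taken from the MLTK, and this topic does not describe the actual payload layout. The sketch also checks that the example UUID from the table parses to a 128-bit value.

```python
import uuid
from collections import defaultdict

# Hypothetical records that reuse field names from the table above.
# The values and the overall record layout are illustrative assumptions.
EXPERIMENT = "6c47bca2776d4b6cb82685461d918180"
records = [
    {"experiment_id": EXPERIMENT, "pipeline_stage": 1, "algo_name": "PCA"},
    {"experiment_id": EXPERIMENT, "pipeline_stage": 0, "algo_name": "StandardScaler"},
    {"experiment_id": EXPERIMENT, "pipeline_stage": 2, "algo_name": "LinearRegression"},
]


def group_pipelines(records):
    """Map each experiment_id to its algorithm names ordered by pipeline_stage."""
    by_experiment = defaultdict(list)
    for record in records:
        by_experiment[record["experiment_id"]].append(record)
    return {
        experiment_id: [
            r["algo_name"] for r in sorted(steps, key=lambda r: r["pipeline_stage"])
        ]
        for experiment_id, steps in by_experiment.items()
    }


def uuid_bits(value):
    """Return the size in bits of a UUID given in its string form."""
    return len(uuid.UUID(value).bytes) * 8


pipelines = group_pipelines(records)
print(pipelines[EXPERIMENT])       # algorithm names in stage order
print(len(pipelines[EXPERIMENT]))  # length of the pipeline
print(uuid_bits("7e0828e7-3059-4a43-8419-acc0e81f2f2d"))
```

Because every step of one experiment shares the same experiment_id, grouping on that field and sorting on pipeline_stage is enough to recover both the order of the steps and the pipeline length in this sketch.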
Last modified on 18 March, 2020

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 4.4.0, 4.4.1, 4.4.2, 4.5.0, 5.0.0, 5.1.0

