Share data in the Machine Learning Toolkit

When the Machine Learning Toolkit is deployed on Splunk Enterprise, the Splunk platform sends anonymized usage data to Splunk Inc. ("Splunk") to help improve the MLTK in future releases. For information about how to opt in or out, and how the data is collected, stored, and governed, see Share data in Splunk Enterprise.

What data is collected

The Splunk Machine Learning Toolkit collects the following basic usage information:

Component	Description	Example
`algo_name`	Name of the algorithm used in `fit` or `apply`.	{ "algo_name": "StandardScaler" }
`apply_time`	Amount of time it took to run the `apply` command.	{ "apply_time": 0.005 }
`app_context`	Name of the app from which the search is run.	{ "app_context": "Splunk_ML_Toolkit" }
`columns`	The number of columns being run through the `fit` command.	{ "columns": 10 }
`command`	The command that was run, either `fit` or `apply`.	{ "command": "apply" }
`csv_parse_time`	CSV parse time.	{ "csv_parse_time": 0.019296 }
`csv_read_time`	CSV read time.	{ "csv_read_time": 0.019296 }
`csv_render_time`	CSV render time.	{ "csv_render_time" : 0.01162 }
`example_name`	Name of the Showcase example being run.	{ "example_name": "'Predict Server Power Consumption'" }
`experiment_id`	ID of the `fit` and `apply` run on the Experiments page. All preprocessing steps and the final `fit` share the same ID.	{ "experiment_id": "6c47bca2776d4b6cb82685461d918180" }
`fit_time`	Amount of time it took to run the `fit` command.	{ "fit_time": 39.87447 }
`full_punct`	The punct of the data during `fit` or `apply`.	{ "full_punct": [ ...s-s-s[//:::.s-]s"s/-/////.s/."sss"://:/-//@:///-."s"/.s(;sssss)s/.s(,ss)s/...s/."s-ss ] }
`handle_time`	Time for the handler to handle the data.	{ "handle_time": 0.274072 }
`num_fields`	Total number of fields.	{ "num_fields": 4 }
`num_fields_fs`	Number of fields that have the `fs` (Field Selector) prefix.	{ "num_fields_fs": 9 }
`num_fields_PC`	Number of fields that have the `PC` (preprocessed) prefix.	{ "num_fields_PC": 70 }
`num_fields_prefixed`	Total number of preprocessed fields.	{ "num_fields_prefixed": 28 }
`num_fields_RS`	Number of fields that have the `RS` (Robust Scaler) prefix.	{ "num_fields_RS": 17 }
`num_fields_SS`	Number of fields that have the `SS` (Standard Scaler) prefix.	{ "num_fields_SS": 30 }
`num_fields_tfidf`	Number of fields that have used term frequency-inverse document frequency preprocessing.	{ "num_fields_tfidf": 9 }
`orig_sourcetype`	The original sourcetype of the machine data.	{ "orig_sourcetype" : "access_combined_wcookie" }
`params`	Optional parameters used in the `fit` step.	{ "params": "{{\"with_std\": \"true\", \"with_mean\": \"true\"}}" }
`PID`	Process identifier associated with the command.	{ "PID" : 63654 }
`pipeline_stage`	Each preprocessing step on the Experiments page is assigned a number starting from 0. This helps determine the order of the preprocessing steps and length of the pipeline.	{ "pipeline_stage": 0 }
`rows`	The number of rows being run through the `fit` command.	{ "rows": 15627 }
`UUID`	Universally unique identifier associated with the command. This 128-bit value uniquely identifies each `fit` or `apply` run.	{ "UUID": "7e0828e7-3059-4a43-8419-acc0e81f2f2d" }
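
To make the table easier to picture, here is a minimal Python sketch that groups a few of the example values above into one record and checks that the `UUID` value is 128 bits long. Grouping the fields into a single JSON object is an assumption made for illustration, since this topic does not describe the actual payload layout. The `summarize` helper is hypothetical and is not part of the MLTK.

```python
import json
import uuid

# Example values copied from the table above. Combining them into one
# JSON object is an assumption for illustration, not the documented format.
record_text = """
{
  "command": "apply",
  "algo_name": "StandardScaler",
  "app_context": "Splunk_ML_Toolkit",
  "apply_time": 0.005,
  "num_fields": 4,
  "PID": 63654,
  "UUID": "7e0828e7-3059-4a43-8419-acc0e81f2f2d"
}
"""

record = json.loads(record_text)

# A UUID is stored as 16 bytes, which is the 128 bits the table mentions.
run_id = uuid.UUID(record["UUID"])
uuid_bits = len(run_id.bytes) * 8


def summarize(rec):
    """Hypothetical helper: one-line summary of a usage record."""
    return f'{rec["command"]} {rec["algo_name"]} in {rec["app_context"]}'


print(uuid_bits)
print(summarize(record))
```

A record like this carries only command names, timings, counts, and identifiers, which matches the kind of basic usage information the table lists.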
