Share data in the Machine Learning Toolkit

When the Machine Learning Toolkit is deployed on Splunk Enterprise, the Splunk platform sends aggregated usage data to Splunk Inc. ("Splunk") to help improve MLTK in future releases. For information about how to opt in or out, and how the data is collected, stored, and governed, see Share data in Splunk Enterprise.

What data is collected

The Splunk Machine Learning Toolkit collects the following basic usage information:

Component	Description	Example
`algo_name`	Name of the algorithm used in `fit` or `apply`.	{ "algo_name": "StandardScaler" }
`app_context`	Name of the app from which the search is run.	{ "app_context": "Splunk_ML_Toolkit" }
`apply_time`	Time the `apply` command took to run.	{ "apply_time": 0.005 }
`app.session.Splunk_ML_Toolkit.changeSmartAssistantStep`	User progress through an MLTK Smart Assistant.	{ component: app.session.Splunk_ML_Toolkit.changeSmartAssistantStep data: { app: Splunk_ML_Toolkit experiment_id: 63fb7afba756455d8056b5e547f8545f experimentType: smart_outlier_detection page: smart_outlier_detection previousStep: learn step: define } deploymentID: 88A80D96D80B30B6F48E3FF9A0B318 eventID: 7185ae51-04aa-2025-8a57-6e0340e50c46 experienceID: d914fba4-7ca1-4370-a123-3a03a01d2569 optInRequired: 3 timestamp: 1585251931 userID: 60749ba2789ec1eee0ada6a0b5680512460559541023017ad6f5b4a3b0172841 version: 4 visibility: anonymous,support }
`app.session.Splunk_ML_Toolkit.createExperiment`	User creating an MLTK Experiment.	{ component: app.session.Splunk_ML_Toolkit.createExperiment data: { app: Splunk_ML_Toolkit experiment_id: 09ca5db894894c86b20b083941acaae0 experimentType: smart_forecast page: experiments } deploymentID: 88A80D96D80B30B6F48E3FF9A0B318 eventID: 8318866b-f2f5-35a4-1348-b82486b3a41f experienceID: dfbde5b8-eb57-10a3-5ced-3be47f2b8ad2 optInRequired: 3 timestamp: 1583786919 userID: 60749ba2789ec1eee0ada6a0b5680512460559541023017ad6f5b4a3b0172841 version: 4 visibility: anonymous,support }
`app.session.Splunk_ML_Toolkit.createExperimentAlert`	User creating an alert for an MLTK Experiment.	{ component: app.session.Splunk_ML_Toolkit.createExperimentAlert data: { app: Splunk_ML_Toolkit experiment_id: 46221dd8661d420aaa988ca7d41821ae experimentType: smart_forecast page: experiments } deploymentID: 88A80D96D80B30B6F48E3FF9A0B318 eventID: 6bd85948-4f9b-ff9d-bf02-18defe062eec experienceID: f2c4f65b-a723-88af-875a-73737bbc9061 optInRequired: 3 timestamp: 1584480173 userID: 60749ba2789ec1eee0ada6a0b5680512460559541023017ad6f5b4a3b0172841 version: 3 visibility: anonymous,support }
`app.session.Splunk_ML_Toolkit.loadAssistant`	Number of times the user has loaded an MLTK Assistant.	{ component: app.session.Splunk_ML_Toolkit.loadAssistant data: { app: Splunk_ML_Toolkit experiment_id: 6196da5dc78f4606925295ead869f023 experimentType: smart_clustering page: smart_clustering } deploymentID: 88A80D96D80B30B6F48E3FF9A0B318 eventID: 54e3887b-acf3-ba6c-7f4f-cef1373c4d99 experienceID: d914fba4-7ca1-4370-a123-3a03a01d2569 optInRequired: 3 timestamp: 1585270611 userID: 60749ba2789ec1eee0ada6a0b5680512460559541023017ad6f5b4a3b0172841 version: 4 visibility: anonymous,support }
`app.session.Splunk_ML_Toolkit.saveExperiment`	User saving their work in an MLTK Experiment.	{ component: app.session.Splunk_ML_Toolkit.saveExperiment data: { app: Splunk_ML_Toolkit experiment_id: 4f390e49096c43adb05feb29fe9bfbbc experimentType: smart_outlier_detection page: smart_outlier_detection } deploymentID: 88A80D96D80B30B6F48E3FF9A0B318 eventID: bdc34718-163c-56c0-3c7b-7d51380a258e experienceID: dfbde5b8-eb57-10a3-5ced-3be47f2b8ad2 optInRequired: 3 timestamp: 1583873964 userID: 60749ba2789ec1eee0ada6a0b5680512460559541023017ad6f5b4a3b0172841 version: 4 visibility: anonymous,support }
`app.session.Splunk_ML_Toolkit.scheduleExperimentTraining`	User scheduling model retraining for an MLTK Experiment.	{ component: app.session.Splunk_ML_Toolkit.scheduleExperimentTraining data: { app: Splunk_ML_Toolkit experiment_id: 46221dd8661d420aaa988ca7d41821ae experimentType: smart_forecast page: experiments scheduleEnabled: true } deploymentID: 88A80D96D80B30B6F48E3FF9A0B318 eventID: 629db0e3-0db1-0424-5e0d-f7e06e9965fb experienceID: f2c4f65b-a723-88af-875a-73737bbc9061 optInRequired: 3 timestamp: 1584480148 userID: 60749ba2789ec1eee0ada6a0b5680512460559541023017ad6f5b4a3b0172841 version: 3 visibility: anonymous,support }
`columns`	The number of columns being run through the `fit` command.	{ "columns": 10 }
`command`	The command that was run: `fit`, `apply`, or `score`.	{ "command":"fit" } { "command":"apply" } { "command":"score" }
`csv_parse_time`	CSV parse time.	{ "csv_parse_time": 0.019296 }
`csv_read_time`	CSV read time.	{ "csv_read_time": 0.019296 }
`csv_render_time`	CSV render time.	{ "csv_render_time" : 0.01162 }
`deployment.app`	Apps installed per Splunk instance.	component: deployment.app data: { enabled: true host: monitoring name: alert_webhook version: 7.0.1 } date: 2018-10-26 deploymentID: 99b6ffd8-2e80-5e3b-905c-8c6f6fd743a0 executionID: F0AE995E8653D768A360E73BE3F544 timestamp: 1540570045 transactionID: 89F7329E-86AD-BBFD-034F-209CB8A06F05 version: 3 visibility: anonymous, support
`example_name`	Name of the Showcase example being run.	{ "example_name": "Predict Server Power Consumption" }
`experiment_id`	ID of the `fit` and `apply` runs on the Experiments page. All preprocessing steps and the final `fit` share the same ID.	{ "experiment_id": "6c47bca2776d4b6cb82685461d918180" }
`fit_time`	Amount of time it took to run the `fit` command.	{ "fit_time": 39.87447 }
`full_punct`	The punct (punctuation pattern) of the data during `fit` or `apply`.	{ "full_punct": [ ...s-s-s[//:::.s-]s"s/-/////.s/."sss"://:/-//@:///-."s"/.s(;sssss)s/.s(,ss)s/...s/."s-ss ] }
`handle_time`	Time the handler took to process the data.	{ "handle_time": 0.274072 }
`modelId`	ID of the model that the user saves.	{ modelId: 56ce5ff2442604580eca0f57f36b5b9c }
`numColumns`	Total number of columns in the dataset.	{ numColumns: 16 }
`numRows`	Total number of rows (events) in the dataset.	{ numRows: 150 }
`num_fields`	Total number of fields.	{ "num_fields": 4 }
`num_fields_fs`	Number of fields that have the `fs` (FieldSelector) prefix.	{ "num_fields_fs": 9 }
`num_fields_PC`	Number of fields that have the `PC` (principal component) prefix produced by PCA preprocessing.	{ "num_fields_PC": 70 }
`num_fields_prefixed`	Total number of preprocessed fields.	{ "num_fields_prefixed": 28 }
`num_fields_RS`	Number of fields that have the `RS` (RobustScaler) prefix.	{ "num_fields_RS": 17 }
`num_fields_SS`	Number of fields that have the `SS` (StandardScaler) prefix.	{ "num_fields_SS": 30 }
`num_fields_tfidf`	Number of fields produced by term frequency-inverse document frequency (TF-IDF) preprocessing.	{ "num_fields_tfidf": 9 }
`orig_sourcetype`	The original sourcetype of the machine data.	{ "orig_sourcetype" : "access_combined_wcookie" }
`params`	Optional parameters used in the `fit` step.	{ "params": "{{\"with_std\": \"true\", \"with_mean\": \"true\"}}" }
`partialFit`	Whether the `fit` is a partial fit, which updates an existing model instead of training a new one.	{ partialFit: True }
`PID`	Process identifier associated with the command.	{ "PID" : 63654 }
`pipeline_stage`	Each preprocessing step on the Experiments page is assigned a number starting from 0. This number indicates the order of the preprocessing steps and the length of the pipeline.	{ "pipeline_stage": 0 }
`rows`	The number of rows being run through the `fit` command.	{ "rows": 15627 }
`scoringName`	Name of the scoring operation, if the name is whitelisted. If the name is not whitelisted, the hash of the `scoringName` is logged instead.	scoringName: mean_squared_error
`scoringTimeSec`	Time taken by the scoring operation, in seconds.	scoringTimeSec: 3.398707
`UUID`	Universally unique identifier (128-bit) associated with the command. It keeps each `fit` or `apply` run distinct.	{ "UUID": "7e0828e7-3059-4a43-8419-acc0e81f2f2d" }
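
The prefix-based counts in the table (`num_fields_fs`, `num_fields_PC`, `num_fields_RS`, `num_fields_SS`, and `num_fields_prefixed`) can be illustrated with a minimal Python sketch. It assumes that each preprocessed field name starts with its prefix followed by an underscore (for example, `SS_bytes`) and that `num_fields_prefixed` is the sum of the per-prefix counts. The function `count_prefixed_fields` and the sample field names are hypothetical; this is not how MLTK computes the values internally, and TF-IDF fields are left out because their naming scheme is not described on this page.

```python
# Hypothetical sketch: count field names by MLTK-style preprocessing prefix.
# Assumption: a prefix is always followed by "_" (for example "SS_bytes", "PC_1").
PREFIXES = ("fs", "PC", "RS", "SS")


def count_prefixed_fields(field_names):
    """Return per-prefix counts and the total number of prefixed fields."""
    counts = {f"num_fields_{p}": 0 for p in PREFIXES}
    total = 0
    for name in field_names:
        for p in PREFIXES:
            # startswith is case-sensitive, so "fs_" and "PC_" are matched exactly
            if name.startswith(p + "_"):
                counts[f"num_fields_{p}"] += 1
                total += 1
                break  # count each field under at most one prefix
    # Assumption: the total is the sum of the per-prefix counts
    counts["num_fields_prefixed"] = total
    return counts


fields = ["SS_bytes", "SS_duration", "PC_1", "PC_2", "RS_latency", "fs_status", "host"]
print(count_prefixed_fields(fields))
# {'num_fields_fs': 1, 'num_fields_PC': 2, 'num_fields_RS': 1, 'num_fields_SS': 2, 'num_fields_prefixed': 6}
```

Fields without a recognized prefix, such as `host` in the sample, are not counted in any prefixed total.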
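
The `scoringName` rule (log the name if it is whitelisted, otherwise log a hash of it) can be sketched in Python as follows. The allow-list contents, the function name `scoring_name_for_telemetry`, and the choice of SHA-256 are all assumptions for illustration; this page does not say which scoring names are whitelisted or which hash function is used.

```python
import hashlib

# Hypothetical allow list; the real list of whitelisted scoring names is not documented here.
ALLOWED_SCORING_NAMES = {"mean_squared_error", "r2_score", "accuracy_score"}


def scoring_name_for_telemetry(name: str) -> str:
    """Return the name unchanged if whitelisted, otherwise its SHA-256 hex digest."""
    if name in ALLOWED_SCORING_NAMES:
        return name
    # Assumption: SHA-256 stands in for whatever hash MLTK actually applies.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


print(scoring_name_for_telemetry("mean_squared_error"))  # mean_squared_error
print(scoring_name_for_telemetry("my_custom_score"))     # a 64-character hex digest
```

Hashing keeps a non-whitelisted name out of the logged data while still letting repeated uses of the same name be counted together.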
