Dataset credits
The Machine Learning Toolkit contains datasets that were provided by others. We thank and acknowledge their contributors, and provide the licenses for their use.
Disclaimer
This application may contain certain sample files and datasets, which are provided for your convenience only. Such files and datasets contain information and data compiled by third parties, and Splunk makes no representation or warranty that the data contained in such files and datasets are true, accurate, complete or sanitized. In using the datasets, you understand and agree that the data contained therein are subject to error and cannot be relied upon to perform the task you intend. You understand and agree that your use of the data is at your sole risk. The datasets are made available on an "as is" and "as available" basis without any warranties of any kind, whether express or implied, including without limitation implied warranties of merchantability, fitness for a particular purpose, and non-infringement. In no event will Splunk assume any legal liability or responsibility for loss or damages arising from the sample datasets.
App statistics
Dataset: apps.csv
Used in example: Predict App Usage from Other Apps
License terms: Free to use, collected by Splunk.
Bitcoin transactions
Dataset: bitcoin_transactions.csv
Used in example: Detect Outliers in Bitcoin Transactions
License terms: Free to use with citation: http://compbio.cs.uic.edu/data/bitcoin/
Bluetooth devices
Dataset: bluetooth.csv
Used in example: Forecast the Number of Bluetooth Devices
License terms: CRAWDAD Data License
Dear Licensee:
Thank you for your interest in obtaining and using data from the CRAWDAD archive, hereinafter referred to as "Data". CRAWDAD is the Community Resource for Archiving Wireless Data At Dartmouth, and is operated by Dartmouth College under a grant from the National Science Foundation. Data Licensing Information:
Dartmouth College hereby grants a nonexclusive, nontransferable license to use the Data for commercial, educational, and research purposes only. The Data shall not be redistributed without the express written prior approval of Dartmouth College.
Licensee agrees to respect the privacy of those human subjects whose wireless-network activity is captured by the Data. Do not attempt to reverse the anonymization process to identify specific MAC addresses, IP address, telephone number, or other identifiers, or to identify their actual location. Use only the header information in packet traces; do not attempt to extract further information. (Header information specifies the type of information that is being transferred over the network, and specifically excludes the contents of the data, such as usernames, passwords, filenames, files, or URLs.)
Licensee agrees to acknowledge the source of the Data in any publications reporting on Licensee's use of it. For example, "We gratefully acknowledge the use of wireless data from the CRAWDAD archive at Dartmouth College."
Dartmouth expressly reserves the right to use the Data by its faculty, staff and researchers, for educational and research purposes. Dartmouth further reserves the right to provide Data Providers with statistical information regarding licensee's access to and use of the Provider's Data and with the Licensee's name and address.
Dartmouth College provides the Data "AS IS," without any warranty or promise of technical support, and disclaims any liability of any kind for any damages whatsoever resulting from use of Data.
DARTMOUTH MAKES NO WARRANTIES, EXPRESS OR IMPLIED WITH RESPECT TO THE DATA, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, WHICH ARE HEREBY EXPRESSLY DISCLAIMED.
Your acceptance and use of the Data binds you to the terms and conditions of this License as stated herein.
Trustees of Dartmouth College
David F. Kotz, Ph.D.
Professor of Computer Science
6211 Sudikoff Lab
Hanover, NH 03755 USA
E-mail: kotz@cs.dartmouth.edu
http://crawdad.org/nus/bluetooth/20070903/sql/
Churn
Dataset: churn.csv
Used in example: Predict Telecom Customer Churn
License terms: Free to use, with citation request: http://www.sgi.com/tech/mlc/db/churn.all
Cluster events
Datasets:
- sklearn_cluster_blobs.csv
- sklearn_cluster_no_structure.csv
- sklearn_cluster_noisy_circles.csv
- sklearn_cluster_noisy_moons.csv
License terms: http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html
Scikit-learn: Machine Learning in Python (http://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html), Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
Diabetes
Dataset: diabetes.csv
Used in example: Predict Incidence of Diabetes from Health Metrics
License terms: Free to use with citation: http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/
Lichman, M. (2013). UCI Machine Learning Repository http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.
Diabetic data
Dataset: diabetic.csv
Used in example: Detect Outliers in Diabetes Patient Records
License terms: Free to use with citation: Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John N. Clore, “Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records,” BioMed Research International, vol. 2014, Article ID 781670, 11 pages, 2014.
Disk failures
Dataset: disk_failures.csv
Used in examples:
- Detect Outliers in Disk Failure Events
- Predict the Failure of Hard Drives using SMART Metrics
License terms: Free to use with the following constraints:
A. We are encouraged to cite Backblaze as the source (not a mandatory requirement).
B. We accept that we are solely responsible for how we use the data.
C. We do not sell this data to anyone; it is free.
https://www.backblaze.com/hard-drive-test-data.html
Employee logins
Dataset: logins.csv
Used in examples:
- Forecast the Number of Employee Logins
- Detect Outliers in Number of Logins (vs. Predicted Value)
License terms: Free to use, collected by Splunk.
Firewall traffic
Dataset: firewall_traffic.csv
Used in example: Predict the Presence of Malware from Firewall Traffic
License terms: Free to use, collected by Splunk.
Housing
Dataset: housing.csv
Used in example: Predict Median House Value
License terms: Free to use. https://archive.ics.uci.edu/ml/datasets/Housing
Internet traffic
Dataset: internet_traffic.csv
Used in example: Forecast Internet Traffic
License terms: Free to use with citation: P. Cortez, M. Rio, M. Rocha and P. Sousa. Multiscale Internet Traffic Forecasting using Neural Networks and Time Series Methods. In Expert Systems, Wiley-Blackwell, In press.
Mortgage loans for New York
Dataset: mortgage_loan_ny.csv
Used in example: Detect Outliers in Mortgage Contract Data
License terms: http://www.fhfa.gov/AboutUs/Policies/Pages/API.aspx
This product uses FHFA Data but is neither endorsed nor certified by FHFA.
Phone usage
Dataset: phone_usage.csv
Used in example: Detect Outliers in Mobile Phone Activity
License terms: CRAWDAD Data License
Dear Licensee:
Thank you for your interest in obtaining and using data from the CRAWDAD archive, hereinafter referred to as "Data". CRAWDAD is the Community Resource for Archiving Wireless Data At Dartmouth, and is operated by Dartmouth College under a grant from the National Science Foundation. Data Licensing Information:
Dartmouth College hereby grants a nonexclusive, nontransferable license to use the Data for commercial, educational, and research purposes only. The Data shall not be redistributed without the express written prior approval of Dartmouth College.
Licensee agrees to respect the privacy of those human subjects whose wireless-network activity is captured by the Data. Do not attempt to reverse the anonymization process to identify specific MAC addresses, IP address, telephone number, or other identifiers, or to identify their actual location. Use only the header information in packet traces; do not attempt to extract further information. (Header information specifies the type of information that is being transferred over the network, and specifically excludes the contents of the data, such as usernames, passwords, filenames, files, or URLs.)
Licensee agrees to acknowledge the source of the Data in any publications reporting on Licensee's use of it. For example, "We gratefully acknowledge the use of wireless data from the CRAWDAD archive at Dartmouth College."
Dartmouth expressly reserves the right to use the Data by its faculty, staff and researchers, for educational and research purposes. Dartmouth further reserves the right to provide Data Providers with statistical information regarding licensee's access to and use of the Provider's Data and with the Licensee's name and address.
Dartmouth College provides the Data "AS IS," without any warranty or promise of technical support, and disclaims any liability of any kind for any damages whatsoever resulting from use of Data.
DARTMOUTH MAKES NO WARRANTIES, EXPRESS OR IMPLIED WITH RESPECT TO THE DATA, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, WHICH ARE HEREBY EXPRESSLY DISCLAIMED.
Your acceptance and use of the Data binds you to the terms and conditions of this License as stated herein.
Trustees of Dartmouth College
David F. Kotz, Ph.D.
Professor of Computer Science
6211 Sudikoff Lab
Hanover, NH 03755 USA
E-mail: kotz@cs.dartmouth.edu
http://crawdad.org/ctu/personal/20120315/
Power plant humidity
Dataset: power_plant.csv
Used in examples:
- Predict the Energy Output of a Power Plant
- Detect Outliers in Power Plant Humidity
License terms: Free to use, with citation request: UCI Machine Learning Repository http://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant
Server power
Dataset: server_power.csv
Used in example: Predict Server Power Consumption
License terms: Free to use, with citation request: https://www.usenix.org/legacy/event/hotpower08/tech/full_papers/rivoire/rivoire.pdf
Server response time
Dataset: hostperf.csv
Used in example: Detect Outliers in Server Response Time
License terms: Free to use, collected by Splunk.
Souvenir sales
Dataset: souvenir_sales.csv
Used in example: Forecast Monthly Sales for a Souvenir Shop
License terms: Default open license. This data release is licensed as follows:
- You may copy and redistribute the data.
- You may make derivative works from the data.
- You may use the data for commercial purposes.
- You may not sublicense the data when redistributing it.
- You may not redistribute the data under a different license.
- Source attribution on any use of this data: Must refer to source:
Supermarket purchases
Dataset: supermarket.csv
Used in example: Detect Outliers in Supermarket Purchases
License terms: Free to use with citation: Pennacchioli, D., Coscia, M., Rinzivillo, S., Pedreschi, D. and Giannotti, F., ‘Explaining the Product Range Effect in Purchase Data’. In BigData, 2013.
Track day
Dataset: track_day.csv
Used in example: Predict Vehicle Type from Onboard Metrics
License terms: Free to use, collected by Splunk.
This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 1.0.0, 1.1.0, 1.2.0, 1.3.0