The Showcase consists of pre-populated use cases for the Machine Learning Toolkit. The page is organized around the guided modeling Assistants in the toolkit, each of which walks you through a machine-learning workflow.
Watch and learn from examples drawn from IT, Security, IoT, and Business Analytics. To filter the examples by one of these categories, use the drop-down menu at the top left of the screen.
The toolkit ships with all of the example datasets used in the Showcase. Use these datasets to practice machine learning concepts, or to re-create the Showcase examples in your own instance before working with your own data.
The Showcase contains the following examples, grouped by Assistant.
Predict Numeric Fields
Algorithm: Linear regression
Predict the value of a numeric field using a weighted combination of the values of other fields in that event. A common use of these predictions is to identify anomalies: predictions that differ significantly from the actual value may be considered anomalous.
Example | Dataset | Description
Predict Server Power Consumption | Server power (server_power.csv) | Predicts the power usage of a machine based on other metrics such as CPU utilization and memory transactions.
Predict VPN Usage | App statistics (apps.csv) | Predicts the VPN usage of employees based on the frequency of use of other apps.
Predict Median House Value | Housing (housing.csv) | Predicts median home value in a region based on housing value-related predictor fields.
Predict Power Plant Energy Output | Power plant humidity (power_plant.csv) | Predicts the power output of the power plant given other measured variables, such as ambient temperature and humidity.
Predict Future Logins | Business processes (cyclical_business_process.csv) | Predicts future logins in business process data.
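The idea behind this Assistant can be sketched outside the toolkit in a few lines of Python. The following is a minimal illustration, not the toolkit's implementation: it fits a least-squares line with a single predictor (the Assistant can combine several fields), then flags events whose actual value is far from the prediction. The data values, the `fit_line` helper, and the 1.5 standard deviation threshold are all assumptions made for the example.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical readings; the fifth power value is deliberately too high.
cpu_percent = [10, 20, 30, 40, 50, 60]
power_watts = [120, 140, 160, 180, 260, 220]

slope, intercept = fit_line(cpu_percent, power_watts)

# Residual = actual value minus predicted value.
residuals = [y - (slope * x + intercept) for x, y in zip(cpu_percent, power_watts)]

# Assumed rule: a residual beyond 1.5 standard deviations is anomalous.
threshold = 1.5 * pstdev(residuals)
anomalies = [i for i, r in enumerate(residuals) if abs(r) > threshold]
print(anomalies)
```

With these made-up numbers, only the fifth reading is flagged. The threshold is a tuning choice, and a different multiplier would flag more or fewer events.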
Predict Categorical Fields
Algorithm: Logistic regression
Predict the value of a categorical field using the values of other fields in that event. A common use of these predictions is to identify anomalies: events whose predicted value differs from the actual value may be considered anomalous.
Example | Dataset | Description
Predict Hard Drive Failure | Disk failures (disk_failures.csv) | Predicts whether the hard drive is going to fail based on various indicators of drive reliability.
Predict the Presence of Malware | Firewall traffic (firewall_traffic.csv) | Predicts whether the firewall is affected by malware or has a vulnerability, based on various traffic indicators on the firewall.
Predict Telecom Customer Churn | Churn (churn.csv) | Predicts whether a customer will change providers (denoted as churn) based on the customer's usage pattern.
Predict the Presence of Diabetes | Diabetes (diabetes.csv) | Predicts response in diabetes data.
Predict Vehicle Make and Model | Track day (track_day.csv) | Predicts the vehicle type given other onboard metrics.
Predict External Anomalies | Business processes (cyclical_business_process_with_external_anomalies.csv) | Predicts external anomalies in business process data.
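To make the logistic regression idea concrete, here is a small, hedged sketch in plain Python. It is not how the toolkit trains its models: it uses a single numeric predictor, plain gradient descent, and invented data. The field name `reallocated_sectors`, the helper names, the learning rate, and the epoch count are assumptions for illustration only.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight and bias by gradient descent on the mean log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Error = predicted probability minus the 0/1 label.
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        weight -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias

# Hypothetical drive data: sector count and whether the drive failed.
reallocated_sectors = [0, 1, 2, 3, 7, 8, 9, 10]
failed = [0, 0, 0, 0, 1, 1, 1, 1]

weight, bias = train_logistic(reallocated_sectors, failed)

def predict_failure(sectors):
    """Return 1 (will fail) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(weight * sectors + bias) >= 0.5 else 0

print(predict_failure(2), predict_failure(8))
```

The 0.5 cutoff is a convention rather than a requirement. Raising or lowering it trades missed failures against false alarms.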
Detect Numeric Outliers
Algorithm: Distribution statistics
Find values that differ significantly from previous values.
Example | Dataset | Description
Detect Outliers in Server Response Time | Server response time (hostperf.csv) | Detects outliers in server response time.
Detect Outliers in Number of Logins (vs. Predicted Value) | Employee logins (logins.csv) | Forecasts the number of logins by hour and identifies when the actual number of logins differs significantly from the forecast.
Detect Outliers in Supermarket Purchases | Supermarket purchases (supermarket.csv) | Detects outliers in the quantity of purchases at a supermarket.
Detect Outliers in Power Plant Humidity | Power plant humidity (power_plant.csv) | Detects outliers in the humidity of a power plant.
Detect Outliers in Logins | Business processes (cyclical_business_process.csv) | Detects cyclical outliers in logins.
Detect Outliers in Call Center Data | Call center data (call_center.csv) | Detects cyclical outliers in call center data.
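As a rough illustration of outlier detection with distribution statistics, the sketch below flags values that fall outside a band around the median, where the band width is a multiple of the median absolute deviation (MAD). This is one common choice of statistic and not necessarily the one configured in any Showcase example. The `mad_outliers` helper, the multiplier of 5, and the response times are assumptions.

```python
from statistics import median

def mad_outliers(values, multiplier=5.0):
    """Return (index, value) pairs lying outside median +/- multiplier * MAD."""
    center = median(values)
    # Median absolute deviation: the typical distance from the median.
    mad = median([abs(v - center) for v in values])
    lower = center - multiplier * mad
    upper = center + multiplier * mad
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical server response times in milliseconds.
response_ms = [102, 98, 101, 99, 100, 250, 97, 103]

outliers = mad_outliers(response_ms)
print(outliers)
```

The median and MAD are less affected by the outliers themselves than the mean and standard deviation, which is why they are used here. This version computes the statistics over the whole series. Comparing each value only with previous values would require a sliding window.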
Detect Categorical Outliers
Algorithm: Probabilistic measures
Find events that contain unusual combinations of values.
Example | Dataset | Description
Detect Outliers in Disk Failures | Disk failures (disk_failures.csv) | Detects categorical outliers in disk failure data.
Detect Outliers in Bitcoin Transactions | Bitcoin transactions (bitcoin_transactions.csv) | Detects outliers in bitcoin transactions that may reflect unusual activity.
Detect Outliers in Supermarket Purchases | Supermarket purchases (supermarket.csv) | Detects outliers in the whole transaction at a supermarket.
Detect Outliers in Mortgage Contracts | Mortgage loans for New York (mortgage_loan_ny.csv) | Detects outliers in mortgage loans in New York.
Detect Outliers in Diabetes Patient Records | Diabetic data (diabetic.csv) | Detects outliers in diabetic data.
Detect Outliers in Mobile Phone Activity | Phone usage (phone_usage.csv) | Detects outliers in the number of calls that are incoming, outgoing, or missed from various phones.
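One simple way to think about "unusual combinations of values" is to estimate how often each combination occurs and flag the rare ones. The sketch below does exactly that with relative frequencies. It is a deliberately simplified stand-in, not the probabilistic measure the toolkit computes. The field values and the 10% cutoff are invented for the example.

```python
from collections import Counter

# Hypothetical events: (payment_method, channel) for each transaction.
events = (
    [("cash", "in_store")] * 6
    + [("card", "in_store")] * 8
    + [("card", "online")] * 5
    + [("cash", "online")]
)

# Estimate the probability of each combination from its relative frequency.
counts = Counter(events)
total = len(events)
probability = {combo: n / total for combo, n in counts.items()}

# Assumed rule: combinations seen in fewer than 10% of events are outliers.
outliers = [combo for combo, p in probability.items() if p < 0.10]
print(outliers)
```

Here "cash" and "online" are each common on their own, but the pair appears only once in twenty events, so it is the combination that stands out.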
Forecast Time Series
Algorithm: State-space method using Kalman filter
Forecast future values given past values of a metric (numeric time series).
Example | Dataset | Description
Forecast Internet Traffic | Internet traffic (internet_traffic.csv) | Forecasts the peak and off-peak times of internet usage given a few full cycles of internet traffic history.
Forecast the Number of Employee Logins | Employee logins (logins.csv) | Forecasts the number of logins by hour.
Forecast Monthly Sales | Souvenir sales (souvenir_sales.csv) | Forecasts the number of souvenir sales by month for a souvenir shop.
Forecast the Number of Bluetooth Devices | Bluetooth devices (bluetooth.csv) | Forecasts the number of distinct Bluetooth contacts that are made to the access points placed in the busiest lecture halls on the campus of the National University of Singapore.
Forecast Exchange Rate TWI using ARIMA | Exchange Rate TWI (exchange.csv) | Forecasts the trade weighted index of a currency.
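For readers unfamiliar with state-space forecasting, the following sketch shows the predict and update cycle of a Kalman filter on the simplest such model, a local level (a random walk observed with noise). It is an illustration under stated assumptions and not the toolkit's forecasting code: the `local_level_forecast` function, the two noise variances, and the login counts are hypothetical, and the model has no trend or seasonality.

```python
def local_level_forecast(series, process_var=1.0, obs_var=4.0):
    """One-step-ahead forecast from a local-level Kalman filter.

    Returns the forecast and the variance of the forecast observation.
    """
    # Start from the first observation with the observation noise as uncertainty.
    level, variance = float(series[0]), obs_var
    for observation in series[1:]:
        # Predict: the level carries over, and uncertainty grows.
        variance += process_var
        # Update: move toward the observation in proportion to the Kalman gain.
        gain = variance / (variance + obs_var)
        level += gain * (observation - level)
        variance *= 1.0 - gain
    # Forecasting one step ahead adds process noise and observation noise.
    return level, variance + process_var + obs_var

# Hypothetical hourly login counts.
hourly_logins = [100, 102, 101, 103, 102, 104]

forecast, forecast_variance = local_level_forecast(hourly_logins)
print(round(forecast, 2), round(forecast_variance, 2))
```

A larger `process_var` makes the filter trust recent observations more, and a larger `obs_var` makes it smooth more heavily. The forecast variance can be turned into a confidence band around the forecast.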
Cluster Numeric Events
Algorithms: K-means, DBSCAN, Spectral Clustering, Birch
Partition events with multiple numeric fields into clusters.
Example | Dataset | Description
Cluster Hard Drives by SMART Metrics | disk_failures.csv | Clusters hard drives based on the self-monitoring metrics they generate.
Cluster Behavior by App Usage | app_usage.csv | Clusters the behavior of employees based on how frequently they use business applications like Webmail or VPN.
Cluster Neighborhoods by Properties | housing.csv | Clusters neighborhoods based on properties like crime rate and median house value.
Cluster Vehicles by Onboard Metrics | track_day.csv | Clusters vehicles driven on a racetrack by onboard metrics like engine temperature and G-forces.
Cluster Power Plant Operating Regimes | power_plant.csv | Clusters the operating regimes of a power plant based on ambient measurements like temperature and vacuum.
Cluster Business Anomalies | cyclical_business_process.csv | Clusters business anomalies to reduce noise.
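Of the clustering algorithms listed for this Assistant, K-means is the easiest to sketch. The example below is a bare-bones version of Lloyd's algorithm in plain Python, intended only to show the assign and re-center loop. The `k_means` function, the fixed starting centroids, the iteration count, and the two-field data points are assumptions, and the toolkit's own implementation will differ.

```python
from math import dist
from statistics import mean

def nearest_index(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and re-centering each cluster."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest_index(point, centroids)].append(point)
        # New centroid = per-field mean of members; keep the old one if empty.
        centroids = [
            tuple(mean(field) for field in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    labels = [nearest_index(point, centroids) for point in points]
    return centroids, labels

# Hypothetical events with two numeric fields, already on similar scales.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]

# Assumed starting centroids: one point from each visible group.
centroids, labels = k_means(points, [points[0], points[3]])
print(labels)
```

In practice, fields measured on very different scales are usually standardized first, since Euclidean distance would otherwise be dominated by the field with the largest range.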
This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 4.2.0