Showcase examples
The Showcase contains the following examples, grouped by analytic.
Predict Numeric Fields
Algorithm: Linear Regression
Example | Dataset | Description |
---|---|---|
Predict Server Power Consumption | Server power (server_power.csv) | Predicts the power usage of a machine based on other metrics such as CPU utilization and memory transactions. |
Predict VPN Usage | App statistics (apps.csv) | Predicts the VPN usage of employees based on the frequency of use of other apps. |
Predict Median House Value | Housing (housing.csv) | Predicts the median home value in a region based on predictor fields related to housing value. |
Predict the Energy Output of a Power Plant | Power plant humidity (power_plant.csv) | Predicts the power output of the power plant given other measured variables, such as ambient temperature and humidity. |
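As a rough illustration of the kind of model the Linear Regression examples above fit, the following minimal sketch computes a one-predictor least-squares line in plain Python. The names `fit_line`, `predict`, `cpu_utilization`, and `power_watts` and the sample values are hypothetical and are not taken from server_power.csv. The Showcase runs the algorithm through the toolkit, not through this code.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of the predictor.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance; a line cannot be fitted")
    # Sum of cross-deviations between predictor and target.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new predictor value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical sample values, for illustration only.
cpu_utilization = [10.0, 20.0, 30.0, 40.0]
power_watts = [120.0, 140.0, 160.0, 180.0]

model = fit_line(cpu_utilization, power_watts)
estimated_power = predict(model, 50.0)
```

The examples in the table use several predictor fields at once. The single-predictor case is shown here only because it keeps the arithmetic easy to follow.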
Predict Categorical Fields
Algorithm: Logistic Regression
Example | Dataset | Description |
---|---|---|
Predict the Failure of Hard Drives using SMART Metrics | Disk failures (disk_failures.csv) | Predicts whether the hard drive is going to fail based on various indicators of drive reliability. |
Predict the Presence of Malware from Firewall Traffic | Firewall traffic (firewall_traffic.csv) | Predicts whether the firewall is affected by malware or has a vulnerability, based on various traffic indicators on the firewall. |
Predict Telecom Customer Churn | Churn (churn.csv) | Predicts whether a customer will change providers (denoted as churn) based on the customer's usage patterns. |
Predict Incidence of Diabetes from Health Metrics | Diabetes (diabetes.csv) | Predicts the response field in the diabetes data, which indicates the incidence of diabetes, based on health metrics. |
Predict Vehicle Type from Onboard Metrics | Track day (track_day.csv) | Predicts the vehicle type given other onboard metrics. |
Detect Numeric Outliers
Algorithm: Distribution statistics
Example | Dataset | Description |
---|---|---|
Detect Outliers in Server Response Time | Server response time (hostperf.csv) | Detects outliers in server response time. |
Detect Outliers in Number of Logins (vs. Predicted Value) | Employee logins (logins.csv) | Forecasts the number of logins by hour and identifies when the actual number of logins differs significantly from the forecast. |
Detect Outliers in Supermarket Purchases | Supermarket purchases (supermarket.csv) | Detects outliers in the quantity of purchases at a supermarket. |
Detect Outliers in Power Plant Humidity | Power plant humidity (power_plant.csv) | Detects outliers in humidity of a power plant. |
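One simple way to detect numeric outliers from distribution statistics is to flag values that fall more than a chosen multiple of the standard deviation away from the mean. The sketch below shows that idea in plain Python. It assumes a standard-deviation threshold with a multiplier of 2, and the function name `find_outliers` and the sample response times are hypothetical. It is not the toolkit's implementation and does not use data from hostperf.csv.

```python
from statistics import mean, stdev


def find_outliers(values, multiplier=2.0):
    """Return (index, value) pairs outside mean +/- multiplier * stdev."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    lower = mu - multiplier * sigma
    upper = mu + multiplier * sigma
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical response times in milliseconds, for illustration only.
response_times_ms = [102, 98, 101, 99, 100, 103, 97, 100, 450, 101]

outliers = find_outliers(response_times_ms)
```

A single extreme value inflates both the mean and the standard deviation, so a threshold built on the median is often more robust when the data contains large spikes.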
Detect Categorical Outliers
Algorithm: Probabilistic measures
Example | Dataset | Description |
---|---|---|
Detect Outliers in Disk Failure Events | Disk failures (disk_failures.csv) | Detects categorical outliers in disk failure data. |
Detect Outliers in Bitcoin Transactions | Bitcoin transactions (bitcoin_transactions.csv) | Detects outliers in bitcoin transactions that may reflect unusual activity. |
Detect Outliers in Supermarket Purchases | Supermarket purchases (supermarket.csv) | Detects outliers across whole transactions at a supermarket. |
Detect Outliers in Mortgage Contract Data | Mortgage loans for New York (mortgage_loan_ny.csv) | Detects outliers in mortgage loans in New York. |
Detect Outliers in Diabetes Patient Records | Diabetic data (diabetic.csv) | Detects outliers in diabetic data. |
Detect Outliers in Mobile Phone Activity | Phone usage (phone_usage.csv) | Detects outliers in the number of calls that are incoming, outgoing, or missed from various phones. |
Forecast Time Series
Algorithm: State-space Method using Kalman Filter
Example | Dataset | Description |
---|---|---|
Forecast Internet Traffic | Internet traffic (internet_traffic.csv) | Forecasts the peak and off-peak times of internet usage given a few full cycles of internet traffic history. |
Forecast the Number of Employee Logins | Employee logins (logins.csv) | Forecasts the number of logins by hour. |
Forecast Monthly Sales | Souvenir sales (souvenir_sales.csv) | Forecasts the number of souvenir sales by month for a souvenir shop. |
Forecast the Number of Bluetooth Devices | Bluetooth devices (bluetooth.csv) | Forecasts the number of distinct Bluetooth contacts that are made to the access points placed in the busiest lecture halls on the campus of the National University of Singapore. |
Cluster Numeric Events
Algorithms: K-means, DBSCAN, Spectral Clustering, BIRCH
Example | Dataset | Description |
---|---|---|
Cluster Hard Drives by SMART Metrics | disk_failures.csv | Clusters hard drives based on the self-monitoring metrics they generate. |
Cluster Behavior by App Usage | app_usage.csv | Clusters the behavior of employees based on how frequently they use business applications like Webmail or VPN. |
Cluster Neighborhoods by Properties | housing.csv | Clusters neighborhoods based on properties like crime rate and median house value. |
Cluster Vehicles by Onboard Metrics | track_day.csv | Clusters vehicles driven on a racetrack by onboard metrics like engine temperature and G-forces. |
Cluster Power Plant Operating Regimes | power_plant.csv | Clusters the operating regimes of a power plant based on ambient measurements like temperature and vacuum. |
This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 2.0.1, 2.1.0, 2.2.0, 2.2.1