Showcase examples
The Showcase contains the following examples, grouped by analytic.
Predict Numerical Fields
Algorithm: Linear Regression
Example | Dataset | Description |
---|---|---|
Predict Server Power Consumption | Server power (server_power.csv) | Predicts the power usage of a machine based on other metrics such as CPU utilization and memory transactions. |
Predict VPN Usage | App statistics (apps.csv) | Predicts the VPN usage of employees based on the frequency of use of other apps. |
Predict Median House Value | Housing (housing.csv) | Predicts median home value in a region based on housing value-related predictor fields. |
Predict the Energy Output of a Power Plant | Power plant humidity (power_plant.csv) | Predicts the power output of the power plant given other measured variables, such as ambient temperature and humidity. |
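
To give a rough sense of what the Linear Regression analytic does, the following sketch fits a least-squares line with a single predictor in plain Python. The readings and the names `cpu_utilization` and `power_watts` are invented for illustration and are not taken from server_power.csv. The Showcase examples use several predictor fields, so treat this as a simplified illustration rather than the Toolkit's implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical readings (not from server_power.csv).
cpu_utilization = [10.0, 20.0, 30.0, 40.0, 50.0]
power_watts = [120.0, 140.0, 160.0, 180.0, 200.0]

slope, intercept = fit_line(cpu_utilization, power_watts)
predicted = slope * 60.0 + intercept  # predict power at 60% CPU
print(f"predicted power at 60% CPU: {predicted:.1f} W")
```

Because the invented data lie exactly on a line, the fit recovers that line and the prediction simply extends it.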
Predict Categorical Fields
Algorithm: Logistic Regression
Example | Dataset | Description |
---|---|---|
Predict the Failure of Hard Drives using SMART Metrics | Disk failures (disk_failures.csv) | Predicts whether the hard drive is going to fail based on various indicators of drive reliability. |
Predict the Presence of Malware from Firewall Traffic | Firewall traffic (firewall_traffic.csv) | Predicts whether or not the firewall is going to be affected by malware or has a vulnerability, based on various traffic indicators on the firewall. |
Predict Telecom Customer Churn | Churn (churn.csv) | Predicts whether a customer will change providers (denoted as churn) based on the usage pattern of customers. |
Predict Incidence of Diabetes from Health Metrics | Diabetes (diabetes.csv) | Predicts response in diabetes data. |
Predict Vehicle Type from Onboard Metrics | Track day (track_day.csv) | Predicts the vehicle type given other onboard metrics. |
Detect Numerical Outliers
Algorithm: Distribution statistics
Example | Dataset | Description |
---|---|---|
Detect Outliers in Server Response Time | Server response time (hostperf.csv) | Detects outliers in server response time. |
Detect Outliers in Number of Logins (vs. Predicted Value) | Employee logins (logins.csv) | Forecasts the number of logins by hour and identifies when the actual number of logins differs significantly from the forecast. |
Detect Outliers in Supermarket Purchases | Supermarket purchases (supermarket.csv) | Detects outliers in the quantity of purchases at a supermarket. |
Detect Outliers in Power Plant Humidity | Power plant humidity (power_plant.csv) | Detects outliers in humidity of a power plant. |
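
One common way to detect numerical outliers from distribution statistics is to flag values that lie far from the median, measured in units of the median absolute deviation (MAD). The sketch below assumes that approach. The function name `mad_outliers`, the multiplier of 3, and the sample response times are illustrative choices and are not taken from hostperf.csv or from the Showcase configuration.

```python
import statistics


def mad_outliers(values, multiplier=3.0):
    """Return the values farther than multiplier * MAD from the median."""
    median = statistics.median(values)
    # Median absolute deviation: the median distance from the median.
    mad = statistics.median([abs(v - median) for v in values])
    lower = median - multiplier * mad
    upper = median + multiplier * mad
    return [v for v in values if v < lower or v > upper]


# Hypothetical server response times in milliseconds.
response_times = [102, 98, 105, 99, 101, 100, 97, 450, 103, 104]
outliers = mad_outliers(response_times)
print(outliers)
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value barely moves them, so the threshold stays tight around the typical readings.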
Detect Categorical Outliers
Algorithm: Probabilistic measures
Example | Dataset | Description |
---|---|---|
Detect Outliers in Disk Failure Events | Disk failures (disk_failures.csv) | Detects categorical outliers in disk failure data. |
Detect Outliers in Bitcoin Transactions | Bitcoin transactions (bitcoin_transactions.csv) | Detects outliers in bitcoin transactions that may reflect unusual activity. |
Detect Outliers in Supermarket Purchases | Supermarket purchases (supermarket.csv) | Detects outliers in the whole transaction at a supermarket. |
Detect Outliers in Mortgage Contract Data | Mortgage loans for New York (mortgage_loan_ny.csv) | Detects outliers in mortgage loans in New York. |
Detect Outliers in Diabetes Patient Records | Diabetic data (diabetic.csv) | Detects outliers in diabetic data. |
Detect Outliers in Mobile Phone Activity | Phone usage (phone_usage.csv) | Detects outliers in the number of calls that are incoming, outgoing, or missed from various phones. |
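
As a simplified illustration of a probabilistic measure for categorical data, the sketch below estimates how often each value occurs and flags values whose relative frequency falls at or below a threshold. The source does not specify the exact measure the Showcase uses, so the function name `rare_values`, the 5% threshold, and the sample call types are assumptions made for illustration.

```python
from collections import Counter


def rare_values(values, max_probability=0.05):
    """Map each value whose relative frequency is <= max_probability to that frequency."""
    counts = Counter(values)
    total = len(values)
    return {
        value: count / total
        for value, count in counts.items()
        if count / total <= max_probability
    }


# Hypothetical call records (not from phone_usage.csv).
call_types = ["incoming"] * 50 + ["outgoing"] * 48 + ["missed"] * 2
rare = rare_values(call_types)
print(rare)
```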
Forecast Time Series
Algorithm: State-space Method using Kalman Filter
Example | Dataset | Description |
---|---|---|
Forecast Internet Traffic | Internet traffic (internet_traffic.csv) | Forecasts the peak and off-peak times of internet usage given a few full cycles of internet traffic history. |
Forecast the Number of Employee Logins | Employee logins (logins.csv) | Forecasts the number of logins by hour. |
Forecast Monthly Sales for a Souvenir Shop | Souvenir sales (souvenir_sales.csv) | Forecasts the number of souvenir sales by month. |
Forecast the Number of Bluetooth Devices | Bluetooth devices (bluetooth.csv) | Forecasts the number of distinct Bluetooth contacts that are made to the access points placed in the busiest lecture halls on the campus of the National University of Singapore. |
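
The simplest state-space model that a Kalman filter can track is a local level: a single hidden value that drifts over time and is observed with noise. The sketch below filters such a series and uses the last filtered level as the forecast. It does not model the seasonal cycles that the Showcase examples forecast. The function name `local_level_filter`, the variance settings, and the login counts are illustrative assumptions, not values from logins.csv.

```python
def local_level_filter(observations, process_var, observation_var):
    """Scalar Kalman filter for a local-level (random-walk) model.

    Returns the filtered level after the last observation, which is also
    this model's forecast for every future step.
    """
    level = observations[0]      # initialise the state at the first reading
    variance = observation_var   # uncertainty of that initial state
    for observed in observations[1:]:
        variance += process_var                          # predict: state may have drifted
        gain = variance / (variance + observation_var)   # Kalman gain, between 0 and 1
        level += gain * (observed - level)               # update toward the observation
        variance *= 1.0 - gain                           # uncertainty shrinks after update
    return level


# Hypothetical hourly login counts.
hourly_logins = [120.0, 132.0, 128.0, 141.0, 137.0]
forecast = local_level_filter(hourly_logins, process_var=4.0, observation_var=16.0)
print(f"next-hour forecast: {forecast:.1f} logins")
```

A larger `process_var` relative to `observation_var` makes the filter trust recent observations more, so the forecast follows the latest readings more closely.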
Cluster Numeric Events
Algorithms: K-means, DBSCAN, Spectral Clustering, BIRCH
The Clustering showcase groups data into like classes. The classes aren't known in advance; they are inferred from the data, because members of the same class tend to have similar qualities. On a graph, some points cluster together in one spot while other points cluster together in another spot. A clustering algorithm identifies the points in each spot as belonging to the same group.
Example | Dataset | Description |
---|---|---|
Noisy Circles | sklearn_cluster_noisy_circles.csv | Features two concentric circles of data ("noisy" because they're not perfectly circular). An algorithm suited to this dataset separates the circles into two groups; DBSCAN and Spectral Clustering succeed. |
Noisy Moons | sklearn_cluster_noisy_moons.csv | Features two half circles; the algorithm's task is to separate them into two different categories. As before, DBSCAN and Spectral Clustering succeed. |
Blobs | sklearn_cluster_blobs.csv | Displays distinct clusters of points. |
No Structure | sklearn_cluster_no_structure.csv | Shows clustering on randomly spaced data, which has no pattern to it at all. Clustering in this case is bound to be spurious, so no algorithm does particularly well. |
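
To illustrate why a density-based algorithm can separate concentric circles, the sketch below implements a minimal version of DBSCAN from scratch and runs it on two generated rings. It is not the Toolkit's implementation. The rings are generated without noise so that the result is deterministic, rather than loaded from sklearn_cluster_noisy_circles.csv, and the helper `make_ring` and the `eps` and `min_points` values are illustrative assumptions.

```python
import math


def make_ring(radius, count):
    """Return `count` evenly spaced points on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * i / count),
         radius * math.sin(2 * math.pi * i / count))
        for i in range(count)
    ]


def dbscan(points, eps, min_points):
    """Minimal DBSCAN: returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
    return labels


# Two concentric rings with similar spacing between adjacent points.
inner = make_ring(1.0, 40)
outer = make_ring(2.0, 80)
labels = dbscan(inner + outer, eps=0.3, min_points=3)
print(sorted(set(labels)))
```

Adjacent points on each ring are closer than `eps`, while the rings themselves are much farther apart, so each ring is followed point to point and ends up as its own cluster. A centroid-based method such as K-means has no way to express this shape, because both rings share the same center.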
This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 1.0.0, 1.1.0, 1.2.0, 1.3.0