Splunk Machine Learning Toolkit Showcase

Use the Splunk Machine Learning Toolkit (MLTK) Showcase to explore machine learning concepts. The Showcase provides pre-populated, end-to-end examples for the Splunk Machine Learning Toolkit and for each of its guided modeling Assistants. Filter the available Showcase examples by machine learning operation or industry to find the ones that best match your machine learning goals.

You can filter the Showcase by machine learning operation or industry as shown in the following image:

MLTK ships with all of the example datasets used in the Showcase. You can use these datasets to practice machine learning concepts, or to re-create the Showcase examples in your own instance before working with your own data.

Showcase examples

The Showcase contains the following examples, grouped here by machine learning operation:

LLM Insights

View examples of the `ai` ML-SPL search command, which lets MLTK users send Splunk platform data to externally hosted large language models (LLMs) and returns the model's response to the Splunk search pipeline.

Example	Description
Field Extraction	Uses the `ai` command to generate regular expressions that extract fields from Splunk internal log messages.
Summarization	Uses the `ai` command to summarize internal Splunk error messages.
Anomaly Detection	Uses the `ai` command to detect anomalies in time series metrics, in this case counts of log events in the internal Splunk index.

Smart Prediction

Algorithm: AutoPrediction
Predict the value of a categorical or numeric field based on one or more other fields in the event using a step-by-step guided workflow.

Example	Dataset	Description
Predict Disk Utilization	Server power (server_power.csv)	Predicts disk utilization from the disk access and disk blocks fields.
Predict the Presence of Vulnerabilities	Firewall traffic (firewall_traffic.csv)	Predicts vulnerabilities in firewall data from other fields, including bytes received, packets received, packets sent, and bytes sent.

Predict Numeric Fields

Algorithm: Linear regression
Predict the value of a numeric field using a weighted combination of the values of other fields in that event. A common use of these predictions is to identify anomalies: predictions that differ significantly from the actual value may be considered anomalous.

Example	Dataset	Description
Predict Server Power Consumption	Server power (server_power.csv)	Predicts the power usage of a machine based on other metrics such as CPU utilization and memory transactions.
Predict VPN Usage	App statistics (apps.csv)	Predicts the VPN usage of employees based on the frequency of use of other apps.
Predict Median House Value	Housing (housing.csv)	Predicts median home value in a region based on housing value-related predictor fields.
Predict Power Plant Energy Output	Power plant humidity (power_plant.csv)	Predicts the power output of the power plant given other measured variables, such as ambient temperature and humidity.
Predict Future Logins	Business processes (cyclical_business_process.csv)	Predicts future logins in business process data.
Predict Future VPN Usage (sinusoidal time)	App usage (app_usage.csv)	Predicts future VPN usage with sinusoidal time data.
Predict Future VPN Usage (categorical time)	App usage (app_usage.csv)	Predicts future VPN usage with categorical time data.
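To illustrate the residual-based anomaly check described above, the following minimal Python sketch fits an ordinary least squares model and flags events whose actual value is far from the prediction. It is not the toolkit's implementation: the fields (`cpu`, `mem`), the training values, and the residual threshold of 20 are hypothetical, and the model is solved with plain normal equations rather than a numerical library.

```python
# Minimal sketch: ordinary least squares via the normal equations, then
# residual-based anomaly flagging. Field names and values are hypothetical.


def fit_linear_regression(rows, targets):
    """Return coefficients [b0, b1, ..., bk] for y ~ b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y] for the normal equations
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefs, row):
    """Weighted combination of the predictor fields plus the intercept."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


def flag_anomalies(coefs, rows, actuals, threshold):
    """Indices of events whose actual value differs from the prediction by more than threshold."""
    return [
        i for i, (row, y) in enumerate(zip(rows, actuals))
        if abs(y - predict(coefs, row)) > threshold
    ]


# Hypothetical (cpu, mem) readings; power was generated as 50 + 2*cpu + 0.5*mem
train_rows = [(10, 20), (20, 10), (30, 40), (40, 30), (50, 60), (60, 50)]
train_y = [80.0, 95.0, 130.0, 145.0, 180.0, 195.0]
coefs = fit_linear_regression(train_rows, train_y)

new_rows = [(35, 35), (35, 35)]
new_y = [140.0, 300.0]  # the second reading is far above the predicted 137.5
print(flag_anomalies(coefs, new_rows, new_y, threshold=20.0))  # [1]
```

A fixed residual threshold is the simplest choice; in practice you might derive it from the spread of residuals on the training data.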

Predict Categorical Fields

Algorithm: Logistic regression
Predict the value of a categorical field using the values of other fields in that event. A common use of these predictions is to identify anomalies: predictions that differ significantly from the actual value may be considered anomalous.

Example	Dataset	Description
Predict Hard Drive Failure	Disk failures (disk_failures.csv)	Predicts whether the hard drive is going to fail based on various indicators of drive reliability.
Predict the Presence of Malware	Firewall traffic (firewall_traffic.csv)	Predicts whether the firewall is affected by malware or has a vulnerability, based on various traffic indicators on the firewall.
Predict Telecom Customer Churn	Churn (churn.csv)	Predicts whether a customer will change providers (denoted as churn) based on the usage pattern of customers.
Predict the Presence of Diabetes	Diabetes (diabetes.csv)	Predicts response in diabetes data.
Predict Vehicle Make and Model	Track day (track_day.csv)	Predicts the vehicle type given other onboard metrics.
Predict External Anomalies	Business processes (cyclical_business_process_with_external_anomalies.csv)	Predicts external anomalies in business process data.
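The same predict-then-compare idea applies to categorical fields. The sketch below trains a binary logistic regression with batch gradient descent and flags events whose observed label disagrees with the predicted label. It is a minimal illustration, not the toolkit's implementation: the two feature columns are hypothetical, already-standardized values, and the learning rate and epoch count are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit P(label = 1) = sigmoid(w0 + w1*x1 + ... + wk*xk) by batch gradient descent."""
    w = [0.0] * (len(rows[0]) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, y in zip(rows, labels):
            x = [1.0] + list(row)
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += error * xj
        # Step against the average gradient of the log loss
        w = [wi - learning_rate * g / len(rows) for wi, g in zip(w, grad)]
    return w


def predict_label(w, row):
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))
    return 1 if sigmoid(z) >= 0.5 else 0


def flag_mismatches(w, rows, labels):
    """Return indices of events whose observed label differs from the prediction."""
    return [i for i, (row, y) in enumerate(zip(rows, labels)) if predict_label(w, row) != y]


# Hypothetical standardized features; label 1 = drive failed
rows = [(-2.0, -1.0), (-1.5, -0.5), (-1.0, -1.5), (1.0, 1.5), (1.5, 0.5), (2.0, 1.0)]
labels = [0, 0, 0, 1, 1, 1]
w = fit_logistic_regression(rows, labels)

new_rows = [(1.8, 1.2), (-1.2, -0.8)]
new_labels = [0, 0]  # the first event looks like a failure but is labeled healthy
print(flag_mismatches(w, new_rows, new_labels))  # [0]
```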

Smart Outlier Detection

Algorithm: DensityFunction
Find numeric outliers using a step-by-step guided workflow that applies a density algorithm and lets you segment the data before you search for anomalies.

Example	Dataset	Description
Find Anomalies in Hard Drive Metrics	Disk failures (disk_failures.csv)	Finds anomalies in SMART (self-monitoring, analysis, and reporting technology) metrics across different hard drive models.
Find Anomalies in Supermarket Purchases	Supermarket purchases (supermarket.csv)	Finds anomalies in supermarket purchase quantity metrics across different shops.
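Conceptually, a density-based approach fits a distribution to each data segment and treats values in the low-probability tails as outliers. The following simplified sketch fits only a normal distribution per group using Python's `statistics.NormalDist`; the toolkit's DensityFunction algorithm may fit other distribution types. The field names, sample values, and the 0.01 tail-area threshold are assumptions for illustration.

```python
from collections import defaultdict
from statistics import NormalDist


def fit_density_by_group(events, group_field, value_field):
    """Fit one normal distribution per group (the 'segment' of the data)."""
    samples = defaultdict(list)
    for event in events:
        samples[event[group_field]].append(event[value_field])
    return {group: NormalDist.from_samples(values) for group, values in samples.items()}


def is_outlier(dist, value, threshold=0.01):
    """True if the two-sided tail area beyond value is below threshold (assumed 0.01)."""
    cdf = dist.cdf(value)
    return 2 * min(cdf, 1 - cdf) < threshold


# Hypothetical purchase quantities for two shops
events = (
    [{"shop": "A", "quantity": q} for q in [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]]
    + [{"shop": "B", "quantity": q} for q in [100, 105, 95, 100, 110, 90, 100, 105, 95, 100]]
)
models = fit_density_by_group(events, "shop", "quantity")
print(is_outlier(models["A"], 100))  # True
print(is_outlier(models["B"], 100))  # False
```

Segmenting matters here: a quantity of 100 is typical for shop `B` but extreme for shop `A`, so a single distribution fitted across both shops would miss the anomaly.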

Detect Numeric Outliers

Algorithm: Distribution statistics
Find values that differ significantly from previous values.

Example	Dataset	Description
Detect Outliers in Server Response Time	Server response time (hostperf.csv)	Detects outliers in server response time.
Detect Outliers in Number of Logins (vs. Predicted Value)	Employee logins (logins.csv)	Forecasts the number of logins by hour and identifies when the actual number of logins differs significantly from the forecast.
Detect Outliers in Supermarket Purchases	Supermarket purchases (supermarket.csv)	Detects outliers in the quantity of purchases at a supermarket.
Detect Outliers in Power Plant Humidity	Power plant humidity (power_plant.csv)	Detects outliers in humidity of a power plant.
Detect Outliers in Call Center Data	Call center data (call_center.csv)	Detects cyclical outliers in call center data.
Detect Outliers in Logins	Business processes (cyclical_business_process.csv)	Detects cyclical outliers in logins.
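One common distribution-statistics test flags values that lie more than a few median absolute deviations (MAD) from the median. The sketch below assumes MAD with a multiplier of 3 and hypothetical response times in milliseconds; it shows the general technique and is not necessarily the exact method or default used by the Assistant.

```python
from statistics import median


def mad_outliers(values, multiplier=3.0):
    """Return values more than `multiplier` median absolute deviations from the median."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    lower, upper = center - multiplier * mad, center + multiplier * mad
    return [v for v in values if v < lower or v > upper]


# Hypothetical server response times in milliseconds
response_times = [120, 125, 118, 130, 122, 119, 480, 121, 124]
print(mad_outliers(response_times))  # [480]
```

MAD is less sensitive to the outliers themselves than a mean-and-standard-deviation test, because a single extreme value barely moves the median.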

Detect Categorical Outliers

Algorithm: Probabilistic measures
Find events that contain unusual combinations of values.

Example	Dataset	Description
Detect Outliers in Disk Failures	Disk failures (disk_failures.csv)	Detects categorical outliers in disk failure data.
Detect Outliers in Bitcoin Transactions	Bitcoin transactions (bitcoin_transactions.csv)	Detects outliers in bitcoin transactions that may reflect unusual activity.
Detect Outliers in Supermarket Purchases	Supermarket purchases (supermarket.csv)	Detects outliers across whole transactions at a supermarket.
Detect Outliers in Mortgage Contracts	Mortgage loans for New York (mortgage_loan_ny.csv)	Detects outliers in mortgage loans in New York.
Detect Outliers in Diabetes Patient Records	Diabetic data (diabetic.csv)	Detects outliers in diabetic data.
Detect Outliers in Mobile Phone Activity	Phone usage (phone_usage.csv)	Detects outliers in the number of calls that are incoming, outgoing, or missed from various phones.
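A simple way to illustrate "unusual combinations of values" is to count how often each combination of field values occurs and flag events whose combination is rare. The sketch below does exactly that; it is a simplified frequency-based stand-in, not necessarily the probabilistic measure the toolkit computes, and the field names, events, and 10% cutoff are hypothetical.

```python
from collections import Counter


def rare_combinations(events, fields, max_share=0.1):
    """Return indices of events whose combination of values in `fields` is rare."""
    def combo(event):
        return tuple(event[f] for f in fields)

    counts = Counter(combo(e) for e in events)
    n = len(events)
    # Flag events whose exact combination appears in less than max_share of all events
    return [i for i, e in enumerate(events) if counts[combo(e)] / n < max_share]


# Hypothetical phone activity records
events = (
    [{"call_type": "incoming", "period": "day"} for _ in range(9)]
    + [{"call_type": "outgoing", "period": "night"} for _ in range(9)]
    + [{"call_type": "incoming", "period": "night"}]
    + [{"call_type": "outgoing", "period": "day"}]
)
print(rare_combinations(events, ["call_type", "period"]))  # [18, 19]
```

In this data every individual value is common (each appears in half of the events), yet the two mixed combinations are rare, which is the kind of event a per-field check would miss.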

Smart Forecasting

Algorithm: StateSpaceForecast
Forecast future numeric time series data using a step-by-step guided workflow, with the option to bring in data from different sources and to account for calendar-specific "special days," such as holidays and company-specific event days.

Example	Dataset	Description
Forecast the Number of Calls to a Call Center	Call center data (call_center.csv)	Forecasts the number of calls to a call center.
Forecast App Logons with Special Days	Apps logon count (applogonscount.txt) and Special days (specialdays.txt)	Forecasts the logons to an app while accounting for special calendar days.
Forecast App Expenses	App usage (app_usage.csv)	Forecasts the expenses for an app.

Forecast Time Series

Algorithm: State-space method using a Kalman filter
Forecast future values given past values of a metric (numeric time series).

Example	Dataset	Description
Forecast Internet Traffic	Internet traffic (internet_traffic.csv)	Forecasts the peak and off-peak times of internet usage given a few full cycles of internet traffic history.
Forecast the Number of Employee Logins	Employee logins (logins.csv)	Forecasts the number of logins by hour.
Forecast Monthly Sales	Souvenir sales (souvenir_sales.csv)	Forecasts the number of souvenir sales by month for a souvenir shop.
Forecast the Number of Bluetooth Devices	Bluetooth devices (bluetooth.csv)	Forecasts the number of distinct Bluetooth contacts that are made to the access points placed in the busiest lecture halls on the campus of the National University of Singapore.
Forecast Exchange Rate TWI using ARIMA	Exchange Rate TWI (exchange.csv)	Forecasts the trade-weighted index (TWI) of a currency.
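To show what a Kalman-filter state-space forecast involves, the following sketch filters a series with a local linear trend model (a level plus a slope, each following a random walk) and extrapolates the final state. It is a minimal illustration under stated assumptions: the noise variances and the vague initial prior are hypothetical, and this model has no seasonal component, so on its own it cannot capture peak and off-peak cycles.

```python
def local_linear_trend_forecast(series, level_var, slope_var, obs_var, horizon):
    """Kalman-filter a local linear trend model and return `horizon` point forecasts."""
    level, slope = float(series[0]), 0.0
    # State covariance [[p00, p01], [p10, p11]]; large values encode a vague prior
    p00, p01, p10, p11 = 1e6, 0.0, 0.0, 1e6
    for y in series:
        # Predict: x <- F x and P <- F P F^T + Q, with F = [[1, 1], [0, 1]]
        level, slope = level + slope, slope
        p00, p01, p10, p11 = (
            p00 + p01 + p10 + p11 + level_var,
            p01 + p11,
            p10 + p11,
            p11 + slope_var,
        )
        # Update with observation matrix H = [1, 0]: only the level is observed
        s = p00 + obs_var  # innovation variance
        k0, k1 = p00 / s, p10 / s  # Kalman gain
        innovation = y - level
        level += k0 * innovation
        slope += k1 * innovation
        p00, p01, p10, p11 = (
            (1 - k0) * p00,
            (1 - k0) * p01,
            p10 - k1 * p00,
            p11 - k1 * p01,
        )
    # Extrapolate the filtered level along the filtered slope
    return [level + h * slope for h in range(1, horizon + 1)]


logins = list(range(10, 50, 2))  # hypothetical counts with a steady trend: 10, 12, ..., 48
forecast = local_linear_trend_forecast(logins, level_var=0.1, slope_var=0.01, obs_var=1.0, horizon=3)
print([round(f, 2) for f in forecast])  # [50.0, 52.0, 54.0]
```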

Smart Clustering

Algorithm: K-means
Cluster numeric events using a step-by-step guided workflow.

Example	Dataset	Description
Cluster Houses by Property Descriptions	Housing (housing.csv)	Clusters housing data based on property descriptions.
Cluster Mortgage Loans	Mortgage loans for New York (mortgage_loan_ny.csv)	Clusters mortgage data based on a series of fields.
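K-means alternates between assigning each event to its nearest centroid and moving each centroid to the mean of its assigned events. The sketch below implements that loop (Lloyd's algorithm) in plain Python. For simplicity it seeds the centroids with the first `k` points, which is an assumption; library implementations typically use smarter seeding such as k-means++. The two numeric fields are hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster points into k groups; return (centroids, labels)."""
    centroids = [list(p) for p in points[:k]]  # naive seeding: first k points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            [sum(dim) / len(cl) for dim in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


# Hypothetical numeric fields, already on comparable scales
points = [(1, 1), (9, 9), (1.5, 2), (8, 8), (1, 0.5), (9, 8.5)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Because K-means uses Euclidean distance, fields on very different scales are usually standardized first so that one field does not dominate the clustering.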

Cluster Numeric Events

Algorithms: K-means, DBSCAN, Spectral Clustering, Birch
Partition events with multiple numeric fields into clusters.

Example	Dataset	Description
Cluster Hard Drives by SMART Metrics	Disk failures (disk_failures.csv)	Clusters hard drives based on the self-monitoring metrics they generate.
Cluster Behavior by App Usage	App statistics (app_usage.csv)	Clusters the behavior of employees based on how frequently they use business applications like Webmail or VPN.
Cluster Neighborhoods by Properties	Housing (housing.csv)	Clusters neighborhoods based on properties like crime rate and median house value.
Cluster Vehicles by Onboard Metrics	Track day (track_day.csv)	Clusters vehicles driven on a racetrack by onboard metrics like engine temperature and G-forces.
Cluster Power Plant Operating Regimes	Power plant humidity (power_plant.csv)	Clusters the operating regimes of a power plant based on ambient measurements like temperature and vacuum.
Cluster Business Anomalies to Reduce Noise	Cyclical business process (cyclical_business_process.csv)	Clusters business anomalies to reduce noise.
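Of the other algorithms listed for this Assistant, DBSCAN differs most from K-means: it does not need the number of clusters in advance, and it labels points in sparse regions as noise. The following minimal sketch assumes Euclidean distance; the parameter names `eps` and `min_samples` mirror scikit-learn's DBSCAN, and the points are hypothetical.

```python
import math


def dbscan(points, eps, min_samples):
    """Return a cluster label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Includes the point itself, so min_samples counts the point too
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:  # only core points expand the cluster
                queue.extend(j_neighbors)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_samples=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```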
