Splunk® Machine Learning Toolkit

User Guide

This documentation does not apply to the most recent version of Splunk® Machine Learning Toolkit. For documentation on the most recent version, go to the latest release.

Forecast Time Series


The Forecast Time Series assistant works on time series data and predicts the next values in a sequence. The result includes both a predicted value and a measure of the uncertainty of that prediction.

Forecasting is similar to prediction. However, forecasting attempts to estimate future values, whereas prediction is used to validate a model against test data that you already have.

Algorithm

  • State-space method using Kalman filter
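
As an illustration of the general idea, the following is a minimal sketch of one simple state-space model, the local-level model, filtered with a Kalman filter. It is not the toolkit's implementation: the function name `local_level_forecast` and the parameters `obs_var` and `level_var` are hypothetical, and the noise variances are fixed by hand rather than estimated from the data.

```python
def local_level_forecast(series, horizon, obs_var=1.0, level_var=0.1):
    """Filter `series` with a local-level model, then forecast `horizon` steps.

    Returns a list of (mean, variance) pairs, one per forecast step.
    obs_var and level_var are assumed noise variances (hypothetical defaults).
    """
    level = series[0]   # initial estimate of the hidden level
    p = obs_var         # initial uncertainty of that estimate (assumed)

    for y in series[1:]:
        # Predict: the level follows a random walk, so uncertainty grows.
        p = p + level_var
        # Update: blend the prediction with the new observation.
        gain = p / (p + obs_var)
        level = level + gain * (y - level)
        p = (1.0 - gain) * p

    forecasts = []
    for _ in range(horizon):
        # With no new observations, the mean stays flat and uncertainty grows.
        p = p + level_var
        forecasts.append((level, p + obs_var))
    return forecasts
```

Each forecast step returns a predicted value together with a variance, which mirrors the "predicted value plus measure of uncertainty" output described above. The variance grows with each step further into the future.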

Workflow

The basic steps for forecasting a time series are as follows:

  1. Enter a search to retrieve your data, then click the search button to run it.
  2. Select the field you want to predict. This list of fields is populated by the search you just ran.
  3. Select a forecasting method. These algorithms consider subsets of features such as local level (an average of recent values), trend (the slope of a line fitted through recent values), and seasonality (repeating patterns).
  4. Specify a value for the number of values to withhold, which indicates how many search results to use for validating the quality of the forecast. The more values you withhold, the less data remains to train the model.
  5. Specify a value for the number of values to forecast, which indicates how far beyond the data to try to predict. Use the size of the confidence envelope to gauge how confident the algorithm is in its forecast.
  6. Select a value for the confidence interval, which is the percentage of the future data you expect to fall inside of the confidence envelope.
  7. Select a value for period, which indicates the period of any known repeating patterns in the data to assist the algorithm. For example, if your data includes monthly sales data that follows annual patterns, specify "12" for the period.
  8. Click Forecast.
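
The withholding and confidence interval settings in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the assistant's code: the function names `split_holdout` and `confidence_envelope` are hypothetical, and the envelope assumes normally distributed forecast errors.

```python
from statistics import NormalDist

def split_holdout(values, holdout):
    """Return (training, withheld): the last `holdout` values are withheld."""
    if holdout <= 0:
        return list(values), []
    return list(values[:-holdout]), list(values[-holdout:])

def confidence_envelope(mean, std, confidence_pct):
    """Return (lower, upper) bounds around `mean` for a given confidence.

    Assumes a normal error distribution; confidence_pct must be
    strictly between 0 and 100.
    """
    # Two-sided interval: a 95% confidence uses the 97.5th percentile.
    z = NormalDist().inv_cdf(0.5 + confidence_pct / 200.0)
    return mean - z * std, mean + z * std
```

For example, `split_holdout(values, 12)` on 60 monthly values leaves 48 values for training and 12 for validation, and a higher confidence percentage produces a wider envelope for the same predicted value and standard deviation.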


Interpret and validate

  • Raw Data Preview: Displays the raw data from the search.
  • Forecast: In shades of brown and beige, a graph displays the actual value as a solid line and the predicted value as a dotted line, surrounded by a confidence envelope. Values that fall outside the confidence envelope are outliers. A vertical line indicates where training data stops and test data begins. When the real data ends, forecasted values are displayed in shades of green.
  • Interpretation: The larger the envelope, the less confidence we have about forecasts around that time. The size of the envelope is directly related to the specified confidence interval percentage.

  • R² Statistic: Indicates how well the model explains the variability of the result. 100% (a value of 1) means the model fits perfectly.
  • Interpretation: The closer the value is to 1 (100%), the better the result.

  • Root Mean Squared Error: Measures the typical size of the forecast error, which is essentially the standard deviation of the residuals. The formula takes the difference between each actual and predicted value, squares it, averages the squares, and then takes the square root.
  • Interpretation: This value can be arbitrarily large and gives you an idea of how close the model's predictions are to the actual values. It is only meaningful within a single dataset and should not be compared with values from other datasets.

  • Prediction Outliers: Shows the total number of outliers that were detected.
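
The validation statistics above can be sketched with their standard textbook definitions. The function names are hypothetical, and the toolkit may compute these values differently in detail. The outlier count assumes that a value counts as an outlier when it falls strictly outside the confidence envelope.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square the differences, average, take the root."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(actual, predicted):
    """R² = 1 - (residual sum of squares / total sum of squares).

    Undefined (division by zero) if all actual values are identical.
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def count_outliers(actual, lower, upper):
    """Count actual values that fall outside the [lower, upper] envelope."""
    return sum(1 for a, lo, hi in zip(actual, lower, upper) if a < lo or a > hi)
```

A perfect forecast gives an RMSE of 0 and an R² of 1. A forecast that always predicts the mean of the actual values gives an R² of 0.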

Refine the forecast

After you have created a forecast, you can select a different algorithm to see whether a different choice yields better results, but the quality of the forecast mostly depends on how predictable the data is.

Deploy the forecast

Once you have validated and refined a forecast and are satisfied with it, review the options in the Deploy Forecast section:

  • Clicking any title takes you to a new Search tab, filled out with a search query that uses all data (not just the training set).
  • Some options show different computations.
  • The "Show future predictions that match specific criteria" option is useful for setting up an alert. You can create an alert that triggers when the forecast predicts a value that meets the conditions you set.
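
The idea of selecting future predictions that match a criterion can be sketched as a simple filter. This is an illustration only: the function name `predictions_matching` and the "greater than a threshold" condition are hypothetical stand-ins for whatever condition you configure, and the actual option generates a search query rather than Python code.

```python
def predictions_matching(forecast, threshold):
    """Return (step, value) pairs whose predicted value exceeds `threshold`.

    `forecast` is a list of predicted values; steps are numbered from 1.
    """
    return [
        (step, value)
        for step, value in enumerate(forecast, start=1)
        if value > threshold
    ]
```

An alert would then trigger whenever this filter returns at least one matching prediction.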
Last modified on 05 August, 2016

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 1.0.0, 1.1.0, 1.2.0

