predict
Description
The predict command performs future predictions for time-series data.
The command can fill in missing data in a time-series and provide predictions for the next several time steps. The command provides confidence intervals for all of its estimates. The command adds a predicted value and an upper and lower 95th percentile range to each event in the time-series.
Syntax
predict <variable_to_predict> [AS <newfield>] [<predict_options>]
Required arguments
- <variable_to_predict>
- Syntax: <field>
- Description: The field name for the variable that you want to predict.
Optional arguments
- <newfield>
- Syntax: <string>
- Description: A new name for the field that holds the predicted values of <variable_to_predict>.
- <predict_options>
- Syntax: algorithm=<algorithm_name> | correlate=<field> | future_timespan=<number> | holdback=<number> | period=<number> | lowerXX=<field> | upperYY=<field>
- Description: Forecasting options. All options can be specified anywhere in any order.
Predict options
- algorithm
- Syntax: algorithm= LL | LLP | LLT | LLB | LLP5
- Description: Specify the name of the forecasting algorithm to apply: LL (local level), LLP (seasonal local level), LLT (local level trend), LLB (bivariate local level), or LLP5 (which combines LLP and LLT). Each algorithm expects a minimum number of data points; for more information, see "Algorithm options" below.
- Default: LLP5
- correlate
- Syntax: correlate=<field>
- Description: For the bivariate model (algorithm=LLB), specifies the field to correlate against.
- future_timespan
- Syntax: future_timespan=<number>
- Description: The length of the prediction into the future. Must be a non-negative number. This option is not used when algorithm=LLB.
- holdback
- Syntax: holdback=<number>
- Description: Specifies the number of data points at the end of the series that are not used to build the model. For example, 'holdback=10' computes the prediction for the last 10 values. Typically, this is used to compare the predicted values to the actual data. Required when algorithm=LLB.
- lowerXX
- Syntax: lower<int>=<field>
- Description: Specifies a field name for the lower <int> percentage confidence interval. <int> is greater than or equal to 0 and less than 100.
- Default: lower95, the lower bound of the interval in which 95% of predictions are expected to fall.
- period
- Syntax: period=<number>
- Description: If algorithm is LLP or LLP5, specify the seasonal period of the time series data. If not specified, the period is estimated using the data's auto-correlation. If algorithm is not LLP or LLP5, this is ignored.
- upperYY
- Syntax: upper<int>=<field>
- Description: Specifies a field name for the upper <int> percentage confidence interval. <int> is greater than or equal to 0 and less than 100.
- Default: upper95, the upper bound of the interval in which 95% of predictions are expected to fall.
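The period option notes that, when no period is given, the seasonal period is estimated from the data's auto-correlation. The following Python sketch illustrates that general idea only. It is not the command's implementation: the function names, the choice to start at lag 2, and the default search range of half the series length are all assumptions made for this illustration.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0  # a constant series has no usable autocorrelation
    num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return num / denom


def estimate_period(values, max_lag=None):
    """Return the lag (>= 2) with the highest autocorrelation.

    Lag 1 is skipped (an assumption of this sketch) because smooth
    series are almost always strongly correlated with their neighbor.
    """
    if max_lag is None:
        max_lag = len(values) // 2
    lags = range(2, max_lag + 1)
    return max(lags, key=lambda lag: autocorrelation(values, lag))


# A series that repeats every 7 points, such as daily counts with a weekly cycle.
weekly = [10, 12, 15, 14, 13, 5, 4] * 6
print(estimate_period(weekly))  # 7
```

When the cycle length is already known, setting period explicitly avoids relying on an estimate of this kind.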
Algorithm options
All the algorithms are variations based on the Kalman filter. The algorithm names are: LL, LLP, LLT, LLB, and LLP5. Each algorithm above expects a minimum number of data points. If not enough effective data points are supplied, an error message is displayed. For instance, the field itself might have more than enough data points, but the number of effective data points might be small if the holdback is large.
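The interaction between holdback and the minimum number of data points can be sketched in Python. This is an illustration, not the command's own check. It assumes that the number of effective data points is the total count minus the holdback, which the text implies but does not define exactly. The function names are hypothetical, and the minimums are the ones stated in the algorithm table in this topic.

```python
# Minimums taken from the algorithm table in this topic.
# LLP5 is omitted because the table does not state a minimum for it.
FIXED_MINIMUMS = {"LL": 2, "LLT": 3, "LLB": 2}


def effective_points(total_points, holdback=0):
    """Data points left for building the model (assumed: total - holdback)."""
    return max(total_points - holdback, 0)


def has_enough_points(algorithm, total_points, holdback=0, period=None):
    """Compare the effective point count against the documented minimum."""
    if algorithm == "LLP":
        if period is None:
            raise ValueError("LLP needs a period to compute its minimum")
        minimum = 2 * period  # "twice the period" per the table
    elif algorithm in FIXED_MINIMUMS:
        minimum = FIXED_MINIMUMS[algorithm]
    else:
        raise ValueError(f"no documented minimum for {algorithm!r}")
    return effective_points(total_points, holdback) >= minimum


# 30 points look like plenty for LLT, but holdback=28 leaves only 2.
print(has_enough_points("LLT", 30))                         # True
print(has_enough_points("LLT", 30, holdback=28))            # False
print(has_enough_points("LLP", 30, holdback=10, period=7))  # True: 20 >= 14
```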
Algorithm option | Algorithm name | Description |
---|---|---|
LL | Local level | This is a univariate model with no trends and no seasonality. Requires a minimum of 2 data points. |
LLP | Seasonal local level | This is a univariate model with seasonality. The periodicity of the time series is computed automatically unless the period option is specified. Requires the minimum number of data points to be twice the period. |
LLT | Local level trend | This is a univariate model with trend but no seasonality. Requires a minimum of 3 data points. |
LLB | Bivariate local level | This is a bivariate model with no trends and no seasonality. Requires a minimum of 2 data points. LLB uses one set of data to make predictions for another. For example, assume it uses dataset Y to make predictions for dataset X. If the holdback=10, this means LLB takes the last 10 data points of Y to make predictions for the last 10 data points of X. |
LLP5 | | Combines LLT and LLP models for its prediction. |
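To make the Kalman filter idea concrete, here is a minimal Python sketch of a filter for the simplest model in the table, the local level (LL) model. It is an illustration under stated assumptions and not the command's implementation: the noise variances obs_var and level_var are fixed by hand here, whereas a real forecaster would estimate them from the data, and the function name and return shape are hypothetical. The sketch fills in missing values, extends the series into the future, and attaches an upper and lower bound to every point.

```python
import math


def local_level_predict(values, future_steps=0, obs_var=1.0, level_var=0.1, z=1.96):
    """Kalman filter for the local level model.

    Model: y[t] = level[t] + noise,  level[t+1] = level[t] + drift.
    Missing observations are passed as None. Returns one
    (prediction, lower, upper) tuple per input point, followed by
    `future_steps` forecast points. z=1.96 gives roughly a 95% interval.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least 2 data points")

    level = observed[0]  # initial level estimate (an assumption of this sketch)
    var = obs_var        # initial uncertainty about the level

    results = []
    for y in list(values) + [None] * future_steps:
        # Predict step: the level is expected to stay put, uncertainty grows.
        var += level_var
        pred_var = var + obs_var  # variance of the predicted observation
        half_width = z * math.sqrt(pred_var)
        results.append((level, level - half_width, level + half_width))

        # Update step: only possible when an observation exists.
        if y is not None:
            gain = var / pred_var
            level += gain * (y - level)
            var *= 1 - gain
    return results


series = [10, 11, None, 12, 11, 10]  # one missing value
for prediction, lower, upper in local_level_predict(series, future_steps=3):
    print(f"{prediction:6.2f}  [{lower:6.2f}, {upper:6.2f}]")
```

In this sketch the forecast stays flat at the last estimated level, and the interval widens with each future step because no new observations arrive to reduce the uncertainty.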
Confidence intervals
The lower and upper confidence interval parameters default to lower95 and upper95. This specifies a confidence interval where 95% of the predictions are expected to fall.
It is typical for some of the predictions to fall outside the confidence interval because:
- The confidence interval does not cover 100% of the predictions.
- The confidence interval describes a probabilistic expectation, so actual results do not match the expectation exactly.
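Both points can be illustrated with a short Python simulation. This is a sketch only: it assumes normally distributed prediction errors and a symmetric two-sided interval, neither of which is stated in this topic, and the function names are hypothetical.

```python
import random
from statistics import NormalDist


def z_for_percent(percent):
    """Half-width multiplier for a two-sided interval covering `percent` %."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be >= 0 and < 100")
    return NormalDist().inv_cdf(0.5 + percent / 200)


def coverage(percent, samples=20000, seed=0):
    """Fraction of simulated Gaussian errors that land inside the interval."""
    rng = random.Random(seed)
    z = z_for_percent(percent)
    inside = sum(1 for _ in range(samples) if abs(rng.gauss(0, 1)) <= z)
    return inside / samples


print(round(z_for_percent(95), 2))  # 1.96
print(coverage(95))                 # close to 0.95, but rarely exactly 0.95
```

Even with a perfectly specified model, about 5% of the simulated values fall outside the 95% interval, and the observed fraction varies from one sample to the next.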
Examples
Example 1:
Predict future downloads based on the previous download numbers.
index=download | timechart span=1d count(file) as count | predict count
Example 2:
Predict the values of foo using the default algorithm, LLP5.
... | timechart span="1m" count AS foo | predict foo
Example 3:
Upper and lower confidence intervals do not have to match.
... | timechart span="1m" count AS foo | predict foo as fubar algorithm=LL upper90=high lower97=low future_timespan=10 holdback=20
Example 4:
Illustrates the LLB algorithm. The foo2 field is predicted by correlating it with the foo1 field.
... | timechart span="1m" count(x) AS foo1 count(y) AS foo2 | predict foo2 as fubar algorithm=LLB correlate=foo1 holdback=100
See also
Answers
Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has about using the predict command.
This documentation applies to the following versions of Splunk® Enterprise: 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.0.15, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.1.14, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.2.15, 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.3.14