Splunk® Enterprise

Search Reference

Splunk Enterprise version 7.2 is no longer supported as of April 30, 2021. See the Splunk Software Support Policy for details. For information about upgrading to a supported version, see How to upgrade Splunk Enterprise.

predict

Description

The predict command forecasts values for one or more sets of time-series data. The command can also fill in missing data in a time-series and provide predictions for the next several time steps.

The predict command provides confidence intervals for all of its estimates. The command adds a predicted value and an upper and lower 95th percentile range to each event in the time-series. See the Usage section in this topic.

Use current Splunk machine learning (ML) tools to take advantage of the latest algorithms and get the most powerful results. See About the Splunk Machine Learning Toolkit in the Splunk Machine Learning Toolkit.

Syntax

predict <field-list> [AS <newfield>] [<predict_options>]

Required arguments

<field-list>
Syntax: <field>...
Description: The names of the fields for the variable that you want to predict. You can specify one or more fields.

Optional arguments

<newfield>
Syntax: <string>
Description: Renames the fields that are specified in the <field-list>. You do not need to rename every field that you specify in the <field-list>. However, for each field that you want to rename, you must specify a separate AS <newfield> clause.
<predict_options>
Syntax: algorithm=<algorithm_name> | correlate_field=<field> | future_timespan=<number> | holdback=<number> | period=<number> | suppress=<bool> | lowerXX=<field> | upperYY=<field>
Description: Options you can specify to control the predictions. You can specify one or more options, in any order. Each of these options is described in the Predict options section.

Predict options

algorithm
Syntax: algorithm= LL | LLT | LLP | LLP5 | LLB | BiLL
Description: Specify the name of the forecasting algorithm to apply. LL, LLT, LLP, and LLP5 are univariate algorithms. LLB and BiLL are bivariate algorithms. All the algorithms are variations based on the Kalman filter. Each algorithm expects a minimum number of data points. If not enough effective data points are supplied, an error message is displayed. For instance, the field itself might have more than enough data points, but the number of effective data points might be small if the holdback value that you specify is large.
Default: LLP5
Algorithm option Algorithm type Description
LL Local level A univariate model with no trends and no seasonality. Requires a minimum of 2 data points. The LL algorithm is the simplest algorithm and computes the levels of the time series. For example, each new state equals the previous state, plus the Gaussian noise.
LLT Local level trend A univariate model with trend, but no seasonality. Requires a minimum of 3 data points.
LLP Seasonal local level A univariate model with seasonality. The number of data points must be at least twice the number of periods, using the period attribute. The LLP algorithm takes into account the cyclical regularity of the data, if it exists. If you know the number of periods, specify the period argument. If you do not set the period, this algorithm tries to calculate it. LLP returns an error message if the data is not periodic.
LLP5 Combines LLT and LLP models for its prediction. If the time series is periodic, LLP5 computes two predictions, one using LLT and the other using LLP. The algorithm then takes a weighted average of the two values and outputs that as the prediction. The confidence interval is also based on a weighted average of the variances of LLT and LLP.
LLB Bivariate local level A bivariate model with no trends and no seasonality. Requires a minimum of 2 data points. LLB uses one set of data to make predictions for another. For example, assume it uses dataset Y to make predictions for dataset X. If holdback=10, LLB takes the last 10 data points of Y to make predictions for the last 10 data points of X.
BiLL Bivariate local level A bivariate model that predicts both time series simultaneously. The covariance of the two series is taken into account.
correlate
Syntax: correlate=<field>
Description: Specifies the time series that the LLB algorithm uses to predict the other time series. Required when you specify the LLB algorithm. Not used for any other algorithm.
Default: None
future_timespan
Syntax: future_timespan=<num>
Description: Specifies how many future predictions the predict command will compute. This number must be a non-negative number. You would not use the future_timespan option if algorithm=LLB.
Default: 5
holdback
Syntax: holdback=<num>
Description: Specifies the number of data points from the end that are not to be used by the predict command. Use in conjunction with the future_timespan argument. For example, 'holdback=10 future_timespan=10' computes the predicted values for the last 10 values in the data set. You can then judge how accurate the predictions are by checking whether the actual data point values fall into the predicted confidence intervals.
Default: 0
lowerXX
Syntax: lower<int>=<field>
Description: Specifies a percentage for the confidence interval and a field name to use for the lower confidence interval curve. The <int> value is a percentage that specifies the confidence level. The integer must be a number between 0 and 100. The <field> value is the field name.
Default: The default confidence interval is 95%. The default field name is 'lower95(prediction(X))' where X is the name of the field to be predicted.
period
Syntax: period=<num>
Description: Specifies the length of the time period, or recurring cycle, in the time series data. The number must be at least 2. The LLP and LLP5 algorithms attempt to compute the length of time period if no value is specified. If you specify the span argument with the timechart command, the unit that you specify for span is the unit used for period. For example, if your search is ...|timechart span=1d foo2| predict foo2 period=3. The spans are 1 day and the period for the predict is 3 days. Otherwise, the unit for the time period is a data point. For example, if there are a thousand events, then each event is a unit. If you specify period=7, that means the data recycles after every 7 data points, or events.
Default: None
suppress
Syntax: suppress=<field>
Description: Used with the multivariate algorithms. Specifies one of the predicted fields to hide from the output. Use suppress when it is difficult to see all of the predicted visualizations at the same time.
Default: None
upperYY
Syntax: upper<int>=<field>
Description: Specifies a percentage for the confidence interval and a field name to use for the upper confidence interval curve. The <int> value is a percentage that specifies the confidence level. This must be a number between 0 and 100. The <field> value is the field name.
Default: The default confidence interval is 95%. The default field name is 'upper95(prediction(X))' where X is the name of the field to be predicted.

Confidence intervals

The lower and upper confidence interval parameters default to lower95 and upper95. These values specify a confidence interval where 95% of the predictions are expected fall.

It is typical for some of the predictions to fall outside the confidence interval.

  • The confidence interval does not cover 100% of the predictions.
  • The confidence interval is about a probabilistic expectation and results do not match the expectation exactly.

Usage

Command sequence requirement

The predict command must be preceded by the timechart command. The predict command requires time series data. See the Examples section for more details.

How it works

The predict command models the data by stipulating that there is an unobserved entity which progresses through time in different states.

To predict a value, the command calculates the best estimate of the state by considering all of the data in the past. To compute estimates of the states, the command hypothesizes that the states follow specific linear equations with Gaussian noise components.

Under this hypothesis, the least-squares estimate of the states are calculated efficiently. This calculation is called the Kalman filter, or Kalman-Bucy filter. For each state estimate, a confidence interval is obtained. The estimate is not a point estimate. The estimate is a range of values that contain the observed, or predicted, values.

The measurements might capture only some aspect of the state, but not necessarily the whole state.

Missing values

The predict command can work with data that has missing values. The command calculates the best estimates of the missing values.

Do not remove events with missing values, Removing the events might distort the periodicity of the data. Do not specify cont=false with the timechart command. Specifying cont=false removes events with missing values.

Specifying span

The unit for the span specified with the timechart command must be seconds or higher. The predict command cannot accept subseconds as an input when it calculates the period.

Examples

1. Predict future access

This example uses the sample data from the Search Tutorial but should work with any format of Apache web access log. To try this example on your own Splunk instance, you must download the sample data and follow the instructions to get the tutorial data into Splunk. Use the time range All time when you run the search.

Predict future access based on the previous access numbers that are stored in Apache web access log files. Count the number of access attempts using a span of 1 day.

sourcetype=access_combined_* | timechart span=1d count(file) as count | predict count

The results appear on the Statistics tab. Click the Visualization tab. If necessary change the chart type to a Line Chart.

This image shows a line chart. There are two lines, one for the actual count and one for the predicted count.  There are also areas on either side of the lines to show the lower95 and upper95 prediction ranges.

2. Predict future purchases for a product

This example uses the sample data from the Search Tutorial. To try this example on your own Splunk instance, you must download the sample data and follow the instructions to get the tutorial data into Splunk. Use the time range All time when you run the search.

Chart the number of purchases made daily for a specific product.

sourcetype=access_* action=purchase arcade | timechart span=1d count

  • This example searches for all purchases events, defined by the action=purchase for the arc, and pipes those results into the timechart command.
  • The span=1day argument buckets the count of purchases into daily chunks.

The results appear on the Statistics tab and look something like this:

_time count
2018-06-11 17
2018-06-12 63
2018-06-13 94
2018-06-14 82
2018-06-15 63
2018-06-16 76
2018-06-17 70
2018-06-18 72

Add the predict command to the search to calculate the prediction for the number of purchases of the Arcade games that might be sold in the near future.

sourcetype=access_* action=purchase arcade | timechart span=1d count | predict count

The results appear on the Statistics tab. Click the Visualization tab. If necessary change the chart type to a Bar Chart.

This image shows a combination bar and line chart. The bars represent the actual count. A line represents the predicted count.  There are also areas on either side of the line to show the lower95 and upper95 prediction ranges.


3. Predict the values using the default algorithm

Predict the values of foo using the default LLP5 algorithm, an algorithm that combines the LLP and LLT algorithms.

... | timechart span="1m" count AS foo | predict foo

4. Predict multiple fields using the same algorithm

Predict multiple fields using the same algorithm. The default algorithm in this example.

... | timechart ... | predict foo1 foo2 foo3

5. Specifying different upper and lower confidence intervals

When specifying confidence intervals, the upper and lower confidence interval values do not need to match. This example predicts 10 values for a field using the LL algorithm, holding back the last 20 values in the data set.

... | timechart span="1m" count AS foo | predict foo AS foobar algorithm=LL upper90=high lower97=low future_timespan=10 holdback=20

6. Predict the values using the LLB algorithm

This example illustrates the LLB algorithm. The foo3 field is predicted by correlating it with the foo2 field.

... | timechart span="1m" count(x) AS foo2 count(y) AS foo3 | predict foo3 AS foobar algorithm=LLB correlate=foo2 holdback=100

7. Omit the last 5 data points and predict 5 future values

In this example, the search abstains from using the last 5 data points and makes 5 future predictions. The predictions correspond to the last 5 values in the data. You can judge how accurate the predictions are by checking whether the observed values fall into the predicted confidence intervals.

... | timechart ... | predict foo holdback=5 future_timespan=5

8. Predict multiple fields using the same algorithm and the same future_timespan and holdback

Predict multiple fields using the same algorithm and same future_timespan and holdback.

... | timechart ... | predict foo1 foo2 foo3 algorithm=LLT future_timespan=15 holdback=5

9. Specify aliases for fields

Use aliases for the fields by specifying the AS keyword for each field.

... | timechart ... | predict foo1 AS foobar1 foo2 AS foobar2 foo3 AS foobar3 algorithm=LLT future_timespan=15 holdback=5

10. Predict multiple fields using different algorithms and options

Predict multiple fields using different algorithms and different options for each field.

... | timechart ... | predict foo1 algorithm=LL future_timespan=15 foo2 algorithm=LLP period=7 future_timespan=7

11. Predict multiple fields using the BiLL algorithm

Predict values for foo1 and foo2 together using the bivariate algorithm BiLL.

... | timechart ... | predict foo1 foo2 algorithm=BiLL future_timespan=10

See also

trendline, x11

Last modified on 11 December, 2023
pivot   rangemap

This documentation applies to the following versions of Splunk® Enterprise: 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.3.0, 9.3.1


Was this topic useful?







You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters