Use the tstats command to perform statistical queries on indexed fields in tsidx files. The indexed fields can be from normal index data, tscollect data, or accelerated data models.
| tstats [prestats=<bool>] [local=<bool>] [append=<bool>] [summariesonly=<bool>] [allow_old_summaries=<bool>] [chunk_size=<unsigned int>] <stats-func>... [FROM ( <namespace> | sid=<tscollect-job-id> | datamodel=<data_model-name> )] [WHERE <search-query>] [BY <field-list> [span=<timespan>] ]
- Syntax: count(<field>) | ( avg | dc | earliest | estdc | exactperc | first | last | latest | median | max | min | mode | perc | p | range | stdev | stdevp | sum | sumsq | upperperc | values | var | varp )(<field>) [AS <string>]
- Description: Either perform a basic count of a field or perform a function on a field. You can provide any number of aggregates to perform. You can also rename the result using 'AS', unless you are in prestats mode. For the complete list of functions with examples, see Statistical and charting functions.
- Syntax: append=<bool>
- Description: Valid only in prestats mode (prestats=t). When append=t, the prestats results are appended to existing results instead of generating a new set of results.
- Default: false
- Syntax: allow_old_summaries=true | false
- Description: Only applies when selecting from an accelerated data model. To return results from summary directories only when those directories are up-to-date, set this parameter to false. If the data model definition has changed, summary directories that are older than the new definition are not used when producing output from tstats. This default ensures that the output from tstats will always reflect your current configuration. When set to true, tstats will use both current summary data and summary data that was generated prior to the definition change. Essentially this is an advanced performance feature for cases where you know that the old summaries are "good enough".
- Default: false
- Syntax: chunk_size=<unsigned_int>
- Description: Advanced option. This argument controls how many events are retrieved at a time within a single TSIDX file when answering queries. Consider supplying a lower value only if you find that a particular query is using too much memory. The usual cause is an excessively high-cardinality split-by, such as grouping by several fields that have a very large number of distinct values. Setting this value too low can negatively impact the overall run time of your query.
- Default: 10000000 (10 million)
- Syntax: datamodel=<data_model-name>
- Description: The name of an accelerated data model.
- Syntax: <field>, ...
- Description: The field list for the BY clause. Specify one or more fields to group results by.
- Syntax: local=true | false
- Description: If true, forces the processor to be run only on the search head.
- Default: false
- Syntax: <string>
- Description: The namespace. Define a location for the tsidx file with $SPLUNK_DB/tsidxstats. If you have Splunk Enterprise, you can configure this location in indexes.conf.
- Syntax: prestats=true | false
- Description: Specifies whether to use the prestats format. The prestats format is a Splunk internal format that is designed to be consumed by commands that generate aggregate calculations. When using the prestats format you can pipe the data into the chart, stats, or timechart commands, which are designed to accept the prestats format. When prestats=true, AS instructions are not relevant. The field names for the aggregates are determined by the command that consumes the prestats format and produces the aggregate output.
- Default: false
- Syntax: sid=<tscollect-job-id>
- Description: The job ID string of a tscollect search (that generated tsidx files).
- Syntax: summariesonly=<bool>
- Description: Only applies when selecting from an accelerated data model. When false, generates results from both summarized data and data that is not summarized. For data not summarized as TSIDX data, the full search behavior will be used against the original index data. If set to true, 'tstats' will only generate results from the TSIDX data that has been automatically generated by the acceleration and non-summarized data will not be provided.
- Default: false
- Syntax: span=<timespan>
- Description: The span of each time bin. If you use the BY clause to group by _time, use the span argument to group the time buckets. You can specify timespans such as ...BY _time span=1h or ...BY _time span=5d. If you do not specify a <timespan>, the default is auto, which means that the number of time buckets adjusts to produce a reasonable number of results. For example, if seconds are initially used for the <timespan> and too many results are returned, the <timespan> is changed to a longer value, such as minutes, to return fewer time buckets.
- Default: auto
- Syntax: auto | <int><timescale>
- Syntax: <sec> | <min> | <hr> | <day> | <month>
- Description: Time scale units. For the tstats command, the <timescale> does not support subseconds.
- Default: sec
Time scale   Syntax                                Description
<sec>        s | sec | secs | second | seconds     Time scale in seconds.
<min>        m | min | mins | minute | minutes     Time scale in minutes.
<hr>         h | hr | hrs | hour | hours           Time scale in hours.
<day>        d | day | days                        Time scale in days.
<month>      mon | month | months                  Time scale in months.
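As a rough sketch of how several of these arguments combine, the following search counts summarized events per day for an accelerated data model. The data model name mydm is a placeholder, and whether allow_old_summaries=t is appropriate depends on whether older summaries are acceptable in your environment.

```spl
| tstats summariesonly=t allow_old_summaries=t count FROM datamodel=mydm BY _time span=1d
```

Dropping allow_old_summaries=t returns to the default behavior, in which summaries older than the current data model definition are ignored.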
The tstats command is a generating command. Generating commands use a leading pipe character.
The tstats command must be the first command in a search pipeline, except when append=t is used in prestats mode.
The tstats command does not support wildcard characters in field values in aggregate functions or BY clauses.
For example, you cannot specify | tstats avg(foo*) or | tstats count WHERE host=x BY source*.
Samples of aggregate functions include avg(), count(), max(), min(), and sum().
Any results returned where the aggregate function or BY clause includes a wildcard character are only the most recent few minutes of data that has not been summarized. Include the summariesonly=t argument with your tstats command to return only summarized data.
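One possible workaround, sketched here on the assumption that the fields you need are known ahead of time, is to name each field explicitly instead of using a wildcard. The field names foo_a and foo_b are hypothetical, and mydata is the example namespace used elsewhere in this topic.

```spl
| tstats avg(foo_a) avg(foo_b) FROM mydata
```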
Functions and memory usage
Some functions are inherently more expensive, from a memory standpoint, than other functions. For example, the distinct_count function requires far more memory than the count function. The list functions can also consume a lot of memory.
If you are using the distinct_count function without a split-by field or with a low-cardinality split-by field, consider replacing the distinct_count function with the estdc function (estimated distinct count). The estdc function might result in significantly lower memory usage and run times.
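To judge whether the estimate is close enough before switching, you could run both functions side by side over a small time range. This is only a sketch: it assumes the _internal index is searchable by your role, and the output field names are illustrative.

```spl
| tstats dc(source) AS exact_sources, estdc(source) AS estimated_sources WHERE index=_internal earliest=-1h
```

If the two values are acceptably close, keep only the estdc aggregate in the production search so that the memory cost of the exact distinct count is avoided.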
Memory and maximum results
In the limits.conf file, the maxresultrows setting in the [searchresults] stanza specifies the maximum number of results to return. The default value is 50,000. Increasing this limit can result in more memory usage.
The max_mem_usage_mb setting in the [default] stanza limits how much memory the tstats command uses to keep track of information. If the tstats command reaches this limit, the command stops adding the requested fields to the search results. You can increase the limit, contingent on the available system memory.
If you are using Splunk Cloud and want to change either of these limits, file a Support ticket.
Complex aggregate functions
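The tstats command does not support complex aggregate functions, such as an eval expression nested inside an aggregate, for example count(eval('Authentication.action'=="failure")).
Consider the following query. This query will not return accurate results because complex aggregate functions are not supported by the tstats command.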
| tstats summariesonly=false values(Authentication.tag) as tag,
values(Authentication.app) as app,
count(eval('Authentication.action'=="failure")) as failure,
count(eval('Authentication.action'=="success")) as success from datamodel=Authentication by Authentication.src
| search success>0
| where failure > 5
Instead, separate out the aggregate functions from the eval functions, as shown in the following search.
| tstats `summariesonly` values(Authentication.app) as app,
count from datamodel=Authentication.Authentication by Authentication.action, Authentication.src
| rename Authentication.* AS *
| eval success=if(action="success",count,0), failure=if(action="failure",count,0)
| stats values(app) as app, sum(failure) as failure, sum(success) as success by src
You can generate sparkline charts with the tstats command only if you specify the _time field in the BY clause and use the stats command to generate the actual sparkline. For example:
| tstats count from datamodel=Authentication.Authentication BY _time, Authentication.src span=1h | stats sparkline(sum(count),1h) AS sparkline, sum(count) AS count BY Authentication.src
Use the tstats command to perform statistical queries on indexed fields in tsidx files. You can select the data for the indexed fields in several ways.
- Normal index data
- Use a FROM clause to specify a namespace, search job ID, or data model. If you do not specify a FROM clause, the Splunk software selects from index data in the same way as the search command. You are restricted to selecting data from your allowed indexes by user role. You control exactly which indexes you select data from by using the WHERE clause. If no indexes are mentioned in the WHERE clause, the Splunk software uses the default indexes. By default, role-based search filters are applied, but can be turned off in the limits.conf file.
- Data manually collected with the tscollect command
- You can select data from your namespace by specifying FROM <namespace>. If you did not specify a namespace with the tscollect command, the data is collected into the dispatch directory of that job. If the data is in the dispatch directory, you select the data by specifying FROM sid=<tscollect-job-id>.
- An accelerated data model
- You can select data from a high-performance analytics store, which is a collection of .tsidx data summaries, for an accelerated data model. You can select data from this accelerated data model by using FROM datamodel=<data_model-name>.
Search filters cannot be applied to accelerated data models. This includes both role-based and user-based search filters.
- An accelerated data model dataset
- When you select data within an accelerated data model, you can further constrain your search by indicating a dataset within that data model that you want to select data from. You do this by using a where clause to indicate the nodename of the data model dataset. The nodename value indicates where the dataset is in a data model hierarchy.
- When you use nodename in a search, you always use the following construction: FROM datamodel=<data_model_name> where nodename=<root_dataset_name>.<parent_dataset_name>.<...>.<target_dataset_name>.
- For example, say you want to search on a dataset named scheduled_reports in the internal_server data model. In that data model, the scheduled_reports dataset is a child of the scheduler dataset, which in turn is a child of the server root event dataset. This means that you should represent the scheduled_reports dataset in your search as nodename=server.scheduler.scheduled_reports.
- If you run that search and decide you want to search on the contents of the scheduler data model dataset instead, you would use nodename=server.scheduler in your new search.
Search filters cannot be applied to accelerated data model datasets. This includes both role-based and user-based search filters.
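Put together as a complete search, the nodename construction might look like the following. This is a sketch that assumes the internal_server data model is accelerated on your deployment and contains the dataset hierarchy described in this section.

```spl
| tstats count FROM datamodel=internal_server WHERE nodename=server.scheduler.scheduled_reports
```

Shortening the nodename value to server.scheduler would select from the parent dataset instead.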
You might see a count mismatch in the events retrieved when searching tsidx files, because it is not possible to distinguish between indexed field tokens and raw tokens in tsidx files. Running tstats on accelerated data models or on data from a tscollect is less ambiguous, because only the fields and values are stored and not the raw tokens.
Filtering with where
You can provide any number of aggregates (aggregate-opt) to perform and also have the option of providing a filtering query using the WHERE keyword. This query looks like a normal query you would use in the search processor. The WHERE clause supports all the same time arguments as search, such as earliest=-1y.
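For instance, a filtering query that combines field constraints with time arguments could be written as follows. This sketch assumes the _internal index and its splunkd source type are searchable by your role; substitute your own index and field values.

```spl
| tstats count WHERE index=_internal sourcetype=splunkd earliest=-24h latest=now BY host
```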
Grouping by _time
You can provide any number of GROUPBY fields. If you are grouping by _time, supply a timespan with span for grouping the time buckets, for example ...BY _time span=1h or ...BY _time span=3d.
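A minimal sketch of grouping by _time together with other fields, assuming the _internal index is searchable by your role. The span argument follows the field list, as in the other examples in this topic.

```spl
| tstats count WHERE index=_internal BY _time, host, sourcetype span=1h
```

Each result row then represents one combination of hourly bucket, host, and source type.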
Example 1: Gets the count of all events in the mydata namespace.
| tstats count FROM mydata
Example 2: Returns the average of the field foo in mydata, specifically where bar is value2 and the value of baz is greater than 5.
| tstats avg(foo) FROM mydata WHERE bar=value2 baz>5
Example 3: Gives the count by source for events with host=x.
| tstats count WHERE host=x BY source
Example 4: Gives a timechart of all the data in your default indexes with a day granularity.
| tstats prestats=t count BY _time span=1d | timechart span=1d count
Example 5: Use prestats mode in conjunction with append to compute the median values of foo and bar, which are in different namespaces.
| tstats prestats=t median(foo) FROM mydata | tstats prestats=t append=t median(bar) FROM otherdata | stats median(foo) median(bar)
Example 6: Uses the summariesonly argument to get the time range of the summary for an accelerated data model named mydm.
| tstats summariesonly=t min(_time) AS min, max(_time) AS max FROM datamodel=mydm | eval prettymin=strftime(min, "%c") | eval prettymax=strftime(max, "%c")
Example 7: Uses summariesonly in conjunction with timechart to reveal what data has been summarized over the past hour for an accelerated data model titled mydm.
| tstats summariesonly=t prestats=t count FROM datamodel=mydm BY _time span=1h | timechart span=1h count
Example 8: Uses the values statistical function to provide lists of values for each field returned by the "Splunk's Internal Server Logs" data model.
| tstats values FROM datamodel=internal_server.server
Example 9: Uses the values statistical function to provide lists of values for each field returned by the Alerts dataset within the "Splunk's Internal Server Logs" data model.
| tstats values FROM datamodel=internal_server where nodename=server.scheduler.alerts
This documentation applies to the following versions of Splunk Cloud™: 6.5.0, 6.5.1, 6.5.1612, 6.6.0, 6.6.1, 6.6.3