Use the tstats command to perform statistical queries on indexed fields in tsidx files, which could come from normal index data, tscollect data, or accelerated datamodels.
Performs statistics on indexed fields in tsidx files.
tstats [prestats=<bool>] [local=<bool>] [append=<bool>] [summariesonly=<bool>] [allow_old_summaries=<bool>] [chunk_size=<unsigned int>] <stats-func> [ FROM ( <namespace> | sid=<tscollect-job-id> | datamodel=<datamodel-name> )] [WHERE <search-query>] [( by | GROUPBY ) <field-list> [span=<timespan>] ]
- Syntax: count(<field>) | ( avg | dc | earliest | estdc | exactperc | first | last | latest | median | max | min | mode | perc | p | range | stdev | stdevp | sum | sumsq | upperperc | values | var | varp )(<field>) [AS <string>]
- Description: Either perform a basic count of a field or perform a function on a field. You can provide any number of aggregates to perform. You can also rename the result using 'AS', unless you are in prestats mode. For the complete list of functions with examples, see "Functions for stats".
- Syntax: <string>
- Description: Define a location for the tsidx file with
$SPLUNK_DB/tsidxstats. This namespace location is also configurable in
indexes.conf, with the attribute
- Syntax: sid=<tscollect-job-id>
- Description: The job ID string of a tscollect search (that generated tsidx files).
- Syntax: datamodel=<datamodel-name>
- Description: The name of an accelerated data model.
- Syntax: append=<bool>
- Description: When in prestats mode (prestats=t), setting append=t appends the prestats results to existing results, instead of generating them.
- Syntax: allow_old_summaries=true | false
- Description: Only applies when selecting from an accelerated datamodel. When false, Splunk only provides results from summary directories when those directories are up-to-date. That is, if the datamodel definition has changed, those summary directories which are older than the new definition are not used when producing output from tstats. This default ensures that the output from tstats will always reflect your current configuration. When set to true, tstats will use both current summary data and summary data that was generated prior to the definition change. Essentially this is an advanced performance feature for cases where you know that the old summaries are "good enough". Defaults to false.
- Syntax: chunk_size=<unsigned_int>
- Description: Advanced option. This argument controls how many events are retrieved at a time within a single TSIDX file when answering queries. Only consider supplying a lower value for this if you find a particular query is using too much memory. The case that could cause this would be an excessively high cardinality split-by, such as grouping by several fields that have a very large number of distinct values. Setting this value too low can negatively impact the overall run time of your query. Defaults to 10000000.
- Syntax: local=true | false
- Description: If true, forces the processor to be run only on the search head. Defaults to false.
- Syntax: prestats=true | false
- Description: Use this to output the answer in prestats format, which enables you to pipe the results to a different type of processor, such as chart or timechart, that takes prestats output. This is very useful for creating graph visualizations. Defaults to false.
- Syntax: summariesonly=<bool>
- Description: Only applies when selecting from an accelerated datamodel. When false, generates results from both summarized data and data that is not summarized. For data not summarized as TSIDX data, the full search behavior will be used against the original index data. If set to true, 'tstats' will only generate results from the TSIDX data that has been automatically generated by the acceleration and non-summarized data will not be provided. Defaults to false.
- Syntax: <field>, <field>, ...
- Description: Specify a list of fields to group results.
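As a rough illustration of how the optional arguments described above can be combined, the following searches are a sketch only. The data model name mydm and the namespace mydata are hypothetical placeholders, and the chunk_size value is an arbitrary assumption chosen to be lower than the default, not a recommendation. Each line is a separate search.

```spl
| tstats summariesonly=t allow_old_summaries=t chunk_size=1000000 count FROM datamodel=mydm

| tstats local=t count FROM mydata
```

The first search reads only from summary data, including summaries built before a definition change. The second forces the processor to run only on the search head.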
The tstats command is a generating processor, so it must be the first command in a search pipeline except in append mode (append=t).
Use the tstats command to perform statistical queries on indexed fields in tsidx files. You can select from data in several different ways:
1. Normal index data: If you do not supply a FROM clause (to specify a namespace, search job ID, or datamodel), Splunk selects from index data in the same way as search. You are restricted to selecting from your allowed indexes by role, and you can control exactly which indexes you select from in the WHERE clause. If no indexes are mentioned in the WHERE clause search, Splunk uses the default index(es). By default, role-based search filters are applied, but can be turned off in limits.conf.
2. Data manually collected with tscollect: Select from your namespace with FROM <namespace>. If you didn't supply a namespace to tscollect, the data was collected into the dispatch directory of that job. In that case, select from that data with FROM sid=<tscollect-job-id>.
3. A high-performance analytics store (collection of .tsidx data summaries) for an accelerated data model: Select from this accelerated data model with FROM datamodel=<datamodel-name>.
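The ways of selecting data listed above might look like the following sketch. The namespace mydata and the data model mydm are hypothetical names, the index and split-by field in the first search are assumptions chosen for illustration, and <tscollect-job-id> is a placeholder for a real job ID. Each line is a separate search, not one pipeline.

```spl
| tstats count WHERE index=_internal BY sourcetype

| tstats count FROM mydata

| tstats count FROM sid=<tscollect-job-id>

| tstats count FROM datamodel=mydm
```

The first search has no FROM clause, so it selects from normal index data and names the index in the WHERE clause.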
You might see a count mismatch in the events retrieved when searching tsidx files. This is because it's not possible to distinguish between indexed field tokens and raw tokens in tsidx files. On the other hand, it is more explicit to run tstats on accelerated datamodels or from a tscollect, where only the fields and values are stored and not the raw tokens.
Filtering with where
You can provide any number of aggregates (aggregate-opt) to perform and also have the option of providing a filtering query using the WHERE keyword. This query looks like a normal query you would use in the search processor. The WHERE clause supports all the same time arguments as search, such as earliest=-1y.
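A minimal sketch of several aggregates combined with a WHERE filter and a time argument follows. The namespace mydata and the fields foo and bar are hypothetical, as are the result names events and avg_foo.

```spl
| tstats count AS events, avg(foo) AS avg_foo FROM mydata WHERE bar=value2 earliest=-1y
```

The AS renames work here because the search is not in prestats mode.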
Grouping by _time
You can provide any number of GROUPBY fields. If you are grouping by _time, you should supply a timespan with span for grouping the time buckets. This timespan looks like any normal timespan in Splunk, such as '3d'. It also supports 'auto'.
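For illustration, grouping by _time with an explicit span and with an automatic span could be written as below. The namespace mydata is a hypothetical name. Each line is a separate search.

```spl
| tstats count FROM mydata BY _time span=3d

| tstats count FROM mydata BY _time span=auto
```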
Example 1: Gets the count of all events in the mydata namespace.
| tstats count FROM mydata
Example 2: Returns the average of the field foo in mydata, specifically where bar is value2 and the value of baz is greater than 5.
| tstats avg(foo) FROM mydata WHERE bar=value2 baz>5
Example 3: Gives the count by source for events with host=x.
| tstats count where host=x by source
Example 4: Gives a timechart of all the data in your default indexes with a day granularity.
| tstats prestats=t count by _time span=1d | timechart span=1d count
Example 5: Use prestats mode in conjunction with append to compute the median values of foo and bar, which are in different namespaces.
| tstats prestats=t median(foo) from mydata | tstats prestats=t append=t median(bar) from otherdata | stats median(foo) median(bar)
Example 6: Uses the summariesonly argument to get the time range of the summary for an accelerated data model named mydm.
| tstats summariesonly=t min(_time) as min, max(_time) as max from datamodel=mydm | eval prettymin=strftime(min, "%c") | eval prettymax=strftime(max, "%c")
Example 7: Uses summariesonly in conjunction with timechart to reveal what data has been summarized over the past hour for an accelerated data model titled mydm.
| tstats summariesonly=t prestats=t count from datamodel=mydm by _time span=1h | timechart span=1h count