Search Reference

tstats

Use the tstats command to perform statistical queries on indexed fields in tsidx files. These files can come from normal index data, tscollect data, or accelerated data models.

Synopsis

Performs statistics on indexed fields in tsidx files.

Syntax

tstats [prestats=<bool>] [local=<bool>] [append=<bool>] [summariesonly=<bool>] [allow_old_summaries=<bool>] [chunk_size=<unsigned int>] <stats-func> [ FROM ( <namespace> | sid=<tscollect-job-id> | datamodel=<datamodel-name> )] [WHERE <search-query>] [( by | GROUPBY ) <field-list> [span=<timespan>] ]

Required arguments

<stats-func>
Syntax: count(<field>) | ( avg | dc | earliest | estdc | exactperc | first | last | latest | median | max | min | mode | perc | p | range | stdev | stdevp | sum | sumsq | upperperc | values | var | varp )(<field>) [AS <string>]
Description: Either perform a basic count of a field or perform a function on a field. You can provide any number of aggregates to perform. You can also rename the result using 'AS', unless you are in prestats mode. For the complete list of functions with examples, see "Functions for stats".
namespace
Syntax: <string>
Description: The location of the tsidx files, as a directory under $SPLUNK_DB/tsidxstats. This namespace location is also configurable in indexes.conf, with the attribute tsidxStatsHomePath.
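For example, here is a minimal indexes.conf sketch that redirects the namespace location, assuming the attribute is set at the global (top-of-file) level; the path shown is illustrative:

# indexes.conf, global settings (path is illustrative)
tsidxStatsHomePath = $SPLUNK_DB/tsidxstats_custom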
sid
Syntax: sid=<tscollect-job-id>
Description: The job ID string of a tscollect search (that generated tsidx files).
datamodel
Syntax: datamodel=<datamodel-name>
Description: The name of an accelerated data model.

Optional arguments

append
Syntax: append=<bool>
Description: Valid only in prestats mode (prestats=t). When append=true, the prestats results are appended to existing results in the pipeline instead of being generated as a new result set, which lets tstats appear after another command (see Example 5).
allow_old_summaries
Syntax: allow_old_summaries=true | false
Description: Only applies when selecting from an accelerated datamodel. When false, Splunk only provides results from summary directories when those directories are up-to-date. That is, if the datamodel definition has changed, those summary directories which are older than the new definition are not used when producing output from tstats. This default ensures that the output from tstats will always reflect your current configuration. When set to true, tstats will use both current summary data and summary data that was generated prior to the definition change. Essentially this is an advanced performance feature for cases where you know that the old summaries are "good enough". Defaults to false.
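For example, a sketch of a search that accepts older summaries for an accelerated data model; the model name mydm is illustrative:

| tstats allow_old_summaries=true count from datamodel=mydm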
chunk_size
Syntax: chunk_size=<unsigned_int>
Description: Advanced option. This argument controls how many events are retrieved at a time within a single TSIDX file when answering queries. Only consider supplying a lower value for this if you find a particular query is using too much memory. The case that could cause this would be an excessively high cardinality split-by, such as grouping by several fields that have a very large amount of distinct values. Setting this value too low can negatively impact the overall run time of your query. Defaults to 10000000.
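For example, a sketch that lowers chunk_size for a high-cardinality split-by over the mydata namespace; the grouping fields are illustrative and assume they were collected into that namespace:

| tstats chunk_size=50000 count from mydata by host, source, uri_path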
local
Syntax: local=true | false
Description: If true, forces the processor to be run only on the search head. Defaults to false.
prestats
Syntax: prestats=true | false
Description: Use this to output the answer in prestats format, which enables you to pipe the results to a different type of processor, such as chart or timechart, that takes prestats output. This is very useful for creating graph visualizations. Defaults to false.
summariesonly
Syntax: summariesonly=<bool>
Description: Only applies when selecting from an accelerated data model. When false, generates results from both summarized data and data that has not been summarized; for the non-summarized data, the full search behavior is used against the original index data. When true, tstats generates results only from the TSIDX data that the acceleration has created, and non-summarized data is not included. Defaults to false.
<field-list>
Syntax: <field>, <field>, ...
Description: Specify a list of fields to group results.

Description

The tstats command is a generating processor, so it must be the first command in a search pipeline except in append mode (append=t).

Use the tstats command to perform statistical queries on indexed fields in tsidx files. You can select from data in several different ways:

1. Normal index data: If you do not supply a FROM clause (to specify a namespace, search job ID, or data model), Splunk selects from index data in the same way that the search command does. You are restricted to the indexes that your role allows, and you can control exactly which indexes you select from in the WHERE clause. If no indexes are mentioned in the WHERE clause, Splunk uses the default index(es). By default, role-based search filters are applied, but they can be turned off in limits.conf. See the first sketch after this list.

2. Data manually collected with tscollect: Select from your namespace with FROM <namespace>. If you did not supply a namespace to tscollect, the data was collected into the dispatch directory of that job; in that case, select from it with FROM sid=<tscollect-job-id>, as shown in the second sketch after this list.

3. A high-performance analytics store (collection of .tsidx data summaries) for an accelerated data model: Select from this accelerated data model with FROM datamodel=<datamodel-name>.
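For example, a sketch for item 1 that counts events in a specific index without a FROM clause, splitting by sourcetype; the index name is illustrative:

| tstats count WHERE index=_internal by sourcetype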
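For example, a sketch for item 2 that selects from the dispatch directory of a tscollect job; the job ID shown is hypothetical:

| tstats count FROM sid=1425412269.5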

You might see a count mismatch in the events retrieved when searching tsidx files directly, because it is not possible to distinguish indexed field tokens from raw tokens in tsidx files. Running tstats against an accelerated data model or tscollect output avoids this ambiguity, because those tsidx files store only the fields and their values, not the raw tokens.

Filtering with where

You can provide any number of aggregates to perform, and you can also provide a filtering query using the WHERE keyword. This query looks like a normal query that you would use with the search processor, and it supports the same time arguments as search, such as earliest=-1y.
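For example, a sketch that averages foo over the last year of the mydata namespace, filtered to a specific bar value (the fields and values match those used in Example 2 below):

| tstats avg(foo) FROM mydata WHERE earliest=-1y bar=value2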

Grouping by _time

You can provide any number of GROUPBY fields. If you group by _time, you should supply a timespan with span for bucketing the time values. This timespan uses the normal Splunk timespan syntax, such as span=1hr or span=3d, and it also accepts span=auto.
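For example, a sketch that buckets the mydata namespace into hourly counts per host; the namespace and grouping field are illustrative:

| tstats count FROM mydata by _time, host span=1h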

Examples

Example 1: Gets the count of all events in the mydata namespace.

| tstats count FROM mydata

Example 2: Returns the average of the field foo in mydata, specifically where bar is value2 and the value of baz is greater than 5.

| tstats avg(foo) FROM mydata WHERE bar=value2 baz>5

Example 3: Gives the count by source for events with host=x.

| tstats count where host=x by source

Example 4: Gives a timechart of all the data in your default indexes with a day granularity.

| tstats prestats=t count by _time span=1d | timechart span=1d count

Example 5: Use prestats mode in conjunction with append to compute the median values of foo and bar, which are in different namespaces.

| tstats prestats=t median(foo) from mydata | tstats prestats=t append=t median(bar) from otherdata | stats median(foo) median(bar)

Example 6: Uses the summariesonly argument to get the time range of the summary for an accelerated data model named mydm.

| tstats summariesonly=t min(_time) as min, max(_time) as max from datamodel=mydm | eval prettymin=strftime(min, "%c") | eval prettymax=strftime(max, "%c")

Example 7: Uses summariesonly in conjunction with timechart to reveal what data has been summarized over the past hour for an accelerated data model titled mydm.

| tstats summariesonly=t prestats=t count from datamodel=mydm by _time span=1h | timechart span=1h count

See also

stats, tscollect

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has about using the tstats command.
