Splunk® Enterprise

Search Reference

tstats

Description

Use the tstats command to perform statistical queries on indexed fields in tsidx files. The indexed fields can be from normal index data, tscollect data, or accelerated data models.

Syntax

| tstats [prestats=<bool>] [local=<bool>] [append=<bool>] [summariesonly=<bool>] [allow_old_summaries=<bool>] [chunk_size=<unsigned_int>] <stats-func>... [ FROM ( <namespace> | sid=<tscollect-job-id> | datamodel=<data_model-name> )] [WHERE <search-query>] [BY <field-list> [span=<timespan>] ]

Required arguments

<stats-func>...
Syntax: count | count(<field>) | ( avg | dc | earliest | estdc | exactperc | first | last | latest | median | max | min | mode | perc | p | range | stdev | stdevp | sum | sumsq | upperperc | values | var | varp )(<field>) [AS <string>]
Description: Either perform a basic count, optionally of a specific field, or apply an aggregate function to a field. You can specify any number of aggregates. You can also rename each result using AS, unless you are in prestats mode. For the complete list of functions with examples, see "Statistical and charting functions".
namespace
Syntax: <string>
Description: The location of the tsidx files under $SPLUNK_DB/tsidxstats. If you have Splunk Enterprise, you can configure this base location by editing indexes.conf and setting the tsidxStatsHomePath attribute.
sid
Syntax: sid=<tscollect-job-id>
Description: The job ID string of a tscollect search (that generated tsidx files).
datamodel
Syntax: datamodel=<data_model-name>
Description: The name of an accelerated data model.

Optional arguments

append
Syntax: append=<bool>
Description: Valid only in prestats mode (prestats=true). When append=true, the prestats results are appended to the existing results instead of generating a new result set.
Default: false
allow_old_summaries
Syntax: allow_old_summaries=true | false
Description: Only applies when selecting from an accelerated data model. When set to false, tstats returns results only from summary directories that are up-to-date: if the data model definition has changed, summary directories that are older than the new definition are not used when producing output. This default ensures that the output from tstats always reflects your current configuration. When set to true, tstats uses both current summary data and summary data that was generated before the definition change. This is an advanced performance feature for cases where you know that the old summaries are "good enough".
Default: false
chunk_size
Syntax: chunk_size=<unsigned_int>
Description: Advanced option. Controls how many events are retrieved at a time from a single tsidx file when answering queries. Consider supplying a lower value only if a particular query uses too much memory. The usual cause is an excessively high-cardinality split-by, such as grouping by several fields that have a very large number of distinct values. Setting this value too low can increase the overall run time of your query.
Default: 10000000
local
Syntax: local=true | false
Description: If true, forces the tstats processor to run only on the search head.
Default: false
prestats
Syntax: prestats=true | false
Description: Use this to output the answer in prestats format, which enables you to pipe the results to a different type of processor, such as chart or timechart, that takes prestats output. This is very useful for creating graph visualizations.
Default: false
summariesonly
Syntax: summariesonly=<bool>
Description: Only applies when selecting from an accelerated data model. When false, tstats generates results from both summarized data and data that is not summarized. For data that is not summarized as tsidx data, the full search behavior is used against the original index data. When true, tstats generates results only from the tsidx data that the acceleration has automatically generated, and non-summarized data is not returned.
Default: false
<field-list>
Syntax: <field>, ...
Description: Specify one or more fields to group results.
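
As a rough illustration of where the optional arguments go, the following sketch places chunk_size before the aggregate, as the syntax above requires. The value 1000000 and the use of the _internal index are assumptions made for this example, not recommendations.

```spl
| tstats chunk_size=1000000 count WHERE index=_internal BY source, host
```

A lower chunk_size such as this one is only worth trying if the split by source and host uses too much memory at the default setting.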

Usage

The tstats command is a generating command. Generating commands use a leading pipe character. The tstats command must be the first command in a search pipeline, except when append=true is specified.

Wildcard characters

The tstats command does not support wildcard characters in field values in aggregate functions or BY clauses.

For example, you cannot specify | tstats avg(foo*) or | tstats count WHERE host=x BY source*.

Examples of aggregate functions include avg(), count(), max(), min(), and sum().

Any results returned where the aggregate function or BY clause includes a wildcard character are only the most recent few minutes of data that has not been summarized. Include the summariesonly=t argument with your tstats command to return only summarized data.

Selecting data

Use the tstats command to perform statistical queries on indexed fields in tsidx files. You can select the data for the indexed fields in several ways.

Normal index data
Use a FROM clause to specify a namespace, search job ID, or data model. If you do not specify a FROM clause, the Splunk software selects from index data in the same way as the search command. You are restricted to selecting data from your allowed indexes by user role. You control exactly which indexes you select data from by using the WHERE clause. If no indexes are mentioned in the WHERE clause, the Splunk software uses the default indexes. By default, role-based search filters are applied, but can be turned off in the limits.conf file.
Data manually collected with the tscollect command
You can select data from your namespace by specifying FROM <namespace>. If you did not specify a namespace with the tscollect command, the data is collected into the dispatch directory of that job. If the data is in the dispatch directory, you select the data by specifying FROM sid=<tscollect-job-id>.
An accelerated data model
You can select data from a high-performance analytics store, which is a collection of .tsidx data summaries, for an accelerated data model. You can select data from this accelerated data model by using FROM datamodel=<data_model_name>.
An accelerated data model object
When you select data within an accelerated data model, you can further constrain your search by indicating an object within that data model that you want to select data from. You do this by using a WHERE clause to indicate the nodename of the data model object. The nodename value indicates where the object is in the data model hierarchy.
When you use nodename in a search, you always use the following construction: FROM datamodel=<data_model_name> WHERE nodename=<root_object_name>.<parent_object_name>.<...>.<target_object_name>.
For example, say you want to search on an object named scheduled_reports in your internal_server data model. In that data model, the scheduled_reports object is a child of the scheduler object, which in turn is a child of the server root event object. This means that you should represent the scheduled_reports object in your search as nodename=server.scheduler.scheduled_reports.
If you run that search and decide you want to search on the contents of the scheduler data model object instead, you would use nodename=server.scheduler in your new search.
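
The nodename construction described above might look like the following in a complete search. This is a minimal sketch that assumes the internal_server data model and the object names from the example in this section exist in your deployment.

```spl
| tstats count FROM datamodel=internal_server WHERE nodename=server.scheduler.scheduled_reports
```

To search the parent scheduler object instead, only the nodename value changes:

```spl
| tstats count FROM datamodel=internal_server WHERE nodename=server.scheduler
```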

You might see a count mismatch in the events retrieved when searching tsidx files, because it is not possible to distinguish between indexed field tokens and raw tokens in tsidx files. Results are less ambiguous when you run tstats on accelerated data models or on tscollect data, where only the fields and values are stored and not the raw tokens.

Filtering with where

You can provide any number of aggregates to perform, and you can optionally provide a filtering query with the WHERE keyword. This query looks like a normal query that you would use with the search command, and it supports all the same time arguments as search, such as earliest=-1y.
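
For instance, a filtering query that combines an index, an indexed field, and a time argument could be sketched as follows. The _internal index and the splunkd source type are assumptions chosen for illustration; substitute indexes that your role is allowed to search.

```spl
| tstats count WHERE index=_internal sourcetype=splunkd earliest=-24h BY host
```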

Grouping by _time

You can provide any number of BY fields. If you are grouping by _time, supply a timespan with the span argument to set the size of the time buckets, for example span=1hr or span=3d. This argument also supports span=auto.
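
A minimal sketch of grouping by _time, assuming you want daily buckets over the last seven days of the _internal index (both the index and the time range are illustrative):

```spl
| tstats count WHERE index=_internal earliest=-7d BY _time span=1d
```

To chart the buckets, add prestats=t and pipe the results to timechart, as Example 4 in the Examples section does.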

Examples

Example 1: Gets the count of all events in the mydata namespace.

| tstats count FROM mydata

Example 2: Returns the average of the field foo in mydata, specifically where bar is value2 and the value of baz is greater than 5.

| tstats avg(foo) FROM mydata WHERE bar=value2 baz>5

Example 3: Gives the count by source for events with host=x.

| tstats count WHERE host=x BY source

Example 4: Gives a timechart of all the data in your default indexes with a day granularity.

| tstats prestats=t count BY _time span=1d | timechart span=1d count

Example 5: Use prestats mode in conjunction with append to compute the median values of foo and bar, which are in different namespaces.

| tstats prestats=t median(foo) FROM mydata | tstats prestats=t append=t median(bar) FROM otherdata | stats median(foo) median(bar)

Example 6: Uses the summariesonly argument to get the time range of the summary for an accelerated data model named mydm.

| tstats summariesonly=t min(_time) AS min, max(_time) AS max FROM datamodel=mydm | eval prettymin=strftime(min, "%c") | eval prettymax=strftime(max, "%c")

Example 7: Uses summariesonly in conjunction with timechart to reveal what data has been summarized over the past hour for an accelerated data model titled mydm.

| tstats summariesonly=t prestats=t count FROM datamodel=mydm BY _time span=1h | timechart span=1h count

See also

stats, tscollect


This documentation applies to the following versions of Splunk® Enterprise: 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.4.0, 6.4.1, 6.4.2, 6.4.3


Comments

Just in case anyone else got confused by the nodename option for an accelerated data model object (like I did):
You need to reference any fields in that object with the whole tree name within the data model.
So to count the values for field_name in the server.scheduler.scheduled_reports object, you would reference it like this:
| tstats count(server.scheduler.scheduled_reports.field_name) FROM datamodel=internal_server WHERE nodename=server.scheduler.scheduled_reports

Hope that helps someone else, as it wasn't obvious to me from the documentation.

Kmugglet
August 7, 2016

Landen99 - The fields that are available to the tstats command are any indexed field. If you have access to the TSIDX files, you can use the walklex command from the CLI on a TSIDX file. You would look for "*::*" and the fields are on the left hand side of the double colon.

Here is a link to more information about the walklex command.

http://docs.splunk.com/Documentation/Splunk/6.2.0/Troubleshooting/CommandlinetoolsforusewithSupport

Lstewart splunk, Splunker
November 16, 2015

We need a list of fields available to tstats, please.

Landen99
November 12, 2015
