tstats command to perform statistical queries on indexed fields in tsidx files. The indexed fields can be from normal index data, tscollect data, or accelerated data models.
The required syntax is in bold.
- | tstats
- [chunk_size=<unsigned int>]
- [FROM ( <namespace> | sid=<tscollect-job-id> | datamodel=<data_model_name>.<root_dataset_name> [where nodename = <root_dataset_name>.<...>.<target_dataset_name>)]]
- [WHERE <search-query> | <field> IN (<value-list>)]
- [BY (<field-list> | (PREFIX(<field>))) [span=<timespan>] ]
- Syntax: (count [<field>] | <function>(PREFIX(<string>) | <field>))... [AS <string>]
- Description: Either perform a basic count of a field or perform a function on a field. For a list of the supported functions for the
tstatscommand, refer to the table below. You must specify one or more functions. You can apply the function to a field, or to a
PREFIX()directive if you want to aggregate a raw segment in your indexed events as if it were an extracted field-value pair. You can also rename the result using the
ASkeyword, unless you are in prestats mode (
- You cannot specify functions without applying them to fields or
evalexpressions that resolve into fields. You cannot use wildcards to specify field names.
- See Usage to learn more about using
PREFIX(), and about searches you can run to find raw segments in your data.
- The following table lists the supported functions by type of function. Use the links in the table to see descriptions and examples for each function. For an overview about using functions with commands, see Statistical and charting functions.
Type of function Supported functions and syntax Aggregate functions
Event order functions
Multivalue stats and chart functions
- Syntax: append=<bool>
- Description: When in prestats mode (
append=truewhere the prestats results append to existing results, instead of generating them.
- Default: false
- Syntax: allow_old_summaries=true | false
- Description: Only applies when selecting from an accelerated data model. To return results from summary directories only when those directories are up-to-date, set this parameter to false. If the data model definition has changed, summary directories that are older than the new definition are not used when producing output from tstats. This default ensures that the output from tstats will always reflect your current configuration. When set to true, tstats will use both current summary data and summary data that was generated prior to the definition change. Essentially this is an advanced performance feature for cases where you know that the old summaries are "good enough".
- Default: false
- Syntax: chunk_size=<unsigned_int>
- Description: Advanced option. This argument controls how many events are retrieved at a time from a single
tsidxfile when the Splunk software processes searches. Lower this setting from its default only when you find a particular
tstatssearch is using too much memory, or when it infrequently returns events. This can happen when a search groups by excessively high-cardinality fields (fields with very large amounts of distinct values). In such situations, a lower
chunk_sizevalue can make
tstatssearches more responsive, but potentially slower to complete. A higher
chunk_size, on the other hand, can help long-running searches to complete faster, with the potential tradeoff of causing the search to be less responsive. For
chunk_sizecannot be set lower than 10000.
- Default: 10000000 (10 million)
The default value for the the
chunk_sizeargument is set by the
chunk_sizesetting for the
limits.conf. If you have Splunk Cloud Platform, file a Support ticket to change this setting.
- Description: This argument sets a user-specified value that the
tstatscommand substitutes for null values for any field within its group-by field list. Null values include field values that are missing from a subset of the returned events as well as field values that are missing from all of the returned events. If you do not provide a
tstatsomits rows for events with one or more null field values from its results.
- Default: no default value
- Syntax: include_reduced_buckets=true | false
- Description: This setting only applies when
indexes.conf. When set to false, the
tstatscommand generates results only from index buckets that are not reduced. Set to
trueif you want
tstatsto use results from reduced buckets.
- Default: false
- Syntax: local=true | false
- Description: If true, forces the processor to be run only on the search head.
- Default: false
- Syntax: prestats=true | false
- Description: Specifies whether to use the prestats format. The prestats format is a Splunk internal format that is designed to be consumed by commands that generate aggregate calculations. When using the prestats format you can pipe the data into the chart, stats, or timechart commands, which are designed to accept the prestats format. When
prestats=true, AS instructions are not relevant. The field names for the aggregates are determined by the command that consumes the prestats format and produces the aggregate output.
- Default: false
- Syntax: summariesonly=<bool>
- Description: Only applies when selecting from an accelerated data model. When false, generates results from both summarized data and data that is not summarized. For data not summarized as TSIDX data, the full search behavior will be used against the original index data. If set to true, 'tstats' will only generate results from the TSIDX data that has been automatically generated by the acceleration and non-summarized data will not be provided.
- Default: false
FROM clause arguments
The FROM clause is optional. You can specify either a
sid, or a
datamodel. See Selecting data for more information about this clause.
- Syntax: <string>
- Description: Define a location for the tsidx file with
$SPLUNK_DB/tsidxstats. If you have Splunk Enterprise, you can configure this location by editing the local version of the
indexes.conffile and setting the
tsidxStatsHomePathattribute. See How to edit a configuration file in the Admin manual.
- Syntax: sid=<tscollect-job-id>
- Description: The job ID string of a tscollect search (that generated tsidx files).
- Syntax: datamodel=<data_model_name>.<root_dataset_name> [where nodename = <root_dataset_name>.<...>.<target_dataset_name>]
- Description: The name of a data model, concatenated with the name of the root dataset that you are searching. If you wish to filter on a child dataset, you need to use a
whereclause that uses
nodenameto reference a specific child dataset in a dataset hierarchy in the data model. See Selecting data for more information.
WHERE clause arguments
The WHERE clause is optional. This clause is used as a filter. You can specify either a search or a field and a set of values with the IN operator.
- Specify search criteria to filter on.
- <field> IN (<value-list>)
- For the
field, specify a list of values to include in the search results.
BY clause arguments
The BY clause is optional. You cannot use wildcards in the
BY clause with the
tstats command. See Usage. If you use the BY clause, you must specify a
field-list. You can also specify a
- Syntax: <field>, ...
- Description: Specify one or more fields to group results.
- Syntax: PREFIX(<string>)
- Description: Specify a raw segment in your indexed events that you want to split by as if it were an extracted field-value pair. See Usage for more information about the
PREFIX()directive, and for a search you can run to find raw segments in your indexed data.
- Syntax: span=<timespan>
- Description: The span of each time bin. If you use the BY clause to group by
_time, use the
spanargument to group the time buckets. You can specify timespans such as
BY _time span=1hor
BY _time span=5d. If you do not specify a <timespan>, the default is
auto, which means that the number of time buckets adjusts to produce a reasonable number of results. For example if initially seconds are used for the <timespan> and too many results are being returned, the <timespan> is changed to a longer value, such as minutes, to return fewer time buckets.
- Default: auto
- Syntax: auto | <int><timescale>
- Syntax: <sec> | <min> | <hr> | <day> | <month>
- Description: Time scale units. For the
tstatscommand, the <timescale> does not support subseconds.
- Default: sec
Time scale Syntax Description <sec> s | sec | secs | second | seconds Time scale in seconds. <min> m | min | mins | minute | minutes Time scale in minutes. <hr> h | hr | hrs | hour | hours Time scale in hours. <day> d | day | days Time scale in days. <month> mon | month | months Time scale in months.
Generating commands use a leading pipe character and should be the first command in a search, except when
tstats command to perform statistical queries on indexed fields in
tsidx files. You can select the data for the indexed fields in several ways.
- Normal index data
- Use a FROM clause to specify a namespace, search job ID, or data model. If you do not specify a FROM clause, the Splunk software selects from index data in the same way as the
searchcommand. You are restricted to selecting data from your allowed indexes by user role. You control exactly which indexes you select data from by using the WHERE clause. If no indexes are mentioned in the WHERE clause, the Splunk software uses the default indexes. By default, role-based search filters are applied, but can be turned off in the limits.conf file.
- Data manually collected with the tscollect command
- You can select data from your namespace by specifying
FROM <namespace>. If you did not specify a namespace with the
tscollectcommand, the data is collected into the dispatch directory of that job. If the data is in the dispatch directory, you select the data by specifying
- An accelerated data model
- You can select data from a high-performance analytics store, which is a collection of
.tsidxdata summaries, for an accelerated data model. You can select data from this accelerated data model by using
- When you select a data model for a
tstatssearch, you also have to select the root dataset within that data model that you intend to search. You cannot select all of the root datasets within a data model at once.
Search filters cannot be applied to accelerated data models. This includes both role-based and user-based search filters.
- A child dataset in an accelerated data model
- You can select data from a child dataset within an accelerated data model. Use a
WHEREclause to specify the
nodenameof the child dataset. The
nodenameargument indicates where the target dataset is in the data model hierarchy. The syntax looks like this:
...| tstats <stats-func> FROM datamodel=<data_model_name>.<root_dataset_name> where nodename=<root_dataset_name>.<...>.<target_dataset_name>
- For example, say you have a data model with three root datasets, each with their own dataset hierarchies.
ButtercupGamesPromos - NYC (BaseEvent) - TShirtStore (NYC) - FashionShows (TShirtStore) - Giveaways (TShirtStore) - Chicago (BaseEvent) - BeerAndBrautsPopup (Chicago) - BeerSales (BeerAndBrautsPopup) - BrautSales (BeerAndBrautsPopup) - Tokyo (BaseSearch) - GiantRobotBattles (Tokyo) - UFORobotGrendizer (GiantRobotBattles) - MechaGodzilla (GiantRobotBattles)
- With this hierarchy, if you wanted run a
tstatssearch that selects from the dataset containing records of the MechaGodzilla giant robot battles staged by the Tokyo office, you would use the following search:
... | tstats count FROM datamodel=ButtercupGamesPromos.Tokyo where nodename=Tokyo.GiantRobotBattles.MechaGodzilla
Search filters cannot be applied to accelerated data model datasets. This includes both role-based and user-based search filters.
- You might see a count mismatch in the events retrieved when searching
tsidxfiles. It is not possible to distinguish between indexed field tokens and raw tokens in
tsidxfiles. On the other hand, it is more explicit to run the
tstatscommand on accelerated data models or from a
tscollectcommand, where only the fields and values are stored and not the raw tokens.
Limitations of CIDR matching with tstats
As with the
search command, you can use the
tstats command to filter events with CIDR match on fields that contain IPv4 and IPv6 addresses. However, unlike the
search command, the
tstats command may not correctly filter strings containing non-numeric wildcard octets. As a result, your searches may return unpredictable results.
If you are filtering fields with a CIDR match using the
tstats command in a BY clause, you can work around this issue and correctly refilter your results by appending your search with a
regex command, or WHERE clause. Unfortunately, you can't use this workaround if the search doesn't include the filtered field in a BY clause.
Example of using CIDR match with tstats in a BY clause
Let's take a look at an example of how you could use CIDR match with the
tstats command in a BY clause. Say you create a file called data.csv containing the following lines:
ip,description 18.104.22.168,"An IP address" 22.214.171.124,"Another IP address" this.is.a.hostname,"A hostname" this.is.another.hostname,"Another hostname"
Then follow these steps:
- Upload the file and set the sourcetype to
csv, which ensures that all fields in the file are indexed as required by the
- Run the following search against the index you specified when you uploaded the file. This example uses the main index.
| tstats count where index=main source=*data.csv ip="0.0.0.0/0" by ip
The results look like this:
Even though only two addresses are legitimate IP addresses, all four rows of addresses are displayed in the results. Invalid IP addresses are displayed along with the valid IP addresses because the
tstats command uses string matching to satisfy search requests and doesn't directly support IP address-based searches. The
tstats command does its best to return the correct results for CIDR search clauses, but the tstats search may return more results than you want if the source data contains mixed IP and non-IP data such as host names.
To make sure your searches only return the results you want, make sure that your data set is clean and only contains data in the correct format. If that is not possible, use the
search command or WHERE clause to do post-filtering of the search results. For example, the following search using the
search command displays correct results because the piped
search command further filters the results from the
| tstats count where index=main source=*data.csv ip="0.0.0.0/0" by ip
| search ip="0.0.0.0/0"
Alternatively, you can use the WHERE clause to filter your results, like this.
| tstats count where index=main source=*data.csv ip="0.0.0.0/0" by ip
| WHERE cidrmatch("0.0.0.0/0", ip)
Both of these searches using the
search command and the WHERE clause return only the valid IP addresses in the results, which look like this:
tstats command does not support wildcard characters in field values in aggregate functions or BY clauses.
For example, you cannot specify
| tstats avg(foo*) or
| tstats count WHERE host=x BY source*.
Aggregate functions include
sum(). For more information, see Aggregate functions.
Any results returned where the aggregate function or BY clause includes a wildcard character are only the most recent few minutes of data that has not been summarized. Include the
summariesonly=t argument with your
tstats command to return only summarized data.
Statistical functions must have named fields
With the exception of
tstats command supports only statistical functions that are applied to fields or
eval expressions that resolve into fields. For example, you cannot specify
| tstats sum or
| tstats sum(). Instead the
tstats syntax requires that at least one field argument be provided for the function:
| tstats sum(<field>).
Nested eval expressions not supported
You cannot use
eval expressions inside aggregate functions with the
| tstats count(eval(...)) is not supported.
While nested eval expressions are supported with the
stats command, they are not supported with the
Functions and memory usage
Some functions are inherently more expensive, from a memory standpoint, than other functions. For example, the
distinct_count function requires far more memory than the
count function. The
list functions also can consume a lot of memory.
If you are using the
distinct_count function without a split-by field or with a low-cardinality split-by by field, consider replacing the
distinct_count function with the the
estdc function (estimated distinct count). The
estdc function might result in significantly lower memory usage and run times.
Use PREFIX() to aggregate or group by raw tokens in indexed data
PREFIX() directive allows you to search on a raw segment in your indexed data as if it were an extracted field. This causes the search to run over the
tsidx file in your indexers rather than the log line. This is a practice that can significantly reduce the CPU load on your indexers.
PREFIX() directive is similar to the
TERM() directives in that it matches strings in your raw data. You can use
PREFIX() to locate a recurring segment in your raw event data that is actually a key-value pair separated by a delimiter that is also a minor breaker, like
:. You give
PREFIX() the text that precedes the value—the "prefix"—and then it returns the values that follow the prefix. This enables you to group by those values and aggregate them with
tstats functions. The values can be strings or purely numeric.
For example, say you have indexed segments in your event data that look like
kbps=333. You can isolate the numerical values in these segments and perform aggregations or group-by operations on them by using the
PREFIX() directive to identify
kbps= as a common prefix string. Run a
tstats search with
PREFIX(kbps=) against your event data and it will return
333. These values are perfect for
tstats aggregation functions that require purely numeric input.
Notice that in this example you need to include the
= delimiter. If you run
PREFIX(kbps), the search returns
=333. Efforts to aggregate on such results may return unexpected results, especially if you are running them through aggregation functions that require purely numeric values.
The text you provide for the
PREFIX() directive must be in lower case. For example, the
tstats search processor will fail to process
PREFIX(connectiontype=) instead. It will still match
connectionType= strings in your events.
The Splunk software separates events into raw segments when it indexes data, using rules specified in
segmenters.conf. You can run the following search to identify raw segments in your indexed events:
| walklex index=<target-index> type=term | stats sum(count) by term
You cannot apply the
PREFIX() directive to segment prefixes and values that contain major breakers such as spaces, square or curly brackets, parentheses, semicolons, or exclamation points.
For more information about the
TERM() directives, see Use CASE() and TERM() to match phrases in the Search Manual.
For more information about the segmentation of indexed events, see About event segmentation in Getting Data In
For more information about minor and major breakers in segments, see Event segmentation and searching in the Search Manual.
Memory and tstats search performance
A pair of
limits.conf settings strike a balance between the performance of
tstats searches and the amount of memory they use during the search process, in RAM and on disk. If your
tstats searches are consistently slow to complete you can adjust these settings to improve their performance, but at the cost of increased search-time memory usage, which can lead to search failures.
If you have Splunk Cloud Platform, you need to file a Support ticket to change these settings.
For more information, see Memory and stats search performance in the Search Manual.
Complex aggregate functions
tstats command does not support complex aggregate functions such as
Consider the following query. This query will not return accurate results because complex aggregate functions are not supported by the
| tstats summariesonly=false values(Authentication.tag) as tag,
values(Authentication.app) as app,
count(eval('Authentication.action'=="failure")) as failure,
as success from datamodel=Authentication by Authentication.src
| search success>0 |
where failure > 5
Instead, separate out the aggregate functions from the eval functions, as shown in the following search.
| tstats `summariesonly` values(Authentication.app) as app,
count from datamodel=Authentication.Authentication by Authentication.action, Authentication.src
| eval success=if(action="success",count,0), failure=if(action="failure",count,0)
| stats values(app) as app, sum(failure) as failure, sum(success) as success by src
You can generate sparkline charts with the
tstats command only if you specify the
_time field in the BY clause and use the
stats command to generate the actual sparkline. For example:
| tstats count from datamodel=Authentication.Authentication BY _time, Authentication.src span=1h
| stats sparkline(sum(count),1h) AS sparkline, sum(count) AS count BY Authentication.src
Filtering with WHERE
You can provide any number of aggregates (
aggregate-opt) to perform and also have the option of providing a filtering query using the WHERE keyword. This query looks like a normal query you would use in the search processor. This supports all the same time arguments as search, such as earliest=-1y.
Grouping by _time
You can provide any number of BY fields. If you are grouping by
_time, supply a timespan with
span for grouping the time buckets, for example
...BY _time span=1h or
...BY _time span=3d.
Tstats and tsidx bucket reduction
tstats searches over indexes that have undergone tsidx bucket reduction will return incorrect results.
For more information see Reduce tsidx disk usage in Managing indexers and clusters of indexers.
Multiple time ranges
tstats command is unable to handle multiple time ranges. This is because the
tstats command is a generating command and doesn't perform post-search filtering, which is required to return results for multiple time ranges.
The following example of a search using the
tstats command on events with relative times of 5 seconds to 1 second in the past displays a warning that the results may be incorrect because the
tstats command doesn't support multiple time ranges.
| tstats count where index="_internal" (earliest =-5s latest=-4s) OR (earliest=-3s latest=-1s)
If you want to search events in multiple time ranges, use another command such as
stats, or use multiple
tstats commands with
append as shown in the following example.
| tstats prestats=t count where index=_internal earliest=-5s latest=-4s
| tstats prestats=t append=true count where index=_internal earliest=-3s latest=-2s
| stats count
The results in this example look something like this.
Example 1: Gets the count of all events in the
| tstats count FROM mydata
Example 2: Returns the average of the field
mydata, specifically where
value2 and the value of
baz is greater than 5.
| tstats avg(foo) FROM mydata WHERE bar=value2 baz>5
Example 3: Gives the count by source for events with host=x.
| tstats count WHERE host=x BY source
Example 4: Gives a timechart of all the data in your default indexes with a day granularity.
| tstats prestats=t count BY _time span=1d
| timechart span=1d count
Example 5: Use prestats mode in conjunction with append to compute the median values of foo and bar, which are in different namespaces.
| tstats prestats=t median(foo) FROM mydata
| tstats prestats=t append=t median(bar) FROM otherdata
| stats median(foo) median(bar)
Example 6: Uses the
summariesonly argument to get the time range of the summary for an accelerated data model named
| tstats summariesonly=t min(_time) AS min, max(_time) AS max FROM datamodel=mydm
| eval prettymin=strftime(min, "%c")
| eval prettymax=strftime(max, "%c")
Example 7: Uses
summariesonly in conjunction with
timechart to reveal what data has been summarized over the past hour for an accelerated data model titled
| tstats summariesonly=t prestats=t count FROM datamodel=mydm BY _time span=1h
| timechart span=1h count
Example 8: Uses the
values statistical function to provide a list of all distinct values for
source returned by the "Splunk's Internal Server Logs" data model. The list is returned as a multivalue entry.
| tstats values(source) FROM datamodel=internal_server
The results look something like this:
If you don't have the
internal_server data model defined, check under Settings->Data models for a list of the data models you have access to.
Example 9: Uses the
values statistical function to provide provide a list of all distinct values for
source returned by the Alerts dataset within the "Splunk's Internal Server Logs" data model.
| tstats values(source) FROM datamodel=internal_server where nodename=server.scheduler.alerts
Example 10: Gets the count and average of a raw, unindexed term using the
kbps=, then splits this by an indexed source and another unindexed term using the
| tstats count avg(PREFIX(kbps=)) where index=_internal by source PREFIX(group=)
This documentation applies to the following versions of Splunk® Enterprise: 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6