Use reporting commands
You can add reporting commands directly to a search string to help with the production of reports and the summarizing of search results.
A reporting command primer
This subsection covers the major categories of reporting commands and provides examples of how they can be used in a search.
The primary reporting commands are:
- chart: used to create charts that can display any series of data that you want to plot. You can decide what field is tracked on the x-axis of the chart.
- timechart: used to create "trend over time" reports, which means that _time is always the x-axis.
- top: generates charts that display the most common values of a field.
- rare: creates charts that display the least common values of a field.
- stats, eventstats, and streamstats: generate reports that display summary statistics.
- associate, correlate, and diff: create reports that enable you to see associations, correlations, and differences between fields in your data.
Note: As you'll see in the following examples, you always place your reporting commands after your search commands, linking them with a pipe operator ("|").
chart, timechart, stats, eventstats, and streamstats are all designed to work in conjunction with statistical functions. The list of available statistical functions includes:
- count, distinct count
- mean, median, mode
- min, max, range, percentiles
- standard deviation, variance
- sum
- first occurrence, last occurrence
To find more information about statistical functions and how they're used, see "Functions for stats, chart, and timechart" in the Search Reference Manual. Some statistical functions only work with the timechart command.
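As a rough illustration of how several of these functions can be combined in one search, the sketch below assumes web access events that carry clientip and kbps fields (the field names follow other examples in this topic; your data may differ). Here dc is the distinct count function and perc95 returns the 95th percentile:

```
sourcetype=access_combined | stats count, dc(clientip), avg(kbps), max(kbps), perc95(kbps) by host
```

Each aggregation becomes its own column in the results, with one row per host.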
Note: All searches with reporting commands generate specific structures of data. The different chart types available in Splunk require these data structures to be set up in particular ways. For example, not every search that can produce bar, column, line, and area charts can also produce a pie chart. Read the "Chart data structure requirements" subtopic of the "Chart gallery" topic in this manual to learn more.
Creating time-based charts
Use the timechart reporting command to create useful charts that display statistical trends over time, with time plotted on the x-axis of the chart. You can optionally split data by another field, meaning that each distinct value of the "split by" field is a separate series in the chart. Typically these reports are formatted as line or area charts, but they can also be column charts.
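A minimal sketch of a split-by timechart, assuming web access events with host and kbps fields (assumed field names, not taken from a specific dataset). The span argument sets the width of each time bucket, so each point is a one-hour average and each host becomes its own series:

```
sourcetype=access_combined | timechart span=1h avg(kbps) by host
```

If you omit span, Splunk picks a bucket size based on the time range of the search.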
For example, this report uses internal Splunk log data to visualize the average indexing throughput of Splunk processes over time, broken out by processor. The search averages the instantaneous_eps field, which measures events per second:
index=_internal "group=thruput" | timechart avg(instantaneous_eps) by processor
Creating charts that are not (necessarily) time-based
Use the chart reporting command to create charts that can display any series of data. Unlike the timechart command, charts created with the chart command use an arbitrary field as the x-axis. You use the over keyword to determine what field takes the x-axis.
Note: The over keyword is specific to the chart command. You won't use it with timechart, for example, because the _time default field is already being used as the x-axis.
For example, the following report uses web access data to show you the count of unique visitors for each weekday, using the distinct count (dc) of the clientip field.
index=sampledata sourcetype=access* | chart dc(clientip) over date_wday
You can optionally split data by another field, meaning that each distinct value of the "split by" field is a separate series in the chart. If your search includes a "split by" clause, place the over clause before the "split by" clause.
The following report generates a chart showing the sum of kilobytes processed by each clientip within a given timeframe, split by host. In the finished chart, the kb value takes the y-axis and clientip takes the x-axis, and the kb value is broken out by host. You might want to use the Report Builder to format this report as a stacked bar chart.
index=sampledata sourcetype=access* | chart sum(kb) over clientip by host
Another example: say you want to create a stacked bar chart that splits out the http and https requests hitting your servers. To do this you would first create ssl_type, a search-time field extraction that contains the inbound port number or the incoming URL request, assuming that is logged. The finished search would look like this:
sourcetype=whatever | chart count over ssl_type
Again, you can use the Report Builder to format the results as a stacked bar chart.
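If no ssl_type extraction exists yet, one possible stand-in is to derive the field at search time with eval. The sketch below is an assumption rather than the extraction described above: it presumes your events carry a numeric dst_port field, and it treats port 443 as https and everything else as http. Charting over host and splitting by ssl_type gives the chart one series per protocol, which is the shape a stacked chart needs:

```
sourcetype=whatever | eval ssl_type=if(dst_port==443, "https", "http") | chart count over host by ssl_type
```

Adjust the port test to match how your servers are actually configured.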
Visualizing the highs and lows
Use the top and rare reporting commands to create charts that display the most and least common values.
This set of commands generates a report that sorts through firewall information to show you a list of the top 100 destination ports used by your system:
index=sampledata | top limit=100 dst_port
This search, on the other hand, uses the same set of firewall data to generate a report that shows you the source ports with the lowest number of denials. If you don't specify a limit, top and rare display ten values by default.
index=sampledata action=Deny | rare src_port
A more complex example of the top command
Say you're indexing an alert log from a monitoring system, and you have two fields:
- msg is the message, such as CPU at 100%.
- mc_host is the host that generates the message, such as log01.
How do you get a report that displays the top msg values and the mc_host values that sent them, in a table like this?
| Messages by mc_host |
| CPU at 100% |
| log01 |
| log02 |
| log03 |
| Log File Alert |
| host02 |
| host56 |
| host11 |
To do this, set up a search that finds the top message per mc_host (using limit=1 to only return one) and then sort by the message count in descending order:
source="mcevent.csv" | top limit=1 msg by mc_host | sort -count
Create reports that display summary statistics
Use the stats and eventstats reporting commands to generate reports that display summary statistics related to a field.
To fully utilize the stats command, you need to include a "split by" clause. For example, the following report won't provide much information:
sourcetype=access_combined | stats avg(kbps)
It gives you the average of kbps for all events with a sourcetype of access_combined, which is a single value. The resulting column chart contains only one column.
But if you break out the report with a split by field, Splunk generates a report that breaks down the statistics by that field. The following report generates a column chart that sorts through the access_combined logs to get the average throughput (kbps), broken out by host:
sourcetype=access_combined | stats avg(kbps) by host
Here's a slightly more sophisticated example of the stats command, in a report that shows you the CPU utilization of Splunk processes sorted in descending order:
index=_internal "group=pipeline" | stats sum(cpu_seconds) by processor | sort sum(cpu_seconds) desc
The eventstats command computes the same aggregations as the stats command, but instead of replacing the events with a summary table, it adds the results inline to each event. Each event receives only the aggregations that pertain to it.
You specify the field name for the eventstats results by adding the as argument. So the stats example above that is broken out by host could be restated with avgkbps as the name of the new field that contains the results of the eventstats avg(kbps) operation:
sourcetype=access_combined | eventstats avg(kbps) as avgkbps by host
When you run this set of commands, Splunk adds a new avgkbps field to each sourcetype=access_combined event that includes the kbps field. The value of avgkbps is the average kbps across all events from the same host as that event.
In addition, Splunk uses that set of commands to generate a chart displaying the average kbps for all events with a sourcetype of access_combined, broken out by host.
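Because eventstats keeps the original events, the new field can be compared against each event's own values later in the same search. As a sketch, assuming the same access_combined events with a kbps field, the following search keeps only the events whose kbps is above the average for their host:

```
sourcetype=access_combined | eventstats avg(kbps) as avgkbps by host | where kbps > avgkbps | table _time host kbps avgkbps
```

The same filter is not possible after stats avg(kbps) by host, because stats replaces the individual events with one summary row per host.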
Look for associations, statistical correlations, and differences in search results
Use the associate, correlate and diff commands to find associations, similarities and differences among field values in your search results.
The associate reporting command identifies events that are associated with each other through field/field value pairs. For example, if one event has a referer_domain of "http://www.google.com/" and another event has a referer_domain with the same URL value, then they are associated.
You can "tune" the results gained by the associate command with the supcnt, supfreq, and improv arguments. For more information about these arguments see the Associate page in the Search Reference.
For example, this report searches the access sourcetypes and identifies events that share at least three field/field-value pair associations:
sourcetype=access* | associate supcnt=3
The correlate reporting command calculates the statistical correlation between fields. It uses the cocur operation to calculate the percentage of times that two fields exist in the same set of results.
The following report searches across all events where eventtype=goodaccess, and calculates the co-occurrence correlation between all of those fields.
eventtype=goodaccess | correlate type=cocur
Use the diff reporting command to compare the differences between two search results. By default it compares the raw text of the search results you select, unless you use the attribute argument to focus on specific field attributes.
For example, this report looks at the 44th and 45th events returned in the search and compares their IP address values:
eventtype=goodaccess | diff pos1=44 pos2=45 attribute=ip
This documentation applies to the following versions of Splunk: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6