Use reporting commands
You can add reporting commands directly to a search string to help produce reports and summarize search results.
A reporting command primer
This subsection covers the major categories of reporting commands and provides examples of how they can be used in a search.
The primary reporting commands are:
chart: used to create charts that can display any series of data that you want to plot. You can decide what field is tracked on the x-axis of the chart.
timechart: used to create "trend over time" reports, which means that _time is always the x-axis.
top: generates charts that display the most common values of a field.
rare: creates charts that display the least common values of a field.
stats, eventstats, and streamstats: generate reports that display summary statistics.
associate, correlate, and diff: create reports that enable you to see associations, correlations, and differences between fields in your data.
Note: As you'll see in the following examples, you always place your reporting commands after your search commands, linking them with a pipe operator ("|").
The chart, timechart, stats, eventstats, and streamstats commands are all designed to work in conjunction with statistical functions. The list of available statistical functions includes:
- count, distinct count
- mean, median, mode
- min, max, range, percentiles
- standard deviation, variance
- first occurrence, last occurrence
To find more information about statistical functions and how they're used, see "Functions for stats, chart, and timechart" in the Search Reference Manual. Some statistical functions only work with certain commands.
Note: All searches with reporting commands generate specific structures of data. The different chart types available in Splunk require these data structures to be set up in particular ways. For example, not all searches that enable the generation of bar, column, line, and area charts also enable the generation of pie charts. Read the "Chart data structure requirements" subtopic of the "Chart gallery" topic in this manual to learn more.
Creating time-based charts
Use the timechart reporting command to create useful charts that display statistical trends over time, with time plotted on the x-axis of the chart. You can optionally split data by another field, meaning that each distinct value of the "split by" field is a separate series in the chart. Typically these reports are formatted as line or area charts, but they can also be column charts.
For example, this report uses internal Splunk log data to visualize the average indexing thruput (in events per second, taken from the instantaneous_eps field) of Splunk processes over time, broken out by processor:
index=_internal "group=thruput" | timechart avg(instantaneous_eps) by processor
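The bucketing that timechart performs can be pictured with a short Python sketch. This is only an illustration of the idea, not Splunk's implementation: the function name timechart_avg, the 60-second span, and the sample events are all assumptions made for the example.

```python
from collections import defaultdict

def timechart_avg(events, span, value_field, split_field):
    """Average value_field per time bucket of `span` seconds,
    producing one series per distinct split_field value."""
    sums = defaultdict(lambda: [0.0, 0])  # (bucket, series) -> [total, count]
    for e in events:
        bucket = e["_time"] - e["_time"] % span  # start of the time bucket
        cell = sums[(bucket, e[split_field])]
        cell[0] += e[value_field]
        cell[1] += 1
    table = defaultdict(dict)
    for (bucket, series), (total, n) in sorted(sums.items()):
        table[bucket][series] = total / n
    return dict(table)

# Hypothetical events; _time is in seconds.
events = [
    {"_time": 0, "processor": "indexer", "instantaneous_eps": 10.0},
    {"_time": 30, "processor": "indexer", "instantaneous_eps": 20.0},
    {"_time": 45, "processor": "parser", "instantaneous_eps": 4.0},
    {"_time": 70, "processor": "indexer", "instantaneous_eps": 30.0},
]
result = timechart_avg(events, 60, "instantaneous_eps", "processor")
```

Each key of result is a point on the x-axis (a time bucket), and each inner key is one series in the chart.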
Creating charts that are not (necessarily) time-based
Use the chart reporting command to create charts that can display any series of data. Unlike the timechart command, charts created with the chart command use an arbitrary field as the x-axis. You use the over keyword to determine what field takes the x-axis.
Note: The over keyword is specific to the chart command. You won't use it with timechart, for example, because the _time default field is already being used as the x-axis.
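For example, the following report uses web access data to show you the count of unique visitors for each weekday. It uses the dc (distinct count) function, because averaging clientip values would not produce a meaningful number.
index=sampledata sourcetype=access* | chart dc(clientip) over date_wday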
You can optionally split data by another field, meaning that each distinct value of the "split by" field is a separate series in the chart. If your search includes a "split by" clause, place the over clause before the "split by" clause.
The following report generates a chart showing the sum of kilobytes processed by each clientip within a given timeframe, split by host. The finished chart shows the kb value taking the y-axis while clientip takes the x-axis. The kb value is broken out by host. You might want to use the Report Builder to format this report as a stacked bar chart.
index=sampledata sourcetype=access* | chart sum(kb) over clientip by host
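To see the shape of the data that an "over ... by ..." search produces, here is a minimal Python sketch of the same pivot. The function name chart_sum and the sample events are hypothetical; the sketch only mirrors the row-and-series layout described above.

```python
from collections import defaultdict

def chart_sum(events, value_field, over_field, by_field):
    """Sum value_field into a table with one row per over_field value
    (the x-axis) and one column per by_field value (the series)."""
    table = defaultdict(lambda: defaultdict(float))
    for e in events:
        table[e[over_field]][e[by_field]] += e[value_field]
    return {row: dict(cols) for row, cols in table.items()}

# Hypothetical web access events.
events = [
    {"clientip": "10.0.0.1", "host": "web1", "kb": 2.0},
    {"clientip": "10.0.0.1", "host": "web1", "kb": 3.0},
    {"clientip": "10.0.0.1", "host": "web2", "kb": 1.5},
    {"clientip": "10.0.0.2", "host": "web2", "kb": 4.0},
]
chart = chart_sum(events, "kb", "clientip", "host")
```

In a stacked bar chart, each row becomes one bar and each column becomes one segment of that bar.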
Another example: say you want to create a stacked bar chart that splits out the http and https requests hitting your servers. To do this you would first create ssl_type, a search-time field extraction that contains the inbound port number or the incoming URL request, assuming that is logged. The finished search would look like this:
sourcetype=whatever | chart count over ssl_type
Again, you can use the Report Builder to format the results as a stacked bar chart.
Visualizing the highs and lows
This set of commands generates a report that sorts through firewall information to show you a list of the top 100 destination ports used by your system:
index=sampledata | top limit=100 dst_port
This string, on the other hand, uses the same set of firewall data to generate a report that shows you the source ports with the lowest number of denials. If you don't specify a limit, the default number of values displayed by the rare command is ten.
index=sampledata action=Deny | rare src_port
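Conceptually, top and rare both count how often each value of a field occurs and then keep one end of the ranking. The Python sketch below illustrates that idea under simple assumptions: the function names and the sample port list are made up for the example, and tie-breaking in Splunk may differ.

```python
from collections import Counter

def top(values, limit=10):
    """Most common values first, with their counts."""
    return Counter(values).most_common(limit)

def rare(values, limit=10):
    """Least common values first, with their counts."""
    counts = Counter(values)
    return sorted(counts.items(), key=lambda kv: kv[1])[:limit]

# Hypothetical destination ports seen in firewall events.
ports = [443, 80, 443, 22, 443, 80]
most_common_ports = top(ports, limit=2)
least_common_ports = rare(ports, limit=2)
```

Both helpers default to ten values, matching the default limit mentioned above.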
A more complex example of the top command
Say you're indexing an alert log from a monitoring system, and you have two fields:
- msg is the message, such as CPU at 100%.
- mc_host is the host that generates the message.
How do you get a report that displays the top msg and the values of mc_host that sent them, so you get a table like this:
|Messages by mc_host|
|CPU at 100%|
|Log File Alert|
To do this, set up a search that finds the top msg for each mc_host (using limit=1 to only return one) and then sort by the message count in descending order:
source="mcevent.csv" | top limit=1 msg by mc_host | sort -count
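The same grouping logic can be sketched in Python: count messages separately for each host, keep the most frequent one per host, then sort the surviving rows by count. The function name top_by and the host names hostA and hostB are hypothetical and serve only to illustrate the steps.

```python
from collections import Counter, defaultdict

def top_by(events, field, by_field, limit=1):
    """Keep the `limit` most common values of `field` within each
    by_field group, then sort all rows by count, highest first."""
    per_group = defaultdict(Counter)
    for e in events:
        per_group[e[by_field]][e[field]] += 1
    rows = []
    for group, counts in per_group.items():
        for value, count in counts.most_common(limit):
            rows.append({by_field: group, field: value, "count": count})
    return sorted(rows, key=lambda r: r["count"], reverse=True)

# Hypothetical alert events.
events = (
    [{"mc_host": "hostA", "msg": "CPU at 100%"}] * 3
    + [{"mc_host": "hostA", "msg": "Log File Alert"}]
    + [{"mc_host": "hostB", "msg": "Log File Alert"}] * 2
)
rows = top_by(events, "msg", "mc_host", limit=1)
```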
Create reports that display summary statistics
To fully utilize the stats command, you need to include a "split by" clause. For example, the following report won't provide much information:
sourcetype=access_combined | stats avg(kbps)
It gives you the average of kbps for all events with a sourcetype of access_combined, which is a single value. The resulting column chart contains only one column.
But if you break out the report with a split by field, Splunk generates a report that breaks down the statistics by that field. The following report generates a column chart that sorts through the access_combined logs to get the average thruput (kbps), broken out by host:
sourcetype=access_combined | stats avg(kbps) by host
Here's a slightly more sophisticated example of the stats command, in a report that shows you the CPU utilization of Splunk processes sorted in descending order:
index=_internal "group=pipeline" | stats sum(cpu_seconds) by processor | sort sum(cpu_seconds) desc
The eventstats command works in exactly the same manner as the stats command, except that it adds the aggregation results inline to each event, and adds only the aggregations that are pertinent to that event.
You specify the field name for the eventstats results by adding the as argument. So the split-by example above could be restated with "avgkbps" being the name of the new field that contains the results of the eventstats avg(kbps) operation:
sourcetype=access_combined | eventstats avg(kbps) as avgkbps by host
When you run this set of commands, Splunk adds a new avgkbps field to each sourcetype=access_combined event that includes the kbps field. The value of avgkbps is the average kbps for the host that the event belongs to.
In addition, Splunk uses that set of commands to generate a chart displaying the average kbps for all events with a sourcetype of access_combined, broken out by host.
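The difference between stats and eventstats is easiest to see side by side. In the Python sketch below, which is an illustration under stated assumptions rather than Splunk's actual behavior, stats_avg collapses the events into one average per host, while eventstats_avg keeps every event and attaches the matching average to it. The function names and sample events are hypothetical, and the choice to leave events without a kbps field untouched follows the description above.

```python
from collections import defaultdict

def stats_avg(events, field, by):
    """One aggregated row per `by` value (like stats avg(field) by ...)."""
    totals = defaultdict(lambda: [0.0, 0])  # by value -> [total, count]
    for e in events:
        if field in e:
            totals[e[by]][0] += e[field]
            totals[e[by]][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}

def eventstats_avg(events, field, by, as_field):
    """Every event is kept; events that have `field` gain `as_field`,
    holding the average for that event's `by` group."""
    averages = stats_avg(events, field, by)
    return [
        {**e, as_field: averages[e[by]]} if field in e else dict(e)
        for e in events
    ]

# Hypothetical access_combined events.
events = [
    {"host": "web1", "kbps": 10.0},
    {"host": "web1", "kbps": 20.0},
    {"host": "web2", "kbps": 5.0},
    {"host": "web2"},  # no kbps field, so no avgkbps is added
]
summary = stats_avg(events, "kbps", "host")
annotated = eventstats_avg(events, "kbps", "host", "avgkbps")
```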
Look for associations, statistical correlations, and differences in search results
The associate reporting command identifies events that are associated with each other through field/field value pairs. For example, if one event has a referer_domain of "http://www.google.com/" and another event has a referer_domain with the same URL value, then they are associated.
You can "tune" the results gained by the associate command with the supcnt, supfreq, and improv arguments. For more information about these arguments, see the Associate page in the Search Reference.
For example, this report searches the access sourcetypes and identifies events that share at least three field/field-value pair associations:
sourcetype=access* | associate supcnt=3
The correlate reporting command calculates the statistical correlation between fields. It uses the cocur operation to calculate the percentage of times that two fields exist in the same set of results.
The following report searches across all events where eventtype=goodaccess, and calculates the co-occurrence correlation between all of the fields in those events.
eventtype=goodaccess | correlate type=cocur
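As a rough illustration of what a co-occurrence measure looks like, the Python sketch below computes, for every pair of fields, the fraction of events in which both fields are present. The denominator used here (events that contain at least one of the two fields) is an assumption chosen for the example, since the text above does not specify how Splunk normalizes the value. The function name and sample events are also hypothetical.

```python
from itertools import combinations

def cooccurrence(events):
    """For each pair of fields, return (events with both fields) divided by
    (events with either field). The denominator is an assumed choice."""
    fields = sorted({f for e in events for f in e})
    matrix = {}
    for a, b in combinations(fields, 2):
        both = sum(1 for e in events if a in e and b in e)
        either = sum(1 for e in events if a in e or b in e)
        matrix[(a, b)] = both / either
    return matrix

# Hypothetical events; only the presence of a field matters, not its value.
events = [
    {"clientip": "10.0.0.1", "status": 200},
    {"clientip": "10.0.0.2", "status": 404, "referer": "-"},
    {"clientip": "10.0.0.3"},
    {"referer": "-"},
]
matrix = cooccurrence(events)
```

A value near 1 means the two fields almost always appear together, and a value near 0 means they rarely do.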
Use the diff reporting command to compare the differences between two search results. By default it compares the raw text of the search results you select, unless you use the attribute argument to focus on specific field attributes.
For example, this report looks at the 44th and 45th events returned in the search and compares their IP address values:
eventtype=goodaccess | diff pos1=44 pos2=45 attribute=ip
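The comparison that diff makes can be approximated in Python with the standard difflib module. This sketch assumes 1-based result positions, as in the pos1 and pos2 arguments above, and falls back to a hypothetical _raw key when no attribute is given. It is an analogy for the idea, not a reproduction of Splunk's output format.

```python
import difflib

def diff_results(results, pos1, pos2, attribute="_raw"):
    """Compare one attribute of two results, given 1-based positions,
    and return unified-diff lines (empty when the values match)."""
    a = str(results[pos1 - 1].get(attribute, "")).splitlines()
    b = str(results[pos2 - 1].get(attribute, "")).splitlines()
    return list(difflib.unified_diff(a, b, lineterm=""))

# Hypothetical search results.
results = [
    {"_raw": "GET /index.html", "ip": "10.0.0.1"},
    {"_raw": "GET /index.html", "ip": "10.0.0.2"},
]
ip_changes = diff_results(results, 1, 2, attribute="ip")
raw_changes = diff_results(results, 1, 2)
```

Here ip_changes contains a removed line for the first IP address and an added line for the second, while raw_changes is empty because the raw text of the two results is identical.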
This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7