Admin Manual

 


Work with metrics.log

This documentation does not apply to the most recent version of Splunk. Click here for the latest version.

Work with metrics.log

Splunk's internal metrics.log file is a sampling of the top ten items in each category for 30-second intervals, based on the size of _raw. It does not give you an exact accounting of all your inputs, just the top 10 hot data sources. You can examine its contents for limited analysis of volume trends for data inputs. It is different from the numbers reported by LicenseManager, which include the indexed fields. Also, the default configuration only maintains the metrics data in the internal index a few days, but by going to the files you can see trends over a period of months if your rolled files go that far back.

Note: You can change the number of series that metrics.log tracks from the default of 10 by editing the value of maxseries in the [metrics] stanza in limits.conf.

You can find more information about metrics.log in this developer blog posting about forwarder and indexer metrics and in this more recent post on the Splunk Community Wiki.

Use metrics log to troubleshoot issues with data inputs

You might want to identify a data input that has suddenly begun to generate uncharacteristically large numbers of events. If this input is hidden in a large quantity of similar data, it can be difficult to determine which one is actually the problem. You can find it by searching the internal index (add index=_internal to your search) or just look in metrics.log itself in $SPLUNK_HOME/var/log/splunk.

A typical metrics.log has stuff like this:

03-13-2008 10:48:55.620 INFO Metrics - group=pipeline, name=tail, processor=tail, cpu_seconds=0.000000, executes=31, cumulative_hits=73399
03-13-2008 10:48:55.620 INFO Metrics - group=pipeline, name=typing, processor=annotator, cpu_seconds=0.000000, executes=63, cumulative_hits=134912
03-13-2008 10:48:55.620 INFO Metrics - group=pipeline, name=typing, processor=clusterer, cpu_seconds=0.000000, executes=63, cumulative_hits=134912
03-13-2008 10:48:55.620 INFO Metrics - group=pipeline, name=typing, processor=readerin, cpu_seconds=0.000000, executes=63, cumulative_hits=134912
03-13-2008 10:48:55.620 INFO Metrics - group=pipeline, name=typing, processor=sendout, cpu_seconds=0.000000, executes=63, cumulative_hits=134912
03-13-2008 10:48:55.620 INFO Metrics - group=thruput, name=index_thruput, instantaneous_kbps=0.302766, instantaneous_eps=2.129032, average_kbps=0.000000, total_k_processed=19757, load_average=0.124023
03-13-2008 10:48:55.620 INFO Metrics - group=per_host_thruput, series="fthost", kbps=0.019563, eps=0.096774, kb=0.606445
03-13-2008 10:48:55.620 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.283203, eps=2.032258, kb=8.779297
03-13-2008 10:48:55.620 INFO Metrics - group=per_index_thruput, series="_internal", kbps=0.275328, eps=1.903226, kb=8.535156
03-13-2008 10:48:55.620 INFO Metrics - group=per_index_thruput, series="_thefishbucket", kbps=0.019563, eps=0.096774, kb=0.606445
03-13-2008 10:48:55.620 INFO Metrics - group=per_index_thruput, series="default", kbps=0.007876, eps=0.129032, kb=0.244141
03-13-2008 10:48:55.620 INFO Metrics - group=per_source_thruput, series="/applications/splunk3.2/var/log/splunk/metrics.log", kbps=0.272114, eps=1.870968, kb=8.435547
03-13-2008 10:48:55.620 INFO Metrics - group=per_source_thruput, series="/applications/splunk3.2/var/log/splunk/splunkd.log", kbps=0.003213, eps=0.032258, kb=0.099609
03-13-2008 10:48:55.620 INFO Metrics - group=per_source_thruput, series="/var/log/apache2/somedomain_access_log", kbps=0.007876, eps=0.096774, kb=0.244141
03-13-2008 10:48:55.620 INFO Metrics - group=per_source_thruput, series="filetracker", kbps=0.019563, eps=0.096774, kb=0.606445
03-13-2008 10:48:55.620 INFO Metrics - group=per_sourcetype_thruput, series="access_common", kbps=0.007876, eps=0.129032, kb=0.244141
03-13-2008 10:48:55.620 INFO Metrics - group=per_sourcetype_thruput, series="filetrackercrclog", kbps=0.019563, eps=0.096774, kb=0.606445
03-13-2008 10:48:55.620 INFO Metrics - group=per_sourcetype_thruput, series="splunkd", kbps=0.275328, eps=1.903226, kb=8.535156
03-13-2008 10:48:55.620 INFO Metrics - group=queue, name=aeq, max_size=10, filled_count=0, empty_count=0, current_size=0, largest_size=0, smallest_size=0
03-13-2008 10:48:55.620 INFO Metrics - group=queue, name=aq, max_size=10, filled_count=0, empty_count=0, current_size=0, largest_size=0, smallest_size=0
03-13-2008 10:48:55.620 INFO Metrics - group=queue, name=tailingq, current_size=0, largest_size=0, smallest_size=0
03-13-2008 10:48:55.620 INFO Metrics - group=queue, name=udp_queue, max_size=1000, filled_count=0, empty_count=0, current_size=0, largest_size=0, smallest_size=0

There's a lot more there than just volume data, but for now let's focus on investigating data inputs.

group identifies what type of thing is being reported on and series gives the particular item.

For incoming events, the amount of data processed is in the thruput group, as in per_host_thruput. In this example, you're only indexing data from one host, so per_host_thruput actually can tell us something useful: that right now host "grumpy" indexes around 8k in a 30-second period. Since there is only one host, you can add it all up and get a good picture of what you're indexing, but if you had more than 10 hosts you would only get a sample.

03-13-2008 10:49:57.634 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.245401, eps=1.774194, kb=7.607422
03-13-2008 10:50:28.642 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.237053, eps=1.612903, kb=7.348633
03-13-2008 10:50:59.648 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.217584, eps=1.548387, kb=6.745117
03-13-2008 10:51:30.656 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.245621, eps=1.741935, kb=7.614258
03-13-2008 10:52:01.661 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.311051, eps=2.290323, kb=9.642578
03-13-2008 10:52:32.669 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.296938, eps=2.322581, kb=9.205078
03-13-2008 10:53:03.677 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.261593, eps=1.838710, kb=8.109375
03-13-2008 10:53:34.686 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.263136, eps=2.032258, kb=8.157227
03-13-2008 10:54:05.692 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.261530, eps=1.806452, kb=8.107422
03-13-2008 10:54:36.699 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.313855, eps=2.354839, kb=9.729492

For example, you might know that access_common is a popular sourcetype for events on this Web server, so it would give you a good idea of what was happening:

03-13-2008 10:51:30.656 INFO Metrics - group=per_sourcetype_thruput, series="access_common", kbps=0.022587, eps=0.193548, kb=0.700195
03-13-2008 10:52:01.661 INFO Metrics - group=per_sourcetype_thruput, series="access_common", kbps=0.053585, eps=0.451613, kb=1.661133
03-13-2008 10:52:32.670 INFO Metrics - group=per_sourcetype_thruput, series="access_common", kbps=0.031786, eps=0.419355, kb=0.985352
03-13-2008 10:53:34.686 INFO Metrics - group=per_sourcetype_thruput, series="access_common", kbps=0.030998, eps=0.387097, kb=0.960938
03-13-2008 10:54:36.700 INFO Metrics - group=per_sourcetype_thruput, series="access_common", kbps=0.070092, eps=0.612903, kb=2.172852
03-13-2008 10:56:09.722 INFO Metrics - group=per_sourcetype_thruput, series="access_common", kbps=0.023564, eps=0.290323, kb=0.730469
03-13-2008 10:56:40.730 INFO Metrics - group=per_sourcetype_thruput, series="access_common", kbps=0.006048, eps=0.096774, kb=0.187500
03-13-2008 10:57:11.736 INFO Metrics - group=per_sourcetype_thruput, series="access_common", kbps=0.017578, eps=0.161290, kb=0.544922
03-13-2008 10:58:13.748 INFO Metrics - group=per_sourcetype_thruput, series="access_common", kbps=0.025611, eps=0.225806, kb=0.793945

But you have probably got more than 10 sourcetypes, so at any particular time some other one could spike and access_common wouldn't be reported. per_index_thruput and per_source_thruput work similarly.

With this in mind, let's examine the standard saved search "KB indexed per hour last 24 hours".

index::_internal metrics group=per_index_thruput NOT debug NOT sourcetype::splunk_web_access | timechart fixedrange=t span=1h sum(kb) | rename sum(kb) as totalKB

This means: look in the internal index for metrics data of group per_index_thruput, ignore some internal stuff and make a report showing the sum of the kb values. For cleverness, we'll also rename the output to something meaningful, "totalKB". The result looks like this:

sum of kb vs. time for results in the past day
_time totalKB
1 03/12/2008 11:00:00 922.466802
2 03/12/2008 12:00:00 1144.674811
3 03/12/2008 13:00:00 1074.541995
4 03/12/2008 14:00:00 2695.178730
5 03/12/2008 15:00:00 1032.747082
6 03/12/2008 16:00:00 898.662123

Those totalKB values just come from the sum of kb over a one hour interval. If you like, you can change the search and get just the ones from grumpy:

index::_internal metrics grumpy group=per_host_thruput | timechart fixedrange=t span=1h sum(kb) | rename sum(kb) as totalKB
sum of kb vs. time for results in the past day
_time totalKB
1 03/12/2008 11:00:00 746.471681
2 03/12/2008 12:00:00 988.568358
3 03/12/2008 13:00:00 936.092772
4 03/12/2008 14:00:00 2529.226566
5 03/12/2008 15:00:00 914.945313
6 03/12/2008 16:00:00 825.353518
index::_internal metrics access_common group=per_sourcetype_thruput | timechart fixedrange=t span=1h sum(kb) | rename sum(kb) as totalKB
sum of kb vs. time for results in the past day
_time totalKB
1 03/12/2008 11:00:00 65.696285
2 03/12/2008 12:00:00 112.035162
3 03/12/2008 13:00:00 59.775395
4 03/12/2008 14:00:00 35.008788
5 03/12/2008 15:00:00 62.478514
6 03/12/2008 16:00:00 14.173828

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has about working with metrics.log.

This documentation applies to the following versions of Splunk: 4.1 , 4.1.1 , 4.1.2 , 4.1.3 , 4.1.4 , 4.1.5 , 4.1.6 , 4.1.7 , 4.1.8 , 4.2 , 4.2.1 , 4.2.2 View the Article History for its revisions.


You must be logged into splunk.com in order to post comments. Log in now.

Was this documentation topic helpful?

If you'd like to hear back from us, please provide your email address:

We'd love to hear what you think about this topic or the documentation as a whole. Feedback you enter here will be delivered to the documentation team.

Feedback submitted, thanks!