Application Management

 


Monitoring

This documentation does not apply to the most recent version of Splunk.

You can use Splunk to monitor your application deployment for availability, performance, and normalized status. In Splunk, it is a short step from troubleshooting to monitoring: you can take the same information you use to troubleshoot and turn it into alerts, reports, and dashboards. You can track things that you know cause problems, such as operating system degradation, web server errors, missing pages, or increases in transaction time. In addition, because the information in Splunk gives you a deep picture of your log data over time, you can step back from individual problems and use Splunk to map and understand what is normal for your environment. Understanding what is typical helps you build smarter alerts and manage your systems proactively.

Note: This topic is primarily an outline, which will be expanded over time. The outline and contents may change.

What to monitor

You can monitor using the same logs and inputs that you use for troubleshooting. But you may also find that you want to incorporate additional information to get a more holistic view of your application and its environment. For example, monitoring OS and device metrics may allow you to find the causes of performance degradation before it becomes a problem.

Sources to monitor that are specifically related to availability and performance include:

Monitor performance and availability

The following are suggested methods for monitoring performance and availability with Splunk.

Additional resources for monitoring availability may come from your development team, who often have an availability-centric tool or page you can leverage.

Monitor normalized status

Gathering information about the baseline performance of your application can help you create intelligent alerts. Splunk includes an extensive set of statistical operators that you can use to find average values over time, the most common or most rare values of a field, and so on.
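As a rough sketch of the statistical searches described above, the following searches compute a per-host average and standard deviation, then list the most common and the rarest values of a field. The first search reuses the top source type and cpu_seconds field that appear elsewhere in this topic. The access_combined source type and the uri and status fields in the other two are assumptions for illustration only; substitute the source types and fields from your own data.

```
sourcetype=top | stats avg(cpu_seconds) AS avg_cpu stdev(cpu_seconds) AS stdev_cpu by host
sourcetype=access_combined | top limit=10 uri
sourcetype=access_combined | rare limit=10 status
```

Running searches like these over a representative time range gives you reference values that you can compare against when you decide on alert conditions.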

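Here are some ways to look for anomalies in your data:

Group similar error events into clusters and show the five largest clusters:
error | cluster showcount=true | sort - cluster_count | head 5

Keep only the sendmail events that have anomalous values of the delay field:
sourcetype=sendmail_syslog | anomalousvalue delay action=filter pthresh=0.02

Look for unexpected events, ignoring events similar to those listed in the boringevents blacklist:
* | anomalies blacklist=boringevents

Chart the average CPU seconds for each host over time:
sourcetype=top | timechart avg(cpu_seconds) by host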
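One possible way to turn a baseline into an alert condition is to compare each event against the average for its host. The search below is a minimal sketch, not a recommended setting: it assumes the eventstats and where commands are available in your version of Splunk, and the threshold of two standard deviations is an arbitrary value chosen for illustration.

```
sourcetype=top | eventstats avg(cpu_seconds) AS avg_cpu stdev(cpu_seconds) AS stdev_cpu by host | where cpu_seconds > avg_cpu + (2 * stdev_cpu)
```

A search like this returns only the events whose cpu_seconds value is well above the norm for that host, so you could save it and alert when it returns any results. Adjust the multiplier to suit how noisy your data is.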

This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8

