Use the Alerts dashboard
The Alerts dashboard gives you information on the alerts that the Content Pack for Unix Dashboards and Reports has triggered, when those alerts triggered, and which hosts the alerts have triggered on. It also displays alert severity.
Perform the following steps to open the Alerts dashboard:
- In Splunk Web, open ITSI.
- Go to Dashboards > Dashboards.
- Open Alerts - Unix dashboard (with the App name as DA-ITSI-CP-unix-dashboards).
Before you use this dashboard, make sure that the Content Pack for Unix Dashboards and Reports is configured according to the guidelines in Configure the Content Pack for Unix Dashboards and Reports.
Example: How the Alerts dashboard works
The following is an example scenario of how the Alerts dashboard works.
It's Monday morning and, as the data center manager for your company, you receive a report of a system outage the previous night. To investigate what went wrong, you open the Content Pack for Unix Dashboards and Reports and review the alerts that triggered overnight in the Alerts dashboard. First, you click the time range picker in the Alert Time Range panel and select "Last 24 hours" because you know from the report that the outage occurred within that period. The Content Pack for Unix Dashboards and Reports updates the page to show alerts that have triggered in the last 24 hours. You notice that a large number of alerts occurred around 1:30 that morning.
You select the time around 1:30 in the morning from the time range selector and the Content Pack for Unix Dashboards and Reports updates the Statistics and Summary panels to show alerts that occurred during that time. You see that all of your application servers triggered Memory_Exceeds_Percent_by_Host alerts. You click an alert link in the Summary panel and the content pack shows when the alert triggered, which host triggered it, and a snapshot of the CPU usage, memory usage, processes, and commands that were running when the alert triggered. Using this panel, you find out that something caused your application servers to consume all available memory and crash. This coincides with the report that services went offline.
You take screenshots of the failure and email the engineering, software development, and management teams with the details. The software development team acknowledges that the latest code changes might have introduced a bug which, in certain circumstances, causes application servers to exhaust all available memory. They roll back the change and, after a few days of tests, find and fix the memory exhaustion bug. Soon afterward, they roll out updated code to the application servers with no adverse effects. The Content Pack for Unix Dashboards and Reports helped you resolve the outage and prevent future ones.
Alerts dashboard overview
The Alerts dashboard contains the following panels:
- The Alert Time Range panel displays a timeline that shows the number of alerts that have arrived within a given time period. You can also select a custom time period from the time range picker.
- The Statistics panel displays information about which hosts have triggered alerts, which alerts have triggered, and the severity of those alerts. You can drill down into specifics about hosts that triggered alerts and find out how many alerts the host triggered.
- The Summary panel on the lower right shows a listing of the most recent alerts that have triggered.
Choose the alert time range
Use the time range picker to select the time range. The Content Pack for Unix Dashboards and Reports updates the Statistics and Summary panels to include only events that occurred within that time period (from the chosen point in the past up to now).
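The time range selection can be thought of as a simple filter on alert trigger times. The following Python sketch illustrates the idea only; it is not how the content pack is implemented. The record layout, the host names, and the `alerts_in_range` helper are assumptions made for this example.

```python
from datetime import datetime, timedelta

def alerts_in_range(alerts, earliest, latest):
    """Keep alerts whose trigger time falls between earliest and latest, inclusive."""
    return [a for a in alerts if earliest <= a["time"] <= latest]

# Hypothetical alert records; the field names and hosts are assumptions.
now = datetime(2024, 1, 8, 9, 0)
alerts = [
    {"time": datetime(2024, 1, 8, 1, 30), "name": "Memory_Exceeds_Percent_by_Host", "host": "app01"},
    {"time": datetime(2024, 1, 5, 14, 0), "name": "Memory_Exceeds_Percent_by_Host", "host": "app02"},
]

# "Last 24 hours" means from 24 hours before now up to now.
recent = alerts_in_range(alerts, now - timedelta(hours=24), now)
```

In this sketch, only the alert from 1:30 that morning remains in `recent`, which mirrors how the Statistics and Summary panels narrow to the selected period.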
The Statistics panel displays three donut charts which show the following information:
- The number of Hosts that have fired alerts in the time range selected in the Alert Time Range panel.
- The Names of the alerts that have fired in this time period.
- The Severity of the alerts that have triggered in this period.
Each donut chart is divided into colored slices, one for each host, alert name, or severity level present in the selected time range.
Get information about a single host by clicking one of the color slices in the Hosts donut chart. The donut chart updates to show you how many alerts that host triggered during the selected time range, and the Summary panel updates to show information on alerts that include the selected host.
Similarly, the Name donut chart allows you to filter which alerts have fired. When you click on a donut chart slice for a specific alert, the chart updates to show you how many times that alert has fired in the selected time range. The Summary panel also updates to show you only those alerts.
The Severity donut chart allows you to filter alerts based on severity. When you click a slice in that chart, the chart updates to show the number of times that alerts of the selected severity level have triggered in the selected time frame. The Summary panel also updates with only alerts of the selected severity level.
You can reset the filter for each donut chart by clicking the reset link inside each chart.
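The donut chart behavior described above amounts to counting alerts by one field and filtering the alert list by the selected slice. The following Python sketch shows that logic under stated assumptions: the record layout, the severity values, the host names, the alert name Example_Other_Alert, and both helper functions are hypothetical and are not part of the content pack.

```python
from collections import Counter

# Hypothetical alert records. Only the alert name
# Memory_Exceeds_Percent_by_Host appears in the documentation.
ALERTS = [
    {"host": "app01", "name": "Memory_Exceeds_Percent_by_Host", "severity": "high"},
    {"host": "app02", "name": "Memory_Exceeds_Percent_by_Host", "severity": "high"},
    {"host": "app01", "name": "Example_Other_Alert", "severity": "low"},
]

def donut_counts(alerts, field):
    """Count alerts per distinct value of one field (one slice per value)."""
    return Counter(alert[field] for alert in alerts)

def apply_filters(alerts, **selected):
    """Keep alerts that match every selected slice.

    Calling this with no keyword arguments corresponds to resetting the filters.
    """
    return [a for a in alerts if all(a[f] == v for f, v in selected.items())]

by_host = donut_counts(ALERTS, "host")             # slices for the Hosts chart
app01_only = apply_filters(ALERTS, host="app01")   # clicking the app01 slice
```

Passing `app01_only` to `donut_counts` again would give the counts for the other charts after the host filter is applied, which is one way to picture how the Summary panel narrows with each selection.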
Use the Summary panel to see triggered alerts
The Summary panel shows information about the alerts that triggered in the time range you selected with the Alert Time Range picker or custom range selector, narrowed by any filters you applied with the Statistics donut charts.
For the selected time range and filter level, the following information is displayed:
- The time the alert fired.
- The name of the alert that fired.
- The alert's severity.
- The host or hosts that triggered the alert.
- A link that allows you to open the underlying search which fired the alert.
- A description of the alert.
The Summary panel displays 10 alerts per page by default. To see earlier alerts, use the pagination links in the upper-right corner of the Summary panel.
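The paging behavior can be pictured as slicing a list of alerts into pages of 10. The following Python sketch is an illustration only. It assumes that alerts are ordered newest first, that `time` is a numeric timestamp, and that a hypothetical `summary_page` helper does the slicing; none of these details come from the content pack itself.

```python
def summary_page(alerts, page, per_page=10):
    """Return one page of alerts, newest first. Pages are numbered from 1."""
    ordered = sorted(alerts, key=lambda a: a["time"], reverse=True)
    start = (page - 1) * per_page
    return ordered[start:start + per_page]

# 25 hypothetical alerts with increasing timestamps 0 through 24.
alerts = [{"time": t, "name": "Memory_Exceeds_Percent_by_Host"} for t in range(25)]

first_page = summary_page(alerts, page=1)   # the 10 most recent alerts
last_page = summary_page(alerts, page=3)    # the 5 earliest alerts
```

With 25 alerts, the first two pages hold 10 alerts each and the third page holds the remaining 5, so later pages always show earlier alerts.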
Get details on an alert
The Name column in the Summary panel lists the names of the alerts that have fired in the selected time range. When you click the name of a specific alert, the Content Pack for Unix Dashboards and Reports opens a page that contains detailed information about that alert.
The detailed page about the specific alert lists the following information:
- The time that the alert fired.
- A description of the alert.
- The alert's severity.
- A list of hosts that triggered the alert at that time. You can select the other hosts that triggered the same alert at the same time.
- Graphs that show historical information about CPU usage, memory usage, number of processes, and number of threads around the time that the alert fired. You can click each graph to get search results that power the graph.
- A System Status subpanel that shows statistics on commands that were running at the time the alert fired. You can select the available commands and sort them by various statistics.
To close the information page on the alert, click anywhere on the screen outside of the alert page.
This documentation applies to the following versions of Splunk® Content Packs for ITSI and IT Essentials Work: current