Use the Alerts dashboard
The Alerts dashboard gives you information on the alerts that the Content Pack for Unix Dashboards and Reports has triggered. The dashboard shows when each alert triggered, which hosts it triggered on, and its severity.
Perform the following steps to open the Alerts dashboard:
- In Splunk Web, open ITSI or IT Essentials Work.
- From the navigation bar, choose Dashboards > Dashboards.
- Open the Alerts - Unix dashboard. The App name is DA-ITSI-CP-unix-dashboards.
Prerequisite
Before you can use the Alerts dashboard, you must configure alerts for the Content Pack for Unix Dashboards and Reports. For detailed steps, see Configure the Content Pack for Unix Dashboards and Reports.
How the Alerts dashboard works
The following is an example scenario of how the Alerts dashboard works:
It's Monday morning and, as the data center manager for your company, you receive a report of a system outage the previous night. To investigate what went wrong, you open the Content Pack for Unix Dashboards and Reports and review the alerts that triggered overnight in the Alerts dashboard.
First, you click the time range picker in the Alert Time Range panel and select "Last 24 hours" because you know from the report that the outage occurred within that period. The Content Pack for Unix Dashboards and Reports updates the page to show alerts that have triggered in the last 24 hours. You notice that a large number of alerts occurred around 1:30 that morning.
You select the time around 1:30 in the morning from the time range selector, and the Content Pack for Unix Dashboards and Reports updates the Statistics and Summary panels to show alerts that occurred during that time. You see that all of your application servers triggered Memory_Exceeds_Percent_by_Host alerts. You click an alert link in the Summary panel, and the content pack shows when the alert triggered, the host that triggered it, and a snapshot of CPU usage, memory usage, processes, and commands that were running when the alert triggered. Using this panel, you find out that something caused your application servers to consume all available memory and crash. This coincides with the report that services went offline.
You take screenshots of the failure and email the engineering, software development, and management teams with the details. The software development team acknowledges that the latest code changes might have introduced a bug which, in certain circumstances, causes application servers to exhaust all available memory. They roll back the change and, after a few days of tests, find and fix the memory exhaustion bug. Soon afterward, they roll out updated code to the application servers with no adverse effects.
Alerts dashboard overview
The Alerts dashboard includes the following panels:
Panel name | Description |
---|---|
Alert Time Range | Displays a timeline that shows the number of alerts that arrived within a given time period. You can select a custom time period from the time range picker. |
Statistics | Displays information about which hosts have triggered alerts, which alerts have triggered, and the severity of those alerts. You can drill down into specifics about hosts that triggered alerts and find out how many alerts the host triggered. |
Summary | Shows a listing of the most recent alerts that triggered. |
Choose the alert time range
Use the time range picker to select the time range. The Content Pack for Unix Dashboards and Reports updates the Statistics and Summary panels to include only events that occurred within the time period, from the chosen point in the past up to now.
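The "from the chosen point in the past up to now" behavior can be pictured with a short sketch. This is only an illustration in Python, not how the content pack is implemented: the `alerts_in_range` function, the dictionary fields, and the host names are all hypothetical.

```python
from datetime import datetime, timedelta

def alerts_in_range(alerts, lookback, now):
    """Keep alerts triggered between (now - lookback) and now (hypothetical helper)."""
    earliest = now - lookback
    return [a for a in alerts if earliest <= a["time"] <= now]

# Hypothetical sample data: one alert overnight, one from two days earlier.
now = datetime(2024, 1, 8, 9, 0)
alerts = [
    {"time": datetime(2024, 1, 8, 1, 30), "host": "app01"},
    {"time": datetime(2024, 1, 6, 14, 0), "host": "app02"},
]

# "Last 24 hours" keeps only the overnight alert.
recent = alerts_in_range(alerts, timedelta(hours=24), now)
print([a["host"] for a in recent])
```

With a 24-hour lookback, only the alert from 1:30 that morning remains, which mirrors how the Statistics and Summary panels narrow to the selected period.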
Use the Statistics panel chart views
The Statistics panel displays three donut charts which show the following information:
- The number of Hosts that fired alerts in the time range selected in the Alert Time Range panel.
- The Names of the alerts that fired in this time period.
- The Severity of the alerts that triggered in this period.
Each donut chart is divided into different color slices depending on how many hosts, alerts, or severity levels are present in the selected time range.
Get information about a single host by clicking one of the color slices in the Hosts donut chart. The donut chart updates to show you how many alerts that host triggered during the selected time range. The Summary panel updates to show information on alerts that include the selected host.
Similarly, the Name donut chart allows you to filter by alerts that fired. When you click on a donut chart slice for a specific alert, the chart updates to show you how many times that alert fired in the selected time range. The Summary panel updates to only show those specific alerts.
The Severity donut chart allows you to filter alerts based on severity. When you click a slice in that chart, the chart updates to show the number of times that alerts of the selected severity level triggered in the selected time frame. The Summary panel updates to only show alerts of the selected severity level.
You can reset the filter for each donut chart by clicking the reset link inside each chart.
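As a rough mental model of the donut charts and their filters, the following Python sketch counts alerts per host, name, or severity and then narrows the list to a clicked slice. It is an assumption-laden illustration, not the content pack's actual logic: `donut_counts`, `apply_filters`, the field names, the severity labels, and every alert name other than Memory_Exceeds_Percent_by_Host are hypothetical.

```python
from collections import Counter

def donut_counts(alerts, field):
    """Count alerts per distinct value of a field (one count per donut slice)."""
    return Counter(a[field] for a in alerts)

def apply_filters(alerts, **selected):
    """Keep only alerts matching every selected slice; no selection keeps all."""
    return [a for a in alerts if all(a[k] == v for k, v in selected.items())]

# Hypothetical alerts within the selected time range.
alerts = [
    {"host": "app01", "name": "Memory_Exceeds_Percent_by_Host", "severity": "high"},
    {"host": "app01", "name": "Example_Other_Alert", "severity": "medium"},
    {"host": "app02", "name": "Memory_Exceeds_Percent_by_Host", "severity": "high"},
]

print(donut_counts(alerts, "host"))          # slices of the Hosts chart
summary = apply_filters(alerts, host="app01")  # clicking the app01 slice
print(len(summary))
print(len(apply_filters(alerts)))            # reset: no filter, all alerts shown
```

Calling `apply_filters` with no selection corresponds to clicking the reset link: every alert in the time range is shown again.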
Use the Summary panel to see triggered alerts
The Summary panel shows you information about the alerts that triggered in a time range you selected using the Alert Time Range picker or custom range selector. The panel also shows what you filtered by using the Statistics donut charts.
For the selected time range and filter level, the following information is displayed:
- The time the alert fired.
- The name of the alert that fired.
- The severity of the alert.
- The host or hosts that triggered the alert.
- A link that allows you to open the underlying search that fired the alert.
- A description of the alert.
The Summary panel displays 10 alerts per page by default. You can see earlier alerts by using the pagination links in the upper-right corner of the Summary panel.
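The paging behavior can be sketched as follows. This Python example assumes the panel lists the most recent alerts first and uses a hypothetical `summary_page` helper with integer timestamps; it illustrates the idea only and does not describe the dashboard's internals.

```python
def summary_page(alerts, page, per_page=10):
    """Return one page of alerts, most recent first. Pages are numbered from 1."""
    ordered = sorted(alerts, key=lambda a: a["time"], reverse=True)
    start = (page - 1) * per_page
    return ordered[start:start + per_page]

# Hypothetical data: 25 alerts with increasing integer timestamps 0..24.
alerts = [{"time": t, "name": "Memory_Exceeds_Percent_by_Host"} for t in range(25)]

first = summary_page(alerts, 1)   # the 10 most recent alerts
last = summary_page(alerts, 3)    # the 5 earliest alerts
print(len(first), len(last))
```

With 25 alerts and the default page size, the first two pages hold 10 alerts each and the third page holds the remaining 5 earliest alerts.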
Get details on an alert
The Name column in the Summary panel lists the names of the alerts that fired in the selected time range. When you click the name of a specific alert, the Content Pack for Unix Dashboards and Reports opens a page that contains detailed information about that alert.
The detailed page about the specific alert lists the following information:
- The time that the alert fired.
- A description of the alert.
- The severity of the alert.
- A list of hosts that triggered the alert at that time. You can select the other hosts that triggered the same alert at the same time.
- Graphs that show historical information about CPU usage, memory usage, number of processes, and number of threads around the time that the alert fired. You can click each graph to see the search results that power the graph.
The System Status subpanel shows statistics on commands that were running at the time the alert fired. You can select the available commands and sort them by various statistics.
To close the information page on the alert, click anywhere on the screen outside of the alert page.
This documentation applies to the following versions of Content Pack for Unix Dashboards and Reports: 1.1.3, 1.1.4, 1.1.5