About Universal Alerting in the Content Pack for ITSI Monitoring and Alerting
Universal Alerting simplifies and speeds up the process of onboarding external alert sources (such as Nagios or Solarwinds) into ITSI, and quickly makes those alert sources more useful.
Benefits of Universal Alerting include:
- Eliminating the need to create a correlation search for each external alert source. The pre-built Universal Correlation Search finds and onboards all external alerts which adhere to the Universal Alerting Field Normalization Standard.
- Eliminating the need to create Notable Event Aggregation Policies (NEAPs) for each alert source. The pre-built NEAPs in the Content Pack for ITSI Monitoring and Alerting group Universal (normalized) Notable Events into Episodes, correlating across all alert sources.
- Providing consistent, reusable functionality within ITSI for Notable Events and Episodes, requiring little or no ITSI tweaking by the customer/implementer.
- Providing automatic and seamless correlation across alert sources.
- Providing better deduplication (across the Last Hour, rather than just the Last Minute).
- Providing raw alert backfill across the last hour, to catch missed alerts.
- Providing a consistent Alarm State structure across all Notable Events.
Prepare to use Universal Alerting
Each alert source must have a small set of fields normalized (as few as four) in the Splunk platform, using the Settings > Fields menu. Examples of alert sources are Nagios, Solarwinds, and Splunk Infrastructure Monitoring.
After you add the new normalized fields as Splunk search-time Knowledge Objects for an alert source, all alerts for this source are onboarded into ITSI and grouped into Episodes.
- Raw alerts which have been normalized are processed and onboarded as Notable Events by the Universal Correlation Search, which provides deduplication, back-fill (up to an hour), consistent alarm state structure, and ITSI Service Context lookup.
- Normalized Notable Events are grouped into multiple types of Episodes using the Aggregation Policies in the Content Pack for ITSI Monitoring and Alerting. Examples of universal aggregation policies include Episodes by Alarm, Episodes by Src, and Episodes by ITSI Service.
For more details, see About the Aggregation Policies in the Content Pack for ITSI Monitoring and Alerting.
All inbound alerts, regardless of source, are handled efficiently and consistently inside ITSI. You don't need to add new correlation searches and Notable Event Aggregation Policies.
You can add new aggregation policies that take advantage of the Universal Alerting Normalized fields, which act on all normalized alerts.
Normalize an external alert source to use Universal Alerting
To normalize an external alert source, perform the following steps.
- Install the packages.
- Disable and enable the relevant ITSI components.
- For each alert source, fill out a cheat sheet and create the normalized fields.
Step 1: Install the packages
Step 2: Disable and enable the relevant ITSI components
- Disable all existing ITSI correlation searches, especially if custom-built searches exist for the alert sources that you will normalize. If the searches remain enabled, the resulting Notable Events will be duplicated or otherwise confusing.
- Disable all Notable Event Aggregation Policies (NEAPs), especially custom-built policies for the alert sources that you are normalizing. If they remain enabled, the resulting episodes will be duplicated or otherwise confusing.
- Later, you can re-enable the disabled correlation searches and NEAPs one at a time, provided they don't overlap with the new Universal components.
- Enable the ITSI correlation searches:
  - Universal Correlation Search
  - Episode Monitoring - Set Episode to Highest Alarm Severity
- Enable the Notable Event Aggregation Policies:
  - Episodes by Alarm
  - Episodes by Alert Group
  - Episodes by ITSI service
  - Episodes by src
Step 3: Complete a cheat sheet for one alert source
Identify the normalized fields from the raw alert source. In some cases, you can alias an existing field. In other cases, you might use eval expressions on raw fields to get the value you need for a normalized field, or create an extraction (a regular expression suitable for 'rex') to get the fields you need.
Keep the following tips in mind as you create the cheat sheet:
- Use the simpler 'eval' approach whenever possible. For example, if the raw alert has a field called src_host, which is suitable to use for the normalized field src, use the 'eval' approach to create an alias, like this: for 'src', use eval expression: src_host
- For 'eval' fields, use any 'eval' expression for SPL, as described in the search reference.
- For 'extract' fields, use 'rex' syntax for SPL, as described in the search reference. The 'rex' expression must be able to work on field=_raw.
- At search time, extractions are performed first, then 'eval' functions. This means that an 'extract' field can be used within the eval function of an 'eval' field, but an 'eval' field cannot be referenced within the eval function of another 'eval' field, and an 'eval' field is not available for 'extract' field use. Only text within the original (_raw) alert can be used in an 'extract' expression.
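The ordering rule in the last tip can be illustrated with a small Python simulation. This is only a sketch of the described behavior, not how the Splunk platform is implemented; the raw alert text, the regular expressions, and the function name apply_search_time_fields are hypothetical.

```python
import re

def apply_search_time_fields(raw, extractions, evals):
    """Apply 'extract' fields first, then 'eval' fields, as the tip describes."""
    event = {"_raw": raw}
    # Phase 1: extractions only ever see the original _raw text.
    for pattern in extractions:
        match = re.search(pattern, raw)
        if match:
            event.update(match.groupdict())
    # Phase 2: every eval sees the same snapshot (_raw plus extracted fields),
    # so one eval field cannot read the result of another eval field.
    snapshot = dict(event)
    for name, expression in evals.items():
        event[name] = expression(snapshot)
    return event

# Hypothetical raw alert and cheat-sheet entries, for illustration only.
raw_alert = "2024-01-01T01:02:00 host=server42 check=CPU state=CRITICAL"
extractions = [r"host=(?P<src_host>\S+)", r"state=(?P<state>\S+)"]
evals = {
    "src": lambda e: e.get("src_host"),           # alias of an extracted field
    "vendor_severity": lambda e: e.get("state"),  # alias of an extracted field
    "broken": lambda e: e.get("src"),             # tries to read another eval field
}
event = apply_search_time_fields(raw_alert, extractions, evals)
print(event["src"], event["vendor_severity"], event["broken"])
```

In this sketch the 'broken' field ends up empty, because 'src' is itself an 'eval' field and is not visible to other 'eval' expressions.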
To create a cheat sheet, perform the following steps.
- Make a copy of the blank cheat sheet to work with.
- In the Splunk platform, open a search bar.
- Create an SPL search to show the alerts for this alert source, everything up to the first pipe ("|"). For example,
- Run the search over a large time period (such as Last 24 hours), to get a good sample of fields and values.
- Starting with index and source/sourcetype, fill out the cheat sheet. Initially, focus on these fields:
- itsiInclude - set this to "false" while creating the normalized fields for this alert source. After the normalized fields are ready, change the value of itsiInclude to "true", to enable the Universal Correlation Search to onboard these alerts.
- signature - the type of alert
- src - the host/instance/thing which this alert is for
- vendor_severity - the vendor-specific health/status/severity string
- severity_id - normalized to ITSI-specific severity: 1-6
- app - short name of the monitoring system generating these alerts
- description - more verbose info about the alert
- subcomponent - Optional sub-component for this alert. Further defines 'src'. Not relevant for most alerts.
You can add other fields later, such as URLs for drilldown.
If more than one alert source needs to be onboarded, repeat the process to create a cheat sheet for each alert source.
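A cheat sheet can also be kept as structured data so that it is easy to check before creating the fields. The following Python sketch is only an illustration: the dictionary layout, the raw field names (check_name, src_host, state), and the assumption that the four mandatory fields are signature, src, vendor_severity, and severity_id are not taken from the blank cheat sheet itself.

```python
# Assumption: these are the four mandatory normalized fields.
REQUIRED_FIELDS = ("signature", "src", "vendor_severity", "severity_id")
VALID_TYPES = ("eval", "extract")

# Hypothetical cheat sheet for one alert source; raw field names are made up.
cheat_sheet = {
    "itsiInclude": {"type": "eval", "expression": '"false"'},
    "signature": {"type": "eval", "expression": "check_name"},
    "src": {"type": "eval", "expression": "src_host"},
    "vendor_severity": {"type": "eval", "expression": "state"},
}

def missing_required(sheet):
    """Return the mandatory fields that the cheat sheet does not define yet."""
    return [name for name in REQUIRED_FIELDS if name not in sheet]

def invalid_types(sheet):
    """Return the entries whose type is neither 'eval' nor 'extract'."""
    return [name for name, entry in sheet.items() if entry["type"] not in VALID_TYPES]

missing = missing_required(cheat_sheet)
bad = invalid_types(cheat_sheet)
print("missing:", missing, "invalid types:", bad)
```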
Step 4: Create normalized fields for one alert source
You can use the Fields user interface in the Splunk platform to add normalized fields by performing the following steps:
- To access the Field Knowledge Object window in the Splunk platform user interface, click Settings > Fields.
- From your cheat sheet, each eval entry will be a new Calculated Field. Each extract entry will be a new Field Extraction, as shown in the following example where the eval field itsiInclude is added.
Note: Add itsiInclude first, and set the value to "false"; this will be changed at the end of the process, after all the intended fields have been added and tested.
- After you select Save, modify Sharing permissions to All Apps (recommended) or This App.
- Repeat the process for each normalized field on your cheat sheet, creating at least the four mandatory fields.
- Ensure that Interesting Fields shows the new fields you added as shown in the following example:
- After you have confirmed that all the desired fields have been successfully added, change the value of itsiInclude from false to true. If the Universal Correlation Search in ITSI is enabled, alerts from this new source will now be ingested as notable events.
Deduplication, alerts, and alarms
An alert is a time-series message indicating that a threshold has been crossed, a state has changed, or a failure has occurred. Alerts are often created by monitoring systems. An alert always includes:
- The object of the alert, often a host or instance
- The type of alert, such as "CPU too high", "Not responding", or similar.
- The severity and/or state of the alert, such as up, down, ok, normal, critical, etc.
Individual alerts can be grouped as alarms. A string of alerts for a single object and single type are considered to be a single alarm. An alarm has a lifespan which might include multiple state changes; an alarm is considered finished when its state changes to "ok". An alarm with a current non-green state is considered "active"; an alarm with a current state of green/ok is considered "cleared". An active alarm continues to be considered active until a clearing alert is received or, failing that, until a long period of inactivity has passed.
After an alarm clears, if the same alarm reappears, it is considered a new alarm, separate from the earlier one.
For Universal Alerts, the following normalized fields are used with deduplication and alarm grouping:
Alerts with the same values of signature, src, and subcomponent are considered to be a single alarm. The values of these fields are combined into the ITSI field, index=itsi_tracked_alerts. Such an alarm changes state when its value of vendor_severity changes. The possible values of vendor_severity are translated to standard ITSI severity values (1-6) in severity_id.
Using the examples from above (which have no subcomponent elements), the alerts look like this:
- 01:02 - signature="CPU usage", src="server42", subcomponent="-", vendor_severity="over 90%", severity_id=6
- 01:02 - signature="Availability", src="server11", subcomponent="-", vendor_severity="down", severity_id=6
- 01:07 - signature="CPU usage", src="server42", subcomponent="-", vendor_severity="over 70%", severity_id=3
- 01:07 - signature="CPU usage", src="server77", subcomponent="-", vendor_severity="over 90%", severity_id=6
- 01:12 - signature="CPU usage", src="server42", subcomponent="-", vendor_severity="ok", severity_id=2
- 01:12 - signature="Availability", src="server11", subcomponent="-", vendor_severity="up", severity_id=2
These alerts would be grouped as alarms in ITSI like this:
"CPU usage - server42 - -"
"Availability - server11 - -"
"CPU usage - server77 - -"
The Universal Correlation Search looks back over the Last Hour for raw Normalized Alerts (that is, alerts which have the fields signature, src, subcomponent, and so on), groups them into alarms, and determines the most recent state (vendor_severity) for each alarm. It then checks whether these alarms already exist in ITSI (index=itsi_tracked_alerts). The Universal Correlation Search runs every minute, though it looks back over the Last Hour each time.
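The grouping of the example alerts into alarms, and the "most recent state" step, can be sketched in Python as follows. This is an illustration of the idea only, not the SPL of the Universal Correlation Search; the " - " separator mirrors the alarm names listed above, and the function name alarm_key is hypothetical.

```python
# The six example alerts: (time, signature, src, subcomponent, vendor_severity, severity_id)
alerts = [
    ("01:02", "CPU usage", "server42", "-", "over 90%", 6),
    ("01:02", "Availability", "server11", "-", "down", 6),
    ("01:07", "CPU usage", "server42", "-", "over 70%", 3),
    ("01:07", "CPU usage", "server77", "-", "over 90%", 6),
    ("01:12", "CPU usage", "server42", "-", "ok", 2),
    ("01:12", "Availability", "server11", "-", "up", 2),
]

def alarm_key(signature, src, subcomponent="-"):
    """Alerts sharing signature, src, and subcomponent belong to one alarm."""
    return f"{signature} - {src} - {subcomponent}"

# Group alerts by alarm; alerts are already in time order in this example.
alarms = {}
for time, signature, src, subcomponent, vendor_severity, severity_id in alerts:
    key = alarm_key(signature, src, subcomponent)
    alarms.setdefault(key, []).append((time, vendor_severity, severity_id))

# The most recent state of each alarm is the state of its last alert.
latest_state = {key: history[-1][1] for key, history in alarms.items()}
for key, state in latest_state.items():
    print(key, "->", state)
```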
For each alarm found in the raw alerts:
- If the alarm doesn't exist in ITSI (for the Last Hour): add the alert as a Notable Event to ITSI
- If the alarm exists in ITSI, but has a different "current state" (vendor_severity) than the raw alarm: add the alert as a Notable Event to ITSI
- If the alarm exists in ITSI, but has the same "current state" (vendor_severity) as the raw alarm: disregard the alert (do NOT add as a Notable Event to ITSI)
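The three rules above amount to a small decision function. The Python sketch below assumes the tracked alarms from the Last Hour are available as a mapping from alarm name to current vendor_severity; that data structure and the function name should_create_notable are hypothetical, chosen only to illustrate the logic.

```python
def should_create_notable(alarm, raw_state, tracked_states):
    """Decide whether a raw alarm state becomes a new Notable Event.

    tracked_states maps alarm name -> current vendor_severity already in ITSI
    for the Last Hour (a stand-in for index=itsi_tracked_alerts).
    """
    if alarm not in tracked_states:
        return True   # alarm not yet in ITSI: add it
    if tracked_states[alarm] != raw_state:
        return True   # state changed: add it
    return False      # same alarm, same state: disregard as a duplicate

tracked_states = {"CPU usage - server42 - -": "over 90%"}
decisions = {
    "new alarm": should_create_notable("Availability - server11 - -", "down", tracked_states),
    "state changed": should_create_notable("CPU usage - server42 - -", "over 70%", tracked_states),
    "duplicate": should_create_notable("CPU usage - server42 - -", "over 90%", tracked_states),
}
print(decisions)
```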
Practical outcomes of this approach include:
- Significant noise reduction from raw alerts to notable events
- A consistent alarm grouping scheme that can work with any alert source
- Being able to correlate and group alerts and alarms easily, across all alert sources
Universal Alerting normalized fields
Throughout this section, "Default value" refers to the value which will appear when the raw alert is onboarded into ITSI as a Notable Event, using the Universal Correlation Search (included in the Content Pack for ITSI Monitoring and Alerting).
For the Universal Correlation Search to properly process a normalized alert, the four Required Fields must be included, at a minimum. Recommended Fields are quite helpful, but the Universal Correlation Search does not require them. Optional Fields are available for more advanced integrations, such as providing drilldowns.
src - The specific object, element, or instance which is the target of this alert. It is usually a host, node, or device, but can also be a service or method (via URL, for example). This is usually a field alias (via 'eval') or extraction (via 'extract'). There is no default.
signature - A string which uniquely identifies this type of alert. The value of 'signature' should remain the same, even as the severity values change. Examples of signature values: "Host Status", "Web API Check", "Auth Status". This will usually be a field alias (via 'eval') or static definition (via 'eval'). There is no default.
vendor_severity - The original vendor-specific severity/health/status string. Examples of vendor_severity include up, down, ok, normal, critical, warning, etc. This will usually be a field alias (via 'eval'). There is no default.
severity_id - vendor_severity normalized to the ITSI severity values (1-6):
- 1 = Info or Unknown
- 2 = Normal or Cleared
- 3 = Low
- 4 = Medium
- 5 = High
- 6 = Critical
There is no default. This will be a calculated field (via 'eval'), based on the original raw field used to alias vendor_severity rather than on vendor_severity itself, because you can't use an 'eval' field to populate another 'eval' field.
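The translation from a vendor-specific state to the ITSI severity scale is essentially a lookup. The Python sketch below shows one possible mapping; the specific assignments (for example "warning" to 4) and the fallback to 1 for unrecognized values are assumptions for illustration, and each alert source needs its own mapping.

```python
# Hypothetical mapping from raw vendor state strings to ITSI severity values.
SEVERITY_BY_VENDOR_STATE = {
    "ok": 2,        # Normal or Cleared
    "up": 2,
    "normal": 2,
    "warning": 4,   # Medium (assumed)
    "critical": 6,  # Critical
    "down": 6,
}

def to_severity_id(raw_state):
    """Map the raw vendor state to 1-6; unknown values fall back to 1 (Info or Unknown)."""
    if raw_state is None:
        return 1
    return SEVERITY_BY_VENDOR_STATE.get(raw_state.strip().lower(), 1)

print(to_severity_id("DOWN"), to_severity_id("ok"), to_severity_id("flapping"))
```

Note that the function takes the raw state as input, mirroring the requirement that severity_id be computed from the original field.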
subcomponent - The sub-component object for this alert. Further defines 'src'. For example, for a "Filesystem Full" alert on "server42" for "/var":
- signature = "Filesystem Full"
- src = "server42"
- subcomponent = "/var"
Most alerts will not have a sub-component object. If the alert does contain a sub-component object, you must include this field. It will be a field alias (via 'eval') or extraction (via 'extract'). The default is "-".
itsiInclude - Boolean indicating whether this alert should automatically be brought into ITSI as a Notable Event by the Universal Correlation Search. If absent, ITSI will assume itsiInclude="true". If itsiInclude="false" or "f", ITSI will not onboard the alert. This is useful for testing, or to select which raw alerts to onboard as Notable Events. This will be a static definition (via 'eval'). During testing:
- Initially set this field to 'false' as other fields are created
- After all the fields have been added, change the value to 'true'
- If you want to only onboard certain raw alerts, use itsiInclude with a thoughtful eval statement (such as 'case' or 'coalesce') to set 'itsiInclude="true"' for selected alerts
There is no default, but "true" is assumed, unless specified otherwise.
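The itsiInclude behavior described above can be sketched in Python. The exact-match comparison against "false" and "f" follows the text; whether other spellings or capitalizations are also treated as false is not stated, so this sketch does not assume it. The "prod-" naming rule in choose_include is a hypothetical example of selective onboarding.

```python
def is_included(event):
    """An alert is onboarded unless itsiInclude is explicitly "false" or "f"."""
    value = event.get("itsiInclude")
    if value is None:
        return True  # absent: "true" is assumed
    return value not in ("false", "f")

def choose_include(event):
    """Hypothetical selective rule: onboard only alerts whose src starts with "prod-"."""
    return "true" if event.get("src", "").startswith("prod-") else "false"

events = [
    {"src": "prod-web01"},
    {"src": "dev-web01"},
]
for event in events:
    event["itsiInclude"] = choose_include(event)

onboarded = [event["src"] for event in events if is_included(event)]
print(onboarded)
```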
description - Text string with more verbose information about the alert. This will be a field alias (via 'eval'), extraction (via 'extract'), or concatenation of several fields (via 'eval'). The default is "<src> is <vendor_severity>".
app - Short name of the monitoring system generating these alerts. This will be a static string (via 'eval'). For example, "Nagios", "OMD", "SignalFX", "SIM", etc. There is no default.
itsiNotableTitle - Specifies what the Notable Event Title will say. The default is "<signature> - <src> (<subcomponent>)".
SPL to drill down into the details of this alert. The default is: index=* signature="<signature>" src="<src>"
itsiDrilldownURI - External link for this alert. For example: "https://bakookanet.com/alerts&alertid=1234567". There is no default.
Display name for the link included in itsiDrilldownURI. The default is "External Drilldown for <itsiNotableTitle>".
Text or Markdown instructions for a human on how to handle this type of alert. This can handle a link if it is encoded as Markdown.
- Plain text example: "Try kicking it with your other foot"
- Markdown example: "[BazookaNet Manager](https://bazookanetmanager.com/alerts&alertid=123456)"
- Markdown for an image example: "![BazookaNetMgr](https://bazookanetmanager.com/images&alertid=123456)"
This will often come from a lookup, probably based on signature or on a URL included in the raw alert. Alternatively, it can sometimes be constructed via 'eval' from information in existing fields. There is no default.
Used for the 'Entity Lookup Field', in the Universal Correlation Search. The default is <src>.
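The documented default values can be illustrated with a short Python sketch that fills in missing fields from the required ones. It covers only the subcomponent, description, and Notable Event title defaults quoted in this section; the function name apply_defaults and the use of a plain dictionary for an alert are assumptions, and the real defaults are applied by the Universal Correlation Search.

```python
def apply_defaults(alert):
    """Fill in documented defaults for fields the alert source did not normalize."""
    event = dict(alert)
    event.setdefault("subcomponent", "-")
    event.setdefault("description", f"{event['src']} is {event['vendor_severity']}")
    event.setdefault(
        "itsiNotableTitle",
        f"{event['signature']} - {event['src']} ({event['subcomponent']})",
    )
    return event

with_subcomponent = apply_defaults({
    "signature": "Filesystem Full",
    "src": "server42",
    "subcomponent": "/var",
    "vendor_severity": "critical",
    "severity_id": 6,
})
without_subcomponent = apply_defaults({
    "signature": "CPU usage",
    "src": "server42",
    "vendor_severity": "over 90%",
    "severity_id": 6,
})
print(with_subcomponent["itsiNotableTitle"])
print(without_subcomponent["description"])
```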
This documentation applies to the following versions of Content Pack for ITSI Monitoring and Alerting: 2.2.0