About the aggregation policies in the Content Pack for ITSI Monitoring and Alerting
Aggregation policies provide configuration for grouping related notable events together in useful ways. The Content Pack for ITSI Monitoring and Alerting ships with several aggregation policies that are enabled by default.
There are configurations common to all aggregation policies and configurations that are unique to each aggregation policy in this content pack. All aggregation policies operate with minimal adjustments but can be customized as needed. Prior to making any changes to the default settings, ensure you have reviewed and understood the following aggregation policy configuration descriptions.
Configurations common to all policies
The following attributes are common to all aggregation policies in the Content Pack for ITSI Monitoring and Alerting.
Policy naming convention
All notable event aggregation policies included in this content pack have a policy name that starts with Episodes by followed by the attributes of the notable events they use for grouping. For example, Episodes by ITSI Service groups together all notable events that share the same ITSI service.
Criteria to break an episode
Each aggregation policy in this content pack breaks the episode when the flow of events into the episode stops for more than eight hours. If an episode stops receiving notable events for more than eight hours, it's a strong indication that the problem has subsided. A new flow of events subsequently creates a new episode.
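The break criterion can be illustrated with a minimal Python sketch. It is not ITSI's implementation; the function name and sample timestamps are hypothetical, and it models only the rule described above: a gap of more than eight hours between consecutive notable events starts a new episode.

```python
from datetime import datetime, timedelta

# Inactivity limit described in the text above.
BREAK_AFTER = timedelta(hours=8)

def split_into_episodes(event_times):
    """Group timestamps into episodes, starting a new one after a gap of more than 8 hours."""
    episodes = []
    for ts in sorted(event_times):
        if episodes and ts - episodes[-1][-1] <= BREAK_AFTER:
            episodes[-1].append(ts)   # flow of events continues: same episode
        else:
            episodes.append([ts])     # first event, or the previous episode has broken
    return episodes

start = datetime(2024, 1, 1, 0, 0)
times = [start, start + timedelta(hours=1), start + timedelta(hours=10)]
episodes = split_into_episodes(times)
```

In this sample, the third event arrives nine hours after the second, so it lands in a second episode.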
Episode information: Episode title
Episode titles play an important role in content pack design because they tie together multiple related episodes over time. For instance, if the same alert is triggered on the same host most days, the content pack is able to relate these episodes together because the episode title will be appropriately unique to that host and alert, but otherwise remain consistent across time. The aggregation policies are configured to keep episode titles concise and appropriately descriptive. Leave this configuration unmodified where possible. If you do need to change it, keep the changes slight so that episodes can still be associated over time by their titles.
Episode information: Episode severity
While not an ideal implementation, episode severity has been configured to reflect the severity of the most recent notable event added to the episode. When the "Episode Monitoring - Set Episode to Highest Alarm Severity" correlation search is enabled, you get a reasonably accurate representation of the current episode severity. You can refine or modify this configuration as necessary.
Action rule to close an episode
Changing the episode status to Closed allows for better episode lifecycle management. Additionally, the Episode Monitoring correlation searches exclude closed episodes, so no additional alerting happens after an episode is closed.
Each aggregation policy in this content pack changes an episode's status to Closed under the following circumstances:
- When the episode breaks. It also adds a comment to the episode to track the action execution.
- When the first notable event you receive for an episode has severity=2 (normal/green). It rarely makes sense to start an episode with a clearing event, so these episodes are closed immediately so they can be ignored.
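As a rough illustration of the second condition, the check below models the decision in Python. This is a sketch rather than the policy's actual rule definition; the function name is hypothetical, and the only value taken from the text is severity 2 for normal/green.

```python
NORMAL = 2  # severity value the text identifies as normal/green

def closes_immediately(first_event_severity):
    """Return True when an episode's first notable event is a clearing event."""
    return first_event_severity == NORMAL

opened_by_clear = closes_immediately(2)   # episode starts with a clearing event
opened_by_alarm = closes_immediately(6)   # episode starts with a non-normal severity
```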
Action rule to add an alert comment
Each aggregation policy in this content pack adds a comment to an episode when a notable event from an episode monitoring rule is added to the episode. This action exists to demonstrate how to configure aggregation policies to take action automatically when an Episode Monitoring correlation search detects an alert condition. You can keep, remove, or configure this action as necessary depending on your specific environment.
Action rule to modify the episode severity
Each policy contains a rule to modify the episode's severity, under the control of the correlation search, Episode Monitoring - Set Episode to Highest Alarm Severity, which will adjust the episode's severity to match the highest Event Type swim lane or alarm severity. The episode severity is set by the arriving Notable Event itself, but this action rule also adds a comment.
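The difference between "severity of the most recent notable event" and "highest alarm severity" can be sketched as follows. This is an illustration under stated assumptions, not the correlation search's logic: the `event_type` key stands in for the Event Type swim lane, higher numbers are assumed to be more severe, and the highest alarm severity is assumed to be the maximum of each swim lane's current (latest) severity.

```python
def most_recent_severity(events):
    """Severity of the last notable event added to the episode."""
    return events[-1]["severity"]

def highest_alarm_severity(events):
    """Maximum of the current severity of each swim lane (assumed interpretation)."""
    current = {}
    for event in events:  # events are assumed to be in arrival order
        current[event["event_type"]] = event["severity"]
    return max(current.values())

# Hypothetical episode: the cpu alarm clears while the disk alarm is still open.
events = [
    {"event_type": "cpu", "severity": 6},
    {"event_type": "disk", "severity": 4},
    {"event_type": "cpu", "severity": 2},
]
latest = most_recent_severity(events)
highest = highest_alarm_severity(events)
```

Here the most recent event is a clear (2), yet one swim lane still sits at 4, which is why relying on the most recent event alone can understate the episode's state.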
Action rule to modify the episode status to Closed
Each policy contains a rule to set the episode's status to Closed, under the control of the correlation search, Episode Monitoring - Set Episode to Highest Alarm Severity, which will close the episode when all of the following conditions are met:
- All Event Type swim lanes have cleared (current severity is green for all swim lanes)
- The episode has been "quiet" (has not received any Notable Events) for at least 12 minutes.
The action rule also adds a comment. After setting the episode status to Closed, ITSI will also "break" the episode to prevent additional Notable Events from being added. Any relevant Notable Events that appear after the episode has been broken are added to a new episode.
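The two closing conditions can be modeled with a short Python sketch. It is an approximation for illustration only: the function and variable names are hypothetical, and "green" is assumed to mean severity 2, the value the text gives for normal/green.

```python
from datetime import datetime, timedelta

GREEN = 2                              # assumed severity value for a cleared swim lane
QUIET_PERIOD = timedelta(minutes=12)   # quiet time stated in the text

def should_close(lane_severities, last_event_time, now):
    """Return True when every swim lane is green and the episode has been quiet long enough."""
    all_clear = all(sev == GREEN for sev in lane_severities.values())
    quiet = now - last_event_time >= QUIET_PERIOD
    return all_clear and quiet

now = datetime(2024, 1, 1, 12, 0)
cleared = {"cpu": 2, "disk": 2}
partly_open = {"cpu": 2, "disk": 4}

closed_after_quiet = should_close(cleared, now - timedelta(minutes=15), now)
still_active = should_close(cleared, now - timedelta(minutes=5), now)
lane_still_open = should_close(partly_open, now - timedelta(minutes=15), now)
```

Only the first call satisfies both conditions. The second fails the quiet period, and the third still has an open swim lane.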
Policy-specific configurations
The following attributes are unique to each aggregation policy included in the Content Pack for ITSI Monitoring and Alerting.
Episodes by Alarm
This aggregation policy groups notable events into episodes as alarms, using the Universal Alerting fields signature, src, and subcomponent. For Notable Events created by the Universal Correlation Search from external alerts, these three fields are the basis for episode Event Types (alarms), seen as "swim lanes" in an episode. For more details, see About Deduplication, Alerts and Alarms.
Unlike the other Aggregation Policies, this policy creates episodes which have one, and only one, Event Type swim lane (alarm). This allows you to examine alarms individually.
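A minimal Python sketch of this grouping follows. It is not how ITSI implements the policy; the function name and sample field values are hypothetical, and only the three field names (signature, src, subcomponent) come from the text.

```python
from collections import defaultdict

def group_by_alarm(events):
    """Group notable events by the (signature, src, subcomponent) alarm key."""
    groups = defaultdict(list)
    for event in events:
        key = (event.get("signature"), event.get("src"), event.get("subcomponent"))
        groups[key].append(event)
    return dict(groups)

# Hypothetical notable events: two for one alarm, one for another.
events = [
    {"signature": "High CPU", "src": "web01", "subcomponent": "cpu0", "severity": 5},
    {"signature": "High CPU", "src": "web01", "subcomponent": "cpu0", "severity": 2},
    {"signature": "Disk Full", "src": "web01", "subcomponent": "/var", "severity": 6},
]
groups = group_by_alarm(events)
```

Each key in `groups` corresponds to one alarm, and therefore to one episode with a single swim lane under this policy.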
When the alarm clears, the associated episode will break and change status to Closed immediately.
This policy works with all notable events that use Universal Alerting, and is recommended when starting to use this content pack.
For more details about this policy, go to Configuration > Notable Event Aggregation Policies, then select this policy.
Episodes by ITSI Service
This aggregation policy groups notable events into episodes according to the service they're associated with. Enable this policy if you want to associate alerts with each affected service. This policy works with a variety of notable events, and is recommended when starting to use this content pack.
This aggregation policy splits notable events by the serviceid field, which is automatically generated by ITSI to indicate service context. Not all notable events will have this field.
It is possible for an entity to be associated with more than one ITSI service, resulting in a multivalue in serviceid. If a notable event contains a multivalue serviceid field, this policy creates an episode for the first service only. This is a known limitation in ITSI.
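The effect of the multivalue limitation can be sketched in Python. This models the described behavior only and is not ITSI code; the function name and service IDs are hypothetical, and a multivalue field is represented here as a list.

```python
def episode_service(serviceid):
    """Return the single service an episode would be created for, or None if absent."""
    if isinstance(serviceid, (list, tuple)):
        # Multivalue field: only the first service is used, as described above.
        return serviceid[0] if serviceid else None
    return serviceid

single = episode_service("svc-a")
multi = episode_service(["svc-a", "svc-b"])   # "svc-b" gets no episode of its own
missing = episode_service(None)               # event without a serviceid field
```

In practice this means alerts for an entity shared between services appear only in the episode for the first listed service.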
For more details about this policy, go to Configuration > Notable Event Aggregation Policies, then select this policy.
Episodes by Alert Group
This aggregation policy groups notable events into episodes by an event's alert_group field. This field allows for logically related notable events associated with different services and KPIs to be grouped together. Enable this policy if you want to group the notable events associated with multiple related services together into a single episode.
Before you enable this policy, make sure the itsi_kpi_attributes lookup is up to date and that the alert_group field within the lookup is configured so similar services and KPIs share the same alert_group name. For instructions, see Step 3: Update the itsi_kpi_attributes lookup in Install and configure the Content Pack for ITSI Monitoring and Alerting. The lookup automatically obtains the alert_group field from services and KPIs.
For more details about this policy, go to Configuration > Notable Event Aggregation Policies, then select this policy.
Episodes by src
This aggregation policy groups notable events into episodes by an event's src field, which is a Universal Alerting field that describes the object of an alert, such as a host or instance name. This policy works with all notable events that use Universal Alerting, and is recommended when starting to use this content pack.
For more details about this policy, go to Configuration > Notable Event Aggregation Policies, then select this policy.
This documentation applies to the following versions of Content Pack for ITSI Monitoring and Alerting: 2.0.2, 2.0.3