About aggregation policies in the Content Pack for ITSI Monitoring and Alerting

Aggregation policies provide configuration for grouping related notable events together in useful ways. The Content Pack for ITSI Monitoring and Alerting ships with several aggregation policies that are disabled by default.

Some configurations are common to all aggregation policies in this content pack, and others are unique to each policy. All aggregation policies operate with minimal adjustments but can be customized as needed. Before you change the default settings, review the following aggregation policy configuration descriptions.

Configurations common to all policies

The following attributes are common to all aggregation policies in the Content Pack for ITSI Monitoring and Alerting.

Policy naming convention

The notable event aggregation policies in this content pack that group by event attributes have a policy name that starts with Episodes by, followed by the attribute of the notable events they use for grouping. For example, Episodes by ITSI Service groups together all notable events that share the same ITSI service.

Criteria to break an episode

Each aggregation policy in this content pack breaks the episode when the flow of events into the episode stops for more than eight hours. If an episode stops receiving notable events for more than eight hours, it's a strong indication that the problem has subsided. A new flow of events subsequently creates a new episode.
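The break criterion above can be illustrated with a minimal Python sketch. It is not how ITSI implements aggregation; the function name split_into_episodes and the use of plain timestamps are assumptions made for illustration. Only the eight-hour threshold comes from the description above.

```python
from datetime import datetime, timedelta

# Inactivity window taken from the description above.
BREAK_AFTER = timedelta(hours=8)

def split_into_episodes(event_times):
    """Group event timestamps into episodes (hypothetical helper).

    A new episode starts whenever the gap since the previous event
    is more than BREAK_AFTER.
    """
    episodes = []
    for ts in sorted(event_times):
        if episodes and ts - episodes[-1][-1] <= BREAK_AFTER:
            episodes[-1].append(ts)   # flow continues: same episode
        else:
            episodes.append([ts])     # flow stopped too long: new episode
    return episodes
```

In this sketch, a gap of exactly eight hours keeps the event in the same episode, because the text specifies a gap of more than eight hours.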

Episode information: Episode title

Episode titles play an important role in content pack design because they tie together multiple related episodes over time. For example, if the same alert is triggered on the same host most days, the content pack can relate these episodes because the episode title is unique to that host and alert but otherwise remains consistent over time. The aggregation policies are configured to keep episode titles concise and appropriately descriptive. Leave this configuration unmodified where possible. If you must change it, keep the changes small so that episodes can still be associated over time by their titles.

Episode information: Episode severity

While not an ideal implementation, episode severity has been configured to reflect the severity of the most recent notable event added to the episode. When the "Episode Monitoring - Set Episode to Highest Alarm Severity" correlation search is enabled, you get a reasonably accurate representation of the current episode severity. You can refine or modify this configuration as necessary.

Action rule to close an episode

Changing the episode status to Closed allows for better episode lifecycle management. Additionally, the Episode Monitoring correlation searches exclude closed episodes, so no additional alerting happens after an episode is closed.

Each aggregation policy in this content pack changes an episode's status to Closed under the following circumstances.

When the episode breaks. The policy also adds a comment to the episode to track the action execution.
When the first notable event you receive for an episode has severity=2 (normal/green). It rarely makes sense to start an episode with a clearing event, so these episodes are closed immediately and can be ignored.
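
The two closing conditions above can be sketched as a small decision function. This is an illustration only, not the content pack's rule definition. The function and parameter names are hypothetical; only the severity value 2 (normal/green) comes from the text.

```python
NORMAL_SEVERITY = 2  # severity=2 is normal/green, per the text above

def should_close(episode_broken, first_event_severity):
    """Return True when either closing condition described above holds.

    episode_broken: True once the episode's break criteria have been met.
    first_event_severity: severity of the first notable event in the episode.
    """
    started_with_clearing_event = first_event_severity == NORMAL_SEVERITY
    return episode_broken or started_with_clearing_event
```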

Action rule to add an alert comment

Each aggregation policy in this content pack adds a comment to an episode when a notable event from an episode monitoring rule is added to the episode. This action exists to demonstrate how to configure aggregation policies to take action automatically when an Episode Monitoring correlation search detects an alert condition. You can keep, remove, or configure this action as necessary depending on your specific environment.

Action rule to modify the episode severity

Each policy contains a rule to modify the episode's severity. The rule is controlled by the correlation search Episode Monitoring - Set Episode to Highest Alarm Severity, which adjusts the episode's severity to match the highest Event Type swim lane (alarm) severity. The arriving notable event itself sets the episode severity, and the action rule also adds a comment.
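
The following Python sketch shows why the highest swim lane severity can differ from the severity of the most recent notable event. It assumes that each swim lane's current severity is the severity of its latest event, which is one interpretation of the description above. The function names and the (timestamp, lane, severity) tuple layout are hypothetical.

```python
def current_lane_severities(events):
    """Map each swim lane to the severity of its most recent event.

    events: iterable of (timestamp, lane, severity) tuples.
    """
    latest = {}
    for _ts, lane, severity in sorted(events, key=lambda e: e[0]):
        latest[lane] = severity  # later events overwrite earlier ones
    return latest

def highest_alarm_severity(events):
    """Return the highest current severity across all swim lanes,
    or None when there are no events."""
    return max(current_lane_severities(events).values(), default=None)
```

For example, if a cpu lane goes critical (6) and later clears (2) while a disk lane sits at medium (4), the most recent event has severity 2 but the highest current lane severity is 4.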

Action rule to modify the episode status to Closed

Each policy contains a rule to set the episode's status to Closed, under the control of the correlation search, Episode Monitoring - Set Episode to Highest Alarm Severity, which closes the episode when all of the following conditions are met:

All Event Type swim lanes have cleared (current severity is green for all swim lanes).
The episode has been "quiet" (has not received any notable events) for at least 12 minutes.

The action rule also adds a comment. After setting the episode status to Closed, ITSI also "breaks" the episode, to prevent additional Notable Events from being added. Any relevant Notable Events that appear after the episode has been broken are added to a new episode.
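
The two close conditions can be expressed as a simple check. The sketch below is illustrative and does not reproduce the correlation search. The function name ready_to_close and its parameters are assumptions; the 12-minute quiet period and the normal/green severity value of 2 come from this topic.

```python
from datetime import datetime, timedelta

NORMAL_SEVERITY = 2                   # normal/green
QUIET_PERIOD = timedelta(minutes=12)  # quiet period from the text above

def ready_to_close(lane_severities, last_event_time, now):
    """Return True when every swim lane has cleared and the episode
    has been quiet for at least QUIET_PERIOD.

    lane_severities: dict mapping swim lane name to current severity.
    last_event_time: datetime of the most recent notable event.
    now: datetime at which the check runs.
    """
    all_cleared = all(s == NORMAL_SEVERITY for s in lane_severities.values())
    quiet = now - last_event_time >= QUIET_PERIOD
    return all_cleared and quiet
```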

Policy-specific configurations

The following attributes are unique to each aggregation policy included in the Content Pack for ITSI Monitoring and Alerting.

Episodes by Alarm

This aggregation policy groups notable events into episodes as alarms, using the Universal Alerting fields signature, src, and subcomponent. For notable events created by the Universal Correlation Search from external alerts, these three fields are the basis for episode Event Types (alarms), seen as "swim lanes" in an episode. For more details, see About Deduplication, Alerts and Alarms.

Unlike the other aggregation policies, this policy creates episodes that have exactly one Event Type swim lane (alarm). This allows you to examine alarms individually.
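
As a rough illustration of this grouping, the following Python sketch buckets events by the three Universal Alerting fields named above. It models events as plain dictionaries, which is an assumption for illustration, and group_by_alarm is a hypothetical helper, not an ITSI API.

```python
from collections import defaultdict

# Universal Alerting fields named in the description above.
ALARM_FIELDS = ("signature", "src", "subcomponent")

def group_by_alarm(events):
    """Group event dicts by their (signature, src, subcomponent) values.

    Each distinct combination corresponds to one alarm, and therefore
    to one episode with a single swim lane in this sketch.
    """
    episodes = defaultdict(list)
    for event in events:
        key = tuple(event.get(field) for field in ALARM_FIELDS)
        episodes[key].append(event)
    return dict(episodes)
```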

When the alarm clears, the associated episode will break and change status to Closed immediately.

This policy works with all notable events that use Universal Alerting, and is recommended when starting to use this content pack.

For more details about this policy, go to Configuration > Notable Event Aggregation Policies, then select this policy.

Episodes by ITSI Service

This aggregation policy groups notable events into episodes according to the service they're associated with. Enable this policy if you want to associate alerts with each affected service. This policy works with a variety of notable events, and is recommended when starting to use this content pack.

This aggregation policy splits notable events by the serviceid field, which is automatically generated by ITSI to indicate service context. Not all notable events will have this field.

An entity can be associated with more than one ITSI service, which results in a multivalue serviceid field. If a notable event contains a multivalue serviceid field, this policy creates an episode for the first service only. This is a known limitation in ITSI.
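
The effect of this limitation can be sketched in Python. The sketch assumes that a multivalue serviceid appears as a list and that "first" means the first list element; both are assumptions for illustration, and service_key is a hypothetical helper.

```python
def service_key(event):
    """Return the service ID used for grouping, or None if absent.

    For a multivalue serviceid (modeled here as a list or tuple),
    only the first value is used, mirroring the limitation above.
    """
    value = event.get("serviceid")
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        return value[0] if value else None
    return value
```

In this sketch, an event whose serviceid is ["svc-a", "svc-b"] is grouped under svc-a only, so no episode is created for svc-b from that event.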

For more details about this policy, go to Configuration > Notable Event Aggregation Policies and select this policy.

Episodes by Alert Group

This aggregation policy groups notable events into episodes by an event's alert_group field. This field allows for logically related notable events associated with different services and KPIs to be grouped together. Enable this policy if you want to group the notable events associated with multiple related services together into a single episode.

Before you enable this policy, make sure the itsi_kpi_attributes lookup is up to date and that the alert_group field within the lookup is configured so similar services and KPIs share the same alert_group name. For instructions, see Step 3: Update the itsi_kpi_attributes lookup in Install and configure the Content Pack for ITSI Monitoring and Alerting. The lookup automatically obtains the alert_group field from services and KPIs.

For more details about this policy, go to Configuration > Notable Event Aggregation Policies, then select this policy.

Episodes by src

This aggregation policy groups notable events into episodes by an event's src field, which is a Universal Alerting field that describes the object of an alert, such as a host or instance name. This policy works with all notable events that use Universal Alerting, and is recommended when starting to use this content pack.

ITSI Alert and Episode Monitoring

This aggregation policy supports the ITSI alert and episode analytics and monitoring functionality. Its purpose is to create episodes that reflect widespread alert and episode issues such as alert storms and episode storms.

This aggregation policy captures all notable events from the ITSI Alert and Episode Monitoring services. The aggregation policy is configured to alert immediately upon the first notable event it receives. The episode title is hardcoded to be ITSI Alert and Episode Storm Activity Detected but can be customized. The aggregation policy is configured with an Episode Dashboard designed to highlight critical insights about the volume and field value distribution of incoming alerts.

For more details about this policy, go to Configuration > Notable Event Aggregation Policies, then select this policy.
