Overview of Episode Review in ITSI
The Episode Review dashboard in IT Service Intelligence (ITSI) enables organizations to manage communications related to major business issues or incidents. Episode Review allows incident communications administrators to bring together all involved users during these episodes and establish quick and easy communication within the group.
For example, suppose a major issue occurs in a server room and a high-priority incident is raised. Because the episode can potentially impact all users, it's important to bring together key representatives to communicate quickly and effectively. Episode Review facilitates this communication and helps resolve the source issue.
The goal of Episode Review is to restore normal service operation while minimizing impact to business operations and maintaining quality. The dashboard displays a unified view of all your service-impacting alerts as episodes in their current state.
ITSI Episode Review supports the episode management process in the following ways:
- Classify episodes by impact and urgency to prioritize work.
- Assign episodes to appropriate users for quick resolution.
- Escalate episodes as necessary for further investigation.
- Resolve the episode and notify the user who logged it.
Any user can assign an episode to themselves and track it through the entire episode lifecycle until a service is restored and the issue is resolved.
What are notable events?
A notable event represents an anomalous incident detected by an ITSI multi-KPI alert, a correlation search, or anomaly detection algorithms. For example, a notable event can represent:
- An alert that ITSI ingests from a third-party product into the
- A single KPI (such as cpu_load_percent) that exceeds a pre-defined threshold.
- The result of a multi-KPI alert that correlates the status of multiple KPIs based on multiple trigger conditions.
- The result of a correlation search that looks for relationships between data points.
- An anomaly that has been detected when anomaly detection is enabled.
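As a rough illustration of the single-KPI case in the list above, the following Python sketch flags a value that exceeds a pre-defined threshold. It is not ITSI code: the `NotableEvent` class, the `check_kpi` function, and the threshold of 90 are hypothetical names and values chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NotableEvent:
    """Hypothetical stand-in for a notable event; not an ITSI data structure."""
    title: str
    kpi: str
    value: float


def check_kpi(kpi: str, value: float, threshold: float) -> Optional[NotableEvent]:
    """Return a notable event when the KPI value exceeds the threshold, else None."""
    if value > threshold:
        return NotableEvent(
            title=f"{kpi} exceeded threshold {threshold}",
            kpi=kpi,
            value=value,
        )
    return None


# Assumed threshold of 90 for illustration only.
breach = check_kpi("cpu_load_percent", 95.0, threshold=90.0)
normal = check_kpi("cpu_load_percent", 50.0, threshold=90.0)
```

Here `breach` holds a notable event and `normal` is `None`, because only the first value is above the threshold.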
What's an episode?
An episode represents a disruption of service operation causing impact to business operations. It's a deduplicated group of notable events occurring as part of a larger sequence, or an incident or period considered in isolation.
Examples of application-level episodes include service unavailability, a data issue, an application bug, or an exceeded disk-usage threshold. Examples of hardware episodes include server issues, network issues, or system issues. The Episode Review dashboard is designed to separate and organize these issues in your IT environment so that analysts can effectively triage, investigate, escalate, and resolve them.
Episodes are generated through notable event aggregation policies. Aggregation policies group similar or related notable events into episodes based on the filtering criteria you define. For more information about configuring aggregation policies, see Overview of aggregation policies in ITSI.
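The grouping idea behind aggregation policies can be sketched in a few lines of Python. This is a simplified model and not how ITSI implements policies: the `group_into_episodes` function, the event fields (`host`, `source`, `title`), and the choice to group by `host` are all assumptions made for illustration.

```python
from collections import defaultdict


def group_into_episodes(events, filter_fn, split_by):
    """Group events that pass filter_fn into episodes keyed by the split_by fields."""
    episodes = defaultdict(list)
    for event in events:
        if not filter_fn(event):
            continue  # The event does not match the filtering criteria.
        key = tuple(event.get(field) for field in split_by)
        episodes[key].append(event)
    return dict(episodes)


# Hypothetical notable events.
events = [
    {"host": "web-01", "source": "nagios", "title": "disk full"},
    {"host": "web-01", "source": "nagios", "title": "disk full"},
    {"host": "db-01", "source": "nagios", "title": "cpu high"},
    {"host": "web-02", "source": "other", "title": "ping failed"},
]

# Assumed criteria: keep events from the "nagios" source, one episode per host.
episodes = group_into_episodes(
    events,
    filter_fn=lambda e: e["source"] == "nagios",
    split_by=["host"],
)
```

In this sketch the two `web-01` events land in one episode, the `db-01` event forms a second episode, and the `web-02` event is left out because it does not match the filter.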
Use Episode Review to gain insight into the severity of episodes occurring in your system or network. Triage new episodes, assign episodes to analysts for review, and examine episode details for investigative leads. You can perform various actions on episodes, including running a script, sending an email, creating a ticket in an external ticketing system, adding a link to a ticket in an external system, and any other custom actions you configure.
Note: Monitor episodes and actions in Episode Review with the Event Analytics Audit dashboard. For more information, see Event Analytics Audit dashboard.
Lifecycle of an episode
Episode Review is responsible for managing the lifecycle of episodes from creation to closure. The episode management process has several statuses, each of which is important to the success of the process and the quality of service delivered.
An episode can have the following default statuses:
| Status | Description |
|---|---|
| Unassigned | Used by ITSI when an error prevents the episode from having a valid status assignment. |
| New | Default status. The episode is logged but has not been triaged. |
| In Progress | The episode is assigned and the owner is investigating the issue. |
| Pending | The responsibility for the episode shifts temporarily to another entity to provide further information, evidence, or a resolution. An action must occur before the episode can be closed. |
| Resolved | The owner has addressed the cause of the episode and is waiting for verification. A satisfactory fix is provided to ensure it doesn't occur again. |
| Closed | It's confirmed that the episode is satisfactorily resolved. |
Episode management workflow
Use this example workflow to triage and work on episodes in Episode Review:
- An IT operations analyst monitors Episode Review, sorting and performing high-level triage on newly created episodes.
- When an episode warrants investigation, the analyst acknowledges the episode, which moves the status from New to In Progress.
- The analyst researches and collects information on the episode using the drilldowns and fields in the episode details. The analyst records the details of their research in the Comments section of the episode.
- If the analyst cannot immediately find the root cause of the episode, the analyst might open a ticket in Remedy or ServiceNow.
- After the analyst has addressed the cause of the episode and any remediation tasks have been escalated or solved, the analyst sets the episode status to Resolved.
- The analyst assigns the episode to a final analyst for verification.
- The final analyst reviews and validates the changes made to resolve the episode, and sets the status to Closed.
Closing an episode created by an aggregation policy breaks the episode, so no more events can be added to it, even if the breaking criteria specified in the aggregation policy were not met.
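The example workflow and the effect of closing an episode can be modeled with a small Python sketch. It is a toy model and not the ITSI implementation: the `Status` enum mirrors the default statuses listed earlier, while the `Episode` class, its method names, and the analyst names are hypothetical. Setting the owner when the episode is acknowledged is also an assumption of this sketch.

```python
from enum import Enum


class Status(Enum):
    UNASSIGNED = "Unassigned"
    NEW = "New"
    IN_PROGRESS = "In Progress"
    PENDING = "Pending"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


class Episode:
    """Hypothetical episode model used only to illustrate the workflow."""

    def __init__(self):
        self.status = Status.NEW  # New is the default status.
        self.owner = None
        self.events = []
        self.comments = []

    def add_event(self, event) -> bool:
        """Add an event unless the episode is closed (closing breaks the episode)."""
        if self.status is Status.CLOSED:
            return False
        self.events.append(event)
        return True

    def acknowledge(self, analyst: str) -> None:
        """Acknowledging moves the status from New to In Progress."""
        self.owner = analyst
        self.status = Status.IN_PROGRESS

    def resolve(self) -> None:
        self.status = Status.RESOLVED

    def close(self) -> None:
        self.status = Status.CLOSED


episode = Episode()
episode.add_event({"title": "disk full"})
episode.acknowledge("analyst_a")
episode.comments.append("Checked disk usage on the affected host.")
episode.resolve()
episode.owner = "analyst_b"  # Hand off to a final analyst for verification.
episode.close()
accepted = episode.add_event({"title": "disk full again"})
```

After `close()`, `accepted` is `False` and the episode still holds only its first event, which mirrors the rule that no more events can be added to a closed episode.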
This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.9.0, 4.9.1, 4.9.2, 4.9.3, 4.10.0 Cloud only, 4.10.1 Cloud only, 4.10.2