Configure a Waiting Room for incidents
Many resilient monitoring systems resolve problems automatically, with no human interaction required. Paging users for incidents that could auto-resolve creates unnecessary noise for your on-call users. To avoid this, you can set up a waiting room.
A waiting room escalation policy temporarily holds an incident for a configurable time period to allow for an automated resolution. If the issue resolves within that period, the incident is automatically closed in Splunk On-Call and on-call users are not paged. If the incident remains open longer than the chosen interval, it is routed to the responsible team or escalation policy as a triggered alert.
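The hold-then-escalate behavior described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not Splunk On-Call code: the `Incident` class, its fields, and `waiting_room_outcome` are hypothetical names, and timestamps are plain seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    # Hypothetical model of an incident; not a Splunk On-Call data structure.
    entity_id: str
    opened_at: float                      # time the incident was triggered, in seconds
    resolved_at: Optional[float] = None   # time a recovery arrived, if any


def waiting_room_outcome(incident: Incident, hold_seconds: float) -> str:
    """Return 'closed' if the incident resolved within the hold period,
    otherwise 'escalated' (routed on to the responsible team's policy)."""
    if (
        incident.resolved_at is not None
        and incident.resolved_at - incident.opened_at <= hold_seconds
    ):
        return "closed"      # auto-resolved in time: nobody is paged
    return "escalated"       # still open after the interval: page the team
```

With a five-minute hold (`hold_seconds=300`), an incident that recovers after two minutes is closed quietly, while one that never recovers is escalated.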
Note
The Rules Engine section of this configuration requires a Full-Stack level of service.
Configure a new Escalation policy
Navigate to the team in need of a waiting room escalation policy. Select Escalation Policies, then Add Escalation Policy.
Select the drop-down for Immediately and choose a time interval to delay alerts that are sent to this team’s Waiting Room escalation policy.
For the Escalation type, select Execute Policy, then choose the policy from the team that will be responsible for these incidents if they fail to auto-resolve within the configured time delay.
Create a Routing Key
Navigate to Settings, then Routing Keys.
Select Add Key, give the new routing key a name, and choose the waiting room team you’ve just created.
Set up a Rules Engine rule to route these incidents to the Waiting Room
Navigate to Settings, then Alert Rules Engine.
- Select Add a Rule. For example, a rule can be configured to match the entity_id field against a wildcard phrase. Any incoming alert that meets this matching condition is routed to the waiting room escalation policy. This lets you limit the scope of the matching condition to these issues only, without affecting an on-call team’s ability to be paged immediately in the event of an urgent issue. For more information on using the Rules Engine, see Splunk On-Call Alert Rules Engine.
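To illustrate what such a rule does, the following Python sketch mimics wildcard matching on `entity_id` and reroutes matching alerts. It is an approximation: the pattern, the routing key name, and the `route` function are hypothetical, and the Rules Engine's own wildcard semantics may differ from Python's `fnmatch`.

```python
from fnmatch import fnmatchcase

# Hypothetical values chosen for illustration only.
WAITING_ROOM_KEY = "ops-waiting-room"   # routing key of the waiting room policy
PATTERN = "*disk_usage*"                # wildcard phrase to match in entity_id


def route(alert: dict) -> dict:
    """Return a copy of the alert, rerouted to the waiting room
    when its entity_id matches the wildcard pattern."""
    routed = dict(alert)
    if fnmatchcase(str(alert.get("entity_id", "")), PATTERN):
        routed["routing_key"] = WAITING_ROOM_KEY
    return routed
```

Alerts whose `entity_id` does not match keep their original routing key, so urgent issues still page the on-call team immediately.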
If you have a variety of incidents that require this approach, and multiple teams or escalation policies that will be responsible, you need to set up a unique waiting room escalation policy with its own routing_key for each of those teams’ policies. For example, an “Ops Waiting Room” with an escalation policy that points to the Ops team, and an “SRE Waiting Room” with an escalation policy that points to the SRE team.
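One way to plan this one-to-one layout before creating it in the UI is to write it down as data. The sketch below is a planning aid only: the `build_waiting_rooms` helper and the routing key naming scheme are assumptions, not anything Splunk On-Call requires.

```python
def build_waiting_rooms(teams: list[str]) -> list[dict]:
    """Plan one waiting room policy and one routing key per responsible team."""
    return [
        {
            "policy": f"{team} Waiting Room",                   # policy name, as in the example above
            "routing_key": f"{team.lower()}-waiting-room",      # hypothetical naming scheme
            "escalates_to": team,                               # team paged if the hold expires
        }
        for team in teams
    ]


plan = build_waiting_rooms(["Ops", "SRE"])
```

Each entry in `plan` corresponds to one escalation policy and one routing key to create, so no two teams share a waiting room.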