Schedule maintenance downtime in ITSI
Splunk IT Service Intelligence (ITSI) lets you put services and entities into maintenance mode for a specific time period. You can use maintenance mode to prevent ITSI from triggering alerts from machines and other devices that are undergoing maintenance operations or do not require active monitoring for any reason.
For example, if your ticketing system experiences a failure and you start to receive a cascade of identical notable events, you can put the service that is monitoring the ticketing system into maintenance mode to stop ITSI from generating alerts until the issue is resolved.
Create a new maintenance window
Maintenance windows apply to services and entities.
By default, only users assigned the write-maintenance_calendar capability can create a maintenance window. By default, the itoa_team_admin roles have this capability.
- From the ITSI top menu bar, click Configure > Maintenance Windows.
- Click Create Maintenance Window.
- Provide a title for the maintenance window. For example, "DB entity maintenance window."
- Set the start time, duration, and end time for the maintenance window.
- Select the Objects for which you want to create a maintenance window: Entities or Services.
If you do not have write access to the Global team, you cannot select Entities to put into maintenance.
- Click Next.
- Select the specific services or entities that you want to place in maintenance mode for the duration of the maintenance window. You can only select services or entities for which you have write access.
- Click Create.
The selected entities or services enter maintenance mode according to the defined schedule.
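If you want to prepare the same information outside the UI, the steps above can be sketched as a small data structure. This is only an illustration: the field names (`title`, `start_time`, `end_time`, `objects`, `object_type`, `_key`), the entity keys, and the date are assumptions that this page does not confirm, so check them against the ITSI REST API reference before sending anything to a real instance.

```python
import json
from datetime import datetime, timedelta, timezone


def build_maintenance_window(title, start, duration, object_type, object_keys):
    """Build a dict describing a maintenance window (hypothetical schema)."""
    if object_type not in ("entity", "service"):
        raise ValueError("object_type must be 'entity' or 'service'")
    if duration <= timedelta(0):
        raise ValueError("duration must be positive")
    # The end time follows from the start time and the duration.
    end = start + duration
    return {
        "title": title,
        "start_time": int(start.timestamp()),  # epoch seconds (assumed format)
        "end_time": int(end.timestamp()),
        "objects": [
            {"object_type": object_type, "_key": key} for key in object_keys
        ],
    }


# Hypothetical example values; the entity keys are placeholders.
window = build_maintenance_window(
    title="DB entity maintenance window",
    start=datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc),
    duration=timedelta(hours=5),
    object_type="entity",
    object_keys=["db-entity-01", "db-entity-02"],
)
print(json.dumps(window, indent=2))
```

A window holds either entities or services, matching the Objects choice in the UI, which is why the sketch takes a single `object_type` for all keys.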
When viewing a scheduled maintenance window, a user only sees the services included in the maintenance window for which the user has read access. If a maintenance window only contains services for which the user does not have read access, the user cannot view the maintenance window. A maintenance window that contains only entities can be viewed by all ITSI users.
When a user runs the Delete bulk action on all or several maintenance windows, ITSI deletes only the maintenance windows that contain services or entities for which that user has write access.
Impact of maintenance windows
Maintenance windows can have an impact on associated KPIs, service health score calculations, and other ITSI features.
Consider the following when you put a service into maintenance mode:
- All KPIs associated with that service are automatically put into maintenance mode.
- Search results from KPIs in maintenance mode are ignored for the purpose of service health score calculation for the duration of the maintenance window.
- Maintenance windows do not affect adaptive threshold calculations. When looking back at past data to calculate threshold values, search results from KPIs in maintenance mode are ignored.
Consider the following when you put an entity into maintenance mode:
- If that entity has no KPIs running searches against it, there is no impact on service health scores.
- If that entity has one or more KPIs that are running searches against it, all search results from all KPIs running against that entity are ignored for the purpose of service health score calculation.
- If a KPI is split by entity, for example when the same KPI runs against two entities and only one of them is in maintenance mode, search results generated by the KPI running against the entity in maintenance mode are ignored for the purpose of health score calculation. Search results generated by the same KPI running against the entity that is not in maintenance mode are included as usual in the service health score calculation.
- Entities can be placed in full or partial maintenance mode without being explicitly placed in maintenance mode, if a service that contains the entity is placed in maintenance mode.
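The rules above can be pictured as a filter applied to KPI search results before the health score is computed. The following is a simplified model for illustration only, not ITSI's actual implementation; the `KpiResult` type, the function name, and the sample services and hosts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KpiResult:
    service: str
    kpi: str
    entity: Optional[str]  # entity the result came from, or None if not tied to one
    value: float


def results_for_health_score(results, services_in_maintenance, entities_in_maintenance):
    """Return only the results that would still count toward health scores."""
    kept = []
    for r in results:
        # A service in maintenance puts all of its KPIs in maintenance.
        if r.service in services_in_maintenance:
            continue
        # Results produced against an entity in maintenance are ignored.
        if r.entity is not None and r.entity in entities_in_maintenance:
            continue
        kept.append(r)
    return kept


results = [
    KpiResult("web", "cpu_load", "host-a", 91.0),
    KpiResult("web", "cpu_load", "host-b", 35.0),
    KpiResult("ticketing", "login_errors", None, 12.0),
]
kept = results_for_health_score(
    results,
    services_in_maintenance={"ticketing"},
    entities_in_maintenance={"host-a"},
)
for r in kept:
    print(r.service, r.kpi, r.entity, r.value)
```

In this example only the `cpu_load` result for `host-b` remains: the `ticketing` service is in maintenance, and `host-a` is in maintenance as an entity, while the same KPI running against `host-b` is counted as usual.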
Impact on ITSI features
Services, entities, and KPIs that are fully or partially impacted by a maintenance window appear in a dark gray color on pages that display health scores, including service analyzers, service and entity details pages, glass tables, multi-KPI alerts, and deep dives.
View impacted KPIs
You can view the impact of a maintenance window on associated KPIs.
- Select a maintenance window from the Maintenance Windows lister page.
The maintenance window details page opens, showing the specific services or entities impacted by the maintenance window.
- Click Impacted KPIs to see a list of KPIs impacted by the maintenance window. KPIs that are split by entity, and thus are currently running searches against other entities that are not in maintenance mode, are listed as "Partially" impacted. KPIs that are not split by entity are listed as "Fully" impacted.
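The "Fully" and "Partially" labels can be modeled with a small function. This is a sketch under stated assumptions rather than the logic ITSI runs: the function name and arguments are hypothetical, and treating a split KPI whose entities are all in maintenance as "Fully" impacted is an assumption this page does not state.

```python
def classify_kpi_impact(split_by_entity, kpi_entities, entities_in_maintenance):
    """Classify a KPI as 'Fully', 'Partially', or 'None' impacted."""
    kpi_entities = set(kpi_entities)
    affected = kpi_entities & set(entities_in_maintenance)
    if not affected:
        # No entity used by this KPI is in maintenance.
        return "None"
    if not split_by_entity or affected == kpi_entities:
        # Not split by entity, or (assumed) every entity is in maintenance.
        return "Fully"
    # Split by entity and still running against entities outside maintenance.
    return "Partially"


print(classify_kpi_impact(True, {"host-a", "host-b"}, {"host-a"}))
print(classify_kpi_impact(False, {"host-a"}, {"host-a"}))
print(classify_kpi_impact(True, {"host-c"}, {"host-a"}))
```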
When to schedule maintenance windows
It is a best practice to schedule maintenance windows with a 15- to 30-minute time buffer before and after you start and stop your maintenance work. This gives the system an opportunity to catch up with the maintenance state and reduces the chances of ITSI generating false positives during maintenance operations.
For example, if a server will be shut down for maintenance at 1:00 PM and restarted at 5:00 PM, the ideal maintenance window is 12:30 PM to 5:30 PM.
The 15- to 30-minute time buffer is a rough estimate based on 15 minutes being the time period over which most KPIs are configured to search data and identify alert triggers.
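The buffer arithmetic can be sketched as follows, assuming a 30-minute buffer on each side. The function name and the calendar date are hypothetical; only the 1:00 PM to 5:00 PM work period comes from the example above.

```python
from datetime import datetime, timedelta


def buffered_window(work_start, work_end, buffer=timedelta(minutes=30)):
    """Widen the planned work period by a buffer before and after it."""
    if work_end <= work_start:
        raise ValueError("work_end must be after work_start")
    if buffer < timedelta(0):
        raise ValueError("buffer must not be negative")
    return work_start - buffer, work_end + buffer


# Server shut down at 1:00 PM and restarted at 5:00 PM (date is a placeholder).
start, end = buffered_window(
    datetime(2024, 1, 15, 13, 0),
    datetime(2024, 1, 15, 17, 0),
)
print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))  # 12:30 to 17:30
```

Passing `buffer=timedelta(minutes=15)` gives the lower end of the recommended range.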
This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.0.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.1.0, 4.1.1, 4.1.2, 4.1.5, 4.2.0, 4.2.1, 4.2.2, 4.2.3, 4.3.0, 4.3.1