Splunk Cloud Platform

Splunk Cloud Platform Admin Manual

Review the Splunk Cloud Platform health report

The Splunk Cloud Platform health report is a REST-based monitoring tool that lets you view and investigate the health status of some Splunk Cloud Platform features directly inside the Splunk Cloud Platform UI. Individual features report their health status through a tree structure that provides a continuous, real-time view of the health of your service, with no impact on search loads or ingest latency.

By default, the Splunk Cloud Platform health report currently shows the health status of Search Scheduler features only. See Supported features.
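
Because the health report is REST-based, the same status can in principle be read programmatically. The following is a minimal sketch, not a definitive client. It assumes the `/services/server/health/splunkd/details` endpoint described in the Splunk Enterprise REST API reference is reachable on your deployment, that REST access on the management port is enabled, and that token authentication with a bearer header is in use. The host name, port, and the function name `build_health_request` are hypothetical. The sketch only builds the request and does not send it.

```python
from urllib.request import Request

# Assumption: endpoint path as documented for Splunk Enterprise; availability
# on a given Splunk Cloud Platform deployment is not guaranteed.
HEALTH_PATH = "/services/server/health/splunkd/details"


def build_health_request(base_url: str, token: str) -> Request:
    """Build (but do not send) a GET request for the splunkd health details.

    base_url is a hypothetical management URL such as
    "https://example.splunkcloud.com:8089"; token is an authentication token.
    """
    url = base_url.rstrip("/") + HEALTH_PATH + "?output_mode=json"
    # Assumption: the deployment accepts "Authorization: Bearer <token>".
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


if __name__ == "__main__":
    req = build_health_request("https://example.splunkcloud.com:8089", "<token>")
    print(req.get_method(), req.full_url)
```

To send the request, you could pass the returned object to `urllib.request.urlopen` and decode the JSON body; the response layout is not shown here because it is not described in this topic.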

Locate the Splunk Cloud Platform health report

To locate and view the health report:

  1. In the Splunk Cloud bar at the top of the page, find the health report icon. The icon changes color from green to yellow or red, based on the health status of features in the health report.
  2. Click the health report icon to open the health report.
  3. In the health status tree, click on any feature to view information about the feature's status.

Splunk cloud health report 1.png

To view the Search Scheduler feature in the Splunk Cloud Platform health report, a role must have the list_health_subset and edit_health_subset capabilities. The sc_admin role has these capabilities by default.

How the health report works

The health report records the health status of Splunk Cloud Platform features in a tree structure, where leaf nodes represent particular features, and intermediary nodes categorize the various features. Feature health status is color-coded in four states as follows:

  • Green: The feature is functioning properly.
  • Yellow: The feature is experiencing a problem.
  • Red: The feature has a severe issue and is negatively impacting the functionality of your deployment.
  • Grey: Health report is disabled for the feature.

The health status tree structure

The health status tree has the following nodes:

Health status tree node | Description
splunkd | The top-level node of the status tree shows the overall health status (color) of splunkd. The status of splunkd shows the least healthy state present in the tree. The REST endpoint retrieves the instance health from the splunkd node.
Feature categories | Feature categories represent the second level in the health status tree. Feature categories are logical groupings of features. For example, "Search Lag", "Searches Delayed", and "Searches Skipped" are features that form a logical grouping with the name "Search Scheduler". Feature categories act as buckets for groups of features, and do not have their own health status.
Features | The next level in the status tree is feature nodes. Each node contains information on the health status of a particular feature. Each feature contains one or more indicators that determine the status of the feature. The overall health status of a feature is based on the least healthy color of any of its indicators.
Indicators | Indicators are the fundamental elements of the splunkd health report. These are the lowest levels of functionality that are tracked by each feature, and change colors as functionality changes. Indicator values are measured against red or yellow threshold values to determine the status of the feature. See What determines the health status of a feature?
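
The roll-up described in the table can be sketched in a few lines of Python. This is an illustration of the "least healthy state wins" rule only, not how splunkd implements it. The `Status` enum, the function names, and the sample indicator statuses are hypothetical, and treating an empty set of indicators as green is an assumption.

```python
from enum import IntEnum


class Status(IntEnum):
    """Health colors ordered from most healthy to least healthy."""
    GREEN = 0
    YELLOW = 1
    RED = 2


def least_healthy(statuses):
    """Return the least healthy status; an empty input counts as green (assumption)."""
    return max(statuses, default=Status.GREEN)


def splunkd_status(tree):
    """Roll up a tree shaped as {category: {feature: [indicator statuses]}}.

    Categories are only buckets and carry no status of their own, so the
    top-level status is the least healthy feature status anywhere in the tree.
    """
    return least_healthy(
        least_healthy(indicators)          # status of one feature
        for features in tree.values()      # each category
        for indicators in features.values()
    )


# Hypothetical indicator statuses for the Search Scheduler category.
tree = {
    "Search Scheduler": {
        "Searches Skipped": [Status.RED, Status.GREEN],
        "Searches Delayed": [Status.GREEN],
        "Search Lag": [Status.YELLOW],
    }
}
print(splunkd_status(tree).name)  # RED
```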

What determines the health status of a feature?

The health status of a feature depends on the current value of its associated indicators. For example, the status of the Search Scheduler: Searches Skipped feature depends on the following two indicators:

  • percent_searches_skipped_high_priority_last_24h
  • percent_searches_skipped_non_high_priority_last_24h

Each indicator has a configurable threshold for yellow and red. When an indicator's value meets the threshold condition, the feature's status changes from green to yellow or yellow to red.
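
As a rough sketch of this threshold logic, the function below maps an indicator value to a color. It assumes that a threshold is met when the value is greater than or equal to it and that the red threshold is at least as high as the yellow one; neither detail is stated in this topic. The function name and the yellow threshold of 5 are hypothetical, while the value of 44 and the red threshold of 10 match the example later in this topic.

```python
def indicator_status(value: float, yellow: float, red: float) -> str:
    """Map an indicator value to a color.

    Assumption: a threshold is "met" when value >= threshold, and red >= yellow.
    """
    if yellow > red:
        raise ValueError("expected the yellow threshold to be at or below the red threshold")
    if value >= red:
        return "red"
    if value >= yellow:
        return "yellow"
    return "green"


# 44% of high-priority searches skipped against a red threshold of 10%.
# The yellow threshold of 5% is a made-up value for illustration.
print(indicator_status(44.0, yellow=5.0, red=10.0))  # red
```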

For instructions on configuring indicator thresholds, see Edit feature indicator thresholds.

Feature health status viewpoint

The health report shows the health status of Splunk Cloud Platform features from the viewpoint of the local instance on which you are monitoring. The modal title "Health of Splunk Deployment" indicates a distributed instance, and the modal title "Health Status of Splunkd" designates a standalone instance.

Configure the health report

The Splunk Cloud Platform health report displays the status of a pre-defined set of Splunk Cloud Platform features. You can modify some health report settings, including feature indicator thresholds, using the health report manager page in Splunk Web.

Supported features

The Splunk Cloud Platform health report currently lets the sc_admin role monitor these features by default:

Feature Category | Features
Search Scheduler | Searches Skipped, Searches Delayed, Search Lag

For more information on health report features, see Supported features in the Monitoring Splunk Enterprise manual.

Edit feature indicator thresholds

Each feature in the health status tree has one or more indicators. Each indicator reports a value against a pre-set threshold, which determines the status of the feature. When the indicator value meets the threshold condition, the health status of the feature changes, for example, from green to yellow, or yellow to red. Changing threshold values for any feature applies to all associated search heads or search head captains.


You can edit the threshold value for any feature indicator using Splunk Web, as follows:

  1. In Splunk Web, click Settings > Health report manager.
  2. Find the feature you want to modify and click Edit Thresholds.
    The Edit Threshold modal opens showing a detailed description of each feature indicator.
  3. Set new indicator threshold values. For example, to modify thresholds for the Search Scheduler: Searches Skipped feature, you can set new Red or Yellow threshold values for the percent_searches_skipped_high_priority_last_24h and percent_searches_skipped_non_high_priority_last_24h indicators:

    Searches skipped indicators.png

  4. Click Save.

Disable a health report feature

You can disable any feature in the health report. Disabling a feature removes that feature from the splunkd health status tree. This is useful, for example, if you want to exclude a feature's status from the health report while you troubleshoot a problem with that feature. All supported features are enabled by default.

To disable a feature in Splunk Web:

  1. In Splunk Web, click Settings > Health report manager.
  2. Toggle the switch to disable the particular feature.
    The feature is now disabled and will no longer impact the overall health status of splunkd.

Example: Investigate search scheduler health status changes

The Splunk Cloud Platform health report lets you view the current health status of Search Scheduler features, including Searches Skipped, Searches Delayed, and Search Lag. You can use the report to identify and investigate Search Scheduler issues that can impact search performance.

The following example shows how you can use the Splunk Cloud Platform health report to investigate Search Scheduler health status changes.

1. Check the health report status

  1. In Splunk Web, check the color of the health report icon in the main menu. A red or yellow icon indicates that one or more Search Scheduler features have a problem.
  2. Click the health report icon to open the health report. The following health report indicates that the Searches Skipped feature has a severe problem, and that the Search Lag feature might also have a problem.

    Search scheduler status.png

2. Examine root cause and related messages

  1. Click the Searches Skipped feature to view diagnostic information about the current health status of the feature.
  2. Review the information under Root Cause. In this case, the percentage of high priority searches skipped is 44% over the last 24 hours, which exceeds the red threshold of 10% and causes the feature's health status to change to red.
  3. Review the Last 50 related messages. These log entries include warning messages showing that some scheduled searches cannot be executed. For example:
    09-15-2020 16:11:00.324 +0000 WARN SavedSplunker - cannot execute scheduled searches that live at the system level (need an app context). 
    

    One possible explanation for this type of warning message is that the number of high-priority searches running exceeds the maximum concurrent search limit, which can cause searches to be skipped.
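
If you copy related messages out of the health report for further analysis, a small parser can split each entry into its parts. The sketch below assumes every entry follows the same layout as the sample message above (timestamp, UTC offset, level, component, then the message after " - "); that layout is inferred from this single example. The pattern and the function name `parse_line` are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the sample message in this topic:
# MM-DD-YYYY HH:MM:SS.mmm +ZZZZ LEVEL Component - message
LOG_PATTERN = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"(?P<level>[A-Z]+) (?P<component>\S+) - (?P<message>.*)$"
)


def parse_line(line: str):
    """Return a dict of the entry's fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "time": datetime.strptime(match["ts"], "%m-%d-%Y %H:%M:%S.%f %z"),
        "level": match["level"],
        "component": match["component"],
        "message": match["message"],
    }


line = (
    "09-15-2020 16:11:00.324 +0000 WARN SavedSplunker - cannot execute "
    "scheduled searches that live at the system level (need an app context)."
)
entry = parse_line(line)
print(entry["level"], entry["component"])  # WARN SavedSplunker
```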

3. Confirm the cause of feature status change

The root cause and log file information suggest that maximum search concurrency limits caused the status change of the Searches Skipped feature. You can use the Cloud Monitoring Console to check search scheduler activity and confirm whether the suspected cause is correct.

  1. In Splunk Web, click Apps > Cloud Monitoring Console.
  2. Click Search > Scheduler Activity.
    The Count of Scheduler Executions panel shows that 43.62% of searches have been skipped over the last 4 hours, which approximates the percentage of skipped searches reported under root cause in the health report.

    Skipped search percentage 1.png

  3. Click Search > Skipped Scheduled Searches.
    The Count of Skipped Scheduled Searches panel shows that 756 searches have been skipped over the last 4 hours because "The Maximum number of concurrent historical searches on this instance has been reached." This confirms that the cause of the Searches Skipped status change is that the maximum concurrent search limit has been reached on the system.

    Skipped search reason 2.png

  4. You can now take steps to remedy this issue. Decreasing the total number of concurrent scheduled searches running and increasing the relative concurrency limit for scheduled searches can bring the number of concurrent searches below the maximum concurrent search limit and return the Searches Skipped feature to the green state.

    For information on relative concurrency limits for scheduled searches, see Set limits for concurrent scheduled searches.
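
To get a feel for how a relative limit affects the scheduler, the sketch below computes how many concurrent slots scheduled searches would receive. It assumes the relative concurrency limit is a percentage of the total concurrent search limit and that the result is rounded down; check the linked topic for the exact rule. The function name and all numbers are hypothetical.

```python
def scheduled_search_slots(max_concurrent_searches: int, relative_limit_pct: int) -> int:
    """Concurrent slots available to scheduled searches.

    Assumption: the relative limit is a percentage of the total concurrent
    search limit, rounded down to a whole number of searches.
    """
    if not 0 <= relative_limit_pct <= 100:
        raise ValueError("relative_limit_pct must be between 0 and 100")
    return max_concurrent_searches * relative_limit_pct // 100


# Hypothetical deployment with a total limit of 20 concurrent searches.
print(scheduled_search_slots(20, 50))  # 10
print(scheduled_search_slots(20, 75))  # 15
```

Under these assumptions, raising the relative limit from 50 to 75 percent adds five scheduled-search slots without changing the total limit.
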
Last modified on 10 August, 2021

This documentation applies to the following versions of Splunk Cloud Platform: 8.1.2103, 8.2.2104, 8.2.2105 (latest FedRAMP release), 8.2.2106, 8.2.2107

