Examine performance metrics with the Splunk App for ES Health Check
The Splunk App for ES Health Check provides performance metrics related to specific functions of your Splunk Enterprise environment. Use the app to understand the performance of your Splunk Enterprise deployment before installing Splunk Enterprise Security, or to diagnose performance problems in an environment where Splunk Enterprise Security is installed.
Review your deployment on the Overview dashboard
Get an overview of performance statistics in your environment and identify potential problems.
|Indicators||Indicators display information and performance statistics, including counts of incomplete data model accelerations and skipped searches. An indicator can change color when a threshold value is met or exceeded, and can include a link to a drilldown search.|
|Detected Deployment||A list of the Splunk Enterprise instances detected in your deployment. An instance with a low core count or low memory can impact overall performance.|
|Average Dashboard Load Times - Last 24 Hours||The average time it takes an Enterprise Security dashboard to load. A slow-loading dashboard can indicate timeouts or other communication issues in your environment. Dashboard load times do not represent search response time.|
|Bundle Replication Attempts - Last 24 Hours||The counts of total and successful bundle replication attempts. A large number of bundle replication failures degrades search performance.|
|Top Ten Error Messages - Last 24 Hours||The top 10 error messages over the last 24 hours. Investigate the source of these errors to identify potential causes of performance problems.|
|Bottom Ten Error Messages - Last 24 Hours||The bottom 10 error messages over the last 24 hours. Investigate the source of these errors to identify potential causes of performance problems.|
Assess resource usage
Use the Resources dashboard to assess the hardware and system resources used in your Splunk Enterprise deployment.
|Median CPU Usage by Instance||The median CPU usage of each instance over the last 24 hours. An instance with higher CPU usage than its peers can indicate that it is receiving more data, is in an unstable state, or is configured differently than its peers.|
|Average Normalized CPU Load by Instance||The normalized CPU load average across the instances over the last 24 hours. Investigate the state of any instances showing consistently high average CPU load.|
|Average Memory Usage by Instance||The average memory usage across the instances over the last 24 hours. Investigate the state of any instances showing consistently high memory usage.|
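As a rough illustration of what a normalized CPU load figure expresses, the sketch below divides a load average by the number of CPU cores so that instances of different sizes can be compared. It assumes that "normalized" means load per core, which this page does not state; the instance names, sample values, and the threshold of 1.0 are hypothetical.

```python
def normalized_load(load_average: float, cpu_cores: int) -> float:
    """Return the load average per CPU core (assumed meaning of 'normalized')."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return load_average / cpu_cores


def overloaded(samples: dict, threshold: float = 1.0) -> list:
    """Return the names of instances whose per-core load exceeds the threshold."""
    return sorted(
        name
        for name, (load, cores) in samples.items()
        if normalized_load(load, cores) > threshold
    )


# Hypothetical sample data: instance name -> (load average, core count)
instances = {"idx01": (6.0, 8), "idx02": (14.0, 8), "sh01": (3.0, 4)}

for name in overloaded(instances):
    print(f"{name} is running above one runnable task per core")
```

A per-core value that stays above 1.0 suggests more runnable work than cores available, which is the kind of consistently high load worth investigating.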
Review index distribution and load
Use the Indexing dashboard to review the state of your indexes and indexers.
|Indexing Volume by Instance (GB)||The average index size by indexer instance. An instance that indexes more data than its peers can reduce overall performance.|
|Indexing Queue Sizes||Average queue sizes in the indexing pipeline. An instance with consistently high queue sizes is operating at or beyond its capacity. Use the load average, CPU usage, indexing volume, and error message panels to investigate further.|
|Indexing Rate||The average indexing rate by instance. An instance that indexes more data than its peers can reduce overall performance.|
|Bucket count||The total bucket count by instance. An indexer cluster with very high bucket counts can reduce overall performance.|
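The peer comparison that the indexing panels invite can be sketched as follows: flag any instance whose indexing volume is well above the median of all instances. This is an illustrative check, not how the app computes anything; the instance names, volumes, and the 1.5 ratio are hypothetical.

```python
from statistics import median


def unbalanced_indexers(volume_gb_by_instance: dict, ratio: float = 1.5) -> list:
    """Return instances indexing more than `ratio` times the median volume."""
    if not volume_gb_by_instance:
        return []
    typical = median(volume_gb_by_instance.values())
    return sorted(
        name
        for name, volume in volume_gb_by_instance.items()
        if typical > 0 and volume > ratio * typical
    )


# Hypothetical daily indexing volume in GB per indexer
volumes = {"idx01": 40.0, "idx02": 42.0, "idx03": 95.0, "idx04": 38.0}

print(unbalanced_indexers(volumes))
```

Using the median instead of the mean keeps a single heavily loaded indexer from shifting the baseline it is compared against.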
Assess search load
Use the Searches dashboard to determine the impact of searches on performance in your environment.
|Number of Searches by Instance||The count of searches by instance over the last 24 hours. A spike in search load can indicate poor search scheduling, a configuration change, or an unbalanced search load.|
|Number of Real-time Searches by Instance||The count of completed real-time searches by instance over the last 24 hours. Each real-time search increases the load on the indexers and the search head.|
|Top Ten Time Consuming Searches||The ten longest-running searches over the last 24 hours. Review the searches and determine if they can be optimized to reduce the run time.|
|Top Ten Time Consuming Correlation Searches||The ten longest-running correlation searches over the last 24 hours. Review the searches and determine if the run time exceeds the scheduled interval. Use the load average, CPU usage, and other search load panels to investigate the reasons for long-running searches.|
|Number of Skipped Searches by Reason||Displays the skipped searches by reason over the last 24 hours. Investigate the reasons to identify potential causes of skipped searches.|
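The check described for correlation searches, whether a search takes longer to run than the interval between its scheduled runs, can be sketched as a simple comparison. The search names, run times, and intervals below are hypothetical; in practice you would read them from the dashboard panels.

```python
def overrunning_searches(searches: list) -> list:
    """Return names of searches whose run time exceeds their schedule interval."""
    return [s["name"] for s in searches if s["run_time_s"] > s["interval_s"]]


# Hypothetical values: run time and schedule interval, both in seconds
correlation_searches = [
    {"name": "Example Search A", "run_time_s": 140.0, "interval_s": 300.0},
    {"name": "Example Search B", "run_time_s": 410.0, "interval_s": 300.0},
    {"name": "Example Search C", "run_time_s": 75.0, "interval_s": 60.0},
]

for name in overrunning_searches(correlation_searches):
    print(f"{name} runs longer than its schedule interval")
```

A search that overruns its interval is still running when its next run is due, which is one possible reason for the skipped searches reported in the last panel.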
Use the Analyzer to visually correlate different metrics and identify cause-and-effect relationships across different sources.
- In the Instance drop-down, select the instance type.
- In the Overlay drop-down, select the metric to display.
- (Optional) In the Overlay drop-down, select another metric to display.
- Select a time range.
- Choose Submit.
Data model acceleration distribution
Use this dashboard to display data model acceleration distribution by instance.
- In the Data Model drop-down, select a data model.
- Choose Submit.
|Data Model Acceleration Percentage Per Indexer||Displays the average acceleration completion percentage for the selected data model by instance.|
|Data Bucket Distribution Per Indexer||Displays the index bucket distribution for the selected data model by instance.|
Troubleshooting the app
This documentation applies to the following versions of Splunk® App for ES Health Check: 1.0.0