Hydra troubleshooting searches in the Splunk Add-on for VMware Metrics
Hydra troubleshooting searches are the search queries behind the Hydra troubleshooting dashboards, which help you identify issues with jobs and data collection. The Hydra troubleshooting dashboards ship with the Splunk App for Infrastructure (SAI). If you're not using SAI, follow these steps to add the search-time extractions that these searches depend on.
Prerequisite
The machine on which you perform these steps must have the data collection scheduler (DCS) and data collection node (DCN) logs.
Steps
- Select one of the following packages to add search-time extractions:
  - Splunk_TA_vmware_inframon
  - SA-Hydra-inframon
  - Splunk_TA_vcenter
  - Splunk_TA_esxilogs
  - SA-VMWIndex-inframon
- Add the following stanzas in the props.conf file present in the local directory of the selected package. If the props.conf file doesn't exist, create a new props.conf file.
    [ta_vmware_hierarchy_agent]
    REPORT-hydraloggerfields = hydra_logger_fields

    ## Original from SA-Hydra
    [hydra_scheduler]
    REPORT-schedulerfields = hydra_scheduler_log_fields

    [hydra_worker]
    REPORT-workerfields = hydra_worker_log_fields
    REPORT-pool_name_field = pool_name_field_extraction

    [source::.../var/log/splunk/*_configuration.log]
    REPORT-pool_name_field = pool_name_field_extraction

    [hydra_gateway]
    REPORT-gatewayfields = hydra_gateway_log_fields

    [hydra_access]
    REPORT-gatewayfields = hydra_access_log_fields
- Add the following search-time extractions to the transforms.conf file present in the local directory of the selected package. If the transforms.conf file doesn't exist, create a new transforms.conf file.
    [hydra_logger_fields]
    REGEX = ^\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d,\d\d\d (\w+) \[([\w_]+):\/\/([^\]]+)\] (\[[^\]]+\])?\s?(.+)$
    FORMAT = level::$1 input::$2 scheduler::$3 component::$4 message::$5

    [hydra_worker_log_fields]
    REGEX = ^\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d,\d\d\d (\w+) \[([\w_]+):\/\/([^:]+):(\d+)\] (\[[^\]]+\])?\s?(.+)$
    FORMAT = level::$1 input::$2 worker::$3 pid::$4 component::$5 message::$6

    [pool_name_field_extraction]
    REGEX = \[pool=([^\]]*)\]
    FORMAT = pool::$1
    MV_ADD = true

    [hydra_scheduler_log_fields]
    REGEX = ^\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d,\d\d\d (\w+) \[([\w_]+):\/\/([^\]]+)\] (\[[^\]]+\])?\s?(.+)$
    FORMAT = level::$1 input::$2 scheduler::$3 component::$4 message::$5

    [hydra_gateway_log_fields]
    REGEX = ^\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d,\d\d\d (\w+) \[([\w_]+):([^\]]+)\] (\[[^\]]+\])?\s?(.+)$
    FORMAT = level::$1 service::$2 pid::$3 component::$4 message::$5

    [hydra_access_log_fields]
    REGEX = ^\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d,\d\d\d (\w+) ((\w+) ([^\s]+)) '((\d+) ([^']+))' - - - (\d+)ms$
    FORMAT = level::$1 request::$2 method::$3 uri_path::$4 status_full::$5 status::$6 status_message::$7 spent::$8
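- Make sure the above search-time extractions are globally accessible to all apps, for example by setting export = system for the props and transforms objects in the metadata/local.meta file of the selected package.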
- Restart your Splunk software.
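Before restarting, it can help to check offline that an extraction matches the shape of your log lines. The following is a minimal sketch, not part of the add-on: it applies the REGEX and FORMAT values from the hydra_worker_log_fields and pool_name_field_extraction stanzas using Python's re module, which only approximates the regex engine that Splunk software uses. The helper name apply_transform and the sample log line are hypothetical and invented for illustration; substitute a real line from your DCN worker log.

```python
import re

# REGEX and FORMAT copied from the [hydra_worker_log_fields] stanza.
WORKER_REGEX = (
    r"^\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d,\d\d\d (\w+) "
    r"\[([\w_]+):\/\/([^:]+):(\d+)\] (\[[^\]]+\])?\s?(.+)$"
)
WORKER_FORMAT = "level::$1 input::$2 worker::$3 pid::$4 component::$5 message::$6"

# REGEX copied from the [pool_name_field_extraction] stanza (MV_ADD = true).
POOL_REGEX = r"\[pool=([^\]]*)\]"


def apply_transform(regex, fmt, line):
    """Hypothetical helper: map capture groups to names for a FORMAT of name::$n pairs."""
    match = re.search(regex, line)
    if match is None:
        return {}
    fields = {}
    for pair in fmt.split():
        name, ref = pair.split("::")
        value = match.group(int(ref.lstrip("$")))
        if value is not None:  # skip optional groups that did not participate
            fields[name] = value
    return fields


# Hypothetical log line, made up for this example only.
SAMPLE = (
    "2024-01-15 10:23:45,123 INFO [ta_vmware_collection_worker://alpha:4321] "
    "[getJob] [pool=Global pool] Successfully completed job"
)

worker_fields = apply_transform(WORKER_REGEX, WORKER_FORMAT, SAMPLE)
# With MV_ADD = true every match is kept, so collect all of them here.
pools = re.findall(POOL_REGEX, SAMPLE)

print(worker_fields)
print(pools)
```

If apply_transform returns an empty dictionary for a real log line, the line does not have the layout the stanza expects, and the corresponding dashboard searches are unlikely to populate from it.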
Hydra framework status searches
Use the search queries of the Hydra Framework Status dashboard to identify issues related to jobs handled by DCNs. To enable data population for these search queries, make sure you have added the search-time extractions to the package in etc/apps and made them globally available.
Query name | Search query | Description |
---|---|---|
Job Expiration and Failure Count Over Pool | | Number of jobs that expired or failed for a particular pool. DCN (Worker) logs are required to populate this panel. |
Job Expirations by DCN | | Number of jobs assigned to and expired on each DCN over time. DCN (Worker) logs are required to populate this panel. |
Jobs Handled by DCN | | Number of jobs successfully completed by each DCN over time. DCN (Worker) logs are required to populate this panel. |
Job Scheduling Duration Range (DEBUG level logs only) | | Average, maximum, and minimum time the Scheduler takes to assign jobs to DCNs at each iteration, over time. This panel populates only when DEBUG level logging is enabled on your scheduler. Scheduler logs are required to populate this panel. |
Collection Task Duration Range (Log Scale) | | Minimum, median, and maximum execution time for all tasks. DCN (Worker) logs are required to populate this panel. |
Median Task Performance Over Targets | | Median job execution time by target (vCenter) and task, as reported by the Worker on the DCN. DCN (Worker) logs are required to populate this panel. |
Task Expiration Count Over DCN | | Number of jobs assigned and expired on each DCN, by task. DCN (Worker) logs are required to populate this panel. |
Task Failure Count Over Target | | Number of jobs assigned and failed on each DCN, by task. DCN (Worker) logs are required to populate this panel. |
Last 100 Worker Errors - excluding expiration | | Last 100 errors that occurred in worker processes across all DCNs, excluding errors caused by job expiration. DCN (Worker) logs are required to populate this panel. |
Last 100 Scheduler Errors | | Last 100 errors that occurred in the Scheduler process. Scheduler logs are required to populate this panel. |
Hydra scheduler status
Use the Hydra Scheduler Status page to identify issues related to jobs assigned by your scheduler. To enable data population for these search queries, make sure you have added the search-time extractions to the package in etc/apps and made them globally available.
Some of the following queries require DEBUG level logs from the Scheduler. To enable DEBUG level logging, perform the following steps on the scheduler:
- Go to Settings > Data inputs.
- Select TA-VMware-inframon Collection Scheduler from the inputs.
- Click Global pool from Scheduler Name.
- Add DEBUG as the logging level.
- Click Save.
Query name | Search query | Description |
---|---|---|
Job Assignment by DCN | | Number of jobs assigned to each DCN over time. This panel populates only when DEBUG level logging is enabled on the scheduler. Scheduler logs are required to populate this panel. |
Max Unclaimed Queue Length by DCN | | Number of unclaimed jobs reported by each DCN to the Scheduler over time. This panel populates only when DEBUG level logging is enabled on the scheduler. Scheduler logs are required to populate this panel. |
Dead Nodes | | List of dead nodes (DCNs) and their count at 5-minute intervals. Scheduler logs are required to populate this panel. |
Activity Panel | To see logs of all successful and failed operations: To see logs of successful operations: To see logs of failed operations: | Shows the logs of configuration activities, such as adding a DCN or adding a vCenter. It also provides "failure" and "Success" filters to show the logs by operation status. |
This documentation applies to the following versions of Splunk® Supported Add-ons: released