Troubleshoot the Splunk Add-on for VMware Metrics

If you have problems with the Splunk Add-on for VMware Metrics, perform the following steps.

Gaps in data collection

Gaps in data collection or slow data collection (for example, data arriving only every 20 minutes or less often) sometimes require a restart of the scheduler. Any update to the inframon_ta_vmware_collections.conf, inframon_ta_vmware_pool.conf, or inframon_hydra_node.conf file requires a scheduler restart to take effect. Collection configuration changes made through the UI do not require a scheduler restart.

The Splunk Add-on for VMware cannot make read-only API calls to vCenter Server systems

If the add-on cannot make read-only API calls, you do not have the appropriate vCenter Server service login credentials for each vCenter Server. Obtain vCenter Server service login credentials for each vCenter Server.

The DCNs are forwarding data using index=_internal tests, but Splunk Add-on for VMware Metrics is not collecting any API data

API data collection issues typically have one of two causes:

Network connectivity issues from the Scheduler to the DCNs.
You have not changed the DCN admin account password from its default value.

To resolve this issue:

On the Collection Configuration page of the Splunk Add-on for VMware Metrics, verify that the settings are accurate.
Verify that the admin password for each DCN is not set to changeme.
Verify that each DCN has a fixed IP address. If Splunk Add-on for VMware Metrics uses DCN host names instead of fixed IP addresses, verify that DNS lookups resolve to the correct IP addresses.

Hydra scheduler proxy access error

If you use a proxy server to connect to Splunk Web, you might receive the following proxy error message:

URLError: <urlopen error Tunnel connection failed: 403 Proxy Error>

The error also appears in the following log file:

hydra_inframon_scheduler_ta_vmware_collection_scheduler_Global pool.log

The hydra scheduler reads the Splunk Web proxy settings and tries to connect to a data collection node (DCN) through the proxy server. You cannot install a scheduler if you use a proxy server for Splunk Web.

Fix this problem by deploying and setting up your Splunk Enterprise instance inside the same network as your data collection nodes without the use of a proxy server.

Permission in vSphere

Splunk Add-on for VMware Metrics must use valid vCenter Server service credentials to gain read-only access to vCenter Server systems using API calls. The account's vSphere role determines access privileges.

The following sections list the permissions for the vCenter server roles for all of the VMware versions that Splunk Add-on for VMware Metrics supports.

Permissions to use your own syslog server

As a best practice, use your own syslog server and install a Splunk Enterprise forwarder on that server to forward syslog data. Use these permissions to collect data from the ESXi hosts using your own syslog server. These system-defined privileges are always present for user-defined roles.

System.Anonymous
System.Read
System.View

Permissions to use an intermediate forwarder

Use these permissions if you configure your ESXi hosts to forward syslog data to one or more intermediate Splunk Enterprise forwarders. Use the vSphere client to enable the syslog firewall for the specific hosts. Note that in vSphere 5.x you do not need to add permissions beyond the default ones vSphere provides when creating a role.

System.Anonymous
System.Read
System.View
Host.Config.AdvancedConfig

Troubleshoot issue in cluster performance data collection caused by collection interval mismatch across configured vCenter

Problem

The add-on is unable to get cluster performance data. The following query doesn't return any results:

| mstats avg(_value) WHERE index="vmware-perf-metrics" AND source="VMPerf:ClusterComputeResource" AND metric_name=* BY sourcetype | dedup sourcetype | table sourcetype

Also, you get the following error on the search head in hydra_inframon_worker_ta_vmware_collection_<worker-process>.log:

2020-04-23 16:12:27,883 ERROR [ta_vmware_collection_worker_inframon://worker_process20:19296] Server raised fault: 'A specified parameter was not correct: interval'

Cause

The collection interval is set to different values across the configured vCenters. For example, if the collection interval for VC1 is 5 minutes and the collection interval for VC2 is 3 minutes, the add-on might fetch cluster performance data for only one vCenter at a time.

This is because the add-on script caches the collection interval and uses it when fetching cluster performance data. If a vCenter has a different collection interval than this stored value, the DCN throws an error and isn't able to fetch cluster performance data.
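The mismatch described above can be illustrated with a short Python sketch. It is not the add-on's own code: the function name and the input dictionary are hypothetical, and you would fill in the interval values by hand from each vCenter's statistics settings.

```python
from collections import defaultdict


def find_interval_mismatch(intervals):
    """Group vCenters by collection interval.

    `intervals` maps a vCenter name to its collection interval in seconds.
    Returns {} when every vCenter uses the same interval; otherwise returns
    a dict that maps each distinct interval to the vCenters that use it.
    """
    groups = defaultdict(list)
    for vcenter, seconds in intervals.items():
        groups[seconds].append(vcenter)
    if len(groups) <= 1:
        return {}
    return {seconds: sorted(names) for seconds, names in groups.items()}


# Example with the values from the text: VC1 at 5 minutes, VC2 at 3 minutes.
mismatch = find_interval_mismatch({"VC1": 300, "VC2": 180})
```

A non-empty result means at least one vCenter differs from the others, which is the condition that leads to the 'A specified parameter was not correct: interval' fault.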

Resolution

Connect to the web client https://<vcenter-server-ip/hostname>.
Select vCenter Server.
Select Configure > General > Statistics.
Click Edit.
Update the collection interval so that it has the same value across all configured vCenters.
Save the configuration.

Issue in data collection of the vCenter Server configured in both Splunk Add-on for VMware and Splunk Add-on for VMware Metrics on the same scheduler machine

Issue

The Splunk Add-on for VMware does not collect data for a vCenter Server when the same vCenter Server is configured in both the Splunk Add-on for VMware and the Splunk Add-on for VMware Metrics.

Cause

This issue can occur when the vCenter Server is configured with the same username in both the Splunk Add-on for VMware and the Splunk Add-on for VMware Metrics, both add-ons use the same scheduler machine, and the vCenter Server credentials are not present in the namespace of the Splunk Add-on for VMware.

To check the namespace of the credentials, follow these steps:

On the scheduler machine containing the Splunk Add-on for VMware, go to https://<scheduler-ip>:<management-port>/servicesNS/nobody/Splunk_TA_vmware/storage/passwords.
Search for the credentials of the configured vCenter Server. The entry name has the form "<vcenter-FQDN>:<vcenter-username>:".
Verify that the eai:acl > app parameter of that entry is set to Splunk_TA_vmware.
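The check in the steps above can also be scripted. The following Python sketch works on an already parsed response from the storage/passwords endpoint and makes two assumptions that you should confirm on your instance: the response was requested with output_mode=json, and in that JSON form each entry carries its ACL under an "acl" key with an "app" value. The function name is hypothetical.

```python
def credential_apps(passwords_response, vcenter_fqdn, username):
    """Return the owning app of every credential entry for one vCenter user.

    `passwords_response` is the parsed JSON body (a dict) of the
    storage/passwords endpoint. Entry names are matched against the
    "<vcenter-FQDN>:<vcenter-username>:" form described above.
    """
    target = f"{vcenter_fqdn}:{username}:"
    return [
        entry.get("acl", {}).get("app")
        for entry in passwords_response.get("entry", [])
        if entry.get("name") == target
    ]


# Hypothetical sample response, trimmed to the keys this sketch reads.
sample = {
    "entry": [
        {"name": "vc1.example.com:svc_splunk:", "acl": {"app": "Splunk_TA_vmware"}},
        {"name": "vc2.example.com:svc_splunk:", "acl": {"app": "some_other_app"}},
    ]
}
apps = credential_apps(sample, "vc1.example.com", "svc_splunk")
```

If the returned list does not contain Splunk_TA_vmware, the credentials are not in the namespace of the Splunk Add-on for VMware.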

Resolution

Because vCenter Server credentials are shared globally, Splunk does not create a separate credential stanza in each TA for a vCenter Server that uses the same username. Instead, it only updates the existing credentials after they are created in either TA.

There are two approaches to resolve this issue:

Option 1:

Create a new user in the vCenter Server.
Delete the vCenter Server first from the Collection Configuration page of the Splunk Add-on for VMware, and then from the Collection Configuration page of Splunk Add-on for VMware Metrics.
Configure the same vCenter Server with a different user in each add-on.
Because the vCenter Server is configured with a different username in each add-on, each add-on uses its own username to collect data.

Option 2:

Delete the vCenter Server first from the Collection Configuration page of the Splunk Add-on for VMware, and then from the Collection Configuration page of Splunk Add-on for VMware Metrics.
Configure the vCenter server first in the Splunk Add-on for VMware, and then configure the vCenter server in the Splunk Add-on for VMware Metrics.
Verify that the credentials are present in the namespace of the Splunk Add-on for VMware.

Invalid value shown in "Last connected time" field for each configured vCenter and DCN after upgrading to Splunk Add-on for VMware Metrics

Issue

After upgrading to the Splunk Add-on for VMware Metrics version 4.0.x from Splunk Add-on for VMware version 3.4.7, the "Last connected time" field for the configured vCenters and DCNs shows date "Thu Jan 01 1970 05:30:00 GMT+0530 (India Standard Time)".

Cause

The "Last connected time" field was not present for vCenters and DCNs in the Splunk Add-on for VMware. After you upgrade to Splunk Add-on for VMware Metrics version 4.0.x from Splunk Add-on for VMware version 3.4.7, the add-on expects the field to exist in the configuration file.

The invalid value shown in the UI does not affect data collection or job scheduling.
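The date shown in the UI is consistent with a Unix timestamp of 0 rendered in India Standard Time, which fits a missing field being treated as zero. That reading is an inference from the displayed value, not something the add-on documents. The following Python sketch reproduces the displayed string; the function name and format string are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# India Standard Time is UTC+05:30.
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def render_epoch(seconds, tz=IST):
    """Format a Unix timestamp in the style of the value shown in the UI."""
    moment = datetime.fromtimestamp(seconds, tz=tz)
    return moment.strftime("%a %b %d %Y %H:%M:%S GMT%z")


# A timestamp of 0 (the Unix epoch) rendered in IST.
epoch_in_ist = render_epoch(0)
```

In other time zones the same zero timestamp appears as a different local time on or around January 1, 1970.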

Resolution

To update the "Last connected time" field for the vCenter Server or the DCN, follow these steps:

Following these steps for the vCenter/DCN entity triggers the scheduler to restart.

Go to the Collection Configuration page of the Splunk Add-on for VMware Metrics.
Select the vCenter/DCN entity for which you want to update the "Last connected time" field.
Click Action in the row of the entity and click Edit.
Configure the vCenter/DCN entity again by entering the correct password.
Check the Last connected time field for the entity.

No data collection when DCN is configured with more than 8 worker processes

When there are more than 8 worker processes configured, the scheduler throws the following error continuously and data is not collected:

2020-09-30 15:06:50,550 ERROR [ta_vmware_collection_scheduler_inframon://Global pool] [HydraWorkerNode] [establishGateway] could not connect to gateway=https://<DCN>:8008 for node=https://<DCN>:8089 due to a socket error, timeout, or other fundamental communication issue, marking node as dead

Cause

The data collection scheduler (DCS) and the DCN communicate with each other through the hydra gateway server. When the add-on is configured with more than 8 worker processes, the hydra gateway server takes longer to respond to requests. As a result, the scheduler can't authenticate with the hydra gateway server, so no jobs are assigned to DCNs and no data is collected.

Resolution

On the scheduler machine, go to the Collection Configuration page and edit the configured DCNs to set the worker process count to 8 or fewer. As per the standard guidelines, if you need more worker processes, configure additional DCN machines.
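To see which DCNs the scheduler has marked as dead, you can scan the scheduler log for the error shown above. The following Python sketch assumes log lines in exactly that format; the function name and regular expression are hypothetical and not part of the add-on.

```python
import re

# Matches the gateway and node URLs in the scheduler error message.
DEAD_NODE = re.compile(
    r"could not connect to gateway=(?P<gateway>\S+) for node=(?P<node>\S+) due to"
)


def dead_nodes(lines):
    """Return the sorted, de-duplicated node URLs that were marked as dead."""
    found = set()
    for line in lines:
        match = DEAD_NODE.search(line)
        if match and "marking node as dead" in line:
            found.add(match.group("node"))
    return sorted(found)


# The sample line from this topic, with the DCN placeholder left as-is.
sample_line = (
    "2020-09-30 15:06:50,550 ERROR [ta_vmware_collection_scheduler_inframon://Global pool] "
    "[HydraWorkerNode] [establishGateway] could not connect to gateway=https://<DCN>:8008 "
    "for node=https://<DCN>:8089 due to a socket error, timeout, or other fundamental "
    "communication issue, marking node as dead"
)
nodes = dead_nodes([sample_line, sample_line])
```

Pass the function an open log file or any iterable of lines. Each node it returns is a candidate for reducing the worker process count.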

"The search you ran returned a number of fields that exceeded the current indexed field extraction limit" warning in Job Inspect

Issue

You get the following warning in the search job inspector when you run a search query for VMware inventory data:

The search you ran returned a number of fields that exceeded the current indexed field extraction limit. To ensure that all fields are extracted for search, set limits.conf: [kv] / indexed_kv_limit to a number that is higher than the number of fields contained in the files that you index.

Cause

The add-on contains the following index-time extraction settings to extract fields from JSON events for inventory and task-event data:

INDEXED_EXTRACTIONS = JSON
KV_MODE = none

In Splunk, the indexed_kv_limit parameter has a default value of 200 in limits.conf. With this default, a maximum of 200 key-value pairs can be extracted for a search. If an event contains more fields than this limit, the warning appears in the UI and in the job inspector.

Resolution

On the same instance where this warning is shown, follow these steps to resolve the warning:

Create a limits.conf file in $SPLUNK_HOME/etc/system/local.
Add the following stanza, replacing <number that is higher than the number of fields> with a number that is higher than the number of fields in your events:
```
[kv]
indexed_kv_limit = <number that is higher than the number of fields>
```
Restart your Splunk instance.
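To choose a value for indexed_kv_limit, it can help to estimate how many fields a large inventory event contains. The following Python sketch counts the distinct flattened field paths in a JSON event, writing nested keys as a.b and array elements as a{}. It is a rough estimate only: the function names are hypothetical, and the way Splunk itself counts indexed fields might differ.

```python
import json


def field_paths(value, prefix=""):
    """Return the set of distinct flattened field paths in a parsed JSON value."""
    if isinstance(value, dict):
        paths = set()
        for key, child in value.items():
            paths |= field_paths(child, f"{prefix}.{key}" if prefix else key)
        return paths
    if isinstance(value, list):
        paths = set()
        for child in value:
            # All elements of an array share one path, marked with {}.
            paths |= field_paths(child, prefix + "{}")
        return paths
    return {prefix}


def estimate_field_count(raw_event):
    """Estimate the number of fields in one raw JSON event string."""
    return len(field_paths(json.loads(raw_event)))


# Hypothetical event: paths are a, b.c, b.d{}, e{}.f, and e{}.g.
sample_event = '{"a": 1, "b": {"c": 2, "d": [1, 2]}, "e": [{"f": 1}, {"f": 2, "g": 3}]}'
count = estimate_field_count(sample_event)
```

Run the estimate against your largest events and set indexed_kv_limit comfortably above the highest count.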
