Troubleshoot the Splunk Add-on for VMware

Data collection issues

Gaps in data collection

Gaps in data collection or slow data collection (for example, data arriving only every 20 minutes or longer) sometimes require a restart of the scheduler. Any update to ta_vmware_collections.conf requires a restart of the scheduler to take effect. Collection configuration changes made through the UI do not require a restart of the scheduler.

vCenter connectivity issues

The Splunk Add-on for VMware cannot make read-only API calls to vCenter Server systems

If the add-on cannot make read-only API calls, it does not have the appropriate vCenter Server service login credentials for each vCenter Server. Obtain vCenter Server service login credentials for each vCenter Server.

The DCNs are forwarding data using index=_internal tests, but Splunk Add-on for VMware is not collecting any API data

API data collection issues are typically caused by one of two issues:

Network connectivity issues from the Scheduler to the DCNs.
You have not changed the DCN admin account password from its default value.

To resolve this issue:

On the Collection Configuration page of the Splunk Add-on for VMware, verify that the settings are accurate.
Verify that the admin password for each DCN is not set to changeme.
Verify that each DCN has a fixed IP address. If Splunk Add-on for VMware uses DCN host names instead of fixed IP addresses, verify that DNS lookups resolve to the correct IP addresses.
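
The DNS verification in the last step can be scripted. The following is a minimal sketch, not part of the add-on: the function name find_dns_mismatches, the host names, and the IP addresses are all hypothetical. It compares what each DCN host name resolves to against the address you expect.

```python
import socket
from typing import Callable, Dict, List


def find_dns_mismatches(
    expected: Dict[str, str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return one message per host name that fails to resolve or
    resolves to an unexpected IPv4 address."""
    problems: List[str] = []
    for host, expected_ip in expected.items():
        try:
            actual_ip = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append(f"{host}: lookup failed ({exc})")
            continue
        if actual_ip != expected_ip:
            problems.append(f"{host}: resolves to {actual_ip}, expected {expected_ip}")
    return problems


# Hypothetical DCN host names and addresses, for illustration only.
expected_dcns = {"dcn1.example.com": "10.0.0.5", "dcn2.example.com": "10.0.0.6"}

# A stand-in resolver so the example runs without touching real DNS.
stub_records = {"dcn1.example.com": "10.0.0.5", "dcn2.example.com": "10.0.0.99"}
for problem in find_dns_mismatches(expected_dcns, resolve=stub_records.__getitem__):
    print(problem)
```

To check real lookups, run the function on the scheduler machine and omit the resolve argument so that it falls back to socket.gethostbyname.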

Hydra scheduler proxy access error

If you use a proxy server to connect to Splunk Web, you might receive the following proxy error message:

URLError: <urlopen error Tunnel connection failed: 403 Proxy Error>

The error also appears in the following log file:

hydra_scheduler_ta_<ip address>_scheduler_nidhogg.log

The hydra scheduler reads the Splunk Web proxy settings and tries to connect to a data collection node (DCN) through the proxy server. You cannot install a scheduler if you use a proxy server for Splunk Web.

Fix this problem by deploying and setting up your Splunk Enterprise instance inside the same network as your data collection nodes without the use of a proxy server.

Permissions in vSphere

Splunk Add-on for VMware must use valid vCenter Server service credentials to gain read-only access to vCenter Server systems using API calls. The account's vSphere role determines access privileges.

The following sections list the permissions for the vCenter server roles for all of the VMware versions that Splunk App for VMware supports.

Permissions to use your own syslog server

Best practice dictates that you use your own syslog server, and that you install a Splunk Enterprise forwarder on the server to forward syslog data. Use these permissions to collect data from the ESXi hosts using your own syslog server. These system-defined privileges are always present for user-defined roles.

Permission
System.Anonymous
System.Read
System.View

Permissions to use an intermediate forwarder

Use these permissions if you configure your ESXi hosts to forward syslog data to one or more intermediate Splunk Enterprise forwarders. Use the vSphere client to enable the syslog firewall for the specific hosts. Note that in vSphere 5.x you do not need to add permissions beyond the default ones vSphere provides when creating a role.

Permission
System.Anonymous
System.Read
System.View
Host.Config.AdvancedConfig

Splunk add-on for VMware sets SSL for WebUI as Default

Disable WebUI SSL in the Splunk Add-on for VMware to prevent web.conf from overriding your deployment's SSL settings. Navigate to $SPLUNK_HOME/etc/system/local/ and make the following change to web.conf:

[settings]
enableSplunkWebSSL = false

Inventory data fields are not getting extracted using spath command

Issue

The Splunk Add-on for VMware collects the VMware infrastructure inventory data. Inventory data can contain JSON content that exceeds the default spath command character limit of 5000 characters.

Resolution

If you're using the spath command to extract inventory data and the event contains more than 5000 characters, see Update the default character count limitations for the search commands.
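
To see whether a given inventory event is affected, compare its length against the limit. The sketch below is illustrative only: the function name exceeds_spath_limit and the sample event are hypothetical, and the limit of 5000 is the default stated in this topic.

```python
import json

SPATH_DEFAULT_LIMIT = 5000  # default spath character limit stated in this topic


def exceeds_spath_limit(raw_event: str, limit: int = SPATH_DEFAULT_LIMIT) -> bool:
    """Return True when the raw event text is longer than the spath limit."""
    return len(raw_event) > limit


# Hypothetical inventory event: a JSON document with many virtual machine entries.
sample_event = json.dumps({"vms": [{"moid": f"vm-{i}", "name": f"host-{i}"} for i in range(500)]})
print(len(sample_event), exceeds_spath_limit(sample_event))
```

An event that the function flags is longer than the default limit, so spath extraction on it is likely to be incomplete until the limit is raised.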

Troubleshoot issue in cluster performance data collection caused by collection interval mismatch across configured vCenter

Problem

The add-on is unable to get cluster performance data. The following query doesn't return any results:

index="vmware-perf" source="VMPerf:ClusterComputeResource" | dedup sourcetype | table sourcetype

Also, you get the following error on the search head in hydra_worker_ta_vmware_collection_worker_*.log:

2020-04-23 16:12:27,883 ERROR [ta_vmware_collection_worker://worker_process20:19296] Server raised fault: 'A specified parameter was not correct: interval'

Cause

The collection interval is set to different values across the configured vCenters. For example, if the VC1 collection interval is 5 minutes and the VC2 collection interval is 3 minutes, the add-on might fetch cluster performance data for only one vCenter at a time.

This is because the add-on script caches the collection interval and uses it when fetching cluster performance data. If a vCenter has a different collection interval than this stored value, the DCN throws an error and isn't able to fetch cluster performance data.
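
The mismatch condition can be expressed as a small check. This is a minimal sketch under the assumption that you have already noted the collection interval of each vCenter by hand. The function names and the vCenter labels are hypothetical, and the code does not query vCenter.

```python
from collections import defaultdict
from typing import Dict, List


def intervals_match(intervals: Dict[str, int]) -> bool:
    """Return True when every vCenter uses the same collection interval."""
    return len(set(intervals.values())) <= 1


def group_by_interval(intervals: Dict[str, int]) -> Dict[int, List[str]]:
    """Group vCenter names by their collection interval, in minutes."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for vcenter, minutes in intervals.items():
        groups[minutes].append(vcenter)
    return dict(groups)


# Values from the example above: VC1 at 5 minutes, VC2 at 3 minutes.
configured = {"VC1": 5, "VC2": 3}
if not intervals_match(configured):
    for minutes, vcenters in group_by_interval(configured).items():
        print(f"{minutes} min: {', '.join(vcenters)}")
```

More than one group in the output means that at least one vCenter differs from the cached interval and can trigger the error.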

Solution

Work around this error by setting the collection interval to the same value for all vCenters:

Connect to the web client at https://<vcenter server ip/hostname>.
Select vCenter Server.
Select Configure > General > Statistics.
Click Edit.
Update the collection interval so that it has the same value across your configured vCenters.
Save the configuration.

Virtual machine performance data is missing

Problem

Unable to get virtual machine performance data. This query doesn't return any results:

index="vmware-perf" source="VMPerf:VirtualMachine" | dedup sourcetype | table sourcetype

On the scheduler machine, you also see the following error message in splunkd.log:

03-30-2020 13:34:04.693 +0100 ERROR ExecProcessor - message from "python /opt/splunk/etc/apps/Splunk_TA_vmware/bin/ta_vmware_hierarchy_agent.py" splunk.AuthorizationFailed: [HTTP 403] Client is not authorized to perform requested action; https://127.0.0.1:8089/servicesNS/nobody/Splunk_TA_vmware/storage/passwords/

Cause

The admin user has been renamed, and Splunk no longer has a user named "admin".

To collect virtual machine performance data, the ta_vmware_hierarchy_agent.py scripted input prepares the list of virtual machine MOIDs. If this list isn't created and shared with the data collection node (DCN), the DCN isn't able to collect performance data for those virtual machines.

This scripted input uses the "passAuth" parameter to get the sessionKey for authentication. Its default value is admin, which means that a user named "admin" is required for authentication.

Check $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/default/inputs.conf:

[script://$SPLUNK_HOME/etc/apps/Splunk_TA_vmware/bin/ta_vmware_hierarchy_agent.py]
passAuth = admin

Resolution

Use one of the following two resolutions:

On the scheduler machine, create a new user with the name "admin" and assign the "admin" and splunk_vmware_admin roles to that user.
Change the passAuth attribute value to the name of an existing user on the scheduler machine:

Add the passAuth = splunk-system-user parameter value to the following stanza in $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/local/inputs.conf:

[script://$SPLUNK_HOME/etc/apps/Splunk_TA_vmware/bin/ta_vmware_hierarchy_agent.py]
passAuth = splunk-system-user

Restart Splunk.
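
To confirm which user the scripted input authenticates as, you can read the passAuth value out of an inputs.conf file. The following is a rough sketch that uses Python's standard configparser module. The function name read_pass_auth is hypothetical, and the sketch assumes the file contains only simple key = value settings, which is not guaranteed for every Splunk configuration file.

```python
import configparser
from typing import Optional

# Stanza name as shown in this topic.
STANZA = "script://$SPLUNK_HOME/etc/apps/Splunk_TA_vmware/bin/ta_vmware_hierarchy_agent.py"


def read_pass_auth(conf_text: str) -> Optional[str]:
    """Return the passAuth value of the hierarchy agent stanza, or None if it is not set."""
    # interpolation=None keeps '$' and '%' literal; strict=False tolerates repeated stanzas.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names case-sensitive (passAuth, not passauth)
    parser.read_string(conf_text)
    if not parser.has_section(STANZA):
        return None
    return parser.get(STANZA, "passAuth", fallback=None)


# Sample text that mirrors the local/inputs.conf override described above.
sample_conf = f"[{STANZA}]\npassAuth = splunk-system-user\n"
print(read_pass_auth(sample_conf))
```

In practice you would pass in the contents of local/inputs.conf and fall back to default/inputs.conf when the function returns None.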

No data collection when DCN is configured with more than 8 worker processes

Problem

When more than 8 worker processes are configured, the scheduler throws the following error and no data is collected:

2020-09-30 15:06:50,550 ERROR [ta_vmware_collection_scheduler_inframon://Global pool] [HydraWorkerNode] [establishGateway] could not connect to gateway=https://<DCN>:8008 for node=https://<DCN>:8089 due to a socket error, timeout, or other fundamental communication issue, marking node as dead

Cause

The data collection scheduler (DCS) and the DCN communicate with each other through the hydra gateway server. When the add-on is configured with more than 8 worker processes, the hydra gateway server takes longer to respond to requests. As a result, the scheduler can't authenticate with the hydra gateway server, so no jobs are assigned to DCNs and no data is collected.

Resolution

On the scheduler machine, go to the Collection Configuration page and edit the configured DCNs to set the worker process count to 8 or fewer. If you need more worker processes, configure additional DCN machines. See Prepare to deploy the DCN for the standard guidelines.
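
If you manage many DCNs, a short check can flag the ones that need editing. This is a minimal sketch: the function name dcns_over_limit and the DCN names and counts are hypothetical, and you would enter the worker counts yourself from the Collection Configuration page.

```python
from typing import Dict, List

MAX_WORKER_PROCESSES = 8  # upper bound given in this topic


def dcns_over_limit(worker_counts: Dict[str, int], limit: int = MAX_WORKER_PROCESSES) -> List[str]:
    """Return the names of DCNs configured with more worker processes than the limit."""
    return sorted(name for name, count in worker_counts.items() if count > limit)


# Hypothetical DCNs and worker counts, for illustration only.
configured_dcns = {"dcn1": 8, "dcn2": 12, "dcn3": 4}
print(dcns_over_limit(configured_dcns))
```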

Error for unexpected keyword argument 'rewrite' on Scheduler

Problem

When Splunkd is restarted, the DCNs stop collecting data and the scheduler for the Splunk Add-on for VMware throws the following error:

2020-09-21 19:25:01,199 ERROR [ta_vmware_collection_scheduler://puff] Problem with hydra scheduler ta_vmware_collection_scheduler://puff:
 checkvCenterConnectivity() got an unexpected keyword argument 'rewrite'
 Traceback (most recent call last):
 File "/opt/splunk/etc/apps/SA-Hydra/bin/hydra/hydra_scheduler.py", line 2126, in run
 self.checkvCenterConnectivity(rewrite=True)
 TypeError: checkvCenterConnectivity() got an unexpected keyword argument 'rewrite'

Cause

In the add-on, the "checkvCenterConnectivity" function is defined to check the connectivity of the configured vCenter server every 30 minutes.

Because this function is defined in the Splunk_TA_vmware package and is called from the SA-Hydra scheduler module, it requires a supported SA-Hydra version installed with the Splunk_TA_vmware package on the scheduler instance.

Resolution

Upgrade SA-Hydra or Splunk_TA_vmware to versions that are compatible with each other. Also, make sure the scheduler, DCN, search head, and indexer have the same add-on version.

Here's the version compatibility matrix for Splunk_TA_vmware and supported SA-Hydra:

Splunk_TA_vmware version	SA-Hydra version
3.4.4	4.0.8
3.4.5	4.0.9
3.4.6	4.1.0
3.4.7	4.1.1
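
The matrix can also be encoded as a lookup to check an installed pair of versions. The following is a minimal sketch: the function names are hypothetical, and the mapping contains only the four rows listed above, so any other version is reported as unknown rather than guessed.

```python
from typing import Optional

# Splunk_TA_vmware version -> supported SA-Hydra version, copied from the matrix above.
COMPATIBLE_SA_HYDRA = {
    "3.4.4": "4.0.8",
    "3.4.5": "4.0.9",
    "3.4.6": "4.1.0",
    "3.4.7": "4.1.1",
}


def expected_sa_hydra(ta_version: str) -> Optional[str]:
    """Return the SA-Hydra version listed for a Splunk_TA_vmware version, or None if unlisted."""
    return COMPATIBLE_SA_HYDRA.get(ta_version)


def is_compatible(ta_version: str, sa_hydra_version: str) -> bool:
    """Return True only when the pair appears in the matrix."""
    expected = expected_sa_hydra(ta_version)
    return expected is not None and expected == sa_hydra_version


print(is_compatible("3.4.6", "4.1.0"))
print(is_compatible("3.4.6", "4.0.9"))
```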
