Troubleshoot the Splunk Add-on for VMware
Data collection issues
Gaps in data collection
Gaps in data collection or slow data collection (for example, data arriving only every 20 minutes or more) sometimes require a restart of your scheduler. Any update made directly to ta_vmware_collections.conf requires a restart of the scheduler to take effect. Collection configuration changes made through the UI do not require a restart of the scheduler.
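If you edited ta_vmware_collections.conf manually, one way to restart the scheduler is to restart the Splunk Enterprise instance that runs it. This is a sketch that assumes a default installation path:
$SPLUNK_HOME/bin/splunk restart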
vCenter connectivity issues
The Splunk Add-on for VMware cannot make read-only API calls to vCenter Server systems
If the add-on cannot make read-only API calls, you do not have the appropriate vCenter Server service login credentials. Obtain vCenter Server service login credentials for each vCenter Server.
The DCNs are forwarding data using index=_internal tests, but Splunk Add-on for VMware is not collecting any API data
API data collection issues are typically caused by one of two issues:
- Network connectivity issues from the Scheduler to the DCNs.
- You have not changed the DCN admin account password from its default value.
To resolve this issue:
- On the Splunk Add-on for VMware Collection Configuration page, verify that the collection settings are accurate.
- Verify that the admin password for each DCN is not set to changeme.
- Verify that each DCN has a fixed IP address. If the Splunk Add-on for VMware uses DCN host names instead of fixed IP addresses, verify that DNS lookups resolve to the correct IP addresses (see the checks after this list).
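Two quick checks can help confirm the points above. Both examples assume a hypothetical DCN host name, dcn01.example.com; substitute your own. First, verify DNS resolution from the scheduler machine:
nslookup dcn01.example.com
Then, run a search on the search head to confirm that the DCN is forwarding its internal logs:
index=_internal host=dcn01.example.com | stats count by sourcetype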
Hydra scheduler proxy access error
If you attempt to use a proxy server to connect to Splunk Web and receive the following proxy error message:
URLError: <urlopen error Tunnel connection failed: 403 Proxy Error>
This error also appears in the following log file:
hydra_scheduler_ta_<ip address>_scheduler_nidhogg.log
The hydra scheduler checks Splunk Web's proxy settings and tries to connect to a data collection node (DCN) through the proxy server. You cannot install a scheduler if you use a proxy server for Splunk Web.
Fix this problem by deploying and setting up your Splunk Enterprise instance inside the same network as your data collection nodes, without using a proxy server.
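To check whether the environment the scheduler runs in has a proxy configured, one quick check (a sketch, assuming a Linux host) is to inspect the proxy-related environment variables before starting Splunk:
env | grep -i proxy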
Permissions in vSphere
Splunk Add-on for VMware must use valid vCenter Server service credentials to gain read-only access to vCenter Server systems using API calls. The account's vSphere role determines access privileges.
The following sections list the permissions for the vCenter Server roles for all of the VMware versions that the Splunk Add-on for VMware supports.
Permissions to use your own syslog server
Best practice dictates that you use your own syslog server and that you install a Splunk Enterprise forwarder on that server to forward syslog data. Use these permissions to collect data from the ESXi hosts using your own syslog server. These system-defined privileges are always present for user-defined roles.
| Permission |
| --- |
| System.Anonymous |
| System.Read |
| System.View |
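As an illustration, a forwarder monitor stanza on your syslog server might look like the following sketch. The log path, index, and sourcetype shown here are assumptions; adjust them to match where your syslog server writes ESXi logs and how your deployment names its indexes.
[monitor:///var/log/vmware-esxi]
sourcetype = vmw-syslog
index = vmware-esxilog
disabled = false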
Permissions to use an intermediate forwarder
Use these permissions if you configure your ESXi hosts to forward syslog data to one or more intermediate Splunk Enterprise forwarders. Use the vSphere client to enable the syslog firewall for the specific hosts. Note that in vSphere 5.x you do not need to add permissions beyond the default ones vSphere provides when creating a role.
| Permission |
| --- |
| System.Anonymous |
| System.Read |
| System.View |
| Host.Config.AdvancedConfig |
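As an alternative to the vSphere client, you can point an ESXi host at an intermediate forwarder and open the syslog firewall ruleset from the ESXi command line. This is a sketch; the forwarder host name and port are assumptions:
esxcli system syslog config set --loghost='udp://forwarder.example.com:514'
esxcli system syslog reload
esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true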
Splunk Add-on for VMware sets SSL for WebUI as default
Disable WebUI SSL in the Splunk Add-on for VMware to prevent web.conf from overriding your deployment's SSL settings.
Navigate to $SPLUNK_HOME/etc/system/local/ and make the following change to web.conf:
[settings]
enableSplunkWebSSL = false
Inventory data fields are not getting extracted using spath command
Issue
The Splunk Add-on for VMware collects the VMware infrastructure inventory data. Inventory data can contain JSON content that exceeds the default spath command character limit of 5000 characters.
Resolution
If you're using the spath command to extract inventory data and the event contains more than 5000 characters, see Update the default character count limitations for the search commands.
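The spath character limit is controlled by the extraction_cutoff setting in the [spath] stanza of limits.conf. As a sketch, assuming you want to raise the limit to 10000 characters, add the following to $SPLUNK_HOME/etc/system/local/limits.conf on your search head (the value is an example; size it to your largest inventory events):
[spath]
extraction_cutoff = 10000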
Troubleshoot cluster performance data collection issues caused by a collection interval mismatch across configured vCenters
Problem
The add-on is unable to get cluster performance data. The following query doesn't return any results:
index="vmware-perf" source="VMPerf:ClusterComputeResource" | dedup sourcetype | table sourcetype
Also, you get the following error on the search head in hydra_worker_ta_vmware_collection_worker_*.log:
2020-04-23 16:12:27,883 ERROR [ta_vmware_collection_worker://worker_process20:19296] Server raised fault: 'A specified parameter was not correct: interval'
Cause
The collection interval is set to different values across the configured vCenters. For example, if the VC1 collection interval is 5 minutes and the VC2 collection interval is 3 minutes, the add-on might fetch cluster performance data for only one vCenter at a time.
This is because the add-on script caches the collection interval and uses it when fetching cluster performance data. If a vCenter has a different collection interval than this stored value, the DCN throws an error and isn't able to fetch cluster performance data.
Solution
Work around this error by setting the collection interval to the same value for all vCenters:
- Connect to the web client at https://<vcenter server ip/hostname>.
- Select vCenter Server.
- Select Configure > General > Statistics.
- Click Edit.
- Update the collection interval to the same value across your configured vCenters.
- Save the configuration.
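After the intervals match, you can confirm that cluster performance data is arriving from every vCenter with a search like the following (the index and source match the query above; adjust them if your deployment differs):
index="vmware-perf" source="VMPerf:ClusterComputeResource" | stats count by host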
Virtual machine performance data is missing
Problem
Unable to get virtual machine performance data. This query doesn't return any results:
index="vmware-perf" source="VMPerf:VirtualMachine" | dedup sourcetype | table sourcetype
And on the Scheduler machine, you see the following error message in splunkd.log:
03-30-2020 13:34:04.693 +0100 ERROR ExecProcessor - message from "python /opt/splunk/etc/apps/Splunk_TA_vmware/bin/ta_vmware_hierarchy_agent.py" splunk.AuthorizationFailed: [HTTP 403] Client is not authorized to perform requested action; https://127.0.0.1:8089/servicesNS/nobody/Splunk_TA_vmware/storage/passwords/
Cause
The admin user has been renamed, so the Splunk platform no longer has a user named "admin".
To collect virtual machine performance data, the ta_vmware_hierarchy_agent.py scripted input prepares the list of virtual machine moids. If this list isn't created and shared with the data collection node (DCN), the DCN can't collect performance data for those virtual machines.
This scripted input uses the "passAuth" parameter to get a sessionKey for authentication. Its value is admin, which means the 'admin' user is required to perform the authentication.
Check $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/default/inputs.conf:
[script://$SPLUNK_HOME/etc/apps/Splunk_TA_vmware/bin/ta_vmware_hierarchy_agent.py]
passAuth = admin
Resolution
There are two resolutions for this issue:
- On the scheduler machine, create a new user named "admin" and assign the "admin" and splunk_vmware_admin roles to that user (see the sketch after this list).
- Change the passAuth attribute value to the existing user name on the scheduler machine. Add the passAuth = splunk-system-user parameter value to the following stanza in $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/local/inputs.conf, then restart Splunk:
[script://$SPLUNK_HOME/etc/apps/Splunk_TA_vmware/bin/ta_vmware_hierarchy_agent.py]
passAuth = splunk-system-user
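For the first resolution, one way to create the user is with the Splunk CLI on the scheduler machine. This is a sketch; replace the placeholder password with your own, and adjust the flags if your Splunk version uses a different syntax:
$SPLUNK_HOME/bin/splunk add user admin -password <your_password> -role admin -role splunk_vmware_admin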
No data collection when DCN is configured with more than 8 worker processes
Problem
When there are more than 8 worker processes configured, the scheduler throws the following error and data is not collected.
2020-09-30 15:06:50,550 ERROR [ta_vmware_collection_scheduler_inframon://Global pool] [HydraWorkerNode] [establishGateway] could not connect to gateway=https://<DCN>:8008 for node=https://<DCN>:8089 due to a socket error, timeout, or other fundamental communication issue, marking node as dead
Cause
The DCS and DCN communicate with each other through the hydra gateway server. When the add-on is configured with more than 8 worker processes, the hydra gateway server takes longer to respond to requests. As a result, the scheduler can't authenticate with the hydra gateway server, so no jobs are assigned to the DCNs and no data is collected.
Resolution
On the scheduler machine, go to the Collection Configuration page and edit the configured DCNs to update the worker process count to 8 or fewer. If more worker processes are required, configure additional DCN machines. See Prepare to deploy the DCN for the standard guidelines.
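To confirm that a scheduler is hitting this condition, you can search its internal logs for the gateway error. This sketch assumes the hydra scheduler logs on the scheduler machine are indexed into _internal:
index=_internal source=*hydra_scheduler* "marking node as dead"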
Error for unexpected keyword argument 'rewrite' on Scheduler
Problem
When Splunkd is restarted, the DCNs stop collecting data and the scheduler for the Splunk Add-on for VMware throws the following error:
2020-09-21 19:25:01,199 ERROR [ta_vmware_collection_scheduler://puff] Problem with hydra scheduler ta_vmware_collection_scheduler://puff: checkvCenterConnectivity() got an unexpected keyword argument 'rewrite'
Traceback (most recent call last):
  File "/opt/splunk/etc/apps/SA-Hydra/bin/hydra/hydra_scheduler.py", line 2126, in run
    self.checkvCenterConnectivity(rewrite=True)
TypeError: checkvCenterConnectivity() got an unexpected keyword argument 'rewrite'
Cause
In the add-on, the "checkvCenterConnectivity" function is defined to check the connectivity of the configured vCenter server every 30 minutes.
Because this function is defined in the Splunk_TA_vmware package and is called from the SA-Hydra scheduler module, it requires a supported SA-Hydra version installed with the Splunk_TA_vmware package on the scheduler instance.
Resolution
Upgrade SA-Hydra or Splunk_TA_vmware to versions that are compatible with each other. Also, make sure the scheduler, DCN, search head, and indexer have the same add-on version.
Here's the version compatibility matrix for Splunk_TA_vmware and supported SA-Hydra:
| Splunk_TA_vmware version | SA-Hydra version |
| --- | --- |
| 3.4.4 | 4.0.8 |
| 3.4.5 | 4.0.9 |
| 3.4.6 | 4.1.0 |
| 3.4.7 | 4.1.1 |
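To check which versions are installed on each instance (scheduler, DCN, search head, and indexer), you can read the version attribute from each app's app.conf. This is a sketch assuming default installation paths:
grep "^version" $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/default/app.conf $SPLUNK_HOME/etc/apps/SA-Hydra/default/app.conf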