Splunk® Supported Add-ons

Splunk Add-on for VMware

Troubleshoot the Splunk Add-on for VMware

Data collection issues

Gaps in data collection

Gaps in data collection or slow data collection (for example, data arriving only every 20 minutes or more) sometimes requires a restart of your scheduler. Any update to ta_vmware_collections.conf requires a restart of the scheduler to take effect. Collection configuration changes made through the UI do not require a restart of the scheduler.

vCenter connectivity issues

The Splunk Add-on for VMware cannot make read-only API calls to vCenter Server systems

If the add-on cannot make read-only API calls, you do not have the appropriate vCenter Server service login credentials. Obtain vCenter Server service login credentials for each vCenter Server.

The Splunk Add-on for VMware is not receiving data

If you have configured vCenter Server 5.0 or 5.1 but no data is coming in, the likely cause is that vCenter Server 5.0 and 5.1 are missing two WSDL files that the Splunk Add-on for VMware requires to make API calls to vCenter Server:

  • reflect-message.xsd
  • reflect-types.xsd

Resolve this issue by installing the missing VMware WSDL files as documented in the vSphere Web Services SDK WSDL workaround in the VMware documentation. Note that the ProgramData folder is typically hidden.

The DCNs are forwarding data according to index=_internal tests, but the Splunk App for VMware is not collecting any API data

API data collection issues are typically caused by one of two issues:

  • Network connectivity issues from the Scheduler to the DCNs.
  • You have not changed the DCN admin account password from its default value.

To resolve this issue:

  1. On the Splunk Add-on for VMware Collection Configuration page, verify that the collection settings are accurate.
  2. Verify that the admin password for each DCN is not set to changeme.
  3. Verify that each DCN has a fixed IP address. If the Splunk App for VMware uses DCN host names instead of fixed IP addresses, verify that DNS lookups resolve to the correct IP addresses.
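The DNS check above can be scripted. The following is a minimal sketch, assuming you keep a list of DCN host names and the fixed IP addresses the scheduler expects; the host name and IP shown are placeholders for your own entries.

```python
import socket

def resolves_to(hostname, expected_ip):
    """Return True if DNS resolves hostname to the IP the scheduler expects.

    Hypothetical helper; "dcn01.example.com" below is a placeholder for
    one of your own DCN host names.
    """
    try:
        return socket.gethostbyname(hostname) == expected_ip
    except socket.gaierror:
        # Name does not resolve at all -- also a failure for this check.
        return False

# Example (placeholder values):
# resolves_to("dcn01.example.com", "10.1.2.3")
```

Run a check like this from the scheduler machine, since that is where the lookups matter.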

Hydra scheduler proxy access error

If you use a proxy server to connect to Splunk Web, you might receive the following proxy error message:

URLError: <urlopen error Tunnel connection failed: 403 Proxy Error>

This error also appears in the following log file:

hydra_scheduler_ta_<ip address>_scheduler_nidhogg.log

The hydra scheduler checks Splunk Web's proxy settings, and is trying to connect to a data collection node (DCN) through the proxy server. You cannot install a scheduler if you use a proxy server for Splunk Web.

Fix this problem by deploying and setting up your Splunk Enterprise instance inside the same network as your data collection nodes without the use of a proxy server.

Permissions in vSphere

Splunk Add-on for VMware must use valid vCenter Server service credentials to gain read-only access to vCenter Server systems using API calls. The account's vSphere role determines access privileges.

The following sections list the permissions for the vCenter server roles for all of the VMware versions that Splunk App for VMware supports.

Permissions to use your own syslog server

Best practice dictates that you use your own syslog server, and that you install a Splunk Enterprise forwarder on the server to forward syslog data. Use these permissions to collect data from the ESXi hosts using your own syslog server. These system-defined privileges are always present for user-defined roles.

Permission
  • System.Anonymous
  • System.Read
  • System.View

Permissions to use an intermediate forwarder

Use these permissions if you configure your ESXi hosts to forward syslog data to one or more intermediate Splunk Enterprise forwarders. Use the vSphere client to enable the syslog firewall for the specific hosts. Note that in vSphere 5.x you do not need to add permissions beyond the default ones vSphere provides when creating a role.

Permission
  • System.Anonymous
  • System.Read
  • System.View
  • Host.Config.AdvancedConfig

Splunk Add-on for VMware sets SSL for Splunk Web as the default

Disable Splunk Web SSL in the Splunk Add-on for VMware to prevent web.conf from overriding your deployment's SSL settings. Navigate to $SPLUNK_HOME/etc/system/local/ and make the following change to web.conf:

[settings]
enableSplunkWebSSL = false

Esxilog issue

Problem

You are not getting ESXi logs while forwarding them to indexers that are in a cluster.

Or on indexers, you see the following ERROR message in splunkd.log:

ERROR AggregatorMiningProcessor - Uncaught Exception in Aggregator, skipping an event: 
Can't open DateParser XML configuration file 
"/opt/splunk/etc/apps/Splunk_TA_esxilogs/default/syslog_datetime.xml": No such file or
directory - data_source="/data/log_files/syslog/<hostname>.log", data_host="<hostname>",
data_sourcetype="vmw-syslog"

Cause

When ESXi logs are forwarded directly to clustered indexers, splunkd.log on the indexers shows the above error.

Reason: Splunk cannot find the custom datetime file (syslog_datetime.xml) that is used to extract dates and timestamps from events.

The following parameter is set for this in props.conf.

DATETIME_CONFIG = /etc/apps/Splunk_TA_esxilogs/default/syslog_datetime.xml

Because the indexers are in a cluster, Splunk_TA_esxilogs is installed on them under slave-apps (/etc/slave-apps/), so the path above does not exist.

Resolution

  1. On the cluster master, create a local directory in the $SPLUNK_HOME/etc/master-apps/Splunk_TA_esxilogs directory, if it is not present.
  2. If it is not present, create a props.conf file in the $SPLUNK_HOME/etc/master-apps/Splunk_TA_esxilogs/local directory and add the following stanza and configuration to it:
    [vmw-syslog]
    DATETIME_CONFIG = /etc/slave-apps/Splunk_TA_esxilogs/default/syslog_datetime.xml
    
  3. Push the bundle to the indexers.

Inventory data fields are not getting extracted using spath command

Issue

The Splunk Add-on for VMware collects the VMware infrastructure inventory data. Inventory data can contain JSON content that exceeds the default spath command character limit of 5000 characters.

Resolution

If you're using the spath command to extract inventory data and the event contains more than 5000 characters, see Update the default character count limitations for the search commands.
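The relevant setting is extraction_cutoff in the [spath] stanza of limits.conf. As a sketch, assuming you want to raise the cutoff to 10,000 characters (pick a value that fits your inventory events), add the following to $SPLUNK_HOME/etc/system/local/limits.conf on your search heads and restart Splunk:

```ini
[spath]
# Raise the character limit for spath extractions (default is 5000).
extraction_cutoff = 10000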

Enable cluster DRS service error: lookup table "TimeClusterServicesAvailability" is empty on some dashboards

Problem

If you see the error Lookup table "TimeClusterServicesAvailability" is empty on the following cluster compute resource related dashboards, use these troubleshooting steps to enable the cluster DRS service:

  • Capacity Planning for Clusters-CPU Headroom
  • Capacity Planning for Clusters-Memory Headroom
  • Capacity Planning (Clusters)
  • Cluster details

If you do not want to enable cluster DRS service, ignore the error.

Cause

The add-on is not able to get the following required metrics, so the TimeClusterServicesAvailability lookup is empty:

  • p_average_clusterServices_effectivecpu_megaHertz
  • p_average_clusterServices_effectivemem_megaBytes

Resolution

Enable the cluster DRS service on the configured vCenter to get the required metrics:

  1. Log in to the configured vCenter using the vSphere client.
  2. Navigate to Home > Inventory > Hosts and Clusters.
  3. Right-click the cluster and open its settings.
  4. Go to Cluster features and click Turn on vSphere DRS.

Troubleshoot the error "ValueError: unsupported pickle protocol: 3" in hydra worker logs

Problem

The Splunk Add-on for VMware is unable to run the hydra worker script, and you see the following logs in the hydra worker:

ERROR [ta_vmware_collection_worker://worker_process2:28696] Problem with hydra worker ta_vmware_collection_worker://worker_process2:28696: unsupported pickle protocol: 3
Traceback (most recent call last):
File "/home/splunker/splunk/etc/apps/SA-Hydra/bin/hydra/hydra_worker.py", line 622, in run
  self.establishMetadata()
File "/home/splunker/splunk/etc/apps/SA-Hydra/bin/hydra/hydra_worker.py", line 64, in establishMetadata
  metadata_stanza = HydraMetadataStanza.from_name("metadata", self.app, "nobody")
File "/home/splunker/splunk/etc/apps/SA-Hydra/bin/hydra/models.py", line 610, in from_name
  host_path=host_path)
File "/home/splunker/splunk/lib/python2.7/site-packages/splunk/models/base.py", line 533, in get
  return self._from_entity(entity)
File "/home/splunker/splunk/etc/apps/SA-Hydra/bin/hydra/models.py", line 345, in _from_entity
  obj.from_entity(entity)
File "/home/splunker/splunk/lib/python2.7/site-packages/splunk/models/base.py", line 903, in from_entity
  super(SplunkAppObjModel, self).from_entity(entity)
File "/home/splunker/splunk/lib/python2.7/site-packages/splunk/models/base.py", line 661, in from_entity
  return self.set_entity_fields(entity)
File "/home/splunker/splunk/etc/apps/SA-Hydra/bin/hydra/models.py", line 544, in set_entity_fields
  from_api_val = wildcard_field.field_class.from_apidata(entity, entity_attr)
File "/home/splunker/splunk/etc/apps/SA-Hydra/bin/hydra/models.py", line 123, in from_apidata
  obj = cPickle.loads(b64decode(val))
ValueError: unsupported pickle protocol: 3

Cause

The add-on is unable to deserialize a Python object that was serialized by a different Python version than the one the add-on is currently running on. This usually happens when an add-on that was running on Python 3 is later run on Python 2: Python 2 cannot deserialize objects that Python 3 serialized with pickle protocol 3.
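The version conflict is visible in the pickle stream itself: streams written with protocol 2 or later start with the PROTO opcode (0x80) followed by the protocol number, and an interpreter that does not implement that protocol refuses to load the stream. A small illustration (not add-on code):

```python
import pickle

def pickle_protocol_byte(obj, protocol):
    """Return the protocol number embedded in a pickle stream's header."""
    blob = pickle.dumps(obj, protocol=protocol)
    assert blob[0] == 0x80  # PROTO opcode, present for protocol >= 2
    return blob[1]         # the protocol number the loader must support

# Protocol 3 streams (the add-on's cached state, written under Python 3)
# carry a 3 here, which Python 2's cPickle rejects with
# "unsupported pickle protocol: 3".
```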

Resolution

  1. Stop the Scheduler from the Collection Configuration page.
  2. Stop Splunk on the DCN.
  3. On the DCN, go to $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/local and remove the following files:
    1. ta_vmware_cache.conf
    2. hydra_session.conf
    3. hydra_metadata.conf
  4. Start Splunk on the DCN.
  5. Start the Scheduler from the Collection Configuration page.
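Step 3 can be scripted if you have many DCNs. A minimal sketch, assuming the standard $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/local path (pass your own path in):

```python
from pathlib import Path

# The three cached hydra state files named in step 3 above.
HYDRA_STATE_FILES = (
    "ta_vmware_cache.conf",
    "hydra_session.conf",
    "hydra_metadata.conf",
)

def purge_hydra_state(local_dir):
    """Remove the cached hydra state files from a DCN's local directory.

    local_dir is typically $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/local.
    Returns the names of the files that were actually removed.
    """
    removed = []
    for name in HYDRA_STATE_FILES:
        f = Path(local_dir) / name
        if f.exists():
            f.unlink()
            removed.append(name)
    return removed
```

Run it only while Splunk is stopped on the DCN, as the steps above require.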

Troubleshoot the error "ImportError: bad magic number in 'uuid': b'\x03\xf3\r\n'" in hydra scheduler logs

Problem

The Splunk Add-on for VMware is unable to run the hydra scheduler script, and you see the following logs in the hydra scheduler, so no jobs are assigned:

Traceback (most recent call last):
 File "ta_vmware_collection_scheduler.py", line 20, in <module>
   from hydra.hydra_scheduler import HydraScheduler, HydraCollectionManifest, HydraConfigToken
 File "/opt/splunk/etc/apps/SA-Hydra/bin/hydra/hydra_scheduler.py", line 11, in <module>
   import uuid
ImportError: bad magic number in 'uuid': b'\x03\xf3\r\n'

Cause

This error is caused by a uuid.pyc file that was compiled on Splunk 7.2.x, Splunk 7.3.x, or Splunk 8.x running Python 2, and is now being run on Splunk 8.x with Python 3.

Resolution

  1. Stop the Scheduler from the Collection Configuration page.
  2. Stop Splunk on the scheduler and DCN machines.
  3. Remove all .pyc files in the following directories on the scheduler and all DCN machines:
    1. $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/bin
    2. $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/bin/vim25
    3. $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/bin/ta_vmware
    4. $SPLUNK_HOME/etc/apps/SA-Hydra/bin/hydra
  4. Start Splunk on the DCN machines.
  5. Start Splunk on the scheduler machine.
  6. Start the Scheduler from the Collection Configuration page.
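A .pyc file records the compiling interpreter in its first four bytes, so stale files can be found before deleting anything. A sketch (not part of the add-on) that flags .pyc files under a directory that were not compiled by the running interpreter:

```python
import importlib.util
from pathlib import Path

# b'\x03\xf3\r\n' in the traceback above is the Python 2.7 .pyc magic
# number; a Python 3 interpreter rejects it with "bad magic number".
def stale_pyc_files(root):
    """List .pyc files under root whose 4-byte magic header does not
    match the running interpreter's magic number."""
    magic = importlib.util.MAGIC_NUMBER  # first 4 bytes of a valid .pyc
    return sorted(p for p in Path(root).rglob("*.pyc")
                  if p.read_bytes()[:4] != magic)
```

Point it at the four directories in step 3 to confirm which files need removing.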

Troubleshoot issue in cluster performance data collection caused by collection interval mismatch across configured vCenter

Problem

The add-on is unable to get cluster performance data. The following query doesn't return any results:

index="vmware-perf" source="VMPerf:ClusterComputeResource" | dedup sourcetype | table sourcetype

Also, you get the following error on the search head in hydra_worker_ta_vmware_collection_worker_*.log:

2020-04-23 16:12:27,883 ERROR [ta_vmware_collection_worker://worker_process20:19296] Server raised fault: 'A specified parameter was not correct: interval' 

Cause

The collection interval is set to different values across the configured vCenters. For example, if the VC1 collection interval is 5 minutes and the VC2 interval is 3 minutes, the add-on might fetch cluster performance data for only one vCenter at a time.

This is because the add-on script caches the collection interval and uses it when fetching cluster performance data. If a vCenter has a different collection interval than this stored value, the DCN throws an error and isn't able to fetch cluster performance data.

Solution

Work around this error by setting the collection interval to the same value for all vCenters:

  1. Connect to the web client at https://<vcenter server ip/hostname>.
  2. Select vCenter Server.
  3. Select Configure > General > Statistics.
  4. Click Edit.
  5. Update the collection interval to the same value across your configured vCenters.
  6. Save the configuration.

Virtual machine performance data is missing

Problem

Unable to get virtual machine performance data. This query doesn't return any results:


index="vmware-perf" source="VMPerf:VirtualMachine" | dedup sourcetype | table sourcetype

And on the Scheduler machine, you see the following error message in splunkd.log:

03-30-2020 13:34:04.693 +0100 ERROR ExecProcessor - message from "python /opt/splunk/etc/apps/Splunk_TA_vmware/bin/ta_vmware_hierarchy_agent.py" splunk.AuthorizationFailed: [HTTP 403] Client is not authorized to perform requested action; https://127.0.0.1:8089/servicesNS/nobody/Splunk_TA_vmware/storage/passwords/

Cause

The admin user has been renamed, and Splunk no longer has a user named "admin".

To collect virtual machine performance data, the ta_vmware_hierarchy_agent.py scripted input prepares the list of virtual machine moids. If this list isn't created and shared with the data collection node (DCN), the DCN can't collect performance data for those virtual machines.

For this scripted input, the "passAuth" parameter is used to get a sessionKey for authentication. Its value is admin, which means the 'admin' user is required to perform the authentication.

Check $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/default/inputs.conf:

[script://$SPLUNK_HOME/etc/apps/Splunk_TA_vmware/bin/ta_vmware_hierarchy_agent.py]
passAuth = admin

Resolution

There are two resolutions for this issue:

  • On the scheduler machine, create a new user named "admin" and assign the "admin" and splunk_vmware_admin roles to it.
  • Change the passAuth attribute value to an existing user name on the scheduler machine:
  1. Add the passAuth = splunk-system-user parameter to the following stanza in $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/local/inputs.conf:

    [script://$SPLUNK_HOME/etc/apps/Splunk_TA_vmware/bin/ta_vmware_hierarchy_agent.py]
    passAuth = splunk-system-user

  2. Restart Splunk.

No data collection when DCN is configured with more than 8 worker processes on Splunk version 8.x

Problem

When more than 8 worker processes are configured, the scheduler throws the following error and data is not collected:

2020-09-30 15:06:50,550 ERROR [ta_vmware_collection_scheduler_inframon://Global pool] [HydraWorkerNode] [establishGateway] could not connect to gateway=https://<DCN>:8008 for node=https://<DCN>:8089 due to a socket error, timeout, or other fundamental communication issue, marking node as dead

Cause

In the VMware add-on, the scheduler and the DCN communicate with each other through the hydra gateway server. When the add-on is installed on Splunk version 8.x and more than 8 worker processes are configured for the DCNs, the hydra gateway server takes longer to respond to requests. The scheduler isn't able to authenticate with the hydra gateway server, and no jobs are assigned to the DCNs.

Resolution

On the scheduler machine, go to the Collection Configuration page and edit the configured DCNs to update the worker process count to 8 or fewer. If more worker processes are required, configure new DCN machines. See Prepare to deploy the DCN for the standard guidelines.

Error for unexpected keyword argument 'rewrite' on Scheduler

Problem

When Splunkd is restarted, the DCNs stop collecting data and the scheduler for the Splunk Add-on for VMware throws the following error:

2020-09-21 19:25:01,199 ERROR [ta_vmware_collection_scheduler://puff] Problem with hydra scheduler ta_vmware_collection_scheduler://puff:
 checkvCenterConnectivity() got an unexpected keyword argument 'rewrite'
 Traceback (most recent call last):
 File "/opt/splunk/etc/apps/SA-Hydra/bin/hydra/hydra_scheduler.py", line 2126, in run
 self.checkvCenterConnectivity(rewrite=True)
 TypeError: checkvCenterConnectivity() got an unexpected keyword argument 'rewrite'

Cause

In the add-on, the "checkvCenterConnectivity" function is defined to check the connectivity of the configured vCenter server every 30 minutes.

Because this function is defined in the Splunk_TA_vmware package and is called from the SA-Hydra scheduler module, it requires a supported SA-Hydra version installed with the Splunk_TA_vmware package on the scheduler instance.

Resolution

Upgrade SA-Hydra or Splunk_TA_vmware to versions that are compatible with each other. Also make sure the scheduler, DCN, search head, and indexer have the same add-on version.

Here's the version compatibility matrix for Splunk_TA_vmware and supported SA-Hydra:

Splunk_TA_vmware version    SA-Hydra version
3.4.4                       4.0.8
3.4.5                       4.0.9
3.4.6                       4.1.0
3.4.7                       4.1.1
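For a quick sanity check across your scheduler, DCN, search head, and indexer hosts, the matrix above can be expressed as a lookup. A sketch (the function name is illustrative, not part of the add-on):

```python
# Compatibility matrix from the table above.
COMPATIBLE_SA_HYDRA = {
    "3.4.4": "4.0.8",
    "3.4.5": "4.0.9",
    "3.4.6": "4.1.0",
    "3.4.7": "4.1.1",
}

def versions_compatible(ta_version, hydra_version):
    """Return True if the installed Splunk_TA_vmware / SA-Hydra pair
    appears in the compatibility matrix."""
    return COMPATIBLE_SA_HYDRA.get(ta_version) == hydra_version
```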
Last modified on 21 July, 2021