Troubleshoot Splunk App for VMware

Review the release notes to determine if the trouble you are experiencing is a known issue.
Consider enabling the troubleshooting logs on data collection nodes to facilitate root cause investigation. See Enable troubleshooting logs.
Review the following problems for advice on how to resolve them.

Can't edit worker inputs on the remote node, connect to splunkd, or retrieve EAI_new descriptor

You see one or more of the following symptoms:

The "Problem editing the number of worker inputs on the remote node" error appears in splunk_for_vmware_setup.log.
The "could connect to splunkd but failed to auth check username and password" error appears in hydra_scheduler_ta_vmware_collection_scheduler_puff.log.
The "unable to retrieve the EAI _new descriptor for entity: configs/conf-hydra_metadata" error appears in splunkd.log.
On the scheduler, you cannot update or validate the configurations on the collection configuration page.

Cause

The splunk_vmware_admin role assigned to the Splunk user on the scheduler and the DCN does not include the admin_all_objects capability.

Resolution

Perform the following steps on the scheduler and the DCN:

Assign the splunk_vmware_admin role to the Splunk user that configures the collection configurations and to the Splunk user on the DCN that is used to configure the DCN.
Add the admin_all_objects capability to the splunk_vmware_admin role.

Splunk App for VMware doesn't make read-only API calls to vCenter Server systems

Cause

You do not have the appropriate vCenter Server service login credentials for each vCenter Server.

Resolution

Obtain vCenter Server service login credentials for each vCenter server. See vSphere requirements.

vpxd.stats.maxQueryMetrics error prevents data collection from vCenters

Cause

As of version 6.0, VMware vCenter limits the number of performance metrics that a single query can return. The vpxd.stats.maxQueryMetrics setting controls this limit. The vCenter 6.0 maxQuerySize limit is 64 metrics per query. The query size is calculated by multiplying the number of metrics queried by the number of entities (virtual machines) queried. For example, querying 8 entities (virtual machines) for 10 metrics each equals a query size of 80, which exceeds the limit of 64.
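The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation: the function names are hypothetical, and the sketch assumes that a query exactly at the limit is still accepted.

```python
def query_size(num_entities: int, metrics_per_entity: int) -> int:
    """Query size as described above: entities multiplied by metrics per entity."""
    return num_entities * metrics_per_entity


def exceeds_limit(num_entities: int, metrics_per_entity: int, limit: int = 64) -> bool:
    """True when the query is larger than the limit (assumption: at-limit queries pass)."""
    return query_size(num_entities, metrics_per_entity) > limit


print(query_size(8, 10))     # 80
print(exceeds_limit(8, 10))  # True
print(exceeds_limit(8, 8))   # False
```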

Resolution

To adjust the maxQuerySize limit:

Navigate to the advanced settings of vCenter Server, or vCenter Server Appliance.
Edit the config.vpxd.stats.maxQueryMetrics key.
Edit the web.xml file.

See the VMware documentation for more information.

You have configured vCenter Server 5.0 in Splunk App for VMware, but no data is coming in

Cause

vCenter Server 5.0 and 5.1 are missing the following WSDL files, which Splunk App for VMware requires to make API calls to vCenter Server:

reflect-message.xsd
reflect-types.xsd

Resolution

Install the missing VMware WSDL files as documented in the vSphere Web Services SDK WSDL workaround in the VMware documentation. Note that the ProgramData folder is typically hidden.

The individual tests in the deployment section work, but the Splunk App for VMware main dashboard is empty

Cause

This is typically caused by one of two issues:

Time synchronization issues between the indexer, DCN, and vCenter Server
Incorrect permissions assignments in Splunk Enterprise

Resolution

Check for time gaps between the indexer, DCN, and vCenter Server. See Validate vCenter Servers time synchronization settings for details. To adjust or disable the Network Time Protocol (NTP) on your DCN, see Configure additional settings for a DCN.

Make sure that you have assigned the correct roles to users of Splunk Enterprise.

User	Role
admin user	splunk_vmware_admin
all users of the Splunk App for VMware	splunk_vmware_user

The DCNs are forwarding data using index=_internal tests, but the app is not collecting any API data

Cause

This is typically caused by one of two issues:

Network connectivity issues from the Scheduler to the DCNs.
You have not changed the DCN admin account password from its default value.

Resolution

In the Splunk for VMware App Settings page, verify the accuracy of the settings in the collection page.
Verify that the admin password for each DCN is not set to changeme.
Verify that each DCN has a fixed IP address. If Splunk App for VMware uses DCN host names instead of fixed IP addresses, verify that DNS lookups resolve to the correct IP addresses.

Splunk App for VMware works for 60 days, then stops

Cause

The 60-day trial license for Splunk App for VMware has expired.

Resolution

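Configure the DCN to join your Splunk Enterprise license pool. For example, run the following command on the DCN, replacing myhost with the host name of your license master: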

splunk edit licenser-localslave -master_uri https://myhost:8089

Splunk App for VMware seems to be collecting only partial data. Hosts are missing, and so on

Cause

There are insufficient DCNs to handle the data volume coming from the ESXi environment.

Resolution

From the Settings page in Splunk App for VMware, review the list of hosts for each vCenter Server environment.

Verify that each DCN polls information for up to 40 ESXi hosts and 1,000 virtual machines (30/750 is recommended), based on the specifications for a 4 core DCN (the one configured with the OVA for VMware). Based on this sizing, a site that pulls information from 200 hypervisors and 5,000 VMs needs at least 5 DCNs.

Verify that the number of worker processes is one fewer than the number of CPU cores that the vCenter Server system granted to the DCN. For example, if the DCN has four CPU cores, the number of worker processes is three.

In the Splunk App for VMware configuration pane for the DCN, make sure that the number of worker processes is one fewer than the number of CPU cores assigned to the machine.
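The sizing guidance above can be sketched as a small calculation. The function names are hypothetical, and the floor of one worker process is an assumption added so that a single-core machine does not end up with zero workers.

```python
import math


def dcns_needed(hosts: int, vms: int, hosts_per_dcn: int = 40, vms_per_dcn: int = 1000) -> int:
    """Smallest DCN count that keeps every DCN within both per-node limits."""
    return max(math.ceil(hosts / hosts_per_dcn), math.ceil(vms / vms_per_dcn), 1)


def worker_processes(cpu_cores: int) -> int:
    """One fewer worker process than CPU cores (assumed floor of one)."""
    return max(cpu_cores - 1, 1)


print(dcns_needed(200, 5000))           # 5 at the 40/1,000 maximum
print(dcns_needed(200, 5000, 30, 750))  # 7 at the recommended 30/750
print(worker_processes(4))              # 3
```

Sizing against the recommended 30/750 figures leaves headroom on each DCN, at the cost of two additional nodes in this example.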

The DCNs are not delivering data to the Splunk Enterprise indexers

Cause

If the DCNs are configured correctly, this problem is typically the result of a connectivity issue.

Resolution

Make sure that the DCN has an IP address and can resolve DNS.
Verify that no firewalls are preventing communication between the DCN and port 9997 on the indexers, and that the Scheduler can connect to ports 8089 and 8008 on each DCN. To confirm that data from a DCN reaches the indexers, on the search head, search index=_internal host=DCN-hostname-here.
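One way to check the ports listed above is a plain TCP connection test. The following is a minimal sketch using only the Python standard library. The host names in the comments are placeholders, and a successful connection shows only that the port is reachable, not that Splunk is healthy behind it.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host: str, ports: list[int]) -> dict[int, bool]:
    """Map each port to whether it accepted a connection."""
    return {port: port_open(host, port) for port in ports}


# Run from the DCN (placeholder host name):
#   check_ports("indexer.example.com", [9997])
# Run from the Scheduler (placeholder host name):
#   check_ports("dcn.example.com", [8089, 8008])
```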

Events error in "Host Detail" view when using Splunk Enterprise version 6.0 or later

In the Host Detail view of Splunk Enterprise version 6.0 or later, you see the following warning message:

Events may not be returned in sub-second order due to search memory limits configured in limits.conf:[search]:max_rawsize_perchunk. See search.log for more information.

Or, in search.log of Splunk Enterprise version 6.0 or later, you see the following warning message:

02-06-2014 15:47:04.353 ERROR databasePartitionPolicy - Max Raw Size Limit Exceeded
02-06-2014 15:47:04.467 INFO  UnifiedSearch - Error in 'databasePartitionPolicy': Max Raw Size Limit Exceeded
02-06-2014 15:47:04.467 WARN  CursoredSearch - Events may be returned not in exact sub-second order: M=1368 > N=1250, where M is the number of events read in the 1390841799th second, and N is max number of events to read in a single span. Note that N was scaled back because we exceeded limits.conf:[search]:max_rawsize_perchunk value=100000000

Cause

A bug: non-suppressed error message (SOLNVMW-3587)

Resolution

1. On indexers, add the following configuration in the limits.conf file in the $SPLUNK_HOME/etc/system/local directory:

limits.conf
[search]
max_rawsize_perchunk = 800000000

2. Restart the indexer.

3. Test whether you still see the error message in the Splunk App for VMware Host Detail view.

Note that setting the value of max_rawsize_perchunk = 400000000 suppresses the warning message in the Host Detail view. However, in the search.log file, you will still see the following message:

02-07-2014 14:52:43.008 ERROR databasePartitionPolicy - Max Raw Size Limit Exceeded
02-07-2014 14:52:43.127 INFO  UnifiedSearch - Error in 'databasePartitionPolicy': Max Raw Size Limit Exceeded

To mitigate the appearance of these error messages, set the max_rawsize_perchunk to at least 600000000.

Incomplete or no data coming from vCenters that are configured and connected by a DCN

Data collection tasks fail, or connections between the DCN and vCenter close before all data is transferred. This could be due to one of two issues:

The collection tasks take longer than the vCenter and the app expect.
The current collection intervals overload your Data Collection Nodes (DCNs) and your vCenters.

Resolution

Change collection intervals in order to reduce the load on your Data Collection Nodes (DCNs) and your vCenters

Change the time interval for your host inventory job.

On the instance where your scheduler is running, navigate to \etc\apps\Splunk_TA_vmware\local\.
Open the ta_vmware_collection.conf file.
Change hostinv_interval and hostinv_expiration from the 900 second default to a larger number (maximum 2700 seconds). Keep hostinv_interval and hostinv_expiration at the same number of seconds.
Save your changes and exit.

Change the time interval for host performance data.

On the instance where your scheduler is running, navigate to \etc\apps\Splunk_TA_vmware\local\.
Open the ta_vmware_collection.conf file.
Change hostvmperf_interval and hostvmperf_expiration from the 180 second default to a larger number (maximum 1200 seconds). Keep hostvmperf_interval and hostvmperf_expiration at the same number of seconds.
Save your changes and exit.
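Before saving, it can help to confirm that the new values follow the rules in the steps above: each interval matches its expiration, and neither exceeds its maximum. The following is a minimal sketch, not part of the app. The check_intervals helper is hypothetical, and the [default] stanza name is an assumption about how ta_vmware_collection.conf is laid out, so adjust it to match your file.

```python
import configparser

# Maximum values quoted in the steps above, in seconds.
LIMITS = {"hostinv": 2700, "hostvmperf": 1200}


def check_intervals(conf_text: str, stanza: str = "default") -> list[str]:
    """Return a list of problems found in the interval settings (empty if none)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    problems = []
    for prefix, maximum in LIMITS.items():
        interval = parser.getint(stanza, f"{prefix}_interval")
        expiration = parser.getint(stanza, f"{prefix}_expiration")
        if interval != expiration:
            problems.append(f"{prefix}: interval {interval} != expiration {expiration}")
        if interval > maximum:
            problems.append(f"{prefix}: {interval} exceeds the {maximum} second maximum")
    return problems


# Illustrative values only: the hostvmperf settings break both rules.
sample = """
[default]
hostinv_interval = 1800
hostinv_expiration = 1800
hostvmperf_interval = 1500
hostvmperf_expiration = 600
"""
for problem in check_intervals(sample):
    print(problem)
```

The sketch raises an error if a setting is missing from the stanza, which is usually the desired behavior for a pre-save check.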

Increase the timeout period in the vpxd file on your vCenter.

Using a text editor, open the vpxd.cfg file, located at C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\vpxd.cfg (C:\ProgramData\VMware\VMware VirtualCenter\vpxd.cfg on Windows 2008).
Add the following element inside the <vpxd> tags:

<heartbeat>
<notRespondingTimeout>180</notRespondingTimeout>
</heartbeat>

Restart your VMware VirtualCenter Server service.

VMware App sets SSL for WebUI as Default

Resolution

Disable WebUI SSL in the Splunk App for VMware to prevent web.conf from overriding your deployment's SSL settings.

Navigate to $SPLUNK_HOME/etc/system/local/ and make the following change to web.conf:

[settings]
enableSplunkWebSSL = false

You get the message "orphaned scheduled searches" and the gauges on the homepage show "No Data"

Problem

You receive the message, "Splunk has found 8 orphaned searches owned by 1 unique disabled users. Click to view the orphaned scheduled searches. Reassign them to a valid user to re-enable or alternatively disable the searches."

On the homepage, the gauges show "No Data" and in Proactive Monitoring > Entity Views > Details, dropdowns do not populate.

Cause

The admin user has been renamed and Splunk no longer detects an "admin" user.

In the Splunk App for VMware, the admin user is the owner of scheduled saved searches. If Splunk cannot identify a user named "admin," then these saved searches aren't scheduled and don't run. This results in empty lookups, such as the FullHierarchy lookup.

The affected dashboards get data from the FullHierarchy lookup, which is why the gauges and dropdowns don't populate.

Resolution

There are two ways to resolve this problem:

On the search head machine, you can create a new user with the name "admin" and assign the admin and splunk_vmware_admin roles to this user.

Or, on the search head machine, you can change the owner of the saved searches to "nobody":

Perform the following steps for the splunk_for_vmware app:

Go to $SPLUNK_HOME/etc/apps/splunk_for_vmware/metadata.
Create local.meta if the file is not present.
Create a [savedsearches] stanza in the file if the stanza is not present.
Add the owner = nobody parameter-value pair under that stanza.

[savedsearches]
owner = nobody

Perform the same steps for the "SA-VMW-HierarchyInventory" and "SA-VMNetAppUtils" apps.
Restart Splunk on the search head. Alternatively, open the following URL to reload the metadata file changes:

http://<SH IP/Hostname>:8000/en-US/debug/refresh

Verify the resolution by checking the orphaned saved searches list at the following URL:

http://<SH IP/Hostname>:8000/en-US/app/search/orphaned_scheduled_searches
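The local.meta steps above can also be expressed as a text transformation, which is convenient when the same change applies to several apps. The following is a minimal sketch that works on the file contents as a string. The ensure_owner_nobody function is hypothetical and not part of Splunk, and it assumes the simple stanza layout shown above. Back up local.meta before applying anything like it.

```python
def ensure_owner_nobody(meta_text: str) -> str:
    """Return meta_text with owner = nobody set under the [savedsearches] stanza."""
    out = []
    in_stanza = False    # currently inside [savedsearches]
    seen_stanza = False  # [savedsearches] exists somewhere in the file
    owner_set = False    # owner = nobody has been written
    for line in meta_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [savedsearches] without an owner line: add one first.
            if in_stanza and not owner_set:
                out.append("owner = nobody")
                owner_set = True
            in_stanza = stripped == "[savedsearches]"
            seen_stanza = seen_stanza or in_stanza
        elif in_stanza and stripped.split("=")[0].strip() == "owner":
            # Replace an existing owner assignment.
            line = "owner = nobody"
            owner_set = True
        out.append(line)
    if in_stanza and not owner_set:
        out.append("owner = nobody")
    if not seen_stanza:
        out.extend(["[savedsearches]", "owner = nobody"])
    return "\n".join(out) + "\n"


print(ensure_owner_nobody("[savedsearches]\nowner = admin\n"))
```

Applying the function to its own output leaves the text unchanged, so it is safe to run more than once.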

Some VMware App Dashboards are not getting populated with the VMware Add-on data

Problem

Some panels of the following dashboards are not getting populated even when the VMware add-on data is present:

Capacity Forecasting
Capacity Planning for Clusters - CPU Headroom
Capacity Planning for Clusters - Memory Headroom
Cluster Detail
Host Detail
Virtual Machine Detail
Virtual Machine Snapshots
Proactive Monitoring
Home
Performance of Hosts and VMs
Capacity Planning (Hosts)
Capacity Planning (Cluster)

Cause

The dashboards use the summarized data present in the VMwareInventory and VMwarePerformance data models, for which data model acceleration must be enabled. The dashboards might not get populated if data model acceleration isn't enabled.

Resolution

Enable data model acceleration for the VMwareInventory and VMwarePerformance data models. An admin can enable acceleration or change the acceleration period by performing the following steps on the search head:

On the Splunk menu bar, select Settings > Data models.
Select VMware (splunk_for_vmware) from the App dropdown to see the data models defined and used by the Splunk App for VMware.
From the list of data models, click Edit in the Action column of a data model.
Select Edit Acceleration.
Check the Accelerate checkbox to enable data model acceleration.
Select the summary range to specify the acceleration period. The default summary range is one month.
Click Save.

Dashboard drop downs aren't populating even when the user has the required roles

Problem

The input drop downs for selecting vCenter and entities in dashboards from "Proactive Monitoring" and "Performance and Capacity Planning" aren't populating even when data is available on the search head and the current user has the required VMware roles.

Cause

In the VMware app, saved search queries are source-type based. Indexes aren't defined in the queries.

The VMware app role splunk_vmware_admin makes these indexes searchable when no index is defined in the query. If a saved search owner, such as an admin user who executes the scheduled searches, doesn't have this role, then the saved searches return zero events and lookup results.

The admin user must have the splunk_vmware_admin role for saved searches to return results. In Splunk Cloud, puppet scripts remove this role.

Resolution

Use the Splunk Supporting Add-on for VMware package in which the role dependency has been removed.
