Common issues with Splunk and WMI
This topic discusses common issues encountered when getting WMI-based data into Splunk. It offers solutions for problems such as the following:
- Splunk can't get data from remote machines.
- Splunk can't get local data through WMI.
- Splunk sometimes crashes when getting remote data.
- Splunk connects to WMI differently depending on product version.
Splunk can't get data from remote machines
When Splunk can index events on the local machine, but can't get data from remote machines using WMI, authentication or network connectivity is often the reason. Splunk requires a user account with valid credentials for the Active Directory (AD) domain or forest in which it's installed in order to collect data remotely. It also requires a clear network path to the machine from which it gets data, unblocked by firewalls on either the source or target machines.
Determine that Splunk has been installed as a domain user
The first thing to do is to make sure that Splunk is installed as a domain user. If this requirement isn't met, Splunk won't be able to get data remotely even if the network is functioning.
1. Open a command prompt.
2. Run the sc command to query the Service Control Manager about the splunkd and splunkweb services.
C:\> sc qc splunkd
[SC] QueryServiceConfig SUCCESS

SERVICE_NAME: splunkd
        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 2   AUTO_START
        ERROR_CONTROL      : 1   NORMAL
        BINARY_PATH_NAME   : "C:\Program Files\Splunk\bin\splunkd.exe" service
        LOAD_ORDER_GROUP   :
        TAG                : 0
        DISPLAY_NAME       : Splunkd
        DEPENDENCIES       :
        SERVICE_START_NAME : LocalSystem
The SERVICE_START_NAME field tells you the user that Splunk is configured to run as. If this field shows LocalSystem, then Splunk is not configured to run as a domain user. Uninstall Splunk, then reinstall it and make sure to specify "Other user" during the setup process.
Note: You can also determine which user Splunk is configured to run as by using the Services control panel.
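The check above can also be scripted. The following is a minimal sketch, not part of Splunk, that reads the SERVICE_START_NAME field out of sc qc output. The function names are hypothetical, and the sample text is the output shown earlier in this topic.

```python
import re
from typing import Optional

# Sample output copied from the "sc qc splunkd" example in this topic.
SAMPLE = r"""[SC] QueryServiceConfig SUCCESS

SERVICE_NAME: splunkd
        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 2   AUTO_START
        ERROR_CONTROL      : 1   NORMAL
        BINARY_PATH_NAME   : "C:\Program Files\Splunk\bin\splunkd.exe" service
        LOAD_ORDER_GROUP   :
        TAG                : 0
        DISPLAY_NAME       : Splunkd
        DEPENDENCIES       :
        SERVICE_START_NAME : LocalSystem
"""


def service_start_name(sc_qc_output: str) -> Optional[str]:
    """Return the SERVICE_START_NAME value, or None if the field is missing."""
    match = re.search(r"SERVICE_START_NAME\s*:\s*(.+)", sc_qc_output)
    return match.group(1).strip() if match else None


def runs_as_local_system(sc_qc_output: str) -> bool:
    """True when the service is configured to run as LocalSystem."""
    name = service_start_name(sc_qc_output)
    return name is not None and name.lower() == "localsystem"
```

On a Windows machine, the text to pass in could be captured with subprocess.run(["sc", "qc", "splunkd"], capture_output=True, text=True).stdout. A result of True from runs_as_local_system means that Splunk is not running as a domain user.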
Review the splunkd.log file
If Splunk is correctly configured as a domain user, the next step is to investigate why Splunk is having problems connecting to WMI providers.
Open the %SPLUNK_HOME%\var\log\splunk\splunkd.log file and search for wmi.
When Splunk encounters an error attempting to connect to a WMI provider, it logs errors in splunkd.log as follows:
03-11-2009 10:08:29.296 ERROR ExecProcessor - error from "python E:\Splunk\bin\scripts\splunk-wmi.py" ERROR WMI - Instantiation of IWbemServices::ExecQueryAsync failed (error code 800706be)
03-11-2009 10:08:29.296 ERROR ExecProcessor - error from "python E:\Splunk\bin\scripts\splunk-wmi.py" ERROR WMI - IWbemServices::CancelAsyncCall error (WMI Namespace "\\ADLDBS01\root\cimv2", Param "Application", HRESULT error 80041002)
The following table shows the most common errors encountered when connecting to WMI providers:
Error code | Description |
---|---|
80070005 | Access is denied. (due to an incorrect login) |
80041064 | User credentials cannot be used for local connections. |
800706BA | The RPC server is unavailable. |
80041003 | Access Denied. (due to explicit access restrictions) |
If you see lines within the log file that contain HRESULT error, then Splunk was unable to complete the WMI operation because of a network connectivity or authentication problem. You can use the WBEMTEST utility to corroborate what is shown in Splunk's log file.
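To scan a large splunkd.log for these codes, a short script can help. The following is a rough sketch, not a Splunk tool: it assumes that error codes appear in the three forms seen in this topic ("HRESULT error <code>", "HRESULT=<code>", and "error code <code>"), and the function name is hypothetical. The descriptions come from the table above.

```python
import re

# Descriptions taken from the error table in this topic.
WMI_ERRORS = {
    "80070005": "Access is denied. (due to an incorrect login)",
    "80041064": "User credentials cannot be used for local connections.",
    "800706BA": "The RPC server is unavailable.",
    "80041003": "Access Denied. (due to explicit access restrictions)",
}

UNKNOWN = "not listed in the common-errors table"

# Assumed code formats, based only on the log samples in this topic.
CODE_RE = re.compile(r"(?:HRESULT(?: error |=)|error code )([0-9A-Fa-f]{8})")


def explain_wmi_errors(log_lines):
    """Return (code, description) pairs for WMI-related lines that carry a code."""
    results = []
    for line in log_lines:
        if "WMI" not in line:
            continue  # only look at WMI-related entries
        for code in CODE_RE.findall(line):
            code = code.upper()
            results.append((code, WMI_ERRORS.get(code, UNKNOWN)))
    return results
```

Passing open(path).readlines() for the splunkd.log file returns one pair per code found. Codes that are missing from the table, such as the 800706be and 80041002 codes in the sample log lines, are reported with the placeholder description instead of a guess.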
Enable debug logging
You can get even more detailed information about what is causing the errors by enabling debug logging in Splunk's logging engine.
Note: After you have confirmed the cause of the error, be sure to turn debug logging off.
To enable debugging for WMI-based inputs, you must set two parameters:
1. Edit log.cfg in %SPLUNK_HOME%\etc. Add the following parameter:
[splunkd]
category.ExecProcessor=DEBUG
2. Edit log-cmdline.cfg, also in %SPLUNK_HOME%\etc. Add the following parameter:
category.WMI=DEBUG
Note: You can place this attribute/value pair anywhere in the file, as long as it is on its own line. log-cmdline.cfg does not use stanzas.
3. Restart Splunk:
C:\Program Files\Splunk\bin> splunk restart
4. Once Splunk has restarted, let it run for a few minutes until you see debug log events coming into Splunk.
Note: You can search Splunk's logfiles within Splunk by supplying index="_internal" as part of your search string. Review "What Splunk logs about itself" in the Troubleshooting Manual for additional information.
5. Once Splunk has collected enough debug log data, send a diag to Splunk Support:
C:\Program Files\Splunk\bin> splunk diag
After you finish troubleshooting, revert to the default settings:
1. In log.cfg, change the category.ExecProcessor attribute to its default setting:
[splunkd]
category.ExecProcessor=WARN
Note: You can also remove this entry from the file.
2. In log-cmdline.cfg, change the category.WMI attribute to its default setting:
category.WMI=ERROR
Note: Any changes made to log.cfg are overwritten when you upgrade Splunk. Create a log-local.cfg in %SPLUNK_HOME%\etc to avoid this problem.
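Switching the logging levels on and off by hand is easy to get wrong, so the edits can be scripted. The following is an illustrative sketch only: the set_category helper is hypothetical, and it assumes that log-local.cfg uses the same stanza layout as log.cfg. It works on the file contents as a string, so nothing is written to disk until you choose to save the result.

```python
def set_category(cfg_text, category, level, stanza=None):
    """Return cfg_text with category.<category>=<level> set.

    With a stanza name (for example "splunkd") the line is placed inside
    that stanza, which is created if it is missing. With stanza=None the
    file is treated as stanza-less, as log-cmdline.cfg is.
    """
    key = "category." + category
    new_line = key + "=" + level
    lines = cfg_text.splitlines()
    header = "[" + stanza + "]" if stanza else None
    in_target = stanza is None  # stanza-less files: every line is in scope
    insert_at = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if header is not None:
                in_target = stripped == header
                if in_target:
                    insert_at = i + 1  # remember where the stanza body starts
            continue
        if in_target and stripped.split("=", 1)[0].strip() == key:
            lines[i] = new_line  # replace the existing setting in place
            return "\n".join(lines) + "\n"
    if header is None:
        lines.append(new_line)
    elif insert_at is not None:
        lines.insert(insert_at, new_line)
    else:
        lines += [header, new_line]
    return "\n".join(lines) + "\n"
```

For example, set_category(text, "ExecProcessor", "DEBUG", stanza="splunkd") enables debugging for the first file, and calling it again with "WARN" reverts the setting. For log-cmdline.cfg, omit the stanza argument.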
Use the WBEMTEST utility to reproduce the error outside of Splunk
If you see HRESULT error entries in the splunkd.log, use the WBEMTEST utility to confirm the error outside of Splunk.
1. Log into the Splunk server as the Splunk user.
2. Click Start > Run…
3. In the Run dialog, type in wbemtest and click OK.
4. In the Windows Management Instrumentation Tester window, click the Connect… button.
The Connect window appears.
5. In the Namespace field of the Connect window, type in the namespace of the server that is experiencing errors.
Note: You must type in the full path of the namespace. For example, if the server you are attempting to connect to is called ADLDBS01, you must type in \\ADLDBS01\root\cimv2 (including the backslashes).
6. Click Connect.
Note: You should be able to connect to the server without needing to supply credentials. If you are prompted for credentials, then the Splunk user is not correctly configured to access WMI.
7. Once you are connected to the server, set your WMI connection mode by selecting one of the radio buttons under Method Invocation Options in the lower right corner of the WBEMTEST window:
- For Splunk 3.4.9 and earlier, choose Asynchronous.
- For versions of Splunk after 3.4.9, choose Semisynchronous.
8. Click "Query…"
The Query window appears.
9. In the Query window, type in a valid Windows Query Language (WQL) statement, such as the one supplied below, then click Apply.
Following is a WQL statement that you can test WMI connections with:
SELECT Category, CategoryString, ComputerName, EventCode, EventIdentifier, EventType, Logfile, Message, RecordNumber, SourceName, TimeGenerated, TimeWritten, Type, User FROM Win32_NTLogEvent WHERE Logfile = "Application"
The following graphic shows an example of successful results:
Check Windows Firewall
If Windows Firewall (or any other firewall software) is running on either the source or target machine, Splunk might be blocked from getting data through WMI providers. Make sure that you explicitly allow WMI through on the firewalls on both machines. You can also disable Windows Firewall, but this is not recommended by Splunk or Microsoft.
Additional information about connecting through Windows Firewall can be found at "Connecting Through Windows Firewall", http://msdn.microsoft.com/en-us/library/aa389286(VS.85).aspx on MSDN. If you are trying to extract events from a Windows Vista or Windows Server 2008 computer, review "Connecting to WMI remotely starting with Windows Vista", http://msdn.microsoft.com/en-us/library/aa822854(VS.85).aspx, also on MSDN.
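A quick way to test the network path before changing firewall rules is to try a TCP connection to the target machine. The following is a minimal sketch with a hypothetical function name. It relies on the fact that remote WMI runs over DCOM, which begins with the RPC endpoint mapper on TCP port 135. Because DCOM then moves to dynamically assigned ports, a successful connection here does not prove that WMI is fully allowed through, but a failure is a strong hint that a firewall or routing problem exists.

```python
import socket

# TCP port of the RPC endpoint mapper, where remote DCOM/WMI connections begin.
RPC_ENDPOINT_MAPPER_PORT = 135


def can_reach(host, port=RPC_ENDPOINT_MAPPER_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, can_reach("ADLDBS01") tests the server name used in the examples in this topic. Run the check from the machine where Splunk is installed, because the firewall on either end can block the connection.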
Splunk is unable to get local data through WMI
When Splunk is unable to get data from the local machine through WMI providers, this might be because WMI is experiencing issues under load. When this happens, try restarting the Windows Management Instrumentation (winmgmt) service from within the Services control panel, or by using the sc command-line utility.
Splunk sometimes crashes when collecting data over WMI
WMI can occasionally cause the splunk-wmi.exe process to crash. Splunk will spawn a new process when this happens (you can tell by the changed process ID).
- While there is no guaranteed fix for this issue, you can reduce the number of crashes by reducing the number of servers you are monitoring through WMI with any given Splunk instance. Limit the number of WMI-based inputs per instance to 80 or fewer.
- If you monitor the same subset of WMI providers on large numbers of machines, you can run into WMI memory constraints on the monitoring server. This can also cause crashes. Limit the number of WMI-based data inputs per server monitored through WMI. It's best to reduce the total number of WMI connections per instance to 120 or fewer on 32-bit Windows servers, and 240 or fewer on 64-bit Windows servers.
- Consider using universal forwarders to get your data. You can either install universal forwarders on a few machines and get data from other machines through WMI, or you can put universal forwarders on all remote machines.
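The limits in the list above can be turned into a simple planning check. The following is a sketch only: the function name is hypothetical, and you must supply the input and connection counts yourself, because this topic does not describe how to measure them.

```python
# Limits quoted in this topic.
MAX_INPUTS_PER_INSTANCE = 80
MAX_CONNECTIONS_BY_ARCH = {32: 120, 64: 240}  # keyed by Windows bitness


def check_wmi_load(inputs, connections, arch_bits):
    """Return a list of warnings for a planned WMI monitoring load."""
    if arch_bits not in MAX_CONNECTIONS_BY_ARCH:
        raise ValueError("arch_bits must be 32 or 64")
    warnings = []
    if inputs > MAX_INPUTS_PER_INSTANCE:
        warnings.append(
            "%d WMI-based inputs exceeds the suggested %d per instance"
            % (inputs, MAX_INPUTS_PER_INSTANCE)
        )
    limit = MAX_CONNECTIONS_BY_ARCH[arch_bits]
    if connections > limit:
        warnings.append(
            "%d WMI connections exceeds the suggested %d on %d-bit Windows"
            % (connections, limit, arch_bits)
        )
    return warnings
```

An empty list means that the planned load is within the suggested limits. It does not guarantee that crashes will not occur.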
Splunk connects to WMI differently based on product version
When Splunk makes requests to WMI, it does so in one of three ways: synchronous, asynchronous, and semisynchronous.
Splunk makes what are known as semisynchronous calls to WMI providers. This means that when Splunk makes a call to WMI, it continues running while WMI deals with the request.
Semisynchronous mode offers the best balance of resource usage and security on the computer making the request. It is slower than asynchronous mode, but more secure because of the way that the system handles retrieval of the WMI objects. Both of these modes are faster than synchronous mode, which forces programs making that kind of WMI request to wait until WMI returns the data.
When WMI is dealing with a large number of requests, you might notice a slower response because memory usage on the system increases until the retrieved WMI objects are no longer needed by Splunk (after indexing).
More information about how WMI calls are made is available at "Calling a Method", http://msdn.microsoft.com/en-us/library/aa384832(VS.85).aspx on MSDN.
Note: Versions of Splunk prior to 3.4.10 make asynchronous connections to WMI providers.
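The difference between the three call modes can be illustrated by analogy. The following sketch uses Python's concurrent.futures module, not the WMI API, and all function names are hypothetical. It shows only the calling pattern: a synchronous caller waits, a semisynchronous caller gets a handle back immediately and pulls the results when it needs them, and an asynchronous caller has the results pushed to a callback.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_query(count):
    """Stand-in for a WMI provider that takes a moment to answer."""
    time.sleep(0.05)
    return ["object-%d" % i for i in range(count)]


def synchronous(count):
    # The caller is blocked until the full result comes back.
    return slow_query(count)


def semisynchronous(executor, count):
    # Returns at once with a handle; the caller pulls the result later
    # by calling .result() on the handle.
    return executor.submit(slow_query, count)


def asynchronous(executor, count, on_done):
    # Returns at once; the result is pushed to the on_done callback.
    future = executor.submit(slow_query, count)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future
```

In this analogy, the callback in the asynchronous case is the part that corresponds to WMI delivering objects back into the calling program, while the semisynchronous caller stays in control of when objects are retrieved.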
Manually verify that WMI is working
To test WMI, you can run the splunk-wmi.exe command manually with a desired query and/or namespace to see the output that it produces.
Caution: When running this command, be sure to temporarily change Splunk's data store directory (the location that SPLUNK_DB points to), so that you do not miss any WMI events. To change Splunk's database store, refer to "Test access to WMI providers" in the Getting Data In Manual.
Here is an example of a valid splunk-wmi statement:
C:\Program Files\Splunk\bin> splunk cmd splunk-wmi.exe -wql "select * FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk"
The following output shows a failure to connect to the desired WMI provider:
$ ./splunk cmd splunk-wmi.exe -wql "select * FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk_typo"
ERROR WMI - Error occurred while trying to retrieve results from a WMI query (error="Specified class is not valid." HRESULT=80041010) (.: select * FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk_typo)
ERROR WMI - Giving up attempt to connect to WMI provider after maximum number of retries at maximum backoff time (.: select * FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk_typo)
Clean shutdown completed.
The following shows a successful connection to a WMI provider:
jrodman@jrodman-PC /cygdrive/c/Program Files/Splunk/bin
$ ./splunk cmd splunk-wmi.exe -wql "select * FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk"
20090904144105.000000
AvgDiskBytesPerRead=0
AvgDiskBytesPerTransfer=0
AvgDiskBytesPerWrite=0
AvgDiskQueueLength=0
AvgDiskReadQueueLength=0
AvgDiskWriteQueueLength=0
AvgDisksecPerRead=0
AvgDisksecPerTransfer=0
AvgDisksecPerWrite=0
Caption=NULL
CurrentDiskQueueLength=0
Description=NULL
DiskBytesPersec=0
DiskReadsPersec=0
DiskTransfersPersec=0
DiskWriteBytesPersec=0
DiskWritesPersec=0
Frequency_Object=NULL
Frequency_PerfTime=NULL
Frequency_Sys100NS=NULL
Name=0 D: C:
PercentDiskReadTime=0
PercentDiskTime=0
PercentDiskWriteTime=0
PercentIdleTime=98
SplitIOPerSec=0
Timestamp_Object=NULL
Timestamp_PerfTime=NULL
Timestamp_Sys100NS=NULL
wmi_type=unspecified
---splunk-wmi-end-of-event---
20090904144105.000000
AvgDiskBytesPerRead=0
AvgDiskBytesPerTransfer=0
AvgDiskBytesPerWrite=0
AvgDiskQueueLength=0
AvgDiskReadQueueLength=0
AvgDiskWriteQueueLength=0
AvgDisksecPerRead=0
AvgDisksecPerTransfer=0
AvgDisksecPerWrite=0
Caption=NULL
CurrentDiskQueueLength=0
Description=NULL
DiskBytesPersec=0
DiskReadBytesPersec=0
DiskReadsPersec=0
DiskTransfersPersec=0
DiskWriteBytesPersec=0
DiskWritesPersec=0
Frequency_Object=NULL
Frequency_PerfTime=NULL
Frequency_Sys100NS=NULL
Name=Total
PercentDiskReadTime=0
PercentDiskTime=0
PercentDiskWriteTime=0
PercentIdleTime=98
SplitIOPerSec=0
Timestamp_Object=NULL
Timestamp_PerfTime=NULL
Timestamp_Sys100NS=NULL
wmi_type=unspecified
---splunk-wmi-end-of-event---
Clean shutdown completed.
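When you compare runs of splunk-wmi.exe, it can help to turn the output into structured records. The following is a minimal sketch, assuming that the command prints one key=value pair per line, a timestamp line at the start of each event, and the ---splunk-wmi-end-of-event--- marker after each event. The function name and the "timestamp" key are hypothetical.

```python
import re

END_OF_EVENT = "---splunk-wmi-end-of-event---"
TIMESTAMP_RE = re.compile(r"^\d{14}\.\d+$")  # for example 20090904144105.000000


def parse_wmi_events(output):
    """Split splunk-wmi output into a list of dicts, one per event."""
    events = []
    current = {}
    for raw in output.splitlines():
        line = raw.strip()
        if line == END_OF_EVENT:
            if current:
                events.append(current)
            current = {}
        elif "=" in line:
            # Split on the first "=" only, so values keep any spaces or "=".
            key, value = line.split("=", 1)
            current[key] = value
        elif TIMESTAMP_RE.match(line):
            current["timestamp"] = line
        # Anything else, such as "Clean shutdown completed.", is ignored.
    return events
```

Each returned dictionary maps field names to string values, so a value such as "0 D: C:" for the Name field is kept intact.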
For more information
See the Admin Manual for information on getting started for Windows admins.
This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.1.7, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.2.4, 9.3.0, 9.3.1, 9.3.2, 9.4.0