Workaround for network accessibility issues on Splunk Windows systems under certain conditions
This page discusses how to work around an issue where network-intensive Splunk Enterprise operations on a Windows system can sometimes cause that system to become inaccessible from the network.
A Windows system that hosts a Splunk Enterprise instance that performs network-intensive operations can become inaccessible from the network after a period of time. Problems usually begin within eight to twelve hours, but can start as late as two to three days, depending on the amount of network activity that the instance sees. When this occurs, any attempt to connect to the system remotely fails, and you must restart the computer to return it to service.
You might see the following error in splunkd.log, or in the search.log file(s) created in the individual dispatch directory that each search (scheduled or real-time) generates:
01-16-2013 06:55:33.935 WARN NetUtils - Error connecting - winsock error 10055
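To check whether a log file already contains this warning, you can scan it for the error text. The following is a minimal sketch, not a Splunk-provided tool. The function name and the regular expression are illustrative assumptions based only on the single sample line above, so the pattern might need adjusting if your log lines are formatted differently.

```python
import re
from typing import Iterable, List

# Assumption: the pattern is modeled on the one sample line shown above.
WINSOCK_10055 = re.compile(r"\bWARN\s+NetUtils\b.*winsock error 10055\b")


def find_port_exhaustion_warnings(lines: Iterable[str]) -> List[str]:
    """Return the log lines that report winsock error 10055."""
    return [line.rstrip("\r\n") for line in lines if WINSOCK_10055.search(line)]


# Example use (the path is hypothetical; point it at your own splunkd.log or search.log):
#
#     with open(r"C:\path\to\splunkd.log", encoding="utf-8", errors="replace") as handle:
#         for hit in find_port_exhaustion_warnings(handle):
#             print(hit)
```

Because the function accepts any iterable of lines, you can run it against splunkd.log and against the search.log file in each dispatch directory.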
This problem has multiple causes:
- By default, Windows configures a low upper limit (5000) on the ephemeral, or short-lived, network (TCP) port numbers that applications can use.
- When you perform network-intensive activities in Splunk Enterprise, Splunk Enterprise generates a large number of short-lived network connections, which use these ports. Network-intensive activities include but are not limited to:
  - Running a large number of concurrent real-time searches (usually from an app).
  - Configuring a deployment client to connect to a deployment server which is on the same computer.
- Once the Windows system runs out of available ports, it returns WSAENOBUFS (Windows Sockets error 10055) to any application that requests a port for network operations, and immediately becomes inaccessible from the network.
When this happens, the only way to fix the problem is to reboot the affected computer.
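To get a rough idea of how close a system is to this limit before it becomes inaccessible, you can count the TCP entries that occupy ports in the ephemeral range. The following is a sketch under stated assumptions: it expects text in the layout that the Windows netstat command prints (for example, from netstat -an -p tcp), it assumes that the ephemeral range starts at port 1025, and it uses the default limit of 5000 mentioned above. The function and constant names are illustrative.

```python
from collections import Counter
from typing import Tuple

FIRST_EPHEMERAL_PORT = 1025   # assumption: ephemeral ports start just above the well-known range
DEFAULT_MAX_USER_PORT = 5000  # default limit cited above


def summarize_tcp_ports(netstat_output: str,
                        max_port: int = DEFAULT_MAX_USER_PORT) -> Tuple[int, Counter]:
    """Return (entries with a local port in the ephemeral range, tally of TCP states)."""
    in_range = 0
    states: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected row shape: TCP <local addr:port> <remote addr:port> <state>
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        # rsplit keeps working for bracketed IPv6 addresses such as [::1]:4999
        port_text = fields[1].rsplit(":", 1)[-1]
        if not port_text.isdigit():
            continue
        states[fields[3]] += 1
        if FIRST_EPHEMERAL_PORT <= int(port_text) <= max_port:
            in_range += 1
    return in_range, states
```

Under these assumptions the range holds 5000 - 1025 + 1 = 3976 ports, so a count that approaches that figure, or a large and growing number of TIME_WAIT entries, suggests that the system is at risk of running out.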
Note: While this problem most commonly occurs when you run numerous concurrent real-time searches, any kind of search and, more importantly, any kind of network operation can trigger the issue. The problem is not limited to Splunk Enterprise, but Splunk Enterprise can often cause the problem to appear.
This problem only appears on Windows systems.
To work around this issue, you can complete one or both of the following steps.
Caution: The steps below require that you make administrative changes to your Windows system. These advanced changes might render your system unstable or unusable. If you are not able to make these changes, or are either unsure or uncomfortable about what to do, then contact your internal IT support organization for assistance.
1. Modify the Registry to increase the number of available user ports. Follow the instructions at "When you try to connect from TCP ports greater than 5000 you receive the error 'WSAENOBUFS'" (http://support.microsoft.com/kb/196271/en-us) on the Microsoft Support site to modify the Registry and increase the number of ephemeral TCP ports.
Important: We suggest you complete this step first, then restart your system. If the problem persists, then perform the next step.
2. Install a downloadable hotfix from Microsoft. If your system is a multiple-CPU system that runs either Windows Server 2008 R2 or Windows 7, then you can download and install a hotfix which addresses this specific issue. For information and instructions on how to download and apply the hotfix, see "Kernel sockets leak on a multiprocessor computer that is running Windows Server 2008 R2 or Windows 7" (http://support.microsoft.com/kb/2577795) on the Microsoft Support site.
Important: This option is available only for systems with multiple CPUs that run Windows Server 2008 R2 or Windows 7.
You must restart your computer after performing either of these actions.
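If you prefer to prepare the Registry change from step 1 as a file that an administrator can review before importing it, the following sketch renders the text of a .reg file. It does not modify the Registry itself. The key path, the MaxUserPort value name, and the 5000 to 65534 range are stated here as assumptions recalled from the Microsoft article referenced in step 1, so confirm them against that article before you use the output. The function and constant names are illustrative.

```python
# Assumptions to verify against the Microsoft article referenced in step 1:
TCPIP_PARAMETERS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "MaxUserPort"
MIN_PORT, MAX_PORT = 5000, 65534


def render_max_user_port_reg(max_user_port: int) -> str:
    """Return .reg file text that sets MaxUserPort; nothing is written to the Registry."""
    if not MIN_PORT <= max_user_port <= MAX_PORT:
        raise ValueError(f"MaxUserPort must be between {MIN_PORT} and {MAX_PORT}")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{TCPIP_PARAMETERS_KEY}]\r\n"
        # DWORD values in a .reg file are written as eight hexadecimal digits
        f'"{VALUE_NAME}"=dword:{max_user_port:08x}\r\n'
    )
```

For example, render_max_user_port_reg(65534) produces a file whose value line reads "MaxUserPort"=dword:0000fffe. Save the returned text to a file with a .reg extension only after you have reviewed it, and note that the caution above applies to importing it.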
This documentation applies to the following versions of Splunk® Enterprise: 6.5.7, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 9.0.0