Performance degraded in a search head pooling environment
Symptoms
In a pool environment, you notice that searches are taking longer than they used to. How do you figure out where your performance degradation is coming from? This topic suggests a few tests you can run.
Time some simple commands
Try some basic commands outside of Splunk Enterprise. If either of these operating system commands takes more than ten or so seconds to complete, it indicates an issue on the shared storage.
- On the search head, at the *nix command line, measure how long it takes to walk the pooled location and count everything in it:
time find /path/to/pool/dir | wc -l
- Another simple command to try is:
time ls -lR /path/to/pool/dir | wc -l
which measures how long a recursive listing of the pool takes.
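If you want to repeat these checks, here is a minimal shell sketch that times both commands and flags anything over the roughly ten-second threshold; the pool path is a placeholder for your own shared-storage mount.
#!/bin/sh
# Sketch only: substitute your pooled location for POOL.
POOL=/path/to/pool/dir
THRESHOLD=10   # seconds; beyond this, suspect the shared storage
for cmd in "find $POOL" "ls -lR $POOL"; do
    start=$(date +%s)
    count=$($cmd | wc -l)
    elapsed=$(( $(date +%s) - start ))
    echo "'$cmd' counted $count entries in ${elapsed}s"
    if [ "$elapsed" -gt "$THRESHOLD" ]; then
        echo "WARNING: this took longer than ${THRESHOLD}s; the shared storage may be the bottleneck"
    fi
done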
If you do not have shell access, other tests you can run include:
- logging in (which uses a shared token)
- accessing knowledge objects.
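One way to time a login without shell access is to call the REST login endpoint from another host and time it. This is only a sketch: it assumes the default management port 8089 and uses placeholder credentials.
time curl -k https://<search_head>:8089/services/auth/login -d username=admin -d password='<password>'
Because the session token lives in the pooled location, a login that is noticeably slower than on a non-pooled search head can point at the shared storage.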
Compare searches in and out of search head pooling
Run a simple search with and without search head pooling enabled, and compare the timings. For example:
index=_internal source=*splunkd.log | tail 20
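If you have CLI access, one way to get comparable timings is to run the same search from the command line under time. The credentials are placeholders, and the splunk binary is assumed to be on your path (it lives in $SPLUNK_HOME/bin).
time splunk search 'index=_internal source=*splunkd.log | tail 20' -auth admin:<password>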
Use Splunk Enterprise log files
In splunkd.log, you can look for the searchstats entries for individual searches. In splunkd_access.log, the spent field records how long each request took, in milliseconds. For example:
index=_internal source=*splunkd_access.log NOT rtsearch spent>29999
Any search taking over 30 seconds to return is a slow search.
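To see how those slow requests trend over time, you can extend the same search; the 5-minute span is just an example, and spent is in milliseconds.
index=_internal source=*splunkd_access.log NOT rtsearch spent>29999 | timechart span=5m count, max(spent)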
If:
- The only slow things are searches (but not, for example, bundle replication), then your problem might be with your mount point. Run some commands outside of Splunk Enterprise to validate that your mount point is healthy.
- Accessing knowledge objects takes a long time, then search metrics.log for the load_average, looking at the 2-5 minutes before and after the slow-running search (an example search follows this list):
index=_internal source=*metrics.log load_average
If you see that the load average is high and you have SoS installed, look at the SoS CPU graphs for the same period of time to check whether the system itself is under load.
If the problem is with the mount point, the CPU will not be heavily loaded.
If the problem is with the search load, the CPU usage will be high for the duration of the slow search.
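As a sketch of that load check, you can chart load_average around the window of the slow search; the earliest value here is a placeholder for a range that covers 2-5 minutes on either side of the slow search.
index=_internal source=*metrics.log load_average earliest=-15m@m | timechart span=1m avg(load_average)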
Is it a search load problem?
Start turning off field extractions. Is it still slow?
Next, turn off real-time all-time searches and remove wildcards from your searches.
If you have the Splunk on Splunk app, check the search load view. If you have the Distributed Management Console, check the Search Activity views.
Consider search scheduling. Have you scheduled many searches to run at the same time? Use the Distributed Management Console Search Activity view to identify search scheduling issues. If you've identified issues, move some of your scheduled searches to different minutes past the hour.
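If you do not have the Distributed Management Console handy, one rough way to spot scheduling pileups is to count scheduled search runs per minute in scheduler.log; this assumes the standard status field (success, skipped, continued) in that log.
index=_internal source=*scheduler.log | timechart span=1m count by status
Many runs landing in the same minute, or a rising number of skipped runs, suggest that scheduled searches are stacking up.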