Distributed Deployment Manual

 


Troubleshoot distributed search

This topic describes issues to be aware of when configuring or using distributed search.

General configuration issues

Clock skew between search heads and search peers can affect search behavior

Keep the clocks on your search heads and search peers in sync, using NTP (Network Time Protocol) or a similar mechanism. If the clocks are out of sync by more than a few seconds, searches can fail or search artifacts can expire prematurely.

Search head pooling configuration issues

When implementing search head pooling, there are a few potential issues you should be aware of, mainly having to do with coordination among search heads.

Authentication and authorization changes made in Manager apply only to a single search head

Authentication and authorization changes made through a search head's Manager apply only to that search head and not to other search heads in that pool. Each member of the pool maintains its local configurations in $SPLUNK_HOME/etc/system/local. To share configurations across the pool, set them up in shared storage, as described in "Configure search head pooling".

Clock skew between search heads and shared storage can affect search behavior

Keep the clocks on your search heads and the shared storage server in sync, using NTP (Network Time Protocol) or a similar mechanism. If the clocks are out of sync by more than a few seconds, searches can fail or search artifacts can expire prematurely.
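One rough way to spot skew between a search head and the shared storage server is to create a file on the shared volume and compare its modification time with the local clock. The following Python sketch is illustrative only and is not part of Splunk. The function name is hypothetical, and it assumes that the storage server, not the client, stamps the modification time of a newly written file, which is typical for NFS but can be blurred by client-side attribute caching.

```python
import os
import tempfile
import time


def estimate_storage_clock_skew(shared_dir):
    """Return (file mtime - local clock) in seconds for a file just written in shared_dir.

    Assumption: on an NFS mount the server stamps the file's mtime, so a large
    absolute value suggests that the two clocks disagree.
    """
    before = time.time()
    # The probe file is deleted automatically when the context manager exits.
    with tempfile.NamedTemporaryFile(dir=shared_dir, prefix="skew_probe_") as probe:
        probe.write(b"x")
        probe.flush()
        os.fsync(probe.fileno())  # push the write to the storage server
        mtime = os.stat(probe.name).st_mtime
    after = time.time()
    # Compare against the midpoint of the local window to reduce round-trip bias.
    return mtime - (before + after) / 2.0
```

Run it on each search head as the user Splunk runs as, passing the mount point of the pool's shared storage. A result of more than a few seconds in either direction is a hint to check NTP on both sides; treat it as an estimate, not a measurement.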

Permission problems on the shared storage server can cause pooling failure

On each search head, the user account Splunk runs as must have read/write permissions to the files on the shared storage server.
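A quick way to test this requirement is to try to create, read back, and delete a file in the shared location while logged in as the account Splunk runs as. The Python sketch below is a minimal illustration under that assumption. The function name is hypothetical, and the check covers only the directory you pass in, not every existing file beneath it.

```python
import os
import tempfile


def can_read_write(shared_dir):
    """Return True if the current user can create, read back, and delete a file in shared_dir."""
    try:
        fd, path = tempfile.mkstemp(dir=shared_dir, prefix="perm_probe_")
    except OSError:
        # Covers a missing directory as well as a permission failure.
        return False
    try:
        with os.fdopen(fd, "w+") as probe:
            probe.write("ok")
            probe.seek(0)
            return probe.read() == "ok"
    except OSError:
        return False
    finally:
        os.remove(path)
```

Run the check on every search head in the pool, because user and group ID mappings can differ from one NFS client to another.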

NFS client concurrency limits can cause search timeouts or slow search behavior

Search performance in a search head pool is a function of the throughput of the shared storage and the search workload. Concurrent search users and concurrently running scheduled searches together determine the total IOPS (I/O operations per second) that the shared volume must support. IOPS requirements also vary with the kinds of searches run. To adequately provision a device shared between search heads, you need to know the number of concurrent users submitting searches and the number of jobs/apps that will execute simultaneously.

If searches are timing out or running slowly, you might be exhausting the maximum number of concurrent requests supported by the NFS client. To solve this problem, increase your client concurrency limit. For example, on a Linux NFS client, adjust the tcp_slot_table_entries setting.
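On Linux, the tcp_slot_table_entries setting is normally exposed under /proc/sys/sunrpc while the sunrpc kernel module is loaded. The Python sketch below reads the current value so that you can compare it with a target you have chosen. The function names are hypothetical, and no particular target value is implied here; the right number depends on your workload and NFS server.

```python
from pathlib import Path

# Usual procfs location of the sunrpc setting on Linux; it is present only
# while the sunrpc kernel module is loaded.
SLOT_TABLE_PATH = Path("/proc/sys/sunrpc/tcp_slot_table_entries")


def read_slot_table_entries(path=SLOT_TABLE_PATH):
    """Return the current slot table size as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def needs_increase(current, target):
    """True when the setting is readable and below the target you chose."""
    return current is not None and current < target
```

The value itself is changed outside this script, for example with sysctl (the key is sunrpc.tcp_slot_table_entries) run as root. A new value typically applies only to NFS mounts established after the change, so plan for a remount.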

NFS latency with large user counts can cause slow Splunk configuration access or slow dispatch reaping

Splunk synchronizes its in-memory configuration state with the search head pool's shared storage when it detects changes; that is, it rereads the configuration into memory whenever it detects updates. With overloaded pool storage, or with large numbers of users, apps, and configuration files, this synchronization can reduce performance. To mitigate this, increase the minimum interval between reads, as discussed in "Select timing for configuration refresh".

Warning about unique serverName attribute

Each search head in the pool must have a unique serverName attribute. Splunk validates this condition when each search head starts. If it finds a problem, it generates this error message:

serverName "<xxx>" has already been claimed by a member of this search head pool 
in <full path to pooling.ini on shared storage>
There was an error validating your search head pooling configuration. For more 
information, run 'splunk pooling validate'

The most common cause of this error is that another search head in the pool is already using the current search head's serverName. To fix the problem, change the current search head's serverName attribute in .system/local/server.conf.

There are a few other conditions that also can generate this error:

  • The current search head's serverName has been changed.
  • The current search head's GUID has been changed. This is usually due to /etc/instance.cfg being deleted.

To fix these problems, run:

splunk pooling replace-member

This updates the pooling.ini file with the current search head's serverName->GUID mapping, overwriting any previous mapping.
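To catch the most common cause (two search heads sharing a serverName) before a restart, you can compare the server.conf files collected from each pool member. The Python sketch below is an illustration, not a Splunk tool. It assumes that serverName is set in the [general] stanza of server.conf and that the files are simple enough for Python's configparser to read (for example, no settings before the first stanza header). The function names are hypothetical.

```python
import configparser
from collections import defaultdict


def read_server_name(conf_path):
    """Return serverName from the [general] stanza of a server.conf file, or None."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the attribute name's case (serverName)
    parser.read(conf_path)    # a missing file is silently skipped
    return parser.get("general", "serverName", fallback=None)


def find_duplicate_server_names(conf_paths):
    """Map each serverName used by more than one file to the files that use it."""
    seen = defaultdict(list)
    for path in conf_paths:
        name = read_server_name(path)
        if name is not None:
            seen[name].append(path)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}
```

An empty result means that no two of the supplied files claim the same serverName. It says nothing about GUID changes, which 'splunk pooling validate' checks.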

Artifacts and incorrectly-displayed items in Manager UI after upgrade

When upgrading pooled search heads, you must copy all updated apps to the search head pool's shared storage after the upgrade is complete. This includes apps that ship with Splunk, such as the Search app and the data preview feature (which is implemented as an app). If you do not, you might see artifacts or other incorrectly displayed items in Manager.

To fix the problem, copy all updated apps from an upgraded search head to the shared storage for the search head pool, taking care to exclude the local sub-directory of each app.

Important: Excluding the local sub-directory of each app from the copy process prevents the overwriting of configuration files on the shared storage with local copies of configuration files.
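The copy step can be scripted so that the local sub-directory of each app is skipped automatically. The Python sketch below is one possible approach, not an official procedure. The function name and both directory arguments are hypothetical: pass the apps directory of an upgraded search head and the apps directory on the pool's shared storage. It requires Python 3.8 or later for dirs_exist_ok.

```python
import os
import shutil


def copy_apps_excluding_local(src_apps_dir, dst_apps_dir):
    """Copy every app under src_apps_dir into dst_apps_dir, skipping each app's local directory."""
    src_root = os.path.abspath(src_apps_dir)

    def ignore(current_dir, names):
        # Only an app's own directory (a direct child of src_root) has its
        # "local" entry skipped; deeper directories named "local" are copied.
        is_app_dir = os.path.dirname(os.path.abspath(current_dir)) == src_root
        return ["local"] if is_app_dir and "local" in names else []

    # dirs_exist_ok=True lets the copy merge into app directories that already exist.
    shutil.copytree(src_root, dst_apps_dir, ignore=ignore, dirs_exist_ok=True)
```

This overwrites files that exist in both places but does not delete files that the upgrade removed from an app, so stale files on the shared storage need separate cleanup. Back up the shared storage before running anything like this.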

Once the apps have been copied, restart Splunk on all search heads in the pool.

Distributed search error messages

This table lists some of the more common search-time error messages associated with distributed search:

Error message                 Meaning
status=down                   The specified remote peer is not available.
status=not a splunk server    The specified remote peer is not a Splunk server.
duplicate license             The specified remote peer is using a duplicate license.
certificate mismatch          Authentication with the specified remote peer failed.
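If you triage these messages in a script, the table can be encoded as a simple lookup. The Python sketch below copies the message fragments and meanings from the table. The function name is hypothetical, and matching by substring is an assumption about how the fragments appear in real messages.

```python
# Message fragments and meanings copied from the table above.
DISTRIBUTED_SEARCH_ERRORS = {
    "status=down": "The specified remote peer is not available.",
    "status=not a splunk server": "The specified remote peer is not a Splunk server.",
    "duplicate license": "The specified remote peer is using a duplicate license.",
    "certificate mismatch": "Authentication with the specified remote peer failed.",
}


def explain(message):
    """Return the meanings of all known fragments found in message (case-insensitive)."""
    lowered = message.lower()
    return [meaning for fragment, meaning in DISTRIBUTED_SEARCH_ERRORS.items()
            if fragment in lowered]
```

A message that contains none of the fragments returns an empty list, which signals that the error is not one of the common ones listed here.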

This documentation applies to the following versions of Splunk: 5.0, 5.0.1, 5.0.2

