Splunk® Enterprise

Managing Indexers and Clusters of Indexers

Splunk Enterprise version 5.0 reached its End of Life on December 1, 2017. Please see the migration information.
This documentation does not apply to the most recent version of Splunk.

Bucket replication issues

Network issues impede bucket replication

If connection problems between peer nodes prevent a source peer from replicating a hot bucket to a target peer, the source peer rolls the hot bucket and starts a new one. If it still cannot connect to the target peer, it rolls the new hot bucket as well, and so on.

To prevent a prolonged failure from causing the source peer to generate a large number of small hot buckets, the source peer stops rolling hot buckets in response to connection problems with a given target peer after a configurable number of replication errors to that peer. The default is three replication errors. The following banner message then appears one or more times in the master node's dashboard, depending on the number of source peers encountering errors:

Search peer <search peer> has the following message: Too many streaming errors to target=<target 
peer>. Not rolling hot buckets on further errors to this target. (This condition might exist with 
other targets too. Please check the logs.)

While the network problem persists, the cluster might not have the full replication factor number of copies available for the most recent hot buckets.
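
The safeguard described above can be illustrated with a small shell sketch. This is not Splunk code; simulate_rolls is a hypothetical function that only models the documented behavior: a per-target count of consecutive replication errors, reset by a successful replication, with rolling suppressed once the limit is exceeded. Whether the error that reaches the limit still triggers a roll is an assumption made here for illustration.

```shell
# Illustrative model only; simulate_rolls is a hypothetical name, not a Splunk command.
# Usage: simulate_rolls <limit> <outcome>...   where each outcome is "ok" or "err"
simulate_rolls() {
    limit=$1; shift
    errors=0                      # consecutive errors to one target peer
    for outcome in "$@"; do
        if [ "$outcome" = ok ]; then
            errors=0              # a successful replication resets the count
            echo "ok: counter reset"
        else
            errors=$((errors + 1))
            # A limit of 0 disables the safeguard, so every error rolls.
            # Assumption: errors up to and including the limit still roll.
            if [ "$limit" -eq 0 ] || [ "$errors" -le "$limit" ]; then
                echo "error $errors: roll hot bucket"
            else
                echo "error $errors: keep hot bucket (limit reached)"
            fi
        fi
    done
}

# Default limit of 3: four failures, one success, then another failure.
simulate_rolls 3 err err err err ok err
```

With a limit of 0, the same sequence of failures would report a roll on every error.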

Configure the allowable number of replication errors

To adjust the allowable number of replication errors, configure the max_replication_errors attribute in server.conf on the source peer. Here is the full definition of the attribute:

max_replication_errors = <integer>                                     
   * Currently only valid for mode=slave  
   * This is the maximum number of consecutive replication errors 
     (currently only for hot bucket replication) from a source peer 
     to a specific target peer. Until this limit is reached, the 
     source continues to roll hot buckets on streaming failures to   
     this target. After the limit is reached, the source will no
     longer roll hot buckets if streaming to this specific target 
     fails. This is reset if at least one successful (hot bucket) 
     replication occurs to this target from this source. 
   * Defaults to 3.                                                                   
   * The special value of 0 turns off this safeguard; so the source
     always rolls hot buckets on streaming error to any target.               
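
As a rough sketch of how the setting might look and how to check its effective value, the following shell snippet writes a sample stanza to a scratch file and reads the attribute back. The [clustering] stanza name, the value 5, and the helper get_max_replication_errors are assumptions for illustration and are not taken from this topic; the fallback of 3 is the documented default.

```shell
# Sample stanza for illustration only. It is written to a scratch file,
# not to a live server.conf. The [clustering] stanza name is an assumption.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[clustering]
mode = slave
max_replication_errors = 5
EOF

# Hypothetical helper: print max_replication_errors from the [clustering]
# stanza of the given file, or the documented default of 3 if it is unset.
get_max_replication_errors() {
    awk -F'=' '
        /^\[/ { in_stanza = ($0 == "[clustering]") }
        in_stanza && $1 ~ /^[[:space:]]*max_replication_errors[[:space:]]*$/ {
            gsub(/[[:space:]]/, "", $2); value = $2
        }
        END { print (value == "" ? 3 : value) }
    ' "$1"
}

get_max_replication_errors "$conf"
```

The helper reads a single file only, so it does not account for settings layered across multiple configuration directories.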

Evidence of replication failure on the source peer

Evidence of replication failure appears in the source peer's splunkd.log, with a reference to the failed target peer(s). You can locate the relevant lines in the log by searching for "CMStreamingErrorJob". For example, this grep command finds that there have been 15 streaming errors to the peer with the GUID "B3D35EF4-4BC8-4D69-89F9-3FACEDC3F46E":

grep CMStreamingErrorJob ../var/log/splunk/splunkd.log* | cut -d' ' -f10 | sort | uniq -c | sort -nr
15 failingGuid=B3D35EF4-4BC8-4D69-89F9-3FACEDC3F46E 
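
The command above relies on the failingGuid value being the tenth space-separated field of each log line. If the line layout differs, a variant that extracts the value by pattern may be more forgiving. The sketch below assumes only that matching lines contain the text failingGuid=<GUID> followed by a space, a comma, or the end of the line; count_streaming_errors is a hypothetical helper name.

```shell
# Hypothetical helper: count CMStreamingErrorJob lines per failing target GUID,
# extracting "failingGuid=..." by pattern rather than by field position.
count_streaming_errors() {
    grep CMStreamingErrorJob "$@" |
        sed -n 's/.*\(failingGuid=[^ ,]*\).*/\1/p' |
        sort | uniq -c | sort -nr
}

# Example invocation, using the same relative log path as the command above:
# count_streaming_errors ../var/log/splunk/splunkd.log*
```

As with the original command, each output line is a count followed by the failingGuid value, with the most frequently failing target first.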

This documentation applies to the following versions of Splunk® Enterprise: 5.0.3, 5.0.4


Comments

What might a Splunk search string look like to look for replication errors?

Sowings splunk, Splunker
July 29, 2013
