Bucket replication issues
Network issues impede bucket replication
If there are problems with the connection between peer nodes such that a source peer is unable to replicate a hot bucket to a target peer, the source peer will roll the hot bucket and start a new hot bucket. If it still has problems connecting with the target peer, it will roll the new hot bucket, and so on.
To prevent a prolonged failure from causing the source peer to generate a large quantity of small hot buckets, the source peer will, after a configurable number of replication errors to a single target peer, stop rolling hot buckets in response to the connection problem with that target peer. The default is three replication errors. The following banner message then appears one or more times in the master node's dashboard, depending on the number of source peers encountering errors:
Search peer <search peer> has the following message: Too many streaming errors to target=<target peer>. Not rolling hot buckets on further errors to this target. (This condition might exist with other targets too. Please check the logs.)
While the network problem persists, the most recent hot buckets might have fewer copies available than the replication factor requires.
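The rolling behavior described above can be sketched as a small simulation. This is a hypothetical model, not Splunk code: the function name simulate_errors is invented for illustration, and it assumes that each of the first "limit" consecutive errors to one target causes a roll, and that later errors do not.

```shell
# Hypothetical model of a source peer's reaction to consecutive streaming
# errors to ONE target peer. Not Splunk code.
#   $1 = limit (stands in for the allowable number of replication errors;
#        assumed to be greater than 0 here)
#   $2 = number of consecutive errors to simulate
simulate_errors() {
    limit=$1
    total=$2
    n=1
    while [ "$n" -le "$total" ]; do
        if [ "$n" -le "$limit" ]; then
            # Assumption: errors up to and including the limit each cause a roll.
            echo "error $n: roll hot bucket"
        else
            # Past the limit, the peer stops rolling for this target.
            echo "error $n: keep hot bucket (limit reached)"
        fi
        n=$((n + 1))
    done
}
```

For example, simulate_errors 3 5 prints one line per simulated error. In a real cluster the count is tracked per target peer, so a source peer can reach the limit for one target while still rolling buckets for another.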
Configure the allowable number of replication errors
To adjust the allowable number of replication errors, you can configure the max_replication_errors attribute in server.conf on the source peer. Consult with Support before doing so, however, because changing this value can lead to false positives. That is, a single network event can generate multiple replication errors. In such situations, the replication errors in total might exceed the value of this attribute, even though all errors were derived from just the one network event.
Important: This attribute previously had a default of 3. Starting with 5.0.5, it has a default of 0, which means it is disabled by default.
Note: In 6.0, replication errors that can be attributed to a single network problem are bunched together and only count as one error, so the issue of false positives is eliminated.
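Before changing the attribute, it can help to check whether it is already set on the source peer. The following is a minimal sketch, assuming the attribute is set in the [clustering] stanza of server.conf. The function name get_max_replication_errors is hypothetical. The function reads only the one file it is given and does not resolve Splunk's configuration layering.

```shell
# Print the value of max_replication_errors from the [clustering] stanza
# of the server.conf file given as $1. Prints nothing if it is not set there.
# Assumption: the attribute lives in the [clustering] stanza.
get_max_replication_errors() {
    awk -F'=' '
        # Any stanza header switches the flag on or off.
        /^[ \t]*\[/ { in_stanza = ($0 ~ /^[ \t]*\[clustering\]/) }
        # Inside [clustering], print the value with whitespace stripped.
        in_stanza && $1 ~ /^[ \t]*max_replication_errors[ \t]*$/ {
            gsub(/[ \t]/, "", $2)
            print $2
        }
    ' "$1"
}
```

A typical call might be get_max_replication_errors "$SPLUNK_HOME/etc/system/local/server.conf", assuming the setting was made in that local file. Empty output means the file does not set the attribute, so the default for your version applies.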
Evidence of replication failure on the source peer
Evidence of replication failure appears in the source peer's splunkd.log, with a reference to the failed target peer(s). You can locate the relevant lines in the log by searching on "CMStreamingErrorJob". For example, this grep command finds that there have been 15 streaming errors to the peer with the GUID "B3D35EF4-4BC8-4D69-89F9-3FACEDC3F46E":
grep CMStreamingErrorJob ../var/log/splunk/splunkd.log* | cut -d' ' -f10 | sort | uniq -c | sort -nr
15 failingGuid=B3D35EF4-4BC8-4D69-89F9-3FACEDC3F46E
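The command above depends on the failing GUID being the tenth space-separated field of each log line. A variant that matches the failingGuid= token directly, wherever it appears in the line, might look like the sketch below. The function name count_streaming_errors is hypothetical, and the sketch assumes that every relevant line carries a failingGuid=<GUID> token, as the sample output suggests.

```shell
# Count CMStreamingErrorJob lines per failing target GUID.
# Arguments: one or more splunkd.log files.
# Assumption: each relevant line contains a token of the form failingGuid=<GUID>.
count_streaming_errors() {
    # -h drops file name prefixes when several log files are searched;
    # -o prints only the matching failingGuid=... token from each line.
    grep -h CMStreamingErrorJob "$@" \
        | grep -o 'failingGuid=[0-9A-Fa-f-]*' \
        | sort | uniq -c | sort -nr
}
```

Run it the same way as the original command, for example count_streaming_errors ../var/log/splunk/splunkd.log*. The output lists one line per target peer, with the peer that has the most errors first.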
This documentation applies to the following versions of Splunk® Enterprise: 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18