How indexer clusters handle report and data model acceleration summaries

By default, indexer clusters do not replicate report acceleration and data model acceleration summaries. This means that only primary bucket copies will have associated summaries.

You can configure the master node so that the cluster replicates summaries. All searchable bucket copies then have associated summaries. This is the recommended behavior.

Note: The replicated summary feature is not available for peer nodes running version 6.3 or below.

For details on report acceleration and data model acceleration, read the chapter "Use data summaries to accelerate searches" in the Knowledge Manager Manual.

Where summaries reside

The summaries reside on the peer nodes in their own directories. You specify the directory locations in indexes.conf, with the summaryHomePath and tstatsHomePath attributes for the report acceleration and data model acceleration summaries, respectively. See the indexes.conf specification file for details.
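To see which directories an index is configured to use, you can read the two attributes straight out of indexes.conf. This is a minimal Python sketch that makes several assumptions:

- The file is simple enough for Python's configparser, with one stanza per index and `=` delimiters.
- Values are returned verbatim, so Splunk variables such as $SPLUNK_DB and volume: prefixes are not resolved.
- Only one file is read. Splunk's layered configuration (default and local directories) is not merged.

The function name and the sample stanzas are hypothetical.

```python
from __future__ import annotations

import configparser


def summary_paths(indexes_conf_text: str) -> dict[str, dict[str, str | None]]:
    """Map each index stanza to its summaryHomePath and tstatsHomePath values.

    Hypothetical helper: reads a single indexes.conf text and returns None for
    attributes the stanza does not set (Splunk would apply its own defaults).
    """
    # Splunk .conf files use "=" only; interpolation=None returns values verbatim.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep attribute names case-sensitive
    parser.read_string(indexes_conf_text)

    result = {}
    for stanza in parser.sections():
        section = parser[stanza]
        result[stanza] = {
            # report acceleration summaries
            "summaryHomePath": section.get("summaryHomePath"),
            # data model acceleration summaries
            "tstatsHomePath": section.get("tstatsHomePath"),
        }
    return result


if __name__ == "__main__":
    sample = """
[main]
summaryHomePath = $SPLUNK_DB/defaultdb/summary
tstatsHomePath = volume:_splunk_summaries/defaultdb/datamodel_summary
"""
    print(summary_paths(sample))
```

For the effective values on a real peer, Splunk's btool utility reports the merged configuration, which this sketch does not attempt to reproduce.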

A summary correlates with one or more buckets, depending on the summary's time span.

Replicated summaries

If you want the cluster to replicate summaries, you must set this attribute in the master node's server.conf file:

[clustering]
summary_replication = true

You must then restart the master node.

You can also use the CLI on the master node to set the attribute:

splunk edit cluster-config -summary_replication true

This command does not require a restart.
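To check whether a given server.conf file turns the feature on, you only need to look at the [clustering] stanza. This is a minimal Python sketch that reads one file's text with configparser. It treats a missing stanza or attribute as "off", matching the default behavior described above. It does not merge Splunk's default and local configuration layers. The function name is hypothetical.

```python
import configparser


def summary_replication_enabled(server_conf_text: str) -> bool:
    """Return True if [clustering] summary_replication is set to a true value.

    Hypothetical helper: inspects a single server.conf text only.
    """
    # Splunk .conf files use "=" only; interpolation=None returns values verbatim.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(server_conf_text)
    if not parser.has_section("clustering"):
        return False  # by default, the cluster does not replicate summaries
    # getboolean accepts true/false, yes/no, on/off and 1/0 (case-insensitive).
    return parser.getboolean("clustering", "summary_replication", fallback=False)


if __name__ == "__main__":
    conf = "[clustering]\nmode = master\nsummary_replication = true\n"
    print(summary_replication_enabled(conf))
```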

When the cluster is configured to replicate summaries, the cluster takes steps to ensure that each searchable bucket copy has an associated summary copy:

For hot buckets. The cluster creates a summary for each searchable copy of a hot bucket.

For warm/cold buckets. When necessary, the cluster uses replication to fill in any missing summaries for searchable copies of warm or cold buckets.
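The two rules above can be modeled as a small planning step. This is a simplified Python sketch, not Splunk's implementation. Each searchable copy of a hot bucket without a summary gets a "build" action. Each searchable warm or cold copy without a summary gets a "replicate" action, sourced from another copy that already holds the summary. The BucketCopy type and the action tuples are hypothetical. The text does not describe what happens when no copy holds a summary yet, so the sketch leaves that case alone.

```python
from dataclasses import dataclass


@dataclass
class BucketCopy:
    bucket_id: str
    peer: str
    state: str            # "hot", "warm", or "cold"
    searchable: bool
    has_summary: bool = False


def plan_summary_actions(copies):
    """Return the actions needed so every searchable copy has a summary (toy model)."""
    by_bucket = {}
    for copy in copies:
        by_bucket.setdefault(copy.bucket_id, []).append(copy)

    actions = []
    for bucket_id, group in by_bucket.items():
        # Copies that already hold a summary can serve as replication sources.
        sources = [c.peer for c in group if c.has_summary]
        for copy in group:
            if not copy.searchable or copy.has_summary:
                continue  # non-searchable copies get no summary in this model
            if copy.state == "hot":
                # Hot buckets: each searchable copy builds its own summary.
                actions.append(("build", bucket_id, copy.peer))
            elif sources:
                # Warm/cold buckets: fill the gap by replicating an existing summary.
                actions.append(("replicate", bucket_id, sources[0], copy.peer))
            # A warm/cold copy with no available source is left alone here.
    return actions


if __name__ == "__main__":
    copies = [
        BucketCopy("b1", "peerA", "hot", True),
        BucketCopy("b2", "peerA", "warm", True, has_summary=True),
        BucketCopy("b2", "peerB", "warm", True),
    ]
    for action in plan_summary_actions(copies):
        print(action)
```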

When you turn on summary replication for the first time, the cluster might need to replicate a large number of summaries, which can affect network bandwidth. To limit the number of summary replications that occur simultaneously, change the value of the max_peer_sum_rep_load attribute in the master node's server.conf file. The default value is 5.
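To see why a cap helps, consider one scheduling round in which many peers need summaries from the same source. This Python sketch treats max_peer_sum_rep_load as a cap on the number of concurrent summary replications that any single peer, source or target, takes part in. That per-peer reading is an assumption about the exact semantics, and the greedy scheduling loop is hypothetical rather than the master's actual algorithm. Jobs over the cap are deferred to a later round, which spreads the initial burst of network traffic over time.

```python
from collections import Counter


def schedule_round(pending, limit=5):
    """Split pending (source_peer, target_peer) replications into started and deferred.

    Toy model: a job starts only if neither its source nor its target peer is
    already taking part in `limit` replications during this round.
    """
    load = Counter()
    started, deferred = [], []
    for source, target in pending:
        if load[source] < limit and load[target] < limit:
            load[source] += 1
            load[target] += 1
            started.append((source, target))
        else:
            deferred.append((source, target))
    return started, deferred


if __name__ == "__main__":
    # Seven peers all need summaries that only peerA currently holds.
    pending = [("peerA", "peer" + c) for c in "BCDEFGH"]
    started, deferred = schedule_round(pending, limit=5)
    print("started:", started)
    print("deferred:", deferred)
```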

Non-replicated summaries

If you keep the default behavior, the cluster will not replicate summaries. This section describes how the cluster handles non-replicated summaries.

A summary correlates with one or more buckets, depending on the summary's time span. When a summary is generated, it resides on the peer that holds the primary copy of the bucket for that time span. If the summary spans multiple buckets, and the primary copies of those buckets reside on multiple peers, then each of those peers will hold the corresponding part of the summary.
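The placement rule for non-replicated summaries amounts to grouping the buckets in the summary's time span by the peer that holds each bucket's primary copy. This Python sketch makes several assumptions:

- Bucket time ranges are simplified to integers.
- Each bucket is described by a hypothetical (bucket_id, earliest, latest, primary_peer) tuple.
- A bucket belongs to the summary if its time range overlaps the summary's span.

```python
def summary_parts(buckets, start, end):
    """Group the buckets a summary covers by the peer holding each primary copy.

    buckets: iterable of (bucket_id, earliest, latest, primary_peer) tuples.
    Returns {primary_peer: [bucket_id, ...]}, i.e. which part of the summary
    each peer would hold.
    """
    parts = {}
    for bucket_id, earliest, latest, primary_peer in buckets:
        # The bucket contributes if its time range overlaps [start, end].
        if latest >= start and earliest <= end:
            parts.setdefault(primary_peer, []).append(bucket_id)
    return parts


if __name__ == "__main__":
    buckets = [
        ("b1", 0, 99, "peerA"),
        ("b2", 100, 199, "peerB"),
        ("b3", 200, 299, "peerA"),
    ]
    print(summary_parts(buckets, 50, 250))
```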

If primacy gets reassigned from one copy of a bucket to another (for example, because the peer holding the primary copy fails), the summary does not move to the peer with the new primary copy. The summary therefore becomes unavailable. It remains unavailable until the next time Splunk Enterprise attempts to update the summary, finds that it is missing, and regenerates it.
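The lifecycle described above can be traced with a tiny state model. In this Python sketch, a summary counts as available only while it sits on the peer that holds the current primary copy. The class and method names are hypothetical. The model ignores details such as the failed peer's data disappearing entirely.

```python
class NonReplicatedSummary:
    """Toy model of a summary that lives only with the primary bucket copy."""

    def __init__(self, bucket_id, primary_peer):
        self.bucket_id = bucket_id
        self.primary_peer = primary_peer
        self.summary_peer = primary_peer  # the summary is built where the primary is

    def reassign_primary(self, new_peer):
        # Primacy moves, but the summary stays on the old peer.
        self.primary_peer = new_peer

    def available(self):
        return self.summary_peer == self.primary_peer

    def update(self):
        """Periodic summary update: regenerate on the current primary if missing there."""
        if not self.available():
            self.summary_peer = self.primary_peer
            return "regenerated"
        return "up to date"


if __name__ == "__main__":
    s = NonReplicatedSummary("b1", "peerA")
    s.reassign_primary("peerB")       # e.g. peerA fails
    print(s.available())              # summary unusable until the next update
    print(s.update(), s.available())
```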

In multisite clusters, as in single-site clusters, the summaries reside with the primary bucket copy. A multisite cluster has multiple primaries, one for each site that supports search affinity. The summaries therefore reside with the particular primary that the generating search head accessed when running the search. Because of search affinity, the summaries usually reside on primaries on the same site as the generating search head.
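The multisite rule can be expressed as choosing, for each bucket, the primary on the generating search head's site. This Python sketch maps each bucket to a {site: primary_peer} dictionary, and all names are hypothetical. It also assumes a fallback to some other site's primary when the search head's site has none. The text only says "usually", so that fallback is an assumption, not documented behavior.

```python
def summary_peers(bucket_primaries, search_head_site):
    """Return {bucket_id: peer} where the generating search head's summaries land.

    bucket_primaries: {bucket_id: {site: primary_peer}}
    """
    placement = {}
    for bucket_id, primaries in bucket_primaries.items():
        if search_head_site in primaries:
            # Search affinity: use the primary on the search head's own site.
            placement[bucket_id] = primaries[search_head_site]
        else:
            # Hypothetical fallback: use the first listed primary on another site.
            placement[bucket_id] = next(iter(primaries.values()))
    return placement


if __name__ == "__main__":
    bp = {
        "b1": {"site1": "peerA", "site2": "peerD"},
        "b2": {"site1": "peerB", "site2": "peerE"},
    }
    print(summary_peers(bp, "site2"))
```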

Summary replication and contention for resources

A search head with acceleration enabled runs special searches on the peers. These searches build the summaries. See, for example, the description of building report acceleration summaries in "Manage report acceleration" in the Knowledge Manager Manual.

In the case of replicated summaries on an indexer cluster, summaries are built on each searchable copy of a hot bucket. A peer node can be building summaries simultaneously both for copies of buckets originating on that peer and for copies of buckets originating on other peers. This means that, with summary replication enabled, summary-generating searches use more resources across the cluster, and the searches take longer to complete.
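A rough back-of-the-envelope calculation shows the scale of the extra work. This Python sketch rests on three assumptions:

- Every peer originates the same number of hot buckets.
- Copies are spread evenly across peers.
- Without replication only the primary copy gets a summary, while with replication every searchable copy does. The number of searchable copies is the cluster's search factor.

Under those assumptions, hot-bucket summary builds grow by a factor equal to the search factor. The function name is hypothetical.

```python
def hot_summary_builds(num_peers, hot_buckets_per_peer, search_factor, replicated):
    """Estimate total and per-peer hot-bucket summary builds (even-spread model)."""
    total_hot_buckets = num_peers * hot_buckets_per_peer
    # Without replication only the primary copy gets a summary;
    # with replication every searchable copy does.
    copies_with_summaries = search_factor if replicated else 1
    total_builds = total_hot_buckets * copies_with_summaries
    return total_builds, total_builds / num_peers


if __name__ == "__main__":
    print(hot_summary_builds(3, 4, 2, replicated=False))
    print(hot_summary_builds(3, 4, 2, replicated=True))
```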
