Buckets and clusters
Splunk stores indexed data in buckets, which are directories containing the data itself, as well as index files into the data. An index typically consists of many buckets, organized by age of the data.
The cluster replicates data on a bucket-by-bucket basis. The original bucket copy and its replicated copies on other peer nodes contain identical sets of data, although only searchable copies also contain the index files.
In a cluster, copies of buckets originating from a single source peer can be spread across many target peers. For example, if you have five peers in your cluster and a replication factor of 3 (a typical scenario for horizontal scaling), the cluster will maintain three copies of each bucket (the original copy on the source peer and replicated copies on two target peers). Each time the source peer starts a new hot bucket, the master gives the peer a new set of target peers to replicate data to. Therefore, while the original copies will all be on the source peer, the replicated copies of those buckets will be randomly spread across the other peers.This behavior is not currently configurable. The one certainty is that you'll never have two copies of the same bucket on the same peer.
The following diagram shows the scenario just described - five peers, a replication factor of 3, and seven original buckets, with their copies spread across all the peers. To reduce complexity, the diagram only shows the buckets for data originating from one peer. In a real-life scenario, most, if not all, of the other peers would also be originating data and replicating it to other peers on the cluster.
In this diagram, 1A is a source bucket. 1B and 1C are copies of that bucket. The diagram uses the same convention with 2A/B/C, 3A/B/C, and so on.
You need a good grasp of buckets to understand cluster architecture. The rest of this section describes some bucket concepts of particular importance for a clustered deployment. For a thorough introduction to buckets, read "How Splunk stores indexes".
There are two types of files in a bucket:
- The processed external data in compressed form (rawdata)
- Indexes that point to the rawdata (index files)
Rawdata is not actually "raw" data, as the term might be defined by a dictionary. Rather, it consists of the external data after it has been processed into events. The processed data is stored in a compressed rawdata journal file. As a journal file, the rawdata file, in addition to containing the event data, contains all information necessary to generate the associated index files, if they're missing.
When a peer node receives a block of data from a forwarder, it processes the data and adds it to the rawdata file in its local hot bucket. It also indexes it, creating the associated index files. In addition, it streams copies of only the processed rawdata to each of its target peers, which then adds it to the rawdata file in its own copy of the bucket. The rawdata in both the original and the replicated bucket copies are identical.
If the cluster has a search factor of 1, the target peers store only the rawdata in the bucket copies. They do not generate index files for the data. By not storing the index files on the target peers, you limit storage requirements. Because the rawdata is stored as a journal file, if the peer maintaining the original, fully indexed data goes down, one of the target peers can step in and generate the indexes from its copy of the rawdata.
If the cluster has a search factor greater than 1, some or all of the target peers will also create index files for the data. For example, say you have a replication factor of 3 and a search factor of 2. In that case, the source peer will stream its rawdata to two target peers. One of those peers will then use the rawdata to create index files, which it stores in its copy of the bucket. That way, there will be two searchable copies of the data (the original copy and the replicated copy with the index files). As described in "Search factor", this allows the cluster to recover more quickly in case of peer node failure. For more information on searchable bucket copies, see "Bucket searchability" later in this topic.
As a bucket ages, it rolls through several stages:
For detailed information about these stages, read "How Splunk stores indexes".
For the immediate discussion of cluster architecture, you just need a basic understanding of these bucket stages. A hot bucket is a bucket that's still being written to. When an indexer finishes writing to a hot bucket (for example, because the bucket reaches a maximum size), it rolls the bucket to warm and begins writing to a new hot bucket. Warm buckets are readable (for example, for searching) but the indexer does not write new data to them. Eventually, a bucket rolls to "cold" and then to "frozen", at which point it gets archived or deleted.
There are a couple other details that are important to keep in mind:
- Hot/warm and cold buckets are stored in separately configurable locations on the source peer node.
- All replicated copies of buckets are stored in the cold location on the target nodes, no matter whether they're hot, warm, or cold.
- The filename of a warm or cold bucket includes the time range of the data in the bucket. For detailed information on bucket naming conventions, read "What the index directories look like".
- Searches occur across hot, warm, and cold buckets.
- The conditions that cause buckets to roll are completely configurable, as described in "How Splunk stores indexes".
- For storage hardware information, such as help on estimating storage requirements, read "Storage considerations".
Bucket searchability and primacy states
Since a cluster maintains multiple copies of a bucket, there needs to be a way to identify which copy of a bucket gets searched. At any time, a copy of a bucket is either searchable or non-searchable. In addition, a searchable copy of a bucket can be primary or non-primary.
A bucket copy is searchable if it contains index files in addition to the rawdata file. The peer receiving the external data indexes the rawdata and also sends copies of the rawdata to its peers. If the search factor is greater than 1, some or all of those peers will also generate index files for the buckets they're replicating. So, for example, if you have a replication factor of 3 and a search factor of 2 and the cluster is complete, the cluster will contain three copies of each bucket. All three copies will contain the rawdata file, and two of the copies will also contain index files and therefore be searchable (the copy on the source peer and one of the copies on the target peers). The third copy will be non-searchable, but it can be made searchable if necessary. The main reason that a non-searchable copy gets made searchable is because a peer holding a searchable copy of the bucket goes down.
A primary copy of a bucket is the searchable copy that participates in a search. A valid cluster has exactly one primary copy of each bucket. That way, one and only one copy of each bucket gets searched. If a node with primary copies goes down, searchable but non-primary copies on other nodes can immediately be designated as primary, thus allowing searches to continue without any need to first wait for new index files to be generated.
Initially, the copy of the bucket on the peer originating the data is the primary copy, but this can change over time. For example, if the peer goes down, primacy gets transferred to a searchable copy on one of the target peers.
Important: Primacy reassignment occurs only if a peer goes down. In that case, the master reassigns primacy from any primary copies on the downed peer to searchable copies on remaining peers. For more information on this process, read "What happens when a peer node goes down".
The following diagram shows buckets spread across all the peers, as in the previous diagram. The cluster has a replication factor of 3 and a search factor of 2, which means that the cluster maintains two searchable copies of each bucket. Here, the copies of the buckets on the source peer are all primary (and therefore also searchable). The buckets' second searchable (but non-primary) copies are spread among most of the remaining peers in the cluster.
The set of primary bucket copies define a cluster's generation, as described in the next section.
A generation identifies which copies of a cluster's buckets are primary and therefore will participate in a search. The generation changes over time, as peers leave and join the cluster. When a peer goes down, its primary bucket copies get reassigned to other peers.
Note: The actual set of buckets that get searched also depends on factors such as the search time range. This is true for any indexer, clustered or not.
Here's another way of defining a generation: A generation is a snapshot of a valid state of the cluster; "valid" in the sense that every bucket on the cluster has exactly one primary copy.
At any point, all peers that are currently registered with the master participate in the current generation. When a peer joins or leaves the cluster, the master creates a new generation.
How cluster components use the generation
Here's how the various cluster components use generation information:
- The master creates each new generation, and assigns a generation ID to it. When necessary, it communicates the current generation ID to the peers and the search head. It also keeps track of the primary bucket copies for each generation and which peers they're located on.
- The peers keep track of which of their bucket copies are primary for each generation. The peers retain primacy information across multiple generations.
- For each search, the search head uses the generation ID that it gets from the master to determine which peers to search across.
When the generation changes
The generation changes under these circumstances:
- The master comes online.
- A peer is added to the cluster.
- A peer goes down, either intentionally (through the CLI
offlinecommand) or unintentionally (by crashing).
When a peer goes down, the master reassigns primacy from bucket copies on the downed node to searchable copies of the same buckets on the remaining nodes and creates a new generation.
The master does not create a new generation merely when a bucket rolls from hot to warm, thus causing a new hot bucket to get created (unless the bucket rolled for one of the reasons listed above). In that situation, the set of peers doesn't change. The search head only needs to know which peers are part of the generation; that is, which peers are participating in the cluster. It does not need to know which bucket copies on a particular peer are primary; the peer itself keeps track of that information.
How the generation is used in searches
The search head polls the master for the latest generation information at regular intervals. When the generation changes, the master gives the search head the new generation ID and a list of the peers that belong to that generation. The search head, in turn, gives the peers the ID whenever it initiates a search. The peers use the ID to identify which of their buckets are primary for that search.
Usually, a search occurs over the most recent generation of primary bucket copies. In the case of long-running searches, however, it's possible that a search could be running across an earlier generation; typically, because a peer went down in the middle of the search. This allows the long-running search to complete, even though some data might be missing (due to a downed node). The alternative would be to start the search over again, which you can always do manually if necessary.
Why a downed peer causes the generation to change
The reason that a downed peer causes the master to create a new generation is because, when a peer goes down, the master reassigns the downed peer's primary copies to copies on other peers. A copy that wasn't primary for a previous generation becomes primary in the new generation. By knowing the generation ID associated with a search, a peer is able to determine which of its buckets are primary for that search.
For example, the diagram that follows shows the same simplified version of a cluster as earlier, after the source node holding all the primary copies has gone down and the master has directed the remaining peers in fixing the buckets. First, the master reassigned primacy to the remaining searchable copy of each bucket. Next, it directed the peers to make their non-searchable copies searchable, to make up for the missing set of searchable copies. Finally, it directed the replication of a new set of non-searchable copies (1D, 2D, etc.), spread among the remaining peers.
Even though the source node went down, the cluster was able to fully recover both its complete and valid states, with replication factor number (3) of total bucket copies, search factor number (2) of searchable bucket copies, and exactly one primary copy of each bucket. This represents a different generation from the previous diagram, because primary copies have moved to different peers.
Note: This diagram only shows the buckets originating from one of the peers. A more complete version of this diagram would show buckets originating from several peers as they've migrated around the cluster.
This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18