Splunk® Enterprise

Distributed Deployment Manual

How data moves through Splunk deployments: The data pipeline

The processing tiers in a Splunk deployment correspond to the data pipeline, which is the route that data takes through Splunk software.

The processing tiers and the data pipeline

A Splunk deployment typically has three processing tiers:

  • Data input
  • Indexing
  • Search management

See "Scale your deployment with Splunk Enterprise components."

Each Splunk processing component resides on one of the tiers. Together, the tiers support the processes occurring in the data pipeline.

As data moves along the data pipeline, Splunk components transform the data from its origin in external sources, such as log files and network feeds, into searchable events that encapsulate valuable knowledge.

The data pipeline has these segments:

  • Input
  • Parsing
  • Indexing
  • Search

The correspondence between the three typical processing tiers and the four data pipeline segments is this:

  • The data input tier handles the input segment.
  • The indexing tier handles the parsing and indexing segments.
  • The search management tier handles the search segment.
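The correspondence listed above can be written down as a small lookup table. This is only an illustrative sketch: the dictionary keys and the helper name `tier_for_segment` are made-up labels for this example, not Splunk identifiers or configuration settings.

```python
# Illustrative labels only; these are not Splunk configuration names.
TIER_SEGMENTS = {
    "data input": ["input"],
    "indexing": ["parsing", "indexing"],
    "search management": ["search"],
}


def tier_for_segment(segment):
    """Return the processing tier that handles the given pipeline segment."""
    for tier, segments in TIER_SEGMENTS.items():
        if segment in segments:
            return tier
    raise KeyError(f"unknown pipeline segment: {segment}")


# Reading the tiers in order yields the four pipeline segments in order.
PIPELINE = [seg for segs in TIER_SEGMENTS.values() for seg in segs]
```

For example, `tier_for_segment("parsing")` returns `"indexing"`, which reflects that the indexing tier covers two of the four segments.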

This diagram outlines the data pipeline:

[Figure: data pipeline diagram (Datapipeline1 60.png)]

Splunk components participate in one or more segments of the data pipeline. See "Components and the data pipeline."

Note: The diagram represents a simplified view of the indexing architecture. It provides a functional view of the architecture and does not fully describe Splunk software internals. In particular, the parsing pipeline actually consists of three pipelines: parsing, merging, and typing, which together handle the parsing function. The distinction can matter during troubleshooting, but does not ordinarily affect how you configure or deploy Splunk Enterprise components. For a more detailed diagram of the data pipeline, see "How Indexing Works" in the Community Wiki.

The data pipeline segments in depth

This section provides more detail about the segments of the data pipeline. For more information on the parsing and indexing segments, see also "How indexing works" in the Managing Indexers and Clusters of Indexers manual.

Input

In the input segment, Splunk software consumes data. It acquires the raw data stream from its source, breaks it into 64K blocks, and annotates each block with some metadata keys. The keys apply to the input source as a whole. They include the host, source, and source type of the data. The keys can also include values that are used internally, such as the character encoding of the data stream, and values that control later processing of the data, such as the index into which the events should be stored.

During this phase, Splunk software does not look at the contents of the data stream, so the keys apply to the entire source, not to individual events. In fact, at this point, Splunk software has no notion of individual events at all, only of a stream of data with certain global properties.
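The input behavior described above can be sketched roughly as follows. This is a simplified illustration, not Splunk's implementation: the function name `read_blocks`, the dictionary layout, and the sample host, source, and source type values are all assumptions made for the example. The point it shows is that every block carries the same source-wide keys and that the block contents are never inspected.

```python
import io

BLOCK_SIZE = 64 * 1024  # 64K blocks, as described in the text


def read_blocks(stream, host, source, sourcetype, index="main"):
    """Yield (metadata, block) pairs without looking at the block contents."""
    # The keys describe the whole source, so every block gets the same values.
    metadata = {
        "host": host,
        "source": source,
        "sourcetype": sourcetype,
        "index": index,
    }
    while True:
        block = stream.read(BLOCK_SIZE)
        if not block:
            break
        yield dict(metadata), block


# Hypothetical sample input: 150 KB of opaque bytes.
stream = io.BytesIO(b"x" * (150 * 1024))
blocks = list(read_blocks(stream, "web01", "/var/log/app.log", "app_log"))
```

Here the 150 KB stream yields three blocks: two full 64K blocks and one shorter final block, each tagged with identical metadata.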

Parsing

During the parsing segment, Splunk software examines, analyzes, and transforms the data. This is also known as event processing. It is during this phase that Splunk software breaks the data stream into individual events. The parsing phase has many sub-phases:

  • Breaking the stream of data into individual lines.
  • Identifying, parsing, and setting timestamps.
  • Annotating individual events with metadata copied from the source-wide keys.
  • Transforming event data and metadata according to regex transform rules.
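The sub-phases listed above can be sketched in a few lines. This is a toy model under stated assumptions, not how Splunk software is implemented: it treats every line as one event, assumes a single timestamp format at the start of each line, and uses a made-up masking rule as the regex transform. The function name `parse` and the sample data are hypothetical.

```python
import re
from datetime import datetime

# Assumed timestamp layout for this example only.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
# Hypothetical transform rule: mask anything that looks like a password value.
MASK_RE = re.compile(r"(password=)\S+")


def parse(block_text, source_keys):
    """Turn a chunk of text into a list of event dictionaries."""
    events = []
    for line in block_text.splitlines():  # break the stream into lines
        if not line.strip():
            continue
        match = TIMESTAMP_RE.match(line)  # identify and parse the timestamp
        timestamp = (
            datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            if match
            else None
        )
        event = dict(source_keys)  # copy the source-wide keys onto the event
        event["_time"] = timestamp
        event["_raw"] = MASK_RE.sub(r"\1****", line)  # apply the regex transform
        events.append(event)
    return events


text = (
    "2018-05-10 12:00:00 login user=alice password=hunter2\n"
    "2018-05-10 12:00:05 logout user=alice\n"
)
keys = {"host": "web01", "source": "/var/log/app.log", "sourcetype": "app_log"}
events = parse(text, keys)
```

Each resulting event carries its own timestamp plus a copy of the host, source, and source type keys that were attached to the whole source during input.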

Indexing

During indexing, Splunk software takes the parsed events and writes them to the index on disk. It writes both compressed raw data and the corresponding index files.
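To make the idea of storing compressed raw data alongside index data concrete, here is a minimal in-memory sketch. It is an assumption-laden illustration, not Splunk's on-disk format: it uses gzip for the raw data and a simple term-to-position mapping as the index, and the names `index_events` and `search` are invented for the example.

```python
import gzip
import re
from collections import defaultdict


def index_events(events):
    """Return (compressed_raw, term_index) for a list of raw event strings."""
    # Compressed raw data: one event per line, gzip-compressed.
    raw = "\n".join(events).encode("utf-8")
    compressed_raw = gzip.compress(raw)
    # Index data: map each lowercase term to the positions of events containing it.
    term_index = defaultdict(set)
    for position, event in enumerate(events):
        for term in re.findall(r"\w+", event.lower()):
            term_index[term].add(position)
    return compressed_raw, dict(term_index)


def search(term, compressed_raw, term_index):
    """Look up a term in the index, then fetch matching events from the raw data."""
    events = gzip.decompress(compressed_raw).decode("utf-8").split("\n")
    return [events[i] for i in sorted(term_index.get(term.lower(), ()))]


# Hypothetical sample events.
raw_events = ["ERROR disk full", "INFO login ok", "ERROR login failed"]
compressed, terms = index_events(raw_events)
```

The lookup in `search` consults the index first and only then reads the raw data, which is the general reason for keeping the two side by side.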

For brevity, parsing and indexing are often referred to together as the indexing process. At a high level, that makes sense. But when you need to examine the actual processing of data more closely or decide how to allocate your components, it can be important to consider the two segments individually.

Search

The search segment manages all aspects of how the user accesses, views, and uses the indexed data. As part of the search function, Splunk software stores user-created knowledge objects, such as reports, event types, dashboards, alerts, and field extractions. The search function also manages the search process itself.

Where to go next

While the data pipeline processes always function in approximately the same way, no matter the size and nature of your deployment, it is important to take the pipeline into account when designing your deployment. For that, you must understand how Splunk components map to the data pipeline segments. See "Components and the data pipeline."


This documentation applies to the following versions of Splunk® Enterprise: 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 6.6.11, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.2.0, 7.2.1


Comments

This is missing the scheduler. I was sent to this page from the default-mode.conf spec file. And this is undocumented (although it works):

[pipeline:scheduler]
disabled = true

Skawasaki splunk, Splunker
May 10, 2018

Sgoodman - That may be so, but since the wiki is caveated as obsolescent, where will the detailed information be re-hosted? It'd be a shame to lose anything useful.

DUThibault
February 7, 2018

Joxley - That community wiki page is really helpful in its level of detail, particularly for troubleshooting. The topic at hand, though, is meant to serve as a high-level introduction to the data pipeline, covering the key elements necessary to deploy Splunk Enterprise. To focus on that goal, we have abstracted out much of the detail of the data pipeline process.

Sgoodman, Splunker
January 19, 2016

This Community Wiki page goes into more detail and is very helpful https://wiki.splunk.com/Community:HowIndexingWorks

Joxley
January 19, 2016
