Splunk® Enterprise

Distributed Deployment Manual

How data moves through Splunk Enterprise: the data pipeline

Data in Splunk Enterprise transitions through several phases as it moves along the data pipeline, from its origin in sources such as log files and network feeds to its transformation into searchable events that encapsulate valuable knowledge. The data pipeline includes these segments:

  • Input
  • Parsing
  • Indexing
  • Search

You can assign each of these segments to a different Splunk Enterprise instance.

This diagram outlines the data pipeline:

[Diagram: the data pipeline, showing data moving through the input, parsing, indexing, and search segments]

Splunk Enterprise instances participate in one or more segments of the data pipeline, as described in "Scale your deployment".

Note: The diagram presents a simplified, functional view of the indexing architecture and does not fully describe Splunk Enterprise internals. In particular, the parsing pipeline actually consists of three pipelines: parsing, merging, and typing, which together handle the parsing function. This distinction can matter during troubleshooting, but it does not ordinarily affect how you configure or deploy Splunk Enterprise.

The data pipeline and structured data

For certain types of structured data, that is, data that resides in a file with headers and fields separated by specific characters, not all segments of this pipeline apply. When you collect structured data, you must configure data collection so that the data arrives at the indexer in the format that you want. In environments with forwarders, this configuration must happen on the forwarder. See "Extract data from files with headers".
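For example, you might configure a structured data input in props.conf on the forwarder that collects the file. The following is a minimal sketch, assuming a hypothetical CSV source type named csv_inventory and a timestamp column named timestamp:

    # props.conf on the forwarder (hypothetical source type)
    [csv_inventory]
    # Parse the file as CSV at collection time
    INDEXED_EXTRACTIONS = csv
    FIELD_DELIMITER = ,
    # The first line of the file holds the column names
    HEADER_LINE_NUMBER = 1
    # Read event timestamps from the "timestamp" column
    TIMESTAMP_FIELDS = timestamp

With INDEXED_EXTRACTIONS set, the forwarder parses the file itself, so the data arrives at the indexer already structured.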

Input

In the input segment, Splunk Enterprise consumes data. It acquires the raw data stream from its source, breaks it into 64K blocks, and annotates each block with metadata keys. The keys apply to the entire input source. They include the host, source, and source type of the data. The keys can also include values that Splunk Enterprise uses internally, such as the character encoding of the data stream, and values that control later processing of the data, such as the index into which the events should be stored.

During this phase, Splunk Enterprise does not look at the contents of the data stream, so the keys apply to the entire source, not to individual events. In fact, at this point, Splunk Enterprise has no notion of individual events at all, only of a stream of data with certain global properties.
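You can set many of these keys explicitly when you define an input. The following is a minimal inputs.conf sketch, assuming a hypothetical application log at /var/log/myapp/app.log and an index named myapp:

    # inputs.conf (hypothetical file monitor input)
    [monitor:///var/log/myapp/app.log]
    # Metadata keys that apply to the entire input source
    sourcetype = myapp_log
    index = myapp
    # Override the default host value for this input
    host = webserver01

If you omit these settings, Splunk Enterprise assigns defaults, such as the file path for the source key.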

Parsing

During the parsing segment, Splunk Enterprise examines, analyzes, and transforms the data. This is also known as event processing. During this phase, Splunk Enterprise breaks the data stream into individual events. The parsing phase includes these sub-phases (see the configuration sketch after this list):

  • Breaking the stream of data into individual lines.
  • Identifying, parsing, and setting timestamps.
  • Annotating individual events with metadata copied from the source-wide keys.
  • Transforming event data and metadata according to Splunk Enterprise regex transform rules.
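
These sub-phases correspond to settings in props.conf and transforms.conf. The following is a minimal sketch, assuming the hypothetical myapp_log source type from the previous example and a hypothetical transform named mask_account_numbers:

    # props.conf (hypothetical parsing settings for myapp_log)
    [myapp_log]
    # Line breaking: treat each line as one event
    LINE_BREAKER = ([\r\n]+)
    SHOULD_LINEMERGE = false
    # Timestamp extraction: the timestamp follows an opening bracket
    TIME_PREFIX = \[
    TIME_FORMAT = %Y-%m-%d %H:%M:%S
    MAX_TIMESTAMP_LOOKAHEAD = 19
    # Regex transform rules: apply the transform defined below
    TRANSFORMS-mask = mask_account_numbers

    # transforms.conf (hypothetical transform that masks account numbers)
    [mask_account_numbers]
    REGEX = ^(.*acct=)\d+(.*)$
    FORMAT = $1xxxxxxxx$2
    DEST_KEY = _raw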

Indexing

During indexing, Splunk Enterprise takes the parsed events and writes them to the index on disk. It writes both compressed raw data and the corresponding index files.
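Where the indexer writes this data is configurable in indexes.conf. The following is a minimal sketch, assuming a hypothetical index named myapp. Each bucket under these paths holds both the compressed raw data and the corresponding index files:

    # indexes.conf (hypothetical index definition on the indexer)
    [myapp]
    # Hot and warm buckets
    homePath = $SPLUNK_DB/myapp/db
    # Cold buckets
    coldPath = $SPLUNK_DB/myapp/colddb
    # Buckets restored from archive
    thawedPath = $SPLUNK_DB/myapp/thaweddb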

For brevity, parsing and indexing are often referred to together as the indexing process. At a high level, that's fine. But when you need to look more closely at the actual processing of data, it can be important to consider the two segments individually.

For a detailed diagram that depicts the indexing pipelines and explains how indexing works, see "How Indexing Works" in the Community Wiki.

Search

Splunk Enterprise's search function manages all aspects of how the user sees and uses the indexed data, including interactive and scheduled searches, reports and charts, dashboards, and alerts. As part of its search function, Splunk Enterprise stores user-created knowledge objects, such as saved searches, event types, views, and field extractions.
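Knowledge objects such as scheduled searches are themselves stored as configuration. The following is a minimal savedsearches.conf sketch, assuming the hypothetical myapp index and myapp_log source type from the earlier examples:

    # savedsearches.conf (hypothetical scheduled search)
    [Errors in the last hour]
    search = index=myapp sourcetype=myapp_log log_level=ERROR
    # Search over the last hour of data on each run
    dispatch.earliest_time = -1h
    dispatch.latest_time = now
    # Run on a schedule, at the top of every hour
    enableSched = 1
    cron_schedule = 0 * * * *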

For more information on the various steps in the pipeline, see "How indexing works" in the Managing Indexers and Clusters of Indexers manual.
