Splunk® Enterprise

Getting Data In

Splunk version 4.x reached its End of Life on October 1, 2013.
This documentation does not apply to the most recent version of Splunk.
What Splunk does with your data (and how to make it do it better)

Splunk consumes any sort of data and indexes it, transforming it into useful and searchable knowledge in the form of events. The data pipeline, displayed below, shows the main processes that act on the data during indexing. These processes constitute event processing. After the data has been processed into events, you can associate the events with knowledge objects to further enhance their usefulness.

The data pipeline

Once a chunk of data enters Splunk, it moves through the data pipeline, which transforms the data into searchable events. This diagram shows the main steps in the data pipeline:

[Diagram: the Splunk data pipeline]

For a concise description of the data pipeline, see "How data moves through Splunk" in the Distributed Deployment manual.

Splunk makes reasonable decisions for most types of data during event processing, so that the resulting events are immediately useful and searchable. However, depending on the data and what sort of knowledge you need to extract from it, you might want to tweak one or more steps of event processing.

Event processing

Event processing occurs in two stages, parsing and indexing. All data that comes into Splunk enters through the parsing pipeline as large chunks. During parsing, Splunk breaks these chunks into events which it hands off to the indexing pipeline, where final processing occurs.
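As a rough illustration of the hand-off described above, the following Python sketch models parsing as a stage that turns arbitrary chunks into events and indexing as a stage that consumes those events. It is not Splunk's implementation: the function names `parse` and `index`, the newline-only event boundary, and the sample data are all assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, List


def parse(chunks: Iterable[str]) -> Iterator[str]:
    """Toy parsing stage: turn arbitrary chunks into newline-delimited events."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the tail stays buffered
        # because a chunk boundary can fall in the middle of an event.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            if line:
                yield line
    if buffer:
        yield buffer  # flush a final event that had no trailing newline


def index(events: Iterable[str]) -> List[Dict[str, object]]:
    """Toy indexing stage: here it only numbers and stores each event."""
    return [{"id": i, "raw": event} for i, event in enumerate(events)]


# Hypothetical input: three chunks whose boundaries do not align with events.
chunks = ["GET /a 200\nGET /b", " 404\nPOST /c 5", "00\n"]
indexed = index(parse(chunks))
```

The point of the sketch is that chunk boundaries and event boundaries are independent, so the parsing stage has to buffer partial data before handing complete events to the next stage.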

During both parsing and indexing, Splunk acts on the data, transforming it in various ways. Most of these processes are configurable, so you can adapt them to your needs. In the description that follows, each link takes you to a topic that discusses one of these processes, with information on ways you can configure it.

While parsing, Splunk performs a number of actions, including:

  • Extracting a set of default fields for each event, including host, source, and sourcetype.
  • Configuring character set encoding.
  • Identifying line termination using linebreaking rules. While many events are short and only take up a line or two, others can be long. You can also modify line termination settings interactively, using Splunk Web's data preview feature.
  • Identifying timestamps or creating them if they don't exist. At the same time that it processes timestamps, Splunk identifies event boundaries. You can also modify timestamp settings interactively, using Splunk Web's data preview feature.
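The parsing actions listed above can be sketched in a few lines of Python. This is a simplified model, not Splunk's parser: the function `parse_chunk`, the single timestamp pattern, and the caller-supplied `fallback_time` used when no timestamp is found are assumptions for illustration only.

```python
import re
from datetime import datetime
from typing import Dict, List

# Assumed timestamp layout for this sketch: "YYYY-MM-DD HH:MM:SS".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def parse_chunk(raw: bytes, host: str, source: str, sourcetype: str,
                fallback_time: datetime,
                charset: str = "utf-8") -> List[Dict[str, object]]:
    """Toy parser: decode, break into lines, timestamp, attach default fields."""
    text = raw.decode(charset)            # apply the character set encoding
    events = []
    for line in text.splitlines():        # line termination: one event per line
        if not line.strip():
            continue
        match = TIMESTAMP.search(line)    # identify a timestamp, if present
        if match:
            when = datetime.strptime(match.group(), "%Y-%m-%d %H:%M:%S")
        else:
            when = fallback_time          # create one when none exists
        events.append({
            "_time": when,
            "_raw": line,
            "host": host,                 # default fields for each event
            "source": source,
            "sourcetype": sourcetype,
        })
    return events
```

For example, calling `parse_chunk(b"2012-05-01 10:00:00 login ok\nno timestamp here\n", "web01", "/var/log/app.log", "app_log", fallback_time=datetime(2013, 1, 1))` yields two events, the second of which carries the fallback time.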

In the indexing pipeline, Splunk performs additional processing, including:

  • Breaking all events into segments that can then be searched. You can determine the level of segmentation. The segmentation level affects indexing and searching speed, search capability, and efficiency of disk compression.
  • Building the index data structures.
  • Writing the raw data and index files to disk, where post-indexing compression occurs.
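To make the effect of segmentation level more concrete, here is a toy Python model of the indexing steps above. The breaker character sets, the level labels `"full"` and `"outer"` as used here, the function names, and the use of `zlib` for compression are assumptions for this sketch and do not describe Splunk's actual segmenters or on-disk format.

```python
import re
import zlib
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical breaker sets chosen for this example only.
MAJOR = re.compile(r"[\s,;()\[\]]+")   # splits an event into large segments
MINOR = re.compile(r"[/:=@.\-]+")      # splits large segments into smaller ones


def segment(event: str, level: str = "full") -> Set[str]:
    """Return the searchable segments of one event at the given level."""
    segments = set()
    for major in MAJOR.split(event):
        if not major:
            continue
        segments.add(major.lower())
        if level == "full":            # finer segmentation: more terms to index
            segments.update(p.lower() for p in MINOR.split(major) if p)
    return segments


def build_index(events: List[str], level: str = "full") -> Dict[str, Set[int]]:
    """Build a simple inverted index: segment -> ids of events containing it."""
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for token in segment(event, level):
            index[token].add(event_id)
    return index


events = ["user=alice src=10.0.0.1", "user=bob src=10.0.0.2"]
full_index = build_index(events, "full")
outer_index = build_index(events, "outer")

# Raw data is stored alongside the index; compression is modeled with zlib.
compressed_raw = zlib.compress("\n".join(events).encode("utf-8"))
```

In this model the finer level produces more index entries, so a search for `alice` alone can be answered from `full_index`, while `outer_index` only knows the whole segment `user=alice`. That is the kind of trade-off between index size, speed, and search capability that the segmentation level controls.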

The distinction between the parsing and indexing pipelines matters mainly for forwarders. Heavy forwarders can fully parse data locally and then forward the parsed data to receiving indexers, where the final indexing occurs. Universal forwarders, on the other hand, forward the data after only minimal parsing, and most parsing then occurs on the receiving indexer.
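The difference in where parsing happens can be modeled with a short Python sketch. The `Payload` class, the function names, and the newline-based `split_events` helper are hypothetical and exist only to show the branching. They are not part of any Splunk API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Payload:
    data: List[str]   # events if already parsed, raw chunks otherwise
    parsed: bool


def split_events(chunk: str) -> List[str]:
    """Stand-in for the parsing pipeline: one event per non-empty line."""
    return [line for line in chunk.splitlines() if line]


def heavy_forwarder(chunk: str) -> Payload:
    # Parses locally, then forwards the resulting events.
    return Payload(split_events(chunk), parsed=True)


def universal_forwarder(chunk: str) -> Payload:
    # Forwards the data largely as-is; parsing is left to the indexer.
    return Payload([chunk], parsed=False)


def indexer_receive(payload: Payload) -> List[str]:
    """Parse only if the sender has not already done so, then index."""
    if payload.parsed:
        events = payload.data
    else:
        events = [e for chunk in payload.data for e in split_events(chunk)]
    return events     # in this sketch, "indexing" just returns the events
```

Either path ends with the same events at the indexer. What changes is which machine does the parsing work.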

  • For more information about events and what happens to them during the indexing process, see Overview of event processing in this manual.
  • For a detailed diagram that depicts the indexing pipelines and explains how indexing works, see "How Indexing Works" in the Community Wiki.

Enhance and refine events

Once the data has been transformed into events, you can make the events even more useful by associating them with knowledge objects, such as event types, field extractions, and saved searches. For information about managing Splunk knowledge, read the Knowledge Manager manual, starting with "What is Splunk knowledge?".

This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7, 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18
