How the Splunk platform handles your data
The Splunk platform consumes data and indexes it, transforming it into searchable knowledge in the form of events. The data pipeline shows the main processes that act on the data during indexing. These processes constitute event processing. After the data is processed into events, you can associate the events with knowledge objects to enhance their usefulness.
The data pipeline
Incoming data moves through the data pipeline. For more detailed information, see How data moves through Splunk deployments: The data pipeline in the Distributed Deployment Manual.
Each processing component resides on one of the three typical processing tiers: the data input tier, the indexing tier, and the search management tier. Together, the tiers support the processes occurring in the data pipeline.
As data moves along the data pipeline, components transform the data from its origin in external sources, such as log files and network feeds, into searchable events that encapsulate valuable knowledge.
The data pipeline has these segments: input, parsing, indexing, and search.
The main steps in the data pipeline are as follows. In the data input tier, the Splunk platform consumes data from various inputs. Then, in the indexing tier, the Splunk platform examines, analyzes, and transforms the data. It then takes the parsed events and writes them to the index on disk. Finally, the search management tier manages all aspects of how the user accesses, views, and uses the indexed data.
Event processing occurs in two stages, parsing and indexing. All data enters through the parsing pipeline as large chunks. During parsing, the Splunk platform breaks these chunks into events. It then hands off the events to the indexing pipeline, where final processing occurs.
During both parsing and indexing, the Splunk platform transforms the data. You can configure most of these processes to adapt them to your needs.
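The two-stage flow described above can be sketched in a few lines of Python. This is a toy illustration only, not how the Splunk platform is implemented: the function names `parse_chunks` and `index_events` are hypothetical, and the sketch assumes that each event is a single newline-terminated line. It shows how a parsing stage can break arbitrary chunks into events, including an event that straddles a chunk boundary, before handing them to an indexing stage.

```python
from typing import Dict, Iterable, Iterator, List


def parse_chunks(chunks: Iterable[str]) -> Iterator[str]:
    """Toy parsing stage: break incoming chunks into one event per line."""
    pending = ""  # partial line carried over from the previous chunk
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; the rest waits.
        *complete, pending = pending.split("\n")
        for line in complete:
            if line:
                yield line
    if pending:
        yield pending  # flush a final event that had no trailing newline


def index_events(events: Iterable[str]) -> List[Dict[str, object]]:
    """Toy indexing stage: final processing, here just storing each event."""
    return [{"offset": i, "raw": event} for i, event in enumerate(events)]


# The second event is split across two chunks on purpose.
chunks = ["alpha event\nbeta ev", "ent\ngamma event\n"]
index = index_events(parse_chunks(chunks))
print([entry["raw"] for entry in index])
# prints ['alpha event', 'beta event', 'gamma event']
```

Because `parse_chunks` is a generator, events are handed to the next stage as soon as they are complete, which loosely mirrors the hand-off from the parsing pipeline to the indexing pipeline.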
In the parsing pipeline, the Splunk platform performs a number of actions. The following table shows some examples in addition to related information:
|Action|Related information|
|---|---|
|Extracting a set of default fields for each event.|About default fields|
|Configuring character set encoding.|Configure character set encoding|
|Identifying line termination using line breaking rules. You can also modify line termination settings interactively, using the Set Source Type page in Splunk Web.|Configure event line breaking; Assign the correct source types to your data|
|Identifying or creating timestamps. At the same time that it processes timestamps, Splunk software identifies event boundaries. You can modify timestamp settings interactively, using the Set Source Type page in Splunk Web.|How timestamp assignment works; Assign the correct source types to your data|
|Anonymizing data, based on your configuration. You can mask sensitive data (such as credit card or social security numbers) at this stage.| |
|Applying custom metadata to incoming events, based on your configuration.|Assign default fields dynamically|
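To make a few of the parsing actions in the table concrete, the following Python sketch applies timestamp identification, masking of sensitive data, and attachment of metadata to a single raw event. It is an illustration under stated assumptions, not the Splunk platform's implementation: the helper names, the timestamp layout, and the card-number pattern are all hypothetical choices made for this example.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# Assumed formats for this sketch: "YYYY-MM-DD HH:MM:SS" timestamps and
# card numbers written as four dash-separated groups of four digits.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
CARD_RE = re.compile(r"\b(?:\d{4}-){3}(\d{4})\b")


def extract_timestamp(raw: str) -> Optional[datetime]:
    """Return the first timestamp found in the event, or None."""
    match = TIMESTAMP_RE.search(raw)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")


def mask_card_numbers(raw: str) -> str:
    """Replace all but the last four digits of each card number."""
    return CARD_RE.sub(r"XXXX-XXXX-XXXX-\1", raw)


def parse_event(raw: str, host: str) -> Dict[str, object]:
    """Attach a timestamp and metadata, and store the masked raw text."""
    return {
        "_time": extract_timestamp(raw),
        "host": host,
        "_raw": mask_card_numbers(raw),
    }


event = parse_event(
    "2024-01-15 10:30:00 payment card=1234-5678-9012-3456 ok", host="web01"
)
print(event["_raw"])
# prints 2024-01-15 10:30:00 payment card=XXXX-XXXX-XXXX-3456 ok
```

Masking happens before the event is stored, so in this sketch the unmasked value never reaches the index. That is the reason for anonymizing during parsing rather than at search time.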
In the indexing pipeline, the Splunk platform performs additional processing. For example:
- Breaking all events into segments that can then be searched. You can determine the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of disk compression. See About event segmentation.
- Building the index data structures.
- Writing the raw data and index files to disk, where post-indexing compression occurs.
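The indexing steps above can be approximated with a small inverted index. The sketch below is a simplified model, not the platform's actual index format: the two breaker character sets and the function names are assumptions chosen for illustration. Each event is split into coarse segments, each coarse segment is split again into finer ones, and every segment is mapped to the events that contain it.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Hypothetical breaker sets for this sketch: coarse breakers split an
# event into large segments, fine breakers split those segments further.
COARSE_BREAKERS = re.compile(r"[\s,;]+")
FINE_BREAKERS = re.compile(r"[.=:/-]+")


def segment(raw: str) -> Set[str]:
    """Return every coarse and fine segment of an event, lowercased."""
    segments: Set[str] = set()
    for coarse in COARSE_BREAKERS.split(raw):
        if not coarse:
            continue
        segments.add(coarse.lower())
        segments.update(p.lower() for p in FINE_BREAKERS.split(coarse) if p)
    return segments


def build_index(events: Iterable[str]) -> Dict[str, Set[int]]:
    """Map each segment to the set of event ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for event_id, raw in enumerate(events):
        for term in segment(raw):
            index[term].add(event_id)
    return index


events = ["src=10.0.0.1 action=allowed", "src=10.0.0.2 action=blocked"]
index = build_index(events)
print(sorted(index["src"]))      # prints [0, 1]
print(sorted(index["blocked"]))  # prints [1]
```

Indexing more segment levels makes more search terms matchable but enlarges the index, which is the kind of trade-off between search capability, speed, and disk usage that the segmentation level controls.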
The distinction between parsing and indexing pipelines matters mainly for forwarders. Heavy forwarders can parse data locally and then forward the parsed data on to receiving indexers, where the final indexing occurs. Universal forwarders offer minimal parsing in specific cases such as handling structured data files. Additional parsing occurs on the receiving indexer.
For information about events and what happens to them during the indexing process, see Overview of event processing.
Enhance and refine events with knowledge objects
After the data has been transformed into events, you can make the events more useful by associating them with knowledge objects, such as event types, field extractions, and reports. For information about managing Splunk software knowledge, see the Knowledge Manager Manual, starting with What is Splunk knowledge?
This documentation applies to the following versions of Splunk Cloud Platform™: 8.2.2112, 8.2.2201, 8.2.2203, 9.0.2205, 8.2.2202, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305 (latest FedRAMP release), 9.1.2308, 9.1.2312