Configuration parameters and the data pipeline
Data goes through several phases as it transitions from raw input to searchable events. This process is called the data pipeline and consists of four phases:
- Input
- Parsing
- Indexing
- Search
Each phase of the data pipeline relies on different configuration file parameters. Knowing which phase uses a particular parameter allows you to identify where in your Splunk deployment topology you need to set the parameter.
What the data pipeline looks like
This diagram outlines the data pipeline:
The Distributed Deployment Manual describes the data pipeline in detail in "How data moves through Splunk: the data pipeline".
How Splunk Enterprise components correlate to phases of the pipeline
One or more Splunk Enterprise components can perform each of the pipeline phases. For example, a universal forwarder, a heavy forwarder, or an indexer can perform the input phase.
Data goes through each phase only once, so each configuration belongs on only one component: the first component in the deployment that handles that phase. For example, say data enters the system through a set of universal forwarders, which forward it to an intermediate heavy forwarder, which in turn forwards it to an indexer. In that case, the input phase for that data occurs on the universal forwarders, the parsing phase occurs on the heavy forwarder, and the indexing phase occurs on the indexer.
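| Data pipeline phase | Components that can perform this role |
| Input | indexer, universal forwarder, heavy forwarder |
| Parsing | indexer, heavy forwarder |
| Indexing | indexer |
| Search | indexer, search head |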
Where to set a configuration parameter depends on the components in your specific deployment. For example, you set parsing parameters on the indexers in most cases. But if you have heavy forwarders feeding data to the indexers, you instead set parsing parameters on the heavy forwarders. Similarly, you set search parameters on the search heads, if any. But if you aren't deploying dedicated search heads, you set the search parameters on the indexers.
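The rule that each phase runs on the first component able to perform it can be sketched in Python. This is a minimal, hypothetical helper, not part of Splunk Enterprise: the names `CAPABILITIES` and `phase_locations` are invented for illustration, and the capability sets assume the 4.3/5.0 roles described in this section (universal forwarders perform input only, heavy forwarders perform input and parsing, indexers can perform every phase, and search heads perform search).

```python
from typing import Dict, List, Set

# Hypothetical capability map, assuming the Splunk Enterprise 4.3/5.0 roles
# described in this section. Not read from any Splunk API.
CAPABILITIES: Dict[str, Set[str]] = {
    "universal forwarder": {"input"},
    "heavy forwarder": {"input", "parsing"},
    "indexer": {"input", "parsing", "indexing", "search"},
    "search head": {"search"},
}

# Phases that happen along the forwarding path, in pipeline order.
DATA_PATH_PHASES = ("input", "parsing", "indexing")


def phase_locations(data_path: List[str], has_search_head: bool) -> Dict[str, str]:
    """Map each pipeline phase to the component type that performs it.

    data_path lists component types in the order the data flows through
    them, for example ["universal forwarder", "heavy forwarder", "indexer"].
    """
    locations: Dict[str, str] = {}
    start = 0  # a later phase never runs upstream of an earlier one
    for phase in DATA_PATH_PHASES:
        for i in range(start, len(data_path)):
            if phase in CAPABILITIES[data_path[i]]:
                # The first capable component handles the phase; the data
                # does not go through that phase again further downstream.
                locations[phase] = data_path[i]
                start = i
                break
        else:
            raise ValueError(f"no component in {data_path} can perform the {phase} phase")
    # Search runs on dedicated search heads if there are any, else on indexers.
    locations["search"] = "search head" if has_search_head else "indexer"
    return locations


print(phase_locations(["universal forwarder", "heavy forwarder", "indexer"], has_search_head=True))
# prints {'input': 'universal forwarder', 'parsing': 'heavy forwarder', 'indexing': 'indexer', 'search': 'search head'}
```

With only an indexer and no search heads, `phase_locations(["indexer"], has_search_head=False)` places every phase, and therefore every parameter, on the indexer.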
For more information, see "Components and roles" in the Distributed Deployment Manual.
How configuration parameters correlate to phases of the pipeline
This is a non-exhaustive list of configuration parameters and the pipeline phases that use them. By combining this information with an understanding of which Splunk component in your particular deployment performs each phase, you can determine where to configure each setting.
For example, if you are using universal forwarders to consume inputs, you need to configure inputs.conf parameters on the forwarders. If, however, your indexer is directly consuming network inputs, you need to configure those network-related inputs.conf parameters on the indexer.
Parsing phase
- props.conf
  - LINE_BREAKER, SHOULD_LINEMERGE, BREAK_ONLY_BEFORE_DATE, and all other line merging settings
  - TZ, DATETIME_CONFIG, TIME_FORMAT, TIME_PREFIX, and all other time extraction settings and rules
  - TRANSFORMS*, which includes per-event queue filtering, per-event index assignment, and per-event routing. These are applied in the order defined.
  - MORE_THAN*, LESS_THAN*
- transforms.conf
  - stanzas referenced by a TRANSFORMS* clause in props.conf
  - LOOKAHEAD, DEST_KEY, WRITE_META, DEFAULT_VALUE, REPEAT_MATCH
Search phase
- transforms.conf
  - stanzas referenced by a REPORT* clause in props.conf
  - filename, external_cmd, and all other lookup-related settings
  - FIELDS, DELIMS
- lookup files in the lookups folders
- search and lookup scripts in the bin folders
- search commands and lookup scripts
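The parameter list above can be combined with a phase-to-component mapping to decide where a given setting belongs. The following Python sketch is hypothetical: the `PARAMETER_PHASES` table, the `phase_for` and `component_for` helpers, and the assignment of each setting to props.conf or transforms.conf are assumptions made for illustration. The table covers only the settings named in this section, so an unlisted setting returns None rather than a guess. Wildcard names such as TRANSFORMS* are matched with Python's `fnmatch.fnmatchcase`.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional, Tuple

# Hypothetical, non-exhaustive table: (configuration file, setting pattern) -> phase.
# Patterns ending in * stand for a family of settings, e.g. TRANSFORMS-<class>.
PARAMETER_PHASES: Dict[Tuple[str, str], str] = {
    ("props.conf", "LINE_BREAKER"): "parsing",
    ("props.conf", "SHOULD_LINEMERGE"): "parsing",
    ("props.conf", "BREAK_ONLY_BEFORE_DATE"): "parsing",
    ("props.conf", "TZ"): "parsing",
    ("props.conf", "DATETIME_CONFIG"): "parsing",
    ("props.conf", "TIME_FORMAT"): "parsing",
    ("props.conf", "TIME_PREFIX"): "parsing",
    ("props.conf", "TRANSFORMS*"): "parsing",
    ("props.conf", "MORE_THAN*"): "parsing",
    ("props.conf", "LESS_THAN*"): "parsing",
    ("transforms.conf", "LOOKAHEAD"): "parsing",
    ("transforms.conf", "DEST_KEY"): "parsing",
    ("transforms.conf", "WRITE_META"): "parsing",
    ("transforms.conf", "DEFAULT_VALUE"): "parsing",
    ("transforms.conf", "REPEAT_MATCH"): "parsing",
    ("transforms.conf", "filename"): "search",
    ("transforms.conf", "external_cmd"): "search",
    ("transforms.conf", "FIELDS"): "search",
    ("transforms.conf", "DELIMS"): "search",
}


def phase_for(conf_file: str, setting: str) -> Optional[str]:
    """Return the pipeline phase that uses a setting, or None if it is not listed."""
    for (conf, pattern), phase in PARAMETER_PHASES.items():
        if conf == conf_file and fnmatchcase(setting, pattern):
            return phase
    return None


def component_for(conf_file: str, setting: str, locations: Dict[str, str]) -> Optional[str]:
    """Return the component where a setting belongs, given a phase -> component map."""
    phase = phase_for(conf_file, setting)
    return locations.get(phase) if phase is not None else None


# Example deployment: universal forwarders -> heavy forwarder -> indexer, plus search heads.
locations = {
    "input": "universal forwarder",
    "parsing": "heavy forwarder",
    "indexing": "indexer",
    "search": "search head",
}
print(component_for("props.conf", "TRANSFORMS-routing", locations))  # prints heavy forwarder
print(component_for("transforms.conf", "DELIMS", locations))  # prints search head
```

To extend the sketch, add one entry per additional setting; patterns ending in * cover setting families such as TRANSFORMS-<class>.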
Other configuration settings
Some settings don't work well in a distributed Splunk environment. These are exceptions to the rules above and include:
- CHECK_FOR_HEADER, LEARN_MODEL, maxDist. These settings take effect in the parsing phase, but the configurations they generate must be moved to the location where the search phase reads its configuration.
This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7, 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18