Configuration parameters and the data pipeline
Data passes through several phases on its way from raw input to searchable events. This process is called the data pipeline, and it consists of four phases:
- Input
- Parsing
- Indexing
- Search
Key configuration file parameters are associated with specific phases of the data pipeline. Knowing which phase uses a parameter helps you understand what the parameter does. It also tells you where in your Splunk deployment topology you need to set the parameter.
The Distributed Deployment manual describes the data pipeline in detail, in "How data moves through Splunk: the data pipeline". Then, in "Components and roles", it describes how the various Splunk components, such as forwarders and indexers, correlate to the different phases of the pipeline. For instance, the input phase is handled by either a forwarder or an indexer; the search phase is handled by either an indexer or a search head.
Which configuration parameters go with which pipeline phases
This is a non-exhaustive list of configuration parameters and the pipeline phases that use them. By combining this information with an understanding of which Splunk component in your particular deployment performs each phase, you can determine where to configure each setting.
For example, if you are using universal forwarders to consume inputs, you need to configure inputs.conf parameters on the forwarders. If, however, your indexer is directly consuming network inputs, you need to configure those network-related inputs.conf parameters on the indexer.
For information on the correlation between components and pipeline phases, see "Components and roles" in the Distributed Deployment manual.
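As an illustration of the two cases described above, the following sketch shows inputs.conf stanzas placed on different components. The monitored path, the UDP port, and the sourcetype values are assumptions chosen for the example, not values taken from this manual.

```ini
# inputs.conf on a universal forwarder, which performs the input phase here.
# Hypothetical file path and sourcetype.
[monitor:///var/log/myapp/app.log]
sourcetype = myapp_log

# inputs.conf on an indexer that consumes a network input directly.
# Hypothetical UDP port and sourcetype.
[udp://514]
sourcetype = syslog
```

In both cases the stanza lives on whichever component actually acquires the data, because that component performs the input phase.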
Input phase
- inputs.conf
- props.conf
  - CHARSET
  - NO_BINARY_CHECK
  - CHECK_METHOD
  - sourcetype
- wmi.conf
- regmon-filters.conf
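A minimal sketch of the input-phase props.conf settings listed above might look like the following. The source path and sourcetype name are hypothetical. Because these settings are used during the input phase, the stanza belongs on the component that consumes the data, for example a forwarder.

```ini
# props.conf on the component that performs the input phase.
# Hypothetical source path and sourcetype name.
[source::/var/log/myapp/*.log]
CHARSET = UTF-8
NO_BINARY_CHECK = true
sourcetype = myapp_log
```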
Parsing phase
- props.conf
  - LINE_BREAKER, SHOULD_LINEMERGE, BREAK_ONLY_BEFORE_DATE, and all other line merging settings
  - TZ, DATETIME_CONFIG, TIME_FORMAT, TIME_PREFIX, and all other time extraction settings and rules
  - TRANSFORMS*, which includes per-event queue filtering, per-event index assignment, and per-event routing. These are applied in the order defined.
  - SEDCMD*
  - MORE_THAN*, LESS_THAN*
- transforms.conf
  - stanzas referenced by a TRANSFORMS* clause in props.conf
  - LOOKAHEAD, DEST_KEY, WRITE_META, DEFAULT_VALUE, REPEAT_MATCH
- datetime.xml
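The parsing-phase settings above can be sketched as a props.conf stanza paired with the transforms.conf stanza that its TRANSFORMS* clause references. The sourcetype name, the stanza name, the regular expressions, and the assumed timestamp layout are all hypothetical. REGEX and FORMAT are transforms.conf settings that do not appear in the list above and are included only to make the example complete.

```ini
# props.conf on the component that performs the parsing phase.
# Hypothetical sourcetype; assumes events start with "[YYYY-MM-DD HH:MM:SS]".
[myapp_log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^\[
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = UTC
TRANSFORMS-drop_debug = drop_debug_events

# transforms.conf on the same component.
# Per-event queue filtering: events matching REGEX are sent to the null queue.
[drop_debug_events]
REGEX = \sDEBUG\s
DEST_KEY = queue
FORMAT = nullQueue
```

Both files need to be on the component that parses the data. Placing them only on a search head would have no effect on line breaking, timestamp extraction, or event filtering.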
Indexing phase
- props.conf
  - SEGMENTATION*
- indexes.conf
- segmenters.conf
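As a rough sketch of the indexing-phase files listed above, the following fragment assigns a segmenter to a sourcetype and defines an index. The sourcetype and index names are hypothetical, and the example assumes that a segmenter stanza named inner is defined in segmenters.conf.

```ini
# props.conf on the component that performs the indexing phase.
# Hypothetical sourcetype; "inner" is assumed to be a stanza in segmenters.conf.
[myapp_log]
SEGMENTATION = inner

# indexes.conf on the same component. Hypothetical index name.
[myapp_index]
homePath = $SPLUNK_DB/myapp_index/db
coldPath = $SPLUNK_DB/myapp_index/colddb
thawedPath = $SPLUNK_DB/myapp_index/thaweddb
```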
Search phase
- props.conf
  - EXTRACT*
  - REPORT*
  - LOOKUP*
  - KV_MODE
  - FIELDALIAS*
  - rename
- transforms.conf
  - stanzas referenced by a REPORT* clause in props.conf
  - filename, external_cmd, and all other lookup-related settings
  - FIELDS, DELIMS
  - MV_ADD
- lookup files in the lookups folders
- search and lookup scripts in the bin folders
- search commands and lookup scripts
- savedsearches.conf
- eventtypes.conf
- tags.conf
- commands.conf
- alert_actions.conf
- macros.conf
- fields.conf
- transactiontypes.conf
- multikv.conf
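The search-phase settings above might be combined as in the following sketch. The sourcetype, field names, stanza names, and lookup file name are hypothetical. The example assumes that a file named http_status.csv exists in a lookups folder and contains status and status_description columns.

```ini
# props.conf on the component that performs the search phase.
# Hypothetical sourcetype, field names, and stanza names.
[myapp_log]
KV_MODE = none
EXTRACT-status = status=(?<status>\d{3})
REPORT-csv_fields = myapp_csv_fields
FIELDALIAS-user = uid AS user
LOOKUP-status_desc = http_status_lookup status OUTPUT status_description

# transforms.conf on the same component.
# Referenced by the REPORT* clause above: delimiter-based field extraction.
[myapp_csv_fields]
DELIMS = ","
FIELDS = time, uid, action

# Referenced by the LOOKUP* clause above. Hypothetical lookup file.
[http_status_lookup]
filename = http_status.csv
```

Because these settings are applied at search time, they belong on the component that handles the search phase in your deployment, not on the forwarders that collect the data.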
Other configuration settings
There are some settings that don't work well in a distributed Splunk environment. These tend to be exceptional and include:
- props.conf
  - CHECK_FOR_HEADER, LEARN_MODEL, maxDist. These are created in the parsing phase, but they require generated configurations to be moved to the search phase configuration location.
This documentation applies to the following versions of Splunk: 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5, 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 5.0, 5.0.1, 5.0.2.