Configure input processing
This documentation does not apply to the most recent version of Splunk.
This topic covers the main issues involved in getting your data into Splunk cleanly.
Splunk is very flexible: it processes almost any type of data and stores the indexed data in flat files. You can also add structure or knowledge (fields, event types, and so on) at any time. Most knowledge resides in a configuration layer and is instantiated at search time, which means you can easily revise or change it. However, a few properties are written into the index itself, so it is important to get them right. Splunk tries to determine these automatically and often succeeds, but verify that they are correct before you roll out a large deployment. Always experiment with a test index before you roll out Splunk.
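As a rough sketch of the test-index approach, the fragments below define a separate index and route one input to it. The index name test_index and the monitored path are hypothetical placeholders, and the exact attributes can vary by version, so check them against the indexes.conf and inputs.conf specifications for your release.

```
# indexes.conf (for example, in $SPLUNK_HOME/etc/system/local/)
# "test_index" is a hypothetical name chosen for this illustration.
[test_index]
homePath   = $SPLUNK_DB/test_index/db
coldPath   = $SPLUNK_DB/test_index/colddb
thawedPath = $SPLUNK_DB/test_index/thaweddb

# inputs.conf
# The monitored path is an assumed example, not a required location.
[monitor:///var/log/myapp/app.log]
index = test_index
```

Once the events in the test index look correct, the same input can be pointed at the production index.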
Configure linebreaking
Some events are made up of more than one line. Usually, Splunk can automatically figure out the event boundaries. However, if event boundary recognition is not working as desired, you can set custom rules using Splunk's configuration files.
To configure multi-line events, examine the format of the events. Determine a pattern in the events to set as the start or end of an event. Then, edit $SPLUNK_HOME/etc/system/local/props.conf, and set the necessary attributes for your data handling.
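For illustration, a props.conf stanza along these lines might handle a log whose events each begin with a date such as 2010-06-15. The source type name my_multiline_log and the date pattern are assumptions made for this sketch; substitute the pattern you found in your own events.

```
# $SPLUNK_HOME/etc/system/local/props.conf
# "my_multiline_log" is a hypothetical source type name.
[my_multiline_log]
# Combine consecutive lines into a single multi-line event.
SHOULD_LINEMERGE = true
# Start a new event only at a line that begins with a YYYY-MM-DD date.
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}
# Upper limit on the number of lines merged into one event.
MAX_EVENTS = 500
```

If the events are easier to recognize by how they end than by how they begin, an end-of-event attribute such as MUST_BREAK_AFTER may be the better fit.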
See Configure linebreaking for multi-line events in the Admin manual for more information.
Configure default fields
Each entry (event) in a Splunk index includes the following four default fields:
- timestamp: The timestamp of the event. Usually this is fairly straightforward, but if some of your logs have an unusual or obscure format for the timestamp, you need to explicitly configure Splunk to recognize it. (Or, even better, change the format in your logs.)
- host: The host where the event originated. You may need to do special configuration if you are working in a centralized log server environment, if you are working with files forwarded from other hosts, or if you want to ensure that Splunk uses a hostname instead of an IP address.
- source: The location where Splunk found the data -- for example, from a network port or from a file or directory on disk. This is usually straightforward.
- sourcetype: The type of log file or other data input -- for example, events from Apache log records may be assigned an access_combined source type. The source type gives important information about the format of the data -- for example, Apache log entries all have the same fields, the same timestamp format and location, and so on. Giving the same source type to events with the same format means you can add structure to events that look alike. Events with the same source type can come from different sources, and events from a single source (such as syslog) can have different source types.
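The default fields described above can be set explicitly when Splunk's guesses are not right. The fragments below are a minimal sketch: the file paths, the host name webserver01, the directory layout, and the source type name my_custom_log are all hypothetical, and the timestamp format is only an example.

```
# inputs.conf
# Assign a fixed host and source type to a single monitored file.
[monitor:///var/log/httpd/access_log]
host = webserver01
sourcetype = access_combined

# On a centralized log server, take the host from the file path.
# Assumed layout: /var/log/remote/<hostname>/..., so the host name
# is the fourth path segment.
[monitor:///var/log/remote/]
host_segment = 4

# props.conf
# Tell Splunk where the timestamp is and how to parse it for an
# unusual format such as: [15/Jun/2010:10:12:45 -0700] message ...
[my_custom_log]
TIME_PREFIX = ^\[
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 30
```

The source field normally needs no configuration, because Splunk sets it from the input itself.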
See About default fields in the Admin manual for an in-depth discussion.
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8.