Configure linebreaking for multi-line events
This documentation does not apply to the most recent version of Splunk.
Overview of multi-line events and event linebreaking
Some events are made up of more than one line. Splunk handles most of these events correctly by default, but it does not recognize some multi-line events properly. These require special configuration to change Splunk's default linebreaking behavior.
Multi-line event linebreaking and segmentation limitations
Splunk applies the following limits to extremely large events during linebreaking and segmentation:
- Lines over 10,000 bytes: Splunk breaks lines over 10,000 bytes into multiple lines of 10,000 bytes each when it indexes them. It appends the field meta::truncated to the end of each truncated section. However, Splunk still groups these lines into a single event.
- Segmentation for events over 100,000 bytes: Splunk only displays the first 100,000 bytes of an event in the search results. Segments after those first 100,000 bytes of a very long line are still searchable, however.
- Segmentation for events over 1,000 segments: Splunk displays the first 1,000 individual segments of an event as segments separated by whitespace and highlighted on mouseover. It displays the rest of the event as raw text without interactive formatting.
Configuration
Many event logs have a strict one-line-per-event format, but some do not. Usually, Splunk can automatically figure out the event boundaries. However, if event boundary recognition is not working as desired, you can set custom rules by configuring props.conf.
To configure multi-line events, examine the format of the events. Determine a pattern in the events to set as the start or end of an event. Then, edit $SPLUNK_HOME/etc/system/local/props.conf, and set the necessary attributes for your data handling.
There are two ways to handle multi-line events:
- Break the event stream into real events. This is recommended, as it increases indexing speed significantly. Use LINE_BREAKER (see below).
- Break the event stream into lines and reassemble. This is slower but affords more robust configuration options. Use any linebreaking attribute besides LINE_BREAKER (see below).
Linebreaking general attributes
These are the props.conf attributes that affect linebreaking:
TRUNCATE = <non-negative integer>
- Change the default maximum line length (in bytes).
- Set to 0 if you never want truncation (very long lines are, however, often a sign of garbage data).
- Defaults to 10000 bytes.
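As a minimal sketch, the stanza below raises the line-length limit for a single source type. The stanza name my_long_lines and the value 50000 are hypothetical; pick a limit that fits your data.

```ini
# props.conf
# Hypothetical source type whose valid lines can exceed the 10000-byte default.
[my_long_lines]
TRUNCATE = 50000
```

Setting a finite limit rather than 0 keeps some protection against garbage data while still admitting the longer lines.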
LINE_BREAKER = <regular expression>
- If not set, the raw stream will be broken into an event for each line delimited by \r or \n.
- If set, the given regex will be used to break the raw stream into events.
- The regex must contain a matching group.
- Wherever the regex matches, the start of the first matched group is considered the first text NOT in the previous event.
- The end of the first matched group is considered the end of the delimiter and the next character is considered the beginning of the next event.
- For example, "LINE_BREAKER = ([\r\n]+)" is equivalent to the default rule.
- The contents of the first matching group will not occur in either the previous or next events.
- Note: There is a significant speed boost by using the LINE_BREAKER to delimit multi-line events rather than using line merging to reassemble individual lines into events.
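The rules above can be illustrated with a short sketch. It assumes a hypothetical source type, my_multiline_log, in which every event starts with a date such as 2006-09-21. The first matching group captures only the newline characters, so they are discarded, and the date that follows becomes the first text of the next event.

```ini
# props.conf
# Hypothetical source type: each event begins with a YYYY-MM-DD date.
[my_multiline_log]
# Group 1 (the newlines) is the delimiter and is dropped from both events.
# The date after the group is not part of the group, so it starts the next event.
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
# SHOULD_LINEMERGE defaults to true; it is turned off here on the assumption
# that the events produced by LINE_BREAKER should not be merged again.
SHOULD_LINEMERGE = false
```

Lines that do not begin with a date, such as continuation lines, stay inside the current event because the regex does not match in front of them.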
LINE_BREAKER_LOOKBEHIND = <integer>
- Change the default lookbehind for the regex-based linebreaker.
- When there is leftover data from a previous raw chunk, this is how far before the end of the raw chunk (with the next chunk concatenated) Splunk begins applying the regex.
- Defaults to 100.
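A sketch of how the lookbehind might be raised alongside a custom LINE_BREAKER. The stanza name, the delimiter pattern, and the value 500 are all hypothetical; the assumption is that the delimiter text can be longer than the default lookbehind, so a match could straddle two raw chunks.

```ini
# props.conf
# Hypothetical source type with a long separator line between events.
[my_long_delimiter_log]
# Group 1 is the whole separator, including the newlines around it.
LINE_BREAKER = ([\r\n]+={200,}[\r\n]+)
# Start applying the regex further back than the default of 100
# so a separator split across chunks can still be matched.
LINE_BREAKER_LOOKBEHIND = 500
```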
SHOULD_LINEMERGE = <true/false>
- When set to true, Splunk combines several input lines into a single event, with configuration based on the attributes described below.
- Defaults to true.
Attributes available only when SHOULD_LINEMERGE = true
When SHOULD_LINEMERGE is set to true, these additional attributes have meaning:
AUTO_LINEMERGE = <true/false>
- Directs Splunk to use automatic learning methods to determine where to break lines in events.
- Defaults to true.
BREAK_ONLY_BEFORE_DATE = <true/false>
- When set to true, Splunk will create a new event if and only if it encounters a new line with a date.
- Defaults to false.
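For instance, a log in which only the first line of each event carries a timestamp (a stack trace is a common case) could be handled with a stanza like the following. The stanza name my_app_log is hypothetical.

```ini
# props.conf
# Hypothetical source type: only the first line of each event has a date.
[my_app_log]
SHOULD_LINEMERGE = true
# Start a new event only when a line with a date is encountered.
BREAK_ONLY_BEFORE_DATE = true
```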
BREAK_ONLY_BEFORE = <regular expression>
- When set, Splunk will create a new event if and only if it encounters a new line that matches the regular expression.
- Defaults to empty.
MUST_BREAK_AFTER = <regular expression>
- When set, and the regular expression matches the current line, Splunk always creates a new event for the next input line.
- Splunk may still break before the current line if another rule matches.
- Defaults to empty.
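As an illustration, suppose each event in a hypothetical source type my_record_log ends with a line that reads END OF RECORD. The marker text and stanza name are assumptions made for this sketch.

```ini
# props.conf
# Hypothetical source type: every event ends with an "END OF RECORD" line.
[my_record_log]
SHOULD_LINEMERGE = true
# The line after a matching line always begins a new event.
MUST_BREAK_AFTER = ^END OF RECORD\s*$
```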
MUST_NOT_BREAK_AFTER = <regular expression>
- When set and the current line matches the regular expression, Splunk will not break on any subsequent lines until the MUST_BREAK_AFTER expression matches.
- Defaults to empty.
MUST_NOT_BREAK_BEFORE = <regular expression>
- When set and the current line matches the regular expression, Splunk will not break the last event before the current line.
- Defaults to empty.
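The two MUST_NOT_BREAK attributes can be sketched together. This example assumes a hypothetical source type, my_continued_log, in which a trailing backslash means the event continues on later lines, and indented lines always belong to the event above them.

```ini
# props.conf
# Hypothetical source type with continuation markers and indented detail lines.
[my_continued_log]
SHOULD_LINEMERGE = true
# A line ending in a backslash suppresses breaks on the lines that follow,
# until the MUST_BREAK_AFTER expression matches.
MUST_NOT_BREAK_AFTER = \\$
# A line that starts with whitespace stays attached to the preceding event.
MUST_NOT_BREAK_BEFORE = ^\s+
```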
MAX_EVENTS = <integer>
- Specifies the maximum number of input lines that will be added to any event.
- Splunk will break after the specified number of lines are read.
- Defaults to 256.
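If legitimate events can run longer than 256 lines, the limit can be raised per source type. The stanza name my_long_trace_log and the value 1000 are hypothetical.

```ini
# props.conf
# Hypothetical source type whose events can span more than 256 lines.
[my_long_trace_log]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true
# Allow up to 1000 input lines per event before Splunk forces a break.
MAX_EVENTS = 1000
```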
Examples
Specify event breaks
[my_custom_sourcetype]
BREAK_ONLY_BEFORE = ^\d+\s*$
This example instructs Splunk to treat any line that consists only of digits as the start of a new event. It applies to any source whose source type was configured or determined by Splunk to be sourcetype::my_custom_sourcetype.
Merge multiple lines into a single event
The following log event contains several lines that are part of the same request. The differentiator between requests is "Path". For this example, assume that all these lines need to be shown as a single event entry.
{{"2006-09-21, 02:57:11.58", 122, 11, "Path=/LoginUser Query=CrmId=ClientABC&ContentItemId=TotalAccess&SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&SessionTime=25368&ReturnUrl=http://www.clientabc.com, Method=GET, IP=209.51.249.195, Content=", ""}}
{{"2006-09-21, 02:57:11.60", 122, 15, "UserData:<User CrmId="clientabc" UserId="p12345678"><EntitlementList></EntitlementList></User>", ""}}
{{"2006-09-21, 02:57:11.60", 122, 15, "New Cookie: SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&CrmId=clientabc&UserId=p12345678&AccountId=&AgentHost=man&AgentId=man, MANUser: Version=1&Name=&Debit=&Credit=&AccessTime=&BillDay=&Status=&Language=&Country=&Email=&EmailNotify=&Pin=&PinPayment=&PinAmount=&PinPG=&PinPGRate=&PinMenu=&", ""}}
To index this multiple line event properly, use the Path differentiator in your configuration. Add the following to your $SPLUNK_HOME/etc/system/local/props.conf:
[source::source-to-break]
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = Path=
This code tells Splunk to merge the lines of the event, and only break before the term Path=.
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8.