
Configure event linebreaking
Some events consist of more than one line. Splunk handles most multi-line events correctly by default. If you have multi-line events that Splunk doesn't handle properly, you need to configure Splunk to change its linebreaking behavior.
How Splunk determines event boundaries
Splunk determines event boundaries in two steps:
1. Line breaking, which uses the LINE_BREAKER attribute's regex value to split the incoming stream of bytes into separate lines. By default, the LINE_BREAKER is any sequence of newlines and carriage returns (that is, ([\r\n]+)).
2. Line merging, which only occurs when the SHOULD_LINEMERGE attribute is set to "true" (the default). This step uses all the other line merging settings (for example, BREAK_ONLY_BEFORE, BREAK_ONLY_BEFORE_DATE, MUST_BREAK_AFTER, and so on) to merge the previously separated lines into events.
If the second step does not run (because you set the SHOULD_LINEMERGE attribute to "false"), then the events are simply the individual lines determined by LINE_BREAKER. The first step is relatively efficient, while the second is relatively slow. With a carefully written LINE_BREAKER regex, you can often get the desired result using only the first step and skip the second. This is particularly valuable if a significant amount of your data consists of multi-line events.
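As a minimal sketch of the one-step approach, assume a hypothetical source type named my_multiline_app in which every event begins with a date of the form YYYY-MM-DD at the start of a line. The stanza name and the date format are assumptions made for illustration only:

```ini
# props.conf (hypothetical stanza name)
[my_multiline_app]
# Skip the line-merging step entirely.
SHOULD_LINEMERGE = false
# Break only at newlines that are followed by a date such as 2006-09-21.
# The newlines in the first capturing group are discarded; the date stays
# at the start of the next event.
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
```

With a configuration along these lines, continuation lines that do not start with a date are never split off, so no merge step is needed to put them back.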
How to configure event boundaries
Many event logs have a strict one-line-per-event format, but some do not. Usually, Splunk can automatically recognize the event boundaries. However, if event boundary recognition is not working right, you can set custom rules in props.conf.
To configure multi-line events, first examine the format of the events. Determine a pattern in the events to set as the start or end of an event. Then, edit $SPLUNK_HOME/etc/system/local/props.conf, and set the necessary attributes to configure your data.
There are two ways to handle multi-line events:
- Break the data stream into lines and reassemble into events. This method usually simplifies the configuration process, as it gives you access to several attributes that you can use to define line-merging rules. Use the LINE_BREAKER attribute to break the data stream into multiple lines. Along with this, set SHOULD_LINEMERGE=true and set your line-merging attributes (BREAK_ONLY_BEFORE, etc.) to tell Splunk how to reassemble the lines into events. If your data conforms well to the default LINE_BREAKER setting (any number of newlines and carriage returns), you don’t need to alter LINE_BREAKER. Instead, just set SHOULD_LINEMERGE=true and use the line-merging attributes to reassemble it.
- Break the data stream directly into real events using the LINE_BREAKER feature. This might increase your indexing speed, but is somewhat more difficult to work with. If you're finding that indexing is slow and a significant amount of your data consists of multi-line events, this method can provide significant improvement. Use the LINE_BREAKER attribute with SHOULD_LINEMERGE=false.
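The two methods can be sketched side by side for the same data. This example assumes a hypothetical log in which every event starts with the literal text BEGIN TRANSACTION at the beginning of a line. Both stanza names and the marker text are invented for illustration:

```ini
# Method 1: keep the default line breaking, then merge lines into events.
[hypothetical_txn_merge]
SHOULD_LINEMERGE = true
# Start a new event before any line that begins with the marker.
BREAK_ONLY_BEFORE = ^BEGIN TRANSACTION

# Method 2: break the stream directly into events, with no merge step.
[hypothetical_txn_direct]
SHOULD_LINEMERGE = false
# Break only at newlines that are followed by the marker. The marker sits
# outside the capturing group, so it remains part of the next event.
LINE_BREAKER = ([\r\n]+)BEGIN TRANSACTION
```

The first form is usually easier to read and adjust. The second puts all of the logic into one regex in exchange for skipping the slower merging step.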
These attributes are described below.
Linebreaking general attributes
These are the props.conf attributes that affect linebreaking:
TRUNCATE = <non-negative integer>
- Change the default maximum line length (in bytes). Note that although this attribute is a byte measurement, Splunk rounds down line length when this attribute would otherwise land mid-character for multi-byte characters.
- Set to 0 if you never want truncation (very long lines are, however, often a sign of garbage data).
- Defaults to 10000 bytes.
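For instance, a source that legitimately produces very long lines might raise the limit instead of disabling truncation. The stanza name and the value below are illustrative assumptions, not recommendations:

```ini
[hypothetical_long_lines]
# Allow lines of up to 50000 bytes before truncation (the default is 10000).
TRUNCATE = 50000
```

Choosing a finite value rather than 0 keeps some protection against runaway garbage data.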
LINE_BREAKER = <regular expression>
- Specifies a regex that determines how the raw text stream is broken into initial events, before any line merging takes place (if specified by the SHOULD_LINEMERGE attribute, described below).
- Defaults to ([\r\n]+), meaning data is broken into an event for each line, delimited by any number of carriage return (\r) or newline (\n) characters.
- The regex must contain a capturing group: a pair of parentheses that defines an identified subcomponent of the match.
- Wherever the regex matches, Splunk considers the start of the first capturing group to be the end of the previous event, and considers the end of the first capturing group to be the start of the next event.
- The contents of the first capturing group are discarded, and will not be present in any event. You are telling Splunk that this text comes between lines.
- Note: You can realize a significant boost to processing speed when you use LINE_BREAKER to delimit multi-line events (as opposed to using SHOULD_LINEMERGE to reassemble individual lines into multi-line events). Consider using this method if a significant portion of your data consists of multi-line events.
- See the props.conf specification file for more details, including information on how to use LINE_BREAKER with branched expressions.
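To illustrate how the first capturing group is discarded, consider a hypothetical log in which events are separated by a line of five or more dashes. The stanza name and the separator format are assumptions for this sketch:

```ini
[hypothetical_delimited_log]
SHOULD_LINEMERGE = false
# The first capturing group spans the newlines before the separator, the
# dashes themselves, and the newlines after it. All of that text is
# dropped, so no event contains the separator line.
LINE_BREAKER = ([\r\n]+-{5,}[\r\n]+)
```

If the separator text should stay in the event, leave it outside the capturing group. Anything the regex matches after the group becomes the start of the next event.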
LINE_BREAKER_LOOKBEHIND = <integer>
- When there is leftover data from a previous raw chunk, LINE_BREAKER_LOOKBEHIND indicates the number of characters before the end of the raw chunk (with the next chunk concatenated) at which Splunk applies the LINE_BREAKER regex. You might want to increase this value from its default if you are dealing with especially large or multi-line events.
- Defaults to 100.
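A sketch of raising the lookbehind for a source with large multi-line events follows. The stanza name and the value 1000 are illustrative assumptions. An appropriate value depends on how far back from the chunk boundary an event delimiter can occur in your data:

```ini
[hypothetical_large_events]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
# Apply the LINE_BREAKER regex starting 1000 characters before the end of
# the leftover chunk instead of the default 100.
LINE_BREAKER_LOOKBEHIND = 1000
```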
SHOULD_LINEMERGE = [true|false]
- When set to true, Splunk combines several input lines into a single event, with configuration based on the attributes described in the next section.
- Defaults to true.
Attributes that are available only when SHOULD_LINEMERGE is set to true
When SHOULD_LINEMERGE=true (the default), use these attributes to define linebreaking behavior:
BREAK_ONLY_BEFORE_DATE = [true|false]
- When set to true, Splunk creates a new event if, and only if, it encounters a new line with a date.
- Defaults to true.
- Note: If DATETIME_CONFIG is set to CURRENT or NONE, this attribute is not meaningful, because in those cases, Splunk does not identify timestamps.
BREAK_ONLY_BEFORE = <regular expression>
- When set, Splunk creates a new event if, and only if, it encounters a new line that matches the regular expression.
- Defaults to empty.
MUST_BREAK_AFTER = <regular expression>
- When set and the regular expression matches the current line, Splunk always creates a new event for the next input line.
- Splunk might still break before the current line if another rule matches.
- Defaults to empty.
MUST_NOT_BREAK_AFTER = <regular expression>
- When set and the current line matches the regular expression, Splunk does not break on any subsequent lines until the MUST_BREAK_AFTER expression matches.
- Defaults to empty.
MUST_NOT_BREAK_BEFORE = <regular expression>
- When set and the current line matches the regular expression, Splunk does not break the last event before the current line.
- Defaults to empty.
MAX_EVENTS = <integer>
- Specifies the maximum number of input lines that will be added to any event.
- Splunk will break after the specified number of lines are read.
- Defaults to 256.
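As a sketch of how these attributes combine, the following stanza targets a hypothetical application log in which each event starts with a dated line and may be followed by a long stack trace. The stanza name and the MAX_EVENTS value are assumptions chosen for illustration:

```ini
[hypothetical_stacktrace_log]
SHOULD_LINEMERGE = true
# Start a new event only at lines that contain a date. This is the
# default and is written out here for clarity.
BREAK_ONLY_BEFORE_DATE = true
# Allow up to 1000 lines per event so that long stack traces are not cut
# off at the default of 256 lines.
MAX_EVENTS = 1000
```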
Examples
Specify event breaks
[my_custom_sourcetype]
BREAK_ONLY_BEFORE = ^\d+\s*$
This example instructs Splunk to divide events by assuming that any line that consists of only digits is the start of a new event. It does this for any data whose source type is set to my_custom_sourcetype.
Merge multiple lines into a single event
The following log event contains several lines that are part of the same request. The differentiator between requests is "Path". For this example, assume that all these lines need to be shown as a single event entry.
{{"2006-09-21, 02:57:11.58", 122, 11, "Path=/LoginUser Query=CrmId=ClientABC&ContentItemId=TotalAccess&SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&SessionTime=25368&ReturnUrl=http://www.clientabc.com, Method=GET, IP=209.51.249.195, Content=", ""}}
{{"2006-09-21, 02:57:11.60", 122, 15, "UserData:<User CrmId="clientabc" UserId="p12345678"><EntitlementList></EntitlementList></User>", ""}}
{{"2006-09-21, 02:57:11.60", 122, 15, "New Cookie: SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&CrmId=clientabc&UserId=p12345678&AccountId=&AgentHost=man&AgentId=man, MANUser: Version=1&Name=&Debit=&Credit=&AccessTime=&BillDay=&Status=&Language=&Country=&Email=&EmailNotify=&Pin=&PinPayment=&PinAmount=&PinPG=&PinPGRate=&PinMenu=&", ""}}
To index this multi-line event properly, use the Path differentiator in your configuration. Add the following to your $SPLUNK_HOME/etc/system/local/props.conf:
[source::source-to-break]
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = Path=
This code tells Splunk to merge the lines of the event, and only break before the term Path=.
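The same grouping could in principle be attempted with LINE_BREAKER alone, skipping the merge step. The following is an untested sketch, not part of the original example. It assumes that every request begins with a line shaped like the first sample line above: two opening braces, a quoted timestamp, two numbers, and then a quoted string starting with Path=.

```ini
[source::source-to-break]
SHOULD_LINEMERGE = false
# Break only at newlines that are followed by a line whose quoted payload
# begins with Path=. Only the newlines are in the capturing group, so the
# whole Path= line stays at the start of the next event.
LINE_BREAKER = ([\r\n]+)\{\{"[^"]*",\s*\d+,\s*\d+,\s*"Path=
```

Because the regex depends on the exact field layout, it is more brittle than the BREAK_ONLY_BEFORE version if the line format varies.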
Multi-line event linebreaking and segmentation limitations
Splunk applies linebreaking and segmentation limitations to extremely large events:
- Lines over 10,000 bytes: Splunk breaks lines over 10,000 bytes into multiple lines of 10,000 bytes each when it indexes them. It appends the field meta::truncated to the end of each truncated section. However, Splunk still groups these lines into a single event.
- Segmentation for events over 100,000 bytes: In search results, Splunk only displays the first 100,000 bytes of an event. Segments after those first 100,000 bytes of a very long line are still searchable, however.
- Segmentation for events over 1,000 segments: In search results, Splunk displays the first 1,000 individual segments of an event as segments separated by whitespace and highlighted on mouseover. It displays the rest of the event as raw text without interactive formatting.
Answers
Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around linebreaking.
This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7, 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18
Comments
Does this article apply to the props.conf on the universal forwarder, to a full Splunk instance just set up to forward, or only on Splunk indexers?
Hi Cneberg,
This article only applies to full instances of Splunk, or light forwarders. The universal forwarder is incapable of transforming data prior to sending it to an indexer.