Configure event linebreaking

NOTE - Splunk version 4.x reached its End of Life on October 1, 2013.

This documentation does not apply to the most recent version of Splunk.

Some events consist of more than one line. Splunk handles most multi-line events correctly by default. If you have multi-line events that Splunk doesn't handle properly, you need to configure Splunk to change its linebreaking behavior.

How Splunk determines event boundaries

Splunk determines event boundaries in two steps:

1. Line breaking, which uses the LINE_BREAKER attribute's regex value to split the incoming stream of bytes into separate lines. By default, the LINE_BREAKER is any sequence of newlines and carriage returns (that is, ([\r\n]+)).

2. Line merging, which only occurs when the SHOULD_LINEMERGE attribute is set to "true" (the default). This step uses all the other line merging settings (for example, BREAK_ONLY_BEFORE, BREAK_ONLY_BEFORE_DATE, and MUST_BREAK_AFTER) to merge the previously separated lines into events.

If the second step does not run (because you set the SHOULD_LINEMERGE attribute to "false"), then the events are simply the individual lines determined by LINE_BREAKER. The first step is relatively efficient, while the second is relatively slow. If you are clever with the LINE_BREAKER regex, you can often make Splunk get the desired result by using only the first step, and skipping the second step. This is particularly valuable if a significant amount of your data consists of multi-line events.

How to configure event boundaries

Many event logs have a strict one-line-per-event format, but some do not. Usually, Splunk can automatically recognize the event boundaries. However, if event boundary recognition is not working correctly, you can set custom rules in props.conf.

To configure multi-line events, first examine the format of the events. Determine a pattern in the events to set as the start or end of an event. Then, edit $SPLUNK_HOME/etc/system/local/props.conf, and set the necessary attributes to configure your data.

There are two ways to handle multi-line events:

  • Break the data stream into lines and reassemble into events. This method usually simplifies the configuration process, as it gives you access to several attributes that you can use to define line-merging rules. Use the LINE_BREAKER attribute to break the data stream into multiple lines. Along with this, set SHOULD_LINEMERGE=true and set your line-merging attributes (BREAK_ONLY_BEFORE, etc.) to tell Splunk how to reassemble the lines into events. If your data conforms well to the default LINE_BREAKER setting (any number of newlines and carriage returns), you don’t need to alter LINE_BREAKER. Instead, just set SHOULD_LINEMERGE=true and use the line-merging attributes to reassemble it.
  • Break the data stream directly into real events using the LINE_BREAKER feature. This might increase your indexing speed, but is somewhat more difficult to work with. If you're finding that indexing is slow and a significant amount of your data consists of multi-line events, this method can provide significant improvement. Use the LINE_BREAKER attribute with SHOULD_LINEMERGE=false.
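
As a rough illustration of the two methods, the following sketch configures both for the same kind of data. It assumes events whose first line begins with a date such as 2013-02-24. The source type names are hypothetical, and the regexes should be checked against your own data before use.

```ini
# Method 1: break on newlines (the default LINE_BREAKER), then merge lines.
# BREAK_ONLY_BEFORE asks Splunk to start a new event at a line that
# begins with a YYYY-MM-DD date.
[my_merged_sourcetype]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}

# Method 2: break directly into events and skip line merging.
# The newlines in the capturing group are discarded; the date that
# follows them becomes the start of the next event.
[my_direct_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
```

The first stanza is easier to adjust with further line-merging attributes. The second avoids the slower merging step.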

These attributes are described below.

Linebreaking general attributes

These are the props.conf attributes that affect linebreaking:

TRUNCATE = <non-negative integer>

  • Change the default maximum line length (in bytes). Note that although this attribute is a byte measurement, Splunk rounds down line length when this attribute would otherwise land mid-character for multi-byte characters.
  • Set to 0 if you never want truncation (very long lines are, however, often a sign of garbage data).
  • Defaults to 10000 bytes.
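
For instance, a source type that legitimately produces very long lines might raise the limit rather than disable it. The stanza name and the value 50000 below are hypothetical, chosen only for illustration.

```ini
# Allow lines of up to 50,000 bytes before Splunk truncates them.
# TRUNCATE = 0 would disable truncation entirely.
[my_long_line_sourcetype]
TRUNCATE = 50000
```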

LINE_BREAKER = <regular expression>

  • Specifies a regex that determines how the raw text stream is broken into initial events, before any line merging takes place (if specified by the SHOULD_LINEMERGE attribute, described below).
  • Defaults to ([\r\n]+), meaning data is broken into an event for each line, delimited by any number of carriage return (\r) or newline (\n) characters.
  • The regex must contain a capturing group -- a pair of parentheses that defines an identified subcomponent of the match.
  • Wherever the regex matches, Splunk considers the start of the first capturing group to be the end of the previous event, and considers the end of the first capturing group to be the start of the next event.
  • The contents of the first capturing group are discarded, and will not be present in any event. You are telling Splunk that this text comes between lines.
  • Note: You can realize a significant boost to processing speed when you use LINE_BREAKER to delimit multi-line events (as opposed to using SHOULD_LINEMERGE to reassemble individual lines into multi-line events). Consider using this method if a significant portion of your data consists of multi-line events.
  • See the props.conf specification file for more details, including information on how to use LINE_BREAKER with branched expressions.
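
To make the capturing-group rule concrete, here is a minimal sketch for data in which events are separated by a line of dashes. The source type name and the delimiter format are assumptions for illustration. Because the whole separator sits inside the first capturing group, it is discarded and appears in neither event.

```ini
# Events are separated by a line of five or more dashes, for example:
#   first event, line 1
#   first event, line 2
#   ----------
#   second event, line 1
# Plain newlines inside an event do not match, so those lines stay together.
[my_delimited_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+-{5,}[\r\n]+)
```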

LINE_BREAKER_LOOKBEHIND = <integer>

  • When there is leftover data from a previous raw chunk, LINE_BREAKER_LOOKBEHIND indicates how many characters before the end of the raw chunk (with the next chunk concatenated) Splunk begins applying the LINE_BREAKER regex. You might want to increase this value from its default if you are dealing with especially large or multi-line events.
  • Defaults to 100.
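
If your events are large enough that the default lookbehind seems too short, the value can be raised. The figure 1000 and the stanza name below are hypothetical; an appropriate value depends on your data.

```ini
# Begin applying the LINE_BREAKER regex 1000 characters before the end
# of the previous chunk instead of the default 100.
[my_large_event_sourcetype]
LINE_BREAKER_LOOKBEHIND = 1000
```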

SHOULD_LINEMERGE = [true|false]

  • When set to true, Splunk combines several input lines into a single event, with configuration based on the attributes described in the next section.
  • Defaults to true.

Attributes that are available only when SHOULD_LINEMERGE is set to true

When SHOULD_LINEMERGE=true (the default), use these attributes to define linebreaking behavior:

BREAK_ONLY_BEFORE_DATE = [true|false]

  • When set to true, Splunk creates a new event if, and only if, it encounters a new line with a date.
  • Defaults to true.
  • Note: If DATETIME_CONFIG is set to CURRENT or NONE, this attribute is not meaningful, because in those cases, Splunk does not identify timestamps.

BREAK_ONLY_BEFORE = <regular expression>

  • When set, Splunk creates a new event if, and only if, it encounters a new line that matches the regular expression.
  • Defaults to empty.

MUST_BREAK_AFTER = <regular expression>

  • When set and the regular expression matches the current line, Splunk always creates a new event for the next input line.
  • Splunk might still break before the current line if another rule matches.
  • Defaults to empty.

MUST_NOT_BREAK_AFTER = <regular expression>

  • When set and the current line matches the regular expression, Splunk does not break on any subsequent lines until the MUST_BREAK_AFTER expression matches.
  • Defaults to empty.

MUST_NOT_BREAK_BEFORE = <regular expression>

  • When set and the current line matches the regular expression, Splunk does not break the last event before the current line.
  • Defaults to empty.

MAX_EVENTS = <integer>

  • Specifies the maximum number of input lines that will be added to any event.
  • Splunk will break after the specified number of lines are read.
  • Defaults to 256.
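
As an example of these attributes working together, the sketch below targets long multi-line entries such as stack traces, where only the first line carries a timestamp. The source type name and the limit of 1000 are hypothetical.

```ini
# Start a new event only at a line that contains a date, and allow up to
# 1000 lines per event instead of the default 256.
[my_stacktrace_sourcetype]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true
MAX_EVENTS = 1000
```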

Examples

Specify event breaks

[my_custom_sourcetype]
BREAK_ONLY_BEFORE = ^\d+\s*$

This example instructs Splunk to divide events by assuming that any line that consists of only digits is the start of a new event. It does this for any data whose source type is set to my_custom_sourcetype.

Merge multiple lines into a single event

The following log event contains several lines that are part of the same request. The differentiator between requests is "Path". For this example, assume that all these lines need to be shown as a single event entry.

{{"2006-09-21, 02:57:11.58", 122, 11, "Path=/LoginUser Query=CrmId=ClientABC&ContentItemId=TotalAccess&SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&SessionTime=25368&ReturnUrl=http://www.clientabc.com, Method=GET, IP=209.51.249.195, Content=", ""}}
{{"2006-09-21, 02:57:11.60", 122, 15, "UserData:<User CrmId="clientabc" UserId="p12345678"><EntitlementList></EntitlementList></User>", ""}}
{{"2006-09-21, 02:57:11.60", 122, 15, "New Cookie: SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&CrmId=clientabc&UserId=p12345678&AccountId=&AgentHost=man&AgentId=man, MANUser: Version=1&Name=&Debit=&Credit=&AccessTime=&BillDay=&Status=&Language=&Country=&Email=&EmailNotify=&Pin=&PinPayment=&PinAmount=&PinPG=&PinPGRate=&PinMenu=&", ""}}

To index this multiple line event properly, use the Path differentiator in your configuration. Add the following to your $SPLUNK_HOME/etc/system/local/props.conf:

[source::source-to-break]
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = Path=

This code tells Splunk to merge the lines of the event, and only break before the term Path=.
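
The same result can probably be reached with the LINE_BREAKER-only method described earlier. The following is an untested sketch, meant as an alternative to the stanza above and not as an addition to it. It assumes every request begins with a line shaped like the first sample line (two opening braces, a quoted timestamp, two numbers, then "Path="); verify the regex against your own data.

```ini
# Break only at newlines that are followed by a line whose fourth field
# starts with "Path=". Only the newlines in the capturing group are
# discarded; the matched line prefix stays at the start of the next event.
[source::source-to-break]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\{\{"[^"]*",\s*\d+,\s*\d+,\s*"Path=
```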

Multi-line event linebreaking and segmentation limitations

Splunk applies linebreaking and segmentation limitations to extremely large events:

  • Lines over 10,000 bytes: Splunk breaks lines over 10,000 bytes into multiple lines of 10,000 bytes each when it indexes them. It appends the field meta::truncated to the end of each truncated section. However, Splunk still groups these lines into a single event.
  • Segmentation for events over 100,000 bytes: In search results, Splunk only displays the first 100,000 bytes of an event. Segments after those first 100,000 bytes of a very long line are still searchable, however.
  • Segmentation for events over 1,000 segments: In search results, Splunk displays the first 1,000 individual segments of an event as segments separated by whitespace and highlighted on mouseover. It displays the rest of the event as raw text without interactive formatting.

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around linebreaking.

This documentation applies to the following versions of Splunk: 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5, 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7, 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8.


Comments

Hi Cneberg,

This article only applies to full instances of Splunk, or light forwarders. The universal forwarder is incapable of transforming data prior to sending it to an indexer.

Malmoore, Splunker
February 24, 2013

Does this article apply to the props.conf on the universal forwarder, to a full Splunk instance just set up to forward, or only on Splunk indexers?

Cneberg
February 14, 2013
