Splunk Cloud Platform

Getting Data In

Configure event line breaking

Some events consist of more than one line. The Splunk platform handles most multiline events correctly by default. If you have multiline events that the Splunk platform doesn't handle properly, you can configure it to change its line breaking behavior.

If you use Splunk Cloud Platform, you can do the following:

  • Forward any data where you need to configure event line breaking, because there is no way to configure event line breaking in the Splunk Web interface. You can use a heavy forwarder to break incoming data into lines and subsequently merge them as you want into events prior to sending data to your Splunk Cloud Platform instance.
  • If you have access to the Edge Processor solution, use Edge Processors to configure event line breaking. See Using source types to break and merge data in Edge Processors and About the Edge Processor solution in the Use Edge Processors manual.

If you use Splunk Enterprise, you can configure the settings and follow the procedures in this topic on any instance that indexes the incoming data stream.

How the Splunk platform determines event boundaries

The Splunk platform determines event boundaries in two phases:

  1. Line breaking, which uses the LINE_BREAKER setting to split the incoming stream of data into separate lines. By default, the LINE_BREAKER value is any sequence of newlines and carriage returns. In regular expression format, this is represented as the following string: ([\r\n]+). You don't normally need to adjust this setting, but in cases where it's necessary, you must configure it in the props.conf configuration file on the forwarder that sends the data to Splunk Cloud Platform or a Splunk Enterprise indexer. The LINE_BREAKER setting expects a value in regular expression format.
  2. Line merging, which uses the SHOULD_LINEMERGE setting to merge previously separated lines into events. By default, the Splunk platform performs line merging, and the value for SHOULD_LINEMERGE is true. You don't normally need to adjust this setting, but in cases where it is necessary, you must configure this setting in the props.conf configuration file on the forwarder that sends the data to Splunk Cloud Platform. If you turn off line merging by setting SHOULD_LINEMERGE to false, the platform treats each line that the LINE_BREAKER setting produces as a separate event.
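
As a rough illustration of the two phases, the following props.conf stanza spells out the default values described above. The stanza name my_custom_sourcetype is a placeholder. Because both values are defaults, you would not normally need to write them out.

```ini
# Placeholder stanza name; substitute your own source type.
[my_custom_sourcetype]
# Phase 1, line breaking: split the stream on any run of newlines
# and carriage returns (the default).
LINE_BREAKER = ([\r\n]+)
# Phase 2, line merging: recombine the separated lines into events
# (the default).
SHOULD_LINEMERGE = true
```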

Line breaking is relatively efficient for the Splunk platform, while line merging is relatively slow. If you can write a LINE_BREAKER regular expression that matches only the boundaries between events, you get the events you want in the line breaking phase and can turn off line merging. This is valuable if a significant amount of your data consists of multiline events.

Additional settings in the props.conf configuration file also affect how the platform breaks the incoming data stream into events. See Line breaking general settings later in this topic.

How to configure event boundaries

Many event logs have a strict one-line-per-event format, but others don't. The Splunk platform can often recognize the event boundaries, but if event boundary recognition doesn't occur, or happens incorrectly, you can set custom rules in the props.conf configuration file to establish event boundaries.

Requirements for configuring event boundaries

Before you attempt to configure event boundaries for your events, confirm that you have the following:

  • An understanding of regular expressions. The LINE_BREAKER setting uses a regular expression to determine what the boundary of an event is.
  • One of the following, depending on whether you use Splunk Cloud Platform or Splunk Enterprise:
    • A heavy forwarder that has been configured to send data to your Splunk Cloud Platform instance. You can download the Splunk Cloud Platform universal forwarder credentials package that comes with your Splunk Cloud Platform instance and install it on a Splunk heavy forwarder.
    • A Splunk Enterprise indexer or heavy forwarder, if you use Splunk Enterprise.
  • A file that represents the data stream where you want to configure custom line breaking.

Edit the props.conf configuration file to configure multiline events

  1. Examine the file that you want to index to determine its event format.
  2. In the file, look for a pattern in the events to set as the start or end of an event.
  3. Using a text editor, on the forwarder you have configured to send data to Splunk Cloud Platform, edit the $SPLUNK_HOME/etc/system/local/props.conf configuration file.
  4. In the props.conf configuration file, add the necessary line breaking and line merging settings to configure the forwarder to perform the correct line breaking on your incoming data stream.
  5. Save the file and close it.
  6. Restart the forwarder to commit the changes.
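
The settings that you add in step 4 go under a stanza header that selects the data they apply to. The following sketch shows the header forms that props.conf accepts for a source type, a source, and a host. The source type name, file path, and host name are hypothetical, and the single setting under each header is there only to make the stanza non-empty.

```ini
# Applies to all data with this source type (hypothetical name).
[my_custom_sourcetype]
SHOULD_LINEMERGE = true

# Applies to data from a specific source (hypothetical path).
[source::/var/log/myapp/app.log]
SHOULD_LINEMERGE = true

# Applies to data from a specific host (hypothetical host name).
[host::appserver01]
SHOULD_LINEMERGE = true
```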

There are two ways to handle multiline events:

  • Break and reassemble the data stream into events.
  • Break the data stream directly into real events with the LINE_BREAKER setting.

Break and reassemble the data stream into events

This method often simplifies the configuration process, because it gives you access to several settings that you can use to define line-merging rules.

You must perform these steps on the heavy forwarder that you have designated to send data to your Splunk Cloud Platform instance.

  1. On the forwarder that is to send data to your Splunk Cloud Platform instance, use a text editor to open $SPLUNK_HOME/etc/system/local/props.conf for editing.
  2. In this file, add a stanza that represents the stream of data you want to break and reassemble into events.
  3. In that stanza, configure the LINE_BREAKER setting with a regular expression that breaks the data stream into multiple lines.
  4. Add the SHOULD_LINEMERGE setting to the stanza, and set its value to true.
  5. Configure additional line-merging settings, such as BREAK_ONLY_BEFORE and others, to specify how the forwarder is to reassemble the lines into events. For more information on the line-merging settings, see Attributes that apply only when the SHOULD_LINEMERGE setting is true later in this topic.

If your data conforms well to the default LINE_BREAKER value, which is any number of newlines and carriage returns, you don't need to change the LINE_BREAKER setting. Instead, set SHOULD_LINEMERGE=true and use the line-merging settings to reassemble the data.
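
A minimal sketch of this method, assuming a hypothetical application log in which every event starts with a timestamp such as 2024-03-26 10:00:01 and continuation lines, such as stack trace lines, do not contain a date. The stanza name and the regular expression are illustrative assumptions, so adjust them to your own data.

```ini
# Hypothetical source type for a log whose events start with "YYYY-MM-DD ".
[my_multiline_sourcetype]
# Keep the default line breaker: split on newlines and carriage returns.
LINE_BREAKER = ([\r\n]+)
# Merge the separated lines back into events.
SHOULD_LINEMERGE = true
# Start a new event at a line that begins with a date such as 2024-03-26.
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}\s
```

With settings like these, stack trace lines that follow a timestamped line should stay in the same event as that line. Check the result against a sample file before you rely on it.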

Break the data stream directly into real events with the LINE_BREAKER setting

Using the LINE_BREAKER setting to define event boundaries might increase your indexing speed, but is somewhat more difficult to work with. If you find that indexing is slow and a significant amount of your data consists of multiline events, this method can provide significant improvement.

  1. Specify a stanza in props.conf that represents the stream of data you want to break directly into events.
  2. Under this stanza, configure the LINE_BREAKER setting with a regular expression that matches the boundary that you want to use to break up the raw data stream into events.
  3. Add the SHOULD_LINEMERGE setting to the stanza, and set its value to false.
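
A sketch of this method, assuming a hypothetical log in which each event begins with a date in YYYY-MM-DD format. The stanza name, the pattern, and the sample input in the comments are illustrative. Only the text that the capturing group matches (the run of newlines) is discarded. The date that follows the group stays at the start of the next event.

```ini
# Hypothetical source type; events begin with a date such as 2024-03-26.
[my_multiline_sourcetype]
# Break only at newlines that are immediately followed by a date.
# The capturing group holds the newlines, which the platform discards.
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}\s
# Skip line merging: each chunk that LINE_BREAKER produces is a complete event.
SHOULD_LINEMERGE = false

# Example input (hypothetical):
#   2024-03-26 10:00:01 ERROR Request failed
#       at com.example.Handler.run(Handler.java:42)
#   2024-03-26 10:00:02 INFO Request retried
# The pattern matches only at the newline before the second timestamp,
# so the first two lines form one event and the third line starts the next.
```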

Line breaking general settings

The following tables list the settings in the props.conf file that affect line breaking.

TRUNCATE = <non-negative integer>
  Description: Changes the default maximum line length, in bytes. Although this setting is a byte measurement, the Splunk platform rounds down line length when this attribute would otherwise land mid-character for multibyte characters.
  Set to 0 if you never want truncation. However, very long lines are often a sign of garbage data.
  Default: 10000

LINE_BREAKER = <regular expression>
  Description: A regular expression that determines how the Splunk platform breaks the raw text stream into initial events, before any line merging takes place. This setting is dependent upon the SHOULD_LINEMERGE setting, described later.
  The expression must contain a capturing group, which is a pair of parentheses that defines an identified subcomponent of the match.
  Wherever the expression matches, the Splunk platform considers the start of the first capturing group to be the end of the previous event, and considers the end of the first capturing group to be the start of the next event.
  The platform discards the contents of the first capturing group. This content will not be present in any event, as the platform considers this text to come between lines.
  You can realize a significant boost to processing speed when you use the LINE_BREAKER setting to delimit multiline events as opposed to using SHOULD_LINEMERGE to reassemble individual lines into multiline events. Consider using this method if a significant portion of your data consists of multiline events.
  See the props.conf specification file for information on how to use LINE_BREAKER with branched expressions and additional information.
  Default: ([\r\n]+). The Splunk platform breaks data into an event for each line, delimited by any number of carriage return (\r) or newline (\n) characters.

LINE_BREAKER_LOOKBEHIND = <integer>
  Description: When there is leftover data from a previous raw chunk, LINE_BREAKER_LOOKBEHIND indicates the number of characters before the end of the raw chunk, with the next chunk concatenated, where the Splunk platform applies the LINE_BREAKER regular expression. You might want to increase this value from its default if you are dealing with especially large or multiline events.
  Default: 100

SHOULD_LINEMERGE = [true|false]
  Description: When set to true, the Splunk platform combines several input lines into a single event, with configuration based on the settings described in the next section.
  Default: true
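
As an illustration of the general settings, the following sketch raises the line length limit and the lookbehind for a hypothetical source type with very long multiline events. The stanza name and both values are assumptions chosen for the example, not recommendations.

```ini
# Hypothetical source type with very long events.
[my_large_event_sourcetype]
# Allow lines of up to 50,000 bytes before truncation (the default is 10000).
TRUNCATE = 50000
# Apply LINE_BREAKER starting 1,000 characters before the end of the
# previous raw chunk (the default is 100).
LINE_BREAKER_LOOKBEHIND = 1000
```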

Attributes that apply only when the SHOULD_LINEMERGE setting is true

When you set SHOULD_LINEMERGE to the default of true, use these additional settings to define line breaking behavior.

BREAK_ONLY_BEFORE_DATE = [true|false]
  Description: When set to true, the Splunk platform creates a new event if it encounters a new line with a date.
  If you configure the DATETIME_CONFIG setting to CURRENT or NONE, this attribute is not meaningful, because in those cases, the Splunk platform doesn't identify timestamps.
  Default: true

BREAK_ONLY_BEFORE = <regular expression>
  Description: When set, the Splunk platform creates a new event if it encounters a new line that matches the regular expression.
  Default: empty string

MUST_BREAK_AFTER = <regular expression>
  Description: When set, and the regular expression matches the current line, the Splunk platform always creates a new event for the next input line. The platform might still break before the current line if another rule matches.
  Default: empty string

MUST_NOT_BREAK_AFTER = <regular expression>
  Description: When set, and the current line matches the regular expression, the Splunk platform doesn't break on any subsequent lines until the MUST_BREAK_AFTER expression matches.
  Default: empty string

MUST_NOT_BREAK_BEFORE = <regular expression>
  Description: When set, and the current line matches the regular expression, the Splunk platform doesn't break the last event before the current line.
  Default: empty string

MAX_EVENTS = <integer>
  Description: Specifies the maximum number of input lines that the Splunk platform adds to any event. The software breaks the event after it reads the specified number of lines.
  Default: 256 lines
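
The following sketch combines several of these settings for a hypothetical XML-like feed in which each event opens with a <record> tag, closes with a </record> tag, and can contain dates on inner lines. The stanza name, the tags, and the patterns are assumptions for illustration. Test them against a sample of your own data.

```ini
# Hypothetical source type for a feed of <record> ... </record> blocks.
[my_record_sourcetype]
SHOULD_LINEMERGE = true
# Inner lines can contain dates, so don't treat a date as the start of a new event.
BREAK_ONLY_BEFORE_DATE = false
# Start a new event at a line that begins with the opening tag.
BREAK_ONLY_BEFORE = ^\s*<record[\s>]
# Always start a new event on the line after the closing tag.
MUST_BREAK_AFTER = </record>\s*$
```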

Examples of configuring event line breaking

Specify event breaks

The following example configures the Splunk platform to identify any line that consists of only digits, optionally followed by trailing whitespace, as the start of a new event for any data whose source type is set to my_custom_sourcetype.

[my_custom_sourcetype]
BREAK_ONLY_BEFORE = ^\d+\s*$

Merge multiple lines into a single event

The following log event contains several lines that are part of the same request. The first line of each request contains the string "Path=", which differentiates one request from the next.

{{"2006-09-21, 02:57:11.58", 122, 11, "Path=/LoginUser Query=CrmId=ClientABC&ContentItemId=TotalAccess&SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&SessionTime=25368&ReturnUrl=http://www.clientabc.com, Method=GET, IP=209.51.249.195, Content=", ""}}
{{"2006-09-21, 02:57:11.60", 122, 15, "UserData:<User CrmId="clientabc" UserId="p12345678"><EntitlementList></EntitlementList></User>", ""}}
{{"2006-09-21, 02:57:11.60", 122, 15, "New Cookie: SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&CrmId=clientabc&UserId=p12345678&AccountId=&AgentHost=man&AgentId=man, MANUser: Version=1&Name=&Debit=&Credit=&AccessTime=&BillDay=&Status=&Language=&Country=&Email=&EmailNotify=&Pin=&PinPayment=&PinAmount=&PinPG=&PinPGRate=&PinMenu=&", ""}}

To index this multiline event properly, use the Path differentiator in your configuration. Add the following to your $SPLUNK_HOME/etc/system/local/props.conf file.

[source::source-to-break]
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = Path=

This configuration merges the lines of the event and breaks only before a line that contains the term Path=.
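
As an alternative sketch for the same sample data, you could define the boundary in the line breaking phase instead, following the LINE_BREAKER method described earlier in this topic. This variant is not part of the original example. The regular expression is an assumption derived from the three sample lines shown above, and it relies on every request starting with a line whose fourth field begins with "Path=".

```ini
[source::source-to-break]
# Break at newlines that are followed by a line of the form
#   {{"<timestamp>", <number>, <number>, "Path=...
# Only the newlines in the capturing group are discarded.
LINE_BREAKER = ([\r\n]+)\{\{"[^"]*", \d+, \d+, "Path=
# Each chunk is already a complete event, so skip line merging.
SHOULD_LINEMERGE = false
```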

Multiline event line breaking and segmentation limitations

The Splunk platform applies line breaking and segmentation limitations to extremely large events:

Events over MAX_EVENTS lines
  If the platform encounters a multiline event that exceeds the number of lines that you specified in MAX_EVENTS, it breaks the event at that limit, sets the BREAK_ONLY_BEFORE_DATE setting to false if it is true, and then drops any MUST_NOT_BREAK_BEFORE or MUST_NOT_BREAK_AFTER rules. This can result in events not being line broken as you would expect. To work around the problem, you can raise the MAX_EVENTS setting, but you might get better results by changing the SHOULD_LINEMERGE setting to false and by specifying the event boundary with the LINE_BREAKER setting.

Lines that exceed 10,000 bytes in length
  The Splunk platform uses the LINE_BREAKER and TRUNCATE settings to evaluate and break events over 10 KB into multiple lines of 10 KB each. It adds the index-time field meta::truncated. If you have also configured SHOULD_LINEMERGE to true, the platform evaluates any additional event data using the props.conf rules until it can create a complete event.

Segmentation for events over 100,000 bytes
  In search results, Splunk Web displays the first 100,000 bytes of an event. Segments after those first 100,000 bytes of a very long line are still searchable, however.

Segmentation for events over 1,000 segments
  In search results, Splunk Web displays the first 1,000 segments of an event as segments separated by whitespace and highlighted on mouseover. It displays the rest of the event as raw text without interactive formatting.
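
A sketch of the first workaround for events that exceed MAX_EVENTS lines, assuming a hypothetical source type whose events can run to several hundred lines. The stanza name and the value 1000 are illustrative.

```ini
# Hypothetical source type with events of several hundred lines.
[my_long_event_sourcetype]
SHOULD_LINEMERGE = true
# Allow up to 1,000 input lines per event (the default is 256).
MAX_EVENTS = 1000
```
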
Last modified on 26 March, 2024

This documentation applies to the following versions of Splunk Cloud Platform: 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release), 9.3.2408

