Correct handling of multi-line events
This documentation does not apply to the most recent version of Splunk.
Many event logs have a strict one-line-per-event format, but many don't. Splunk can usually look at a data sample and figure out where the event boundaries are automatically. When you need explicit control, you can instead define event boundaries in $SPLUNK_HOME/etc/bundles/local/props.conf.
The preconfigured properties for WebSphere activity logs in etc/bundles/default/props.conf are a good example of explicitly defined boundaries. For WebSphere activity logs, Splunk is configured not to find event boundaries automatically, but instead to break only before lines that start with at least five hyphens. WebSphere timestamps can also appear far into the event, so MAX_TIMESTAMP_LOOKAHEAD is set to 500, telling Splunk to look for the timestamp within the first 500 characters.
[websphere_activity]
#Appears in the typeahead for source_types
pulldown_type = true
#Do not automatically find event boundaries
AUTO_LINEMERGE = False
#The timestamp is within the first 500 characters of the event
MAX_TIMESTAMP_LOOKAHEAD = 500
#Break if and only if the line starts with -----
BREAK_ONLY_BEFORE = ^-----
TYPING_CONFIG = /etc/event-types/current/was_activity.xml
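To make the effect of MAX_TIMESTAMP_LOOKAHEAD concrete, here is a minimal Python sketch that looks for a timestamp only within the first N characters of an event. The timestamp regular expression and the function name find_timestamp are hypothetical; Splunk's own timestamp recognition handles far more formats and is not reproduced here.

```python
import re
from typing import Optional

# Hypothetical timestamp pattern (YYYY/MM/DD HH:MM:SS) for illustration only;
# it is not the pattern Splunk uses to recognize WebSphere timestamps.
TIMESTAMP = re.compile(r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}")


def find_timestamp(event: str, lookahead: int = 500) -> Optional[str]:
    """Return the first timestamp found in the first `lookahead` characters, or None."""
    # Only the leading slice of the event is searched, mirroring the idea of
    # MAX_TIMESTAMP_LOOKAHEAD = 500 in the stanza above.
    match = TIMESTAMP.search(event[:lookahead])
    return match.group(0) if match else None


if __name__ == "__main__":
    # 450 characters of filler, then the timestamp.
    event = "x" * 450 + " 2006/06/21 17:06:17 ..."
    print(find_timestamp(event))                 # found: it lies inside 500 characters
    print(find_timestamp(event, lookahead=150))  # None: the timestamp is past the window
```

With a smaller lookahead the same event yields no timestamp, which is why a source whose timestamps sit deep inside the event needs a larger value.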
For WebSphere core dumps, events are demarcated by the word NULL at the beginning of a line.
[websphere_core]
#Appears in the typeahead for source_types
pulldown_type = true
#Do not automatically find event boundaries
AUTO_LINEMERGE = False
#Break if and only if the line starts with the word NULL, followed by a space
BREAK_ONLY_BEFORE = ^NULL\s
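The following Python sketch is a simplified model of what BREAK_ONLY_BEFORE does when AUTO_LINEMERGE is False: lines accumulate into the current event until a line matches the pattern, which then starts a new event. The function name break_only_before and the sample core-dump lines are hypothetical, and this is not Splunk's actual implementation.

```python
import re
from typing import Iterable, List


def break_only_before(lines: Iterable[str], pattern: str) -> List[List[str]]:
    """Group lines into events, starting a new event only at lines matching `pattern`."""
    regex = re.compile(pattern)
    events: List[List[str]] = []
    for line in lines:
        # re.search honors the ^ anchor, so ^NULL\s matches only at line start.
        # In this sketch, the very first line always opens an event.
        if regex.search(line) or not events:
            events.append([line])      # boundary: begin a new event
        else:
            events[-1].append(line)    # no boundary: merge into the current event
    return events


if __name__ == "__main__":
    dump = [
        "NULL first record",
        "   detail line",
        "NULLIFY is not a boundary",   # NULL not followed by whitespace
        "NULL second record",
    ]
    for event in break_only_before(dump, r"^NULL\s"):
        print(event)
```

Because the pattern requires whitespace after NULL, a line such as "NULLIFY ..." stays inside the preceding event.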
WebSphere trlog events, by contrast, always begin with a timestamp, so Splunk breaks them only before lines that contain a date. Their format is distinctive enough, however, that Splunk also uses a custom event typing configuration for them.
[websphere_trlog]
#Appears in the typeahead for source_types
pulldown_type = true
#Break if and only if the line contains a date
BREAK_ONLY_BEFORE_DATE = True
#Use the special event typer
TYPING_CONFIG = /etc/event-types/current/was_trlog.xml
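As a rough illustration of BREAK_ONLY_BEFORE_DATE = True, this Python sketch starts a new event only before lines that contain a date. The DATE pattern (YYYY-MM-DD or YYYY/MM/DD), the function name, and the sample lines are assumptions made for the example; Splunk's date recognition is far more general.

```python
import re
from typing import List

# Hypothetical date pattern; Splunk recognizes many more date formats.
DATE = re.compile(r"\b\d{4}[-/]\d{2}[-/]\d{2}\b")


def break_only_before_date(lines: List[str]) -> List[List[str]]:
    """Start a new event only before lines that contain a date (simplified model)."""
    events: List[List[str]] = []
    for line in lines:
        if DATE.search(line) or not events:
            events.append([line])      # a dated line opens a new event
        else:
            events[-1].append(line)    # undated lines continue the current event
    return events


if __name__ == "__main__":
    trace = [
        "2006/06/21 17:06:17 trace entry A",
        "   continuation of entry A",
        "2006/06/21 17:06:18 trace entry B",
    ]
    for event in break_only_before_date(trace):
        print(event)
```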
Custom sourcetype example
This fictional example tells Splunk to start a new event at any line that consists entirely of digits (optionally followed by trailing whitespace). It applies to any file or stream whose source type was configured as, or determined by Splunk to be, sourcetype::my_custom_sourcetype.
[my_custom_sourcetype]
BREAK_ONLY_BEFORE = ^\d+\s*$
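To check which lines the regular expression ^\d+\s*$ treats as event starts, you can test it with Python's re module. The helper name is_event_start is hypothetical, and Python's regex dialect is assumed to agree with Splunk's for this simple pattern.

```python
import re

# BREAK_ONLY_BEFORE value from the [my_custom_sourcetype] stanza above.
EVENT_START = re.compile(r"^\d+\s*$")


def is_event_start(line: str) -> bool:
    """True if the line is only digits, optionally followed by whitespace."""
    return EVENT_START.search(line) is not None


if __name__ == "__main__":
    for line in ["12345", "42   ", "12345 trailing text", "abc123", "  42", ""]:
        print(repr(line), is_event_start(line))
```

Leading whitespace, trailing text, or an empty line all fail the match, so only bare digit lines begin new events.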
Real-world example
The log event:
"2006-09-21, 02:57:11.58", 122, 11, "Path=/LoginUser Query=CrmId=ClientABC&ContentItemId=TotalAccess&SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&SessionTime=25368&ReturnUrl=http://www.clientabc.com, Method=GET, IP=209.51.249.195, Content=", ""
"2006-09-21, 02:57:11.60", 122, 15, "UserData:<User CrmId="clientabc" UserId="p12345678"><EntitlementList></EntitlementList></User>", ""
"2006-09-21, 02:57:11.60", 122, 15, "New Cookie: SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&CrmId=clientabc&UserId=p12345678&AccountId=&AgentHost=man&AgentId=man, MANUser: Version=1&Name=&Debit=&Credit=&AccessTime=&BillDay=&Status=&Language=&Country=&Email=&EmailNotify=&Pin=&PinPayment=&PinAmount=&PinPG=&PinPGRate=&PinMenu=&", ""
contains several lines that all belong to the same request. Each new request begins with a line containing "Path=", and the goal is to have Splunk index all of these lines as a single event.
To index these lines as one multi-line event, using "Path=" to mark the boundary between events, add the following to your props.conf (replacing source-to-break with your source):
[source::source-to-break]
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = Path=
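Applied to the sample above, this stanza should produce a single event for the three request lines. The Python sketch below models SHOULD_LINEMERGE = True with BREAK_ONLY_BEFORE = Path= on shortened copies of those lines; the fourth line, which starts a second request, is hypothetical, as is the function name group_requests. It is a simplified model, not Splunk's implementation.

```python
import re
from typing import List

# Shortened copies of the sample lines above, plus one hypothetical line
# that begins a second request.
LOG = [
    '"2006-09-21, 02:57:11.58", 122, 11, "Path=/LoginUser Query=CrmId=ClientABC&...", ""',
    '"2006-09-21, 02:57:11.60", 122, 15, "UserData:<User CrmId="clientabc" ...>", ""',
    '"2006-09-21, 02:57:11.60", 122, 15, "New Cookie: SessionId=3A1785URH117BEA&...", ""',
    '"2006-09-21, 02:57:12.00", 122, 11, "Path=/HypotheticalNextRequest ...", ""',
]


def group_requests(lines: List[str], pattern: str = r"Path=") -> List[List[str]]:
    """Merge lines into events, breaking only before lines that match `pattern`."""
    regex = re.compile(pattern)  # unanchored: matches "Path=" anywhere in a line
    events: List[List[str]] = []
    for line in lines:
        if regex.search(line) or not events:
            events.append([line])
        else:
            events[-1].append(line)
    return events


if __name__ == "__main__":
    for event in group_requests(LOG):
        print(len(event), "line(s):", event[0][:60])
```

Because Path= is not anchored with ^, any line containing that text anywhere would start a new event in this model; a tighter expression would avoid accidental breaks if Path= could also appear elsewhere in a line.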
Attributes
You may need to set one or more of these values in props.conf:
- SHOULD_LINEMERGE
Boolean
Default: True
If this attribute is set to False, Splunk treats every line of source data as a separate event. If it is True, Splunk merges lines into multi-line events according to the attributes below.
- MAX_EVENTS
Integer
Default: 256
The maximum number of lines to allow in a single event. Splunk will forcibly break an event at this point if it has not yet found a pattern that indicates the end of an event (or equivalently, the beginning of a new event).
- BREAK_BEFORE_DATE
Boolean
Default: True
If SHOULD_LINEMERGE is set to False, this setting is ignored.
- BREAK_ONLY_BEFORE_DATE
Boolean
Default: False
If True, Splunk starts a new event only before lines that contain a date.
If SHOULD_LINEMERGE is False, this value is ignored.
- BREAK_ONLY_BEFORE
Regular Expression
Default: Empty
If set and the regular expression matches the current line, Splunk outputs the lines accumulated so far as a single event and starts a new event with the current line.
If SHOULD_LINEMERGE is False, this value is ignored.
- MUST_BREAK_AFTER
Regular Expression
Default: None
If SHOULD_LINEMERGE is set to True, this specifies one or more literal strings or regular expressions (separated by |) that Splunk should always treat as the end of an event.
- MUST_BREAK_BEFORE (Note: this setting is deprecated.)
Regular Expression
Default: ^\s*$
If SHOULD_LINEMERGE is set to True, this specifies one or more literal strings or regular expressions (separated by |) that Splunk should always treat as the start of an event.
- MUST_NOT_BREAK_AFTER
Regular Expression
Default: None
If SHOULD_LINEMERGE is set to True, this specifies one or more literal strings or regular expressions (separated by |) that Splunk should never treat as the end of an event. Splunk merges the matching line into the previous event and then keeps looking for an event boundary.
- AUTO_LINEMERGE
Boolean
Default: True
Whether or not Splunk should automatically attempt to find the boundaries between multiline events (which can be from 1 to MAX_EVENTS lines long) using internal rules it creates based on statistical analysis of the source data.
Set this attribute to False if you've configured other attributes that reliably define event boundaries. Set it to True if you want Splunk to try to figure out the event boundaries instead. If the data contains only single-line events, just set SHOULD_LINEMERGE to False and ignore all merging and breaking attributes.
If this attribute is set to True, specific line-breaking attributes such as MUST_BREAK_AFTER take precedence, as do regular expression matches. Only when those do not match does Splunk attempt to demarcate events automatically.
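Several of the attributes above interact, which is easier to see in code. The Python function below is a simplified, hypothetical model of SHOULD_LINEMERGE, MAX_EVENTS, BREAK_ONLY_BEFORE, and MUST_BREAK_AFTER as this list describes them. It is not Splunk's implementation, and it deliberately omits AUTO_LINEMERGE's statistical rules, the date-based attributes, and MUST_NOT_BREAK_AFTER. The name merge_lines and its parameters are illustrative only.

```python
import re
from typing import List, Optional


def merge_lines(
    lines: List[str],
    should_linemerge: bool = True,
    max_events: int = 256,
    break_only_before: Optional[str] = None,
    must_break_after: Optional[str] = None,
) -> List[List[str]]:
    """Simplified model of how some props.conf line-merging attributes interact."""
    if not should_linemerge:
        # SHOULD_LINEMERGE = False: every line is its own event.
        return [[line] for line in lines]

    before = re.compile(break_only_before) if break_only_before else None
    after = re.compile(must_break_after) if must_break_after else None
    events: List[List[str]] = []
    close_current = True  # True when the next line must open a new event

    for line in lines:
        starts_new = (
            close_current
            or (before is not None and before.search(line) is not None)
            or len(events[-1]) >= max_events  # MAX_EVENTS: forced break
        )
        if starts_new:
            events.append([line])
        else:
            events[-1].append(line)
        # MUST_BREAK_AFTER: a matching line ends its event.
        close_current = after is not None and after.search(line) is not None
    return events


if __name__ == "__main__":
    print(merge_lines(["a1", "a2", "a3", "a4", "a5"], max_events=2))
    print(merge_lines(["x", "END", "y"], must_break_after=r"END"))
```

For example, merge_lines(lines, max_events=2) splits five unmarked lines into events of two, two, and one line, mirroring the forced break described under MAX_EVENTS.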
Specifying both first and last lines of an event
In the following example, each log entry is explicitly marked with BEGIN and END lines.
-----------------
2006/06/21 17:06:17.848 INFO [ABCReconnectServer] ******** BEGIN ABC Reconnect Server start ******
2006/06/21 17:06:17.951 INFO [XmlBeanDefinitionReader] Loading XML bean definitions from class path resource [com/abc/databus/local/app/databus-ctx.xml]
2006/06/21 17:06:18.120 INFO [CollectionFactory] JDK 1.4+ collections available
<...>
2006/06/21 17:07:05.729 INFO [ServicesPublisher] abcAdminService registered: rmi:localhost:1198/eduAdminService
2006/06/21 17:07:05.730 INFO [ABCServerBootMgr] ...rmi services registered.
2006/06/21 17:07:05.730 INFO [ABCReconnectServer] ******** END ABC Reconnect Server start ******
To tell Splunk to use the BEGIN and END lines to demarcate events, we would add the properties shown below.
$SPLUNK_HOME/etc/bundles/local/props.conf
[my_custom_sourcetype]
MUST_NOT_BREAK_AFTER = BEGIN
MUST_BREAK_AFTER = END
Attributes
- MUST_BREAK_AFTER
The text string that marks the last line of the event.
- MUST_NOT_BREAK_AFTER
The text string that marks the first line of the event.
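The Python sketch below models the BEGIN/END configuration as described in this section: a BEGIN line opens an event, no break occurs until a line matching END, and that END line closes the event. Treating lines outside a BEGIN...END block as single-line events is an assumption of this sketch, as is the function name begin_end_events; it is not Splunk's implementation.

```python
import re
from typing import List

# Shortened lines from the BEGIN/END sample above.
LOG = [
    "2006/06/21 17:06:17.848 INFO [ABCReconnectServer] ******** BEGIN ABC Reconnect Server start ******",
    "2006/06/21 17:06:17.951 INFO [XmlBeanDefinitionReader] Loading XML bean definitions ...",
    "2006/06/21 17:06:18.120 INFO [CollectionFactory] JDK 1.4+ collections available",
    "2006/06/21 17:07:05.730 INFO [ABCServerBootMgr] ...rmi services registered.",
    "2006/06/21 17:07:05.730 INFO [ABCReconnectServer] ******** END ABC Reconnect Server start ******",
]


def begin_end_events(lines: List[str], begin: str = "BEGIN", end: str = "END") -> List[List[str]]:
    """Keep everything from a BEGIN line through the next END line in one event."""
    begin_re, end_re = re.compile(begin), re.compile(end)
    events: List[List[str]] = []
    inside = False  # True while between a BEGIN line and its END line
    for line in lines:
        if inside:
            events[-1].append(line)   # MUST_NOT_BREAK_AFTER is in effect
        else:
            events.append([line])     # outside a block: start a new event
            inside = begin_re.search(line) is not None
        if inside and end_re.search(line):
            inside = False            # MUST_BREAK_AFTER: END closes the event
    return events


if __name__ == "__main__":
    for event in begin_end_events(LOG):
        print(len(event), "line(s) in event")
```

Since BEGIN and END are unanchored regular expressions, a line containing a word such as PENDING would also match END; anchoring or lengthening the patterns reduces that risk.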
This documentation applies to the following versions of Splunk: 2.1, 2.2, 2.2.1, 2.2.3, 2.2.6.