Splunk® Data Stream Processor

Function Reference

On April 3, 2023, Splunk Data Stream Processor reached its end of sale, and will reach its end of life on February 28, 2025. If you are an existing DSP customer, please reach out to your account team for more information.

All DSP releases prior to DSP 1.4.0 use Gravity, a Kubernetes orchestrator, which has been announced end-of-life. We have replaced Gravity with an alternative component in DSP 1.4.0. Therefore, we will no longer provide support for versions of DSP prior to DSP 1.4.0 after July 1, 2023. We advise all of our customers to upgrade to DSP 1.4.0 in order to continue to receive full product support from Splunk.

Apply Line Break

This topic describes how to use the Apply Line Break function in the Splunk Data Stream Processor.

Description

The Apply Line Break function breaks and merges universal forwarder events using a specified break type. Events typically come from the universal forwarder in 64KB chunks, and require additional parsing to be processed correctly. Use this function to configure the Splunk Data Stream Processor to break and merge these events so that they are in the format that you want.

This function performs two actions: line breaking and line merging.

  • In the line breaking phase, this function uses a specified line breaker pattern to split the incoming stream of data into separate lines.
  • In the line merging phase, this function takes the separated lines from the line breaking phase and merges them into events.
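
The two phases above can be sketched in Python. This is a conceptual illustration only, not the DSP implementation: the function names and the TIMESTAMP rule are hypothetical, and the merge step assumes that merged lines are concatenated without a separator, as the outgoing events in the "Examples" section suggest.

```python
import re

# Hypothetical timestamp rule for illustration; the built-in DSP
# timestamp rules are not described in this topic.
TIMESTAMP = re.compile(r"^\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2}")


def break_lines(chunk, line_breaker=r"[\r\n]+"):
    """Phase 1: split the raw chunk into lines, dropping empty pieces."""
    return [piece.strip() for piece in re.split(line_breaker, chunk) if piece.strip()]


def merge_lines(lines):
    """Phase 2: start a new event at each timestamp line and
    append the lines that follow it to that event."""
    events = []
    for line in lines:
        if not events or TIMESTAMP.match(line):
            events.append(line)   # first line, or a new timestamp: new event
        else:
            events[-1] += line    # otherwise merge into the current event
    return events


chunk = "This is the first line. \n12/31/2017-05:43:11.325\nThis is the third line. \n"
print(merge_lines(break_lines(chunk)))
```

Keeping the phases as separate functions mirrors the description: the breaker pattern decides where lines end, and the merge rule alone decides where events begin.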

This function must be added immediately after the source function or, if there are multiple sources, immediately after the union of those sources. See the "Examples" section for examples of how to use this function.

Function Input/Output Schema

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<R>>
This function outputs collections of records with schema R.

Syntax

The linebreak_type argument is always required. The props_conf argument is required only when linebreak_type = config.

If linebreak_type = auto:

linebreak
linebreak_type = auto

If linebreak_type = advanced:

linebreak
linebreak_type = advanced
line_breaker = <regex>
truncate = <integer>

If linebreak_type = config:

linebreak
linebreak_type = config
props_conf = <string>

Required arguments

linebreak_type
Syntax: auto | advanced | config
Description: See the table for a description of each break type. All three of these break types perform line merging after breaking by default.
Example in Canvas View: auto
Break types:
  • Auto: Breaks events based on the location of timestamps in the data, and merges the lines that follow each timestamp. Creates a new event when another timestamp is detected. This setting uses built-in timestamp rules to detect timestamps.
  • Advanced: Breaks events based on a custom regular expression pattern. The default pattern is [\r\n]+, which breaks data into an event for each line, delimited by any number of carriage return (\r) or newline (\n) characters.
  • Config: Breaks events based on an existing line breaking props.conf configuration. See props.conf. Use this break type if you want to reuse and migrate your existing props.conf line breaking settings to the Splunk Data Stream Processor.
props_conf
Syntax: string
Description: Required if you are using the config linebreak_type. The line breaking stanza settings from props.conf. See "Configure event line breaking" in the Splunk Enterprise documentation for a description of the available line breaking attributes.
Example in Canvas View: MAX_EVENTS=3 MUST_NOT_BREAK_BEFORE=^.*fifth SHOULD_LINEMERGE=true BREAK_ONLY_BEFORE_DATE=false
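
To make the shape of a props_conf value concrete, the following Python sketch splits a newline-separated settings string into attribute/value pairs. It is an illustration only: parse_props_conf is a hypothetical helper, not part of DSP, and it assumes one "ATTRIBUTE = value" setting per line, as in the SPL2 examples in this topic.

```python
def parse_props_conf(props_conf):
    """Split a newline-separated props.conf fragment into a dict.

    Hypothetical helper for illustration; assumes one setting per line.
    """
    settings = {}
    for raw_line in props_conf.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first "=" only, so values such as "Path=" stay intact.
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a setting line
        settings[key.strip()] = value.strip()
    return settings


print(parse_props_conf("SHOULD_LINEMERGE = True\nBREAK_ONLY_BEFORE = Path="))
```

Splitting on the first equals sign matters here because attribute values can themselves contain "=".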

Optional arguments

line_breaker
Syntax: Regular expression
Description: A regular expression that determines how the incoming data is broken into initial events, before line merging takes place.
Default: [\r\n]+
Example in Canvas View: [\r\n]+
truncate
Syntax: a non-negative integer
Description: The maximum line length, in bytes. The Splunk Data Stream Processor rounds the line length down when this value would otherwise land mid-character in a multibyte character.
Default: 10000
Example in Canvas View: 5000
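
The rounding behavior described for truncate can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the DSP implementation: truncate_utf8 is a hypothetical helper, it assumes UTF-8 encoded data, and it does not model any special truncate values.

```python
def truncate_utf8(line, max_bytes=10000):
    """Cut a line to at most max_bytes bytes without splitting a character.

    Hypothetical helper for illustration; assumes UTF-8 encoding.
    """
    data = line.encode("utf-8")
    if len(data) <= max_bytes:
        return line
    # Slicing may cut a multibyte character in half. Decoding with
    # errors="ignore" drops the incomplete trailing bytes, which rounds
    # the length down to the previous character boundary.
    return data[:max_bytes].decode("utf-8", errors="ignore")


# "é" takes two bytes in UTF-8, so a 2-byte limit lands mid-character.
print(truncate_utf8("héllo", 2))  # prints "h"
print(truncate_utf8("héllo", 3))  # prints "hé"
```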

Examples

Examples of common use cases follow. These examples assume that you have added the function to your pipeline.

If you are using the SPL2 Pipeline Builder, you must escape any backslash ( \ ) characters. If you are in the Canvas View, backslash characters are automatically escaped. See Using regular expressions in the Canvas View vs the SPL2 Pipeline View.
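
The escaping rule above amounts to doubling each backslash before the pattern is placed in an SPL2 string. A minimal Python sketch, where escape_for_spl2 is a hypothetical helper name and not a DSP function:

```python
def escape_for_spl2(pattern):
    """Double every backslash so the pattern can be pasted into an SPL2 string."""
    return pattern.replace("\\", "\\\\")


# The default line_breaker pattern as typed in the Canvas View ...
canvas_pattern = r"[\r\n]+"
# ... and as it would need to be written in the SPL2 Pipeline Builder.
print(escape_for_spl2(canvas_pattern))  # prints [\\r\\n]+
```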

1. SPL2 Example: Reuse an existing props.conf setting that creates a new event when it encounters a new line with a date and merges the lines following the date.

This example assumes that you are in the SPL View.

| from forwarders("forwarders:all") | apply_line_breaking props_conf="BREAK_ONLY_BEFORE_DATE=true\nSHOULD_LINEMERGE=true" linebreak_type="config" |...;

Incoming event:

Event {
 body = "This is the first line. 
12/31/2017-05:43:11.325
This is the third line. 
This is the fourth line. 
Fifth. 
"
}

Outgoing events:

Event {
 body = "This is the first line."
}
Event {
 body = "12/31/2017-05:43:11.325This is the third line.This is the fourth line.Fifth."
}

2. SPL2 Example: Break an event at every new line using a regular expression pattern.

This example assumes that you are in the SPL View, and shows a partial pipeline with two sources, Splunk DSP Firehose and the Forwarders service.

$firehose = | from splunk_firehose();
$forwarders = | from forwarders("forwarders:all");
| from $forwarders | union $firehose | apply_line_breaking line_breaker="[\\r\\n]+" linebreak_type="advanced";

Incoming event:

Event {
 body = "This is the first line. 
12/31/2017-05:43:11.325
This is the third line. 
"
}

Outgoing events:

Event {
 body = "This is the first line."
}
Event {
 body = "12/31/2017-05:43:11.325"
}
Event {
 body = "This is the third line."
}

3. SPL2 Example: Break a single event into multiple events using an existing props.conf configuration

This example assumes that you are in the SPL View.

| from forwarders("forwarders:all") | apply_line_breaking props_conf="SHOULD_LINEMERGE = True\nBREAK_ONLY_BEFORE = Path=" linebreak_type="config" |...;

Incoming event:

Event {
 body = "This is first event \r\n{{"2007-09-21, 02:57:11.58", 122, 11, "Path=/LoginUser Query=CrmId=ClientABC&ContentItemId=TotalAccess&SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&SessionTime=25368&ReturnUrl=http://www.clientabc.com, Method=GET, IP=209.51.249.195, Content=", ""}}\n{{"2006-09-21, 02:57:11.60", 122, 15, "UserData:<User CrmId="clientabc" UserId="p12345678"><EntitlementList></EntitlementList></User>", ""}}\n{{"2006-09-21, 02:57:11.60", 122, 15, "New Cookie: SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&CrmId=clientabc&UserId=p12345678&AccountId=&AgentHost=man&AgentId=man, MANUser: Version=1&Name=&Debit=&Credit=&AccessTime=&BillDay=&Status=&Language=&Country=&Email=&EmailNotify=&Pin=&PinPayment=&PinAmount=&PinPG=&PinPGRate=&PinMenu=&", ""}}\n"
}

Outgoing events:

Event {
 body = "This is first event"
}
Event {
 body = "{{"2007-09-21, 02:57:11.58", 122, 11, "Path=/LoginUser Query=CrmId=ClientABC&ContentItemId=TotalAccess&SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&SessionTime=25368&ReturnUrl=http://www.clientabc.com, Method=GET, IP=209.51.249.195, Content=", ""}}{{"2006-09-21, 02:57:11.60", 122, 15, "UserData:<User CrmId="clientabc" UserId="p12345678"><EntitlementList></EntitlementList></User>", ""}}{{"2006-09-21, 02:57:11.60", 122, 15, "New Cookie: SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&CrmId=clientabc&UserId=p12345678&AccountId=&AgentHost=man&AgentId=man, MANUser: Version=1&Name=&Debit=&Credit=&AccessTime=&BillDay=&Status=&Language=&Country=&Email=&EmailNotify=&Pin=&PinPayment=&PinAmount=&PinPG=&PinPGRate=&PinMenu=&", ""}}"
}
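
The effect of BREAK_ONLY_BEFORE in this example can be approximated in Python. This is a sketch under stated assumptions, not the DSP implementation: break_only_before is a hypothetical function, lines are first split on the default [\r\n]+ pattern, and merged lines are concatenated without a separator, as in the outgoing events above. The sample input is a shortened stand-in for the incoming event.

```python
import re


def break_only_before(chunk, pattern, line_breaker=r"[\r\n]+"):
    """Start a new event only at lines that match pattern;
    merge every other line into the current event."""
    starts_event = re.compile(pattern)
    events = []
    for piece in re.split(line_breaker, chunk):
        line = piece.strip()
        if not line:
            continue  # ignore empty pieces left by the split
        if not events or starts_event.search(line):
            events.append(line)
        else:
            events[-1] += line
    return events


# Shortened stand-in for the incoming event in this example.
chunk = 'This is first event \r\n{{"2007", "Path=/LoginUser"}}\n{{"2006", "UserData"}}\n'
for event in break_only_before(chunk, "Path="):
    print(event)
```

Only the line containing Path= opens a new event, so the lines that follow it are folded into the same event, which matches the two outgoing events shown above.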

4. SPL2 Example: Break events at the timestamp and merge the lines following the timestamp until another timestamp is found

This example assumes that you are in the SPL View.

| from forwarders("forwarders:all") | apply_line_breaking linebreak_type="auto" |...;

Incoming event:

Event {
 body = "06-01-2020 11:17:29.259 -0700 INFO StatusMgr - destHost=my.host.com, destIp=44.226.233.96, destPort=9997, eventType=connect_try, publisher=tcpout, sourcePort=8089, statusee=TcpOutputProcessor\n06-01-2020 11:17:29.628 -0700 INFO StatusMgr - destHost=my.host.com, destIp=44.226.233.96, destPort=9997, eventType=connect_done, publisher=tcpout, sourcePort=8089, statusee=TcpOutputProcessor\n
"
}

Outgoing events:

Event {
 body = "06-01-2020 11:17:29.259 -0700 INFO StatusMgr - destHost=my.host.com, destIp=44.226.233.96, destPort=9997, eventType=connect_try, publisher=tcpout, sourcePort=8089, statusee=TcpOutputProcessor"
}
Event {
 body = "06-01-2020 11:17:29.628 -0700 INFO StatusMgr - destHost=my.host.com, destIp=44.226.233.96, destPort=9997, eventType=connect_done, publisher=tcpout, sourcePort=8089, statusee=TcpOutputProcessor"
}
Last modified on 25 March, 2022

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5, 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5, 1.4.6

