Set up streaming
A modular input can stream data to Splunk as plain text or as XML data. In the schema for the modular input, use the <streaming_mode> tag to specify the streaming mode. Specify simple for plain text or xml for XML data.
For example, to specify XML data:
<streaming_mode>xml</streaming_mode>
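As an illustration, the following Python sketch prints an introspection scheme that sets <streaming_mode> to xml. It assumes that the script is invoked with a --scheme argument during introspection, and it emits only the title, description, and streaming_mode elements. A real scheme usually declares more elements, as described in Define a scheme for introspection. The function name build_scheme and the title and description values are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def build_scheme(title: str, description: str, streaming_mode: str = "xml") -> str:
    """Return introspection scheme XML that declares the streaming mode."""
    if streaming_mode not in ("simple", "xml"):
        raise ValueError("streaming_mode must be 'simple' or 'xml'")
    scheme = ET.Element("scheme")
    ET.SubElement(scheme, "title").text = title
    ET.SubElement(scheme, "description").text = description
    ET.SubElement(scheme, "streaming_mode").text = streaming_mode
    return ET.tostring(scheme, encoding="unicode")


if __name__ == "__main__":
    # Assumption: Splunk runs the script with --scheme during introspection.
    if len(sys.argv) > 1 and sys.argv[1] == "--scheme":
        print(build_scheme("my_config", "Hypothetical example input"))
```

Passing "simple" instead of the default produces a scheme for plain-text streaming.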
Simple streaming mode
Simple mode (plain text) is the default streaming mode and is similar to how Splunk treats data that is streamed from scripted inputs. In simple mode, Splunk treats the data much like it treats data read from a file. For more information on streaming from scripted inputs, refer to Scripted inputs overview in this manual.
In simple streaming mode, Splunk supports all character sets described in Configure character set encoding.
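A minimal sketch of a simple-mode script in Python: it writes each record as one plain-text line to standard output, the way a scripted input would, and leaves event breaking and timestamp extraction to Splunk. The helper name emit_simple is hypothetical.

```python
import sys
from typing import Iterable, Optional, TextIO


def emit_simple(lines: Iterable[str], out: Optional[TextIO] = None) -> None:
    """Write each record as one plain-text line (simple streaming mode)."""
    stream = out if out is not None else sys.stdout
    for line in lines:
        # Normalize every record to exactly one trailing newline.
        stream.write(line.rstrip("\n") + "\n")
    # Flush so splunkd receives the data without waiting for a full buffer.
    stream.flush()
```

For example, emit_simple(["2013-06-01 12:00:00 event_status=ok"]) writes a single line to standard output.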
XML streaming mode
The Modular Inputs feature, new in Splunk 5.0, introduces a new way to stream XML data to Splunk. With this XML streaming format you can:
- Clearly break events without the use of special markers.
- Easily forward data in a distributed environment by arbitrarily specifying done keys.
- Easily allow a single stream of data to specify source, sourcetype, host, and index.
The format of XML streaming differs, depending on which mode your script specifies:
- one script instance per input stanza mode
- single script instance mode
In XML streaming mode, the XML stream itself must be encoded in UTF-8.
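Because the XML stream must be UTF-8, a script should escape markup characters in event text and encode the whole stream before writing it. The following Python sketch does both. The names event_xml and write_stream are hypothetical, and for brevity the sketch writes a complete stream in one call, whereas a long-running input would typically keep the <stream> element open while it emits events.

```python
from typing import BinaryIO, Iterable
from xml.sax.saxutils import escape


def event_xml(data: str) -> str:
    """Wrap one event's text in <event><data>, escaping &, < and >."""
    return f"<event><data>{escape(data)}</data></event>"


def write_stream(events: Iterable[str], out: BinaryIO) -> None:
    """Write a complete <stream> document to `out` as UTF-8 bytes."""
    body = "".join(event_xml(text) for text in events)
    out.write(f"<stream>{body}</stream>".encode("utf-8"))
    out.flush()
```

In a script, you would pass sys.stdout.buffer as out so that the encoded bytes go to splunkd unchanged.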
Default parameters when streaming events
Splunk provides default values for the following parameters when streaming events. If Splunk does not find a definition for these parameters in inputs.conf files, Splunk uses the default values for these parameters.
However, the default values vary depending on whether you are using one script instance per input stanza mode or single script instance mode. The following table lists the default values for these parameters. The last column lists the default values for traditional scripted inputs, for comparison.
|Parameter||One script instance per input stanza||Single script instance||Traditional scripted inputs|
|source||Stanza name (for example, myScheme://abc)||Scheme name (for example, myScheme)||<path> (the envvar-expanded path from inputs.conf)|
|sourcetype||Scheme name||Scheme name||exec (or, if present, the layered value of the sourcetype)|
|host||Layered host for each stanza||Global default host from inputs.conf||Layered host from its stanza|
|index||Layered index for each stanza||Global default index from inputs.conf||Layered index from its stanza|
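The following Python sketch mirrors the source and sourcetype rows of the table for modular inputs. It treats the scheme name as the part of the stanza name before ://, and lets any value found in inputs.conf replace the default. The function and its configured argument are hypothetical, and host and index are omitted because their defaults come from layered configuration that the sketch does not model.

```python
from typing import Dict, Optional


def resolve_source_fields(stanza: str, single_instance: bool,
                          configured: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return source/sourcetype defaults from the table, then apply inputs.conf values."""
    scheme = stanza.split("://", 1)[0]  # "myScheme://abc" -> "myScheme"
    fields = {
        # Single instance: source is the scheme name; otherwise the stanza name.
        "source": scheme if single_instance else stanza,
        # The scheme name is the default sourcetype in both modular input modes.
        "sourcetype": scheme,
    }
    # Values defined in inputs.conf take precedence over the defaults.
    for key, value in (configured or {}).items():
        if key in fields:
            fields[key] = value
    return fields
```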
Specify the time of events in the input stream
If an input script knows the time of the event it generates, use the <time> tag to specify that time in the input stream. Specify the time as a UTC UNIX timestamp. Subseconds are supported (for example, <time>1330717125.125</time>).
- Note: When writing modular input scripts, it is best to specify the time of an event with the <time> tag.
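If your script holds the event time as a Python datetime, a sketch like the following converts it to the UTC UNIX timestamp that the <time> tag expects. Rounding to millisecond precision and trimming trailing zeros are assumptions made for readability, not documented requirements, and time_tag is a hypothetical name.

```python
from datetime import datetime, timezone


def time_tag(when: datetime) -> str:
    """Render an aware datetime as a <time> element holding a UTC UNIX timestamp."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the epoch value is UTC")
    # Keep up to three decimal places, then drop trailing zeros and a bare point.
    text = f"{when.timestamp():.3f}".rstrip("0").rstrip(".")
    return f"<time>{text}</time>"
```

For example, time_tag(datetime(2012, 3, 2, 19, 38, 45, 125000, tzinfo=timezone.utc)) returns <time>1330717125.125</time>, the value used in the example above.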
When specifying the time of events, set SHOULD_LINEMERGE to false in props.conf. Refer to Configure event linebreaking for more information on setting this property.
Setting SHOULD_LINEMERGE to false does the following:
- Prevents the merging of events because of a missing timestamp.
- Prevents a timestamp in the event data from overriding the value set with the <time> tag.
The following example shows how to specify the time of events in the input stream, along with the matching props.conf setting:
<stream>
  <event stanza="my_config://aaa">
    <time>1330717125</time>
    <data>type=CCC</data>
  </event>
  <event stanza="my_config://bbb">
    <time>1330717125</time>
    <data>type=DDD</data>
  </event>
  . . .
</stream>

# Modify $SPLUNK_HOME/etc/apps/myapp/default/props.conf
[my_config]
SHOULD_LINEMERGE = false
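The stream above can also be generated programmatically. This Python sketch uses xml.etree.ElementTree, which escapes event text for you, to build one <event> per (stanza, time, data) tuple. The helper name build_stream is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def build_stream(events: Iterable[Tuple[str, int, str]]) -> str:
    """Build a <stream> of events, each tagged with its stanza and <time>."""
    stream = ET.Element("stream")
    for stanza, epoch, data in events:
        event = ET.SubElement(stream, "event", stanza=stanza)
        ET.SubElement(event, "time").text = str(epoch)
        ET.SubElement(event, "data").text = data
    return ET.tostring(stream, encoding="unicode")
```

Calling build_stream([("my_config://aaa", 1330717125, "type=CCC"), ("my_config://bbb", 1330717125, "type=DDD")]) yields the two events from the example, without the indentation.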
Streaming example (XML mode)
The streaming examples in XML mode in this section illustrate the differences between the following:
- one script instance per input stanza mode
- single script instance mode
The examples also show how you can override the default values for the following parameters:
- source
- sourcetype
- host
- index
- Note: For these examples, the introspection scheme enables XML streaming mode, as described in Define a scheme for introspection.
One script instance per input stanza mode
This example shows XML that a script can stream to splunkd for indexing in one script instance per input stanza mode. In this mode, there is a separate instance of the script for each input stanza in inputs.conf configuration files.
<stream>
  <event>
    <time>1370031029</time>
    <data>event_status="(0)The operation completed successfully."</data>
  </event>
  <event>
    <time>1370031031</time>
    <data>event_status="(0)The operation completed successfully."</data>
  </event>
</stream>
In this example, the tags clearly delineate the events. This effectively line-breaks the events without any line-breaking configuration.
The values for source, sourcetype, host, and index are the default values, as described in Default parameters when streaming events. You can override the default values by including the new values in the event. The following example specifies custom values for source and index:
<stream>
  <event>
    <time>1370031035</time>
    <data> event_status="(0)The operation completed successfully."</data>
    <source>my_source</source>
    <index>test1</index>
  </event>
</stream>
- Note: Subsequent events can specify new values for the source and index parameters, or simply use the default values.
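In one script instance per input stanza mode, an event needs no stanza attribute, and the override tags are optional. The Python sketch below emits source, sourcetype, host, or index only when the caller supplies a value, so omitted fields keep their defaults. The function name one_instance_event is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def one_instance_event(epoch: int, data: str, source: Optional[str] = None,
                       sourcetype: Optional[str] = None, host: Optional[str] = None,
                       index: Optional[str] = None) -> ET.Element:
    """Build an <event> without a stanza attribute; omitted fields keep defaults."""
    event = ET.Element("event")
    ET.SubElement(event, "time").text = str(epoch)
    ET.SubElement(event, "data").text = data
    overrides = {"source": source, "sourcetype": sourcetype, "host": host, "index": index}
    for tag, value in overrides.items():
        if value is not None:  # emit only the tags the caller wants to override
            ET.SubElement(event, tag).text = value
    return event
```

For example, one_instance_event(1370031035, "...", source="my_source", index="test1") produces the time, data, source, and index tags in the same order as the override example above.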
Single script instance mode
This example shows XML that a script can stream to splunkd for indexing in single script instance mode. In this mode, there is only one instance of the script.
- Note: Because you are using a single instance of the script, use the stanza attribute of the <event> tag to specify the stanza for each event. You do not need the stanza attribute when streaming in one script instance per input stanza mode.
<stream>
  <event stanza="my_config://aaa">
    <time>1370031041</time>
    <data> event_status="(0)The operation completed successfully."</data>
    <host>my_host</host>
  </event>
</stream>
In this example, the value of stanza should be an existing stanza name from inputs.conf that the event belongs to. If the stanza name is not present, or refers to a stanza name that does not exist in the conf file, Splunk automatically sets the parameters for source, sourcetype, host, and index.
This example overrides the default value for the host parameter.
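In single script instance mode, a script may want to confirm that a stanza name exists before using it in the stanza attribute. The following sketch uses Python's configparser for that check. This is an assumption for illustration: Splunk .conf files are close to INI format but may contain constructs that configparser rejects, and Splunk's layering of multiple inputs.conf files is not modeled. The name stanza_exists is hypothetical.

```python
import configparser


def stanza_exists(inputs_conf_text: str, stanza: str) -> bool:
    """Return True if `stanza` names a section in the given inputs.conf text."""
    # Assumption: the text parses as INI; disable interpolation so values
    # containing % are read literally, and tolerate duplicate sections.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(inputs_conf_text)
    return parser.has_section(stanza)
```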
Stream unbroken events in XML
The XML streaming examples in the previous sections use the <data> tag to delineate, or break, separate events. However, when you stream data to Splunk you often do not want to break events yourself, and instead let Splunk interpret them. You typically send unbroken data in chunks and let Splunk apply its line-breaking rules.
You may want to stream unbroken events because you are streaming a known format to Splunk, or because you do not know the format of the data and want Splunk to interpret it. The S3 example in this document streams unbroken events in XML mode.
Use the <time> tag when possible
When streaming unbroken events, Splunk attempts to read timestamps from the body of the events and breaks the events based on those timestamps. However, if you know the time of an event, provide it with the <time> tag. When the unbroken segments are merged, Splunk uses the value from the first <time> tag, although timestamp extraction rules for the sourcetype can override it.
Use the <done> tag with unbroken events
Use the <done> tag to denote the end of a stream of unbroken events. The <done> tag tells Splunk to flush the data from its buffer rather than wait for more data before processing it. For example, Splunk may buffer data that it has read while it waits for a newline character, which prevents the data from being indexed until the newline character arrives. If you want Splunk to index the data without a trailing newline character, add the unbroken attribute to the <event> tag (unbroken="1"). Then, after you reach the end of the data you are sending in chunks, send the <done/> tag, as in the following example.
<stream>
  <event unbroken="1">
    <data>09/08/2009 14:01:59.0398 part of the event ...</data>
  </event>
  <event unbroken="1">
    <data>final part of the event</data>
    <done/>
  </event>
  <event unbroken="1">
    <data>second event</data>
    <done/>
  </event>
</stream>
When sending unbroken events:
- You can specify source, sourcetype, host, index, and stanza specifications just as you would when sending broken events.
- The script is responsible for sending a <done/> tag. This is important for forwarders because they can't switch a stream until they see a <done/> tag.
- When the data goes through the time extraction process, if a subset of the event is identified as a timestamp, that time becomes the event's time, and the timestamp is used for event aggregation. Refer to Configure event linebreaking for more information.
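The rules above can be combined into a small Python sketch that splits one logical event into unbroken chunks: every chunk carries unbroken="1", only the first chunk carries <time> (because the first <time> value is used when the segments are merged), and only the last chunk carries <done/>. Splitting by character count is an arbitrary choice for illustration, and the function name unbroken_chunks is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def unbroken_chunks(text: str, size: int, epoch: Optional[float] = None) -> List[ET.Element]:
    """Split one logical event into unbroken <event> chunks.

    The first chunk carries <time> when an epoch is given; the last chunk
    carries <done/> so that Splunk flushes instead of waiting for more data.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    pieces = [text[i:i + size] for i in range(0, len(text), size)] or [""]
    events = []
    for n, piece in enumerate(pieces):
        event = ET.Element("event", unbroken="1")
        if n == 0 and epoch is not None:
            ET.SubElement(event, "time").text = str(epoch)
        ET.SubElement(event, "data").text = piece
        if n == len(pieces) - 1:
            ET.SubElement(event, "done")
        events.append(event)
    return events
```

Each returned element can be serialized with ET.tostring(event, encoding="unicode") and written inside the open <stream> element, flushing standard output after the chunk that contains <done/>.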
This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18