Formatting event data in DSP
For the Write to Splunk Enterprise and Write to Index sink functions to properly transform your records into HEC event JSON format, each record must have at least one of the following:
- a non-empty body field
- at least one valid entry in the attributes map
- at least one valid, non-empty top-level field that is not part of the DSP event schema
If a record meets none of these conditions, the sink functions drop it.
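The three conditions above can be expressed as a single check. The following is a minimal sketch, not DSP's implementation. It assumes a record is modeled as a Python dict, and it assumes that attributes keys starting with an underscore or matching a HEC metadata field do not count as valid entries. The names would_be_sent, _is_empty, DSP_SCHEMA_FIELDS, and HEC_METADATA_FIELDS are hypothetical.

```python
# Hypothetical sketch: a DSP record is modeled as a plain Python dict.
# Assumption: the DSP event schema fields, as listed in the table on this page.
DSP_SCHEMA_FIELDS = {"body", "sourcetype", "source_type", "timestamp",
                     "source", "host", "attributes", "id", "kind", "nanos"}
# Assumption: HEC metadata keys never count as valid entries or custom fields.
HEC_METADATA_FIELDS = {"time", "host", "source", "sourcetype", "index"}


def _is_empty(value):
    """None, empty strings, and empty collections count as empty."""
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def would_be_sent(record):
    """Return True if the record meets at least one of the three conditions."""
    # 1. A non-empty body (boolean bodies do not count).
    body = record.get("body")
    if not isinstance(body, bool) and not _is_empty(body):
        return True
    # 2. At least one attributes entry that the sink does not skip.
    attributes = record.get("attributes") or {}
    if any(not key.startswith("_") and key not in HEC_METADATA_FIELDS
           for key in attributes):
        return True
    # 3. At least one non-empty custom top-level field.
    return any(
        key not in DSP_SCHEMA_FIELDS and key not in HEC_METADATA_FIELDS
        and not key.startswith("_") and not _is_empty(value)
        for key, value in record.items()
    )
```

For example, would_be_sent({"body": "", "attributes": {"attr1": "val1"}}) returns True because of the attributes entry, while a record whose only fields are a boolean body and a host returns False.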
Because you can add custom top-level fields to your event schema in DSP, and DSP supports records of varying schemas, your data can be formatted in a way that is incompatible with Splunk HEC. Use the following table and examples as a guide to formatting your data so that it is indexed appropriately.
Top-level fields and attribute entries that start with an underscore (for example, _index) are ignored when creating HEC event JSON.
DSP event field | HEC event JSON field | Type | Notes |
---|---|---|---|
body | event | any, except boolean | The "raw" event indexed in Splunk Enterprise. As a best practice, give each DSP record a valid, non-empty body field. If a record's body field is missing, is of type boolean, or is empty, the event field in the HEC event JSON is set to an empty JSON object ({}), and the record is sent only if it meets one of the other conditions listed above. |
sourcetype or source_type | sourcetype | string | If not present, no sourcetype is included in the HEC event JSON, and Splunk Enterprise uses the default sourcetype httpevent. If both sourcetype and source_type fields are present, the sourcetype field is used unless it is empty or null. |
timestamp | time | long integer | The Data Stream Processor uses Unix epoch time in milliseconds. Your timestamp is automatically converted to the Splunk epoch time format <sec>.<ms>. If the timestamp is blank or negative, time is set to the current time. |
source | source | string | If not present, Splunk HEC uses the default http:test source. |
host | host | string | The host value to assign to the event data. This is typically the hostname of the client from which you're sending data. |
attributes | fields | map<string, any> | Most entries in the DSP attributes map are mapped directly into the HEC event JSON fields object and are available as index-extracted fields in Splunk Enterprise. Entries that are not included in fields are any key that starts with an underscore (for example, _time) and any of the Splunk HEC metadata fields. |
id | N/A | string | A DSP event field ignored by HEC. |
kind | N/A | string | A DSP event field ignored by HEC. |
nanos | N/A | integer | A DSP event field ignored by HEC. |
any custom fields | fields | any | All custom top-level fields that are not part of the DSP event schema, that don't start with an underscore, and that have a non-empty value are mapped to the HEC fields JSON object for index extraction. |
N/A | index | string | To set the index in the HEC event JSON, you must pass the index name as an argument to the Write to Splunk Enterprise or Write to Index function. If no index is specified, your data is sent to the default index associated with your HEC token. |
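The mapping rules in the table above can be sketched as one function. This is a minimal illustration under stated assumptions, not DSP's actual code: records and HEC events are modeled as Python dicts, the set of HEC metadata keys is assumed, nested attributes maps are flattened into dot-separated keys as in Example 6, and the names to_hec_event, _flatten, _is_empty, DSP_SCHEMA_FIELDS, and HEC_METADATA_FIELDS are hypothetical. The now_ms parameter is also hypothetical; it exists only so the "current time" fallback can be tested deterministically.

```python
import time

# Assumption: the DSP event schema fields, as listed in the table above.
DSP_SCHEMA_FIELDS = {"body", "sourcetype", "source_type", "timestamp",
                     "source", "host", "attributes", "id", "kind", "nanos"}
# Assumption: HEC metadata keys are never copied into "fields".
HEC_METADATA_FIELDS = {"time", "host", "source", "sourcetype", "index"}


def _is_empty(value):
    """None, empty strings, and empty collections count as empty."""
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def _flatten(prefix, value, out):
    """Flatten nested dicts into dot-separated keys: {"foo": {"bar": 1}} -> {"foo.bar": 1}."""
    if isinstance(value, dict):
        for key, inner in value.items():
            _flatten(f"{prefix}.{key}", inner, out)
    else:
        out[prefix] = value


def to_hec_event(record, index=None, now_ms=None):
    """Hypothetical sketch: map a DSP record (dict) to a HEC event (dict)."""
    hec = {}

    # Missing, empty, or boolean bodies become an empty JSON object.
    body = record.get("body")
    hec["event"] = {} if isinstance(body, bool) or _is_empty(body) else body

    # sourcetype wins over source_type unless it is empty or null.
    sourcetype = record.get("sourcetype")
    if _is_empty(sourcetype):
        sourcetype = record.get("source_type")
    if not _is_empty(sourcetype):
        hec["sourcetype"] = sourcetype

    for key in ("source", "host"):
        if not _is_empty(record.get(key)):
            hec[key] = record[key]

    fields = {}
    ts = record.get("timestamp")
    if isinstance(ts, bool) or not isinstance(ts, int):
        if not _is_empty(ts):
            fields["timestamp"] = ts  # A non-long timestamp is kept as a field.
        ts = None
    if ts is None or ts < 0:
        ts = now_ms if now_ms is not None else int(time.time() * 1000)
    # Milliseconds since the epoch -> "<sec>.<ms>" string.
    hec["time"] = f"{ts // 1000}.{ts % 1000:03d}"

    # Attributes entries, skipping underscore keys and HEC metadata keys.
    for key, value in (record.get("attributes") or {}).items():
        if key.startswith("_") or key in HEC_METADATA_FIELDS:
            continue
        _flatten(key, value, fields)

    # Custom top-level fields that are non-empty and not underscore-prefixed.
    for key, value in record.items():
        if (key in DSP_SCHEMA_FIELDS or key in HEC_METADATA_FIELDS
                or key.startswith("_") or _is_empty(value)):
            continue
        fields[key] = value

    if fields:
        hec["fields"] = fields
    # The index comes only from the sink function argument, never the record.
    if index is not None:
        hec["index"] = index
    return hec
```

In this sketch, passing index=record.get("index") plays the role of a get("index") function argument, while passing a string literal plays the role of a fixed index name.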
Example 1: The event has a non-empty body field
DSP event:
Event{body="mybody", host="myhost", attributes=null, source_type="mysourcetype", id="id12345", source=null, timestamp=1234567890012}
HEC event JSON:
{"event":"mybody", "sourcetype":"mysourcetype", "host":"myhost", "time":"1234567890.012"}
Explanation: The value of the DSP event body field is placed in the event field of the HEC event JSON to be indexed. The timestamp value is converted to the Splunk epoch time format as a string.
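The timestamp conversion in this example is integer division by 1000, with the remainder zero-padded to three digits. A minimal sketch, assuming a non-negative timestamp in milliseconds; the function name to_splunk_time is hypothetical:

```python
def to_splunk_time(timestamp_ms: int) -> str:
    """Convert DSP milliseconds since the Unix epoch to the Splunk "<sec>.<ms>" format."""
    seconds, millis = divmod(timestamp_ms, 1000)
    # Zero-pad the milliseconds so that 12 ms becomes ".012", not ".12".
    return f"{seconds}.{millis:03d}"


print(to_splunk_time(1234567890012))  # 1234567890.012
print(to_splunk_time(1234))           # 1.234
```

The zero padding matters: without it, 1234567890012 would become "1234567890.12", which is 120 ms instead of 12 ms.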
Example 2: The event has both body and attributes fields
DSP event:
Event{body="mybody", host="myhost", attributes={attr1="val1", attr2="val2"}, source_type="mysourcetype", id="id12345", source=null, timestamp=1234567890012}
HEC event JSON:
{"event":"mybody", "sourcetype":"mysourcetype", "host":"myhost", "time":"1234567890.012", "fields":{"attr1":"val1", "attr2":"val2"}}
Explanation: DSP attributes are mapped to HEC event JSON fields, a catch-all for additional metadata in the HEC event.
Example 3: The event has attributes, an empty body, and a custom field
DSP event:
Event{host="myhost", attributes={level="INFO", category=["prod", "test"]}, source_type="mysourcetype", id="id12345", source=null, body="", timestamp=1234567890012, newfield="newval"}
HEC event JSON:
{"event":{}, "fields": {"level":"INFO","category":["prod","test"],"newfield":"newval"}, "sourcetype":"mysourcetype", "host":"myhost", "time":"1234567890.012"}
Explanation: The Splunk Enterprise HEC endpoint doesn't support empty event fields, so the empty body is converted to an empty JSON object. The fields in attributes and the custom field newfield get added to the fields entry. You can search for index-extracted fields in Splunk Enterprise using this search:
search newfield::newval
Example 4: The event contains only custom top-level fields
DSP event:
Event{value=12345, newfield="INFO"}
HEC event JSON:
{"event":{}, "fields": {"value":12345,"newfield":"INFO"}, "time":"1567112419.503"}
Explanation: The Splunk Enterprise HEC endpoint doesn't support empty event fields, so the missing body is converted to an empty JSON object. Custom top-level fields are mapped into the fields part of the HEC event JSON. Because there is no timestamp associated with the event, the current time is used. These are index-extracted fields that you can search for in Splunk Enterprise using this search:
search value::12345 AND newfield::INFO
Example 5: The event contains a map object in body
DSP event:
Event{body={key1="value1", foo="bar"}, kind="event", host="myhost", source_type="mysourcetype", id="id12345", source=null, timestamp=1234567890012, attributes={attr1="val1", attr2="val2"}}
HEC event JSON:
{"event":{"key1":"value1","foo":"bar"}, "fields":{"attr1":"val1","attr2":"val2"}, "sourcetype":"mysourcetype", "host":"myhost", "time":"1234567890.012"}
Explanation: The DSP body field can be of any data type except boolean, and it is mapped to its corresponding JSON representation in the event field of the HEC event JSON.
Example 6: The event attributes field is a nested map
DSP event:
Event{nanos=null, kind=null, host="myhost", attributes={foo={bar="baz"}}, source_type="mysourcetype", id="id12345", source=null, body="mybody", timestamp=1234567890012}
HEC event JSON:
{"event":"mybody", "sourcetype":"mysourcetype", "host":"myhost", "time":"1234567890.012", "fields":{"foo.bar":"baz"}}
Explanation: When the attributes map in your DSP event contains nested maps, the nested entries are flattened into dot-separated composite keys, such as foo.bar, in the fields object of the HEC event JSON.
Example 7: The event schema contains index as a top-level field
DSP event:
Event{myint=13, body="mybody", index="index123", host="myhost", sourcetype="mysourcetype", attributes={attr1="val1"}, timestamp=1234567890012, id="id12345"}
HEC event JSON:
{"event":"mybody", "sourcetype":"mysourcetype", "host":"myhost", "time":"1234567890.012", "fields":{"myint":13, "attr1":"val1"}, "index":"index123"}
{"event":"mybody", "sourcetype":"mysourcetype", "host":"myhost", "time":"1234567890.012", "fields":{"myint":13, "attr1":"val1"}, "index":"index-from-function-arg"}
Explanation: Although index is not part of the DSP event schema, the Data Stream Processor does not handle it like other custom top-level fields: when creating the HEC event JSON, the top-level index field is neither used to set the index nor added to fields. Instead, you must pass the index as an argument to the Write to Splunk Enterprise or Write to Index function.
In the first HEC event JSON example, get("index"); was passed in as the function argument to the Write to Splunk Enterprise or Write to Index function.
In the second HEC event JSON example, the string literal index-from-function-arg was passed in as the function argument.
If you are using batching and you want to route your data to different indexes, see Optimize performance for sending to Splunk Enterprise.
Example 8: The event timestamp field is a string of numbers
DSP event:
Event{sourcetype="mysourcetype", timestamp="123456789012", body="mybody"}
HEC event JSON:
{"event":"mybody","sourcetype":"mysourcetype","time":"1566245454.551", "fields":{"timestamp":"123456789012"}}
Explanation: Data Stream Processor timestamps are of type long and represent milliseconds since the Unix epoch. If your timestamp has some other type, for example string, it is treated as an unknown field and put into the fields JSON object. The time assigned in the HEC event JSON is the event processing time.
Example 9: The event body is a byte array
DSP event:
Event{sourcetype="mysourcetype", body=java.nio.HeapByteBuffer[], timestamp=1234}
HEC event JSON:
{"event":"dGVzdC1ib2R5", "sourcetype":"mysourcetype", "time":"1.234"}
Explanation: If the value of body is a byte array, it is base64-encoded in the HEC event JSON ("dGVzdC1ib2R5" is the base64 encoding of the string test-body as bytes). Use the to-string function to convert your byte arrays to strings before sending your data to Splunk Enterprise for indexing.
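The encoding in this example can be reproduced with Python's standard base64 module, which stands in here for whatever encoder DSP uses internally. The decode step at the end only illustrates the kind of readable string you get by converting bytes to a string before indexing; it assumes the bytes are UTF-8 text and is not the to-string function itself.

```python
import base64

# A byte-array body, as in Example 9.
body = "test-body".encode("utf-8")

# What ends up in the HEC event JSON when the bytes are sent as-is.
print(base64.b64encode(body).decode("ascii"))  # dGVzdC1ib2R5

# Converting to a string first keeps the indexed event human-readable
# (assumption: the bytes are UTF-8 encoded text).
print(body.decode("utf-8"))  # test-body
```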
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.0.0