On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
Working with nested data
Certain data, like metrics data, might contain multiple levels of nested fields. To make this data easier to parse and work with, consider simplifying the structure of your incoming data. The Splunk Data Stream Processor comes with a range of functions that you can use to simplify your data structure, so that you can capture and extract insights from your data even when it is buried within multiple levels of nesting.
The following use cases show how you can work with your data in these ways:
- Flatten fields with multivalue data.
- Flatten fields with nested data.
- Extract, create, and delete a nested map.
- Extract a list of nested keys or values from a top-level field.
- Extract an element from a list.
- Extract and promote a nested field to a top-level field.
- Extract a field value in a nested structure to use as an argument.
This topic assumes familiarity with the collection (list) and map data types. If you are unfamiliar with these complex data types, see Complex data types.
Guidelines for working with nested data
The exact data transformations that you need to include in your pipeline vary depending on the specific data that you are working with and the use case that you are trying to solve. The following are some general guidelines for how to work with nested data successfully.
- If you want to do any field extractions on nested data, you should first get to know your data. Ensuring that you are familiar with the formats and patterns present in your data makes it easier to create a field extraction that accurately captures field values from it.
- If your pipeline is streaming records that use the standard event or metric event schemas, and body and attributes contain nested fields, then you'll need to cast those fields before doing any transformations or extractions on them. By default, the body field is a union-typed field, and the attributes field is a map of string type to union type.
Flatten a field containing a list of JSON objects into multiple records
When a field in your record contains multiple JSON objects corresponding to different metrics, you can flatten that field into multiple records. The following example uses the MV Expand function to flatten a body field containing an array of three metrics into three individual records.
- Prepare the body field so that it can be parsed.
  - From the Data Pipelines Canvas view, click the + icon where you want to flatten your data, and add an Eval function to the pipeline.
  - Enter the following expression in the function field to cast body to a list. Because list is a complex data type, you must use ucast: body = ucast(body, "collection<map<string, any>>", null)
- Expand the values in the list-typed body field into separate records, one record for each value.
  - Click the + icon after the Eval function and add the MV Expand function to the pipeline.
  - In the MV Expand function, complete the following fields:

Field | Description | Example
---|---|---
field | Name of the field you want to expand. You can specify only one field to expand. | body
limit | The number of values to expand in the multivalue field array. Any remaining values in the array are dropped. If limit = 0 or null, the limit is treated as 1000, the maximum limit. | 0
- (Optional) Click Start Preview, and then select the MV Expand function.
Here's an example of what your data might look like before and after using the MV Expand function:
SPL2 example:

from read_from_aws_cloudwatch_metrics("my-connection-id")
| eval body = ucast(body, "collection<map<string, any>>", null)
| mvexpand limit=0 body;

Data before extraction:

{ body: [ { name: "cpu.util", unit: "percent", type: "g", value: 45, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, { name: "mem.util", unit: "gigabytes", type: "g", value: 20, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, { name: "net.in", unit: "bytes/second", type: "g", value: 3000, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } } ], source_type: "aws:cloudwatch", kind: "metric", attributes: { default_unit: "", default_type: "", _splunk_connection_id: "rest_api:all" } }

Data after extraction:

{ body: { name: "cpu.util", unit: "percent", type: "g", value: 45, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, source_type: "aws:cloudwatch", kind: "metric", attributes: { default_unit: "", default_type: "", _splunk_connection_id: "rest_api:all" } }
{ body: { name: "mem.util", unit: "gigabytes", type: "g", value: 20, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, source_type: "aws:cloudwatch", kind: "metric", attributes: { default_unit: "", default_type: "", _splunk_connection_id: "rest_api:all" } }
{ body: { name: "net.in", unit: "bytes/second", type: "g", value: 3000, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, source_type: "aws:cloudwatch", kind: "metric", attributes: { default_unit: "", default_type: "", _splunk_connection_id: "rest_api:all" } }
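Outside of DSP, the effect of MV Expand can be modeled in a few lines of Python: each element of the multivalue field becomes its own record, with all other top-level fields copied. This is an illustrative sketch of the behavior shown above, not DSP code; the function name and record shape are assumptions based on the example data.

```python
def mv_expand(record, field, limit=0):
    """Expand a list-valued field into one record per element.

    A limit of 0 or None is treated as the maximum of 1000, matching the
    MV Expand function's documented behavior; extra values are dropped.
    """
    values = record.get(field) or []
    cap = 1000 if not limit else min(limit, 1000)
    out = []
    for value in values[:cap]:
        new_record = dict(record)   # copy the other top-level fields
        new_record[field] = value   # replace the list with a single element
        out.append(new_record)
    return out

record = {
    "body": [
        {"name": "cpu.util", "value": 45},
        {"name": "mem.util", "value": 20},
        {"name": "net.in", "value": 3000},
    ],
    "kind": "metric",
}
expanded = mv_expand(record, "body", limit=0)
# Each expanded record keeps kind="metric" but holds a single metric in body.
```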
Flatten a map or a list with nested data
Your data might contain a list of nested lists, a map of nested maps, or a combination of both. Flattening fields with such nested data can make extracting data easier. In this example, we'll use the flatten scalar function to flatten both a list and a map.
- Flatten the list-typed or map-typed field to make extracting data easier. Add an Eval function and enter the following expression in the function field:
field_name = flatten(field_name)
If you are flattening a map, you can specify an optional delimiter to separate keys in the output:
field_name = flatten(field_name, delimiter)
- Click Start Preview, and then click the Eval function to confirm that the function works as expected. Here are some examples of what your data might look like before and after flattening:
SPL2 example: ... | eval flattened_list = flatten(list_field)
- Data before flattening: [1, null, "foo", ["1-deep", ["2-deep"]], [], 100]
- Data after flattening: [1, null, "foo", "1-deep", "2-deep", 100]
- Notes: Returns the flattened list in a new top-level field called flattened_list.

SPL2 example: ... | eval flattened_map = flatten(map_field)
- Data before flattening: {"baz": {"foo": 1, "bar": "thing"}, "quux": 3}
- Data after flattening: {"quux":3,"baz.foo":1,"baz.bar":"thing"}
- Notes: Returns the flattened map in a new top-level field called flattened_map.

SPL2 example: ... | eval flattened_map = flatten(map_field, "::")
- Data before flattening: {"baz": {"foo": 1, "bar": "thing"}, "quux": 3}
- Data after flattening: {"quux":3,"baz::bar":"thing","baz::foo":1}
- Notes: Returns the flattened map in a new top-level field called flattened_map. Also delimits the keys in the map with ::.

SPL2 example: ... | eval flattened_list_with_nested_map = flatten(list_field)
- Data before flattening: [[1, 2, 3], [{"key1": {"innerkey1": "innerval1"}}]]
- Data after flattening: [1,2,3,{"key1":{"innerkey1":"innerval1"}}]
- Notes: Returns the flattened lists in a new top-level field called flattened_list_with_nested_map. Does not flatten the nested maps that are included in the original list.
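The semantics shown in these examples can be sketched in Python: flattening a list recurses into nested lists but leaves maps intact, while flattening a map joins nested keys with a delimiter. This is an illustrative model of the flatten scalar function's documented behavior, not DSP code.

```python
def flatten_list(values):
    """Recursively flatten nested lists; maps inside the list are left intact."""
    out = []
    for v in values:
        if isinstance(v, list):
            out.extend(flatten_list(v))
        else:
            out.append(v)
    return out

def flatten_map(mapping, delimiter="."):
    """Flatten nested maps, joining key paths with the delimiter ('.' by default)."""
    out = {}
    for key, value in mapping.items():
        if isinstance(value, dict):
            for inner_key, inner_value in flatten_map(value, delimiter).items():
                out[key + delimiter + inner_key] = inner_value
        else:
            out[key] = value
    return out

flat_list = flatten_list([1, None, "foo", ["1-deep", ["2-deep"]], [], 100])
# → [1, None, "foo", "1-deep", "2-deep", 100]
flat_map = flatten_map({"baz": {"foo": 1, "bar": "thing"}, "quux": 3}, "::")
# → {"baz::foo": 1, "baz::bar": "thing", "quux": 3}
```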
Extract a nested map from one field and add it to another field
When your data first gets read into the Splunk Data Stream Processor, the body field might contain nested key-value pairs that you want to move to a different top-level field. The following example uses the map_set scalar function to move the dimensions key from body into attributes.
- Prepare the body field so that it can be parsed and values can be extracted from it.
  - From the Data Pipelines Canvas view, click the + icon where you want to extract your data, and add an Eval function to the pipeline.
  - Enter the following expression in the function field to cast body to a map. Because map is a complex data type, you must use ucast: body = ucast(body, "map<string, any>", null)
- Select the key-value pair that you want to extract. We'll move the dimensions map from the body field into attributes.
  - Click the + icon at the position on your pipeline that you want to extract data from, and add an Eval function to the pipeline.
  - In the View Configurations tab of the Eval function, enter the following SPL2 expression in the function field: attributes = map_set(attributes, "dimensions", {"InstanceId": "i-065d598370ac25b90", "Region": "us-west-1"})
- Remove the dimensions map from the body field after extracting it.
  - In the same Eval function, click + Add.
  - Enter the following SPL2 expression in the newly added function field: body = map_delete(body, ["dimensions"])
  - In the View Configurations tab of the Eval function, click Update to update the records with your transformations.
- Click Start Preview, and then click the Eval function to confirm the functions work as expected. Here's an example of what your data might look like before and after extraction:
SPL2 example:

... | eval body = ucast(body, "map<string, any>", null)
| eval attributes = map_set(attributes, "dimensions", {"InstanceId": "i-065d598370ac25b90", "Region": "us-west-1"}), body = map_delete(body, ["dimensions"]);

Data before extraction:

{ body: { name: "mem.util", unit: "gigabytes", type: "g", value: 20, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, source_type: "aws:cloudwatch", kind: "metric", attributes: { default_unit: "", default_type: "", _splunk_connection_id: "rest_api:all" } }

Data after extraction:

{ attributes: { default_unit: "", default_type: "", _splunk_connection_id: "rest_api:all", dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, body: { name: "mem.util", unit: "gigabytes", type: "g", value: 20 }, source_type: "aws:cloudwatch", kind: "metric" }
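The map_set and map_delete pair used above can be modeled in Python as non-destructive map updates: one returns a copy with a key set, the other a copy with keys removed. This is an illustrative sketch of the documented semantics, not DSP code.

```python
def map_set(mapping, key, value):
    """Return a copy of the map with key set to value, like DSP's map_set."""
    out = dict(mapping)
    out[key] = value
    return out

def map_delete(mapping, keys):
    """Return a copy of the map with the listed keys removed."""
    return {k: v for k, v in mapping.items() if k not in keys}

record = {
    "body": {"name": "mem.util", "value": 20,
             "dimensions": {"InstanceId": "i-065d598370ac25b90",
                            "Region": "us-west-1"}},
    "attributes": {"default_unit": ""},
}
# Move dimensions from body into attributes, as in the steps above.
record["attributes"] = map_set(record["attributes"], "dimensions",
                               record["body"]["dimensions"])
record["body"] = map_delete(record["body"], ["dimensions"])
```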
Extract all nested keys or values in a map
When a top-level field in your data is a map of multiple key-value pairs, you can get a list of all the nested keys or all the nested values within this top-level field by using the map_keys and map_values scalar functions.
- From the Data Pipelines Canvas view, click the + icon at the position on your pipeline where you want to extract data from, and choose Eval from the function picker.
- In the View Configurations tab of the Eval function, enter the following SPL2 expression depending on which information you want to extract:
Information to extract | SPL2 expression | Output
---|---|---
Keys | keys = map_keys(field_name) | Creates a new keys top-level field containing the list of keys extracted from the top-level field you pass in.
Values | values = map_values(field_name) | Creates a new values top-level field containing the list of values extracted from the top-level field you pass in.
top-level field containing the list of values extracted from the top-level field you pass in. - In the View Configurations tab of the Eval function, click Update to update the records with the newly created field.
- Click Start Preview, and then click the Eval function to make sure it's working as expected. Here's an example of what your data might look like:
SPL2 example: ... | eval keys = map_keys(attributes)
- Data example: { body: { name: "mem.util", unit: "gigabytes", type: "g", value: 20, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, source_type: "aws:cloudwatch", kind: "metric", attributes: { default_unit: "gigabytes", default_type: "g", _splunk_connection_id: "rest_api:all" } }
- Function output: A new top-level field is added to the data schema: keys: [ "default_unit", "default_type", "_splunk_connection_id" ]

SPL2 example: ... | eval values = map_values(attributes)
- Data example: Same as the previous example.
- Function output: A new top-level field is added to the data schema: values: [ "gigabytes", "g", "rest_api:all" ]
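For intuition, map_keys and map_values correspond to taking a map's top-level keys and values as lists. A minimal Python model of the behavior shown above (not DSP code):

```python
def map_keys(mapping):
    # List of the map's top-level keys, like DSP's map_keys.
    return list(mapping.keys())

def map_values(mapping):
    # List of the map's top-level values, like DSP's map_values.
    return list(mapping.values())

attributes = {"default_unit": "gigabytes", "default_type": "g",
              "_splunk_connection_id": "rest_api:all"}
keys = map_keys(attributes)      # ["default_unit", "default_type", "_splunk_connection_id"]
values = map_values(attributes)  # ["gigabytes", "g", "rest_api:all"]
```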
Extract an element from a list using bracket notation
You can extract an element from a list using bracket notation. In the following example, we'll extract the first element in the body field.
- From the Data Pipelines Canvas view, click the + icon at the position on your pipeline where you want to extract data from, and then choose Eval from the function list.
- You must first cast the body field to a list. In the View Configurations tab of the Eval function, enter the following SPL2 expression in the function field: body = ucast(body, "collection<map<string, any>>", null)
- In the View Configurations tab of the Eval function, click + Add to create a new function field and enter the following SPL2 expression:
extracted_element = body[0]
- In the View Configurations tab of the Eval function, click Update to update the records with your transformations.
- Click Start Preview, and then click the Eval function to confirm the functions work as expected. Here's an example of what your data might look like before and after extraction:
SPL2 example:

... | eval body = ucast(body, "collection<map<string, any>>", null), extracted_element = body[0]

Data before extraction:

{ body: [ { name: "cpu.util", unit: "percent", type: "g", value: 45, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, { name: "mem.util", unit: "gigabytes", type: "g", value: 20, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, { name: "net.in", unit: "bytes/second", type: "g", value: 3000, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } } ], attributes: { default_unit: "", default_type: "g", _splunk_connection_id: "rest_api:all" } }

Data after extraction:

{ body: [ { name: "cpu.util", unit: "percent", type: "g", value: 45, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, { name: "mem.util", unit: "gigabytes", type: "g", value: 20, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, { name: "net.in", unit: "bytes/second", type: "g", value: 3000, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } } ], extracted_element: { name: "cpu.util", unit: "percent", type: "g", value: 45, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, attributes: { default_unit: "", default_type: "g", _splunk_connection_id: "rest_api:all" } }
Extract an element from a list using scalar functions
When a top-level field in your data contains a list but you only want to work with a specific element in that list, you can extract that element if you know its index position in the list. This example uses the mvindex scalar function under the List category to extract a record from an array in the body field.
- From the Data Pipelines Canvas view, click the + icon at the position on your pipeline where you want to extract data from, and then choose Eval from the function list.
- Before using the mvindex function, the body field must be cast to a list. In the View Configurations tab of the Eval function, enter the following SPL2 expression in the function field: body = ucast(body, "collection<map<string, any>>", null)
- In the View Configurations tab of the Eval function, click + Add to create a new function field and enter the following SPL2 expression in the newly added function field: extracted_element = mvindex(body, 0)
The first argument of the mvindex function is the name of the field from which you want to extract data. The second argument is the index indicating the position of the element you want to extract. Index numbers can be negative: -1 gets the last element in a list, -2 gets the second-to-last element, and so on. If the index is out of range or does not exist, the function returns null.
- In the View Configurations tab of the Eval function, click Update to update the records with your transformations.
- Click Start Preview, and then click the Eval function to confirm the functions work as expected. Here's an example of what your data might look like before and after extraction:
SPL2 example:

... | eval body = ucast(body, "collection<map<string, any>>", null), extracted_element = mvindex(body, 0)

Data before extraction:

{ body: [ { name: "cpu.util", unit: "percent", type: "g", value: 45, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, { name: "mem.util", unit: "gigabytes", type: "g", value: 20, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, { name: "net.in", unit: "bytes/second", type: "g", value: 3000, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } } ], attributes: { default_unit: "", default_type: "g", _splunk_connection_id: "rest_api:all" } }

Data after extraction:

{ body: [ { name: "cpu.util", unit: "percent", type: "g", value: 45, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, { name: "mem.util", unit: "gigabytes", type: "g", value: 20, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, { name: "net.in", unit: "bytes/second", type: "g", value: 3000, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } } ], extracted_element: { name: "cpu.util", unit: "percent", type: "g", value: 45, dimensions: { InstanceId: "i-065d598370ac25b90", Region: "us-west-1" } }, attributes: { default_unit: "", default_type: "g", _splunk_connection_id: "rest_api:all" } }
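The indexing behavior described above, including negative indexes and null for out-of-range positions, maps closely onto Python list indexing. A minimal sketch of the mvindex semantics (illustrative, not DSP code):

```python
def mvindex(values, index):
    """Return the element at index; negative indexes count from the end.

    Out-of-range or missing input returns None, mirroring mvindex
    returning null in those cases.
    """
    try:
        return values[index]
    except (IndexError, TypeError):
        return None

metrics = [{"name": "cpu.util"}, {"name": "mem.util"}, {"name": "net.in"}]
first = mvindex(metrics, 0)    # first element: {"name": "cpu.util"}
last = mvindex(metrics, -1)    # last element: {"name": "net.in"}
missing = mvindex(metrics, 5)  # out of range, so None
```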
Extract a field value in a nested structure to use as an argument
Some fields in your data might contain nested values that you want to extract and use as a function argument. In this example, we'll route Splunk universal forwarder data to different Splunk indexes based on the value of the nested index field in attributes. To do this, we'll use the spath scalar function.
- Configure the Send to a Splunk Index with Batching function to route data depending on the nested index field in attributes.
  - Click the + icon, and add the Send to a Splunk Index with Batching function to the pipeline.
  - Select the connection_id that you want to use.
  - Enter the following SPL2 expression in the index field: cast(spath(attributes, "index"), "string")
  - In the default_index field, enter "main".
- (Optional) Configure the Send to a Splunk Index with Batching function for optimal throughput. In the splunk_parameters field, enter the following values:
  - hec-gzip-compression = true
  - hec-token-validation = true
  - hec-enable-ack = false
  - async = true
- Because sink functions do not show preview data, activate your pipeline and check your Splunk Enterprise or Splunk Cloud Platform environment to verify that your events are being routed correctly.
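The routing expression above extracts a value by path and falls back to a default index when the path is absent. A simplified Python model of that lookup (illustrative only; DSP's spath also handles lists and richer path syntax, and the sample values here are made up):

```python
def spath(mapping, path, default=None):
    """Walk a dotted path through nested maps, returning default when absent."""
    current = mapping
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current

# Hypothetical attributes map on an incoming record.
attributes = {"index": "security_logs", "default_unit": ""}
index = spath(attributes, "index") or "main"  # fall back to the default index
```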
Promote a nested field to a top-level field
Some fields in your data might contain nested values that you want to extract and assign to a top-level field. For example, the attributes field of your data contains an index key whose value you want to extract in order to format your records so that they match the Splunk HEC metric JSON format. You can extract the value of the index key with the map_get scalar function under the Map category.
- From the Data Pipelines Canvas view, click the + icon at the position on your pipeline where you want to extract data from, and then choose To Splunk JSON from the function picker.
- In the View Configurations tab of the To Splunk JSON function, enter the following SPL2 expression in the index field: cast(map_get(attributes, "index"), "string")
The SPL2 expression also casts the extracted value to a string so that it can be used as an input for the To Splunk JSON function. See Casting and To Splunk JSON in the Function Reference for more details.
- In the View Configurations tab of the To Splunk JSON function, toggle the keep_attributes button if you want the attributes map to be available as index-extracted fields in the Splunk platform.
- Click Start Preview, and then click the To Splunk JSON function to confirm that the function works as expected. Here's an example of what your data might look like before and after extraction:
SPL2 example:

... | to_splunk_json index=cast(map_get(attributes, "index"), "string") keep_attributes=true

Data before extraction:

{ host: "myhost", source: "mysource", source_type: "mysourcetype", kind: "metric", body: [ { name: "Hello World" } ], attributes: { attr1: "val1", index: "myindex" } }

Data after extraction:

{ json: "{"event":"Hello World", "source":"mysource", "sourcetype":"mysourcetype", "host":"myhost", "index":"myindex", "fields":{"attr1":"val1"}}" }
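The before-and-after shapes above can be modeled as assembling a HEC-style event payload from a record. The following Python sketch is a simplified assumption-laden model of that mapping (field handling follows this one example, not an exhaustive spec of the To Splunk JSON function; here the body is treated as the event directly):

```python
import json

def to_splunk_json(record, index=None, keep_attributes=False):
    """Assemble a HEC-style payload from a DSP-like record (simplified sketch)."""
    payload = {
        "event": record["body"],
        "source": record.get("source"),
        "sourcetype": record.get("source_type"),
        "host": record.get("host"),
    }
    if index is None:
        # Model cast(map_get(attributes, "index"), "string") from the example.
        index = record.get("attributes", {}).get("index")
    if index:
        payload["index"] = index
    if keep_attributes:
        # Remaining attributes become index-extracted fields.
        payload["fields"] = {k: v for k, v in record.get("attributes", {}).items()
                             if k != "index"}
    return {"json": json.dumps(payload)}

record = {
    "host": "myhost", "source": "mysource", "source_type": "mysourcetype",
    "kind": "metric", "body": "Hello World",
    "attributes": {"attr1": "val1", "index": "myindex"},
}
result = to_splunk_json(record, keep_attributes=True)
```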
See also
- Extracting fields in events data
- Summarize records with the stats function
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5