Splunk® Data Stream Processor

Use the Data Stream Processor

DSP 1.2.0 is impacted by the CVE-2021-44228 and CVE-2021-45046 security vulnerabilities from Apache Log4j. To fix these vulnerabilities, you must upgrade to DSP 1.2.4. See Upgrade the Splunk Data Stream Processor to 1.2.4 for upgrade instructions.

On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Working with nested data

Certain data, such as metrics data, might contain multiple levels of nested fields. To make this data easier to parse and work with, consider simplifying the structure of your incoming data. The Splunk Data Stream Processor provides functions that simplify your data structure so that you can extract insights from your data even when it is buried within multiple levels of nesting.

The following use cases show how you can work with your data in these ways:

  • Flatten fields with multivalue data.
  • Flatten fields with nested data.
  • Extract, create, and delete a nested map.
  • Extract a list of nested keys or values from a top-level field.
  • Extract an element from a list.
  • Extract and promote a nested field to a top-level field.
  • Extract a field value in a nested structure to use as an argument.

This topic assumes familiarity with the collection (list) and map data types. If you are unfamiliar with these complex data types, see Complex data types.

Guidelines for working with nested data

The exact data transformations that you need to include in your pipeline vary depending on the specific data that you are working with and the use case that you are trying to solve. The following are some general guidelines for how to work with nested data successfully.

  • If you want to do any field extractions on nested data, you should first get to know your data. Ensuring that you are familiar with the formats and patterns present in your data makes it easier to create a field extraction that accurately captures field values from it.
  • If your pipeline is streaming records that use the standard event or metric event schemas, and body and attributes contain nested fields, then you'll need to first cast those fields before doing any transformations or extractions on them. By default, the body field is a union-typed field, and the attributes field is a map of string type to union type.

Flatten a field containing a list of JSON objects into multiple records

When a field in your record contains multiple JSON objects corresponding to different metrics, you can flatten that field into multiple records. The following example uses the MV Expand function to flatten a body field containing an array of three metrics into three individual records.

  1. Prepare the body field so that it can be parsed.
    1. From the Data Pipelines Canvas view, click on the + icon where you want to flatten your data, and add an Eval function to the pipeline.
    2. Enter the following expression in the function field to cast body to a list. Since list is a complex data type, we need to use ucast:
      body = ucast(body, "collection<map<string, any>>", null)
  2. Expand the values in the list-typed field body into separate records, one record for each value.
    1. Click the + icon after the Eval function and add the MV Expand function to the pipeline.
    2. In the MV Expand function, complete the following fields:
      Field | Description | Example
      field | Name of the field you want to expand. You can specify only one field to expand. | body
      limit | The number of values to expand in the multivalue field array. If there are any remaining values in the array, those values are dropped. If limit = 0 or null, then the limit is treated as 1000, the maximum limit. | 0
  3. (Optional) Click Start Preview, and then select the MV Expand function.

Here's an example of what your data might look like before and after using the MV Expand function:

The SPL2 example appears first, followed by the single record before extraction and then the three records after extraction.
 | from read_from_aws_cloudwatch_metrics("my-connection-id") | eval body = ucast(body, "collection<map<string, any>>", null) | mvexpand limit=0 body;
{
   body: 
   [
       { 
            name: "cpu.util",
            unit: "percent",
            type: "g",
            value: 45,
            dimensions: {
                InstanceId: "i-065d598370ac25b90",
                Region: "us-west-1"
            }
        },
        {  
            name: "mem.util",
            unit: "gigabytes",
            type: "g",
            value: 20,
            dimensions: {
                InstanceId: "i-065d598370ac25b90",
                Region: "us-west-1"
            }
        },
        {
            name: "net.in",
            unit: "bytes/second",
            type: "g",
            value: 3000,
            dimensions: { 
                InstanceId: "i-065d598370ac25b90",
                Region: "us-west-1"
            }
        }
    ],
    source_type: "aws:cloudwatch",
    kind: "metric",
    attributes: {  
        default_unit: "",
        default_type: "",
        _splunk_connection_id: "rest_api:all"
    }
}
{ 
    body: { 
        name: "cpu.util",
        unit: "percent",
        type: "g",
        value: 45,
        dimensions: { 
            InstanceId: "i-065d598370ac25b90",
            Region: "us-west-1"
        }
    },
    source_type: "aws:cloudwatch",
    kind: "metric",
    attributes: { 
        default_unit: "",
        default_type: "",
        _splunk_connection_id: "rest_api:all"
    }
}
{ 
    body: { 
        name: "mem.util",
        unit: "gigabytes",
        type: "g",
        value: 20,
        dimensions: { 
            InstanceId: "i-065d598370ac25b90",
            Region: "us-west-1"
        }
    },
    source_type: "aws:cloudwatch",
    kind: "metric",
    attributes: { 
        default_unit: "",
        default_type: "",
        _splunk_connection_id: "rest_api:all"
    }
}
{ 
    body: { 
        name: "net.in",
        unit: "bytes/second",
        type: "g",
        value: 3000,
        dimensions: { 
            InstanceId: "i-065d598370ac25b90",
            Region: "us-west-1"
        }
    },
    source_type: "aws:cloudwatch",
    kind: "metric",
    attributes: { 
        default_unit: "",
        default_type: "",
        _splunk_connection_id: "rest_api:all"
    }
}
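
To make the expansion behavior concrete, here is a minimal Python sketch that models what the example above shows. It is not DSP code: the function name mv_expand and the pass-through for non-list fields are assumptions made for illustration. The 1000-value cap comes from the limit description in this topic.

```python
import copy

MAX_LIMIT = 1000  # this topic states that limit = 0 or null is treated as 1000


def mv_expand(record, field, limit=0):
    """Return one copy of `record` per element of the list stored in `record[field]`."""
    values = record.get(field)
    if not isinstance(values, list):
        # Assumption: a record whose field is not a list passes through unchanged.
        return [record]
    effective = MAX_LIMIT if not limit else min(limit, MAX_LIMIT)
    expanded = []
    for value in values[:effective]:  # values past the limit are dropped
        out = copy.deepcopy(record)
        out[field] = value  # the list is replaced by a single element
        expanded.append(out)
    return expanded
```

Every other top-level field, such as source_type, kind, and attributes, is copied into each output record, which matches the three records shown after extraction.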

Flatten a map or a list with nested data

Your data might contain a list of nested lists, a map of nested maps, or a combination of both. Flattening fields with such nested data can make extracting data easier. In this example, we'll use the flatten scalar function to flatten both a list and a map.

  1. Flatten the list or map-typed field to make extracting data easier.
    1. Add an Eval function and enter the following expression in the function field:
      field_name = flatten(field_name)
    2. If you are flattening a map, you can specify an optional delimiter to separate keys in the output:
      field_name = flatten(field_name, delimiter)
  2. Click Start Preview, and then click the Eval function to confirm that the function works as expected. Here are some examples of what your data might look like before and after flattening:
    SPL2 example: ... | eval flattened_list = flatten(list_field)
    Data before flattening: [1, null, "foo", ["1-deep", ["2-deep"]], [], 100]
    Data after flattening: [1, null, "foo", "1-deep", "2-deep", 100]
    Notes: Returns the flattened list in a new top-level field called flattened_list.

    SPL2 example: ... | eval flattened_map = flatten(map_field)
    Data before flattening: {"baz": {"foo": 1, "bar": "thing"}, "quux": 3}
    Data after flattening: {"quux":3,"baz.foo":1,"baz.bar":"thing"}
    Notes: Returns the flattened map in a new top-level field called flattened_map.

    SPL2 example: ... | eval flattened_map = flatten(map_field, "::")
    Data before flattening: {"baz": {"foo": 1, "bar": "thing"}, "quux": 3}
    Data after flattening: {"quux":3,"baz::bar":"thing","baz::foo":1}
    Notes: Returns the flattened map in a new top-level field called flattened_map. Also, delimits the keys in the map with ::.

    SPL2 example: ... | eval flattened_list_with_nested_map = flatten(list_field)
    Data before flattening: [[1, 2, 3], [{"key1": {"innerkey1": "innerval1"}}]]
    Data after flattening: [1,2,3,{"key1":{"innerkey1":"innerval1"}}]
    Notes: Returns the flattened list in a new top-level field called flattened_list_with_nested_map. Does not flatten the nested maps that are included in the original list.
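
The flattening rules in these examples can be modeled with a short Python sketch. This is an illustration of the behavior shown in the examples, not the DSP implementation: the names flatten_list and flatten_map are hypothetical, and how lists nested inside a map are treated is an assumption because this topic does not show that case.

```python
def flatten_list(items):
    """Recursively inline nested lists; maps inside the list are left as they are."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten_list(item))  # an empty list contributes nothing
        else:
            flat.append(item)
    return flat


def flatten_map(mapping, delimiter=".", prefix=""):
    """Join the keys of nested maps into a single level, separated by `delimiter`."""
    flat = {}
    for key, value in mapping.items():
        path = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_map(value, delimiter, path))
        else:
            flat[path] = value  # assumption: lists inside a map are kept unchanged
    return flat
```

Keeping the list case and the map case in separate functions mirrors the examples, where a map nested inside a list is not flattened.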

Extract a nested map from one field and add it to another field

When your data is first read into the Splunk Data Stream Processor, the body field might contain nested key-value pairs that you want to move to a different top-level field. The following example uses the map_set and map_delete scalar functions to move the dimensions key from body into attributes.

  1. Prepare the body field so that it can be parsed and values can be extracted from it.
    1. From the Data Pipelines Canvas view, click on the + icon where you want to extract your data, and add an Eval function to the pipeline.
    2. Enter the following expression in the function field to cast body to a map. Since map is a complex data type, we need to use ucast:
      body = ucast(body, "map<string, any>", null)
  2. Select the key-value pair that you want to extract. We'll move the dimensions map from the body field into attributes.
    1. Click the + icon at the position on your pipeline that you want to extract data from, and add an Eval function to the pipeline.
    2. In the View Configurations tab of the Eval function, enter the following SPL2 expression in the function field:
      attributes=map_set(attributes, "dimensions", {"InstanceId": "i-065d598370ac25b90", "Region": "us-west-1"})
  3. Remove the dimensions map from the body field after extracting it.
    1. In the same Eval function, click +Add.
    2. Enter the following SPL2 expression in the newly added function field:
      body=map_delete(body, ["dimensions"])
  4. In the View Configurations tab of the Eval function, click Update to update the records with your transformations.
  5. Click Start Preview, and then click the Eval function to confirm the functions work as expected. Here's an example of what your data might look like before and after extraction:
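    The SPL2 example appears first, followed by the record before extraction and then the record after extraction.
    ... | eval body=ucast(body, "map<string,any>", null) | eval attributes=map_set(attributes, "dimensions", {"InstanceId": "i-065d598370ac25b90", "Region": "us-west-1"}), body=map_delete(body, ["dimensions"]);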
    { 
        body: { 
            name: "mem.util",
            unit: "gigabytes",
            type: "g",
            value: 20,
            dimensions: {
                InstanceId: "i-065d598370ac25b90",
                Region: "us-west-1"
            }
        },
        source_type: "aws:cloudwatch",
        kind: "metric",
        attributes: { 
            default_unit: "",
            default_type: "",
            _splunk_connection_id: "rest_api:all"
        }
    }
    {  
        attributes: { 
            default_unit: "",
            default_type: "",
            _splunk_connection_id: "rest_api:all",
            dimensions: {  
                InstanceId: "i-065d598370ac25b90",
                Region: "us-west-1"
            }
        },
        body: {  
            name: "mem.util",
            unit: "gigabytes",
            type: "g",
            value: 20
        },
        source_type: "aws:cloudwatch",
        kind: "metric"
    }
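
The combined effect of the map_set and map_delete steps can be pictured with the following Python sketch. It is a model only, and move_key is a hypothetical helper name. Unlike the steps above, which type the dimensions value as a literal, the sketch reads the value from the source map; that generalization is an assumption for illustration.

```python
def move_key(record, key, source="body", target="attributes"):
    """Return a copy of `record` with `key` moved from the source map to the target map."""
    src = dict(record[source])
    dst = dict(record[target])
    if key in src:
        # Setting the key on the target corresponds to map_set;
        # removing it from the source corresponds to map_delete.
        dst[key] = src.pop(key)
    return {**record, source: src, target: dst}
```

The function copies both maps before changing them, so the input record is left untouched.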

Extract all nested keys or values in a map

When a top-level field in your data is a map of multiple key-value pairs, you can get a list of all the nested keys or all the nested values within this top-level field by using the map_keys and map_values scalar functions.

  1. From the Data Pipelines Canvas view, click the + icon at the position on your pipeline where you want to extract data, and then choose Eval from the function picker.
  2. In the View Configurations tab of the Eval function, enter the following SPL2 expression depending on which information you want to extract:
    Information to extract | SPL2 expression | Output
    Keys | keys = map_keys(field_name) | Creates a new top-level field called keys containing the list of keys extracted from the top-level field that you pass in.
    Values | values = map_values(field_name) | Creates a new top-level field called values containing the list of values extracted from the top-level field that you pass in.
  3. In the View Configurations tab of the Eval function, click Update to update the records with the newly created field.
  4. Click Start Preview, and then click the Eval function to make sure it's working as expected. Here's an example of what your data might look like:
    Each SPL2 example is followed by its function output. The data example, shown after the first SPL2 example, applies to both.
     ... | eval keys = map_keys(attributes)
    { 
        body: { 
            name: "mem.util",
            unit: "gigabytes",
            type: "g",
            value: 20,
            dimensions: {
                InstanceId: "i-065d598370ac25b90",
                Region: "us-west-1"
            }
        },
        source_type: "aws:cloudwatch",
        kind: "metric",
        attributes: { 
            default_unit: "gigabytes",
            default_type: "g",
            _splunk_connection_id: "rest_api:all"
        }
    }
    A new top-level field is added to the data schema:
    keys: [
           "default_unit",
           "default_type",
           "_splunk_connection_id"
        ]
    
     ... | eval values = map_values(attributes)
    A new top-level field is added to the data schema:
    values: [
           "gigabytes",
           "g",
           "rest_api:all"
        ]
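
As a rough Python analogy for the two functions, the sketch below produces the same lists from the attributes map in the data example. The Python functions only mirror the SPL2 names for readability. That the output order follows the order of the keys in the map is an assumption taken from the example output.

```python
def map_keys(mapping):
    """Return the keys of a map as a list."""
    return list(mapping.keys())


def map_values(mapping):
    """Return the values of a map as a list."""
    return list(mapping.values())


attributes = {
    "default_unit": "gigabytes",
    "default_type": "g",
    "_splunk_connection_id": "rest_api:all",
}
keys = map_keys(attributes)
values = map_values(attributes)
```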
    

Extract an element from a list using bracket notation

You can extract an element from a list using bracket notation. In the following example, we'll extract the first element in the body field.

  1. From the Data Pipelines Canvas view, click the + icon at the position on your pipeline where you want to extract data from, and then choose Eval from the function list.
  2. You must first cast the body field to a list. In the View Configurations tab of the Eval function, enter the following SPL2 expression in the function field:
    body = ucast(body, "collection<map<string, any>>", null)
  3. In the View Configurations tab of the Eval function, click + Add to create a new function field and enter the following SPL2 expression:
    extracted_element = body[0]
  4. In the View Configurations tab of the Eval function, click Update to update the records with your transformations.
  5. Click Start Preview, and then click the Eval function to confirm the functions work as expected. Here's an example of what your data might look like before and after extraction:
    The SPL2 example appears first, followed by the record before extraction and then the record after extraction.
    ... | eval body = ucast(body, "collection<map<string, any>>", null), extracted_element = body[0]
    {
        body: [
            {
                name: "cpu.util",
                unit: "percent",
                type: "g",
                value: 45,
                dimensions: {
                    InstanceId: "i-065d598370ac25b90",
                    Region: "us-west-1"
                }
            },
            {
                name: "mem.util",
                unit: "gigabytes",
                type: "g",
                value: 20,
                dimensions: { 
                    InstanceId: "i-065d598370ac25b90",
                    Region: "us-west-1"
                }
            },
            {
                name: "net.in",
                unit: "bytes/second",
                type: "g",
                value: 3000,
                dimensions: {
                    InstanceId: "i-065d598370ac25b90",
                    Region: "us-west-1"
                }
            }
        ],
        attributes: {
            default_unit: "",
            default_type: "g",
            _splunk_connection_id: "rest_api:all"
        }
    }
    {
        body: [
            {
                name: "cpu.util",
                unit: "percent",
                type: "g",
                value: 45,
                dimensions: {
                    InstanceId: "i-065d598370ac25b90",
                    Region: "us-west-1"
                }
            },
            {
                name: "mem.util",
                unit: "gigabytes",
                type: "g",
                value: 20,
                dimensions: {
                    InstanceId: "i-065d598370ac25b90",
                    Region: "us-west-1"
                }
            },
            {
                name: "net.in",
                unit: "bytes/second",
                type: "g",
                value: 3000,
                dimensions: {
                    InstanceId: "i-065d598370ac25b90",
                    Region: "us-west-1"
                }
            }
        ],
        extracted_element: {
            name: "cpu.util",
            unit: "percent",
            type: "g",
            value: 45,
            dimensions: {
                InstanceId: "i-065d598370ac25b90",
                Region: "us-west-1"
            }
        },
        attributes: {
            default_unit: "",
            default_type: "g",
            _splunk_connection_id: "rest_api:all"
        }
    }
    

Extract an element from a list using scalar functions

When a top-level field in your data contains a list but you only want to work with a specific element in that list, you can extract that element if you know its index position in the list. This example uses the mvindex scalar function under the List category to extract an element from the list in the body field.

  1. From the Data Pipelines Canvas view, click the + icon at the position on your pipeline where you want to extract data from, and then choose Eval from the function list.
  2. Before using the mvindex function, the body field has to be cast to a list. In the View Configurations tab of the Eval function, enter the following SPL2 expression in the function field:
    body = ucast(body, "collection<map<string, any>>", null)
  3. In the View Configurations tab of the Eval function, click + Add to create a new function field and enter the following SPL2 expression in the newly added function field:
    extracted_element = mvindex(body, 0)

    The first argument of the mvindex function is the name of the field from which you want to extract data. The second argument is the index indicating the position of the element you want to extract.

    Index numbers can be negative. -1 gets the last element in a list, -2 gets the second to last element in a list, and so on. If the index is out of range or does not exist, the function returns null.

  4. In the View Configurations tab of the Eval function, click Update to update the records with your transformations.
  5. Click Start Preview, and then click the Eval function to confirm the functions work as expected. Here's an example of what your data might look like before and after extraction:
    The SPL2 example appears first, followed by the record before extraction and then the record after extraction.
    ... | eval body = ucast(body, "collection<map<string, any>>", null), extracted_element = mvindex(body, 0)
    {
        body: [
            {
                name: "cpu.util",
                unit: "percent",
                type: "g",
                value: 45,
                dimensions: {
                    InstanceId: "i-065d598370ac25b90",
                    Region: "us-west-1"
                }
            },
            {
                name: "mem.util",
                unit: "gigabytes",
                type: "g",
                value: 20,
                dimensions: { 
                    InstanceId: "i-065d598370ac25b90",
                    Region: "us-west-1"
                }
            },
            {
                name: "net.in",
                unit: "bytes/second",
                type: "g",
                value: 3000,
                dimensions: {
                    InstanceId: "i-065d598370ac25b90",
                    Region: "us-west-1"
                }
            }
        ],
        attributes: {
            default_unit: "",
            default_type: "g",
            _splunk_connection_id: "rest_api:all"
        }
    }
    {
        body: [
            {
                name: "cpu.util",
                unit: "percent",
                type: "g",
                value: 45,
                dimensions: {
                    InstanceId: "i-065d598370ac25b90",
                    Region: "us-west-1"
                }
            },
            {
                name: "mem.util",
                unit: "gigabytes",
                type: "g",
                value: 20,
                dimensions: {
                    InstanceId: "i-065d598370ac25b90",
                    Region: "us-west-1"
                }
            },
            {
                name: "net.in",
                unit: "bytes/second",
                type: "g",
                value: 3000,
                dimensions: {
                    InstanceId: "i-065d598370ac25b90",
                    Region: "us-west-1"
                }
            }
        ],
        extracted_element: {
            name: "cpu.util",
            unit: "percent",
            type: "g",
            value: 45,
            dimensions: {
                InstanceId: "i-065d598370ac25b90",
                Region: "us-west-1"
            }
        },
        attributes: {
            default_unit: "",
            default_type: "g",
            _splunk_connection_id: "rest_api:all"
        }
    }
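
The indexing rules described in step 3 (negative indexes count from the end, and an out-of-range index returns null) can be modeled in Python as follows. This is a sketch of the described behavior rather than the DSP implementation. Returning null for an input that is not a list is an assumption.

```python
def mvindex(values, index):
    """Return the element at `index`, counting from the end when negative, else None."""
    if not isinstance(values, list):
        return None  # assumption: a non-list input yields null
    if -len(values) <= index < len(values):
        return values[index]
    return None  # out-of-range index, as described in step 3
```

For a three-element list, indexes 0 and -3 both select the first element, and indexes 3 and -4 both return null.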
    

Extract a field value in a nested structure to use as an argument

Some fields in your data might contain nested values that you want to extract and use as a function argument. In this example, we'll route Splunk universal forwarder data to different Splunk indexes based on the value of the nested index field in attributes. To do this, we'll use the spath scalar function.

  1. Configure the Send to a Splunk Index with Batching function to route data depending on the nested index field in attributes.
    1. Click on the + icon, and add the Send to a Splunk Index with Batching function to the pipeline.
    2. Select the connection_id that you want to use.
    3. Enter the following SPL2 expression in the index field:
      cast(spath(attributes, "index"), "string")
    4. In the default_index field, enter "main".
  2. (Optional) Configure the Send to a Splunk Index with Batching function for optimal throughput.
    1. In the splunk_parameters field, enter the following values:
      • hec-gzip-compression = true
      • hec-token-validation = true
      • hec-enable-ack = false
      • async = true
  3. Because sink functions do not show preview data, you should activate your pipeline and navigate to Splunk Enterprise or Splunk Cloud Platform environments to see if your events are being routed correctly.
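
Because the sink function has no preview, it can help to reason about the routing logic separately. The Python sketch below models the expression in the index field together with the default_index setting. The helper name resolve_index is hypothetical, and falling back to the default when the nested index value is missing is an assumption about how default_index is applied.

```python
def resolve_index(record, default_index="main"):
    """Pick the destination index from the nested index key, cast to a string."""
    attributes = record.get("attributes") or {}
    value = attributes.get("index")
    if value is None:
        # Assumption: default_index is used when the expression yields null.
        return default_index
    return str(value)  # corresponds to the cast to "string"
```

Under these assumptions, a record whose attributes map contains index: "security" is routed to the security index, and a record with no index key is routed to main.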

Promote a nested field to a top-level field

Some fields in your data might contain nested values that you want to extract and assign to a top-level field. For example, suppose the attributes field of your data contains an index key whose value you want to extract so that your records match the Splunk HEC metric JSON format. You can extract the value of the index key with the map_get scalar function under the Map category.

  1. From the Data Pipelines Canvas view, click the + icon at the position on your pipeline where you want to extract data from, and then choose To Splunk JSON from the function picker.
  2. In the View Configurations tab of the To Splunk JSON function, enter the following SPL2 expression in the index field:
    cast(map_get(attributes, "index"), "string")

    The SPL2 expression also casts the extracted value to string so that it can be used as an input for the To Splunk JSON function. See Casting and To Splunk JSON in the Function Reference for more details.

  3. In the View Configurations tab of the To Splunk JSON function, toggle the keep_attributes button if you want the attributes map to be available as index-extracted fields in the Splunk platform.
  4. Click Start Preview, and then click the To Splunk JSON function to confirm that the function works as expected. Here's an example of what your data might look like before and after extraction:
    The SPL2 example appears first, followed by the record before extraction and then the record after extraction.
     ... | to_splunk_json index=cast(map_get(attributes, "index"), "string") keep_attributes=true;
    {
        host: "myhost",
        source: "mysource",
        source_type: "mysourcetype",
        kind: "metric",
        body: [ 
            { 
                name: "Hello World"
            }
        ],
        attributes: {  
            attr1: "val1",
            index: "myindex"
        }
    }
    {
        json: "{"event":"Hello World", "source":"mysource", "sourcetype":"mysourcetype", "host":"myhost", "index":"myindex", "fields":{"attr1":"val1"}}"
    }
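
The promotion step on its own can be sketched in Python as shown below. This models only the map_get and cast expression, not the full To Splunk JSON output format. The helper name promote is hypothetical. Removing the promoted key from the remaining attributes is an assumption read from the example output, where index does not appear under fields.

```python
def promote(record, key, source="attributes"):
    """Return a copy of `record` with `key` lifted out of the nested map to the top level."""
    nested = dict(record.get(source) or {})
    value = nested.pop(key, None)  # look up the nested value, as map_get does
    promoted = dict(record)
    promoted[source] = nested  # assumption: the promoted key is removed from the map
    promoted[key] = None if value is None else str(value)  # cast to string
    return promoted
```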

See also

Functions
MV Expand
flatten
map_set
map_keys
map_values
Casting
mvindex
spath
Related topics
Extracting fields in events data
Complex data types
Accessing list elements using bracket notation
Accessing map elements using dot notation
Last modified on 21 March, 2022

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

