Splunk® Cloud Services

SPL2 Search Reference

Acrobat logo Download manual as PDF


Acrobat logo Download topic as PDF

Multivalue and array functions

For an overview about the stats and charting functions, see Overview of SPL2 stats and chart functions.

dataset() or dataset(<fields>)

The dataset function aggregates events into arrays of SPL2 field-value objects.

Usage

You can use this function in the SELECT clause in the from command and with the stats command.

  • If you specify dataset(), the function returns all of the fields in the events that match your search criteria.
  • If you specify dataset<fields>, the function returns only the specified fields in each event that match your search criteria. The list of fields must be comma separated.

To use named arguments, you must specify the argument name before the argument values. Specify the field list in an array, enclosing the list in square brackets. The syntax for named arguments is ...dataset(fields: [<field>, <field>,...]. For example:

...dataset(fields: [uid, username]

Examples

Return all fields and values in an array

You can create a dataset array from all of the fields and values in the search results.

Consider this set of data:

_time department username
2020-04-13 13:02:45.000 PM Engineering Claudia Garcia
2020-04-13 10:52:41.000 AM IT Vanya Patel
2020-04-13 06:23:48.000 AM Personnel Alex Martin

Use the dataset function to create an array from all of the fields and values using the following search:

...| stats dataset()

The results look something like this:

[{_time: 2020-04-13 13:02:45.000 PM, department: Engineering, username: "Claudia Garcia"}, {_time: 2020-04-13 10:52:41.000 AM, department: IT, username: "Vanya Patel"}, {_time: 2020-04-13 06:23:48.000 AM, department: Personnel, username: "Alex Martin"}]

If you don't specify any fields with the dataset function, all of the fields are included in a single dataset array. A single dataset array is also returned if you specify a wildcard with the dataset function, for example: dataset(*).

Return specific fields in an array

You can specify multiple fields to populate the dataset arrays.

Consider this set of data:

_time host action department uid username
2020-04-13 13:02:45.000 PM mailsv2 Failed password Engineering 1066 Claudia Garcia
2020-04-13 10:52:41.000 AM mailsv1 Failed password IT 1815 Vanya Patel
2020-04-13 06:23:48.000 AM mailsv3 Session closed Personnel 1916 Alex Martin
2020-04-13 06:23:48.000 AM mailsv3 Failed password Engineering 1862 Wei Zhang
2020-04-12 20:18:36.000 PM mailsv1 Session closed Engineering 1690 Rutherford Sullivan

The following search creates a dataset array with only the department and username field values.

| FROM main SELECT dataset(department, username)

The results look something like this:

dataset(department, username)
[{department: Engineering, username: "Claudia Garcia"}, {department: IT, username: "Vanya Patel"}, {department: Personnel, username: "Alex Martin"}, {department: Engineering, username: "Wei Zhang"},{department: Engineering, username: "Rutherford Sullivan"}]

Use a BY clause to create separate arrays

You can use a BY clause to organize the search results into separate dataset arrays.

Consider this set of data.

_time host action department uid username
2020-04-13 13:02:45.000 PM mailsv2 Failed password Engineering 1066 Claudia Garcia
2020-04-13 10:52:41.000 AM mailsv1 Failed password IT 1815 Vanya Patel
2020-04-13 06:23:48.000 AM mailsv3 Session closed Personnel 1916 Alex Martin
2020-04-13 06:23:48.000 AM mailsv3 Failed password Engineering 1862 Wei Zhang
2020-04-12 20:18:36.000 PM mailsv1 Session closed Engineering 1690 Rutherford Sullivan

The following search organizes the dataset arrays by department. Each dataset array contains the user ID and username.

| FROM main SELECT dataset(uid, username) GROUP BY department

Because this search uses the from command, the GROUP BY clause is used. If you use this function with the stats command, you would specify the BY clause.

The results look something like this:

department dataset(uid, username)
Engineering [{uid: 1066, username: "Claudia Garcia"}, {uid: 1690, username: "Rutherford Sullivan"}, {uid: 1862, username: "Wei Zhang"}]
IT [{uid: 1815 , username: "Vanya Patel"}]
Personnel [{uid: 1916, username: "Alex Martin"}]

Rename the dataset field

By default the field name created by the dataset function is the name of the function and the fields specified. You can rename the field that appears in the search results by using the AS keyword. For example:

| FROM main | stats dataset(department, username) AS employees

The results look something like this:

employees
[{department: Engineering, username: "Claudia Garcia"}, {department: IT, username: "Vanya Patel"}, {department: Personnel, username: "Alex Martin"}]

Start the search with the SELECT clause

This example shows how to use the dataset function with the SELECT clause in the from command.

| SELECT dataset(department, username) FROM main

The results look something like this:

dataset(department, name)
[{department: Engineering, username: "Claudia Garcia"}, {department: IT, username: "Vanya Patel"}, {department: Personnel, username: "Alex Martin"}, {department: Engineering, username: "Wei Zhang"},{department: Engineering, username: "Rutherford Sullivan"}]

list(<value>)

The list function returns a multivalue entry from the values in a field. The order of the values reflects the order of the events.

Usage

You can use this function with the stats, streamstats, and timechart commands.

  • If more than 100 values are in the field, only the first 100 are returned.
  • This function processes field values as strings.

To use named arguments, you must specify the argument name before the argument value. For example:

...list(value:rowNumber)

Example

To illustrate what the list function does, let's start by generating a few simple results.

  1. Use the from and streamstats commands to generate a set of 11 results that are simply timestamps and a count of the results, which are used as row numbers. Empty arrays with the from command generate the timestamps.

    | from [{},{},{},{},{},{},{},{},{},{},{}] | streamstats count AS rowNumber

    There are 11 results. Th first few results look something like this:

    _time rowNumber
    2019-10-10T07:19:02.000-07:00 1
    2019-10-10T07:19:02.000-07:00 2
    2019-10-10T07:19:02.000-07:00 3
    2019-10-10T07:19:02.000-07:00 4
    2019-10-10T07:19:02.000-07:00 5

    Notice that each result appears on a separate row, with a line between each row.


  2. Add the stats command with the list function to the search. The numbers are returned in ascending order in a single, multivalue result.

    | from [{},{},{},{},{},{},{},{},{},{},{}] | streamstats count AS rowNumber | stats list(rowNumber) AS numbers

    The results look something like this:

    numbers
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11

    Notice that this is a single result with multiple values. There are no lines between each value.

  3. Compare this result with the results returned by the values function.

pivot(<key>,<value>)

The pivot function aggregates the values in a field and returns the results as an object.

Usage

The <key> argument can be a single field or a string template, which can reference multiple fields.

The <value> argument must be an aggregate, such as count() or sum().

You can use this function with the SELECT clause in the from command, or with the stats command.

Examples

Using the pivot function

Consider the following data that's in the main dataset:

status host
200 www1
200 www2
200 www1
400 www2
400 www1
400 www3
403 www2
404 www2

Use the pivot function to create an object that summarizes the HTTP status values for each separate host. The search looks like this:

FROM main GROUP BY host SELECT host, pivot(status, count())

The results look something like this:

host pivot(status,count())
www1 {"200":2,"400":1}
www2 {"200":1,"400":1,"403":1,"404":1}
www3 {"400":1}


You can use the pivot function with the stats command to accomplish the same results:

FROM main | stats pivot(status,count()) as pivotStatus by host

In this search the output field, pivot(status,count()), is renamed using the AS clause.

This search returns the same results as the previous search and looks something like this:

host pivotStatus
www1 {"200":2,"400":1}
www2 {"200":1,"400":1,"403":1,"404":1}
www3 {"400":1}

Creating nested objects with the pivot function

You can use nested pivot functions to create nested objects.

Consider the following data:

status host action
200 www1 purchase
200 www2 purchase
200 www1 addtocart
200 www2 purchase
400 www1 addtocart
400 www1 purchase
403 www2 changequantity
404 www1 view
404 www3 purchase
404 www2 changequantity
404 www1 purchase

You can use one pivot function as the <value> argument in another pivot function to create nested objects:

FROM main GROUP BY status SELECT status, pivot(host, pivot(action,count())) AS nestedPivot

The results look something like this:

status nestedPivot
200 {"www1":{"addtocart":1,"purchase":1},"www2":{"purchase":2}}
400 {"www1":{"addtocart":1,"purchase":1}}
403 {"www2":{"changequantity":1}}
404 {"www1":{"purchase":1,"view":1},"www2":{"changequantity":1},"www3":{"purchase":1}}

Using a string template with the pivot function

You can use a string template in the <value> argument of the pivot function. In this example, the string template contains two template expressions, ${name} and ${city}, which are field names. The entire string template is enclosed in double quotation marks:

SELECT pivot("${name} in ${city}", count()) AS mylist FROM main

Suppose these are the values in the name and city fields:

name city date
Claudia London 2020-04-13
Alex Berlin 2020-04-19
Wei Sydney 2020-04-21
Claudia London 2020-04-30

The results look something like this:

mylist
{"Alex in Berlin":1,"Claudia in London":2,"Wei in Sydney":1}

You can add the flatten command to the search:

SELECT pivot("${name} in ${city}", count()) AS mylist FROM main | flatten mylist

The results look something like this:

Alex in Berlin Claudia in London Wei in Sydney mylist
1 2 1 {"Alex in Berlin":1,"Claudia in London":2,"Wei in Sydney":1}

values(<value>)

The values function returns a list of the distinct values in a field as a multivalue entry.

Usage

You can use this function with the stats, streamstats, and timechart commands.

  • By default there is no limit to the number of values returned.
  • This function processes field values as strings.

The order of the values is lexicographical.

Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF-8 encoding, which is a superset of ASCII.

  • Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
  • Uppercase letters are sorted before lowercase letters.
  • Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.

To use named arguments, you must specify the argument name before the argument value. For example:

...list(value:rowNumber)

Example

To illustrate what the values function does, let's start by generating a few simple results.

  1. Use the from and streamstats commands to generate a set of 11 results that are simply timestamps and a count of the results, which are used as row numbers. Empty arrays with the from command generate the timestamps.

    | from [{},{},{},{},{},{},{},{},{},{},{}] | streamstats count AS rowNumber

    There are 11 results. Th first few results look something like this:

    _time rowNumber
    2019-10-10T07:19:02.000-07:00 1
    2019-10-10T07:19:02.000-07:00 2
    2019-10-10T07:19:02.000-07:00 3
    2019-10-10T07:19:02.000-07:00 4
    2019-10-10T07:19:02.000-07:00 5

    Notice that each result appears on a separate row, with a line between each row.

  2. Add the stats command with thevalues function the the search. The results are returned in lexicographical order.

    | from [{},{},{},{},{},{},{},{},{},{},{}] | streamstats count AS rowNumber | stats values(rowNumber) AS numbers

    The results look something like this:

    numbers
    1
    10
    11
    2
    3
    4
    5
    6
    7
    8
    9

    Notice that this is a single result with multiple values. There are no lines between each value.

  3. Compare these results with the results returned by the list function.
Last modified on 05 September, 2020
PREVIOUS
Event order functions
  NEXT
Time functions

This documentation applies to the following versions of Splunk® Cloud Services: current


Was this documentation topic helpful?

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters