Splunk® Cloud Services

SPL2 Search Reference


Custom command functions

You can create a custom SPL2 command by declaring a custom command function. A custom command function is a function that performs like a command. There are two types of custom command functions:

  • A generating command function creates a set of events and is used as the first command in a search. Examples of built-in generating commands are from, union, and search.
  • A non-generating command function processes data that is piped in from generating commands or other non-generating commands. Examples of built-in non-generating commands are stats, eval, and sort.

Custom command function syntax

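The function keyword, the <function-name>, the parentheses, the curly braces, and the return | <search> statement are required. Square brackets [ ] mark optional syntax.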

function <function-name>
( [ $<parameter-name> [: <parameter-type>] [=<default-value>] ], ...) [: <function-type> ]
{
[ <SPL-statement>... ]
return | <search>
}

The parentheses ( ) are required, even if a function parameter is not specified.

For non-generating command functions, the <parameter-name> is required.

The curly braces { } are required, even if a <SPL-statement> is not specified.
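
As a minimal sketch of the syntax above, the following hypothetical function, first_five, includes only the required parts of a non-generating command function: the function keyword and name, one parameter in parentheses, the curly braces, and a return statement. The function name and the use of the head command are assumptions made for illustration.

```spl2
// Hypothetical example: first_five is not a built-in command.
// $source identifies the dataset that is piped into the function.
function first_five($source) {
    return | FROM $source
        | head 5
}
```

Assuming a dataset named main, as in the other examples in this topic, the function could be invoked like this:

```spl2
FROM main | first_five
```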

Required arguments

function-name
Syntax: function <function-name>
Description: The word function followed by the name that you want to give to the function. Function names must start with a letter or underscore ( _ ) character and can't contain spaces.
parameter-name
For non-generating command functions
Syntax: $<parameter-name>
Description: The name of a parameter that you want to use with a custom function. You can specify one or more parameters. One function parameter must be specified to identify the dataset that the custom command function uses. The names $source or $data are often used for this function parameter.
Separate multiple parameters with commas. Parameter names must start with a dollar sign ( $ ) followed by a letter or an underscore ( _ ) character and can't contain spaces. When you specify a function parameter, you have the option to include a parameter type. The default parameter type is any.
Example: ($source, $field, $count)
Default: None
search
Syntax: return | <search>
Description: The word return followed by a pipe ( | ) character and the search that you want the function to run.

Optional arguments

parameter-name
For generating command functions
Syntax: $<parameter-name>
Description: The name that you want to give to a parameter that you want to use with a custom function. You can specify zero or more parameters. Separate multiple parameters with commas. Parameter names must start with a dollar sign ( $ ) followed by a letter or an underscore ( _ ) character and can't contain spaces. When you specify a function parameter, you have the option to include a parameter type. The default parameter type is any.
Example: ($source, $field, $count)
Default: None
parameter-type
Syntax: <parameter-type>
Description: The data type that the parameter accepts. This is the input data type. The default parameter type is any. See Built-in data types. If you specify a parameter type, you must separate the <parameter-name> and the <parameter-type> with a colon ( : ).
Example: ($ipaddress: string, $kbps: int)
Default: any
default-value
Syntax: <default-value>
Description: A default value for the parameter. The value must be a constant, either a literal or a string. The value can't be a field name. Users can override the default value by specifying the function parameter with a value.
Example: ($count: int=5)
Default: None
function-type
Syntax: <function-type>
Description: The data type that the function returns. This is the output data type. For command functions, the default function type is dataset. See Built-in data types.
Default: dataset, unless the function contains a terminating command, such as into, which does not return a results dataset.
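
The optional arguments can be combined in one declaration. The following sketch is a hypothetical function, recent_events, that declares a typed parameter with a default value and an explicit function type. It assumes that dataset is accepted as a <function-type> name, which matches the stated default but is not shown elsewhere in this topic.

```spl2
// Hypothetical example. $limit accepts an integer and defaults to 10.
// The ": dataset" after the parentheses is the function type (assumed name).
function recent_events($source, $limit: int=10): dataset {
    return | FROM $source
        | head $limit
}
```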

Advantages of custom command functions

There are many advantages to using custom command functions.

Reusing parts of a search

You can use custom command functions to reuse parts of a search. For example, you might need to perform a complex stats aggregation, or an intricate filtering or formatting in multiple searches. You can create a single custom function to perform those actions. You can then use that function in as many searches as you want. To change the actions performed, you only need to change the function to propagate the changes to all of the searches that use the function.

For an example of a custom command function that automates parts of a search, see Testing custom functions.

You can also use a custom command function to store a sample set of events, which is called a dataset literal. For an example, see Examples.

Simplifying search readability

You can use custom command functions to simplify your SPL2 by turning a section of SPL2 that might be difficult to parse into something more readable.

For example, suppose you have a search that has a lengthy section devoted to formatting the search results. Instead of exposing all of those lines in your search, you can create a function called tidy_data that hides the tedious formatting section, making the search easier to read.

It's a good practice to add a comment to your search to explain what the function does:

... | tidy_data // This is a function that formats the results. ...

This example uses a line comment to describe the function. For more information about line comments and block comments, see Using comments in SPL2 in the SPL2 Search Manual.
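
As an illustration only, a tidy_data function might look something like the following sketch. The formatting steps are assumptions: the action and host field names are borrowed from the sample events later in this topic, and a real formatting section would likely be much longer.

```spl2
// Hypothetical sketch of tidy_data. The steps shown are placeholders
// for whatever formatting your search actually needs.
function tidy_data($source) {
    return | FROM $source
        | eval action = if(action="", "unknown", action)
        | sort host
}
```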

Generating events to test searches

It's convenient to automate the creation of a set of events using a generating command function. The events become a temporary dataset that you can then use to test your searches. In SPL2, there is no command that generates a set of events. However, you can use either the SPL2 repeat dataset function or a dataset literal to create a set of events in a temporary dataset. See Examples.

How to use custom command functions

When you create a custom command function that is a non-generating command, you must specify at least one function parameter that is used to identify the dataset that the command operates on. See Examples.

How to invoke a function

When you invoke a custom command function, you specify the parameter values without the $ prefix. For example, suppose you have this makeresults command function to create a set of events:

function makeresults($count) {
    return | FROM repeat({}, $count) 
               | eval _time = now()
}

To invoke the generating makeresults function, specify the name of the function and the value for the $count parameter:

| makeresults 5

Similarly, you invoke a non-generating command function after you specify the dataset. Consider this function:

function my_sourcetype($source, $field, $sourcetype: string="webaccess") {
    return | SELECT count(), $field, _time 
                FROM $source
                WHERE sourcetype=$sourcetype
                GROUP BY $field, _time 
}

To invoke the function, specify the dataset and the field. For example, to use this function on the host field in the main dataset you specify this search:

FROM main | my_sourcetype host

Overriding parameter default values

To override the default value for a parameter, you specify the value you want to use in the search. For example, this function has a default value for $sourcetype:

function my_sourcetype($source, $field, $sourcetype: string="webaccess") {
    return | SELECT count(), $field
                FROM $source
                WHERE sourcetype=$sourcetype
                GROUP BY $field
}

To override the function default for sourcetype with httpevent, specify the value in the search. Because the sourcetype is a string, it must be enclosed in double quotation marks:

FROM main | my_sourcetype host "httpevent"

Function parameters must be specified in your search in a particular order, unless you name the parameters.

Parameter order and named arguments

Consider the following function. This function has six parameters, three of which are strings:

function my_sourcetype($source, $field, $sourcetype:string, $status:int, $action:string, $categoryId:string) {
    return | SELECT $field
                FROM $source
                WHERE sourcetype=$sourcetype AND status=$status AND action=$action AND categoryId=$categoryId
                GROUP BY $field
}

When you invoke a function, you must specify the parameter values in the order in which the parameters appear in the function syntax. For example:

FROM main | my_sourcetype host "webaccess" 200 "purchase" "simulation"

Unless you know the function well, it's difficult to know which value equates to each parameter.

To clarify this, you can name the parameters in your search. For example:

FROM main | my_sourcetype field=host sourcetype="webaccess" status=200 action="purchase" categoryId="simulation"

Another advantage of naming the parameters is that you can specify the parameters in any order. For example:

FROM main | my_sourcetype field=host status=200 categoryId="simulation" action="purchase" sourcetype="webaccess"

Using functions in modules

You create custom command functions in an SPL2 module. You can use a custom command function multiple times within that module. You can also import custom command functions into other modules.

Using functions in the API

You can also create and use custom command functions using the Search Service API.

Supported data types

When you create a custom command function, you have the option to specify the data type for the function or the function parameters.

The default data type for command functions is dataset. The default data type for function parameters is any.

To learn more about the supported data types, see Built-in data types.

Testing custom functions

Test your custom functions to verify that the function works as you intend.

You can test custom functions in isolation, without impacting other SPL2 statements, by using a dataset literal. A dataset literal is a dataset that you type into your search. A dataset literal consists of an array of objects consisting of field value pairs. For more information about dataset literals, see Dataset literals in the SPL2 Search Manual.

The following is an example of a custom function that is used to automate almost all of a complex search. Because the search is run on a regular basis against different datasets and different fields using similar criteria, it is more efficient to create a custom function for that search. Several of the parameters have default values, which can be overridden.

function my_sourcetype($source, $field, $sourcetype: string="webaccess", $time: timespan=5m, $count: int=10) {
    return | SELECT count(), $field, _time 
                FROM $source
                WHERE sourcetype=$sourcetype
                GROUP BY $field, span(_time, $time)
                HAVING count > $count
}

To invoke the function, you specify the dataset and the field. For example, to use this function on the host field in the main dataset you specify this search:

FROM main | my_sourcetype host

To test this function, substitute a dataset literal for the main dataset.

Suppose these are some of the events in your dataset:

_time                  host  sourcetype       action          productId  method
6 Apr 2022 9:39:48 PM  www2  access_combined  purchase        PZ-SG-G05  POST
6 Apr 2022 9:34:10 PM  www1  webaccess        view                       GET
6 Apr 2022 9:34:02 PM  www3  webaccess        purchase        SC-MG-G10  POST
6 Apr 2022 9:34:01 PM  www2  access_combined  remove          CU-PG-G06  GET
6 Apr 2022 9:34:01 PM  www1  webaccess        purchase                   POST
6 Apr 2022 9:29:55 PM  www3  access_combined  addtocart       SC-MG-G10  GET
6 Apr 2022 9:20:51 PM  www1  access_combined  addtocart       DB-SG-G01  GET
6 Apr 2022 9:12:56 PM  www2  webaccess        changequantity  FS-SG-G03  GET
6 Apr 2022 9:12:53 PM  www1  access_combined                  DB-SG-G01  GET

You could select several events as the basis for your dataset literal. In this example, the first four events are used for the dataset literal:

[{_time: 6 Apr 2022 9:39:48, host:"www2", sourcetype:"access_combined", action:"purchase", productId:"PZ-SG-G05", method:"POST"}, 
{_time: 6 Apr 2022 9:34:10, host:"www1", sourcetype:"webaccess", action:"view", productId:"", method:"GET"}, 
{_time: 6 Apr 2022 9:34:02, host:"www3", sourcetype:"webaccess", action:"purchase", productId:"SC-MG-G10", method:"POST"}, 
{_time: 6 Apr 2022 9:34:01, host:"www2", sourcetype:"access_combined", action:"remove", productId:"CU-PG-G06", method:"GET"}]

Here's how the function is invoked in a search using the dataset literal:

FROM [ {_time: 6 Apr 2022 9:39:48, host:"www2", sourcetype:"access_combined", action:"purchase", productId:"PZ-SG-G05", method:"POST"}, {_time: 6 Apr 2022 9:34:10, host:"www1", sourcetype:"webaccess", action:"view", productId:"", method:"GET"}, {_time: 6 Apr 2022 9:34:02, host:"www3", sourcetype:"webaccess", action:"purchase", productId:"SC-MG-G10", method:"POST"}, {_time: 6 Apr 2022 9:34:01, host:"www2", sourcetype:"access_combined", action:"remove", productId:"CU-PG-G06", method:"GET"}] | my_sourcetype host

You might need to adjust some of the values in the dataset literal to perform a complete test.
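
With the default $count of 10, the HAVING clause filters out any group that has 10 or fewer events, so a dataset literal with only a few events is likely to return no results. One way to handle this, sketched below, is to override the default values positionally when you test. This sketch assumes that a timespan value such as 5m can be passed positionally in the same form that is used for the default. It reuses two of the webaccess events from the dataset literal above.

```spl2
// Positional values after host: $sourcetype, $time, $count.
// $count is set to 0 so that any group with at least one event passes HAVING.
FROM [
    {_time: 6 Apr 2022 9:34:10, host:"www1", sourcetype:"webaccess", action:"view", productId:"", method:"GET"},
    {_time: 6 Apr 2022 9:34:02, host:"www3", sourcetype:"webaccess", action:"purchase", productId:"SC-MG-G10", method:"POST"}
] | my_sourcetype host "webaccess" 5m 0
```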

Examples

1. Create a non-generating function

When you create a custom function that is a non-generating command, you must specify at least one function parameter that identifies the dataset that the command operates on.

In SPL2 there is no command that returns the most common values for a field in a dataset. However, you can define a custom function that processes the events in a dataset and returns the top-most values in a specific field.

For example:

function top($source, $field, $limit: int) {
    return | FROM $source 
        | stats count() by $field 
        | sort -count
        | head $limit
}
  • The function, top, defines three parameters.
  • The $source parameter identifies the dataset that the command operates on.
  • The $field parameter is used in the stats command. The stats command counts the events and organizes the count by the values in the $field.
  • The sort command sorts the results returned from the stats command in descending order.
  • The $limit parameter specifies the maximum number of results to return. The value specified for $limit must be an integer. The head command returns the top values in the results returned from the sort command up to the maximum number allowed by the $limit parameter.

Here's an example of how the top command function is invoked to return values from the host field and limit the results to 5:

| FROM main | top host 5

Internally, this search is expanded to this:

FROM main | stats count() by host | sort -count | head 5

For another example of a non-generating function, see Testing custom functions.

2. Specify a default value for a function parameter

You can add a default value to a function parameter. The default value is used when the function is run, unless you override the value.

The previous example defines a custom function, top, that processes the events in a dataset and returns the top-most values in a field. Here is the top function:

function top($source, $field, $limit: int) {
    return | FROM $source 
        | stats count() by $field 
        | sort -count
        | head $limit
}

You can specify a default value to use for the $limit parameter. In this example, 8 is the default value:

function top($source, $field, $limit: int=8) {
    return | FROM $source 
        | stats count() by $field 
        | sort -count
        | head $limit
}

When you invoke the top function now, you only need to specify the $field parameter in your search. In this example, host is the field parameter:

| FROM main | top host

Because there is a default value for $limit, the 8 top-most values are returned.

You can override the default value by specifying the $limit in your search. This example returns 12 values from the host field:

| FROM main | top host 12

3. Create a generating function using the repeat function

This example shows you how to create a custom command function that generates events using the repeat dataset function.

Start with a simple command function that uses the repeat dataset function:

function makeresults($count) {
    return | FROM repeat({}, $count)
 | eval _time = now()
 | streamstats count
 | eval _time=_time-(count*3600);
}
  • The name of the function is makeresults.
  • The function includes one parameter, $count, that specifies how many events to create.
  • The repeat function creates multiple identical events. The first eval command is used to add the _time field, which contains the timestamp when the event is created.
  • The streamstats command creates a count of the events.
  • The second eval command is used to convert the timestamps into incremental timestamps, by multiplying the count by 3600, the number of seconds in an hour. That number is subtracted from the time to create the incremental timestamps.

To invoke the makeresults function and create 5 events, use this search:

| makeresults 5

The timestamps are one hour apart, starting with the latest timestamp and ending with the earliest timestamp.

The results look something like this:

_time count
20 Apr 2022 2:35:58 PM 1
20 Apr 2022 1:35:58 PM 2
20 Apr 2022 12:35:58 PM 3
20 Apr 2022 11:35:58 AM 4
20 Apr 2022 10:35:58 AM 5

Alternatively, you can specify multiple key-value pairs in an object format to create multiple, duplicate fields in each event.

function makeresults($count) {
    return | FROM repeat({host: "www1", sourcetype: "access_combined"}, $count)
 | eval _time = now();
}

When you invoke the makeresults function with a count of 3, the results look something like this:

_raw _time host sourcetype
{"host": "www1", "sourcetype": "access_combined"} 20 Apr 2022 2:35:58 PM www1 access_combined
{"host": "www1", "sourcetype": "access_combined"} 20 Apr 2022 2:35:58 PM www1 access_combined
{"host": "www1", "sourcetype": "access_combined"} 20 Apr 2022 2:35:58 PM www1 access_combined

You can alter the duplicate events by adding the streamstats command to create a count of the events. Use the eval command to alter an event by the count number.

For example:

function makeresults($count) {
    return | FROM repeat({host: "www1", sourcetype: "access_combined"}, $count)
 | eval _time = now()
 | streamstats count() 
 | eval host = if(count=2, "www2", host);
}

When you invoke the makeresults function, the results look something like this:

_raw _time host sourcetype count
{"host": "www1", "sourcetype": "access_combined"} 20 Apr 2022 2:35:58 PM www1 access_combined 1
{"host": "www1", "sourcetype": "access_combined"} 20 Apr 2022 2:35:58 PM www2 access_combined 2
{"host": "www1", "sourcetype": "access_combined"} 20 Apr 2022 2:35:58 PM www1 access_combined 3

If you want to create multiple events with different field values, use a dataset literal instead. See the next example.

4. Create a generating custom command using a dataset literal

This example shows you how to create a custom command function that generates events using a dataset literal.

This dataset literal contains a set of objects with key-value pairs with the type and name for cooperative and competitive board games:

[
   {"type": "cooperative", "name": "Forbidden Island"}, 
   {"type": "cooperative", "name": "Pandemic"}, 
   {"type": "cooperative", "name": "Sherlock Holmes: Consulting Detective"}, 
   {"type": "competitive", "name": "Settlers of Catan"}, 
   {"type": "competitive", "name": "Terraforming Mars"}, 
   {"type": "competitive", "name": "Ticket to Ride"}
]

You can use the dataset literal in a generating command function.

In this example, the name of the function is test_events. Here's how to specify the test_events function with a dataset literal:

function test_events () {
    return | FROM [
        {"type": "cooperative", "name": "Forbidden Island"}, 
        {"type": "cooperative", "name": "Pandemic"}, 
        {"type": "cooperative", "name": "Sherlock Holmes: Consulting Detective"}, 
        {"type": "competitive", "name": "Settlers of Catan"}, 
        {"type": "competitive", "name": "Terraforming Mars"}, 
        {"type": "competitive", "name": "Ticket to Ride"}
    ]
}

Even though the function does not have any parameters, the parentheses after the function name are required.

To invoke the test_events function, you only need to specify the name of the function:

test_events

The results look something like this:

Event name type
{"type": "cooperative", "name": "Forbidden Island"} Forbidden Island cooperative
{"type": "cooperative", "name": "Pandemic"} Pandemic cooperative
{"type": "cooperative", "name": "Sherlock Holmes: Consulting Detective"} Sherlock Holmes: Consulting Detective cooperative
{"type": "competitive", "name": "Settlers of Catan"} Settlers of Catan competitive
{"type": "competitive", "name": "Terraforming Mars"} Terraforming Mars competitive
{"type": "competitive", "name": "Ticket to Ride"} Ticket to Ride competitive

To help you get started using dataset literals, see Sample dataset literals in the SPL2 Search Manual. Use the sample dataset literals as-is or as a starting point for your own data.
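
Because a non-generating command function processes data that is piped in from a generating command, the two kinds of functions can be chained. As a sketch, assuming that the test_events function from this example and the top function from example 1 are defined in the same module, the following search counts the board games by type:

```spl2
// test_events generates the six events. top counts them by the type field
// and keeps at most 2 rows.
test_events | top type 2
```

Each type appears in three of the events, so if the chain behaves as assumed, both rows should show a count of 3.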

See also

Custom eval functions
Documenting custom functions
Naming function arguments in the SPL2 Search Manual
Last modified on 24 July, 2023

This documentation applies to the following versions of Splunk® Cloud Services: current

