Custom command functions
You can create a custom SPL2 command by declaring a custom command function. A custom command function is a function that performs like a command. There are two types of custom command functions:
- A generating command function creates a set of events and is used as the first command in a search. Examples of built-in generating commands are from, union, and search.
- A non-generating command function processes data that is piped in from generating commands or other non-generating commands. Examples of built-in non-generating commands are stats, eval, and sort.
Custom command function syntax
The required syntax is in bold.
- function <function-name>
- ( [ $<parameter-name> [: <parameter-type>] [=<default-value>] ], ...) [: <function-type> ]
- {
- [ <SPL-statement>... ]
- return | <search>
- }
The parentheses ( ) are required, even if a function parameter is not specified.
For non-generating command functions, the <parameter-name> is required.
The curly braces { } are required, even if a <SPL-statement> is not specified.
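As a minimal sketch of the syntax rules above, the following shows one function of each type. The function names sample_events and first_ten and the field values are hypothetical and chosen only for illustration.

```spl2
// Generating: no parameters are needed, but the parentheses are still required.
function sample_events() {
    return | FROM [ {host: "www1"}, {host: "www2"} ]
}

// Non-generating: the $source parameter identifies the piped-in dataset.
function first_ten($source) {
    return | FROM $source | head 10
}
```

Under these assumptions, a search such as FROM main | first_ten would pass the main dataset to the $source parameter.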
Required arguments
- function-name
- Syntax: function <function-name>
- Description: The word function followed by the name that you want to give to the function. Function names must start with a letter or underscore ( _ ) character and can't contain spaces.
- parameter-name
- For non-generating command functions
- Syntax: $<parameter-name>
- Description: The name of a parameter that you want to use with a custom function. You can specify one or more parameters. One function parameter must be specified to identify the dataset that the custom command function uses. The names $source or $data are often used for this function parameter. Separate multiple parameters with commas. Parameter names must start with a dollar sign ( $ ) followed by a letter or an underscore ( _ ) character and can't contain spaces. When you specify a function parameter, you have the option to include a parameter type. The default parameter type is any.
- Example: ($source, $field, $count)
- Default: None
- search
- Syntax: return | <search>
- Description: The word return followed by a pipe ( | ) character and the search that you want the function to run.
Optional arguments
- parameter-name
- For generating command functions
- Syntax: $<parameter-name>
- Description: The name that you want to give to a parameter that you want to use with a custom function. You can specify zero or more parameters. Separate multiple parameters with commas. Parameter names must start with a dollar sign ( $ ) followed by a letter or an underscore ( _ ) character and can't contain spaces. When you specify a function parameter, you have the option to include a parameter type. The default parameter type is any.
- Example: ($source, $field, $count)
- Default: None
- parameter-type
- Syntax: <parameter-type>
- Description: The data type that the parameter accepts. This is the input data type. The default parameter type is any. See Built-in data types. If you specify a parameter type, you must separate the <parameter-name> and the <parameter-type> with a colon ( : ).
- Example: ($ipaddress: string, $kbps: int)
- Default: any
- default-value
- Syntax: <default-value>
- Description: A default value for the parameter. The value must be a constant, either a literal or a string. The value can't be a field name. Users can override the default value by specifying the function parameter with a value.
- Example: ($count: int=5)
- Default: None
- function-type
- Syntax: <function-type>
- Description: The data type that the function returns. This is the output data type. See Built-in data types.
- Default: dataset, unless the function contains a terminating command, such as into, which does not return a results dataset.
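The optional arguments can be combined in a single declaration. The following is a sketch only: the function name recent_errors, its parameters, and the status field are hypothetical, and the sketch assumes that dataset can be written explicitly as the function type because it is the documented default.

```spl2
// $status and $limit are typed and have default values.
// The function type after the parentheses is stated explicitly.
function recent_errors($source, $status: int=500, $limit: int=20): dataset {
    return | FROM $source WHERE status=$status | head $limit
}
```

With this declaration, FROM main | recent_errors would use both defaults, and FROM main | recent_errors 404 5 would override them in the order in which the parameters are declared.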
Advantages of custom command functions
There are many advantages to using custom command functions.
Reusing parts of a search
You can use custom command functions to reuse parts of a search. For example, you might need to perform a complex stats aggregation, or an intricate filtering or formatting in multiple searches. You can create a single custom function to perform those actions. You can then use that function in as many searches as you want. To change the actions performed, you only need to change the function to propagate the changes to all of the searches that use the function.
For an example of a custom command function that automates parts of a search, see Testing custom functions.
You can also use a custom command function to store a sample set of events, which is called a dataset literal. To see an example, see Examples.
Simplifying search readability
You can use custom command functions to simplify your SPL2 by turning a section of SPL2 that might be difficult to parse into something more readable.
For example, suppose you have a search that has a lengthy section devoted to formatting the search results. Instead of exposing all of those lines in your search, you can create a function called tidy_data that hides the tedious formatting section, making the search easier to read.
It's a good practice to add a comment to your search to explain what the function does:
...
| tidy_data // This is a function that formats the results.
...
This example uses a line comment to describe the function. For more information about line comments and block comments, see Using comments in SPL2 in the SPL2 Search Manual.
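The source does not show the body of tidy_data. As a rough sketch of what such a function might contain, the following assumes that the formatting consists of normalizing the host field, keeping a few fields, and sorting by time. The field list and the formatting steps are illustrative assumptions, not the actual definition.

```spl2
// Hypothetical body for tidy_data: the formatting steps are assumptions.
function tidy_data($source) {
    return | FROM $source
        | eval host = lower(host)                 // normalize host names
        | fields _time, host, sourcetype, action  // keep only the fields of interest
        | sort -_time                             // newest events first
}
```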
Generating events to test searches
It's convenient to automate the creation of a set of events using a generating command function. The events become a temporary dataset that you can then use to test your searches. In SPL2, there is no command that generates a set of events. However, you can use either the SPL2 repeat dataset function or a dataset literal to create a set of events in a temporary dataset. See Examples.
How to use custom command functions
When you create a custom command function that is a non-generating command, you must specify at least one function parameter that is used to identify the dataset that the command operates on. See Examples.
How to invoke a function
When you use a custom command function, you don't specify the $ prefix when you specify the parameter values. For example, suppose you have this makeresults command function to create a set of events:
function makeresults($count) { return | FROM repeat({}, $count) | eval _time = now() }
To invoke the generating makeresults function, specify the name of the function and the value for the $count parameter:
| makeresults 5
Similarly with non-generating command functions, you invoke the function after you specify the dataset. Consider this function:
function my_sourcetype($source, $field, $sourcetype: string="webaccess") { return | SELECT count(), $field, _time FROM $source WHERE sourcetype=$sourcetype GROUP BY $field, _time }
To invoke the function, specify the dataset and the field. For example, to use this function on the host field in the main dataset, you specify this search:
FROM main | my_sourcetype host
Overriding parameter default values
To override the default value for a parameter, you specify the value you want to use in the search.
For example, this function has a default value for $sourcetype:
function my_sourcetype($source, $field, $sourcetype: string="webaccess") { return | SELECT count(), $field FROM $source WHERE sourcetype=$sourcetype GROUP BY $field }
To override the function default for sourcetype with httpevent, specify the value in the search. Because the sourcetype is a string, it must be enclosed in double quotation marks:
FROM main | my_sourcetype host "httpevent"
Function parameters must be specified in your search in a particular order, unless you name the parameters.
Parameter order and named arguments
Consider the following function. This function has six parameters, three of which are strings:
function my_sourcetype($source, $field, $sourcetype:string, $status:int, $action:string, $categoryId:string) { return | SELECT $field FROM $source WHERE sourcetype=$sourcetype AND status=$status AND action=$action AND categoryId=$categoryId GROUP BY $field }
When you invoke a function, you must specify the parameter values in the order in which the parameters appear in the function syntax. For example:
FROM main | my_sourcetype host "webaccess" 200 "purchase" "simulation"
Unless you know the function well, it's difficult to know which value equates to each parameter.
To clarify this, you can name the parameters in your search. For example:
FROM main | my_sourcetype field=host sourcetype="webaccess" status=200 action="purchase" categoryId="simulation"
Another advantage of naming the parameters is that you can specify the parameters in any order. For example:
FROM main | my_sourcetype field=host status=200 categoryId="simulation" action="purchase" sourcetype="webaccess"
Using functions in modules
You create custom command functions in an SPL2 module. You can use a custom command function multiple times within that module. You can also import custom command functions into other modules.
Using functions in the API
You can also create and use custom command functions using the Search Service API.
Supported data types
When you create a custom command function, you have the option to specify the data type for the function or the function parameters.
The default data type for command functions is dataset. The default data type for function parameters is any.
To learn more about the supported data types, see Built-in data types.
Testing custom functions
Test your custom functions to verify that the function works as you intend.
You can test custom functions in isolation, without impacting other SPL2 statements, by using a dataset literal. A dataset literal is a dataset that you type into your search. A dataset literal consists of an array of objects consisting of field value pairs. For more information about dataset literals, see Dataset literals in the SPL2 Search Manual.
The following is an example of a custom function that is used to automate almost all of a complex search. Because the search is run on a regular basis against different datasets and different fields using similar criteria, it is more efficient to create a custom function for that search. Several of the parameters have default values, which can be overridden.
function my_sourcetype($source, $field, $sourcetype: string="webaccess", $time: timespan=5m, $count: int=10) { return | SELECT count(), $field, _time FROM $source WHERE sourcetype=$sourcetype GROUP BY $field, span(_time, $time) HAVING count > $count }
To invoke the function, you specify the dataset and the field. For example, to use this function on the host field in the main dataset, you specify this search:
FROM main | my_sourcetype host
To test this function, substitute a dataset literal for the main dataset.
Suppose these are some of the events in your dataset:
_time | host | sourcetype | action | productId | method |
---|---|---|---|---|---|
6 Apr 2022 9:39:48 PM | www2 | access_combined | purchase | PZ-SG-G05 | POST |
6 Apr 2022 9:34:10 PM | www1 | webaccess | view | | GET |
6 Apr 2022 9:34:02 PM | www3 | webaccess | purchase | SC-MG-G10 | POST |
6 Apr 2022 9:34:01 PM | www2 | access_combined | remove | CU-PG-G06 | GET |
6 Apr 2022 9:34:01 PM | www1 | webaccess | purchase | | POST |
6 Apr 2022 9:29:55 PM | www3 | access_combined | addtocart | SC-MG-G10 | GET |
6 Apr 2022 9:20:51 PM | www1 | access_combined | addtocart | DB-SG-G01 | GET |
6 Apr 2022 9:12:56 PM | www2 | webaccess | changequantity | FS-SG-G03 | GET |
6 Apr 2022 9:12:53 PM | www1 | access_combined | | DB-SG-G01 | GET |
You could select several events as the basis for your dataset literal. In this example, the first four events are used for the dataset literal:
[{_time: 6 Apr 2022 9:39:48, host:"www2", sourcetype:"access_combined", action:"purchase", productId:"PZ-SG-G05", method:"POST"}, {_time: 6 Apr 2022 9:34:10, host:"www1", sourcetype:"webaccess", action:"view", productId:"", method:"GET"}, {_time: 6 Apr 2022 9:34:02, host:"www3", sourcetype:"webaccess", action:"purchase", productId:"SC-MG-G10", method:"POST"}, {_time: 6 Apr 2022 9:34:01, host:"www2", sourcetype:"access_combined", action:"remove", productId:"CU-PG-G06", method:"GET"}]
Here's how the function is invoked in a search using the dataset literal:
FROM [
{_time: 6 Apr 2022 9:39:48, host:"www2", sourcetype:"access_combined", action:"purchase", productId:"PZ-SG-G05", method:"POST"},
{_time: 6 Apr 2022 9:34:10, host:"www1", sourcetype:"webaccess", action:"view", productId:"", method:"GET"},
{_time: 6 Apr 2022 9:34:02, host:"www3", sourcetype:"webaccess", action:"purchase", productId:"SC-MG-G10", method:"POST"},
{_time: 6 Apr 2022 9:34:01, host:"www2", sourcetype:"access_combined", action:"remove", productId:"CU-PG-G06", method:"GET"}]
| my_sourcetype host
You might need to adjust some of the values in the dataset literal to perform a complete test.
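One reason an adjustment can be necessary is that the function's default $count is 10, while the dataset literal contains only four events, so the HAVING count > $count clause would likely filter out every group. Instead of editing the events, you can lower the threshold for the test by passing all of the parameter values positionally. This is a sketch only: <dataset-literal> is a placeholder for the four-event dataset literal shown earlier in this section, and the sketch assumes that a timespan value such as 5m can be passed unquoted, as it is in the function declaration.

```spl2
FROM <dataset-literal>
| my_sourcetype host "webaccess" 5m 0
```

With a threshold of 0, any group that contains at least one matching event passes the HAVING clause, which makes it easier to confirm that the filtering and grouping behave as intended.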
Examples
1. Create a non-generating function
When you create a custom function that is a non-generating command, you must specify at least one function parameter that identifies the dataset that the command operates on.
In SPL2 there is no command that returns the most common values for a field in a dataset. However, you can define a custom function that processes the events in a dataset and returns the top-most values in a specific field.
For example:
function top($source, $field, $limit: int) { return | FROM $source | stats count() by $field | sort -count | head $limit }
- The function, top, defines three parameters.
- The $source parameter identifies the dataset that the command operates on.
- The $field parameter is used in the stats command. The stats command counts the events and organizes the count by the values in the $field.
- The sort command sorts the results returned from the stats command in descending order.
- The $limit parameter specifies the maximum number of results to return. The value specified for $limit must be an integer. The head command returns the top values in the results returned from the sort command up to the maximum number allowed by the $limit parameter.
Here's an example of how the top command function is invoked to return values from the host field and limit the results to 5:
| FROM main
| top host 5
Internally, this search is expanded to this:
FROM main
| stats count() by host
| sort -count
| head 5
For another example of a non-generating function, see Testing custom functions.
2. Specify a default value for a function parameter
You can add a default value to a function parameter. The default value is used when the function is run, unless you override the value.
The previous example defines a custom function, top, that processes the events in a dataset and returns the top-most values in a field. Here is the top function:
function top($source, $field, $limit: int) { return | FROM $source | stats count() by $field | sort -count | head $limit }
You can specify a default value to use for the $limit parameter. In this example, 8 is the default value:
function top($source, $field, $limit: int=8) { return | FROM $source | stats count() by $field | sort -count | head $limit }
When you invoke the top function now, you only need to specify the $field parameter in your search. In this example, host is the field parameter:
| FROM main
| top host
Because there is a default value for $limit, the 8 top-most values are returned.
You can override the default value by specifying the $limit in your search. This example returns 12 values from the host field:
| FROM main
| top host 12
3. Create a generating function using the repeat function
This example shows you how to create a custom command function that generates events using the repeat dataset function.
Start with a simple command function that uses the repeat dataset function:
function makeresults($count) { return | FROM repeat({}, $count) | eval _time = now() | streamstats count | eval _time=_time-(count*3600); }
- The name of the function is makeresults.
- The function includes one parameter, $count, that specifies how many events to create.
- The repeat function creates multiple identical events. The first eval command is used to add the _time field, which contains the timestamp when the event is created.
- The streamstats command creates a count of the events.
- The second eval command is used to convert the timestamps into incremental timestamps, by multiplying the count by 3600, the number of seconds in an hour. That number is subtracted from the time to create the incremental timestamps.
To invoke the makeresults function and create 5 events, use this search:
| makeresults 5
The timestamps are one hour apart, starting with the latest timestamp and ending with the earliest timestamp.
The results look something like this:
_time | count |
---|---|
20 Apr 2022 2:35:58 PM | 1 |
20 Apr 2022 1:35:58 PM | 2 |
20 Apr 2022 12:35:58 PM | 3 |
20 Apr 2022 11:35:58 AM | 4 |
20 Apr 2022 10:35:58 AM | 5 |
Alternatively, you can specify multiple key-value pairs in an object format to create multiple, duplicate fields in each event.
function makeresults($count) { return | FROM repeat({host: "www1", sourcetype: "access_combined"}, $count) | eval _time = now(); }
When you invoke the makeresults function with a count of 3, the results look something like this:
_raw | _time | host | sourcetype |
---|---|---|---|
{"host": "www1", "sourcetype": "access_combined"} | 20 Apr 2022 2:35:58 PM | www1 | access_combined |
{"host": "www1", "sourcetype": "access_combined"} | 20 Apr 2022 2:35:58 PM | www1 | access_combined |
{"host": "www1", "sourcetype": "access_combined"} | 20 Apr 2022 2:35:58 PM | www1 | access_combined |
You can alter the duplicate events by adding the streamstats command to create a count of the events. Use the eval command to alter an event by the count number.
For example:
function makeresults($count) { return | FROM repeat({host: "www1", sourcetype: "access_combined"}, $count) | eval _time = now() | streamstats count() | eval host = if(count=2, "www2", host); }
When you invoke the makeresults function, the results look something like this:
_raw | _time | host | sourcetype | count |
---|---|---|---|---|
{"host": "www1", "sourcetype": "access_combined"} | 20 Apr 2022 2:35:58 PM | www1 | access_combined | 1 |
{"host": "www1", "sourcetype": "access_combined"} | 20 Apr 2022 2:35:58 PM | www2 | access_combined | 2 |
{"host": "www1", "sourcetype": "access_combined"} | 20 Apr 2022 2:35:58 PM | www1 | access_combined | 3 |
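The first version of makeresults in this example hard-codes the one-hour spacing between timestamps as 3600 seconds. As a sketch of how the parameter and default-value syntax could make that spacing adjustable, the following variant adds a second parameter. The function name makeresults_spaced and the $interval parameter are hypothetical.

```spl2
// $interval is the number of seconds between events and defaults to one hour.
function makeresults_spaced($count, $interval: int=3600) {
    return | FROM repeat({}, $count)
        | eval _time = now()
        | streamstats count
        | eval _time = _time - (count * $interval)
}
```

Under these assumptions, | makeresults_spaced 5 would keep the one-hour spacing, and | makeresults_spaced 5 60 would create 5 events with timestamps one minute apart.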
If you want to create multiple events with different field values, use a dataset literal instead. See the next example.
4. Create a generating custom command using a dataset literal
This example shows you how to create a custom command function that generates events using a dataset literal.
This dataset literal contains a set of objects with key-value pairs with the type and name for cooperative and competitive board games:
[ {"type": "cooperative", "name": "Forbidden Island"}, {"type": "cooperative", "name": "Pandemic"}, {"type": "cooperative", "name": "Sherlock Holmes: Consulting Detective"}, {"type": "competitive", "name": "Settlers of Catan"}, {"type": "competitive", "name": "Terraforming Mars"}, {"type": "competitive", "name": "Ticket to Ride"} ]
You can use the dataset literal in a generating command function.
In this example, the name of the function is test_events. Here's how to specify the test_events function with a dataset literal:
function test_events () { return | FROM [ {"type": "cooperative", "name": "Forbidden Island"}, {"type": "cooperative", "name": "Pandemic"}, {"type": "cooperative", "name": "Sherlock Holmes: Consulting Detective"}, {"type": "competitive", "name": "Settlers of Catan"}, {"type": "competitive", "name": "Terraforming Mars"}, {"type": "competitive", "name": "Ticket to Ride"} ] }
Even though the function does not have any parameters, the parentheses are still required after the function name.
To invoke the test_events function, you only need to specify the name of the function:
test_events
The results look something like this:
Event | name | type |
---|---|---|
{"type": "cooperative", "name": "Forbidden Island"} | Forbidden Island | cooperative |
{"type": "cooperative", "name": "Pandemic"} | Pandemic | cooperative |
{"type": "cooperative", "name": "Sherlock Holmes: Consulting Detective"} | Sherlock Holmes: Consulting Detective | cooperative |
{"type": "competitive", "name": "Settlers of Catan"} | Settlers of Catan | competitive |
{"type": "competitive", "name": "Terraforming Mars"} | Terraforming Mars | competitive |
{"type": "competitive", "name": "Ticket to Ride"} | Ticket to Ride | competitive |
To help you get started using dataset literals, see Sample dataset literals in the SPL2 Search Manual. Use the sample dataset literals as-is or as a starting point for your own data.
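Because test_events is a generating command function, its output can be piped into other commands in the same way as any other dataset. As a small sketch, the following search counts the generated events by the type field, using the same stats syntax as the top function in the first example:

```spl2
test_events
| stats count() by type
```

With the six events defined in test_events, this search should return one row for each type, each with a count of 3.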
See also
- Custom eval functions
- Documenting custom functions
- Naming function arguments in the SPL2 Search Manual
This documentation applies to the following versions of Splunk® Cloud Services: current