Custom search commands
This documentation does not apply to the most recent version of Splunk.
Although the Splunk search language is extensive, you may want to write your own search command. You can do this by adding a custom search script to Splunk. Currently, search scripts must be written in Python. Also note that the search command API does not support recursive searching; if you want to build a search that runs recursively, use the REST search API.
Get started
There are two steps to building a search command into Splunk:
1. Build your search command in Python.
2. Add an entry to commands.conf to make Splunk aware of your custom command.
Types of commands
- streaming
  - A streaming command is applied to results as they travel through the search pipeline.
  - If your script is not streaming, it processes only a single chunk of results. If your script is the first non-streaming command in the pipeline, or if 'requires_preop' is set to true (it is false by default), you can specify a search, containing only streaming commands, to run before your non-streaming script.
- generating
  - A generating command must be the first command in a search. Because it receives no input results, a generating command relies on its arguments to determine what to produce.
- retevs
  - Corresponds to 'retainsevents' in commands.conf.
  - This setting indicates that the script outputs events when given events as input.
  - It is false by default, which means the Timeline never represents the output of this command. Although there is no universal definition of an event, in general, if your script retains the '_raw' and '_time' fields, set retevs to true.
- reqsop
  - Corresponds to 'requires_preop' in commands.conf.
  - This setting indicates whether the search string in the 'preop' variable must be executed, regardless of whether this script is the first non-streaming command in the search pipeline.
- timeorder
  - Corresponds to both 'generates_timeorder' and 'overrides_timeorder' in commands.conf.
  - 'overrides_timeorder' indicates that the script changes the time order of its input. For example, if the input to this script is in descending time order, the output will be in ascending time order.
  - 'generates_timeorder' applies only to generating commands. It indicates that the script ignores the order of the input and always generates output in descending time order.
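The streaming distinction matters because, as described above, a streaming command is applied to results as they travel through the pipeline, so it may see them piece by piece. The stdlib-only illustration below (not Splunk code; `add_doubled_field`, `count_events`, and `run_in_chunks` are hypothetical names) simulates chunked delivery: a per-event transform produces the same output either way, while an aggregation does not, which is why an aggregating script should not be declared streaming.

```python
# Illustration only: why per-event transforms can be streaming and
# aggregations cannot. All function names here are hypothetical.


def add_doubled_field(results):
    """Per-event transform: each output event depends only on its own input event."""
    out = []
    for event in results:
        new_event = dict(event)
        new_event["doubled"] = str(2 * int(event["count"]))
        out.append(new_event)
    return out


def count_events(results):
    """Aggregation: the single output row depends on every input event."""
    return [{"total": str(len(results))}]


def run_in_chunks(command, results, chunk_size):
    """Simulate a pipeline that hands the command one chunk of results at a time."""
    out = []
    for start in range(0, len(results), chunk_size):
        out.extend(command(results[start:start + chunk_size]))
    return out


if __name__ == "__main__":
    events = [{"count": str(n)} for n in range(5)]
    # Same output whether chunked or not: safe to treat as streaming.
    print(run_in_chunks(add_doubled_field, events, 2) == add_doubled_field(events))  # True
    # Different output when chunked: this command must be non-streaming.
    print(run_in_chunks(count_events, events, 2))  # [{'total': '2'}, {'total': '2'}, {'total': '1'}]
    print(count_events(events))  # [{'total': '5'}]
```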
Build your search command in Python
Python search commands use splunk.Intersplunk (Intersplunk.py) to read events from the search pipeline and pass the modified events back. The arguments passed to your script in sys.argv are the same arguments you use when invoking the command in a search.
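For illustration, the hypothetical helper below splits the search arguments in sys.argv into plain keywords and key=value options. Only the fact that your command's arguments arrive in sys.argv comes from this page; the key=value convention, and the assumption that each search argument arrives as a separate sys.argv entry, are choices made for this sketch.

```python
import sys


def split_arguments(argv):
    """Hypothetical helper: split argv[1:] into keywords and key=value options.

    argv[0] is the script path, so only argv[1:] holds the search arguments.
    A token counts as an option only if it has a non-empty key before '='.
    """
    keywords = []
    options = {}
    for arg in argv[1:]:
        key, sep, value = arg.partition("=")
        if sep and key:
            options[key] = value
        else:
            keywords.append(arg)
    return keywords, options


if __name__ == "__main__":
    keywords, options = split_arguments(sys.argv)
    print(keywords, options)
```

Under these assumptions, arguments such as `count field=price` would arrive as `['<script path>', 'count', 'field=price']` and split into `['count']` and `{'field': 'price'}`.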
Handling input
The simplest way to get data into your search command is splunk.Intersplunk.readResults, which takes three optional parameters, 'input_buf', 'settings', and 'has_header', and returns a list of dicts representing the input events.
- 'input_buf' = None | file
  - Indicates where to read input from.
  - Defaults to None, which means your search command reads its data from sys.stdin.
- 'settings' = None | dict
  - Indicates where to store any information found in the input header.
  - Defaults to None, which means the settings are not recorded.
- 'has_header' = True | False
  - Indicates whether to expect an input header. Defaults to True.
Here's an example call to splunk.Intersplunk.readResults:
results = splunk.Intersplunk.readResults(None, None, True)
This call reads results from the search pipeline on sys.stdin. The input to your script is either pure CSV, or a header section followed by a blank line followed by pure CSV. Because has_header is True in the above example, your command expects a header section before the results. If you set it to False, you must also set enableheader = false in the commands.conf entry for your command.
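To make that input format concrete, here is a stdlib-only sketch of a reader that handles an optional header section, a blank line, and then CSV. It is not the Intersplunk implementation: `read_results_sketch` is a hypothetical name, and the key:value layout of header lines is an assumption this page does not confirm.

```python
import csv
import io


def read_results_sketch(stream, has_header=True):
    """Parse input in the format described above into (settings, results).

    Assumption (not confirmed by this page): each header line has the form
    key:value, and the header ends at the first blank line. Header lines
    without a colon are ignored here.
    """
    settings = {}
    if has_header:
        while True:
            line = stream.readline().rstrip("\r\n")
            if not line:  # blank line (or end of input) ends the header section
                break
            key, sep, value = line.partition(":")
            if sep:
                settings[key] = value
    # Everything after the header is plain CSV; its first row names the fields.
    results = [dict(row) for row in csv.DictReader(stream)]
    return settings, results


if __name__ == "__main__":
    raw = "key1:value1\nkey2:value2\n\nname,count\nfoo,1\nbar,2\n"
    print(read_results_sketch(io.StringIO(raw)))
```

As with readResults, every result ends up in memory as a list of dicts, each mapping field names to string values.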
If your script does not expect a header section in the input, you can directly use the Python csv module to read the input. For example:
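import csv
import sys
reader = csv.reader(sys.stdin)
for row in reader:
    # each row is a list of field values; the first row holds the field names
    # process the row here (you can break out of the loop at any time)
    pass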
The advantage of this approach is that you can break out of the for loop at any time, and only the input lines you have iterated over will have been read into memory. Because readResults returns every result in a single list, this can perform much better in some cases.
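A small sketch of that early-exit pattern, using csv.DictReader so each row arrives as a dict keyed by field name. `first_n_rows` is a hypothetical helper; itertools.islice stops the loop after n rows, which has the same effect as a `break`, and the demo shows that the remaining input is left unread.

```python
import csv
import io
import itertools


def first_n_rows(stream, n):
    """Read at most n data rows from headerless CSV input, as dicts.

    csv.DictReader pulls lines from the stream lazily, so stopping after n rows
    (islice here, equivalent to breaking out of a for loop) leaves the rest of
    the stream unread.
    """
    reader = csv.DictReader(stream)
    return [dict(row) for row in itertools.islice(reader, n)]


if __name__ == "__main__":
    buf = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
    print(first_n_rows(buf, 2))  # [{'a': '1', 'b': '2'}, {'a': '3', 'b': '4'}]
    print(repr(buf.read()))      # '5,6\n' (never read by the CSV reader)
```

In a real script you would pass sys.stdin instead of the in-memory buffer.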
Sending output
Intersplunk can also be used to construct your script's output. splunk.Intersplunk.generateErrorResults takes a string and writes the correct error output to sys.stdout. splunk.Intersplunk.outputResults takes a list of dict objects and writes the appropriate CSV output to sys.stdout.
To output data, add:
splunk.Intersplunk.outputResults(results)
The output of your script is expected to be pure CSV. To indicate an error, return a CSV with a single "ERROR" column and a single row (besides the header row) with the contents of the message.
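splunk.Intersplunk.outputResults and generateErrorResults produce this output for you. Purely to show what the format looks like, here is a stdlib-only sketch; `write_results_sketch` and `write_error_sketch` are hypothetical names, and the column ordering and "\n" line endings are choices made for this sketch, not documented Intersplunk behavior.

```python
import csv
import io


def write_results_sketch(results, out):
    """Write a list of dicts as pure CSV: a header row, then one row per result.

    Columns follow the order in which field names first appear; a result that
    lacks a field gets an empty value in that column.
    """
    fieldnames = []
    for result in results:
        for key in result:
            if key not in fieldnames:
                fieldnames.append(key)
    if not fieldnames:
        return  # nothing to write
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)


def write_error_sketch(message, out):
    """Error output: a single ERROR column with the message as its only row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ERROR"])
    writer.writerow([message])


if __name__ == "__main__":
    buf = io.StringIO()
    write_error_sketch("No arguments provided", buf)
    print(buf.getvalue())
```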
Debugging your script
If your command's commands.conf entry sets 'supports_getinfo' = true, the first argument to your script is either __GETINFO__ or __EXECUTE__. Setting 'supports_getinfo' = true is a useful debugging tool, because your script is then called with the command arguments at parse time, before the search executes. Any syntax errors then stop your search from being executed. When your script is called with __GETINFO__, it can also specify its properties (such as whether it is streaming) dynamically, based on its arguments.
If your script has 'supports_getinfo' set to true, first make a call like:
(isgetinfo, sys.argv) = splunk.Intersplunk.isGetInfo(sys.argv)
This call strips the first argument from sys.argv and tells you whether the script is in GETINFO mode or EXECUTE mode. In GETINFO mode, your script should use splunk.Intersplunk.outputInfo() to return its properties, or splunk.Intersplunk.parseError() if the arguments are invalid. The definition of outputInfo() is as follows:
def outputInfo(streaming, generating, retevs, reqsop, preop, timeorder=False):
Note: You can also set these attributes in commands.conf.
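A stdlib-only sketch of the mode check, assuming the behavior described above: the marker is the first argument after the script path, and it is removed from the returned argument list. `is_getinfo_sketch` is a hypothetical stand-in, not the Intersplunk implementation, and raising ValueError when the marker is missing is this sketch's own choice. A real script calls splunk.Intersplunk.isGetInfo and, in GETINFO mode, splunk.Intersplunk.outputInfo.

```python
def is_getinfo_sketch(argv):
    """Hypothetical stand-in for splunk.Intersplunk.isGetInfo.

    Checks argv[1] for __GETINFO__ or __EXECUTE__ and returns
    (is_getinfo, argv with that marker removed).
    """
    if len(argv) < 2 or argv[1] not in ("__GETINFO__", "__EXECUTE__"):
        # This sketch's choice; the real function may behave differently.
        raise ValueError("expected __GETINFO__ or __EXECUTE__ as the first argument")
    return argv[1] == "__GETINFO__", argv[:1] + argv[2:]


if __name__ == "__main__":
    getinfo, args = is_getinfo_sketch(["script.py", "__GETINFO__", "sma5(price)"])
    # In GETINFO mode a real script would now report its properties, e.g.
    # outputInfo(streaming, generating, retevs, reqsop, preop, timeorder).
    print(getinfo, args)  # True ['script.py', 'sma5(price)']
```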
Add an entry to commands.conf
You must create a commands.conf entry for your command in $SPLUNK_HOME/etc/apps/<app_name>/local/commands.conf. For all the possible settings, see commands.conf.spec in the Admin Manual.
Here is a very basic example that just enables your script:
[<script_name>]
filename = mypyscript.py
The stanza name in commands.conf is the name you use to invoke your script from a search. You must also set the 'filename' key to the name of the script file. Put your script in either $SPLUNK_HOME/etc/apps/<app_name>/bin/ or $SPLUNK_HOME/etc/searchscripts; the app directory is preferred.
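A fuller stanza combines 'filename' with the settings discussed on this page (streaming, retainsevents, supports_getinfo, enableheader, and so on). As a sketch, the hypothetical helper below renders such a stanza with Python's configparser, writing booleans as lowercase true/false; the command and file names in the demo are placeholders, and you should check commands.conf.spec for the keys and values your version supports.

```python
import configparser
import io


def commands_conf_stanza(command_name, filename, **settings):
    """Render one commands.conf stanza as text.

    Python booleans become lowercase true/false; other values are written
    with str(). The caller chooses which keys to set (see commands.conf.spec).
    """
    parser = configparser.ConfigParser(interpolation=None)  # no '%' expansion
    parser.optionxform = str  # keep key names exactly as given
    parser[command_name] = {"filename": filename}
    section = parser[command_name]
    for key, value in settings.items():
        section[key] = str(value).lower() if isinstance(value, bool) else str(value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder names; settings match a non-streaming script that reads
    # headerless input and supports GETINFO.
    print(commands_conf_stanza("mycommand", "mypyscript.py",
                               streaming=False, retainsevents=True,
                               supports_getinfo=True, enableheader=False))
```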
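Example
The following script computes moving-average trend lines. Each argument has the form <type><period>(<fieldname>) [as <newname>], where <type> is sma (simple), wma (weighted), or ema (exponential moving average), and the computed value is added to each result as a new field.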
# Copyright (C) 2005-2009 Splunk Inc. All Rights Reserved. Version 3.0
import sys
import splunk.Intersplunk

# Strip the __GETINFO__/__EXECUTE__ marker from the arguments (supports_getinfo = true)
(isgetinfo, sys.argv) = splunk.Intersplunk.isGetInfo(sys.argv)

if len(sys.argv) < 2:
    splunk.Intersplunk.parseError("No arguments provided")

trendInfoList = []  # list of dictionaries of information about trendlines
validTypes = ['sma', 'wma', 'ema']
maxPeriod = 10000

i = 1
while i < len(sys.argv):
    # expect argument in format: <type><period>(<fieldname>) [as <newname>]
    arg = sys.argv[i]
    pos = arg.find('(')
    if (pos < 1) or arg[-1] != ')':
        splunk.Intersplunk.parseError("Invalid argument '%s'" % arg)
    name = arg[0:pos]
    field = arg[pos+1:len(arg)-1]
    if len(field) == 0 or field[0:2] == '__':
        splunk.Intersplunk.parseError("Invalid or empty field '%s'" % field)

    # split the name into a trend type and a period, e.g. 'sma5' -> ('sma', 5)
    trendtype = None
    period = 0
    try:
        for t in validTypes:
            if name[0:len(t)] == t:
                trendtype = t
                period = int(name[len(t):])
                if (period < 2) or (period > maxPeriod):
                    raise ValueError
    except ValueError:
        splunk.Intersplunk.parseError("Invalid trend period for argument '%s'" % arg)
    if trendtype is None:
        splunk.Intersplunk.parseError("Invalid trend type for argument '%s'" % arg)

    # optional 'as <newname>' renames the output field
    newname = arg
    if (i + 2 < len(sys.argv)) and (sys.argv[i+1].lower() == "as"):
        newname = sys.argv[i+2]
        i += 3
    else:
        i += 1
    trendInfoList.append({'type': trendtype, 'period': period,
                          'field': field, 'newname': newname,
                          'vals': [], 'last': None})

if isgetinfo:
    # streaming, generating, retevs, reqsop, preop, timeorder
    splunk.Intersplunk.outputInfo(False, False, True, False, None, True)
    # outputInfo automatically calls sys.exit()

# has_header is False, so this command's commands.conf entry needs enableheader = false
results = splunk.Intersplunk.readResults(None, None, False)

for res in results:
    # each res is a dict of fields to values
    for ti in trendInfoList:
        if ti['field'] not in res:
            continue
        try:
            ti['vals'].append(float(res[ti['field']]))
        except ValueError:
            continue  # ignore non-numeric values
        if len(ti['vals']) > ti['period']:
            ti['vals'].pop(0)
        elif len(ti['vals']) < ti['period']:
            continue  # not enough data yet

        newval = None
        if ti['type'] == 'sma':
            # simple moving average
            newval = sum(ti['vals']) / ti['period']
        elif ti['type'] == 'wma':
            # weighted moving average: weights 1..period, heaviest on the newest value
            total = 0
            for j in range(len(ti['vals'])):
                total += (j + 1) * ti['vals'][j]
            newval = total / (ti['period'] * (ti['period'] + 1) / 2)
        elif ti['type'] == 'ema':
            # exponential moving average, seeded with the newest value
            if ti['last'] is None:
                newval = ti['vals'][-1]
            else:
                alpha = float(2.0 / (ti['period'] + 1.0))
                newval = (alpha * ti['vals'][-1]) + (1 - alpha) * ti['last']
            ti['last'] = newval

        res[ti['newname']] = str(newval)

splunk.Intersplunk.outputResults(results)
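Because the example script needs Splunk to run, the stdlib-only function below restates its moving-average arithmetic so you can check the numbers in isolation. It is a sketch that mirrors the example's loop for a single field: `moving_averages` is a hypothetical name, and unlike the script it expects numeric input rather than skipping non-numeric values.

```python
def moving_averages(values, kind, period):
    """Mirror the trendline example's arithmetic for one field.

    Returns one entry per input value: None until `period` values have been
    seen, then the sma, wma, or ema at that point.
    """
    window = []
    last_ema = None
    out = []
    for v in values:
        window.append(float(v))
        if len(window) > period:
            window.pop(0)
        if len(window) < period:
            out.append(None)  # not enough data yet
            continue
        if kind == "sma":
            newval = sum(window) / period
        elif kind == "wma":
            # weights 1..period, heaviest on the most recent value
            weighted = sum((j + 1) * x for j, x in enumerate(window))
            newval = weighted / (period * (period + 1) / 2.0)
        elif kind == "ema":
            if last_ema is None:
                newval = window[-1]  # seed with the newest value, as the example does
            else:
                alpha = 2.0 / (period + 1.0)
                newval = alpha * window[-1] + (1 - alpha) * last_ema
            last_ema = newval
        else:
            raise ValueError("unknown trend type: %r" % kind)
        out.append(newval)
    return out


if __name__ == "__main__":
    print(moving_averages([1, 2, 3, 4], "sma", 2))  # [None, 1.5, 2.5, 3.5]
```

For period 2 the EMA smoothing factor is 2 / 3, so the input 1, 2, 3, 4 gives None, 2, 8/3, and 32/9.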
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8