Search with the Python SDK
This documentation does not apply to the most recent version of Splunk.
Make sure you have authenticated and gotten a session ID.
Create a search
Import necessary modules:
import splunk.search as se
Start a search:
foo = se.dispatch('search error')
Name your search anything. In this example, the search is called foo.
Note: If you are connecting to multiple servers, you also need to provide the hostPath and sessionKey parameters.
This starts a search on the Splunk server for events containing the term error. The call returns a job handle object, here named foo. The handle is keyed off the search job ID that the server generates, which is available via foo.id.
A $JOB.id is a numerical value you can use in your web browser to check on the status of a particular job:
https://localhost:8089/services/search/jobs/12345
where 12345 is the ID that you just generated.
There are a few properties on the SearchJob object that will be of immediate use:
- foo.isDone - a boolean value that indicates whether the search has completed.
- foo.count - the number of events that have been matched against the search.
- foo.cursorTime - the current position of the search cursor; when dispatching a search, the cursor moves in reverse chronological order.
- foo.events - the raw events contained within your search.
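A common use of these properties is to wait for a search to finish before reading its results. The following is a minimal sketch, not part of the SDK: wait_for_job and StubJob are hypothetical names, and the sketch assumes that reading isDone reflects the current state of the job on the server. StubJob only stands in for a real job handle so that the loop can run without a Splunk server.

```python
import time


class StubJob(object):
    """Hypothetical stand-in for a job handle; done after a few polls."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done

    @property
    def isDone(self):
        # Report False for the first polls_until_done reads, then True.
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True


def wait_for_job(job, poll_interval=0.5, timeout=60.0):
    """Poll job.isDone until it is true or the timeout elapses.

    Returns True if the job finished, False if the timeout was reached.
    """
    deadline = time.time() + timeout
    while not job.isDone:
        if time.time() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


job = StubJob(polls_until_done=3)
finished = wait_for_job(job, poll_interval=0.01, timeout=5.0)
print(finished)
```

With a real job handle, the same function would be called as wait_for_job(foo), provided the assumption about isDone holds for your SDK version.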
Regex and Python
You have to be careful about escaping characters when working with regular expressions in Python. The correct way to submit a search that contains a regular expression is to mark the string as a raw string with the r'<string>' prefix:
splunk.search.dispatch(r'search index=mail sourcetype!=sugarstate startminutesago=1440 | rex "\"from\s+(?![^\.]+\.splunk\.[^\s]+)[^\s]+\s+\(\[(?<clientip>\d+\.\d+\.\d+\.\d+)" | where (clientip NOT LIKE "192.%") AND (clientip NOT LIKE "10.%") AND (clientip > "")')
Note that the string is prefixed with 'r', which follows the Python convention for raw string and unicode construction. See the Python regex documentation.
Consider what happens when a search is dispatched without the prefix:
search.dispatch('search foo | rex "this\nthat\"there"')
Python interprets the \n as a literal newline character and the \" as an escaped quote, so neither backslash reaches Splunk.
So Splunk registers your search as:
search foo | rex "this that"there"
Note that the newline has become a space, the middle quote is no longer escaped (it has become "hot"), and the quotes in the regex are now unbalanced.
So you must mark your string as a raw string:
search.dispatch(r'search foo | rex "this\nthat\"there"')
Then Python will pass the string along unprocessed:
search foo | rex "this\nthat\"there"
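The difference can be checked in plain Python without a Splunk server. The following sketch only compares the two string literals from this section; the variable names plain and raw are illustrative.

```python
# The same search text, once as a normal literal and once as a raw literal.
plain = 'search foo | rex "this\nthat\"there"'
raw = r'search foo | rex "this\nthat\"there"'

# The plain literal contains a real newline and no backslashes at all.
print('\n' in plain)
print('\\' in plain)

# The raw literal keeps both backslash sequences exactly as typed.
print('\\n' in raw)
print('\\"' in raw)

# Each interpreted escape sequence shrinks from two characters to one.
print(len(raw) - len(plain))
```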
Work with search results
The foo.events object works just like a list, and you can iterate over it and slice it to obtain specific events. The events are stored in reverse chronological order. (Some of the examples in this section name the job handle job instead of foo.)
for x in job.events:
    print x
This code iterates over every event returned by the search and prints its raw text. The iterator begins returning data as soon as it receives the first event, and continues until isDone is True.
You can also retrieve specific rows of data using the standard Python slice operator:
- foo.events[2] - returns the 3rd event in the search results.
- foo.events[2:10] - returns events 3 through 10 as a list.
- foo.events[-1] - returns the last event in the results.
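The index arithmetic is easy to get wrong by one. The following sketch uses a plain Python list as a stand-in for foo.events, which is an assumption made for illustration only; the event labels are made-up placeholders.

```python
# A plain list stands in for foo.events; the labels are placeholders.
events = ['event %d' % n for n in range(1, 13)]  # 'event 1' .. 'event 12'

third = events[2]               # index 2 is the 3rd event
third_to_tenth = events[2:10]   # indices 2..9: the 3rd through 10th events
last = events[-1]               # the last event in the list

print(third)
print(len(third_to_tenth))
print(last)
```

The slice end index is exclusive, which is why [2:10] stops at the 10th event rather than the 11th.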
The items returned by iterating or slicing are actually result objects that have additional properties:
- job.events[0].raw - the raw event text (the same value as print job.events[0])
- job.events[0].time - the event timestamp, as a datetime.datetime object
- job.events[0].fields - a dictionary of all the fields associated with the event
For example, to see the host field for an event:
job.events[0].fields['host']
Or, to see the host entry for every event:
for x in job.events:
    print x.fields['host']
Or alternatively, in shorthand:
for x in job.events:
    print x['host']
To print a human-readable timestamp for events that came from the 'firewall' sourcetype:
for x in job.events:
    if x['sourcetype'] == 'firewall':
        print x.time.ctime()
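The relationship between the fields dictionary, the shorthand lookup, and the time property can be sketched with a stand-in class. StubEvent and timestamps_for_sourcetype are hypothetical names that are not part of the SDK, and the sample events are invented; the sketch only assumes the raw, time, and fields properties described above.

```python
import datetime


class StubEvent(object):
    """Hypothetical stand-in for a result object: raw text, time, fields."""

    def __init__(self, raw, time, fields):
        self.raw = raw
        self.time = time
        self.fields = fields

    def __getitem__(self, name):
        # Shorthand access: event['host'] reads event.fields['host'].
        return self.fields[name]

    def __str__(self):
        # Printing the object shows the raw event text.
        return self.raw


def timestamps_for_sourcetype(events, sourcetype):
    """Return ctime() strings for events whose sourcetype field matches."""
    return [e.time.ctime() for e in events if e['sourcetype'] == sourcetype]


# Invented sample data, newest first.
events = [
    StubEvent('deny tcp 10.0.0.1', datetime.datetime(2008, 6, 11, 14, 35, 4),
              {'host': 'fw01', 'sourcetype': 'firewall'}),
    StubEvent('GET /index.html 404', datetime.datetime(2008, 6, 11, 14, 30, 0),
              {'host': 'web01', 'sourcetype': 'access_common'}),
]

firewall_times = timestamps_for_sourcetype(events, 'firewall')
print(firewall_times)
```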
When you are finished with the search job, remove it from the server by calling:
job.cancel()
Otherwise, the job will persist on disk until the specified timeout (TTL), which is 24 hours by default.
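One way to make sure the job is always removed, even when processing fails partway through, is to cancel it in a finally clause. This is a sketch under stated assumptions: process_and_cancel and StubJob are hypothetical names, and StubJob only imitates the events property and cancel() method described above so that the example runs without a server.

```python
class StubJob(object):
    """Hypothetical stand-in for a job handle with events and cancel()."""

    def __init__(self, events):
        self.events = events
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


def process_and_cancel(job, handle_event):
    """Pass every event to handle_event, then cancel the job.

    The finally clause runs even if handle_event raises, so the job
    is not left on the server until its TTL expires.
    """
    try:
        for event in job.events:
            handle_event(event)
    finally:
        job.cancel()


seen = []
job = StubJob(['first event', 'second event'])
process_and_cancel(job, seen.append)
print(job.cancelled)
```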
Examples
The following code authenticates, generates a search and returns a search ID.
from httplib2 import Http
from urllib import urlencode
import xml.dom.minidom as xml
# set variables
endpoint = 'https://localhost:8089'
authURI = endpoint + '/services/auth/login/'
jobURI = endpoint + '/services/search/jobs/'
authData = {'username': 'admin', 'password': 'changeme'}
headers = {}
# initialize our connection handler
h = Http()
# open a connection and do a POST for auth
resp, content = h.request(authURI, "POST", urlencode(authData))
# parse our token out of the response
xmlDoc = xml.parseString(content)
tokenElements = xmlDoc.getElementsByTagName('sessionKey')
if not tokenElements:
    print 'No session key found! Are you running the free version?'
    tokenElements = xmlDoc.getElementsByTagName('msg')
    print 'Reason=%s' % tokenElements[0].firstChild.nodeValue
    headers['Authorization'] = ''
else:
    sessionKey = tokenElements[0].firstChild.nodeValue
    print 'sessionKey=%s' % sessionKey
    headers['Authorization'] = 'Splunk %s' % sessionKey
# set up our search job
postargs = { 'search': "search * hoursago=24" }
payload = urlencode(postargs)
# open a connection and do a POST for a new job
resp, content = h.request(jobURI, "POST", headers=headers, body=payload)
print 'server returned code %s.' % resp.status
print content
You should get a job ID returned:
server returned code 201.
<?xml version='1.0'?>
<response><sid>1213220104.17</sid></response>
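To use the job ID in later requests, it has to be extracted from the XML response. The following minimal sketch uses the same xml.dom.minidom module as the code above. extract_sid is a hypothetical helper name, the sample string is the response shown above, and the status URL follows the jobs URL form given earlier on this page.

```python
import xml.dom.minidom as xml


def extract_sid(content):
    """Return the text of the first <sid> element, or None if absent."""
    doc = xml.parseString(content)
    nodes = doc.getElementsByTagName('sid')
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.nodeValue


# Sample response body, as returned by the job creation request above.
content = "<?xml version='1.0'?><response><sid>1213220104.17</sid></response>"

sid = extract_sid(content)
status_url = 'https://localhost:8089/services/search/jobs/' + sid
print(sid)
print(status_url)
```

Returning None when no sid element is present lets the caller distinguish a failed dispatch (for example, an authentication error) from a successful one before building further URLs.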
The following example returns results from a remote server.
import splunk.auth
import splunk.search as se
import time
splunk.mergeHostPath('https://foo.example.com:8089', True)
splunk.auth.getSessionKey('admin', 'changeme')
job = se.dispatch('search sourcetype=access_common 404')
print job.isDone
for result in job:
    print result
This documentation applies to the following versions of Splunk: 3.3, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.5, 3.4.6, 3.4.8, 3.4.9, 3.4.10, 3.4.11, 3.4.12, 3.4.13, 3.4.14