Data access automation API

Splunk Phantom's Automation API allows security operations teams to develop detailed and precise automation strategies. Playbooks can serve many purposes, ranging from automating minimal investigative tasks that can speed up analysis to large-scale response to a security breach. The following APIs are supported to leverage the capabilities of the platform using Playbooks.

collect

 phantom.collect(container, #this can be a container or an action results object
                datapath,
                scope='new',
                limit=100,
                none_if_first=False)

This API allows users to collect or gather any information from the associated Artifacts of a Container or action results that you get in the action callback or via the get_action_results() API.

For example, to obtain a listing of all IP addresses or all file hashes across all Artifacts, use this API and specify the appropriate datapath into the Artifact JSON. To extract every 'country_iso_code' from the results of the 'geolocate ip' action, pass in the 'results' object instead. You can specify a single datapath as a string, or several datapaths as a list of datapath strings.

Parameter	Description
container	The container that is available to the user in on_start() or any action callback or in on_finish() or this can be a results object that you get in the action callback or via the get_action_results() API.
datapath	The path of the element in the JSON schema, used to retrieve it from the associated Artifacts of a Container or from the action results object. Example datapaths for a Container: Collect all fileHashes from all the artifacts of a container: phantom.collect(container, "artifact:*.cef.fileHash") Collect all fileHashes of a specific type (events) of a container: phantom.collect(container, "artifact:events.cef.fileHash") You can specify a substring to be searched across matching artifact types. The substring only applies to artifact type. phantom.collect(container, "artifact:event.cef.fileHash") This will find file hashes across all the artifacts that have "event" as a substring. Hence "FW Events" and "Network Events" should match, but not "Alert". Example datapaths for action results: These are exactly as specified in the 'Action Output' section of each app's action. Extract longitude from the results of the 'geolocate ip' action: phantom.collect(results, "action_result.data.*.longitude") Extract the number of positive detections from the results of the 'file reputation' action via the VirusTotal app: phantom.collect(results, "action_result.data.*.positives") Extract 3 items from action results of the 'file reputation' action via the 'Reversing Labs' app: def file_reputation_cb(action, success, container, results, handle): paths = ['action_result.data.*.status', 'action_result.parameter.hash', 'action_result.summary.positives'] data = phantom.collect(results, paths) phantom.debug(data) Example output (list of lists): [ [ "MALICIOUS", "70FEEC581CD97454A74A0D7C1D3183D1", 26 ] ] If the datapath was specified as a string, the result is a list (NOT a list of lists), unlike the above output. data = phantom.collect(results, 'action_result.parameter.hash') phantom.debug(data) Example output (list): [ "70FEEC581CD97454A74A0D7C1D3183D1" ] If you specify a list of datapaths for extracting data from action results, the results will be a table, where each column represents the respective datapath. If you specify a single datapath as a string, Phantom will simplify and return just the data corresponding to one column.
scope	This OPTIONAL parameter (default = 'new') defines the time window of artifacts from which data is collected. A scope of 'new' collects only from artifacts added since the playbook last ran on that container. A scope of 'all' collects from all artifacts belonging to the container. An active playbook runs on a container after it has been created and every time new artifacts are added to the container, so 'scope' is especially useful when you want every playbook run to process only the artifacts that have been added since the previous run. However, every time you modify the playbook it is considered a new playbook, so its first execution starts with all artifacts in the container up to that point in time; after that, scope='new' collects only what has been added since the previous execution. See the following two parameters to further control the behavior of this API.
limit	This OPTIONAL parameter enforces the maximum number of artifacts that can be retrieved in this call. Default is 100.
none_if_first	When the collect call is executed from a playbook for the first time on a container, even with scope='new', it will collect all the artifacts since the container was created. This parameter allows you to change the behavior of the collect call executed for the first time from this playbook on a container. Use this parameter to specify whether you would like the playbook to collect all artifacts since the container was created, or only those added since the first time the playbook was executed on the container. If you would like the playbook to not get any existing artifacts the first time it is run on the container, specify True for this parameter. Then, on subsequent runs, it will only get the artifacts added since the first run.
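
When a list of datapaths is passed, each row of the returned table lines up positionally with that list. The sketch below pairs the two so later code can refer to a value by its datapath rather than by column index. `rows_to_dicts` is a hypothetical helper and not part of the Phantom API; the sample values are taken from the example output in the datapath description, and in a real callback the rows would come from `phantom.collect(results, paths)`.

```python
def rows_to_dicts(paths, rows):
    """Pair each collected row with the datapaths that produced it.

    Hypothetical helper: `rows` is assumed to have the list-of-lists shape
    that phantom.collect() returns when `paths` is a list of datapaths.
    """
    return [dict(zip(paths, row)) for row in rows]


# Two of the datapaths from the 'file reputation' example, with its sample values.
paths = [
    "action_result.parameter.hash",
    "action_result.summary.positives",
]
rows = [["70FEEC581CD97454A74A0D7C1D3183D1", 26]]

records = rows_to_dicts(paths, rows)
print(records[0]["action_result.summary.positives"])
```

Keying by datapath keeps the code readable if the list of datapaths is later reordered or extended.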

collect2

 phantom.collect2(container=None,
             action_results=None,
             action_name=None,
             datapath=None,
             filter_artifacts=None,
             tags=None,
             scope='new',
             limit=100,
             trace=False)

This is an extension of the phantom.collect() API. It adds the filter_artifacts parameter, a list of artifacts whose values will be returned.

Parameter	Required?	Description
container	Required	This is the container dictionary object that is passed to the playbook across various functions.
action_results	Optional, unless `action_name` is not provided.	These are the action results passed into any callback function, or a subset of action results that had been filtered from a `phantom.condition()` call.
action_name	Optional, unless `action_results` is not provided.	This is the custom 'name' specified for the action in the phantom.act() API. This allows action results to be returned based on the action name.
datapath	Required	A list of datapaths. A datapath is the path of the element in the JSON schema to be able to access/retrieve it from associated action results or artifacts. Please refer to the phantom.collect() API for examples.
filter_artifacts	Optional	These are ids of artifacts that were returned from a `phantom.condition()` call.
tags	Optional	A list of tags used to further filter artifacts.
scope	Optional	Scope of artifacts to retrieve, defaults to 'new'. Please refer to the `phantom.collect()` API for more details.
limit	Optional	Maximum number of results to be returned. Please refer to phantom.collect() API documentation.
trace	Optional	Set this parameter to 'True' for verbose debugging of the API call.
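
A common next step after collect2 is turning the returned rows into the `parameters` list for phantom.act(). The following is a minimal sketch of that step, assuming collect2 returns one row per match with one column per datapath, in the same way collect does for a list of datapaths. `build_ip_parameters` is a hypothetical helper, the sample rows are invented, and the `artifact:*.id` datapath and the `context` key are assumptions that should be checked against your own playbooks.

```python
def build_ip_parameters(rows):
    """Turn rows of [source_address, artifact_id] into action parameters.

    Hypothetical helper. `rows` is assumed to come from a call such as:
        phantom.collect2(container=container,
                         datapath=["artifact:*.cef.sourceAddress", "artifact:*.id"])
    """
    parameters = []
    for address, artifact_id in rows:
        if not address:
            # Assumed: an artifact without the field yields an empty value.
            continue
        parameters.append({"ip": address, "context": {"artifact_id": artifact_id}})
    return parameters


# Invented sample rows for illustration only.
sample_rows = [["8.8.8.8", 101], [None, 102], ["1.1.1.1", 103]]
print(build_ip_parameters(sample_rows))
```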

collect_from_contains

 phantom.collect_from_contains(container=None,
                              action_results=None,
                              contains=None,
                              tags=None,
                              scope=None,
                              filter_artifacts=None,
                              include_params=True,
                              limit=None,
                              trace=False)

This API is meant to function similarly to collect, but instead of needing to know the datapaths for the values you want, you instead provide a contains value. This will return a flat list of all the unique values which match at least one contains in the list. Returns None on failure.

Parameter	Required?	Description
container	Optional, unless `action_results` is not provided.	Container, passing this will search for contains in the CEF values of that container.
action_results	Optional, unless `container` is not provided.	Action result, like what is passed to a callback from `phantom.act()` (as 'results'). Search for values matching the contains in this action result.
contains	Required	A list of contains to filter by.
tags	Optional	A list of tags used to further filter artifacts.
filter_artifacts	Optional	These are ids of artifacts that were returned from a `phantom.condition()` call.
include_params	Optional	If set to false, ignore values with matching contains if they are a parameter to an action. This value is only used if an action_result is passed in.
scope	Optional	Scope of artifacts to retrieve, defaults to 'new'. Please refer to the `phantom.collect()` API for more details. This value is only used if a container is provided.
limit	Optional	Maximum number of artifacts to match. This value is only used if a container is provided.
trace	Optional	Set this parameter to 'True' for verbose debugging of the API call.

import phantom.rules as phantom

def geolocate_ip(action, success, container, results, handle):

    # We have already created various artifacts for this event
    collected_ips = phantom.collect_from_contains(container=container, contains=["ip"])
    # [ "8.8.8.8", "8.8.4.4", "1.1.1.1", ... ]

    parameters = []
    for ip in collected_ips or []:  # collect_from_contains returns None on failure
        parameters.append({
            'ip': ip
        })

    phantom.act("geolocate ip", parameters=parameters, app={ "name": "MaxMind" }, name="geolocate_ip")
    return

def collect_from_action_result(results):

    return phantom.collect_from_contains(action_results=results, contains=["url", "domain"])

get_action_results

 phantom.get_action_results(action=None,
                           action_run_id = 0,
                           app_run_id = 0,
                           result_data=True,
                           action_name=None,
                           playbook_run_id=0,
                           flatten=True)

This is an API supported for the purposes of letting the user retrieve the action results at any time using the action json that was given in the action callback or the action_run_id that was in the action json. The API call get_summary() also returns one or more app_run_id(s) that can be passed in as the optional parameter.

Parameter	Required?	Description
action	Optional, unless `action_run_id` and `app_run_id` are not provided.	Action json object provided in the action callback. Using this provides the action results from the action that completed and triggered the callback function.
action_run_id	Optional, unless `action` and `app_run_id` are not provided.	ID of the action run. Using this allows you to obtain action results from any completed action runs from the current playbook. 'action_run_id' can be obtained from the above noted action json object or by calling the phantom.get_summary() API which enumerates all the actions that were executed in the playbook.
app_run_id	Optional, unless `action` and `action_run_id` are not provided.	ID of the app run. This 'app_run_id' can be obtained by calling the phantom.get_summary() API which enumerates all the actions that were executed in the playbook.
result_data	Optional	This is a boolean parameter, default is True. If the user does NOT need to obtain the full action results (which in some cases can be a lot of data) but just summary information, this parameter should be specified as False.
action_name	Optional	This is the unique name provided to an action execution via the phantom.act() parameter 'name'.
playbook_run_id	Optional	This is the playbook run id that uniquely identifies the playbook execution instance. Default value of 0 implies the current playbook execution instance.
flatten	Optional	boolean. Default=True. An action can be executed on more than one asset and for many sets of parameters. Flattening provides a result dictionary object for each combination of asset and parameter even if many parameters were used in a single action. Setting this variable to False generates results as provided in action callbacks or when viewing the action results in Investigation widgets.

A single phantom.act() API call can be executed on multiple sets of parameters on more than one asset. Each instance of phantom.act() call is identified by a unique 'action_run_id'. One action execution on each asset results in a corresponding app execution, each of which is identified by a unique 'app_run_id'. Parameters of an action execution via each app (on their respective asset) can be part of the same app run.

import phantom.rules as phantom
import json

def collect_params(container, datapath, key_name):
    params = []
    items = set(phantom.collect(container, datapath, scope='all'))
    for item in items:
        params.append({key_name:item})
    return params

def on_start(container):

    parameters = collect_params(container, 'artifact:*.cef.sourceAddress', 'ip')
    phantom.act('geolocate ip', parameters=parameters, name='my_geolocate_ip')

    return

def on_finish(container, summary):

    summary_json = phantom.get_summary()
    if 'result' in summary_json:
        for action_result in summary_json['result']:
            if 'action_run_id' in action_result:
                action_results = phantom.get_action_results(
                                    action_run_id=action_result['action_run_id'],
                                    result_data=False, flatten=False)
                phantom.debug(action_results)


    return

The return value of this API is a list of JSON dictionaries, one per app run (an app run is created for each asset the action ran on), each containing an 'action_results' list.

NOTE that the action_result JSON object shown below is generated with parameters result_data=False and flatten=False sent to get_action_results() API in the playbook shown above. If the parameter result_data was specified as True, the dictionaries in the 'action_results' list would also have to include 'data' that has the full action result information. Setting the flatten parameter to True generates the same data but nested 'action_results' data lists are reorganized to have a rather flat hierarchy with a list of higher level objects. This is primarily for backward compatibility.

[

    {
        "asset_id": 237,
        "status": "success",
        "name": "my_geolocate_ip",
        "app": "MaxMind",
        "action_results": [
            {
                "status": "success",
                "message": "Country: France",
                "parameter": {
                    "ip": "2.2.2.2",
                    "context": {...}
                },
                "summary": {
                    "country": "France"
                }
            },
            {
                "status": "success",
                "message": "Country: Australia",
                "parameter": {
                    "ip": "1.1.1.1",
                    "context": {...}
                },
                "summary": {
                    "country": "Australia"
                }
            }
        ],
        "app_id": 42,
        "app_run_id": 1076,
        "asset": "maxmind",
        "action": "geolocate ip",
        "message": "'my_geolocate_ip' on asset 'maxmind': 2 actions succeeded... ",
        "summary": { ...},
        "action_run_id": 1083
    }
]
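
The nested layout above (app run, then 'action_results', then 'parameter' and 'summary') can be walked with two loops. The sketch below maps each queried IP to the country in its summary. `countries_by_ip` is a hypothetical helper, and the input is assumed to have the shape produced with flatten=False; the sample is a trimmed copy of the output above.

```python
def countries_by_ip(app_runs):
    """Map each queried IP to the country reported in its result summary.

    Hypothetical helper; `app_runs` is assumed to have the shape returned by
    phantom.get_action_results(..., flatten=False), as in the sample above.
    """
    found = {}
    for app_run in app_runs:
        for result in app_run.get("action_results", []):
            if result.get("status") != "success":
                continue  # skip failed parameter sets
            ip = result.get("parameter", {}).get("ip")
            country = result.get("summary", {}).get("country")
            if ip is not None:
                found[ip] = country
    return found


# Trimmed copy of the sample output shown above.
sample = [
    {
        "app_run_id": 1076,
        "action_run_id": 1083,
        "action_results": [
            {"status": "success", "parameter": {"ip": "2.2.2.2"},
             "summary": {"country": "France"}},
            {"status": "success", "parameter": {"ip": "1.1.1.1"},
             "summary": {"country": "Australia"}},
        ],
    }
]
print(countries_by_ip(sample))
```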

get_extra_data

phantom.get_extra_data(action, action_run_id, app_run_id)

This API lets the user retrieve the extra data gathered during an action execution. In some cases the action result is too large to move around in the UI or to show to users all the time. App authors can therefore choose to store larger amounts of data in "extra_data", which can then be retrieved on demand in playbooks via this API. You can specify action, action_run_id, or app_run_id as a key to obtain the data.

Parameter	Required?	Description
action	Optional	Action JSON object provided in the action callback. Using this provides the extra data from the action that completed and triggered the callback function.
action_run_id	Optional	ID of the action run. Using this allows you to obtain extra data from any completed action runs from the current playbook. 'action_run_id' can be obtained from the above noted action json object or by calling the phantom.get_summary() API which enumerates all the actions that were executed in the playbook.
app_run_id	Optional	ID of the app run. This 'app_run_id' can be obtained by calling the phantom.get_summary() API which enumerates all the actions that were executed in the playbook.

import phantom.rules as phantom
import json

def domain_reputation_cb(action, success, container, results, handle):
    if not success:
        return
    extra_data = phantom.get_extra_data(action)
    phantom.debug("Testing extra data: ")
    phantom.debug(extra_data)
    return


def on_start(container):
    phantom.act('domain reputation', parameters=[{ "domain" : "bjtuangouwang.com" }], assets=["passivetotal"], callback=domain_reputation_cb)
    return

def on_finish(container, summary):
    phantom.debug("Summary: {}".format(summary))
    return

NOTE: At least one parameter MUST be specified. The return value of this API is a list of JSON dictionaries that has the action results along with extra data.

[
  {
    "asset_id": 7,
    "extra_data": [
      {
        "status": "success",
        "extra_data": [{...}],
        "parameter": {}
      }
    ],
    "asset": "passivetotal"
  }
]
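
Because the extra data is nested two levels deep (an 'extra_data' list per app run, and another per result), it is often convenient to gather the items into one list before processing them. The sketch below does that under the assumption that the return value follows the layout shown above. `flatten_extra_data` is a hypothetical helper, and the inner dictionary in the sample is an invented placeholder for the elided `{...}`.

```python
def flatten_extra_data(app_runs):
    """Collect every extra_data item from successful results into one list.

    Hypothetical helper; `app_runs` is assumed to follow the nested layout of
    the sample return value above (app run -> "extra_data" -> "extra_data").
    """
    items = []
    for app_run in app_runs:
        for result in app_run.get("extra_data", []):
            if result.get("status") == "success":
                items.extend(result.get("extra_data", []))
    return items


# Same layout as the sample above; the inner dictionary is an invented placeholder.
sample = [
    {
        "asset_id": 7,
        "asset": "passivetotal",
        "extra_data": [
            {"status": "success", "extra_data": [{"placeholder": True}], "parameter": {}}
        ],
    }
]
print(flatten_extra_data(sample))
```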

get_filtered_data

phantom.get_filtered_data(name=None)

This API allows users to retrieve the filtered data that was saved via phantom.condition(). In the API phantom.condition(), if the 'name' was specified, the filtered data (filtered action results and/or filtered artifact IDs) is saved under the specified key and the same key can then be used to later retrieve the data.

Parameter	Required?	Description
name	Required	This parameter is the same name that was used in the name parameter of phantom.condition() to save the filtered action results and filtered artifacts

The return value of this API is a tuple, filtered_action_results and filtered_artifacts.

import phantom.rules as phantom
import json
from datetime import datetime, timedelta

...

def filter_1(action=None,
             success=None,
             container=None,
             results=None,
             handle=None,
             filtered_artifacts=None,
             filtered_results=None):

    # collect filtered artifact ids for 'if' condition 1
    matched_artifacts_1, matched_results_1 = phantom.condition(
        container=container,
        action_results=results,
        conditions=[
            ["geolocate_ip_1:action_result.data.*.country_iso_code", "!=", "UK"],
            ["artifact:*.cef.bytesIn", "!=", 99],
        ],
        logical_operator='or',
        name="filter_1:condition_1")

...

def on_finish(container, summary):

    filtered_results, filtered_artifacts = phantom.get_filtered_data(name="filter_1:condition_1")

get_format_data

phantom.get_format_data(name=None)

This is an API supported for the purposes of retrieving data saved via the phantom.format() API. If the user had specified the 'name' parameter value in the phantom.format() API, the name can be used to retrieve the data to be used later. For sample usage, please refer to the phantom.format() API documentation.

get_raw_data

phantom.get_raw_data(container)

This API lets the user retrieve container raw data as it exists at the source. This allows users to access and automate on raw data in cases where there is information that was not parsed into artifacts.

Parameter	Description
container	This is the JSON container object as available in `on_start`, all callbacks, or `on_finish()` functions

import phantom.rules as phantom
import json


def on_start(container):
    raw_data = phantom.get_raw_data(container)
    phantom.debug(raw_data)
    return

def on_finish(container, summary):
    return

get_raw_data reads from the ["data"] section of the container, which is often used to store raw emails and the raw data that ticketing tools return from on_poll. The following example runs in a custom block on a container that does not otherwise use container['data']:

phantom.debug(phantom.get_raw_data(container))
phantom.update(container, {"data": {"this": "is a test"}})
phantom.debug(phantom.get_raw_data(container))

The output:
Wed May 13 2020 11:08:38 GMT-0600 (Mountain Daylight Time): phantom.get_raw_data(): called for playbook run '39792' and container id: '9420'
Wed May 13 2020 11:08:38 GMT-0600 (Mountain Daylight Time): {}
Wed May 13 2020 11:08:39 GMT-0600 (Mountain Daylight Time): successfully updated container(id: 9420)
Wed May 13 2020 11:08:39 GMT-0600 (Mountain Daylight Time): phantom.get_raw_data(): called for playbook run '39792' and container id: '9420'
Wed May 13 2020 11:08:39 GMT-0600 (Mountain Daylight Time): {"this": "is a test"}

get_apps

phantom.get_apps(action, asset, app_type)

This is an API supported for the purposes of letting the user enumerate all the apps installed on the system for each of the actions. The call returns a flat listing of all actions and apps with matching criteria.

Parameter	Description
action	The name of the action like 'block ip'. Allows users to retrieve information about assets that support the action 'block ip'.
asset	The Asset name that allows users to retrieve only those apps that match the specified asset.
app_type	Allows users to retrieve only those apps that match the specified type of the app. Types are like 'reputation', 'information', etc.

def on_start(container):
    apps=[]
    apps = phantom.get_apps()
    phantom.debug(apps)
    apps=phantom.get_apps(action='file reputation')
    phantom.debug(apps)
    apps = phantom.get_apps(asset='my_smtp_asset')
    phantom.debug(apps)
    apps = phantom.get_apps(app_type='information')
    phantom.debug(apps)

    return

All of these parameters are optional. If the user does not specify any parameter, all the configured apps in the system are retrieved.

The return value of this API is a list of JSON dictionaries that have the following schema:

[
  {
    "asset_disabled": false,
    "product_version_match": true,
    "app_type": "sandbox",
    "product_vendor": "Cuckoo",
    "product_name": "Cuckoo",
    "app_match_product_version": ".*",
    "asset_name": "cuckoo",
    "ap_name": "Cuckoo",
    "action": "detonate file",
    "app_version": "1.2.8",
    "asset_product_version": "",
    "asset_type": "sandbox"
  },
  ...

]
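
Since the listing is flat, with one entry per action and asset combination, a playbook usually needs to filter and group it before use. The sketch below groups asset names by action and keeps only entries whose asset is enabled and whose product version matches. `usable_assets_by_action` is a hypothetical helper, that filter is only one possible definition of "usable", and the second sample entry is invented.

```python
def usable_assets_by_action(apps):
    """Group asset names by action, keeping only entries that look usable.

    Hypothetical helper; `apps` is assumed to be the list returned by
    phantom.get_apps(), using the keys from the sample schema above.
    """
    grouped = {}
    for entry in apps:
        if entry.get("asset_disabled") or not entry.get("product_version_match"):
            continue  # disabled asset or product version mismatch
        grouped.setdefault(entry["action"], []).append(entry["asset_name"])
    return grouped


# First entry mirrors the sample schema above; the second is invented.
sample = [
    {"action": "detonate file", "asset_name": "cuckoo",
     "asset_disabled": False, "product_version_match": True},
    {"action": "detonate file", "asset_name": "old_sandbox",
     "asset_disabled": True, "product_version_match": True},
]
print(usable_assets_by_action(sample))
```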

get_assets

phantom.get_assets(action=None, tags=None, types=None)

As explained in the phantom.act() API, users can either specify a specific asset on which the action has to be executed or not specify an asset and the action will be executed on all possible assets.

This API gives users programmatic access to the assets set up in the system.

Parameter	Description
action	The name of the action like 'block ip'. Allows users to retrieve information about assets that support the action 'block ip'.
tags	A list of 'tags' that allows users to retrieve only those assets that have been tagged with the specified keyword.
types	A list of 'types' of assets that must be used to retrieve the specific assets

def on_start(container):
    assets = phantom.get_assets()
    phantom.debug(assets)
    assets = phantom.get_assets(action='file reputation')
    phantom.debug(assets)
    assets = phantom.get_assets(types=['reputation service'])
    phantom.debug(assets)
    return

Since all of these parameters are optional, if the user does not specify any parameter, all the configured assets in the system are retrieved.

The return value of this API is a list of JSON dictionaries that have the following schema:

[
  {
    "description": "VirusTotal",
    "tags": [],
    "product_vendor": "VirusTotal",
    "product_version": "Private 2.0",
    "product_name": "VirusTotal",
    "disabled": true,
    "version": 1,
    "type": "reputation service",
    "id": 11,
    "name": "virustotal_private"
  },
  ...
]
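
The sample asset above has "disabled" set to true, so a playbook may want to skip it before choosing where to run an action. The sketch below filters the listing down to the names of enabled assets of a given type. `enabled_asset_names` is a hypothetical helper, and the second sample entry is invented for contrast.

```python
def enabled_asset_names(assets, asset_type=None):
    """Return the names of enabled assets, optionally limited to one type.

    Hypothetical helper; `assets` is assumed to be the list returned by
    phantom.get_assets(), using the keys from the sample schema above.
    """
    names = []
    for asset in assets:
        if asset.get("disabled"):
            continue
        if asset_type is not None and asset.get("type") != asset_type:
            continue
        names.append(asset["name"])
    return names


sample = [
    # Mirrors the sample above (note that it is disabled).
    {"name": "virustotal_private", "type": "reputation service", "disabled": True},
    # Invented entry for illustration.
    {"name": "virustotal_public", "type": "reputation service", "disabled": False},
]
print(enabled_asset_names(sample, "reputation service"))
```

The resulting list has the same form as the `assets` argument passed to phantom.act() in the get_extra_data example.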

get_container

json_object = phantom.get_container(container_id)

This API is used to retrieve the JSON for a container (as a Python object). See the Containers REST documentation for more details.

Parameter	Required?	Description
container_id	Required	The ID of the container to fetch.

Example usage:

def on_start(container):

    cdata = phantom.get_container(container['id'])
    phantom.debug('Container Data: {}'.format(cdata))

    return

get_parent_handle

json_object = phantom.get_parent_handle()

This API is used to retrieve the 'handle' that has been set in the 'phantom.playbook()' API call (synchronous mode) in the parent / caller playbook. This API can be called from anywhere (on_start(), on_finish() or any other function) in the child playbook.

This API only works when the parent calls the child playbook in synchronous mode. See phantom.playbook() API for more details on how to call the playbooks in synchronous mode.

Example usage:

In the PARENT playbook...

    some_handle = "some_handle from parent pb"
    # 'some_handle' is now passed to the child playbook via the handle parameter.
    playbook_run_id = phantom.playbook("local/child_pb", container=container, name="playbook_local_child_pb_1", callback=decision_1, handle=some_handle)

In the CHILD playbook...

def on_start(container):

    handle_from_parent=phantom.get_parent_handle() # this call can be done from any function of the child playbook

    phantom.debug("handle sent by parent playbook: {}".format(handle_from_parent))

    return

get_playbook_info

phantom.get_playbook_info()

This is an API to retrieve the current playbook's information such as id, run id, name, repo, and parent_playbook_run_id (if this playbook was executed from another playbook) and the running playbook's effective user id.

The return value of this API is a list containing a single dictionary.

[{
    'parent_playbook_run_id': '0',
    'name': 'test_plabook',
    'run_id': '37',
    'scope_artifacts': [],
    'scope': 'new',
    'id': '562',
    'repo_name': 'local',
    'effective_user_id': 5
}]
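
One use of this information is deciding whether the playbook is running as a child of another playbook. The sketch below checks the 'parent_playbook_run_id' field. It rests on an assumption that is not stated above: that a value of '0', as in the sample, means there is no parent. `was_called_by_parent` is a hypothetical helper.

```python
def was_called_by_parent(playbook_info):
    """Return True when the info reports a non-zero parent playbook run id.

    Assumption: a parent_playbook_run_id of '0' (as in the sample above) means
    the playbook was not started by another playbook.
    """
    if not playbook_info:
        return False
    # The API returns a list containing a single dictionary.
    parent_run_id = playbook_info[0].get("parent_playbook_run_id", "0")
    return int(parent_run_id) != 0


# Values from the sample above, followed by an invented child-playbook case.
standalone = [{"parent_playbook_run_id": "0", "run_id": "37"}]
child = [{"parent_playbook_run_id": "36", "run_id": "37"}]
print(was_called_by_parent(standalone), was_called_by_parent(child))
```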

get_summary

phantom.get_summary()

This is an API supported for the purposes of letting the user retrieve the summary of the playbook execution in a json format.

import phantom.rules as phantom
import json

def on_start(container):
    phantom.act('geolocate ip', parameters=[{ "ip" : "1.1.1.1" }])
    return

def on_finish(container, summary):
    summary_json = phantom.get_summary()
    phantom.debug(summary_json)
    return

The return value of this API is a JSON representation of the playbook execution:

{
    "status": "success",
    "message": "",
    "result": [
        {
            "status": "success",
            "close_time": "2016-02-11T06:45:22.005343+00:00",
            "app_runs": [
                {
                    "asset_id": 40,
                    "status": "success",
                    "app": "MaxMind",
                    "app_id": 27,
                    "app_run_id": 224,
                    "asset": "maxmind",
                    "action": "geolocate ip",
                    "summary": "Country: Australia",
                    "parameter": "{\"ip\": \"1.1.1.1\"}",
                    "action_run_id": 104
                }
            ],
            "create_time": "2016-02-11T06:45:20.917+00:00",
            "action": "geolocate ip",
            "message": "1 action succeeded",
            "type": "investigate",
            "id": 104
        }
    ],
    "playbook_run_id": 167
}
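
In the sample above, each app run carries its 'parameter' as a JSON-encoded string rather than as a dictionary. The sketch below lists the app runs from a summary and decodes that string. `list_app_runs` is a hypothetical helper, and the input is assumed to follow the layout of the sample; the data is a trimmed copy of it.

```python
import json


def list_app_runs(summary):
    """Return one small record per app run found in a playbook summary.

    Hypothetical helper; `summary` is assumed to have the layout of the sample
    above, where each app run's "parameter" is a JSON-encoded string.
    """
    records = []
    for action in summary.get("result", []):
        for app_run in action.get("app_runs", []):
            records.append({
                "action_run_id": app_run.get("action_run_id"),
                "app_run_id": app_run.get("app_run_id"),
                "parameter": json.loads(app_run.get("parameter") or "{}"),
            })
    return records


# Trimmed copy of the sample summary shown above.
sample = {
    "status": "success",
    "result": [
        {
            "action": "geolocate ip",
            "app_runs": [
                {"app_run_id": 224, "action_run_id": 104,
                 "parameter": "{\"ip\": \"1.1.1.1\"}"}
            ],
        }
    ],
    "playbook_run_id": 167,
}
print(list_app_runs(sample))
```

The ids in each record are the ones that get_action_results() and get_extra_data() accept as action_run_id and app_run_id.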

parse_errors, parse_success, parse_results

phantom.parse_errors(action_results)
phantom.print_errors(action_results)
phantom.parse_success(action_results)
phantom.parse_results(action_results)

These APIs parse action_results. They allow users to pass the action_results directly from a callback into these helper routines to conveniently access the data.

API	Description
parse_errors()	This API collects all the errors and returns them to the user per asset and per parameter.
print_errors()	This API is a convenient way to quickly dump any errors present in the action_results.
parse_success()	Processes the action_results and removes any records that had errors, so that the user can conveniently and confidently access only the results of successful actions on the respective asset and parameter.
parse_results()	Processes the action_results and organizes the contents into success and failed categories. NOTE: Please review the phantom.collect() API before using these convenience APIs, as they have very limited use scenarios.

set_parent_handle

phantom.set_parent_handle(handle)

This API is used to set the 'handle' from the synchronously called / executed child playbook that is then accessed in the parent playbook via the handle parameter of the callback function.

This API only works when the parent calls the child playbook in synchronous mode. See phantom.playbook() API for more details on how to call the playbooks in synchronous mode.

NOTE: The last call to set_parent_handle will overwrite the handle being sent to the callback function. So in a parent playbook, if there is a join block where two child playbooks called synchronously are joining to a callback, the value of handle in the callback depends on which child playbook called the set_parent_handle last.

Example usage:

In the CHILD playbook...

some_handle="some_handle from child pb"
phantom.set_parent_handle(some_handle)

In the PARENT playbook...

def playbook_callback(..., handle=None, ...):

    phantom.debug("handle sent by child playbook: {}".format(handle))

    return
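
The examples above pass a plain string as the handle. If more than one value needs to travel from the child to the parent, one option is to serialize a dictionary to a JSON string in the child and decode it in the parent callback. This is only a sketch: it assumes the handle is delivered to the callback unchanged as a string, and `encode_handle` and `decode_handle` are hypothetical helpers, not Phantom APIs.

```python
import json


def encode_handle(values):
    """Serialize a dictionary into a string suitable for use as a handle."""
    return json.dumps(values, sort_keys=True)


def decode_handle(handle):
    """Decode a handle produced by encode_handle().

    Falls back to wrapping the raw value when the handle is a plain string,
    so callbacks keep working with handles that were not JSON-encoded.
    """
    if not handle:
        return {}
    try:
        decoded = json.loads(handle)
    except ValueError:
        return {"raw": handle}
    return decoded if isinstance(decoded, dict) else {"raw": handle}


# Child side: phantom.set_parent_handle(encode_handle({...}))
handle = encode_handle({"verdict": "malicious", "score": 26})
# Parent side, inside the playbook callback:
print(decode_handle(handle))
```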
