Troubleshoot adaptive response actions in search head cluster deployments on Splunk Cloud Platform
Issue
The adaptive response framework displays error messages on Splunk Cloud Platform (SCP) search head cluster (SHC) deployments when using Common Information Model (CIM) Add-on versions 5.0.2 and lower. Errors occur on Splunk Cloud Platform deployments using the CIM Add-on and Splunk Enterprise Security deployments that bundle the CIM Add-on.
If you are a Splunk Cloud Platform customer, you can configure your Splunk Cloud Platform Enterprise Security search head with an API key, which allows you to authenticate from the KV Store collection and Common Action Model (CAM) queue. The CAM adaptive response relay worker is installed on-prem and configured to communicate with Splunk Cloud Platform using the Common Information Model. For more information, see Configure your Splunk Cloud Platform ES search head with an API key.
The on-prem CAM relay worker polls the Splunk Cloud Platform CAM queue every 60 seconds and checks whether an alert action exists in the queue. If an alert action exists in the CAM queue, the CAM relay worker runs the alert action. The adaptive response framework displays "500 Server Error" messages when connecting to Splunk Cloud Platform from the on-prem CAM relay worker.
For example:
2022-07-15 09:52:59,874+0000 ERROR pid=16227 tid=MainThread file=relaymodaction.py:run:328 | Failed to fetch results: 500 Server Error: Internal Server Error for url: https://customer-gsoc.splunkcloud.com:8089/services/alerts/modaction_queue/peek/LOG-HF09.mycustomer.com@@cff33f3c137b6af7faecc825381fdeb73841964d
Adaptive response action errors cause a delay between the time when the alert is sent to the queue and the time when the on-prem CAM relay worker dequeues the alert. For example, if an on-prem CAM relay worker tries to connect to Splunk Cloud Platform every 60 seconds and there is an 18-minute delay, the CAM relay worker connected to Splunk Cloud Platform successfully only after 18 attempts.
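The arithmetic behind this example can be checked with a short sketch. The function name is hypothetical, and the sketch assumes that every poll takes exactly one polling interval, which is a simplification of how the relay worker behaves.

```python
def attempts_for_delay(delay_seconds: int, poll_interval_seconds: int = 60) -> int:
    """Return how many polling attempts fit into the observed delay.

    Assumption: one attempt per polling interval, with no extra retries.
    """
    if poll_interval_seconds <= 0:
        raise ValueError("poll_interval_seconds must be positive")
    return delay_seconds // poll_interval_seconds


# An 18-minute delay with a 60-second polling interval.
print(attempts_for_delay(18 * 60))
```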
The following architectural diagram depicts the process workflow for adaptive response actions in a search head cluster deployment on Splunk Cloud Platform:
Cause
The connection between the modular action relay heavy forwarder and the Cloud stack causes the adaptive response framework failures within a search head cluster Cloud environment. When configuring the modular action relay, the remote search head URI is set using the format protocol://servername:port, which was initially intended to be the URL of a single search head.
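As an illustration of the protocol://servername:port format, the following sketch splits a remote search head URI into its parts using the Python standard library. The function name is hypothetical and is not part of the add-on. The port is treated as optional here because URIs are sometimes written without one.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def parse_remote_uri(uri: str) -> Tuple[str, str, Optional[int]]:
    """Split a URI of the form protocol://servername:port into its parts."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError("expected an http or https URI, got: %r" % uri)
    if not parts.hostname:
        raise ValueError("missing server name in: %r" % uri)
    # parts.port is None when the URI does not specify a port.
    return parts.scheme, parts.hostname, parts.port


print(parse_remote_uri("https://important-impala-mym.stg.splunkcloud.com:8089"))
```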
In a search head cluster environment, this connection setting cannot be assigned to a static member within the search head cluster as all search head cluster members can generate adaptive response actions at any time. If the remote URI is set to a single search head within the search head cluster, it results in a failure because the remote relay can only process actions that are related to the search results on the static search head member of the cluster.
Search head cluster environments on Splunk Cloud Platform provide an alternative to designating a static search head. All cloud stacks are accessible using a load-balanced stack URL. Requests to this URL can be redirected to any member within the search head cluster. Typically, this stack URL is assigned as the remote search head URI on the modular action relay. When the URI is set to this generic stack URL, the modular action relay sends its requests through the load balancer. If the load balancer redirects the request to a member of the search head cluster that did not initiate the adaptive response action, the fetch request for search results fails.
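The failure mode described above can be modeled with a toy simulation. Everything in this sketch is an assumption made for illustration: the class and function names, the member names, and the idea that the load balancer hands requests to members in a fixed order. It is not how Splunk Cloud Platform routes traffic; it only shows why a fetch fails unless the request reaches the member that holds the search results.

```python
class Member:
    """A toy search head cluster member that stores its own search results."""

    def __init__(self, name):
        self.name = name
        self.dispatch = {}  # sid -> results, stored only on this member

    def fetch(self, sid):
        if sid not in self.dispatch:
            # Stands in for the "500 Server Error" seen by the relay.
            raise LookupError("%s has no results for %s" % (self.name, sid))
        return self.dispatch[sid]


def attempts_until_success(members, sid):
    """Try each member in order, as a fixed-order balancer might.

    Returns the number of attempts needed, or None if no member has the results.
    """
    for attempt, member in enumerate(members, start=1):
        try:
            member.fetch(sid)
            return attempt
        except LookupError:
            continue
    return None


members = [Member("sh1"), Member("sh2"), Member("sh3")]
members[2].dispatch["sid-123"] = ["row"]  # only sh3 ran the search

print(attempts_until_success(members, "sid-123"))
```

In this model the first two requests fail because they land on members that did not run the search, and only the third succeeds.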
Solution
Ensure that requests from the modular action relay's heavy forwarder are directed to the member of the search head cluster that initiated the adaptive response action. The search head that initiates the adaptive response action has the search results related to the adaptive response action.
Adaptive response actions are created using searches that use the following format:
... | sendalert <ar-action-command>
These adaptive response actions are queued to the CAM queue and KV Store collection. Each entry contains a payload of an adaptive response action.
The following is an example of the payload for an adaptive response action:
{ "app": "search", "owner": "admin", "results_file": "/opt/splunk/var/run/splunk/dispatch/scheduler__admin__search__RMD510d9054342d784cd_at_1664755380_283_E007D213-8F37-44C9-9663-8393A9765418/sendalert_temp_results.csv.gz", "results_link": "https://important-impala-mym.stg.splunkcloud.com:443/app/search/@go?sid=scheduler__admin__search__RMD510d9054342d784cd_at_1664755380_283_E007D213-8F37-44C9-9663-8393A9765418", "search_uri": "/servicesNS/admin/search/saved/searches/danny-2", "server_host": "sh-i-0a554cea1f83c1c7e", "server_uri": "https://127.0.0.1:8089", "session_key": "KEqwK4a44mUOAQk_apYg3pH4ePQvgRQDK9dWeTGr3K69HWqLWIhkR8RmAVsphDt04AyV9W^HnjUsy5hHV5Zq1H28fLyM6r5Zbq8EkmMOFO^25uxR_9e5rDfra1tFQMyloEu76l7sCKs0IlVkp7YNmzmA0qHWuaoa3f3pXkDTgtImLzURXgJTnl5qYh3Js6XA3sYYsvw_qEfGQGL8DP_rfkEuIV9C8EGwAmwTYnL3pC", "sid": "scheduler__admin__search__RMD510d9054342d784cd_at_1664755380_283_E007D213-8F37-44C9-9663-8393A9765418", "search_name": "danny-2", "configuration": { "_cam": "{\n \"category\": [\"Information Gathering\"],\n \"task\": [\"scan\"],\n \"subject\": [\"device\"],\n \"technology\": [{\"vendor\": \"Operating System\", \"product\": \"Utility\"}],\n \"supports_adhoc\": true,\n \"supports_cloud\": true,\n \"supports_workers\": true,\n \"field_name_params\": [\"param.host_field\"],\n \"required_params\": [\"param.host_field\"]\n}", "_cam_workers": "[\"hf1\"]", "host_field": "src", "index": "main", "max_results": "5", "verbose": "0" } }
In this example, consider the results_link and server_host fields.
The URL in the results_link field is used by the modular action relay directly to retrieve the related search results for the adaptive response actions. In search head cluster environments on Splunk Cloud Platform, the URL in the results_link field typically points to the Cloud stack's generic URL, such as https://important-impala-mym.stg.splunkcloud.com.
The server_host field contains the search head on which the adaptive response action originates, such as sh-i-0a554cea1f83c1c7e.
The URL in the results_link field shares the same domain name as the URI for the modular action relay's remote search head.
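To make the two fields concrete, the following sketch reads them from a trimmed copy of the example payload. Only the results_link and server_host values are taken from the payload above; the variable names are illustrative, and a real payload carries many more fields.

```python
import json
from urllib.parse import urlsplit

# Trimmed copy of the example payload; other fields are omitted for brevity.
payload_json = """
{
  "results_link": "https://important-impala-mym.stg.splunkcloud.com:443/app/search/@go?sid=scheduler__admin__search__RMD510d9054342d784cd_at_1664755380_283_E007D213-8F37-44C9-9663-8393A9765418",
  "server_host": "sh-i-0a554cea1f83c1c7e"
}
"""

payload = json.loads(payload_json)

# Host name of the generic, load-balanced stack URL.
stack_host = urlsplit(payload["results_link"]).hostname
# Cluster member on which the adaptive response action originated.
origin_member = payload["server_host"]

print(stack_host)
print(origin_member)
```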
To ensure that the modular action relay's heavy forwarder requests get directed to the appropriate member in the search head cluster, the URL for the search head must be a combination of the server_host and the results_link fields. This URL is included in the Splunk_SA_CIM/bin/relaymodaction.py file.
For example:
https://sh-i-0a554cea1f83c1c7e.important-impala-mym.stg.splunkcloud.com:443/...
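One way to picture the combination is the sketch below, which prefixes the host name in results_link with the server_host value. This is not the contents of the patch; the function name is hypothetical, and the sketch assumes that a member is reachable at <server_host>.<stack domain>, as the example URL suggests.

```python
from urllib.parse import urlsplit, urlunsplit


def member_results_link(results_link: str, server_host: str) -> str:
    """Rewrite results_link so that it targets a specific cluster member."""
    parts = urlsplit(results_link)
    if not parts.hostname:
        raise ValueError("results_link has no host name: %r" % results_link)
    # Assumption: members are addressed as <server_host>.<stack domain>.
    netloc = "%s.%s" % (server_host, parts.hostname)
    if parts.port is not None:
        netloc += ":%d" % parts.port
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(member_results_link(
    "https://important-impala-mym.stg.splunkcloud.com:443/app/search/@go?sid=abc",
    "sh-i-0a554cea1f83c1c7e",
))
```

The sid value abc is a placeholder for the full search ID.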
On the remote heavy forwarder, update the Splunk_SA_CIM/bin/relaymodaction.py file within the Common Information Model Add-on by deploying a patch. The patch expects the domain name within the URL of the results_link field to be the same as the domain name used in the remote search head URI setting for the relay modular action.
For example:
- Results link URI:
https://important-impala-mym.stg.splunkcloud.com:443/app/search/@go?sid=scheduler__admin...
- Remote Search Head URI:
https://important-impala-mym.stg.splunkcloud.com
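Before deploying the patch, you might confirm that the two values share a domain name. The following check is a sketch with a hypothetical function name; it compares host names only and ignores the port and path.

```python
from urllib.parse import urlsplit


def same_domain(results_link: str, remote_search_head_uri: str) -> bool:
    """Return True when both URLs point at the same host name."""
    link_host = urlsplit(results_link).hostname
    remote_host = urlsplit(remote_search_head_uri).hostname
    return link_host is not None and link_host == remote_host


print(same_domain(
    "https://important-impala-mym.stg.splunkcloud.com:443/app/search/@go?sid=scheduler__admin",
    "https://important-impala-mym.stg.splunkcloud.com",
))
```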
Deploy the patch
The example in these steps reproduces an environment that uses the default adaptive response command set, such as the ping command.
See also:
Set up an Adaptive Response relay from a Splunk Cloud Platform Enterprise Security search head to an on-premises device
Prerequisite
- Ensure that the modular action relay is disabled on the heavy forwarder.
Follow these steps to deploy the patch on the remote heavy forwarder:
- In the heavy forwarder's file system, add the patch file: Splunk_SA_CIM/bin/relaymodaction.py
- Check the CAM queue using the Lookup Editor to ensure that unprocessed adaptive response actions are available.
- Enable the modular action relay to restart processing.
- Run the following search on a member in the search head cluster to check on the processing status of the modular action relay.
index=_internal source="/opt/splunk/var/log/splunk/python_modular_input.log"
This displays log entries in the following format:
<timestamp> INFO pid=3953 tid=MainThread file=relaymodaction.py:fetch_results:172 | Successfully fetched results_file content for action with key hf1@@cb57310f74a54f72fb695c5962e45801998b88ba for worker hf1
Each successful fetch entry is followed by a successful dequeue entry:
<timestamp> pid=3953 tid=MainThread file=relaymodaction.py:dequeue:200 | Successfully dequeued action with key hf1@@cb57310f74a54f72fb695c5962e45801998b88ba for worker hf1
After the modular action relay runs successfully a few times, the number of entries for adaptive response actions in the CAM queue should decrease, and the queue should be empty when the runs are completed.
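If you export these log entries, a short script can confirm that every fetched action was also dequeued. This sketch assumes the message wording shown in the two example entries; the pattern and function names are illustrative and are not part of the add-on.

```python
import re

# Matches the fetch and dequeue messages shown in the example log entries.
ACTION_PATTERN = re.compile(
    r"Successfully (fetched results_file content for|dequeued) "
    r"action with key (\S+) for worker (\S+)"
)


def summarize(lines):
    """Return the sets of action keys that were fetched and dequeued."""
    fetched, dequeued = set(), set()
    for line in lines:
        match = ACTION_PATTERN.search(line)
        if match is None:
            continue
        target = fetched if match.group(1).startswith("fetched") else dequeued
        target.add(match.group(2))
    return fetched, dequeued


sample = [
    "<timestamp> INFO pid=3953 tid=MainThread file=relaymodaction.py:fetch_results:172 | "
    "Successfully fetched results_file content for action with key "
    "hf1@@cb57310f74a54f72fb695c5962e45801998b88ba for worker hf1",
    "<timestamp> pid=3953 tid=MainThread file=relaymodaction.py:dequeue:200 | "
    "Successfully dequeued action with key "
    "hf1@@cb57310f74a54f72fb695c5962e45801998b88ba for worker hf1",
]

fetched, dequeued = summarize(sample)
# Keys that were fetched but never dequeued; empty when processing completed.
print(fetched - dequeued)
```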
- Run the following search on each member of the search head cluster using the Search app to generate adaptive response actions:
| makeresults count=1 | eval src="10.20.30.40",user="SPLUNKTEST" | sendalert ping param.index=main param._cam_workers="[\"hf1\"]" param.max_
Alternatively, you can convert the search into a scheduled saved search to automatically generate results.
This documentation applies to the following versions of Splunk® Common Information Model Add-on: 5.0.2, 5.1.0