Troubleshoot the Splunk Add-on for CrowdStrike FDR
To troubleshoot your forwarder setup, see "Troubleshoot the forwarder/receiver connection" in the Forwarding Data manual.
Troubleshoot event ingestion
If "Crowdstrike FDR SQS based S3 consumer" is running but you do not see new events appear in your index, try the following steps to diagnose and mitigate the issue:
- Widen the search time range (the time range picker to the right of the search bar), for example to the last seven days. The add-on assigns each event its creation time, not its ingestion time. Newly ingested events can therefore be several days old and fall outside the default search time range.
- Set the search time range to the last 15 or 60 minutes and run the following search:
index="_internal" sourcetype="crowdstrike_fdr_ta*"
By default, the add-on logs only informational and error messages. This search shows the latest add-on logs and gives you an idea of the Splunk Add-on for CrowdStrike FDR's activity. Here are examples of messages that you can find when you run this search:
cs_input_stanza=simple_consumer_input://my_input1, error='aws_error_message='Proxy connection error: Indicates that the provided proxy configuration does not allow the add-on to communicate with the CrowdStrike AWS environment. You can find additional information about proxy settings in log messages like
AWS proxy is disabled, aws_proxy=disabled and
AWS proxy is enabled, aws_proxy=https://*****:*****@proxy.host.fqnd:765
FILE processing summary: cs_input_stanza=simple_consumer_input://my_input1, cs_file_time_taken=223.106, cs_file_path=data/d811c19e-7729-4c9b-abb8-357d539aa4a0/part-00000.gz, cs_file_size_bytes=24178342, cs_file_error_count=0: Indicates that input 'my_input1' ingested one event file, 'data/d811c19e-7729-4c9b-abb8-357d539aa4a0/part-00000.gz' (24178342 bytes), in 223.106 seconds with 0 errors.
INGEST |< cs_input_stanza=simple_consumer_input://my_input1, cs_ingest_time_taken=229.321, cs_ingest_file_path=s3://crowdstrike-generated-big-batch-us-west-2/data/d811c19e-7729-4c9b-abb8-357d539aa4a0/part-00016.gz, cs_ingest_total_events=600540, cs_ingest_filter_matches=599705, cs_ingest_error_count=0: Indicates that input 'my_input1' spent 229.321 seconds consuming the S3 bucket file 's3://crowdstrike-generated-big-batch-us-west-2/data/d811c19e-7729-4c9b-abb8-357d539aa4a0/part-00016.gz'. Of the 600540 events in this file, 599705 matched the filter criteria and were sent to the Splunk index. Pay attention to the number of matching events. If it is 0 in all logged messages, check the selected filter, because it might be defined incorrectly.
BATCH processing summary: cs_input_stanza=simple_consumer_input://si1, cs_batch_time_taken=230.002, cs_batch_path=data/d811c19e-7729-4c9b-abb8-357d539aa4a0, cs_batch_error_count=0: Indicates that input 'si1' ingested the event file batch located at 'data/d811c19e-7729-4c9b-abb8-357d539aa4a0' in 230.002 seconds with 0 errors.
simple_consumer_input://si1, Skipping batch fdrv2/aidmaster/d811c19e-7729-4c9b-abb8-357d539aa4a0 according to input configuration: Indicates that the whole batch 'fdrv2/aidmaster/d811c19e-7729-4c9b-abb8-357d539aa4a0' was skipped because the input is configured not to ingest this kind of event. Only inventory events can be skipped this way.
simple_consumer_input://my_input1, Stopping input as EVENT WRITER PIPE IS BROKEN. TA will re-try to ingest failed file after AWS SQS visibility_timeout expires: Indicates that communication with the indexers broke during the file ingestion process. As a result, input 'my_input1' had to shut down and will be started again by Splunk. In Enterprise Cloud Platform, this often happens when you apply a new configuration to a running input or stop an input, because the error is triggered when Splunk tries to restart or stop the input in response. If you cannot correlate this error message with corresponding input configuration actions, check the communication between the ingesting host (heavy forwarder, IDM, or search head) and the indexers.
- If none of the above messages appear, switch the add-on logging level to DEBUG. Go to the Splunk Add-on for CrowdStrike FDR Configuration screen and select the Logging tab. Select DEBUG in the Logging level dropdown box and click Save. Restart the input so that it uses the new logging level, and wait several minutes to let the add-on log new information. Then re-run the search:
index="_internal" sourcetype="crowdstrike_fdr_ta*"
Look for the following messages to make sure that the add-on can successfully communicate with the AWS infrastructure:
<<< aws_error_code=AWS.SimpleQueueService.NonExistentQueue, aws_error_message='The specified queue does not exist for this wsdl version.': Indicates that an AWS client error has occurred. The aws_error_message value can vary depending on the exact AWS client issue.
<<< receive_sqs_messages_time_taken=0.940, receive_sqs_message_count=1: Indicates that a request for a new SQS message was sent and one message was returned. If receive_sqs_message_count is 0, there are no messages available in the SQS queue. Check that no other consumers are receiving messages from this SQS queue. Also note that CrowdStrike FDR does not create new SQS messages very often (roughly one message every 7-10 minutes), so you might have to wait for a new message to appear.
- Check for the following message:
<<< check_success_time_taken=0.934, found_SUCCESS=True
If found_SUCCESS is False, the add-on skips the event batch referenced by the received SQS message, and no ingestion takes place. To find out which batch failed the check, look for a preceding log message like:
>>> check_success_bucket=crowdstrike-generated-big-batch-us-west-2, check_success_bucket_prefix=data/d811c19e-7729-4c9b-abb8-357d539aa4a0
- If you see the following message, the add-on can successfully download event files from the S3 bucket:
<<< download_file_time_taken=7.107, download_file_path=data/d811c19e-7729-4c9b-abb8-357d539aa4a0/part-00023.gz
- If you do not see any add-on logs, run the following search:
index="_internal" ERROR. Look for error messages in the returned logs.
Cannot find the destination field 'ComputerName' in the lookup table: This error can indicate corruption of a CSV lookup used at index time, and it can prevent ingestion of CrowdStrike events. If you see this error and new CrowdStrike events do not appear in the index, refer to "Recover index time host resolution lookup" below. For Splunk Cloud environments, contact Splunk Support.
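If you export add-on log lines from the _internal index for offline review, you can script the checks described in the steps above. The following Python sketch is illustrative only. It assumes the comma-separated key=value layout shown in the sample messages. parse_fields and diagnose are hypothetical helper names and are not part of the add-on.

```python
from __future__ import annotations

import re

# Matches key=value pairs such as "cs_ingest_filter_matches=599705" or
# "found_SUCCESS=True". Assumption: values contain no commas or whitespace,
# which holds for the sample messages above but is not guaranteed.
PAIR_RE = re.compile(r"(\w+)=([^,\s]+)")


def parse_fields(line: str) -> dict[str, str]:
    """Return the key=value pairs found in one add-on log line."""
    return dict(PAIR_RE.findall(line))


def diagnose(line: str) -> str | None:
    """Return a hint for the conditions described above, or None if none apply."""
    fields = parse_fields(line)
    if fields.get("cs_ingest_filter_matches") == "0":
        return "no events matched the filter; check the input filter definition"
    if fields.get("receive_sqs_message_count") == "0":
        return "no SQS messages received; check for other consumers or wait for new messages"
    if fields.get("found_SUCCESS") == "False":
        return "batch skipped; look for the preceding check_success_bucket_prefix message"
    if "aws_error_code" in fields:
        return "AWS client error: " + fields["aws_error_code"]
    return None
```

For example, diagnose() returns None for a normal "receive_sqs_message_count=1" line and a hint string for "receive_sqs_message_count=0".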
Recover index time host resolution lookup
Corruption of the index time host resolution lookup CSV can block CrowdStrike event ingestion. When the lookup is corrupted, the index time lookup fails and the following error message is logged to the _internal index:
Cannot find the destination field 'ComputerName' in the lookup table
This corruption can result from running the "Crowdstrike FDR host information sync" input when it is configured with an incorrect source search head or a user with limited access. Version 1.2.0 of the Splunk Add-on for CrowdStrike FDR adds validation to the "Crowdstrike FDR host information sync" modular input code. This validation helps prevent bad data received during the sync process from damaging the CSV lookup. If the CSV lookup table has been corrupted for any reason, follow these steps to fix it:
If it's still running, disable the "Crowdstrike FDR host information sync" input.
On each heavy forwarder (or on the IDM or search head in a Splunk Cloud environment), locate the file Splunk_TA_CrowdStrike_FDR/lookups/crowdstrike_ta_index_time_host_resolution_lookup.csv under $SPLUNK_HOME/etc/apps.
Download Splunk_TA_CrowdStrike_FDR from Splunkbase and unpack it. Locate lookups/crowdstrike_ta_index_time_host_resolution_lookup.csv in the unpacked package and use it to replace the same file on the Splunk instance.
Splunk will reload the updated CSV file automatically.
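On a self-managed heavy forwarder, you can script the replacement step. This Python sketch is hypothetical and is not a tool shipped with the add-on. It assumes that a healthy lookup has a header row containing the ComputerName field named in the error message. It also assumes that you have already disabled the sync input and unpacked the Splunkbase package.

```python
from __future__ import annotations

import csv
import shutil
from pathlib import Path

# Assumption: a healthy lookup's header includes the field named in the
# "Cannot find the destination field 'ComputerName'" error.
REQUIRED_FIELD = "ComputerName"
LOOKUP_RELPATH = Path(
    "Splunk_TA_CrowdStrike_FDR/lookups/crowdstrike_ta_index_time_host_resolution_lookup.csv"
)


def lookup_looks_healthy(csv_path: Path) -> bool:
    """Return True if the CSV is readable and its header contains REQUIRED_FIELD."""
    try:
        with csv_path.open(newline="", encoding="utf-8") as fh:
            header = next(csv.reader(fh), [])
    except (OSError, UnicodeDecodeError):
        return False
    return REQUIRED_FIELD in header


def restore_lookup(apps_dir: Path, unpacked_dir: Path) -> bool:
    """Replace the installed lookup with the packaged copy if it looks corrupted.

    apps_dir is $SPLUNK_HOME/etc/apps; unpacked_dir is the directory that
    contains the unpacked Splunk_TA_CrowdStrike_FDR folder.
    Returns True if the file was replaced.
    """
    target = apps_dir / LOOKUP_RELPATH
    if lookup_looks_healthy(target):
        return False
    shutil.copyfile(unpacked_dir / LOOKUP_RELPATH, target)
    return True
```

For example, call restore_lookup(Path("/opt/splunk/etc/apps"), Path("/tmp/unpacked")) on each heavy forwarder. Both paths are placeholders for your own installation and download locations.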
In a heavily loaded environment, batches can be processed more than once. This can happen when a message is not processed within the expected time or when an input job is interrupted.
Processed message is visible again in SQS queue
If the software that started to process an SQS message cannot finish processing it or shut down gracefully, the visibility timeout makes the message visible in the queue again so that it can be processed again. When processing a single batch takes longer than the visibility timeout defined for the related SQS message, the message becomes visible in the queue again and other jobs can re-ingest the same data. This results in duplicate events in the indexer. To mitigate this:
- When you configure the Splunk Add-on for CrowdStrike FDR, the visibility timeout is set to six hours by default. This value is sufficient to ingest big event batches (300-400 files of up to 25 MB each), which are typical of heavily loaded environments with around 10 TB of raw event data per day. If your environment has a different amount of raw event data per day, find the biggest batch and change the visibility timeout proportionally. The maximum allowed value is 12 hours and the minimum is five minutes. Decreasing the visibility timeout makes the SQS message return to the SQS queue faster. This gives the message more opportunities to be processed before it expires and is permanently removed from the queue.
- Scale out data collection horizontally by adding heavy forwarders (HFs) or IDMs and using fewer inputs on each HF or IDM.
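The following Python sketch gives a rough illustration of changing the timeout proportionally. It scales the six-hour default by the size of your largest batch relative to a reference batch, then clamps the result to the allowed range of five minutes to 12 hours. The reference size (400 files of 25 MB) is an assumption taken from the upper end of the figures above, and suggest_visibility_timeout is a hypothetical name. Treat the result as a starting point and confirm it with the log-based checks in "Visibility timeout troubleshooting".

```python
# Figures from the guidance above; REFERENCE_BATCH_MB is an assumption
# (upper end of "300-400 files up to 25MB each").
DEFAULT_TIMEOUT_S = 6 * 60 * 60   # six-hour default
MIN_TIMEOUT_S = 5 * 60            # minimum allowed: five minutes
MAX_TIMEOUT_S = 12 * 60 * 60      # maximum allowed: 12 hours
REFERENCE_BATCH_MB = 400 * 25     # 400 files x 25 MB


def suggest_visibility_timeout(largest_batch_mb: float) -> int:
    """Scale the default timeout by batch size and clamp it to the allowed range."""
    scaled = DEFAULT_TIMEOUT_S * largest_batch_mb / REFERENCE_BATCH_MB
    return int(min(MAX_TIMEOUT_S, max(MIN_TIMEOUT_S, round(scaled))))
```

For example, a largest batch of about 5,000 MB gives 10800 seconds (three hours). Very large batches are capped at 43200 seconds, the 12-hour maximum.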
Visibility timeout troubleshooting
The Splunk Add-on for CrowdStrike FDR logs information that helps you determine whether the selected visibility timeout is adequate for your environment. To check:
- Run the following search to see the time taken to ingest event batches:
index="_internal" sourcetype="crowdstrike_fdr_ta-*" "BATCH ingest time taken"
You will see messages like this:
BATCH ingest time taken: 1446.540, data/d811c19e-7729-4c9b-abb8-357d539aa4a0, errors 0
This tells you how much time, in seconds, it takes to process one event batch. Run this search over a sufficiently long time range (for example, "All time"). Find the largest ingest time taken and set the visibility timeout to an equal or larger value. This improves the likelihood that all future event batches are processed within the visibility timeout.
- If the add-on finishes processing an event file after the visibility timeout has expired, it logs a warning message like this:
ALERT: data/d811c19e-7729-4c9b-abb8-357d539aa4a0/part-00018.gz ingested 233.720 seconds after SQS message visibility timeout expiration
Run the following search to see these messages:
index="_internal" sourcetype="crowdstrike_fdr_ta-*" "seconds after SQS message visibility timeout expiration"
Use the maximum value reported in these messages to adjust the input visibility timeout. Consider creating a Splunk alert based on this log message so that you are notified about visibility timeout expirations.
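You can also run the two checks above over exported log lines. This Python sketch assumes the message formats shown in the samples above and uses hypothetical function names. It extracts the largest batch ingest time and the largest overrun, then reports whether a given visibility timeout covers the observed batches.

```python
from __future__ import annotations

import re

# Patterns based on the sample messages above.
BATCH_RE = re.compile(r"BATCH ingest time taken: (\d+(?:\.\d+)?)")
OVERRUN_RE = re.compile(
    r"ingested (\d+(?:\.\d+)?) seconds after SQS message visibility timeout expiration"
)


def max_seconds(lines: list[str], pattern: re.Pattern) -> float:
    """Largest number of seconds captured by pattern, or 0.0 if nothing matches."""
    best = 0.0
    for line in lines:
        match = pattern.search(line)
        if match:
            best = max(best, float(match.group(1)))
    return best


def timeout_is_adequate(lines: list[str], visibility_timeout_s: float) -> bool:
    """True if every logged batch fit in the timeout and no overrun was logged."""
    return (
        max_seconds(lines, BATCH_RE) <= visibility_timeout_s
        and max_seconds(lines, OVERRUN_RE) == 0.0
    )
```

For example, the sample batch message above yields 1446.54 seconds, which fits within the six-hour default of 21600 seconds.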
Interrupted input job
When indexer connectivity is lost, the SQS messages that were being processed become available again after the configured visibility timeout expires. Data already ingested by the interrupted job remains on the indexer and is ingested again on the next processing attempt, which results in duplicate events. Avoid unnecessary input reconfiguration and establish stable connections between your instances.
Troubleshooting host resolution
If search results over crowdstrike:events:sensor sourcetype events do not show the aid_computer_name field, host resolution did not work. Use the following steps to troubleshoot the host resolution process:
Search time host resolution troubleshooting
- Search for all events with sourcetype=crowdstrike:inventory:aidmaster. AIDmaster events are the source for host resolution. If no events are found, no host information has been ingested and there is nothing to use for host resolution. In that case, check the "Crowdstrike FDR SQS based S3 consumer" modular input configuration to make sure that AIDmaster events are selected for ingestion. If you see AIDmaster events, narrow the search and try to find an unresolved record.
- Check the aid_master information for aid values in events. You should be able to find AIDmaster events with the same aid. If this information is missing, Splunk has not ingested the information it needs to resolve all host names.
- Use an SPL search, such as | inputlookup crowdstrike_ta_host_resolution_lookup, to check the lookup crowdstrike_ta_host_resolution_lookup. Look for the AID inside this lookup.
- Find the saved search crowdstrike_ta_build_host_resolution_table, run it manually in Splunk Web, and then check the lookup again.
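Suppose you export both result sets, for example as CSV, to narrow down unresolved records outside Splunk as described in the second step above. You can then compare aid values with a set difference. This minimal Python sketch assumes that each exported event is a mapping with an aid key. unresolvable_aids is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def unresolvable_aids(
    sensor_events: Iterable[Mapping[str, str]],
    aidmaster_events: Iterable[Mapping[str, str]],
) -> set[str]:
    """Return aid values seen in sensor events that have no AIDmaster record."""
    known = {event["aid"] for event in aidmaster_events if event.get("aid")}
    return {
        event["aid"]
        for event in sensor_events
        if event.get("aid") and event["aid"] not in known
    }
```

Because csv.DictReader yields one dict per row, you can pass two DictReader objects over the exported files directly.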
Index time host resolution troubleshooting
If you configured and started the "Crowdstrike FDR host information sync" input for index time host resolution, run the following search to find useful messages:
index="_internal" sourcetype="crowdstrike_fdr_ta-inventory_sync_service"
The following log samples indicate successful and failed host information sync operations:
Inventory collection successfully synced. File size: 677 bytes. Records count: 2. Time taken: 0.5001018047332764 seconds.
Failed to retrieve collection
Inventory collection is not synced as source collection is empty
Inventory collection is not synced as source collection has unexpected formatting
Unexpected error when retrieving kvstore collection
Failed to authenticate to splunk instance
Inventory collection final rewrite failed with error
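This Python sketch classifies exported sync log lines using the samples listed above. It also reads the record count from a successful sync. It assumes the message texts appear exactly as shown, and the function names are hypothetical.

```python
from __future__ import annotations

import re

SUCCESS_TEXT = "Inventory collection successfully synced"
# Failure samples copied from the list above.
FAILURE_TEXTS = (
    "Failed to retrieve collection",
    "Inventory collection is not synced as source collection is empty",
    "Inventory collection is not synced as source collection has unexpected formatting",
    "Unexpected error when retrieving kvstore collection",
    "Failed to authenticate to splunk instance",
    "Inventory collection final rewrite failed with error",
)
RECORDS_RE = re.compile(r"Records count: (\d+)")


def classify_sync_message(line: str) -> str:
    """Return 'success', 'failure', or 'other' for one sync log line."""
    if SUCCESS_TEXT in line:
        return "success"
    if any(text in line for text in FAILURE_TEXTS):
        return "failure"
    return "other"


def records_synced(line: str) -> int | None:
    """Return the record count from a successful sync message, if present."""
    match = RECORDS_RE.search(line)
    return int(match.group(1)) if match else None
```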
Always a good idea to check
Even when ingestion of CrowdStrike events seems to be running smoothly, several log messages are worth checking from time to time:
LineBreakingProcessor - Truncating line because limit of 150000 bytes has been exceeded: Indicates that an actual CrowdStrike event was longer than the maximum value configured in the TRUNCATE setting, so the event was truncated. For sensor events, TRUNCATE is set to 150000. CrowdStrike confirmed this value as sufficient to handle all possible sensor events, but this can change in the future. It's better to search for this message without the limit value so that you are aware of any case of exceeding the limit, for example:
index=* "LineBreakingProcessor - Truncating line because limit of "
Failed to parse timestamp in first MAX_TIMESTAMP_LOOKAHEAD: Another log message worth watching. It helps you confirm that event timestamps are extracted correctly and that there are no unreadable values or datetime formats.
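This Python sketch follows the advice to match the truncation message regardless of the limit value. It scans exported log lines, collects every limit value reported, and counts timestamp parsing failures. It assumes the message texts shown above, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# Capture the limit as a number instead of hard-coding 150000.
TRUNCATION_RE = re.compile(
    r"LineBreakingProcessor - Truncating line because limit of (\d+) bytes has been exceeded"
)
TIMESTAMP_TEXT = "Failed to parse timestamp in first MAX_TIMESTAMP_LOOKAHEAD"


def truncation_limits(lines: list[str]) -> set[int]:
    """Return every TRUNCATE limit value that appears in truncation messages."""
    limits = set()
    for line in lines:
        match = TRUNCATION_RE.search(line)
        if match:
            limits.add(int(match.group(1)))
    return limits


def count_timestamp_failures(lines: list[str]) -> int:
    """Count lines reporting a timestamp that could not be parsed."""
    return sum(1 for line in lines if TIMESTAMP_TEXT in line)
```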
This documentation applies to the following versions of Splunk® Supported Add-ons: released