Troubleshoot the Splunk Add-on for AWS
Health Check Dashboards
You can choose the dashboards from the Health Check menu to troubleshoot data collection errors and performance issues.
The Health Overview dashboard gives you an at-a-glance view of data collection errors and performance metrics for all input types:
- Error count by error category
- Error count over time by input type, host, data input, and error category
- Throughput over time by host, input type, and data input
The S3 Health Details dashboard focuses on the generic, incremental, and SQS-based S3 input types and provides indexing time lag and detailed error information for these multi-purpose inputs.
You can directly access internal log data for help with troubleshooting. Data collected with these source types is used in the Health Check dashboards.
| Data source | Source type |
Configure log levels
- Click Splunk Add-on for AWS in your left navigation bar on the Splunk Web home page.
- Click Configuration in the app navigation bar.
- Click the Logging tab.
- Adjust the log levels for each of the AWS services as needed by changing the default of INFO to one of the other available options.
These log level configurations apply only to runtime logs. Some REST endpoint logs from configuration activity log at DEBUG, and some validation logs log at ERROR. These levels cannot be configured.
Problem saving during account or input configuration
If you experience errors or trouble saving while configuring your AWS accounts on the setup page, go to $SPLUNK_HOME/etc/system/local/web.conf and change your timeout settings as shown below.
[settings]
splunkdConnectionTimeout = 300
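If you prefer to script this change, the following Python sketch sets the value in a copy of the web.conf text. It is an illustration, not a supported tool: it relies on the standard-library configparser module, which is not a full parser for Splunk .conf files (it drops comments on write and rejects settings that appear before the first stanza). The function name set_splunkd_timeout is hypothetical.

```python
import configparser
import io


def set_splunkd_timeout(web_conf_text: str, timeout: int = 300) -> str:
    """Return web.conf text with [settings] splunkdConnectionTimeout set.

    Sketch only: configparser is not a complete Splunk .conf parser.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. splunkdConnectionTimeout
    parser.read_string(web_conf_text)
    if not parser.has_section("settings"):
        parser.add_section("settings")
    parser.set("settings", "splunkdConnectionTimeout", str(timeout))
    out = io.StringIO()
    parser.write(out)  # writes "key = value" lines under each [stanza]
    return out.getvalue()


if __name__ == "__main__":
    print(set_splunkd_timeout("[settings]\nhttpport = 8000\n"), end="")
```

To apply it, read $SPLUNK_HOME/etc/system/local/web.conf, pass its contents to the function, write the result back, and restart Splunk Web.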
Problems deploying with a deployment server
If you use a deployment server to deploy the Splunk Add-on for Amazon Web Services to multiple heavy forwarders, you must configure the Amazon Web Services accounts using the Splunk Web setup UI for each instance separately, because the deployment server does not support sharing hashed password storage across instances.
S3 input performance issues
You can configure multiple S3 inputs for a single S3 bucket to improve performance. The Splunk platform dedicates one process for each data input, so provided that your system has sufficient processing power, performance improves with multiple inputs. See Performance reference for the S3 input in the Splunk Add-on for AWS.
Be sure that the S3 key names in multiple inputs against the same bucket do not overlap, to prevent indexing duplicate data.
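Before adding inputs, you can check for overlap with a small Python sketch. It assumes each input is limited to its part of the bucket by a key prefix; the input names and prefixes are hypothetical. Two prefixes overlap when one starts with the other, because every key under the longer prefix also falls under the shorter one.

```python
from itertools import combinations


def find_overlapping_prefixes(prefixes):
    """Return pairs of input names whose S3 key prefixes overlap.

    prefixes maps a (hypothetical) input name to the key prefix it reads.
    An empty prefix covers the whole bucket, so it overlaps every other one.
    """
    overlaps = []
    for (name_a, prefix_a), (name_b, prefix_b) in combinations(sorted(prefixes.items()), 2):
        # Every key under the longer prefix also starts with the shorter one.
        if prefix_a.startswith(prefix_b) or prefix_b.startswith(prefix_a):
            overlaps.append((name_a, name_b))
    return overlaps


inputs = {
    "s3_logs_2023": "logs/2023/",
    "s3_logs_2024": "logs/2024/",
    "s3_logs_all": "logs/",
}
print(find_overlapping_prefixes(inputs))
```

Here s3_logs_all would index the same keys as both year-specific inputs, so one of them should be removed or narrowed.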
S3 key name filtering issues
The deny list and allow list regular expressions match the full S3 key name, not just the last segment of the key.
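The following Python sketch shows what matching the full key name means in practice. It assumes the filter behaves like Python's re.fullmatch, which may differ in detail from the add-on's own implementation; the key and patterns are made-up examples.

```python
import re


def matches_full_key(pattern: str, key: str) -> bool:
    # Assumption: the filter behaves like re.fullmatch, so the pattern must
    # account for every character of the key, including the folder path.
    return re.fullmatch(pattern, key) is not None


key = "AWSLogs/123456789012/CloudTrail/us-east-1/2024/01/15/trail.json.gz"
print(matches_full_key(r"[^/]*\.json\.gz", key))              # False: cannot span "/"
print(matches_full_key(r".*\.json\.gz", key))                 # True
print(matches_full_key(r".*/CloudTrail/us-east-1/.*", key))   # True
```

A pattern written for the file name alone, such as the first one, silently matches nothing once the folder path is part of the key.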
For more help with regex:
- Watch the video in this blog post: http://blogs.splunk.com/2008/10/22/all-my-regexs-live-in-texas/.
- Read "About Splunk regular expressions" in the Knowledge Manager Manual, part of the Splunk Enterprise documentation.
S3 event line breaking issues
If your indexed S3 data has incorrect line breaking, configure a custom source type in props.conf to control how the lines break for your events.
If S3 events are too long and get truncated, set TRUNCATE = 0 in props.conf to prevent line truncation.
For more information, see Configure event line breaking in the Getting Data In manual, part of the Splunk Enterprise documentation.
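Line breaking for a custom source type is controlled mainly by the LINE_BREAKER setting in props.conf, whose default is ([\r\n]+). The following Python sketch models the documented behavior of that setting so you can try a regex against sample S3 content: the start of the first capturing group ends one event, the end of the group starts the next, and the captured text is discarded. It is a rough model, not Splunk's implementation, and it ignores SHOULD_LINEMERGE, TRUNCATE, and timestamp extraction; break_events is a hypothetical name.

```python
import re


def break_events(raw: str, line_breaker: str = r"([\r\n]+)") -> list:
    """Split raw text into events using LINE_BREAKER-style semantics."""
    events, start = [], 0
    for match in re.finditer(line_breaker, raw):
        # The first capturing group is the boundary; its text is dropped.
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return [event for event in events if event]


raw = "2024-01-01 start\n  detail line\n2024-01-02 next"
print(break_events(raw))  # default: one event per line
# Break only where a newline is followed by a date, keeping multiline events.
print(break_events(raw, r"([\r\n]+)\d{4}-\d{2}-\d{2}"))
```

Because only the capturing group is discarded, the date that follows it stays at the start of the next event.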
CloudWatch configuration issues
If you have a high volume of CloudWatch data, search index=_internal Throttling to determine whether you are experiencing an API throttling issue. If you are, contact AWS Support to increase your CloudWatch API rate. You can also decrease the number of metrics you collect or increase the granularity (a longer period between data points) to make fewer API calls.
If the granularity of your indexed data does not match your expectations, check that your configured granularity falls within what AWS supports for the metric you have selected. Different AWS metrics support different minimum granularities, based on the sampling period that AWS allows for that metric. For example, CPUUtilization has a sampling period of 5 minutes, whereas Billing Estimated Charge has a sampling period of 4 hours.
If you configured a granularity that is less than the sampling period for the selected metric, the reported granularity in your indexed data reflects the actual sampling granularity but is labeled with your configured granularity. To fix this, clear the problematic cloudwatch stanza in local/inputs.conf, adjust the granularity configuration to match the supported sampling granularity so that newly indexed data is correct, and reindex the data.
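The check described above can be expressed as a small Python sketch. The two sampling periods come from the examples in this section (EstimatedCharges is the AWS/Billing metric behind the Billing Estimated Charge example); look up the period for any other metric you collect. The table and function names are hypothetical.

```python
# Minimum sampling periods, in seconds, for the two metrics named in this
# section. Hypothetical table: extend it with the periods AWS documents for
# the other metrics you collect.
MIN_SAMPLING_SECONDS = {
    "CPUUtilization": 5 * 60,         # 5 minutes
    "EstimatedCharges": 4 * 60 * 60,  # 4 hours (Billing Estimated Charge)
}


def granularity_is_supported(metric: str, granularity_seconds: int) -> bool:
    """Return False when the configured granularity is finer than AWS samples."""
    return granularity_seconds >= MIN_SAMPLING_SECONDS[metric]


for metric, configured in [("CPUUtilization", 60), ("EstimatedCharges", 14400)]:
    status = "ok" if granularity_is_supported(metric, configured) else "too fine"
    print(f"{metric} at {configured}s: {status}")
```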
CloudTrail data indexing problems
If you are not seeing CloudTrail data in the Splunk platform, follow this troubleshooting process.
- Review the internal logs by searching for:
- Check to see if the Splunk platform is connecting to SQS successfully by searching for the string "Connected to SQS".
- Check to see if the Splunk platform is processing messages successfully. Look for strings that follow the pattern "X completed, Y failed while processing notification batch".
- Check to see if the Splunk platform is discarding messages. Look for strings that follow the pattern "fetched X, wrote Y, discarded Z".
- Review your Amazon Web Services configuration to verify that SQS messages are being placed into the queue. If messages are being removed from the queue and the logs do not show that the add-on's input is removing them, another script or input might be consuming messages from the queue. Review your data inputs to make sure that no other input is configured to consume the same queue.
- Go to the AWS console to view CloudWatch metrics with the detail set to 1 minute to view the trend. For more details, see https://aws.amazon.com/blogs/aws/amazon-cloudwatch-search-and-browse-metrics-in-the-console/. If you see messages consumed but no Splunk platform inputs are consuming them, check for remote services that might be accessing the same queue.
- If your AWS deployment contains large S3 buckets with a large number of subdirectories for multiple AWS accounts (60 or more accounts), do one of the following:
  - Enable SQS notification for each S3 bucket and switch to an SQS-based S3 input. This is a best practice, and lets you add multiple copies of the input for scaling purposes.
  - Split your inputs into multiple buckets (one per account) and multiple incremental inputs.
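If you export the internal log events to text, a small Python sketch like the following can total the counters from the message patterns quoted in the steps above. It assumes that X, Y, and Z in those messages are integers; the function name and sample lines are hypothetical, and real log lines carry timestamps and other fields around these phrases.

```python
import re

# Patterns built from the message formats quoted above; X, Y and Z are
# assumed to be non-negative integers.
BATCH_RE = re.compile(r"(\d+) completed, (\d+) failed while processing notification batch")
FETCH_RE = re.compile(r"fetched (\d+), wrote (\d+), discarded (\d+)")


def summarize_cloudtrail_log(lines):
    """Total the SQS connection, batch, and discard counters in log lines."""
    summary = {"connected": False, "completed": 0, "failed": 0,
               "fetched": 0, "wrote": 0, "discarded": 0}
    for line in lines:
        if "Connected to SQS" in line:
            summary["connected"] = True
        batch = BATCH_RE.search(line)
        if batch:
            summary["completed"] += int(batch.group(1))
            summary["failed"] += int(batch.group(2))
        fetch = FETCH_RE.search(line)
        if fetch:
            summary["fetched"] += int(fetch.group(1))
            summary["wrote"] += int(fetch.group(2))
            summary["discarded"] += int(fetch.group(3))
    return summary


# Hypothetical sample lines.
sample = [
    "INFO Connected to SQS",
    "INFO 10 completed, 1 failed while processing notification batch",
    "INFO fetched 10, wrote 8, discarded 2",
]
result = summarize_cloudtrail_log(sample)
if not result["connected"] or result["failed"] or result["discarded"]:
    print("Investigate:", result)
```

A missing connection message points at SQS access, while nonzero failed or discarded totals point at message processing.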
Billing Report issues
Problems accessing billing reports from AWS
Ensure that Billing Reports are available in the S3 bucket you select when you configure the billing input, and that the AWS account you specify has permission to read the files inside that bucket.
Problems understanding the billing report data
Splunk recommends accessing the saved searches included with the add-on to analyze billing report data.
Problems configuring the billing data interval
The default collection intervals for billing report data are designed to minimize license usage. Review the default behavior and make adjustments with caution.
Configure the interval by which the Splunk platform pulls Monthly and Detailed Billing Reports:
- In Splunk Web, go to the Splunk Add-on for AWS inputs screen.
- Create a new Billing input or click to edit your existing one.
- Click the Settings tab.
- Customize the value in the Interval field.
SNS alert issues
Because the modular input module is inactive, it cannot check whether the AWS account is correct or whether the topic exists in AWS SNS. If you cannot send a message to AWS SNS, perform the following procedures:
- Ensure the SNS topic name exists in AWS and the region ID is correctly configured.
- Ensure the AWS account is correctly configured in the Splunk Add-on for AWS.
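To confirm the first point from outside Splunk, the following Python sketch checks whether a topic name exists in a given region. It assumes boto3 is installed and that the credentials it picks up belong to the same AWS account you configured in the add-on; the helper names are hypothetical. The ARN check is kept separate from the API call so it can be tried without AWS access.

```python
def topic_in_region(topic_arns, topic_name, region):
    """Return True if an SNS topic with this name exists in the given region.

    SNS topic ARNs have the form arn:<partition>:sns:<region>:<account-id>:<topic-name>.
    """
    for arn in topic_arns:
        parts = arn.split(":")
        if len(parts) == 6 and parts[2] == "sns" and parts[3] == region and parts[5] == topic_name:
            return True
    return False


def list_topic_arns(region):
    """Fetch every topic ARN in a region (needs boto3 and AWS credentials)."""
    import boto3  # imported here so topic_in_region works without boto3

    sns = boto3.client("sns", region_name=region)
    arns = []
    for page in sns.get_paginator("list_topics").paginate():
        arns.extend(topic["TopicArn"] for topic in page["Topics"])
    return arns
```

For example, topic_in_region(list_topic_arns("ap-southeast-2"), "my-alert-topic", "ap-southeast-2") returns False if the topic name or region in your alert configuration is wrong.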
If you still have the issue, you can perform the following search to check the log for AWS SNS:
Proxy settings for VPC endpoints
When using a proxy with VPC endpoints, check the proxy setting defined in $SPLUNK_HOME/etc/splunk-launch.conf. For example:
no_proxy = 169.254.169.254,127.0.0.1,s3.amazonaws.com,s3.ap-southeast-2.amazonaws.com
You must add each S3 region endpoint to the no_proxy setting and use the correct hostname for your region, such as s3.ap-southeast-2.amazonaws.com in the example above. The no_proxy setting does not allow any spaces between entries.
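A short Python sketch can assemble the no_proxy value for the regions you use. It assumes the s3.<region>.amazonaws.com hostname form shown in the splunk-launch.conf example and keeps that example's base entries; build_no_proxy is a hypothetical helper.

```python
# Base entries copied from the splunk-launch.conf example.
BASE_NO_PROXY = ("169.254.169.254", "127.0.0.1", "s3.amazonaws.com")


def build_no_proxy(regions, base=BASE_NO_PROXY):
    """Build a comma-separated no_proxy value with one S3 endpoint per region."""
    entries = [entry.strip() for entry in base]
    for region in regions:
        host = f"s3.{region.strip()}.amazonaws.com"
        if host not in entries:
            entries.append(host)
    # Join without spaces: the no_proxy setting does not allow them.
    return ",".join(entries)


print("no_proxy = " + build_no_proxy(["ap-southeast-2"]))
```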
Certificate verify failed (_ssl.c:741) error message
If you create a new input and receive the following error message:
certificate verify failed (_ssl.c:741)
Perform the following steps to resolve the error:
- Navigate to $SPLUNK_HOME/etc/auth/cacert.pem and open cacert.pem with a text editor.
- Copy the text from your deployment's proxy server certificate, and paste it into the cacert.pem file.
- Save your changes.
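If you manage several heavy forwarders, the copy-and-paste step can be scripted. The following Python sketch appends the proxy certificate to the bundle text only if it is not already present, so running it twice does no harm. The function names and the certificate file you pass in are hypothetical; keep a backup of cacert.pem before writing to it.

```python
import os
from pathlib import Path

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def add_proxy_cert(bundle_pem: str, proxy_cert_pem: str) -> str:
    """Return the CA bundle text with the proxy certificate appended once."""
    cert = proxy_cert_pem.strip()
    if not (cert.startswith(BEGIN) and cert.endswith(END)):
        raise ValueError("expected a PEM-encoded certificate block")
    if cert in bundle_pem:
        return bundle_pem  # already present, nothing to do
    # Make sure the new block starts on its own line.
    separator = "" if not bundle_pem or bundle_pem.endswith("\n") else "\n"
    return bundle_pem + separator + cert + "\n"


def update_bundle_file(proxy_cert_path: str) -> None:
    """Apply add_proxy_cert to $SPLUNK_HOME/etc/auth/cacert.pem in place."""
    bundle = Path(os.environ["SPLUNK_HOME"]) / "etc" / "auth" / "cacert.pem"
    proxy_cert = Path(proxy_cert_path).read_text()
    bundle.write_text(add_proxy_cert(bundle.read_text(), proxy_cert))
```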
This documentation applies to Splunk® Supported Add-ons.