Troubleshoot the Edge Processor solution
Review this page if you are having difficulties with sending data through the Edge Processor solution. If the problem that you're experiencing is not described on this page, you can find more information by doing the following:
- Review the list of known issues in the product. See Known issues.
- Check the logs associated with your Edge Processor. See View logs for the Edge Processor solution.
If the problem persists, contact your Splunk representative for assistance. To help expedite the support process, you can generate a diagnostic report and send it to your Splunk representative.
Generate a diagnostic report for an Edge Processor instance
You can run the edge_diagnostic tool to generate a diagnostic report for a specific Edge Processor instance. The diagnostic report contains logs about the performance and activity of the Edge Processor instance. Include this report when contacting your Splunk representative for assistance.
- On the host machine of your Edge Processor instance, download the package containing the edge_diagnostic tool.
curl -o 'edge-diagnostic-linux.tar.gz' 'https://beam.scs.splunk.com/acies/diagnostic/edge-diagnostic-linux.tar.gz'
- Extract the edge_diagnostic tool into <install_directory>, where <install_directory> is the installation directory of the Edge Processor instance.
tar -xf edge-diagnostic-linux.tar.gz -C <install_directory>
- Navigate to the installation directory.
cd <install_directory>
- Use the edge_diagnostic tool to generate a diagnostic report. By default, the tool generates a file named edge-diag-<host_name>-<timestamp>.tar.gz, where <host_name> is the host name of the machine that the Edge Processor instance is running on and <timestamp> is a timestamp indicating when the file was generated. You can use the -out option to specify a different file name.
- To generate a diagnostic report using the default settings, run this command:
./edge_diagnostic_linux_amd64
- To specify a different output file name, run this command, where <file_name> is the name that you want to use:
./edge_diagnostic_linux_amd64 -out <file_name>.tar.gz
The edge_diagnostic tool generates a diagnostic report file in the current directory. While the tool is running, it returns INFO logs about its status.
- (Optional) If any unexpected behavior occurs with the edge_diagnostic tool, you can get more information about the status of the tool by running it with the -verbose option. This option causes the tool to print DEBUG level logs about its status while generating the diagnostic report.
./edge_diagnostic_linux_amd64 -verbose
When contacting your Splunk representative for assistance, send them a copy of the generated file.
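If you generate reports on several hosts, a consistent file name can make them easier to tell apart. The following is a minimal sketch and not part of the product: the build_report_name function and the case-12345 prefix are hypothetical, and the sketch relies only on the -out option described above.

```shell
# Hypothetical helper: build a report file name from a prefix, a host name,
# and a timestamp. The .tar.gz suffix matches the form shown for -out above.
build_report_name() {
    prefix=$1
    host=$2
    stamp=$3
    printf '%s-%s-%s.tar.gz\n' "$prefix" "$host" "$stamp"
}

# Example usage (commented out; run it from <install_directory>):
# ./edge_diagnostic_linux_amd64 -out "$(build_report_name case-12345 "$(hostname)" "$(date -u +%Y%m%dT%H%M%SZ)")"
```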
Edge Processor installation returns a "401 Unauthorized" error
When you try to install an Edge Processor instance, the installation fails. Additionally, when you review the <install_directory>/var/log/supervisor.log file on the host machine where you attempted the installation, you see an error message similar to the following:
2024/03/26 06:52:16 token refresh failed, will retry: sign token: {"status_code":401,"status":"401 Unauthorized"}
Cause
The security token that's required during the installation process is expired. This problem can happen because the system clock on the host machine is set to an incorrect time, causing the token to expire prematurely. The token is valid for 5 minutes only.
Solution
Make sure that the system clock on the host machine is correct, and synchronize it to a Network Time Protocol (NTP) server. Refer to the documentation for your operating system for more information.
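Because the token is valid for 5 minutes only, a clock that is off by more than about 300 seconds is enough to cause this error. The following sketch shows one way to check for that. The clock_skew_exceeds function name is made up for this example, and the commented usage assumes GNU date, curl, and a reachable HTTPS server (shown as the placeholder <reference_url>) whose Date response header you trust as a time reference.

```shell
# Hypothetical helper: succeed (exit 0) when two Unix timestamps differ by
# more than a limit in seconds. The default of 300 mirrors the 5-minute
# token validity described above.
clock_skew_exceeds() {
    local_epoch=$1
    reference_epoch=$2
    limit=${3:-300}
    diff=$((local_epoch - reference_epoch))
    # Take the absolute value of the difference.
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -gt "$limit" ]
}

# Example usage (commented out; needs network access and GNU date):
# ref=$(date -u -d "$(curl -sI '<reference_url>' | sed -n 's/^[Dd]ate: //p' | tr -d '\r')" +%s)
# if clock_skew_exceeds "$(date -u +%s)" "$ref"; then
#     echo "System clock is more than 5 minutes off; check NTP synchronization."
# fi
```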
Edge Processor installation displays "certificate has expired" error
When installing an Edge Processor instance, the instance fails to connect with the Edge Processor service, and you receive the following error message about an expired certificate:
splunk@xxx:/splunk/ep/splunk-edge/bin$ ./splunk-edge run
2023/09/12 20:22:51 instance not onboarded - attempting to do so...
Error: provision service principal: POST https://xxx.api.scs.splunk.com/xxx/teleport/v1alpha2/instances/onboard: Post "https://xxx.api.scs.splunk.com/xxx/teleport/v1alpha2/instances/onboard": x509: certificate has expired or is not yet valid: current time 2023-09-12T20:22:51Z is after 2021-09-30T14:01:15Z
Cause
The operating system of the host machine for this Edge Processor instance is using CA certificates that are out-of-date, preventing the host machine from connecting with the Edge Processor service.
Solution
Update your CA certificates for the operating system or host and deploy your Edge Processor nodes with a more recent operating system distribution.
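To confirm that stale CA certificates are the problem before changing the host, you can test the TLS handshake with curl. This is a rough sketch: explain_curl_exit is a hypothetical helper, the exit codes that it maps are the ones documented for curl, and the package commands in the comments are the usual ones for Debian-based and RHEL-based distributions, which might not match your host.

```shell
# Hypothetical helper: translate a curl exit code into a troubleshooting hint.
explain_curl_exit() {
    case "$1" in
        0)  echo "TLS handshake succeeded" ;;
        6)  echo "could not resolve host name" ;;
        7)  echo "could not connect to host" ;;
        60) echo "certificate could not be verified against the local CA store" ;;
        *)  echo "curl failed with exit code $1" ;;
    esac
}

# Example usage (commented out; replace the placeholder with the host name
# from the error message):
# curl -sS -o /dev/null "https://<service_host>/"; explain_curl_exit "$?"
#
# If the result points at the CA store, refresh it with your package manager:
#   Debian or Ubuntu:  sudo apt-get update && sudo apt-get install --only-upgrade ca-certificates
#   RHEL or CentOS:    sudo yum update ca-certificates
```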
An Edge Processor instance that was previously "Healthy" is now "Disconnected"
When viewing details about an Edge Processor, you notice that an instance that previously had the Healthy status now has the Disconnected status.
Cause
The Edge Processor service lost contact with that instance. This problem can happen for a variety of reasons, such as the host machine of the instance going down or the communication between the instance and the Edge Processor service getting blocked.
If this problem happened after the recent Edge Processor update on August 22, 2024, please check that your role search, user search, and disk space limit values on the service account role for the scpbridge connection are set to the recommended limits. See Create a role for the service account for the latest recommended limit values.
This problem can also happen if you remove an Edge Processor instance from its host machine using any method other than the uninstallation command provided in the Edge Processor service. In this case, the service fails to detect that the instance has been removed.
Solution
To determine the root cause of the problem, check the logs for the supervisor that's associated with the disconnected Edge Processor instance.
- Log in to the host machine of the disconnected Edge Processor instance.
- Navigate to the <install_directory>/var/log directory, where <install_directory> is the installation directory of the Edge Processor instance.
- Review the logs in the supervisor.log file.
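When the log is long, filtering it can help. The following sketch prints the most recent lines that look like failures. The recent_failures name and the search terms are assumptions chosen for this example. The log might describe a problem in other words, so treat an empty result as inconclusive.

```shell
# Hypothetical helper: print the last N lines (default 20) of a log file that
# mention "error", "failed", or "unauthorized", ignoring case.
recent_failures() {
    log_file=$1
    max_lines=${2:-20}
    grep -iE 'error|failed|unauthorized' "$log_file" | tail -n "$max_lines"
}

# Example usage (commented out; substitute your installation directory):
# recent_failures "<install_directory>/var/log/supervisor.log"
```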
If the Edge Processor instance was removed using a method other than the uninstallation command, then you need to reinstall the instance and then uninstall it using the appropriate command. Using the command provided in the Edge Processor service allows the service to detect this change and stop listing the instance in the Edge Processor details.
- Reinstall the instance:
- On the Edge Processors page, select the Edge Processor that has the Disconnected instance.
- In the panel that contains your Edge Processor details, select Manage instances.
- Select the Install/uninstall tab, and then expand the Step 1: Run commands to install/uninstall instances section.
- Select Install to view the commands for downloading and installing an Edge Processor instance on a Linux machine, and then select Copy to clipboard.
- On the machine that previously hosted the removed instance, open a command-line interface in a directory of your choice and then paste and run the commands. When the installation is complete, you will see the following message:
splunk-edge.service - Splunk edge starter
Loaded: loaded (/etc/systemd/system/splunk-edge.service, enabled)
Active: active (running)
This command contains sensitive information about your cloud environment. Do not share this command with anyone except your Splunk representative or trusted members in your organization.
- Verify that your instance is healthy by checking its status on the Edge Processors page or in the detailed view of your Edge Processor. It may take up to 1 minute for the status to change to Healthy.
- Uninstall the instance using the command provided in the Edge Processor service:
- In the Manage instances panel for your Edge Processor, on the Install/uninstall tab, expand the Step 1: Run commands to install/uninstall instances section and then select Uninstall.
- To copy the command for uninstalling an instance from a Linux machine, select Copy to clipboard.
- On the host machine where you reinstalled the instance, open a command-line interface in a directory of your choice and then paste and run the command.
An Edge Processor instance is in the "Warning" status
When viewing details about an Edge Processor, you notice that an instance is in the Warning status, which persists even after you wait a few minutes and refresh the page.
Transient changes in instance statuses are expected.
Cause
Possible causes for this problem include the following:
- A problem has occurred in the pair-connected Splunk Cloud Platform deployment, causing the Edge Processor service to receive incomplete status information from Edge Processor instances.
- The Edge Processor instance is consuming more resources than what is acceptable based on the CPU threshold and Memory threshold settings configured in the Edge Processor service.
Solution
First, verify the scpbridge connection settings.
- In your cloud tenant, do the following:
- Select the Settings icon and then select System connections.
- On the System connections page, select the Edit icon on the scpbridge connection and review the configuration settings. Confirm that they meet the requirements described in Connect your tenant to your Splunk Cloud Platform deployment.
- In the Splunk Cloud Platform deployment that's connected to the tenant, review the role and configuration of the service account used in the scpbridge connection. Confirm that these configurations meet the requirements described in Create a role for the service account and Create a service account.
- In your cloud tenant, refresh the scpbridge connection to make sure that it is using the latest configuration settings.
If the problem persists, then confirm that the CPU threshold and Memory threshold settings for Edge Processors are configured to allow a reasonable amount of resource usage. The recommended amount is 80% of the total allocated resources.
- On the Edge Processors page, select Shared settings.
- Select the Other settings tab.
- Check the values specified in the CPU threshold and Memory threshold fields.
- If you want to adjust those settings, do the following:
- Select Edit.
- In the CPU threshold field, enter the percentage of the total allocated CPU power that an Edge Processor instance can use before it enters the Warning state.
- In the Memory threshold field, enter the percentage of the total allocated memory that an Edge Processor instance can use before it enters the Warning state.
- Select Save to save your changes, and then select Edge Processors to return to the Edge Processor management page.
If the problem still persists, then try the following solutions.
Reduce CPU usage per instance by scaling out the Edge Processor
Add more instances to the Edge Processor. Having more instances associated with the Edge Processor reduces the amount of CPU processing power that any one instance needs to consume.
For more information, see Add more instances to an Edge Processor.
Reinstall the instance on a machine that has more memory
Install a new Edge Processor instance on a host machine that has more memory available. Then, uninstall the instance that has the Warning status. Make sure to update your data sources to start sending data to the new instance and stop sending data to the uninstalled instance.
For more information, see Add more instances to an Edge Processor and Uninstall an Edge Processor instance.
An Edge Processor instance is in the "Error" status
When viewing details about an Edge Processor, you notice that an instance is in the Error status.
Cause
The Error status indicates that something is wrong with the Edge Processor instance, but it is still in contact with the Edge Processor service. This problem can occur for a variety of reasons. For example, this status can occur if the Edge Processor is configured incorrectly, or if an internal component is stuck in a restart loop.
To get more information about the root cause of an Error status, view the logs for the instance. The logs are located on the host machine of the instance in <install_directory>/var/log/edge.log, where <install_directory> is the installation directory of the Edge Processor instance.
For example, the following messages in the logs indicate that mutually authenticated TLS (mTLS) or shared settings for your Edge Processors are not configured correctly:
Log message | Cause of problem |
---|---|
"failed to read client private key from path open <PEM_file_path>: no such file or directory" | The private key file that you specified in the Server private key field is invalid. |
"tls: private key does not match public key" | The private key and server certificate files that you specified in the Server private key and Server certificate fields are not part of the same key pair. |
"connecting socket: Connection refused" | The port you configured to use is not available. |
"status" : "failed" | There is an error in your most recent configuration. The Edge Processor is running a past configuration. |
Solution
The solution varies depending on the root cause of the problem.
For example, if the logs for the instance indicate that mTLS has not been configured correctly, then review the mTLS settings for your Edge Processor and re-upload the private key or certificate files. If the logs indicate a misconfigured port, configure your settings to use an available port. If the logs indicate a failed configuration, compare a previous successful configuration to your most recent configuration to find the root cause. For more information, see Obtain TLS certificates for data sources and Edge Processors and Add an Edge Processor.
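The table of log messages in the Cause section can be turned into a quick log scan. The following is a minimal sketch that looks only for the four message fragments listed in that table. The diagnose_edge_log name and the wording of the hints are made up for this example, and the exact spacing of the "status" : "failed" message in real logs might differ, so the pattern allows optional spaces.

```shell
# Hypothetical helper: scan an edge.log file for the message fragments from
# the table of log messages and print a hint for each one that is found.
diagnose_edge_log() {
    log_file=$1
    if grep -qF 'failed to read client private key from path' "$log_file"; then
        echo "private key file is missing or invalid"
    fi
    if grep -qF 'tls: private key does not match public key' "$log_file"; then
        echo "private key and certificate are not a matching pair"
    fi
    if grep -qF 'connecting socket: Connection refused' "$log_file"; then
        echo "configured port is not available"
    fi
    # Allow optional spaces around the colon in the status field.
    if grep -qE '"status" *: *"failed"' "$log_file"; then
        echo "most recent configuration failed; a past configuration is running"
    fi
}

# Example usage (commented out; substitute your installation directory):
# diagnose_edge_log "<install_directory>/var/log/edge.log"
```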
An Edge Processor instance is stuck in the "Pending" status
When viewing details about an Edge Processor, you notice that an instance has been in the Pending status for longer than 2 minutes. The Pending status is supposed to change to the Healthy status once the Edge Processor service finishes applying configuration changes to the instance, which typically takes a few minutes.
Cause
The Edge Processor service is not working as expected. Something is preventing the service from completing the configuration changes on your instance.
Solution
Contact your Splunk representative for assistance.
An Edge Processor instance fails to run when using systemd
Your Edge Processor instance won't run when configured with systemd.
Cause
This might be related to insufficient SELinux or directory permissions.
Solution
Disable or configure SELinux and confirm that the user/group in the service file have write permission on the root Edge Processor directory and all subdirectories. Example SELinux configurations:
sudo chcon -t bin_t /opt/splunk-edge/bin/splunk-edge
sudo semanage fcontext -a -t bin_t '/opt/splunk-edge/bin/splunk-edge'
sudo restorecon -Fv /opt/splunk-edge/bin/splunk-edge
"systemctl status" has a 401 response
The Edge Processor fails to start even though it started successfully before, and the command "systemctl status <process_name>" returns a 401 response, where <process_name> is the name of the Edge Processor process on your host machine. By default, <process_name> is splunk-edge.
Cause
The Splunk Cloud Services token, which is used to authenticate the Edge Processor, will expire if the Edge Processor is down for an hour or longer.
Solution
To retrieve a new Splunk Cloud Services token, do the following:
- Stop the Edge Processor by running the following command in your terminal:
systemctl stop <process_name>
- Go to https://console.scs.splunk.com/<tenant>/settings, where <tenant> is the name of your Edge Processor tenant.
- In the Access Token section, select the Copy to Clipboard icon. You now have the Splunk Cloud Services token copied onto your clipboard.
- In your terminal, navigate to the <install_directory>/var directory, where <install_directory> is the installation directory of the Edge Processor:
cd <install_directory>/var
- Update the security token using the token value that you copied to your clipboard:
echo <token> > token
- Start the Edge Processor using the following command:
systemctl start <process_name>
- Use the following command to confirm that your Edge Processor is now working. Afterwards, navigate to the UI of the cloud service, which should now show the updated status:
systemctl status <process_name>
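Typing the token directly into an echo command leaves it in your shell history. As an alternative, the following sketch reads the token from standard input and writes it to the same var/token file used in the steps above. The write_token function is hypothetical and is not part of the Edge Processor tooling.

```shell
# Hypothetical helper: read one line (the token) from standard input and
# write it to <install_directory>/var/token, replacing the old token.
write_token() {
    install_dir=$1
    # Keep whatever was read even if the input has no trailing newline.
    IFS= read -r token || true
    if [ -z "$token" ]; then
        echo "no token provided" >&2
        return 1
    fi
    printf '%s\n' "$token" > "$install_dir/var/token"
}

# Example usage (commented out; stop the service first, as described above):
# systemctl stop <process_name>
# write_token "<install_directory>"    # paste the token, then press Enter
# systemctl start <process_name>
```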
The "Received data" pane is missing some data
You have configured one or more data sources to send data to your Edge Processor. However, when you open the detailed view of your Edge Processor and check the Received data pane, you notice that some data is missing, or the pane shows a "No data to display" message.
Cause
Possible causes for this problem include the following:
- The Received data pane is displaying information from a time range that does not include the data you're looking for.
- The scpbridge connection, which pairs the Edge Processor service with a Splunk Cloud Platform deployment, has been configured incorrectly. This issue can prevent the Received data metrics from being communicated as expected.
Solution
Start by confirming that the Received data pane is displaying information from the correct time range. Do the following:
- On the Edge Processors page, in the row that lists your Edge Processor, select the Actions icon and then select Open.
- Check the time range that's specified in the Metrics drop-down list. Make sure that it's set to a time range during which your data source was sending data to the Edge Processor.
If the data is still missing, then confirm whether the scpbridge connection is configured correctly. Do the following:
- In your cloud tenant, select the Settings icon and then select System connections.
- On the System connections page, confirm the status of the scpbridge connection.
- If the Status: connected icon displays beside the connection name, then the connection settings are valid. In this case, the problem is more likely to be due to permissions issues in the service account. See step 5 for instructions on confirming the permissions of the service account.
- If the Status: disconnected icon displays instead, then the connection settings are invalid and you need to correct them.
- To verify the connection settings, select the Edit icon on the scpbridge connection and then do the following:
- Confirm that the Host name, Management port, and Service account username values are correct for the Splunk Cloud Platform deployment that you want your cloud tenant to be connected to.
- Confirm that the password for the service account hasn't been changed since the scpbridge connection was last updated.
- If you need to update any connection settings, change the settings as needed and then select Apply.
- To confirm that the service account used by the scpbridge connection has the necessary permissions, do the following:
- Log in to the connected Splunk Cloud Platform deployment using your admin credentials.
- Confirm that the service account used by the scpbridge connection has permission to access all internal indexes:
- In the Settings menu, in the Users and authentication section, select Roles.
- In the row that lists the role used by your service account, select Edit > Edit.
This role and service account were created during the initial setup of the Edge Processor solution. See First-time setup instructions for the Edge Processor solution for more information.
- On the 3. Indexes tab, make sure that the Included check box is selected in the _* (All internal indexes) row. If that check box is not already selected, then select it and select Save.
- In your cloud tenant, refresh the connection to your Splunk Cloud Platform deployment:
If the Received data panel still does not display the expected information, then your data might be failing to reach your Edge Processor. See An Edge Processor is not receiving the expected data for additional troubleshooting guidance.
An Edge Processor is not receiving the expected data
The Received data pane in the detailed view of your Edge Processor is missing some data, or the pane displays a "No data to display" message.
Before proceeding with this troubleshooting guidance, confirm the following:
- The Received data pane is displaying information from a correct time range.
- The scpbridge connection is configured correctly.
See The "Received data" pane is missing some data on this page for more information.
Cause
Possible causes for this problem include the following:
- The data source or the Edge Processor is not configured correctly.
- Network requirements for allowing the data source and Edge Processor to communicate have not been met.
Solution
Check the configuration settings for the data source and the Edge Processor. Make sure of the following:
- The data source meets the requirements described in this documentation manual and is configured correctly. See the Get data into Edge Processors chapter for more information.
- The Receiver settings in your Edge Processor are configured correctly. For example, if your data source is a syslog device, then you must select the Syslog check box in the Receiver settings area. See Add an Edge Processor for more information.
- On the host machine of your Edge Processor, the firewall settings and ports are configured correctly. See Network requirements for more information.
For additional troubleshooting guidance for specific data sources, see the following sections on this page:
- An Edge Processor is not receiving data from a forwarder
- Syslog data fails to reach an Edge Processor
An Edge Processor is not receiving data from a forwarder
You've configured a universal forwarder or heavy forwarder to send data to an Edge Processor, but the data is not reaching the Edge Processor.
Before proceeding with this troubleshooting guidance, confirm that this problem is not caused by incorrect Receiver settings in your Edge Processor or by misleading information in the Received data pane. See The "Received data" pane is missing some data and An Edge Processor is not receiving the expected data on this page for more information.
Cause
Possible causes for this problem include the following:
- The forwarder is not configured correctly.
- Network requirements for allowing the forwarder and Edge Processor to communicate have not been met.
Solution
To determine the root cause of the problem, start by checking the Edge Processor logs for error messages. Do the following:
- In the Edge Processor service, navigate to the Edge Processors page.
- In the row that lists your Edge Processor, select the Actions icon and then select View debug logs.
- To retrieve the Edge Processor logs, select a suitable time range and then select the Run icon.
- Check the returned logs for any errors that mention the "s2s_receiver" and an "invalid protocol level". For additional troubleshooting guidance about these specific errors, see An Edge Processor fails to receive data from a forwarder, and logs an "invalid protocol level" error on this page.
If the logs do not seem to contain any relevant errors, then verify that your forwarder meets the requirements described in Get data from a forwarder into an Edge Processor and is configured correctly. In particular, confirm the following:
- The outputs.conf file defines a target group for the Edge Processor.
- The outputs.conf, inputs.conf, and props.conf files don't define any advanced routing or filtering settings that would prevent data from being forwarded to the target group for the Edge Processor.
- The forwarder is running and has an "active forward" sending data to the Edge Processor. To confirm this, run this command from the $SPLUNK_HOME/bin directory:
splunk list forward-server
If the forwarder configuration is correct but the problem persists, then use the netcat tool to confirm whether network issues are preventing the forwarder and the Edge Processor from communicating. To do this, you must first have the netcat tool installed on both the forwarder host machine and the Edge Processor host machine.
- On the host machine of your Edge Processor, use netcat to start listening for incoming data on the receiver port for Splunk forwarders. Do the following:
- Confirm the port number that your Edge Processor is using to listen for forwarded data. In the Edge Processor service, on the Edge Processors page, select Shared settings and then check the port number specified in the Splunk forwarders section.
- On the Edge Processor host machine, run the following command, where <forwarder_port> is the port number that you confirmed during the previous step.
nc -l <forwarder_port>
- On the host machine of your forwarder, send the message "hello world" to the host machine of the Edge Processor, and use netcat to confirm if the message was sent successfully. To do this, run the following command, where <edge_processor_host> is the IP address of the Edge Processor host machine and <forwarder_port> is the port number that you confirmed during step 1a:
echo "hello world" | nc -v <edge_processor_host> <forwarder_port>
If the forwarder host machine successfully sends "hello world" to the Edge Processor host machine, then the following messages are returned:
- In the terminal of the forwarder host: "Connection to <edge_processor_host> port <forwarder_port> [tcp/de-noc] succeeded!"
- In the terminal of the Edge Processor host: "hello world"
If you do not see a successful result, that indicates that the forwarder is unable to communicate with the Edge Processor due to network problems. To resolve these problems, confirm that the firewall settings and ports on your Edge Processor host are configured correctly. See Network requirements for more information.
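If netcat cannot be installed on the forwarder host, Bash can open a TCP connection on its own. The following sketch is an alternative to the nc -v check, not a method documented for the product: port_open is a hypothetical helper, and it assumes a Bash build with /dev/tcp support and the coreutils timeout command. Something must still be listening on the Edge Processor side for the connection to succeed.

```shell
# Hypothetical helper: succeed (exit 0) if a TCP connection to host:port can
# be opened within the timeout (default 3 seconds).
port_open() {
    host=$1
    port=$2
    wait_seconds=${3:-3}
    # /dev/tcp/<host>/<port> is handled by Bash itself, so run the attempt in
    # a Bash child process. The connection closes when that process exits.
    timeout "$wait_seconds" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example usage (commented out; run it on the forwarder host):
# if port_open <edge_processor_host> <forwarder_port>; then
#     echo "port is reachable"
# else
#     echo "port is not reachable; check the firewall settings"
# fi
```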
An Edge Processor fails to receive data from a forwarder, and logs an "invalid protocol level" error
You've configured a universal forwarder or heavy forwarder to send data to an Edge Processor, but the data is not reaching the Edge Processor. When you view the logs in the <install_directory>/var/log/edge.log file on the host machine of the Edge Processor, you see an error message similar to the following:
{..."logger":"s2s_receiver","location":"splunks2sreceiver/receiver.go:163","message":"Error from handle connection"..."error":"failed to read accepted capabilities by client: invalid protocol level, protocol level 0 must be used with protocol version lower than V4","errorVerbose":"invalid protocol level, protocol level 0 must be used with protocol version lower than V4....
Cause
This problem can occur if the following properties are configured in a conflicting manner in the outputs.conf file of the forwarder:
enableOldS2SProtocol
negotiateProtocolLevel
negotiateNewProtocol
Solution
- In the outputs.conf file of the forwarder, delete any lines that specify these settings:
enableOldS2SProtocol
negotiateProtocolLevel
negotiateNewProtocol
- Restart the forwarder to have the updated configuration take effect.
*nix: $SPLUNK_HOME/bin/splunk restart
Windows: %SPLUNK_HOME%\bin\splunk restart
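To find the lines to delete, you can search the file for the three setting names. A minimal sketch follows. The find_s2s_overrides name is hypothetical, and the commented path assumes the common $SPLUNK_HOME/etc/system/local location, which might not be where the outputs.conf file of your forwarder is stored.

```shell
# Hypothetical helper: print, with line numbers, every line of a .conf file
# that sets one of the three protocol settings named above. Returns non-zero
# when none are found. Commented-out lines are ignored.
find_s2s_overrides() {
    conf_file=$1
    grep -nE '^[[:space:]]*(enableOldS2SProtocol|negotiateProtocolLevel|negotiateNewProtocol)[[:space:]]*=' "$conf_file"
}

# Example usage (commented out):
# find_s2s_overrides "$SPLUNK_HOME/etc/system/local/outputs.conf"
```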
An Edge Processor fails to receive data through a HEC data source and returns a 401 HTTP status response
Your Edge Processor is not receiving data from your HEC data source. You receive a 401 HTTP status response from your Edge Processor HEC data ingestion.
Cause
The HEC token authentication feature is turned on but you have not provided the correct token in your HEC data source.
Solution
If you do not want to use the HEC token authentication feature, you can turn it off in the shared Edge Processor settings. See Configure shared Edge Processor settings for more information.
If you want to continue using the HEC token authentication feature, do the following:
- Confirm that you have entered the correct token in the shared Edge Processor settings.
- Make sure that the token is configured in your HEC data source HTTP header. For example:
Authorization: Splunk <token>
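One way to check the token end to end is to send a test event with curl. The following is a sketch under assumptions: the helper names are hypothetical, /services/collector is the standard Splunk HEC event endpoint and is assumed to be the one that your Edge Processor exposes, the receiver is assumed to use TLS (change https to http otherwise), and the -k option skips certificate verification, which is acceptable only for a quick test.

```shell
# Hypothetical helper: build the HEC authorization header for a token.
hec_auth_header() {
    printf 'Authorization: Splunk %s\n' "$1"
}

# Hypothetical helper: send one test event and print only the HTTP status
# code. A 401 suggests that the token does not match the shared settings.
send_hec_test_event() {
    host=$1
    port=$2
    token=$3
    curl -k -s -o /dev/null -w '%{http_code}\n' \
        -H "$(hec_auth_header "$token")" \
        -d '{"event": "hello world"}' \
        "https://$host:$port/services/collector"
}

# Example usage (commented out):
# send_hec_test_event <edge_processor_host> <hec_port> <token>
```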
Syslog data fails to reach an Edge Processor
Your Edge Processor is not receiving syslog data from your syslog data source. When you view the logs in the <install_directory>/var/log/edge.log file on the host machine of the Edge Processor, you see error messages from "input/tcp" or "input/udp" or "input/syslog".
Cause
This problem occurs when you use the UDP transport protocol to send data. This is a known issue with the UDP transport protocol that is not unique to the Edge Processor solution. UDP does not provide any data guarantees, so when you try to send syslog data to an Edge Processor through UDP, the data might not be delivered successfully.
Solution
If possible, send your syslog data through the TCP transport protocol to validate the connection. If UDP is needed, keep sending syslog data until data reaches the Edge Processor. See Configure your device to send syslog data to an Edge Processor for instructions.
Edge Processor node crashes when I enable syslog on port 514
Your Edge Processor node crashes repeatedly when you enable syslog on port 1024 or lower, such as port 514.
Cause
This might be due to restrictions related to SELinux or other OS level assignment of ports to non-root users.
Solution
Only enable syslog on ports 1025 or greater. If the node no longer crashes, then the crashes were most likely caused by a restricted port number. You can consult with your Linux administrator to change permissions to allow non-root users to use ports 1-1024. See Configure shared Edge Processor settings for more information.
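On Linux, the boundary for ports that need elevated privileges is exposed through the net.ipv4.ip_unprivileged_port_start kernel setting, which defaults to 1024 (ports below that value are restricted). The following sketch checks a port against that boundary. The port_needs_privilege helper is hypothetical, and the commented sysctl command is a general Linux administration option rather than a step documented for the Edge Processor solution.

```shell
# Hypothetical helper: succeed (exit 0) if binding to the port requires
# elevated privileges. The second argument is the first unprivileged port;
# when omitted, it is read from the kernel setting, falling back to 1024.
port_needs_privilege() {
    port=$1
    start=$2
    if [ -z "$start" ]; then
        start=$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024)
    fi
    [ "$port" -lt "$start" ]
}

# Example usage (commented out):
# port_needs_privilege 514 && echo "port 514 is restricted for non-root users"
#
# An option that a Linux administrator might consider (not Edge Processor specific):
#   sudo sysctl net.ipv4.ip_unprivileged_port_start=514
```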
Pipeline cannot be applied because its configuration is too large
When you try to apply a pipeline to an Edge Processor, the operation fails and you receive the following error message: "pipeline configuration too large".
Cause
The maximum number of pipelines that can be applied to that Edge Processor has been reached.
This limit is a soft limit that varies depending on the complexity and length of the overall configurations of the applied pipelines. For more information, see Tested and recommended service limits (soft limits) in the Splunk Cloud Platform Service Description.
Solution
- Reduce the number of pipelines applied to that particular Edge Processor by consolidating your data processing actions into the same pipeline, where possible.
For example, if your data is associated with 3 different host values and you want to process all of that data in similar ways, you can apply 1 pipeline that selects data with those 3 host values for processing. This approach is recommended over applying 3 different pipelines that each select data with 1 specific host value for processing.
- Review the event breaking and merging configurations in the source types that the applied pipelines are working with, and reduce the complexity of those configurations if possible. See Using source types to break and merge data in Edge Processors for more information.
Reducing the complexity of the source type configurations also reduces the complexity and length of the pipeline's overall configuration.
My data is missing from a destination
You've configured your Edge Processor to send processed data to a destination, but the data is missing from that destination.
Before proceeding with this troubleshooting guidance, make sure that your Edge Processor is successfully receiving the data. See View data flow information about an Edge Processor.
Cause
Possible causes for this problem include the following:
- The pipelines that are applied to the Edge Processor are not configured correctly.
- The destinations used in the applied pipelines are not configured correctly.
- The Edge Processor's data queue is full.
- Network requirements for allowing the Edge Processor to communicate with the destination have not been met.
Solution
To determine the root cause of the problem, start by checking the Edge Processor logs for error messages. Do the following:
- In the Edge Processor service, navigate to the Edge Processors page.
- In the row that lists your Edge Processor, select the Actions icon and then select View debug logs.
- To retrieve the Edge Processor logs, select a suitable time range and then select the Run icon.
- Check the returned logs for any errors that indicate data loss, network connectivity issues, a full "sending_queue", or problems with the destination configuration.
For additional troubleshooting guidance for specific error messages, see the following sections on this page:
- An Edge Processor is not connecting to the Splunk Cloud Platform
- An Edge Processor fails to send data, and logs a "Dropping data because sending_queue is full" error
- An Edge Processor fails to send data through HEC and logs a 403 error
- An Edge Processor fails to send data through HEC and logs an "Incorrect index" error
- My data is not appearing in an Amazon S3 bucket
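If you also have shell access to the host machine, one quick way to triage is to count how often each documented error phrase appears in the local log file. The following is a minimal sketch, assuming the logs are available at <install_directory>/var/log/edge.log (the path referenced in the HEC sections on this page). The function name count_known_errors is hypothetical, and the pattern list is taken from the example messages on this page rather than from an exhaustive list of errors.

```shell
#!/bin/sh
# Hypothetical helper: count occurrences of the error phrases documented
# on this page. Pass the path to a log file as the first argument.
count_known_errors() {
    log_file="$1"
    for pattern in \
        "fail to connect" \
        "Dropping data because sending_queue is full" \
        "HTTP 403" \
        "Incorrect index" \
        "Access Denied"
    do
        # grep -c prints the number of matching lines (0 if none).
        # -F treats the pattern as a fixed string, not a regex.
        count=$(grep -c -F -- "$pattern" "$log_file" || true)
        printf '%s\t%s\n' "$count" "$pattern"
    done
}

# Example usage (replace <install_directory> with your installation directory):
# count_known_errors "<install_directory>/var/log/edge.log"
```

A non-zero count only points you to the matching section above. It does not confirm the root cause by itself.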
If the logs do not seem to contain any relevant errors, then do the following to confirm that the Edge Processor is sending the volume of data that you expect:
- On the Edge Processors page, in the row that lists your Edge Processor, select the Actions icon and then select Open.
- From the Metrics drop-down list, select the time range for the Edge Processor information that you want to check.
- In the Pipelines pane, confirm whether the Outbound data column displays the data volume that you expect.
If the data volume shown does not match what you expect, then the pipelines applied to the Edge Processor might not be configured correctly. Check your pipeline configurations to make sure that you are not accidentally filtering out data that you want to keep.
If the problem persists after you check the outbound data volume and pipeline configurations, then the problem might be caused by how data is routed inside the destination after it leaves the Edge Processor, or by network connectivity issues.
- If you're sending data to a Splunk platform deployment, then see My data is missing from an index for further troubleshooting guidance.
- Otherwise, contact your Splunk representative for assistance.
My data is missing from an index
You've configured an Edge Processor to send data to an index in Splunk Cloud Platform or Splunk Enterprise, but the data is missing from the index.
Before continuing with this troubleshooting guidance, make sure that the problem is not due to misconfigured destinations, pipeline filtering, or other specific errors. See My data is missing from a destination on this page for more information.
Cause
Possible causes for this problem include the following:
- Your data is being routed in unexpected ways after it reaches the Splunk platform deployment.
- Network requirements for allowing the Edge Processor to communicate with the Splunk platform deployment have not been met.
Solution
Start by checking the configuration settings that determine how indexers route data to an index, and making sure that your data is being routed to the expected index. Do the following:
- Review the configurations described in How does an Edge Processor know which index to send data to? and make sure that your data is being routed to the index that you expect.
- Make sure that the target index exists in the Splunk platform deployment. If you attempt to send data to an index that doesn't exist in the deployment, then your data is sent to the index specified by the lastChanceIndex property in the indexes.conf file.
If these routing configurations are correct but the problem persists, then use the netcat tool to confirm whether network issues are preventing the Edge Processor and Splunk platform deployment from communicating. To do this, you must first have the netcat tool installed on the Edge Processor host machine.
Use the netcat tool to connect from the Edge Processor host machine to the Splunk platform deployment. Run the following command, where <indexer> is the IP address or host name of an indexer in your Splunk platform deployment:
nc -v <indexer> 9997
If the connection is successful, then the following message is returned:
Connection to <indexer> port 9997 [tcp/palace-6] succeeded!
If you do not see a successful result, that indicates that the Edge Processor is unable to communicate with the indexer due to network problems. To resolve these problems, confirm that the firewall settings and ports on your Edge Processor host are configured correctly. See Network requirements for more information.
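If your deployment has several indexers, you can repeat the same check for each one. The following is a minimal sketch, assuming a netcat variant that supports the -z (connect without sending data) and -w (timeout in seconds) options, such as OpenBSD netcat. The function name check_indexers and the host names in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: probe each "host:port" target and report the result.
# Assumes an nc variant that supports -z and -w.
check_indexers() {
    failures=0
    for target in "$@"; do
        host=${target%:*}     # text before the last colon
        port=${target##*:}    # text after the last colon
        if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
            echo "OK   $host:$port"
        else
            echo "FAIL $host:$port"
            failures=$((failures + 1))
        fi
    done
    # Return a non-zero status if any target was unreachable.
    [ "$failures" -eq 0 ]
}

# Example usage (hypothetical host names):
# check_indexers idx1.example.com:9997 idx2.example.com:9997
```

Any target reported as FAIL is a candidate for the firewall and port checks described in Network requirements.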
An Edge Processor is not connecting to the Splunk Cloud Platform
You've configured an Edge Processor to send data to an index in Splunk Cloud Platform or Splunk Enterprise, but the data is missing from the index. Additionally, when you check the Edge Processor logs, you see error messages such as the following:
This "connection refused" error message:
{"level":"INFO","time":"2023-09-15T17:14:17.396Z","logger":"DNSResolvingClientProvider.Peer","location":"v2/peer.go:387","message":"fail to connect","service":"edge-processor","hostname":"XYZ","commit":"6a7d87e4","version":"1.0.0","kind":"exporter","data_type":"logs","name":"S2S/acies_logs","errReason":"failed to connect to <SPLUNK-IP>:<PORT>: dial tcp <SPLUNK-IP>:<PORT>: connect: connection refused"}
This "connection timed out" error message:
{"level":"INFO","time":"2023-09-09T03:43:32.794Z","logger":"DNSResolvingClientProvider.Peer","location":"v2/peer.go:387","message":"fail to connect","service":"edge-processor","hostname":"XYZ","commit":"de064ae8","version":"1.0.0","kind":"exporter","data_type":"metrics","name":"S2S/shared.pipelines.acies_test_play_default_destination_aug_2023","errReason":"failed to connect to <SPLUNK-IP>:<PORT>: dial tcp <SPLUNK-IP>:<PORT>: connect: connection timed out"}
This "i/o timeout" error message:
{"level":"INFO","time":"2023-09-11T09:25:18.533Z","logger":"DNSResolvingClientProvider.Peer","location":"v2/peer.go:387","message":"fail to connect","service":"edge-processor","hostname":"acf798510cce","commit":"de064ae8","version":"1.0.0","kind":"exporter","data_type":"logs","name":"S2S/acies_logs","errReason":"failed to connect to <SPLUNK-IP>:<PORT>: dial tcp <SPLUNK-IP>:<PORT>: i/o timeout"}
Cause
The Edge Processor is unable to communicate with the Splunk platform deployment due to network problems.
Solution
Make sure that the firewall settings and ports on your Edge Processor host are configured correctly. See Network requirements for more information.
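To see which kind of connection failure dominates, you can group the "fail to connect" messages by the trailing reason in the errReason field. The following is a minimal sketch that assumes the log lines follow the single-line JSON format of the examples in this section and that you have saved or can read them from a local file. The function name summarize_connect_failures is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count "fail to connect" log lines by failure reason.
# Pass the path to a log file as the first argument.
summarize_connect_failures() {
    grep -F '"message":"fail to connect"' "$1" \
        | sed -n 's/.*"errReason":"\([^"]*\)".*/\1/p' \
        | sed 's/.*: //' \
        | awk '{ seen[$0]++ } END { for (r in seen) printf "%d\t%s\n", seen[r], r }' \
        | sort -rn
}

# Example usage (replace <install_directory> with your installation directory):
# summarize_connect_failures "<install_directory>/var/log/edge.log"
```

The second sed command keeps only the text after the last ": " in errReason, for example "connection refused" or "i/o timeout".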
An Edge Processor fails to send data, and logs a "Dropping data because sending_queue is full" error
You've configured an Edge Processor to send data to a destination, but the data is missing from that destination. When you view the Edge Processor logs, you see an error message similar to the following:
{"level":"ERROR","time":"2024-03-27T06:27:33.628Z","location":"exporterhelper/queue_sender.go:196","message":"Dropping data because sending_queue is full. Try increasing queue_size.","service":"edge-processor","hostname":"EUAWS00LNX0215","commit":"958c65c0","version":"1.0.0","kind":"exporter","data_type":"logs","name":"S2S/shared.pipelines.default_splunk_cloud_destination","dropped_items":69,"callstack":"<callstack_details>"}
Cause
The Edge Processor is dropping data because its data queue is full.
When a destination is unavailable or an Edge Processor receives more data than it can send out, the Edge Processor holds data in a queue. If this queue fills up before the problem resolves, then the Edge Processor starts dropping any additional data that it receives. For more information about the queue, see What happens to my data if a destination becomes unavailable?
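To get a rough sense of how much data was affected, you can add up the dropped_items values from the matching log lines. The following is a minimal sketch that assumes each error line carries a numeric dropped_items field, as in the example message above. The function name total_dropped_items is hypothetical, and interpreting the sum as a count of dropped items is an assumption based on the field name.

```shell
#!/bin/sh
# Hypothetical helper: sum the dropped_items values reported in
# "sending_queue is full" log lines. Pass the log file path as the argument.
total_dropped_items() {
    grep -F 'Dropping data because sending_queue is full' "$1" \
        | sed -n 's/.*"dropped_items":\([0-9][0-9]*\).*/\1/p' \
        | awk '{ total += $1 } END { print total + 0 }'
}

# Example usage (replace <install_directory> with your installation directory):
# total_dropped_items "<install_directory>/var/log/edge.log"
```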
Solution
If you're sending data to a Splunk platform S2S destination, you can increase the size of the queue to avoid losing data due to a full queue. Otherwise, contact your Splunk representative for assistance.
To adjust the queue size for a Splunk platform S2S destination, do the following:
- In a browser, log in to the Splunk Cloud Console.
- Select More Options and select Settings.
- Select the Copy to clipboard icon beside the Access Token field.
- From the command line, set the following environment variables.
- Set TOKEN to be the access token that you copied in step 3.
- Set TENANT to be the name of your tenant.
- Set API_URL to be https://<tenant>.api.scs.splunk.com.
- Set DATASET_NAME to be shared.pipelines.<destination_name>. For example, if you'd like to increase the size of the queue for your destination named default_splunk_cloud_destination, then your DATASET_NAME will be shared.pipelines.default_splunk_cloud_destination.
- Run the following command to modify the queue size. Replace <updated_queue_size> with the maximum number of data batches that you want the queue to hold.
curl --location --request PATCH "$API_URL/$TENANT/search/v3alpha1/datasets/$DATASET_NAME" \
  --header "Authorization: Bearer $TOKEN" \
  --header "Content-Type: application/json" \
  --data-raw '{ "sendQueueSize": <updated_queue_size> }'
- Refresh your pipelines so that they use the updated sendQueueSize setting. See Refresh a pipeline for more information.
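As an illustration of the environment variable step, the following sketch sets the four variables in a POSIX shell and prints the URL that the PATCH request would target. The tenant name mytenant and the helper variable DESTINATION_NAME are placeholders introduced for this example only. Substitute your own tenant name, destination name, and access token.

```shell
#!/bin/sh
# Placeholder values: replace these with your own before running the request.
TENANT="mytenant"                                    # hypothetical tenant name
DESTINATION_NAME="default_splunk_cloud_destination"  # destination named in the example step
TOKEN="<access_token>"                               # token copied from the console

# Derived values, following the formats given in the steps above.
API_URL="https://${TENANT}.api.scs.splunk.com"
DATASET_NAME="shared.pipelines.${DESTINATION_NAME}"

export TOKEN TENANT API_URL DATASET_NAME

# Print the request target so that you can review it before sending anything.
echo "PATCH target: ${API_URL}/${TENANT}/search/v3alpha1/datasets/${DATASET_NAME}"
```

Printing the target first makes it easier to catch a mistyped tenant or destination name before the queue size is changed.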
An Edge Processor is not receiving more data
You've configured an Edge Processor to send data to a destination, but the data is missing from that destination and the Edge Processor is not receiving more data.
Cause
- The Edge Processor is back pressuring data ingestion due to the persistent queue being full.
- The destination has an outage due to misconfigurations.
Solution
To check if your Edge Processor is back pressuring data ingestion, check the persistent queue to see if it is full.
- If you are using a forwarder to send data to an Edge Processor, check whether your output queue has grown by reviewing the tcpout queue in your metrics.log file.
- If you are using a HEC data source, you will see your HTTP connections timing out.
For more information on how the Edge Processor handles data in the persistent queue, see What happens to my data if a destination becomes unavailable?
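On a forwarder, the tcpout queue check can be approximated from the command line. The following is a minimal sketch, assuming that metrics.log records queue metrics on lines containing group=queue with name=tcpout_<group>, max_size, and current_size fields separated by commas. That line format is an assumption here, so compare it against your own metrics.log before relying on the output. The function name tcpout_queue_fill is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print how full each tcpout queue is, one line per
# matching metrics.log entry. Pass the path to metrics.log as the argument.
# Assumes lines such as:
#   ... group=queue, name=tcpout_<group>, max_size=512000, current_size=0, ...
tcpout_queue_fill() {
    grep 'group=queue' "$1" | grep 'name=tcpout' | awk '
    {
        name = ""; max = ""; cur = ""
        for (i = 1; i <= NF; i++) {
            field = $i
            sub(/,$/, "", field)          # drop the trailing comma
            split(field, kv, "=")
            if (kv[1] == "name")         name = kv[2]
            if (kv[1] == "max_size")     max  = kv[2]
            if (kv[1] == "current_size") cur  = kv[2]
        }
        # Report the fill level as a whole-number percentage.
        if (max + 0 > 0) printf "%s %d%%\n", name, (cur / max) * 100
    }'
}

# Example usage (the path depends on where the forwarder is installed):
# tcpout_queue_fill "$SPLUNK_HOME/var/log/splunk/metrics.log"
```

Fill levels that stay close to 100% over time are consistent with the forwarder being unable to hand data off to the Edge Processor.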
An Edge Processor fails to send data through HEC and logs a 403 error
Your Edge Processor has a pipeline that sends data to a Splunk platform HEC destination, but that data is not arriving in the Splunk platform as expected. When you view the logs in the <install_directory>/var/log/edge.log file on the host machine of the Edge Processor, you see an error message containing the HTTP 403 code. For example:
"error":"HTTP 403 \"Forbidden\""
For information about other error codes and messages returned by HEC endpoints, see Possible error codes in the Getting Data In manual.
Cause
The HEC token that the Edge Processor is using to send data to the Splunk platform is invalid. Reasons why a HEC token might be invalid include, but are not limited to, the following:
- The token is turned off in the Splunk platform.
- The token was entered incorrectly in the original HTTP request or in the Splunk platform HEC destination settings.
Solution
First, confirm which HEC token your Edge Processor is using to send the data:
- If the data was originally transmitted to the Edge Processor through an HTTP request that specifies a HEC token in the Authorization header, then this token is used when the Edge Processor sends the data to the Splunk platform.
- Otherwise, the HEC token specified in the Splunk platform HEC destination is used.
Then, confirm the status of the HEC token in the Splunk platform deployment:
- Log in to the Splunk platform deployment where the HEC token is configured.
- In Splunk Web, select Settings, then Data inputs.
- Select HTTP Event Collector.
- Confirm that the HTTP Event Collector page lists your HEC token, and that the status of the token is Enabled.
- If the status of the token is Disabled, then turn it on by selecting Enable in the Actions column.
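After you enable or correct the token, you can check whether the HEC endpoint accepts it by sending a single test event and reading the HTTP status code. The following is a minimal sketch that uses the standard HEC event endpoint (/services/collector/event) and the "Authorization: Splunk <token>" header. The function name hec_status and the event text are hypothetical, and the full HEC URL for your deployment (host and port) is something you need to supply. Note that a successful request indexes the test event.

```shell
#!/bin/sh
# Hypothetical helper: send one test event to a HEC endpoint and print only
# the HTTP status code that the endpoint returns.
#   $1 = full HEC URL, for example https://<host>:<port>/services/collector/event
#   $2 = HEC token
hec_status() {
    hec_url="$1"
    hec_token="$2"
    curl --silent --output /dev/null --write-out '%{http_code}' \
        --header "Authorization: Splunk ${hec_token}" \
        --data '{"event": "Edge Processor HEC token check"}' \
        "$hec_url"
}

# Example usage (placeholders for your own deployment):
# hec_status "https://<host>:<port>/services/collector/event" "<hec_token>"
```

A 200 status suggests that the token is valid and enabled, while a 403 status matches the error described in this section.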
An Edge Processor fails to send data through HEC and logs an "Incorrect index" error
Your Edge Processor has a pipeline that sends data to a Splunk platform HEC destination, but that data is not arriving in the Splunk platform as expected. When you view the logs in the <install_directory>/var/log/edge.log file on the host machine of the Edge Processor, you see an error message containing the phrase "Incorrect index".
For information about other error codes and messages returned by HEC endpoints, see Possible error codes in the Getting Data In manual.
Cause
The HEC token that the Edge Processor is using to send data to the Splunk platform doesn't have access to the destination index specified for the data.
Solution
First, confirm which HEC token your Edge Processor is using to send the data:
- If the data was originally transmitted to the Edge Processor through an HTTP request that specifies a HEC token in the Authorization header, then this token is used when the Edge Processor sends the data to the Splunk platform.
- Otherwise, the HEC token specified in the Splunk platform HEC destination is used.
Then, update the index permission settings on the HEC token:
- Log in to the Splunk platform deployment where the HEC token is configured.
- In Splunk Web, select Settings, then Data inputs.
- Select HTTP Event Collector.
- Select the token that your Edge Processor is using to send data.
- In the Select Allowed Indexes control, select remove all for the Selected indexes pane. When no indexes are selected, the HEC token allows data to be sent to any index in the Splunk platform deployment.
- Select Save.
My data is not appearing in an Amazon S3 bucket
You've configured an Edge Processor to send data to an Amazon S3 bucket, but the data is missing from the bucket. Additionally, when you check the Edge Processor logs, you see the following error message:
{"level":"INFO","time":"2023-09-15T10:04:09.397Z","location":"exporterhelper/queued_retry.go:423","message":"Exporting failed. Will retry the request after interval.","service":"edge-processor","hostname":"XYZ","commit":"6a7d87e4","version":"1.0.0","kind":"exporter","data_type":"logs","name":"S3","error":"operation error S3: PutObject, https response error StatusCode: 403, RequestID: <ID>, HostID: <ID>, api error Access Denied.","interval":"31.154004002s"}
Cause
The Edge Processor is not configured correctly to send data to an Amazon S3 destination.
Solution
Check that you have fulfilled the prerequisites to send data to an Amazon S3 bucket from your Edge Processor. See Send data from Edge Processors to Amazon S3 for more information.
My data is not being processed as expected
When you try to preview a pipeline, the preview results area displays a "No results" message or data that looks incorrect.
Alternatively, when you view the data that was sent from a pipeline to a destination, you notice that the data looks incorrect.
Cause
Reasons why a pipeline might not process data as expected include, but are not limited to, the following:
- The inbound stream of data is not being broken into events correctly. Data must be pre-processed into distinct events before being processed by a pipeline.
- The pipeline is not configured correctly.
- The pipeline preview is for the wrong destination.
Solution
For pipelines with multiple destinations, check whether you are previewing the correct destination. If not, run the pipeline preview by selecting the Preview Pipeline icon and then selecting the destination name in the Preview drop-down list.
If you are already previewing the correct destination, make sure that event breaking and merging has been configured correctly for the source type of the data that you want to process.
- Navigate to the Source types page.
- Look for a source type with a name that matches the value of the sourcetype field in the data that you want to process.
- If the source type exists, select it to view its configuration details. Confirm that the event breaking and merging behavior is configured correctly for the data that you want to process.
- If the source type does not exist, then add it to the Edge Processor service.
For more information about the configuration settings for source types, see Add source types for Edge Processors.
If the problem persists after you've verified the source type configuration, then complete the following steps to verify that the processing commands in your pipeline are configured correctly.
- If you don't already have your pipeline open for editing, do the following:
- From the side panel of the pipeline builder, select Sample data.
- Enter or upload sample data that matches the inbound data that you want this pipeline to process, and then select Apply. You can use text strings that represent raw data or CSV values that represent parsed, field-extracted data. See Getting sample data for previewing data transformations for more information.
- To generate a preview of what your data looks like after being processed by the pipeline, select the Preview Pipeline icon.
- Verify that the preview results match how you want the pipeline to process your data. If the results do not match, or the preview cannot be generated, then make sure that the SPL2 statement of your pipeline is written correctly and contains only supported SPL2 commands. See Edge Processor pipeline syntax for more information.
When I send data to the Splunk platform, chunks of data from different events are intertwined together after indexing
When you search your Splunk platform deployment for data that you routed through an Edge Processor, the search returns events that incorrectly contain chunks of data from different events. This problem occurs even though your pipelines are configured correctly and the pipeline previews show data that looks correct.
Cause
This problem occurs if the host_segment attribute is configured in the inputs.conf file for multiple forwarders, and the host_segment settings are causing those forwarders to use the same host value. Typically, data from different forwarders is associated with different host values.
If your Edge Processors are routing data from multiple forwarders and the data from those forwarders are associated with the same host, source, and sourcetype values, then indexers treat that data as pieces of the same event and the data is intertwined as a result.
Solution
For each forwarder that is sending data to an Edge Processor, open the inputs.conf file and check the host_segment setting. Make sure that none of your forwarders have been configured to send data using the same host value.
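To find where host_segment is set on a forwarder, you can search its configuration directory for inputs.conf files that contain the setting. The following is a minimal sketch. The function name list_host_segment_settings is hypothetical, and the directory to search (for example, the etc directory under the forwarder's installation path) is an assumption that depends on your environment.

```shell
#!/bin/sh
# Hypothetical helper: list every uncommented host_segment setting found in
# inputs.conf files under the given directory, as "path:line_number:setting".
list_host_segment_settings() {
    # /dev/null is passed as an extra file so that grep always prints
    # the file name, even when only one inputs.conf file is found.
    find "$1" -name inputs.conf -exec \
        grep -n -E '^[[:space:]]*host_segment[[:space:]]*=' /dev/null {} +
}

# Example usage (the path depends on where the forwarder is installed):
# list_host_segment_settings "$SPLUNK_HOME/etc"
```

The output shows only where the setting is configured. Whether two forwarders end up with the same host value still depends on the monitored paths, so compare the results across forwarders.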
For more information, see the following pages:
- Set the event host with the host_segment attribute in the Splunk Cloud Platform Getting Data In manual.
- inputs.conf in the Splunk Enterprise Admin Manual.
Lookup dataset is not available
You created a lookup in the Splunk Cloud Platform deployment that is pair-connected with the Edge Processor tenant, and then refreshed the scpbridge connection to bring that lookup into the tenant as a lookup dataset. However, when you try to work with this lookup dataset, you encounter one or more of the following problems:
- The Datasets page in the tenant does not include your lookup dataset.
- When you open the lookup dataset in the Search page and try to run a search, the search results pane displays an error or 0 results.
- When you use the Enrich events with lookups dialog box to configure a lookup for your pipeline, the Lookup dataset menu does not include your lookup dataset.
Cause
A permissions error is preventing you from fully accessing the lookup dataset.
This problem can happen if your user account or the service account used by the scpbridge connection is missing read permissions for the following in Splunk Cloud Platform:
- The lookup table or definition
- The destination app that the lookup table or definition is associated with
Solution
- In Splunk Cloud Platform, select Settings, then select Lookups.
- Select either Lookup table files or Lookup definitions, depending on how you created your lookup.
If you're using a KV Store lookup, you must create a lookup definition for it.
- In the row that lists your lookup, select Permissions.
- Update the permissions as follows:
- Set the Object should appear in option to All apps (system).
- Make sure that Read permission is available to a role that is associated with your Splunk platform user account.
- Make sure that Read permission is available to the role used by the service account. Typically, the name of this role is scp_user, if you used the role name suggested in Create a role for the service account during the initial setup of the Edge Processor solution.
- Make sure that a role that is associated with your user account and the role used by the service account both have Read permission for the Destination app that is associated with the lookup.
- Select Apps, then select Manage Apps.
- Find the app that your lookup is associated with, and then select Permissions.
- Select Read permission for the necessary roles, and then select Save.
- Navigate to your Edge Processor tenant and then refresh the scpbridge connection.
When I edit a pipeline, I receive a system message saying that my pipeline has changed and I need to update the partition and "where" command configuration
When you open a pipeline for editing, the Edge Processor service displays the following system message:
Splunk has released a software update that affects how filtering clauses in pipelines are interpreted. This change can impact how Edge Processors determine which data to drop or send to the default destination. Update the partition and "where" command configurations in this pipeline as needed, and then save your changes.
Cause
This message appears because your pipeline is affected by a feature update that was released on January 22, 2024. See Updates to partitioning and filtering behavior in Edge Processor pipelines for more information.
The pipeline is still valid. However, the data processing behavior of the pipeline will be changed by the feature update after you save your changes to the pipeline.
Typically, the Edge Processor service automatically adjusts the configuration of your pipeline so that the pre-existing data processing behavior is preserved even after the feature update takes effect. However, in some cases, this adjustment fails. This system message indicates that the automatic adjustment failed, so you need to review and manually update the configuration of your pipeline to ensure that it continues to work as intended.
Solution
- Review the SPL2 statement of the pipeline, and identify any where clauses that immediately follow the from $source command.
For example, in the following pipeline, host="buttercup" and source="test_server" are where clauses that immediately follow the from $source command:
$pipeline = | from $source | where host="buttercup" AND source="test_server" | eval index="my_test_index" | where result="success" | into $destination;
Before the feature update that was released on January 22, 2024, Edge Processors interpreted these where clauses as partition conditions, so any data excluded by these where clauses got sent to the Edge Processor's default destination. Now, Edge Processors interpret them as filters in the main body of the pipeline, so the excluded data gets dropped instead.
- To retain the original data processing behavior of your pipeline, do the following:
- Update the partition of the pipeline to include the where clauses identified in step 1.
- Delete the where clauses identified in step 1 from the SPL2 statement.
- (Optional) Preview your pipeline using sample data to confirm that it works as expected.
- Save your changes.
See Example 2: The pipeline configuration is not automatically adjusted for more information.
Destinations associated with the connected Splunk Cloud Platform deployment are not working as expected
When you try to send data to a Splunk platform S2S destination that has the Tenant paired property in the Kind field, you encounter errors and the Edge Processor fails to send data to that destination.
Cause
If the Kind field in a destination has the Tenant paired property, that destination is available to Edge Processors through a connection named scpbridge. Reasons why this destination might not work as expected include, but are not limited to, the following:
- The scpbridge connection settings are incorrect. This scenario can occur if the credentials of the service account have changed since the last time the connection was updated, or if the Splunk Cloud Platform deployment has been updated in a way that changes its connection information.
- An index that was previously available as part of this connection has been deleted or changed in the Splunk Cloud Platform deployment.
For more information about the scpbridge connection, see First-time setup instructions for the Edge Processor solution and Send data from Edge Processors to the Splunk Cloud Platform deployment connected to your tenant.
Solution
To verify or update your scpbridge connection settings, complete the following steps.
- In your cloud tenant, select the Settings icon and then select System connections.
- On the System connections page, confirm the status of the scpbridge connection.
- If the Status: connected icon displays beside the connection name, then the connection settings are valid. In this case, the problem is more likely to be due to permissions issues in the service account or a change in the indexes that are available in the Splunk Cloud Platform deployment. See the next set of instructions in this section for more guidance.
- If the Status: disconnected icon displays instead, then the connection settings are invalid and you need to correct them.
- To verify the connection settings, select the Edit icon on the scpbridge connection and then do the following:
- Confirm that the Host name, Management port, and Service account username values are correct for the Splunk Cloud Platform deployment that you want your cloud tenant to be connected to.
- Confirm that the password for the service account hasn't been changed since the scpbridge connection was last updated.
- If you need to update any connection settings, change the settings as needed and then select Apply.
If you are unable to send data from an Edge Processor to an index that is associated with the scpbridge connection, make sure that the index is still available in the connected Splunk Cloud Platform deployment and accessible by the scpbridge connection.
- Log in to the connected Splunk Cloud Platform deployment using your admin credentials.
- Confirm that the index has not been deleted from the deployment.
- Confirm that the service account used by the scpbridge connection has permission to access the index:
- In the Settings menu, in the Users and authentication section, select Roles.
- In the row that lists the role used by your service account, select Edit > Edit.
This role and service account were created during the initial setup of the Edge Processor solution. See First-time setup instructions for the Edge Processor solution for more information.
- On the 3. Indexes tab, make sure that the Included check box is selected for your index. If that check box is not already selected, then select it and select Save.
- In your cloud tenant, refresh the connection to your Splunk Cloud Platform deployment:
When I try to delete the "scpbridge" connection, an error occurs
On the System connections page, when you select the Delete icon on the scpbridge connection, the Edge Processor service returns an error message indicating that the connection could not be deleted.
Cause
This error message appears because the scpbridge connection cannot be deleted after it is created. The Edge Processor solution uses this connection to store and read logs and metrics from Edge Processors, and cannot operate correctly without this connection.
For more information about the scpbridge connection, see First-time setup instructions for the Edge Processor solution and Send data from Edge Processors to the Splunk Cloud Platform deployment connected to your tenant.
Solution
You cannot delete the scpbridge connection. However, if necessary, you can update the connection settings to connect the Edge Processor solution to a different Splunk Cloud Platform deployment.
To update the scpbridge connection settings, do the following:
This documentation applies to the following versions of Splunk Cloud Platform™: 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release)