Troubleshooting and investigation
This documentation does not apply to the most recent version of Splunk.
This section describes procedures and practices for troubleshooting problems in a cross-tier environment with Splunk, drawing on IT data from multiple sources. When a customer of a web application sees an application error in the browser, the root cause of that error may be several logical or physical hops away, for example in a database or a remote web service. Splunk saves time by letting you investigate from one place and cross-correlate events across logs.
Splunk provides many commands and tools for troubleshooting and monitoring. One that is particularly useful for application management is the transaction command, which allows you to correlate events in different logs and combine them into transactions that cross servers and tiers.
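The idea behind grouping events into transactions can be illustrated outside Splunk. The Python below is a minimal sketch, not Splunk's implementation of the transaction command: it assumes every event already carries a shared correlation ID (the hypothetical txn_id field) and starts a new transaction for an ID when consecutive events with that ID are more than max_pause seconds apart, loosely analogous to the command's maxpause option.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    time: float    # epoch seconds
    source: str    # log or tier the event came from, e.g. "web", "app", "db"
    txn_id: str    # shared correlation ID (hypothetical field name)
    message: str


@dataclass
class Transaction:
    txn_id: str
    events: List[Event] = field(default_factory=list)

    @property
    def duration(self) -> float:
        # Time between the first and last event in the transaction.
        return self.events[-1].time - self.events[0].time


def group_transactions(events: List[Event], max_pause: float) -> List[Transaction]:
    """Group events that share a txn_id into transactions.

    A new transaction is started for an ID when the gap since that ID's
    previous event exceeds max_pause seconds.
    """
    open_txns: Dict[str, Transaction] = {}
    closed: List[Transaction] = []
    for ev in sorted(events, key=lambda e: e.time):
        txn = open_txns.get(ev.txn_id)
        if txn is not None and ev.time - txn.events[-1].time > max_pause:
            closed.append(txn)  # gap too long: close the old transaction
            txn = None
        if txn is None:
            txn = Transaction(ev.txn_id)
            open_txns[ev.txn_id] = txn
        txn.events.append(ev)
    closed.extend(open_txns.values())
    # Report transactions in order of their first event.
    return sorted(closed, key=lambda t: t.events[0].time)
```

For example, a web request, the application log entry it triggers, and the resulting database write that all carry the same ID end up in one transaction even though they come from three different logs; a later request reusing that ID after a long gap becomes a separate transaction.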
Note: This topic currently provides an outline, which will be expanded over time. The outline and contents may change.
Outline
- Ad hoc investigation
- Using search and domain expertise alone
- Sometimes this is as simple as correlating by time and host, or by time and user ID.
- Manual transaction tracing
- Requirements for the transaction command: for example, events need shared IDs, or IDs that can be mapped to each other, for correlation
- Note: you often need developers' help to get this correlation information logged.
- Making tracing easier and more repeatable
- Which Splunk commands and techniques to use
- Performance pitfalls and best practices
- How-tos for fixing specific cases
- Overlap between web server logs and application logs: getting a correct count
- One-to-many/many-to-many mappings
- Tracking down open/hung transactions
- Search for overlapping transactions
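Once events are correlated, the last two how-to topics, open or hung transactions and overlapping transactions, reduce to simple bookkeeping. The Python below is a hypothetical sketch, not a Splunk feature: it assumes each event has been reduced to a timestamp, a correlation ID, and a "start" or "end" marker (these markers are assumptions), and that each completed transaction is summarized as an (ID, start, end) tuple.

```python
from typing import Dict, List, Tuple

# (epoch_seconds, correlation_id, kind); kind is "start" or "end" (hypothetical markers).
LogEvent = Tuple[float, str, str]
# (transaction_id, start_seconds, end_seconds)
Interval = Tuple[str, float, float]


def find_open_transactions(events: List[LogEvent], now: float, timeout: float) -> List[str]:
    """Return IDs whose latest 'start' has no later 'end' and is older than timeout seconds."""
    started: Dict[str, float] = {}
    # Sort by timestamp only, so events with equal times keep their input order.
    for ts, cid, kind in sorted(events, key=lambda e: e[0]):
        if kind == "start":
            started[cid] = ts
        elif kind == "end":
            started.pop(cid, None)
    return sorted(cid for cid, ts in started.items() if now - ts > timeout)


def overlapping_pairs(txns: List[Interval]) -> List[Tuple[str, str]]:
    """Return (earlier_id, later_id) pairs whose time ranges overlap.

    Transactions that merely touch (one ends exactly when the next starts)
    are not counted as overlapping.
    """
    pairs: List[Tuple[str, str]] = []
    active: List[Interval] = []
    for tid, start, end in sorted(txns, key=lambda t: t[1]):
        # Keep only transactions still running when this one starts.
        active = [a for a in active if a[2] > start]
        pairs.extend((a[0], tid) for a in active)
        active.append((tid, start, end))
    return pairs
```

The timeout in find_open_transactions is a judgment call: it should exceed the longest normal transaction duration, or every slow but healthy request will be reported as hung.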
Useful things to look at for troubleshooting
The following information is present in the log files and can be used for troubleshooting:
- SOAP faults: These appear in the XML logs, often with an error code. You can set a threshold (e.g., more than 10 faults in one minute) and alert on this.
- ESB error conditions: These are logged and let you discover when your ESB fails to connect or when a message is sent using the wrong protocol.
- Unauthorized access
- Authentication failures
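The SOAP fault threshold described above amounts to counting matching events per time window and comparing the count against a limit. The Python below is a hypothetical sketch of that logic, not a Splunk alert definition: it assumes each log line arrives with an epoch timestamp, counts any line containing a SOAP Fault element (with or without a namespace prefix) as one fault, and uses fixed one-minute windows with the example threshold of more than 10 faults.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches an opening Fault element such as "<soap:Fault>" or "<Fault>".
# Assumes one log line per SOAP message; case-sensitive, so "<faultcode>" does not match.
FAULT_RE = re.compile(r"<(?:\w+:)?Fault\b")


def fault_alerts(lines: Iterable[Tuple[float, str]], threshold: int = 10,
                 window: int = 60) -> List[Tuple[int, int]]:
    """Count SOAP faults per fixed window of `window` seconds.

    Returns (window_start, count) for every window whose count exceeds threshold.
    """
    counts: Counter = Counter()
    for ts, line in lines:
        if FAULT_RE.search(line):
            counts[int(ts // window) * window] += 1
    return sorted((start, n) for start, n in counts.items() if n > threshold)
```

Fixed windows are simple but can miss a burst that straddles a window boundary; a sliding window catches such bursts at the cost of keeping recent timestamps in memory. The same pattern applies to ESB errors, unauthorized access, and authentication failures by swapping in a different pattern.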
Walkthroughs
The following walkthroughs discuss ad hoc troubleshooting:
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8.