Resolve data quality issues
You can troubleshoot the following event processing and data quality issues when you get data in to the Splunk platform:
- Incorrect line breaking
- Incorrect event breaking
- Incorrect timestamp extraction
Line breaking issues
The following symptoms indicate that there might be issues with line breaking:
- You have fewer events than you expect and the events are very large, especially if your events are single-line events.
- The Monitoring Console Data Quality dashboard displays issues with line breaking.
- You might see the following error message in the Splunk Web Data Input workflow or in the splunkd.log file: "Truncating line because limit of 10000 bytes has been exceeded".
To confirm that the Splunk platform has line breaking issues, do one or more of the following troubleshooting steps:
- Visit the Monitoring Console Data Quality dashboard. Check the dashboard table for line breaking issues. See About the Monitoring Console in Monitoring Splunk Enterprise.
- Look for messages in the splunkd.log file such as "Truncating line because limit of 10000 bytes has been exceeded".
- Search for events. Multiple combined events, or a single event broken into many, indicates a line breaking issue.
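The log check in the steps above can be scripted. The following is a minimal sketch, not a Splunk tool: the function name find_truncation_warnings is hypothetical, the pattern is built from the warning text quoted in this topic, and the sample lines are made up for illustration. The byte limit in the message can differ from 10000 if TRUNCATE has been changed, so the sketch captures it rather than hard-coding it.

```python
import re

# Built from the warning text quoted in this topic. The number is captured
# because it reflects whatever TRUNCATE value is in effect.
TRUNCATION_RE = re.compile(
    r"Truncating line because limit of (\d+) bytes has been exceeded"
)

def find_truncation_warnings(lines):
    """Return (line_number, limit_in_bytes) for each truncation warning."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = TRUNCATION_RE.search(line)
        if match:
            hits.append((number, int(match.group(1))))
    return hits

# Made-up sample lines, for illustration only.
sample = [
    "INFO some unrelated message",
    "WARN Truncating line because limit of 10000 bytes has been exceeded",
]
print(find_truncation_warnings(sample))  # [(2, 10000)]

# To scan a real file, pass an open file object instead of a list.
# The path below is an assumption about where splunkd.log is stored:
#   with open("/opt/splunk/var/log/splunk/splunkd.log") as log:
#       print(find_truncation_warnings(log))
```

Any hit suggests that at least one line was cut at the reported limit, which is worth comparing against the size of the events you expect.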
To resolve line breaking issues, complete these steps in Splunk Web:
- Click Settings > Add Data.
- Click Upload to test by uploading a file or Monitor to redo the monitor input.
- Select a file with a sample of your data.
- Click Next.
- On the Set Source Type page, work with the options on the left panel until your sample data is correctly broken into events. To configure LINE_BREAKER or TRUNCATE, click Advanced.
- Complete the data input workflow or record the correct settings and use them to correct your existing input configurations.
While you work with the options on the Set Source Type page, the LINE_BREAKER setting might not be properly set. The LINE_BREAKER setting must have a capturing group and the group must match the events.
For example, you might have a value of LINE_BREAKER that is not matched. Look for messages with "Truncating line because limit of 10000 bytes has been exceeded" in the splunkd.log file or look for a truncation warning in Splunk Web.
If you find such a message, do the following:
- Confirm that LINE_BREAKER is properly configured to segment your data into lines as you expect.
- Confirm that the string you specify in the LINE_BREAKER setting exists in your data.
- If LINE_BREAKER is configured correctly but you have very long lines, or if you are using LINE_BREAKER as the only method to define events, bypassing line merging later in the indexing pipeline, confirm that the TRUNCATE setting is large enough to contain the entire data fragment delimited by LINE_BREAKER. The default value for TRUNCATE is 10,000 bytes. If your events are larger than the TRUNCATE value, you might want to increase the value of TRUNCATE. For performance and memory usage reasons, do not set TRUNCATE to unlimited.
If you do not specify a capturing group, LINE_BREAKER is ignored.
For more information, see Configure event line breaking.
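To see why the capturing group matters, it can help to test a candidate pattern against sample data outside the Splunk platform. The following is a simplified sketch, not the Splunk implementation: the function name split_on_line_breaker is hypothetical, Python regular expressions differ in places from the regex engine the Splunk platform uses, and the sketch assumes that the text matched by the first capturing group is discarded while the text on either side becomes separate fragments. It also flags fragments that are larger than a TRUNCATE value.

```python
import re

def split_on_line_breaker(raw, line_breaker, truncate=10000):
    """Split raw text on the first capturing group of line_breaker.

    Returns (fragments, oversized), where oversized lists the indexes of
    fragments whose UTF-8 size exceeds truncate bytes.
    """
    pattern = re.compile(line_breaker)
    if pattern.groups < 1:
        # This topic says LINE_BREAKER is ignored without a capturing group.
        # This sketch raises instead so that the mistake is visible.
        raise ValueError("LINE_BREAKER needs a capturing group")
    fragments = []
    start = 0
    for match in pattern.finditer(raw):
        if match.start(1) == -1:  # the group did not take part in the match
            continue
        fragments.append(raw[start:match.start(1)])
        start = match.end(1)
    if start < len(raw):
        fragments.append(raw[start:])
    oversized = [
        index for index, fragment in enumerate(fragments)
        if len(fragment.encode("utf-8")) > truncate
    ]
    return fragments, oversized

# A breaker that matches the data: three fragments.
fragments, oversized = split_on_line_breaker(
    "event one\nevent two\r\nevent three", r"([\r\n]+)"
)
print(fragments)  # ['event one', 'event two', 'event three']

# A breaker whose string never appears in the data: one large fragment.
fragments, oversized = split_on_line_breaker("event one\nevent two", r"(\|\|)")
print(len(fragments))  # 1
```

If the pattern never matches, all of the data ends up in a single fragment, which is the situation in which the truncation warning typically appears.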
Event breaking or aggregation issues
Event breaking issues can pertain to the BREAK_ONLY_BEFORE_DATE and MAX_EVENTS settings and any props.conf configuration file setting with the keyword BREAK.
You might have aggregation issues if you see the following indicators:
- Aggregation issues present in the Monitoring Console Data Quality dashboard.
- An error in the Splunk Web Data Input workflow.
- Count events. If events are missing and are very large, especially if your events are single-line events, you might have event breaking issues.
To confirm that the Splunk platform has event breaking issues, do one or more of the following troubleshooting steps:
- View the Monitoring Console Data Quality dashboard.
- Search for events that are multiple events combined into one.
- Check splunkd.log for messages such as the following:
12-07-2016 09:32:32.876 -0500 WARN AggregatorMiningProcessor - Breaking event because limit of 256 has been exceeded
12-07-2016 09:32:32.876 -0500 WARN AggregatorMiningProcessor - Changing breaking behavior for event stream because MAX_EVENTS (256) was exceeded without a single event break. Will set BREAK_ONLY_BEFORE_DATE to False, and unset any MUST_NOT_BREAK_BEFORE or MUST_NOT_BREAK_AFTER rules. Typically this will amount to treating this data as single-line only.
For line and event breaking, determine whether this issue occurs for one of the following reasons:
- Your events are properly recognized but are too large for the limits in place. The MAX_EVENTS setting defines the maximum number of lines in an event.
- Your events are not properly recognized.
If your events are larger than the limit set in MAX_EVENTS, you can increase the limit. Be aware that large events are not optimal for indexing performance, search performance, and resource usage. Large events can be costly to search. With both limits at their default values, an event can reach 10,000 characters per line, as defined by TRUNCATE, times 256 lines, as set by MAX_EVENTS. The combination of those two limits is a very large event.
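The size implied by those two limits can be worked out directly. This is a rough sketch with a hypothetical function name, max_event_size. It treats the TRUNCATE value as a per-line size and ignores the line break characters between lines, so read the result as an approximation.

```python
TRUNCATE_DEFAULT = 10_000   # per-line limit named in this topic
MAX_EVENTS_DEFAULT = 256    # lines-per-event limit named in this topic

def max_event_size(truncate=TRUNCATE_DEFAULT, max_events=MAX_EVENTS_DEFAULT):
    """Approximate upper bound on event size: per-line limit times line limit."""
    return truncate * max_events

print(max_event_size())  # 2560000
```

At roughly 2.5 million characters for a single event, raising either limit further is worth weighing against the indexing and search costs described above.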
If the cause is that your events are not properly recognized, which is more likely, the Splunk platform is not breaking events properly. Check the following:
- Your event breaking strategy. The default strategy breaks before the date, so if the Splunk platform does not extract a timestamp, it does not break the event. To diagnose and resolve this issue, investigate timestamp extraction. See How timestamp assignment works.
- Your event breaking regular expression.
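The link between timestamp recognition and event breaking can be illustrated with a small model. The following sketch is a deliberate simplification and not the Splunk aggregation logic: the function name merge_lines and the timestamp pattern are hypothetical, and the only rules modeled are "start a new event before a line with a recognized timestamp" and "start a new event when the current one reaches max_events lines".

```python
import re

# Hypothetical pattern: lines that begin with "YYYY-MM-DD HH:MM:SS".
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def merge_lines(lines, timestamp_re=TIMESTAMP_RE, max_events=256):
    """Group lines into events, breaking before each recognized timestamp."""
    events = []
    current = []
    for line in lines:
        starts_new = bool(timestamp_re.match(line))
        if current and (starts_new or len(current) >= max_events):
            events.append(current)
            current = []
        current.append(line)
    if current:
        events.append(current)
    return events

# Timestamps are recognized: the continuation line joins the first event.
recognized = [
    "2016-12-07 09:32:32 request failed",
    "    at handler (continuation line)",
    "2016-12-07 09:32:33 request ok",
]
print(len(merge_lines(recognized)))  # 2

# Timestamps are in a format the pattern does not recognize:
# every line is merged into one event.
unrecognized = [
    "Dec 7 09:32:32 request failed",
    "Dec 7 09:32:33 request ok",
    "Dec 7 09:32:34 request ok",
]
print(len(merge_lines(unrecognized)))  # 1
```

In the second case the only thing that eventually forces a break is the line limit, which corresponds to the MAX_EVENTS warnings shown earlier in this section.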
For more information, see Configure event line breaking and How timestamp assignment works.
Time stamping issues
Time stamping issues can pertain to timestamp settings in the props.conf configuration file, such as DATETIME_CONFIG, TIME_PREFIX, TIME_FORMAT, MAX_TIMESTAMP_LOOKAHEAD, and TZ.
You might have timestamp parsing issues if you see the following indicators:
- Timestamp parsing issues are present in the Monitoring Console Data Quality dashboard.
- An error occurs in the Splunk Web Data Input workflow.
- Count events. If you are missing events and have very large events, especially if your events are single-line events, parsing might be a problem.
- The time zone is not properly assigned.
- The value of _time assigned by the Splunk platform does not match the time in the raw data.
To confirm that you have a timestamping issue, do one or more of the following:
- Visit the Monitoring Console Data Quality dashboard. Check for timestamp parsing issues in the table. Time stamp assignment resorts to various fallbacks. For more details, see How timestamp assignment works. For most of the fallbacks, even if one of them successfully assigns a timestamp, you still get an issue in the Monitoring Console dashboard.
- Search for events that are multiple events combined into one.
- Look in the splunkd.log file for messages like the following:
12-09-2016 00:45:29.956 -0800 WARN DateParserVerbose - Failed to parse timestamp. Defaulting to timestamp of previous event (Fri Dec 9 00:45:27 2016). Context: source::/disk2/sh-demo/splunk/var/log/splunk/entity.log|host::svdev-sh-demo|entity-too_small|682235
12-08-2016 12:33:56.025 -0500 WARN AggregatorMiningProcessor - Too many events (100K) with the same timestamp: incrementing timestamps 1 second(s) into the future to insure retrievability
All events are indexed with the same timestamp, which makes searching that time range ineffective.
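When many of these warnings appear, it can help to see which sources they come from. The following sketch tallies "Failed to parse timestamp" warnings by source and host. It assumes the Context portion of the message has the same layout as the sample above (source::, then host::, separated by pipe characters), which might differ in other versions. The function name count_timestamp_failures is hypothetical.

```python
import re
from collections import Counter

# Assumes the "Context: source::...|host::...|" layout shown in the sample.
FAILED_TS_RE = re.compile(
    r"DateParserVerbose - Failed to parse timestamp"
    r".*?Context: source::([^|]+)\|host::([^|]+)\|"
)

def count_timestamp_failures(lines):
    """Count timestamp parsing warnings per (source, host) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED_TS_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# The sample warning from this topic, used as test input.
sample = [
    "12-09-2016 00:45:29.956 -0800 WARN DateParserVerbose - Failed to parse "
    "timestamp. Defaulting to timestamp of previous event "
    "(Fri Dec 9 00:45:27 2016). Context: "
    "source::/disk2/sh-demo/splunk/var/log/splunk/entity.log|"
    "host::svdev-sh-demo|entity-too_small|682235",
]
for (source, host), count in count_timestamp_failures(sample).items():
    print(count, host, source)
```

Sources with the highest counts are reasonable candidates to review first for timestamp configuration.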
To resolve a timestamping issue, complete the following steps:
- Make sure that each event has a complete timestamp, including a year, full date, full time, and a time zone.
- See Configure timestamp recognition for more possible resolution steps.
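One way to check whether sample timestamps are complete is to parse them with a strict format before you configure anything. The sketch below uses Python's datetime.strptime with a hypothetical format that requires a year, a full date, a full time, and a numeric time zone offset. The TIME_FORMAT setting in props.conf also takes a strptime-style format, but its directives are not identical to Python's, so treat this only as a way to reason about completeness. The function name parse_complete_timestamp is hypothetical.

```python
from datetime import datetime

# Hypothetical format: year, full date, full time, and a UTC offset.
COMPLETE_FORMAT = "%Y-%m-%d %H:%M:%S %z"

def parse_complete_timestamp(text, fmt=COMPLETE_FORMAT):
    """Return a time zone aware datetime, or None if text does not fit fmt."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

# Complete timestamp: parses to an unambiguous point in time.
print(parse_complete_timestamp("2016-12-09 00:45:29 -0800"))  # 2016-12-09 00:45:29-08:00

# No year and no time zone: rejected by the strict format.
print(parse_complete_timestamp("12-09 00:45:29"))  # None
```

A timestamp that fails a strict check like this one leaves the year or time zone to be inferred, which is where mismatches between _time and the raw data tend to come from.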
This documentation applies to the following versions of Splunk Cloud Platform™: 8.2.2112, 8.1.2011, 8.1.2012, 8.1.2101, 8.0.2006, 8.0.2007, 8.1.2009, 8.1.2103, 8.2.2104, 8.2.2105, 8.2.2106, 8.2.2107 (latest FedRAMP release), 8.2.2109, 8.2.2111