Splunk® Enterprise

Getting Data In


Resolve data quality issues

This topic helps you troubleshoot event-processing and data quality issues such as the following:

  • Incorrect line breaking
  • Incorrect event breaking
  • Incorrect time stamp extraction

Line breaking issues

Problem

Indicators that you have line breaking issues include the following:

  • You have fewer events than you expect and the events are very large, especially if your events are single-line events.
  • Line breaking issues are present in the Monitoring Console Data Quality dashboard.
  • An error message in the Splunk Web Data Input workflow or in splunkd.log, such as a warning that the LINE_BREAKER regular expression did not match your data.

Diagnosis

To confirm that your Splunk software has line breaking issues, do one or more of the following:

  • Visit the Monitoring Console Data Quality dashboard. Check the dashboard's table for line breaking issues. See About the Monitoring Console in Monitoring Splunk Enterprise.
  • Look for messages in splunkd.log like the following:
12-12-2016 13:45:48.709 -0800 WARN LineBreakingProcessor - Truncating line because limit of 10000 bytes has been exceeded with a line length >= 301367
  • Search for events. Multiple events combined, or a single event broken into many, indicates a line breaking issue.

Solution

To resolve line breaking issues, in Splunk Web:

  1. Click Settings > Add data.
  2. Click add a file to test or monitor to redo the monitor input.
  3. Select a file with a sample of your data.
  4. Click Next.
  5. On the Set Source Type page, work with the options on the left until your sample data is correctly broken into events. To configure LINE_BREAKER or TRUNCATE, click Advanced.
  6. Complete the data input workflow or record the correct settings and use them to correct your existing input configurations.

While you are working with the options on the Set Source Type page, the LINE_BREAKER setting might not be properly set. LINE_BREAKER must contain a capturing group, and the group must match text in your events.

For example, you might have a value of LINE_BREAKER that your data never matches. Look for messages containing "Truncating line because limit of 10000 bytes has been exceeded" in splunkd.log or in Splunk Web.

If you find such a message, do the following:

  1. Check that LINE_BREAKER is properly configured to segment your data into lines as you expect. Make sure that the string exists in your data.
  2. If LINE_BREAKER is configured correctly and you simply have very long lines, or if you are using LINE_BREAKER as the only method to define events (bypassing line merging later in the indexing pipeline), make sure that TRUNCATE is set large enough to contain the entire data fragment delimited by LINE_BREAKER. The default value for TRUNCATE is 10,000 bytes. If your events are larger than the TRUNCATE value, you might want to increase the value of TRUNCATE. For performance and memory usage reasons, do not set TRUNCATE to unlimited.

If you do not specify a capturing group, LINE_BREAKER is ignored.
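As an illustration, a props.conf stanza like the following breaks events on newlines and raises TRUNCATE for long lines. The sourcetype name my_custom_log and the TRUNCATE value are hypothetical placeholders; adjust both to your data:

```ini
# props.conf -- hypothetical sourcetype shown for illustration
[my_custom_log]
# The capturing group ( ) around the newline characters is required;
# without a capturing group, LINE_BREAKER is ignored.
LINE_BREAKER = ([\r\n]+)
# Allow lines up to 50,000 bytes instead of the 10,000-byte default.
# Do not set TRUNCATE to unlimited.
TRUNCATE = 50000
```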


See Configure event line breaking.

Event breaking, or aggregation, issues

Event breaking issues can pertain to BREAK_ONLY_BEFORE_DATE, MAX_EVENTS, and any props.conf setting that contains the keyword "BREAK".

Problem

Indicators that you have aggregation issues include:

  • Aggregation issues present in the Monitoring Console Data Quality dashboard.
  • An error in the Splunk Web Data Input workflow.
  • A lower event count than you expect, combined with very large events, especially if your events should be single-line events.

Diagnosis

To confirm that your Splunk software has event breaking issues, do one or more of the following:

  • View the Monitoring Console Data Quality dashboard.
  • Search for events and find that multiple events have been combined into one.
  • Check splunkd.log for messages like the following:
12-07-2016 09:32:32.876 -0500 WARN  AggregatorMiningProcessor - Breaking event because limit of 256 has been exceeded
12-07-2016 09:32:32.876 -0500 WARN  AggregatorMiningProcessor - Changing breaking behavior for event stream because MAX_EVENTS (256) was exceeded without a single event break. Will set BREAK_ONLY_BEFORE_DATE to False, and unset any MUST_NOT_BREAK_BEFORE or MUST_NOT_BREAK_AFTER rules. Typically this will amount to treating this data as single-line only.

Solution

For line and event breaking, determine whether this is happening because either (1) your events are properly recognized but too large for the limits in place (MAX_EVENTS, which defines the maximum number of lines in an event), or (2) your events are not properly recognized.

If the cause is scenario 1, you can increase the limits. But be aware that large events are not optimal for indexing performance, search performance, or resource usage, and can be costly to search. At the default limits, a single event can reach 10,000 characters per line, as defined by TRUNCATE, times 256 lines, as set by MAX_EVENTS. The combination of those two limits is a very large event.
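If you decide to raise the limits for legitimately large events, the relevant props.conf settings might look like the following sketch. The sourcetype name and the specific values are placeholders; size them to your actual events:

```ini
# props.conf -- placeholder sourcetype; adjust values to your data
[my_large_events]
# Allow up to 512 lines per event (default is 256).
MAX_EVENTS = 512
# Allow lines up to 20,000 bytes (default is 10,000).
TRUNCATE = 20000
```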

If the cause is scenario 2, which is more likely, your Splunk software is not breaking events as it should. Check the following:

  • Your event breaking strategy. The default is to break before the date, so if Splunk software does not extract a time stamp, it does not break the event. To diagnose and resolve, investigate time stamp extraction. See How timestamp assignment works.
  • Your event breaking regex.
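For example, if each event in your data begins with a literal marker such as "[EVENT]" (a hypothetical format used here for illustration), you could break on that marker instead of relying on date recognition:

```ini
# props.conf -- hypothetical data format where each event starts
# with the literal text "[EVENT]"
[my_marker_log]
# Start a new event before each occurrence of the marker.
BREAK_ONLY_BEFORE = \[EVENT\]
```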


Time stamping issues

Time stamping issues can pertain to the DATETIME_CONFIG, TIME_PREFIX, TIME_FORMAT, MAX_TIMESTAMP_LOOKAHEAD, or TZ settings in props.conf. See props.conf.spec in the Admin Manual.

See How timestamp assignment works.

Problem

Indicators that you have time stamp parsing issues include:

  • Timestamp parsing issues present in the Monitoring Console Data Quality dashboard.
  • An error in the Splunk Web Data Input workflow.
  • A lower event count than you expect, combined with very large events, especially if your events should be single-line events.
  • Less acute problems, such as a time zone that is not properly assigned.
  • The value of _time assigned by Splunk software does not match the time in the raw data.

Diagnosis

To confirm that you have a time stamping issue, do one or more of the following:

  • Visit the Monitoring Console Data Quality dashboard. Check for timestamp parsing issues in the table. Time stamp assignment resorts to various fallbacks, as described in How timestamp assignment works. For most of the fallbacks, even if one of them successfully assigns a time stamp, you still get an issue in the Monitoring Console dashboard.
  • Search for events and find that multiple events have been combined into one.
  • Look in splunkd.log for messages like:
12-09-2016 00:45:29.956 -0800 WARN DateParserVerbose - Failed to parse timestamp. Defaulting to timestamp of previous event (Fri Dec 9 00:45:27 2016). Context: source::/disk2/sh-demo/splunk/var/log/splunk/entity.log|host::svdev-sh-demo|entity-too_small|682235
12-08-2016 12:33:56.025 -0500 WARN  AggregatorMiningProcessor - Too many events (100K) with the same timestamp: incrementing timestamps 1 second(s) into the future to insure retrievability

When all events are indexed with the same time stamp, searching over that time range becomes ineffective.

Solution

To resolve a time stamping issue:

  • Make sure that each event has a complete time stamp, including a year, full date, full time, and a time zone.
  • See Configure time stamp recognition for additional possible resolution steps.
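If you know the time stamp format of your data, explicitly configuring it in props.conf is often the most reliable fix. The following sketch assumes hypothetical events that contain a time stamp such as `ts=2016-12-09 00:45:29.956` with no time zone in the data; the sourcetype name and all values are illustrative:

```ini
# props.conf -- hypothetical sourcetype and time stamp layout
[my_timestamped_log]
# The time stamp immediately follows the literal prefix "ts=".
TIME_PREFIX = ts=
# strptime-style format of the time stamp itself.
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
# Look no more than 30 characters past TIME_PREFIX for the time stamp.
MAX_TIMESTAMP_LOOKAHEAD = 30
# Assign UTC when the events themselves carry no time zone.
TZ = UTC
```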
This documentation applies to the following versions of Splunk® Enterprise: 6.5.1612 (Splunk Cloud only), 6.6.0, 6.6.1, 6.6.2

