Splunk® Enterprise

Getting Data In

NOTE - Splunk version 4.x reached its End of Life on October 1, 2013.
This documentation does not apply to the most recent version of Splunk.

How Splunk handles log file rotation

Splunk recognizes when a file that it is monitoring (such as /var/log/messages) has been rolled (for example, to /var/log/messages1) and does not read the rolled file a second time.

How Splunk recognizes log rotation

The monitoring processor picks up new files and reads the first 256 bytes of each file. This data is hashed into a begin and end cyclic redundancy check (CRC), which functions as a fingerprint of the file content. Splunk uses the beginning CRC to look up an entry in a database that contains the beginning CRCs of all files Splunk has seen before. If the lookup succeeds, it returns several values. The important ones are the seekAddress (referred to below as the Seek Address), which is the number of bytes into the known file that Splunk has already read, and the seekCRC, which is a fingerprint of the data at that location.
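The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: zlib.crc32 stands in for whatever CRC Splunk actually computes, the in-memory dictionary tracking_db is a hypothetical stand-in for Splunk's database, and the choice to fingerprint the 256 bytes ending at the Seek Address is a guess, since the text says only that the seekCRC covers "the data at that location". File contents are passed as byte strings to keep the sketch self-contained.

```python
import zlib

WINDOW = 256  # bytes hashed per fingerprint; matches the default described above


def fingerprint(chunk: bytes) -> int:
    # zlib.crc32 is a stand-in; the text does not say which CRC variant is used.
    return zlib.crc32(chunk)


# Hypothetical tracking database: beginning CRC -> (seek_address, seek_crc)
tracking_db = {}


def remember(content: bytes) -> None:
    """Record that `content` has been read through to its current end."""
    seek_address = len(content)
    # Assumption: the seek CRC covers the last WINDOW bytes before seek_address.
    tail = content[max(0, seek_address - WINDOW):seek_address]
    tracking_db[fingerprint(content[:WINDOW])] = (seek_address, fingerprint(tail))


def lookup(content: bytes):
    """Return (seek_address, seek_crc) for a known file beginning, else None."""
    return tracking_db.get(fingerprint(content[:WINDOW]))


seen = b"a" * 300
remember(seen)
print(lookup(seen + b"new data"))  # known beginning: (300, <seek CRC>)
print(lookup(b"b" + b"a" * 299))   # None: this beginning has not been seen
```

Note that the key is derived from the beginning of the file only, which is why two files that start with the same 256 bytes share one database entry.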

Using the results of this lookup, Splunk attempts to categorize the file.

There are three possible outcomes of a CRC check:

1. There is no matching record for the CRC from the file beginning in the database. This indicates a new file. Splunk will pick it up and consume its data from the start of the file. Splunk updates the database with the new CRCs and Seek Addresses as the file is being consumed.

2. There is a matching record for the CRC from the file beginning in the database, the content at the Seek Address location matches the stored CRC for that location in the file, and the size of the file is larger than the Seek Address that Splunk stored. This means that while Splunk has seen the file before, data has been added to it since it was last read. Splunk opens the file, seeks to the Seek Address (the end of the file when Splunk last finished with it), and starts reading from there. In this way, Splunk reads only the new data and not anything it has read before.

3. There is a matching record for the CRC from the file beginning in the database, but the content at the Seek Address location does not match the stored CRC at that location in the file. This means that Splunk has previously read some file with the same initial data, but either some of the material that it read has since been modified in place, or it is in fact a wholly different file which simply begins with the same content. Since Splunk's database for content tracking is keyed to the beginning CRC, it has no way to track progress independently for the two different data streams, and further configuration is required.
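The three outcomes can be expressed as a small decision function. The following Python sketch is hedged in the same way as the description allows: zlib.crc32 is a stand-in for Splunk's CRC, the dictionary db and the helper make_record are hypothetical, and the seek CRC is assumed to cover the 256 bytes ending at the Seek Address. The text does not list the case where both CRCs match and the file has not grown, so the "unchanged" result is an assumption added only so that the function returns something in every case.

```python
import zlib

WINDOW = 256  # bytes hashed per fingerprint (the default described above)


def crc(chunk: bytes) -> int:
    # Stand-in for Splunk's CRC; the exact algorithm is not given in the text.
    return zlib.crc32(chunk)


def make_record(content: bytes):
    """Build a hypothetical database entry for a file read to its end."""
    seek_address = len(content)
    tail = content[max(0, seek_address - WINDOW):seek_address]
    return crc(content[:WINDOW]), (seek_address, crc(tail))


def classify(content: bytes, db: dict) -> str:
    record = db.get(crc(content[:WINDOW]))
    if record is None:
        return "new"        # outcome 1: read from the start of the file
    seek_address, seek_crc = record
    tail = content[max(0, seek_address - WINDOW):seek_address]
    if len(content) < seek_address or crc(tail) != seek_crc:
        return "mismatch"   # outcome 3: same beginning, different data
    if len(content) > seek_address:
        return "appended"   # outcome 2: resume reading at seek_address
    return "unchanged"      # assumption: nothing new to read


header = b"h" * 256
old = header + b"line A\n"
key, value = make_record(old)
db = {key: value}

print(classify(b"H" + b"h" * 255, db))        # new
print(classify(old + b"line C\n", db))        # appended
print(classify(header + b"line B\nmore", db)) # mismatch
print(classify(old, db))                      # unchanged
```

Because db is keyed only by the beginning CRC, the "mismatch" branch cannot tell a file modified in place from a different file with the same beginning, which is the limitation the third outcome describes.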

Important: Because the CRC start check runs against only the first 256 bytes of the file by default, non-duplicate files can have duplicate start CRCs, particularly if the files have identical headers. To handle such situations, you can:

  • Use the initCrcLength attribute to increase the number of characters used for the CRC calculation, and make it longer than your static header.
  • Use the crcSalt attribute when configuring the file in inputs.conf, as described in "Edit inputs.conf" in this manual. The crcSalt attribute ensures that each file has a unique CRC. The effect of this setting is that each pathname is assumed to contain unique content. Do not use this attribute with rolling log files, or in any other scenario in which log files are renamed or moved to another monitored location, because it defeats Splunk's ability to recognize rolled logs and causes Splunk to re-index the data.
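The effect of the two options can be illustrated with a short Python sketch. It is a model of the idea, not of Splunk's implementation: zlib.crc32 is a stand-in CRC, the function start_crc and its parameters are hypothetical names, and prepending the salt to the hashed bytes is an assumption about how a salt might be mixed in. The file paths are made up for the example.

```python
import zlib


def start_crc(content: bytes, init_crc_length: int = 256, salt: str = "") -> int:
    # Assumption: the salt is simply prepended to the bytes being hashed.
    return zlib.crc32(salt.encode() + content[:init_crc_length])


# Two different files that share a static header longer than 256 bytes.
header = b"#" * 300
file_a = header + b"event A\n"
file_b = header + b"event B\n"

# Default length: only header bytes are hashed, so the start CRCs collide.
collide_by_default = start_crc(file_a) == start_crc(file_b)

# A length that reaches past the header tells the two files apart.
differ_with_longer_length = start_crc(file_a, 512) != start_crc(file_b, 512)

# Salting with the pathname makes the same content look new after a rename,
# which is why a salt is a poor fit for rolling logs.
renamed_looks_new = (
    start_crc(file_a, salt="/var/log/app.log.1")
    != start_crc(file_a, salt="/var/log/app.log.2")
)

print(collide_by_default, differ_with_longer_length, renamed_looks_new)
```

The last comparison shows the trade-off described in the second bullet: a path-based salt separates files with identical beginnings, but a rolled file under a new name is treated as unseen content.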

This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7, 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17


Comments

To me the explanation seems incorrect:
 - case 2: matching begin CRC and end CRC, but wrong seekPtr
 this is not really an existing file, must be reread as explained under (3)
 - case 3: begin CRC is present, but the end CRC does not match (and seekPtr does not match, too, I would add)
 this is a recently rotated log file; should be re-read starting from seekPtr, as explained in (2)
 - case 4 is missing: matching begin CRC and end CRC, and same seekPtr
 this is an existing file, do nothing

Icssupport
November 30, 2012

About the length of the CRC, the default is 256 chars, but since 5.0 you can increase it with initCrcLength.
See the inputs.conf specifications.

Ykherian, Splunker
November 26, 2012

It might be preferable to allow CRC to be calculated over a larger portion of the file (e.g. the first 1024 bytes) rather than using the crcSalt. Using crcSalt= still might allow the same file with a different name to be read in twice.

Supersleepwalker
April 13, 2012
