Splunk Cloud

Getting Data In

How Splunk Cloud handles log file rotation

Splunk Cloud recognizes when a file that it is monitoring (such as /var/log/messages) has been rolled by the operating system (for example, to /var/log/messages1), and does not read the rolled file a second time.

The monitoring processor picks up new files and reads the first 256 bytes of the file. The processor then hashes this data into a begin and end cyclic redundancy check (CRC), which functions as a fingerprint representing the file content. Splunk Cloud uses this CRC to look up an entry in a database that contains all the beginning CRCs of files it has seen before. If successful, the lookup returns a few values, but the important ones are a seekAddress, meaning the number of bytes into the known file that Splunk Cloud has already read, and a seekCRC which is a fingerprint of the data at that location.
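The fingerprint-and-lookup step can be sketched as follows. This is a minimal illustration, not Splunk Cloud's implementation: zlib.crc32 stands in for whatever hash the product uses, the dictionary named seen is a hypothetical stand-in for the tracking database, and taking the seekCRC over the 256 bytes that precede the seekAddress is an assumption.

```python
import zlib

INIT_CRC_LENGTH = 256  # bytes hashed from the start of the file (default per the text)

def initial_crc(data: bytes, length: int = INIT_CRC_LENGTH) -> int:
    """Fingerprint the first `length` bytes of a file's content."""
    return zlib.crc32(data[:length])

# Hypothetical in-memory stand-in for the tracking database:
# initial CRC -> (seek_address, seek_crc)
seen = {}

content = b"Jan  1 00:00:01 host sshd[1]: session opened\n" * 20

key = initial_crc(content)
record = seen.get(key)  # None here: no file with this beginning has been seen

# After consuming the file, remember how far we read and what the data
# just before that position looked like (assumed 256-byte window).
seen[key] = (len(content), zlib.crc32(content[-INIT_CRC_LENGTH:]))
```

Only the first 256 bytes contribute to the lookup key, so two files that differ only after that point produce the same key.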

Using the results of this lookup, Splunk Cloud can categorize the file.

There are three possible outcomes of a CRC check:

  • No matching record for the CRC from the file beginning in the database. This indicates a new file. Splunk Cloud picks it up and consumes its data from the start of the file. Splunk Cloud updates the database with the new CRCs and Seek Addresses as it consumes the file.
  • A matching record for the CRC from the file beginning in the database, the content at the Seek Address location matches the stored CRC for that location in the file, and the size of the file is larger than the Seek Address that Splunk Cloud stored. While Splunk Cloud has seen the file before, data has been added since it was last read. Splunk Cloud opens the file, seeks to the Seek Address (the end of the file when Splunk Cloud last finished with it), and starts reading the new data from that point.
  • A matching record for the CRC from the file beginning in the database, but the content at the Seek Address location does not match the stored CRC at that location in the file. Splunk Cloud has read some file with the same initial data, but either some of the material that it read has been modified in place, or it is in fact a wholly different file which begins with the same content. Because the database for content tracking is keyed to the beginning CRC, it has no way to track progress independently for the two different data streams, and further configuration is required.
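The three outcomes above can be sketched as a small classification function. This is an illustration under stated assumptions, not the product's code: zlib.crc32 stands in for the real hash, the names classify and remember are hypothetical, the database is a plain dictionary, and the CRC at the Seek Address is assumed to cover the 256 bytes that precede it. The "unchanged" result is not one of the three outcomes in the text; it is included only so the function handles a file that has not grown.

```python
import zlib
from typing import Dict, Tuple

CRC_WINDOW = 256  # assumed window size for both fingerprints

def crc(data: bytes) -> int:
    return zlib.crc32(data)

def seek_window(content: bytes, seek_address: int) -> bytes:
    """Bytes just before seek_address (assumed basis for the seek CRC)."""
    return content[max(0, seek_address - CRC_WINDOW):seek_address]

def remember(content: bytes, db: Dict[int, Tuple[int, int]]) -> None:
    """Record that the whole of `content` has been consumed."""
    seek_address = len(content)
    db[crc(content[:CRC_WINDOW])] = (seek_address, crc(seek_window(content, seek_address)))

def classify(content: bytes, db: Dict[int, Tuple[int, int]]) -> str:
    record = db.get(crc(content[:CRC_WINDOW]))
    if record is None:
        return "new"        # outcome 1: read from the start
    seek_address, seek_crc = record
    if len(content) >= seek_address and crc(seek_window(content, seek_address)) == seek_crc:
        # outcome 2 when the file has grown; otherwise nothing new to read
        return "appended" if len(content) > seek_address else "unchanged"
    return "mismatch"       # outcome 3: same beginning, different data at the seek point

db: Dict[int, Tuple[int, int]] = {}
first = b"A" * 300 + b"line1\n"
print(classify(first, db))                    # no record yet
remember(first, db)
print(classify(first + b"line2\n", db))       # same file with data appended
print(classify(first[:300] + b"Line1\n" + b"line2\n", db))  # modified in place
```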

Because the CRC start check runs against only the first 256 bytes of the file by default, non-duplicate files can have duplicate start CRCs, particularly when the files begin with identical headers. To handle such situations you can:

  • Use the initCrcLength attribute in inputs.conf to increase the number of characters used for the CRC calculation, and make it longer than your static header.
  • Use the crcSalt attribute when configuring the file in inputs.conf, as described in "Monitor files and directories with inputs.conf" in this manual. The crcSalt attribute, when set to <SOURCE>, ensures that each file has a unique CRC. The effect of this setting is that Splunk Cloud assumes that each path name contains unique content.
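The effect of both options can be simulated with a short sketch. It assumes zlib.crc32 as a stand-in hash and assumes the salt is simply prepended to the hashed bytes; how Splunk Cloud actually combines the salt with the file data is not stated in this topic. The file contents and paths are hypothetical.

```python
import zlib

def initial_crc(content: bytes, init_crc_length: int = 256, salt: bytes = b"") -> int:
    """CRC over an optional salt plus the first init_crc_length bytes."""
    return zlib.crc32(salt + content[:init_crc_length])

# Two different files that share a 301-byte static header.
header = b"#" * 300 + b"\n"
file_a = header + b"event from app A\n"
file_b = header + b"event from app B\n"

# Default length: only header bytes are hashed, so the fingerprints collide.
collide = initial_crc(file_a) == initial_crc(file_b)

# Longer initial length (analogous to raising initCrcLength past the header).
distinct_with_length = initial_crc(file_a, 512) != initial_crc(file_b, 512)

# Path used as salt (analogous to crcSalt = <SOURCE>).
distinct_with_salt = (
    initial_crc(file_a, salt=b"/var/log/a.log")
    != initial_crc(file_b, salt=b"/var/log/b.log")
)

print(collide, distinct_with_length, distinct_with_salt)
```

In this sketch the longer length works because the hashed range now extends beyond the shared header, while the salt works even though the hashed file bytes are identical.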

Do not use crcSalt = <SOURCE> with rolling log files, or in any other scenario in which log files get renamed or moved to another monitored location. Doing so prevents Splunk Cloud from recognizing log files across the roll or rename, which results in the data being reindexed.
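A short sketch shows why a path-based salt defeats rotation tracking. As before, zlib.crc32 and the prepended salt are assumptions made for illustration, and the paths /var/log/app.0 and /var/log/app.1 are hypothetical names for a file before and after it is rolled.

```python
import zlib

def plain_key(content: bytes, length: int = 256) -> int:
    return zlib.crc32(content[:length])

def salted_key(path: str, content: bytes, length: int = 256) -> int:
    # Assumption: the salt is the path, prepended before hashing.
    return zlib.crc32(path.encode("utf-8") + content[:length])

content = b"Jan  1 00:00:01 host app: started\n" * 10

# Fingerprints recorded while the file was at its original path.
seen_plain = {plain_key(content)}
seen_salted = {salted_key("/var/log/app.0", content)}

# The file is rolled: same bytes, new name.
rolled_recognized_plain = plain_key(content) in seen_plain
rolled_recognized_salted = salted_key("/var/log/app.1", content) in seen_salted

print(rolled_recognized_plain, rolled_recognized_salted)
```

Without the salt the rolled file maps to the existing record; with the salt it looks like a file that has never been seen, so it would be read again from the start.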

Last modified on 18 June, 2020

This documentation applies to the following versions of Splunk Cloud: 7.0.13, 7.2.10, 8.0.2006, 8.0.2007, 8.1.2008, 8.1.2009, 8.1.2011, 8.1.2012, 8.1.2101
