How Splunk Cloud handles log file rotation
Splunk Cloud recognizes when a file that it is monitoring (such as /var/log/messages) has been rolled by the operating system (/var/log/messages1) and will not read the rolled file a second time.
The monitoring processor picks up new files and reads the first 256 bytes of the file. The processor then hashes this data into a begin and end cyclic redundancy check (CRC), which functions as a fingerprint representing the file content. Splunk Cloud uses this CRC to look up an entry in a database that contains all the beginning CRCs of files it has seen before. If successful, the lookup returns a few values, but the important ones are a seekAddress, meaning the number of bytes into the known file that Splunk Cloud has already read, and a seekCRC which is a fingerprint of the data at that location.
Using the results of this lookup, Splunk Cloud can categorize the file.
There are three possible outcomes of a CRC check:
- No matching record for the CRC from the file beginning in the database. This indicates a new file. Splunk Cloud picks it up and consumes its data from the start of the file. Splunk Cloud updates the database with the new CRCs and Seek Addresses as it consumes the file.
- A matching record for the CRC from the file beginning in the database, the content at the Seek Address location matches the stored CRC for that location in the file, and the size of the file is larger than the Seek Address that Splunk Cloud stored. While Splunk Cloud has seen the file before, data has been added since it was last read. Splunk Cloud opens the file, seeks to the Seek Address (the end of the file when Splunk Cloud last finished with it), and starts reading the new data from that point.
- A matching record for the CRC from the file beginning in the database, but the content at the Seek Address location does not match the stored CRC at that location in the file. Splunk Cloud has read some file with the same initial data, but either some of the material that it read has been modified in place, or it is in fact a wholly different file which begins with the same content. Because the database for content tracking is keyed to the beginning CRC, it has no way to track progress independently for the two different data streams, and further configuration is required.
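The three outcomes above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not Splunk Cloud's implementation: the names `Record`, `classify`, and `mark_read`, the use of `zlib.crc32`, and the 256-byte window hashed just before the seek address are all assumptions made for the example. The sketch also returns a fourth label, "unchanged", for a known file with no new data, a case the list above does not spell out.

```python
import zlib
from dataclasses import dataclass
from typing import Dict

INIT_CRC_LENGTH = 256  # bytes fingerprinted at the start of the file (default per the text)
SEEK_CRC_WINDOW = 256  # assumption: bytes hashed just before the seek address


@dataclass
class Record:
    seek_address: int  # number of bytes of the file already read
    seek_crc: int      # fingerprint of the data at that location


def _window(content: bytes, offset: int) -> bytes:
    """Return the bytes immediately preceding offset (hypothetical helper)."""
    return content[max(0, offset - SEEK_CRC_WINDOW):offset]


def classify(content: bytes, db: Dict[int, Record]) -> str:
    """Categorize a file as 'new', 'appended', 'conflict', or 'unchanged'."""
    record = db.get(zlib.crc32(content[:INIT_CRC_LENGTH]))
    if record is None:
        return "new"  # no matching beginning CRC: read from the start
    too_short = len(content) < record.seek_address
    if too_short or zlib.crc32(_window(content, record.seek_address)) != record.seek_crc:
        return "conflict"  # same beginning, different data at the seek address
    if len(content) > record.seek_address:
        return "appended"  # seek to seek_address and read the new data
    return "unchanged"  # assumption: known file with nothing new to read


def mark_read(content: bytes, db: Dict[int, Record]) -> None:
    """Record that the whole file has been consumed."""
    end = len(content)
    db[zlib.crc32(content[:INIT_CRC_LENGTH])] = Record(end, zlib.crc32(_window(content, end)))
```

Because the database in this sketch is keyed only on the beginning CRC, two different files that share their first 256 bytes map to the same record, which is the situation the third outcome describes.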
Because the CRC start check runs against only the first 256 bytes of the file by default, it is possible for non-duplicate files to have duplicate start CRCs, particularly if the files are ones with identical headers. To handle such situations you can:
- Use the initCrcLength attribute in inputs.conf to increase the number of characters used for the CRC calculation, and make it longer than your static header.
- Use the crcSalt attribute when configuring the file in inputs.conf, as described in "Monitor files and directories with inputs.conf" in this manual. The crcSalt attribute, when set to <SOURCE>, ensures that each file has a unique CRC. The effect of this setting is that Splunk Cloud assumes that each path name contains unique content.
Do not use crcSalt = <SOURCE> with rolling log files, or any other scenario in which log files get renamed or moved to another monitored location. Doing so prevents Splunk Cloud from recognizing log files across the roll or rename, which results in the data being reindexed.
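The following Python sketch illustrates why identical headers collide, how a longer fingerprint or a per-path salt separates the files, and why a path-based salt breaks recognition of a rolled file. The function `begin_crc`, the use of `zlib.crc32`, and the way the salt is prefixed to the fingerprinted bytes are assumptions made for illustration; the text does not describe how Splunk Cloud mixes the salt into the CRC.

```python
import zlib


def begin_crc(content: bytes, init_length: int = 256, salt: str = "") -> int:
    """Fingerprint the first init_length bytes, optionally mixed with a salt.

    Prefixing the salt is a hypothetical choice made for this sketch.
    """
    return zlib.crc32(salt.encode("utf-8") + content[:init_length])


# Two different files that share a static header longer than 256 bytes.
header = b"#" * 300 + b"\n"
file_a = header + b"event from host A\n"
file_b = header + b"event from host B\n"

# Default length: both fingerprints cover only header bytes, so they match.
default_collides = begin_crc(file_a) == begin_crc(file_b)

# A length longer than the header reaches the differing body.
longer_separates = begin_crc(file_a, init_length=1024) != begin_crc(file_b, init_length=1024)

# Salting with the path gives each path its own fingerprint.
salt_separates = (
    begin_crc(file_a, salt="/var/log/a.log") != begin_crc(file_b, salt="/var/log/b.log")
)

# A rolled file keeps its content but changes its path.
rolled = b"Jan  1 00:00:01 host sshd[1]: session opened\n" * 10

# Without a salt the fingerprint depends on content only, so the roll is recognized.
unsalted_roll_recognized = begin_crc(rolled) == begin_crc(rolled)

# With a path salt the renamed file looks new and would be read again.
salted_roll_recognized = (
    begin_crc(rolled, salt="/var/log/messages")
    == begin_crc(rolled, salt="/var/log/messages1")
)
```

In this sketch the longer fingerprint keeps content-based tracking intact, while the path salt trades it away, which is why the salt is a poor fit for files that get renamed.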
This documentation applies to the following versions of Splunk Cloud™: 7.0.13, 7.2.10, 8.0.2006, 8.0.2007, 8.1.2008, 8.1.2009, 8.1.2011, 8.1.2012, 8.1.2101