Admin Manual

 


How log file rotation is handled

This documentation does not apply to the most recent version of Splunk.

Splunk recognizes when a file that it is monitoring (such as /var/log/messages) has been rolled (for example, to /var/log/messages1) and does not read the rolled file a second time.

Note: Splunk does not recognize compressed files produced by logrotate (such as bz2 or gz) as the same files as their uncompressed originals. If Splunk then monitors these compressed files, data can be duplicated. To avoid this, either configure logrotate to move the compressed files into a directory that Splunk is not configured to read, or set explicit blacklist rules for archive filetypes in your input definitions so that Splunk does not read these files as new log files.

Example:
 
inputs.conf:
[<your_stanza>]
_blacklist = \.(gz|bz2|z|zip)$ 

Splunk recognizes the following archive filetypes: tar, gz, bz2, tar.gz, tgz, tbz, tbz2, zip, and z.

For more information on setting blacklist rules, see "Whitelist and blacklist specific incoming data" in this manual.
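
To see which filenames the example blacklist pattern would exclude, the short sketch below applies the same regular expression to a few hypothetical paths. It uses Python's re module and assumes that, for a pattern this simple, Python's matching behaves the same way as Splunk's; the paths are invented for illustration.

```python
import re

# The pattern from the inputs.conf example above.
BLACKLIST = re.compile(r"\.(gz|bz2|z|zip)$")


def is_blacklisted(path):
    """Return True if the path ends in one of the blacklisted extensions."""
    return BLACKLIST.search(path) is not None


# Hypothetical paths, for illustration only.
for path in ["/var/log/messages",
             "/var/log/messages.1.gz",
             "/var/log/messages.2.bz2",
             "/var/log/archive.zip",
             "/var/log/old.tgz"]:
    print(path, is_blacklisted(path))
```

Under these assumptions the pattern excludes the gz, bz2, and zip paths but not the live log or the .tgz file, because the pattern requires a literal dot directly before the extension. If you also roll files into tgz, tbz, or tar archives, you would need to extend the pattern accordingly.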

How log rotation works

The monitoring processor picks up new files and reads the first and last 256 bytes of each file. This data is hashed into a begin and an end cyclic redundancy check (CRC). Splunk checks the new CRCs against a database that contains the CRCs of all files Splunk has seen before. The database also stores the location where Splunk last read in each file (the seek pointer, or seekPtr).
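
As a rough illustration of the begin and end CRC idea, the following sketch hashes the first and last 256 bytes of a file. It is not Splunk's implementation: the function name begin_end_crc is hypothetical, and the use of zlib.crc32 is an assumption, since this topic does not say which CRC algorithm Splunk uses.

```python
import zlib

CHUNK = 256  # bytes hashed at each end of the file, per the text above


def begin_end_crc(path, chunk=CHUNK):
    """Return (begin_crc, end_crc, size) for the file at `path`.

    zlib.crc32 stands in for whatever hash Splunk actually uses (assumption).
    """
    with open(path, "rb") as f:
        head = f.read(chunk)              # first `chunk` bytes
        f.seek(0, 2)                      # move to end of file to learn its size
        size = f.tell()
        f.seek(max(size - chunk, 0))      # start of the last `chunk` bytes
        tail = f.read(chunk)
    return zlib.crc32(head), zlib.crc32(tail), size
```

For a file shorter than 256 bytes, this sketch hashes the whole file for both values; how Splunk treats such short files is not covered here.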

There are three possible outcomes of a CRC check:

1. The database contains no begin and end CRC matching this file. The file is new, so Splunk picks it up and consumes it from the start. Splunk updates the database with the new CRCs and seekPtrs as the file is being consumed.

2. The begin CRC and the end CRC are both present, but the file is larger than the seekPtr that Splunk stored. This means that Splunk has seen the file before, and data has been added to it since it was last read. Splunk opens the file, seeks to the previous end of the file, and starts reading from there, so it picks up only the new data and nothing it has read before.

3. The begin CRC is present but the end CRC does not match. This means the file has changed since Splunk last read it, and some of the portions it has already read are different. Because the previously read data has changed, Splunk has no choice but to read the whole file again.
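
The three outcomes above can be sketched as a small decision function. This is only an illustration under stated assumptions: the Record type, the check_file function, and the dictionary keyed by begin CRC are hypothetical names, not Splunk internals. The final "nothing_new" branch (CRCs match and the file has not grown) is not described in this topic and is added here only so the function always returns a value.

```python
from typing import Dict, NamedTuple, Tuple


class Record(NamedTuple):
    end_crc: int    # end CRC stored when the file was last read
    seek_ptr: int   # byte offset where reading stopped (seekPtr)


def check_file(db: Dict[int, Record], begin_crc: int,
               end_crc: int, size: int) -> Tuple[str, int]:
    """Return (action, offset to start reading from) for a candidate file."""
    rec = db.get(begin_crc)
    if rec is None:
        return ("read_from_start", 0)        # outcome 1: new file
    if rec.end_crc != end_crc:
        return ("reread_whole_file", 0)      # outcome 3: earlier data changed
    if size > rec.seek_ptr:
        return ("resume", rec.seek_ptr)      # outcome 2: data was appended
    return ("nothing_new", rec.seek_ptr)     # assumption: not covered by the text


# Hypothetical database with one previously seen file.
db = {1: Record(end_crc=2, seek_ptr=100)}
print(check_file(db, begin_crc=1, end_crc=2, size=150))
```

In this example the begin and end CRCs match the stored record and the file has grown from 100 to 150 bytes, so the sketch returns the "resume" action with offset 100.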

This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10, 4.0.11

