Getting Data In

 


Monitor files and directories

NOTE - Splunk version 4.x reached its End of Life on October 1, 2013. Please see the migration information.

This documentation does not apply to the most recent version of Splunk.

Splunk has two file input processors: monitor and upload. For the most part, you can use monitor to add all your data sources from files and directories. However, you might want to use upload to add one-time inputs, such as an archive of historical data.

You can add inputs to monitor or upload using Splunk Web, the CLI, or the inputs.conf configuration file.

How monitor works in Splunk

Specify a path to a file or directory and Splunk's monitor processor consumes any new data written to that file or directory. This is how you can monitor live application logs such as those coming from J2EE or .NET applications, Web access logs, and so on. Splunk continues to monitor and index the file or directory as new data appears. You can also specify a mounted or shared directory, including network file systems, so long as Splunk can read from the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files.
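The idea of consuming only new data from a file can be sketched as follows. This is a minimal Python illustration, not Splunk's implementation: the function name read_new_data and the byte-offset checkpoint are assumptions made for the example.

```python
import os
import tempfile


def read_new_data(path, offset=0):
    """Return (new_bytes, new_offset) for data appended to `path` since `offset`.

    Hypothetical helper: the byte-offset checkpoint is an assumption for
    illustration, not a description of how Splunk tracks files.
    """
    size = os.path.getsize(path)
    if size < offset:
        # The file is shorter than the checkpoint, so treat it as
        # truncated or replaced and read it again from the beginning.
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "app.log")
        with open(log, "wb") as f:
            f.write(b"first event\n")
        data, offset = read_new_data(log)

        # Simulate an application appending to its log file.
        with open(log, "ab") as f:
            f.write(b"second event\n")
        data, offset = read_new_data(log, offset)
        print(data)  # only the appended data is returned
```

Keeping the offset between calls is also what lets a reader of this kind pick up where it left off after a restart, provided the offset is stored somewhere persistent.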

Splunk checks for the file or directory specified in a monitor configuration on Splunk start and restart. If the file or directory is not present on start, Splunk checks for it again at 24-hour intervals from the time of the last restart. Subdirectories of monitored directories are scanned continuously. To add new inputs without restarting Splunk, use Splunk Web or the CLI. If you want Splunk to find potential new inputs automatically, use crawl.

When using monitor, note the following:

  • On most file systems, files can be read even as they are being written to. However, Windows file systems can prevent files from being read while they are being written. Some Windows programs use these modes, though most do not.
  • Files or directories can be included or excluded via whitelists and blacklists.
  • Upon restart, Splunk continues processing files where it left off.
  • Splunk decompresses archive files before it indexes them. It can handle these common archive file types: .tar, .gz, .bz2, .tar.bz2, and .zip.
  • Splunk detects log file rotation and does not process renamed files it has already indexed (with the exception of .tar and .gz archives; for more information see "Log file rotation" in this manual).
  • The entire dir/filename path must not exceed 1024 characters.
  • Removing an input does not stop the input's files from being indexed. Rather, it stops Splunk from checking the files again, but all the initial content is still indexed. To stop all in-process data, you must restart the Splunk server.
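The whitelist, blacklist, and path length points above can be sketched as a single filter. This Python example is illustrative only: the function name should_monitor is hypothetical, and two behaviors are assumptions not stated in this topic, namely that the lists are regular expressions matched against the full path and that a blacklist match wins over a whitelist match.

```python
import re

# Limit on the full directory/filename path, as stated in the list above.
MAX_PATH_LENGTH = 1024


def should_monitor(path, whitelist=None, blacklist=None):
    """Decide whether a candidate file would be picked up.

    Hypothetical helper. Assumes whitelist and blacklist are regular
    expressions applied to the whole path, and that the blacklist takes
    precedence when both match.
    """
    if len(path) > MAX_PATH_LENGTH:
        return False
    if blacklist is not None and re.search(blacklist, path):
        return False
    if whitelist is not None and not re.search(whitelist, path):
        return False
    return True


if __name__ == "__main__":
    candidates = [
        "/var/log/app/access.log",
        "/var/log/app/access.log.gz",
        "/var/log/app/debug.log",
    ]
    for candidate in candidates:
        keep = should_monitor(candidate, whitelist=r"\.log$", blacklist=r"debug")
        print(candidate, keep)
```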

Note: You cannot currently use both monitor and file system change monitor to follow the same directory or file. If you want to see changes in a directory, use file system change monitor. If you want to index new events in a directory, use monitor.

Note: Monitor inputs should not overlap. That is, monitoring /a/path while also monitoring /a/path/subdir will produce unreliable results. Similarly, monitor inputs that watch the same directory with different whitelists, blacklists, and wildcard components are not supported.
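One way to catch overlapping inputs before configuring them is to check whether any monitored path is an ancestor of another. The following Python sketch assumes POSIX-style paths. The function name find_overlaps is hypothetical and is not part of Splunk.

```python
from itertools import combinations
from pathlib import PurePosixPath


def find_overlaps(monitor_paths):
    """Return pairs of paths where one equals or contains the other.

    Hypothetical pre-flight check. It compares path components, so
    /a/path and /a/pathology are not reported as overlapping.
    """
    paths = [PurePosixPath(p) for p in monitor_paths]
    overlaps = []
    for a, b in combinations(paths, 2):
        if a == b or a in b.parents or b in a.parents:
            overlaps.append((str(a), str(b)))
    return overlaps


if __name__ == "__main__":
    planned = ["/a/path", "/a/path/subdir", "/var/log"]
    for first, second in find_overlaps(planned):
        print("overlap:", first, "and", second)
```

This check covers only nested or identical paths. It does not detect two inputs that watch the same directory with different whitelists, blacklists, or wildcards.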

Why use upload or batch

To index a static file once, select "Upload a local file" or "Index a file on the Splunk server" in Splunk Web. The file will not be monitored on an ongoing basis.

You can also use the CLI add oneshot or spool commands for the same purpose. See "Use the CLI" for details.

Use the batch input type in inputs.conf to load files once and destructively. By default, Splunk's batch processor is located in $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and then deletes it.
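Because the batch directory is destructive, a script that feeds it should move files it no longer needs, or a copy of them. The Python sketch below shows one way to do that. The function name spool_file and the splunk_home argument are assumptions for the example. The only detail taken from this topic is the default var/spool/splunk location under $SPLUNK_HOME.

```python
import os
import shutil


def spool_file(source, splunk_home):
    """Move `source` into the default batch directory under `splunk_home`.

    Hypothetical helper. Splunk indexes and then deletes files placed in
    this directory, so pass a copy if the original must be kept.
    """
    spool_dir = os.path.join(splunk_home, "var", "spool", "splunk")
    if not os.path.isdir(spool_dir):
        raise FileNotFoundError(spool_dir)
    destination = os.path.join(spool_dir, os.path.basename(source))
    shutil.move(source, destination)
    return destination


if __name__ == "__main__":
    # SPLUNK_HOME is read from the environment; the fallback value and
    # the archive path below are placeholders for illustration only.
    home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    archive = "/tmp/historical.log"
    if os.path.isfile(archive):
        print(spool_file(archive, home))
```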

Note: For best practices on loading file archives, see "How to index different sized archives" on the Community Wiki.

This documentation applies to the following versions of Splunk: 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5.


Comments

When using ../var/spool/splunk and setting "segment in path" and the "segment number" of the host, the directory is not deleted, and neither is the file within.

Pstein
August 10, 2011
