Monitor files and directories
This documentation does not apply to the most recent version of Splunk.
Splunk has two file input processors: monitor and upload. For the most part, you can use monitor to add all your data sources from files and directories. Use upload instead when you want to add a one-time input, such as an archive of historical data.
This topic discusses how to add monitor and upload inputs using Splunk Web and the configuration files. You can also add, edit, and list monitor inputs using the CLI.
How monitor works in Splunk
Specify a path to a file or directory and Splunk's monitor processor consumes any new input. This is how you'd monitor live application logs such as those coming from J2EE or .NET applications, Web access logs, and so on. Splunk continues to index the data in this file or directory as it comes in. You can also specify a mounted or shared directory, including network filesystems, as long as the Splunk server can read from the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files.
Splunk checks for the file or directory specified in a monitor configuration on Splunk server start and restart. If the file or directory specified is not present on start, Splunk checks for it again every 24 hours from the time of the last restart. Subdirectories of monitored directories are scanned continuously. To add new inputs without restarting Splunk, use Splunk Web or the command line interface. If you want Splunk to find potential new inputs automatically, use crawl.
When using monitor:
- On most file systems, files can be read even as they are being written to. However, Windows file systems have the ability to prevent files from being read while they are being written, and some Windows programs may use these modes, though most do not.
- Files or directories can be included or excluded via whitelists and blacklists.
- Upon restart, Splunk continues processing files where it left off.
- Splunk decompresses archive files before it indexes them. It can handle the following common archive file types: .tar, .gz, .bz2, .tar.bz2, and .zip.
- Splunk detects log file rotation and does not process renamed files it has already indexed (with the exception of .tar and .gz archives; for more information see "Log file rotation" in this manual).
- The entire dir/filename path must not exceed 1024 characters.
- Set the sourcetype for directories to Automatic. If the directory contains multiple files of different formats, do not set a value for the source type manually. Manually setting a source type forces a single source type for all files in that directory.
- Removing an input does not stop the input's files from being indexed. Rather, it stops the files from being checked again, but all the initial content will still be indexed. To stop all in-process data, you must restart the Splunk server.
Note: You cannot currently use both monitor and file system change monitor to follow the same directory or file. If you want to see changes in a directory, use file system change monitor. If you want to index new events in a directory, use monitor.
Note: Monitor input stanzas may not overlap. That is, monitoring /a/path while also monitoring /a/path/subdir will produce unreliable results. Similarly, monitor input stanzas which watch the same directory with different whitelists, blacklists, and wildcard components are not supported.
Why use upload or batch
Use the Upload a local file or Index a file on the Splunk server options to index a static file one time. The file will not be monitored on an ongoing basis.
Use the batch input type in inputs.conf to load files once and destructively. By default, Splunk's batch processor is located in $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and then deletes it.
Note: For best practices on loading file archives, see "How to index different sized archives" on the Community Wiki.
Monitor files and directories in Splunk Web
Add inputs from files and directories via Splunk Web.
1. Click Manager in the upper right-hand corner of Splunk Web.
2. Under System configurations, click Data Inputs.
3. Click Files and directories.
4. Click New to add an input.
5. Choose the radio button you want. You can:
- Monitor a file or directory, which sets up an ongoing input--whenever more data is added to this file or directory, Splunk will index it.
- Upload a local file from your local machine into Splunk.
- Index a file on the Splunk server, which copies a file on the server into Splunk via the batch directory.
6. Specify the path to the file or directory. If you select Upload a local file, use the Browse... button.
To monitor a shared network drive, enter the following: <myhost>/<mypath> (or \\<myhost>\<mypath> on Windows). Make sure Splunk has read access to the mounted drive as well as to the files you wish to monitor.
7. Under the Host heading, select the host name. You have several choices if you are using Monitor or Batch methods. Learn more about setting host value.
Note: Host only sets the host field in Splunk. It does not direct Splunk to look on a specific host on your network.
8. Now set the Source Type. Source type is a default field added to events. Source type is used to determine processing characteristics such as timestamps and event boundaries.
9. After specifying the source, host, and source type, click Submit.
Define input stanzas in inputs.conf
To add an input, add a stanza for it to inputs.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. If you have not worked with Splunk's configuration files before, read about configuration files in this manual before you begin.
You can set any number of attributes and values following an input type. If you do not specify a value for one or more attributes, Splunk uses the defaults that are preset in $SPLUNK_HOME/etc/system/default/.
Note: To ensure new events are indexed when you copy over an existing file with new contents, set CHECK_METHOD = modtime in props.conf for the source. This checks the modtime of the file and re-indexes when it changes. Note that the entire file is indexed, which can result in duplicate events.
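As a minimal sketch of the note above, a props.conf stanza for a single source might look like the following. The file path is hypothetical and only illustrates the shape of the stanza; substitute the source you actually overwrite.

```ini
# props.conf
# Hypothetical source path, for illustration only.
[source::/var/log/myapp/report.log]
# Re-index the file whenever its modification time changes.
# The entire file is indexed again, so duplicate events are possible.
CHECK_METHOD = modtime
```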
The following are options that you can use in both monitor and batch input stanzas. See the sections following for more attributes that are specific to each type of input.
host = <string>
- Set the host value of your input to a static value.
- host= is automatically prepended to the value when this shortcut is used.
- Defaults to the IP address or fully qualified domain name of the host where the data originated.
index = <string>
- Set the index where events from this input will be stored.
- index= is automatically prepended to the value when this shortcut is used.
- Defaults to main (or whatever you have set as your default index).
- For more information about the index field, see "How indexing works" in this manual.
sourcetype = <string>
- Set the sourcetype name of events from this input.
- sourcetype= is automatically prepended to the value when this shortcut is used.
- Splunk automatically picks a source type based on various aspects of your data. There is no hard-coded default.
- For more information about the sourcetype field, see "About sourcetypes" in the Knowledge Manager Manual.
source = <string>
- Set the source name of events from this input.
- Defaults to the file path.
- source= is automatically prepended to the value when this shortcut is used.
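The attributes described so far can be combined in a single stanza. The following is a sketch only: the path, host name, and source type name are hypothetical values chosen for illustration, and source is left unset so that it defaults to the file path.

```ini
# inputs.conf
# Hypothetical path and values, for illustration only.
[monitor:///var/log/myapp/app.log]
host = appserver01
index = main
sourcetype = myapp_log
```

Setting sourcetype here forces a single source type for everything the stanza matches, so this form suits a single file or a directory of identically formatted files.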
queue = <string> (parsingQueue, indexQueue, etc.)
- Specify where the input processor should deposit the events that it reads.
- Can be any valid, existing queue in the pipeline.
- Defaults to parsingQueue.
host_regex = <regular expression>
- If specified, the regex extracts host from the filename of each input.
- Specifically, the first group of the regex is used as the host.
- Defaults to the default host= attribute if the regex fails to match.
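As an illustration of host_regex, the sketch below assumes a hypothetical layout in which each originating host writes to its own subdirectory, for example /var/log/hosts/web01/access.log. The directory names are invented for this example.

```ini
# inputs.conf
# Hypothetical directory layout: /var/log/hosts/<hostname>/<logfile>
[monitor:///var/log/hosts]
# The first capturing group becomes the host value.
# For /var/log/hosts/web01/access.log the group matches "web01".
host_regex = /var/log/hosts/([^/]+)/
```

Files whose paths do not match the regex fall back to the default host value.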
host_segment = <integer>
- If specified, the '/' separated segment of the path is set as host.
- Defaults to the default host:: attribute if the value is not an integer, or is less than 1.
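The same hypothetical per-host layout can be handled with host_segment instead of a regex. This sketch assumes that segments are counted from 1 starting at the first path component, which is consistent with values less than 1 being rejected; verify the count against your own paths before relying on it.

```ini
# inputs.conf
# Hypothetical directory layout: /var/log/<hostname>/<logfile>
[monitor:///var/log]
# Under the counting assumption above: var = 1, log = 2, <hostname> = 3.
host_segment = 3
```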
Monitor syntax and examples
Monitor input stanzas direct Splunk to watch all files in the <path> (or just <path> itself if it represents a single file). You must specify the input type and then the path, so put three slashes in your path if you're starting at root. You can use wildcards for the path. For more information, read how to "Specify input paths with wildcards".
[monitor://<path>]
<attribute1> = <val1>
<attribute2> = <val2>
...
The following are additional attributes you can use when defining monitor input stanzas.
crcSalt = <string>
- If set, this string is added to the CRC.
- Use this setting to force Splunk to consume files that have matching CRCs.
- If set to crcSalt = <SOURCE> (note: this setting is case sensitive), then the full source path is added to the CRC.
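For example, if several files in a directory have matching CRCs and Splunk therefore skips some of them, a stanza along these lines adds each file's full path to its CRC. The directory is hypothetical.

```ini
# inputs.conf
# Hypothetical directory, for illustration only.
[monitor:///var/log/myapp]
# <SOURCE> is a literal keyword and is case sensitive.
crcSalt = <SOURCE>
```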
followTail = 0|1
- If set to 1, monitoring begins at the end of the file (like tail -f).
- This only applies to files the first time they are picked up.
- After that, Splunk's internal file position records keep track of the file.
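A sketch of followTail for a large existing log where only newly written events are wanted. The file path is hypothetical.

```ini
# inputs.conf
# Hypothetical file, for illustration only.
[monitor:///var/log/myapp/server.log]
# Skip existing content the first time the file is picked up.
followTail = 1
```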
_whitelist = <regular expression>
- If set, files from this path are monitored only if they match the specified regex.
_blacklist = <regular expression>
- If set, files from this path are NOT monitored if they match the specified regex.
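The two filters might be used as follows. Both directories and both patterns are hypothetical; the stanzas deliberately watch different directories, because stanzas that watch the same directory with different whitelists or blacklists are not supported.

```ini
# inputs.conf
# Hypothetical directories and patterns, for illustration only.

# Monitor only files whose names end in .log.
[monitor:///var/log/myapp]
_whitelist = \.log$

# Monitor everything except files whose paths contain "debug".
[monitor:///var/log/otherapp]
_blacklist = debug
```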
Example 1. To load anything in /apache/foo/logs or /apache/bar/logs, etc.
[monitor:///apache/.../logs]
Example 2. To load anything in /apache/ that ends in .log.
[monitor:///apache/*.log]
Batch syntax and examples
Use batch to set up a one time, destructive input of data from a source. For continuous, non-destructive inputs, use monitor. Remember, after the batch input is indexed, Splunk deletes the file.
[batch://<path>]
move_policy = sinkhole
<attribute1> = <val1>
<attribute2> = <val2>
...
Important: When defining batch inputs, you must include the setting, move_policy = sinkhole. This loads the file destructively. Do not use this input type for files you do not want to consume destructively.
Note: source = <string> and <KEY> = <string> are not used by batch.
Example: This example batch loads all files from the directory /system/flight815/.
[batch:///system/flight815/*]
move_policy = sinkhole
This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10, 4.0.11.