Monitor files and directories
This documentation does not apply to the most recent version of Splunk.
Splunk has two file input processors: monitor and upload. For the most part, you can use monitor to add all your data sources from files and directories. However, you may want to use upload when you want to add one-time inputs, such as an archive of historical data.
This topic discusses how to add monitor and upload inputs using Splunk Web and the configuration files. You can also add, edit, and list monitor inputs using the CLI; for more information, read this topic.
How monitor works in Splunk
Specify a path to a file or directory and Splunk's monitor processor consumes any new input. This is how you'd monitor live application logs such as those coming from J2EE or .Net applications, Web access logs, and so on. Splunk will continue to index the data in this file or directory as it comes in. You can also specify a mounted or shared directory, including network filesystems, as long as the Splunk server can read from the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files.
Splunk checks for the file or directory specified in a monitor configuration on Splunk server start and restart. If the file or directory specified is not present on start, Splunk checks for it again every 24 hours, counted from the time of the last restart. Subdirectories of monitored directories are scanned continuously. To add new inputs without restarting Splunk, use Splunk Web or the command line interface. If you want Splunk to find potential new inputs automatically, use crawl.
When using monitor, note the following:
- On most file systems, files can be read even as they are being written to. However, Windows file systems can prevent files from being read while they are being written, and some Windows programs use these modes, though most do not. The notable exception is Windows event logs; however, Splunk has an input dedicated to processing those kinds of log files.
- Files or directories can be included or excluded via whitelists and blacklists.
- Upon restart, Splunk continues processing files where it left off.
- Splunk decompresses archive files before it indexes them. It can handle these common archive file types: .tar, .gz, .bz2, .tar.bz2, and .zip.
- Splunk detects log file rotation and does not process renamed files it has already indexed (with the exception of .tar and .gz archives; for more information see "Log file rotation" in this manual).
- The entire dir/filename path must not exceed 1024 characters.
- Set the source type for directories to Automatic. If the directory contains multiple files of different formats, do not set a value for the source type manually. Manually setting a source type forces a single source type for all files in that directory.
- Removing an input does not stop the input's files from being indexed. Rather, it stops files from being checked again, but all the initial content will be indexed. To stop all in-process data, you must restart the Splunk server.
Note: You cannot currently use both monitor and file system change monitor to follow the same directory or file. If you want to see changes in a directory, use file system change monitor. If you want to index new events in a directory, use monitor.
Note: Monitor input stanzas may not overlap. That is, monitoring /a/path while also monitoring /a/path/subdir will produce unreliable results. Similarly, monitor input stanzas that watch the same directory with different whitelists, blacklists, and wildcard components are not supported.
Why use upload or batch
Use the Upload a local file or Index a file on the Splunk server options to index a static file one time. The file will not be monitored on an ongoing basis.
Use the batch input type in inputs.conf to load files once and destructively. By default, Splunk's batch processor is located in $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and then deletes it.
Note: For best practices on loading file archives, see "How to index different sized archives" on the Community Wiki.
Configure with Splunk Web
Add inputs from files and directories via Splunk Web.
1. Click Manager in the upper right-hand corner of Splunk Web.
2. Under System configurations, click Data Inputs.
3. Click Files and directories.
4. Click Add new to add an input.
5. Select a Source radio button:
- Monitor a file or directory. Sets up an ongoing input. Whenever data is added to this file or directory, Splunk will index it. See the next section for advanced options specific to this choice.
- Upload a local file. Uploads a file from your local machine into Splunk.
- Index a file on the Splunk server. Copies a file on the server into Splunk via the batch directory.
6. Specify the Full path to the file or directory.
To monitor a shared network drive, enter the following: <myhost>/<mypath> (or \\<myhost>\<mypath> on Windows). Make sure Splunk has read access to the mounted drive, as well as to the files you wish to monitor.
7. Under the Host section, set the host name value. You have several choices for this setting. Learn more about setting the host value in "About default fields".
Note: Host only sets the host field. It does not direct Splunk to look on a specific host on your network.
8. Set the Source type. Source type is a default field added to events. Source type is used to determine processing characteristics such as timestamps and event boundaries.
9. Set the Index. Leave the value as "default", unless you have defined multiple indexes to handle different types of events. In addition to indexes for user data, Splunk has a number of utility indexes, which show up in the dropdown box.
10. Click Save.
Advanced options for file/directory monitoring
If your choice for source is Monitor a file or directory, the page includes an Advanced Options section, which allows you to configure some additional settings:
- Follow tail. If checked, monitoring begins at the end of the file (like tail -f).
- Whitelist. If a path is specified, files from that path are monitored only if they match the specified regex.
- Blacklist. If a path is specified, files from that path are not monitored if they match the specified regex.
For detailed information on whitelists and blacklists, see Whitelist or blacklist specific incoming data in this manual.
Configure with inputs.conf
To add an input, add a stanza to inputs.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. If you have not worked with Splunk's configuration files before, read "About configuration files" before you begin.
You can set multiple attributes in an input stanza. If you do not specify a value for an attribute, Splunk uses the default that's preset in $SPLUNK_HOME/etc/system/default/.
Note: To ensure that new events are indexed when you copy over an existing file with new contents, set CHECK_METHOD = modtime in props.conf for the source. This checks the modtime of the file and re-indexes it when it changes. Be aware that the entire file will be re-indexed, which can result in duplicate events.
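As a minimal sketch of the note above, a props.conf stanza for a file that is overwritten in place might look like the following. The source path is hypothetical; only the CHECK_METHOD setting comes from this topic.

```ini
# props.conf
# Hypothetical source path; substitute the file that gets copied over.
[source::/var/log/exports/nightly.csv]
CHECK_METHOD = modtime
```

Because the whole file is re-indexed whenever its modtime changes, expect duplicate events if the new contents repeat lines that were already indexed.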
Configuration settings
The following are options that you can use in both monitor and batch input stanzas. See the sections that follow for attributes that are specific to each type of input.
host = <string>
- Set the host value of your input to a static value.
- "host=" is automatically prepended to <string>.
- Defaults to the IP address or fully qualified domain name of the host where the data originated.
index = <string>
- Set the index where events from this input will be stored.
- "index=" is automatically prepended to <string>.
- Defaults to main, or whatever you have set as your default index.
- For more information about the index field, see "How indexing works" in this manual.
sourcetype = <string>
- Set the sourcetype name of events from this input.
- "sourcetype=" is automatically prepended to <string>.
- Splunk picks a sourcetype based on various aspects of your data. There is no hard-coded default.
- For more information about the sourcetype field, see "About default fields (host, source, sourcetype, and more)", in this manual.
source = <string>
- Set the source name of events from this input.
- Defaults to the file path.
- "source=" is automatically prepended to <string>.
queue = parsingQueue | indexQueue
- Specifies where the input processor should deposit the events that it reads.
- Set to "parsingQueue" to apply props.conf and other parsing rules to your data.
- Set to "indexQueue" to send your data directly into the index.
- Defaults to parsingQueue.
_TCP_ROUTING = <tcpout_group_name>,<tcpout_group_name>,...
- Specifies a comma-separated list of tcpout group names.
- Using this attribute, you can selectively forward your data to specific indexer(s) by specifying the tcpout group(s) that the forwarder should use when forwarding your data.
- The tcpout group names are defined in outputs.conf in [tcpout:<tcpout_group_name>] stanzas.
- This setting defaults to the groups present in 'defaultGroup' in the [tcpout] stanza in outputs.conf.
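A sketch of how the two files might fit together, assuming a hypothetical tcpout group named security_indexers and a placeholder indexer address (neither appears in this topic):

```ini
# outputs.conf
# Hypothetical group name and placeholder indexer address.
[tcpout:security_indexers]
server = 192.0.2.10:9997

# inputs.conf
# Events from this input are forwarded to the group named above.
[monitor:///var/log/secure]
_TCP_ROUTING = security_indexers
```

To route to more than one group, list the group names separated by commas.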
host_regex = <regular expression>
- If specified, the regex extracts host from the filename of each input.
- Specifically, the first group of the regex is used as the host.
- Defaults to the default "host =" attribute, if the regex fails to match.
host_segment = <integer>
- If specified, a segment of the path is set as host, using <integer> to determine which segment. For example, if host_segment = 2, host is set to the second segment of the path. Path segments are separated by the '/' character.
- Defaults to the default "host =" attribute, if the value is not an integer, or is less than 1.
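The two host-extraction settings can be sketched as follows. The directory layouts, file names, and fallback host value are hypothetical and only illustrate how the attributes described above might be used.

```ini
# inputs.conf

# Assumed layout: /var/log/hosts/<hostname>/<file>
# Segments are counted from the root: var = 1, log = 2, hosts = 3, <hostname> = 4.
[monitor:///var/log/hosts]
host_segment = 4

# Assumed file names such as access_web01.log; the first capture group
# (web01) becomes the host. Files that do not match fall back to "host =".
[monitor:///opt/weblogs]
host_regex = access_(\w+)\.log$
host = unknown_web
```

Use host_segment when the host name is a whole directory level in the path, and host_regex when it is embedded in a file name.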
Monitor syntax and examples
Monitor input stanzas direct Splunk to watch all files in the <path> (or just <path> itself if it represents a single file). You must specify the input type and then the path, so put three slashes in your path if you're starting at root. You can use wildcards for the path. For more information, read how to "Specify input paths with wildcards".
[monitor://<path>]
<attribute1> = <val1>
<attribute2> = <val2>
...
The following are additional attributes you can use when defining monitor input stanzas:
crcSalt = <string>
- Use this setting to force Splunk to consume files that have matching CRCs. (Splunk only performs CRC checks against the first few lines of a file. This behavior prevents Splunk from indexing the same file twice, even though you may have renamed it -- as, for example, with rolling log files. However, because the CRC is based on only the first few lines of the file, it is possible for legitimately different files to have matching CRCs, particularly if they have identical headers.)
- If set, string is added to the CRC.
- If set to <SOURCE>, the full source path is added to the CRC. This ensures that each file being monitored has a unique CRC.
- Be cautious about using this attribute with rolling log files; it could lead to the log file being re-indexed after it has rolled.
- Note: This setting is case sensitive.
followTail = 0|1
- If set to 1, monitoring begins at the end of the file (like tail -f).
- This only applies to files the first time they are picked up.
- After that, Splunk's internal file position records keep track of the file.
whitelist = <regular expression>
- If set, files from this path are monitored only if they match the specified regex.
blacklist = <regular expression>
- If set, files from this path are NOT monitored if they match the specified regex.
alwaysOpenFile = 0 | 1
- If set to 1, Splunk opens a file to check if it has already been indexed.
- Only useful for files that don't update modtime.
- Should only be used for monitoring files on Windows, and mostly for IIS logs.
- Note: This flag should only be used as a last resort, as it increases load and slows down indexing.
time_before_close = <integer>
- Modtime delta required before Splunk can close a file on EOF.
- Tells the system not to close files that have been updated in the past <integer> seconds.
- Defaults to 3.
recursive = true|false
- If set to false, Splunk will not go into subdirectories found within a monitored directory.
- Defaults to true.
followSymlink = true|false
- If set to false, Splunk will ignore symbolic links found within a monitored directory.
- Defaults to true.
Example 1. To load anything in /apache/foo/logs or /apache/bar/logs, etc.
[monitor:///apache/.../logs]
Example 2. To load anything in /apache/ that ends in .log.
[monitor:///apache/*.log]
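As a further sketch, several of the additional attributes above could be combined in one stanza. The path and patterns here are hypothetical.

```ini
# inputs.conf
# Hypothetical application log directory.
[monitor:///var/log/myapp]
# Monitor only files that match this regex (names ending in .log).
whitelist = \.log$
# Skip files that match this regex, even if they pass the whitelist.
blacklist = debug
# Start at the end of each file the first time it is picked up.
followTail = 1
# Do not descend into subdirectories.
recursive = false
```

Keeping the whitelist and blacklist in a single stanza avoids defining two stanzas that watch the same directory with different filters, which is not supported.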
Batch syntax and examples
Use batch to set up a one-time, destructive input of data from a source. This input is useful when, for example, you want to index the data in a directory of files but do not want those files to keep occupying disk space afterward.
Caution: For continuous, non-destructive inputs, use monitor. Remember, after the batch input is indexed, Splunk deletes the file.
[batch://<path>]
move_policy = sinkhole
<attribute1> = <val1>
<attribute2> = <val2>
...
Important: When defining batch inputs, you must include the setting, move_policy = sinkhole. This loads the file destructively. Do not use this input type for files you do not want to consume destructively.
Note: source = <string> and <KEY> = <string> are not used by batch.
Example: This example batch loads all files from the directory /system/flight815/, but does not recurse through any subdirectories under it -- remove the asterisk and recursion will occur:
[batch:///system/flight815/*]
move_policy = sinkhole
For details on using the asterisk in input paths, see "Specify input paths with wildcards".
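The common settings described earlier, such as host and index, can also be set in a batch stanza. A minimal sketch, with a hypothetical directory and host name:

```ini
# inputs.conf
# Hypothetical drop directory; files here are deleted after they are indexed.
# There is no trailing asterisk, so subdirectories are included as well.
[batch:///opt/archive/incoming]
move_policy = sinkhole
host = archive01
index = main
```

Point a stanza like this only at copies of data, never at the sole copy of files you need to keep.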
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8.