Admin Manual

 



This documentation does not apply to the most recent version of Splunk.

Files and directories

Point Splunk at a file or a directory. If you specify a directory, Splunk consumes everything in the directory. Splunk has two different file input processors: monitor and batch. For the most part, use monitor to input all your data sources from files and directories. The only time you should use batch is to load a large archive of historical files. Read on for more specifics.

Monitor

Specify a path to a file or directory and Splunk's monitor processor consumes any new input. You can also specify a mounted or shared directory, including network filesystems, as long as the Splunk server can read from the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files.

Splunk checks for the file or directory specified in a monitor configuration when the Splunk server starts or restarts. If the specified file or directory is not present at start, Splunk checks for it again at 24-hour intervals from the time of the last restart. Subdirectories of monitored directories are scanned continuously. To add new inputs without restarting Splunk, use Splunk Web or the command line interface. If you want Splunk to find potential new inputs automatically, use crawl.

When using monitor:

Note: You cannot currently use both monitor and file system change monitor to follow the same directory or file. If you want to see changes in a directory, use file system change monitor. If you want to index new events in a directory, use monitor.

Note: Monitor input stanzas may not overlap. That is, monitoring /a/path while also monitoring /a/path/subdir will produce unreliable results. Similarly, monitor input stanzas which watch the same directory with different whitelists, blacklists, and wildcard components are not supported.
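The overlap rule in the note above can be checked before you edit your configuration. The following is a minimal sketch, not part of Splunk: find_overlaps is a hypothetical helper that assumes UNIX-style paths and reports every pair in which one monitored path equals or contains another.

```python
from itertools import combinations
from pathlib import PurePosixPath


def find_overlaps(monitor_paths):
    """Return (parent, child) pairs where one monitored path contains the other."""
    paths = [PurePosixPath(p) for p in monitor_paths]
    overlaps = []
    for a, b in combinations(paths, 2):
        # b.parents lists every ancestor directory of b, up to the root.
        if a == b or a in b.parents:
            overlaps.append((str(a), str(b)))
        elif b in a.parents:
            overlaps.append((str(b), str(a)))
    return overlaps


print(find_overlaps(["/a/path", "/a/path/subdir", "/var/log"]))
```

The comparison is made on whole path components, so /a/path and /a/pathology are not reported as overlapping.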

Batch

Use the batch processor at the CLI or in inputs.conf to load files once and destructively. By default, Splunk's batch processor is located in $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and then deletes it.

Note: Batch is most useful for loading in historical data, such as large archives of files. For best practices on loading file archives, see "How to index different sized archives".
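Because the batch processor deletes what it indexes, you may prefer to copy a file into the spool directory and keep the original. The following is a minimal sketch under that assumption. copy_to_spool is a hypothetical helper, and splunk_home stands for the value of $SPLUNK_HOME on your server.

```python
import os
import shutil


def copy_to_spool(src, splunk_home):
    """Copy src into <splunk_home>/var/spool/splunk and return the new path."""
    spool_dir = os.path.join(splunk_home, "var", "spool", "splunk")
    if not os.path.isdir(spool_dir):
        # Refuse to create the directory: a missing spool directory
        # usually means splunk_home points at the wrong place.
        raise FileNotFoundError(f"spool directory not found: {spool_dir}")
    # Copy instead of move, so the original survives the destructive load.
    return shutil.copy2(src, spool_dir)
```

For example, copy_to_spool("/archive/old.log", "/opt/splunk") would place a copy at /opt/splunk/var/spool/splunk/old.log. Both paths are illustrative.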


Splunk Web

Add inputs from files and directories via Splunk Web.

1. Click Admin in the upper right-hand corner of Splunk Web.

2. Then click Data Inputs.

3. Pick files and directories.

4. Click New Input to add an input.

5. Under Data access, pick Monitor a directory.

You can also choose a different data access method, such as Upload or Batch.

6. Specify the pathname to the file or directory. If you select Upload, use the Browse... button.

To monitor a shared network drive, enter the following: <myhost><mypath> (or \\<myhost>\<mypath> on Windows). Make sure your Splunk server has read access to the mounted drive as well as the files you wish to monitor.

7. Under the Host heading, select the host name. You have several choices if you are using Monitor or Batch methods. Learn more about setting host value.

Note: Host only sets the host field in Splunk. It does not direct Splunk to look on a specific host on your network.

8. Now set the Source Type. Source type is a default field added to events. Source type is used to determine processing characteristics such as timestamps and event boundaries. Learn more about source type.

9. After specifying the source, host, and source type, click Submit.

CLI

Monitor files and directories via Splunk's Command Line Interface (CLI). To use Splunk's CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command from the UNIX or Windows command prompt. Or add Splunk to your path and use the splunk command.

If you get stuck, Splunk's CLI has built-in help. Access the main CLI help by typing splunk help. Individual commands have their own help pages as well -- type splunk help <command>.

The following commands are available for input configuration via the CLI:

Command  Command syntax                                Action
add      add monitor $SOURCE [-parameter value] ...    Add inputs from $SOURCE.
edit     edit monitor $SOURCE [-parameter value] ...   Edit a previously added input for $SOURCE.
remove   remove monitor $SOURCE                        Remove a previously added $SOURCE.
list     list monitor                                  List the currently configured monitor inputs.
spool    spool source                                  Copy a file into Splunk via the sinkhole directory.

Change the configuration of each data input type by setting additional parameters. Parameters are set via the syntax: -parameter value.

Note: You can only set one -hostname, -hostregex or -hostsegmentnum per command.

Parameter       Required?   Description
source          Required    Path to the file or directory to monitor for new input.
sourcetype      Optional    Specify a sourcetype field value for events from the input source.
index           Optional    Specify the destination index for events from the input source.
hostname        Optional    Specify a host name to set as the host field value for events from the input source.
hostregex       Optional    Specify a regular expression on the source file path to set as the host field value for events from the input source.
hostsegmentnum  Optional    Set the number of segments of the source file path to set as the host field value for events from the input source.
follow-only     Optional    True or False (T/F). Default False. When set to True, Splunk reads from the end of the source (like the "tail -f" Unix command).
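If you script the CLI, you can assemble the arguments and enforce the one-host-parameter rule before calling splunk. The following is a minimal sketch. add_monitor_args is a hypothetical helper, and the parameter names come from the table above. It only builds the argument list and does not run Splunk.

```python
HOST_PARAMS = ("hostname", "hostregex", "hostsegmentnum")
KNOWN_PARAMS = HOST_PARAMS + ("sourcetype", "index", "follow-only")


def add_monitor_args(source, **params):
    """Build the argument list for `splunk add monitor $SOURCE [-parameter value] ...`."""
    # Python keyword names cannot contain "-", so follow_only maps to follow-only.
    params = {name.replace("_", "-"): value for name, value in params.items()}
    unknown = sorted(set(params) - set(KNOWN_PARAMS))
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    if sum(name in params for name in HOST_PARAMS) > 1:
        raise ValueError("set only one of -hostname, -hostregex, -hostsegmentnum")
    args = ["add", "monitor", source]
    for name, value in params.items():
        args += [f"-{name}", str(value)]
    return args


print(add_monitor_args("/var/log/", sourcetype="syslog", hostsegmentnum=3))
```

You could then pass the list to your own process launcher, for example with ./splunk as the first element when run from $SPLUNK_HOME/bin/.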

Example: use the CLI to monitor /var/log/

The following example shows how to monitor files in /var/log/:

Add /var/log/ as a data input:

./splunk add monitor /var/log/

Example: use the CLI to monitor windowsupdate.log

The following example shows how to monitor the Windows Update log (where Windows logs automatic updates):

Add C:\Windows\windowsupdate.log as a data input:

./splunk add monitor C:\Windows\windowsupdate.log

Example: use the CLI to monitor IIS logging

This example shows how to monitor the default location for Windows IIS logging:

Add C:\windows\system32\LogFiles\W3SVC as a data input:

./splunk add monitor c:\windows\system32\LogFiles\W3SVC 

Inputs.conf

To add an input, add a stanza for it to inputs.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. If you have not worked with Splunk's configuration files before, read how configuration files work before you begin.

You can set any number of attributes and values following an input type. If you do not specify a value for one or more attributes, Splunk uses the defaults that are preset in $SPLUNK_HOME/etc/system/default/ (noted below).

Monitor

[monitor://<path>]
<attribute1> = <val1>
<attribute2> = <val2>
...

This type of input stanza (monitor) directs Splunk to watch all files in the <path> (or just <path> itself if it represents a single file). You must specify the input type and then the path, so put three slashes in your path if you're starting at root. You can use wildcards for the path. For more information, see the "Wildcards" subsection, below.

Note: To ensure new events are indexed when you copy over an existing file with new contents, set CHECK_METHOD = modtime in props.conf for the source. This checks the modtime of the file and re-indexes when it changes. Note that the entire file is indexed, which can result in duplicate events.

host = <string>

index = <string>

sourcetype = <string>

source = <string>

queue = <string> (parsingQueue, indexQueue, etc)

host_regex = <regular expression>

host_segment = <integer>

crcSalt = <string>

followTail = 0|1

_whitelist = <regular expression>

_blacklist = <regular expression>
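The host_regex and host_segment attributes both derive the host field from the source file path. The following is a minimal sketch of that idea, not Splunk's implementation. It assumes that segments are counted from 1 starting at the root, and that the first capture group of the regular expression supplies the host. Neither detail is stated in this topic, so verify both against your Splunk version. The function names and sample path are hypothetical.

```python
import re


def host_from_segment(path, segment):
    """Return the Nth "/"-separated segment of path (assumed 1-based)."""
    parts = [p for p in path.split("/") if p]
    return parts[segment - 1]


def host_from_regex(path, pattern):
    """Return the first capture group of pattern found in path, or None."""
    match = re.search(pattern, path)
    return match.group(1) if match else None


path = "/var/log/web01/access.log"
print(host_from_segment(path, 3))
print(host_from_regex(path, r"/var/log/([^/]+)/"))
```

Under these assumptions, both calls return web01 for the sample path.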

Wildcards

You can use wildcards to specify your input path for monitored input. Use ... for paths and * for files.

Note: In Windows, you must use two backslashes \\ to escape wildcards. Regexes with backslashes in them are not currently supported for _whitelist and _blacklist in Windows.

Specifying wildcards results in an implicit _whitelist created for that stanza. The longest fully qualified path is used as the monitor stanza, and the wildcards are translated into regular expressions using the following map:


wildcard   regex    meaning
*          [^/]*    anything but /
...        .*       anything (greedy)
.          \.       literal .

Additionally, the converted expression is anchored to the right end of the file path, so that the entire path must be matched.

For example, if you specify

[monitor:///foo/bar*.log]

Splunk translates this into

[monitor:///foo/]
_whitelist = bar[^/]*\.log$
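The translation can be sketched in a few lines of Python. This is an illustration of the rules described in this section, not Splunk's own code: split_monitor_path is a hypothetical name, and the sketch rewrites only the three wildcard forms from the map, so other regular expression metacharacters in a path are passed through unchanged.

```python
import re

# Wildcard-to-regex map from this section; "..." is tried before ".".
_TOKEN = re.compile(r"\.\.\.|\*|\.")
_MAP = {"...": ".*", "*": "[^/]*", ".": r"\."}


def split_monitor_path(path):
    """Return (base_path, whitelist); whitelist is None if path has no wildcard."""
    first = re.search(r"\.\.\.|\*", path)
    if first is None:
        return path, None
    # The longest wildcard-free prefix ends at the last "/" before the wildcard.
    cut = path.rfind("/", 0, first.start()) + 1
    base, rest = path[:cut], path[cut:]
    # Translate each token, then anchor the expression to the end of the path.
    whitelist = _TOKEN.sub(lambda t: _MAP[t.group(0)], rest) + "$"
    return base, whitelist


print(split_monitor_path("/foo/bar*.log"))
```

For /foo/bar*.log the sketch returns the base path /foo/ and the whitelist bar[^/]*\.log$, matching the translation above.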

As a consequence, you can't have multiple stanzas with wildcards for files in the same directory.

Also, you cannot use a _whitelist declaration in conjunction with wildcards.

For example:

[monitor:///foo/bar_baz*]
[monitor:///foo/bar_qux*]

This results in overlapping stanzas indexing the directory /foo/. Splunk takes the first one, so only files starting with /foo/bar_baz will be indexed. To include both sources, manually specify a _whitelist using regular expression syntax for "or":

[monitor:///foo]
_whitelist = (bar_baz[^/]*|bar_qux[^/]*)$
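To see which files a combined whitelist would admit, you can try the expression against sample paths before deploying it. The following is a minimal sketch. It assumes the whitelist is searched against the full file path and anchored only at the right end, as described above. The sample paths and the is_whitelisted helper are hypothetical.

```python
import re

# The combined whitelist from the stanza above.
WHITELIST = re.compile(r"(bar_baz[^/]*|bar_qux[^/]*)$")


def is_whitelisted(path):
    """Return True if the right-anchored whitelist matches the end of path."""
    return WHITELIST.search(path) is not None


for path in ["/foo/bar_baz.log", "/foo/bar_qux_2.log",
             "/foo/other.log", "/foo/bar_baz/readme.txt"]:
    print(path, is_whitelisted(path))
```

The last sample path is rejected because [^/]* cannot cross a directory separator, so bar_baz must appear in the file name itself.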

Note: To set any additional attributes (such as sourcetype) for multiple whitelisted/blacklisted inputs that may have different attributes, use props.conf.

Examples

To load anything in /apache/foo/logs or /apache/bar/logs, etc.

[monitor:///apache/.../logs]

To load anything in /apache/ that ends in .log.

[monitor:///apache/*.log]

Batch

[batch://<path>]
move_policy = sinkhole
<attribute1> = <val1>
<attribute2> = <val2>
...

Use batch to set up a one time, destructive input of data from a source. For continuous, non-destructive inputs, use monitor.

Note: You must set move_policy = sinkhole. This loads the file destructively. Do not use this input type for files you do not want to consume destructively.

host = <string>

index = <string>

sourcetype = <string>

source = <string>

queue = <string> (parsingQueue, indexQueue, etc)

host_regex = <regular expression>

host_segment = <integer>

Note: source = <string> and <KEY> = <string> are not used by batch.

Example

This example batch loads all files from the directory /system/flight815/.

[batch:///system/flight815/*]
move_policy = sinkhole

This documentation applies to the following versions of Splunk: 3.3, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.5, 3.4.6, 3.4.8, 3.4.9, 3.4.10, 3.4.11, 3.4.12, 3.4.13, 3.4.14.

