Getting Data In

Specify input paths with wildcards

This topic is only relevant when using inputs.conf to specify inputs, as described in "Edit inputs.conf" in this manual.

Important: Input path specifications in inputs.conf don't use regex-compliant expressions but rather Splunk-defined wildcards.

Wildcard overview

A wildcard is a character that you can substitute for one or more unspecified characters when searching text or selecting multiple files or directories. In Splunk, you can use wildcards to specify your input path for monitored input.

Wildcard: ...
Description: The ellipsis wildcard recurses through directories and any number of levels of subdirectories to find matches.
Regex equivalent: .*
Example(s): /foo/.../bar/* matches the files /foo/bar, /foo/1/bar, /foo/2/bar, /foo/1/2/bar, etc.
Note: Because a single ellipsis recurses through all directories and subdirectories, /foo/.../bar matches the same as /foo/.../.../bar.

Wildcard: *
Description: The asterisk wildcard matches anything in that specific directory path segment. Unlike "...", "*" doesn't recurse through any subdirectories.
Regex equivalent: [^/]*
Example(s):
/foo/*/bar matches the files /foo/bar, /foo/1/bar, /foo/2/bar, etc. However, it does not match /foo/1/2/bar.
/foo/m*r/bar matches /foo/mr/bar, /foo/mir/bar, /foo/moor/bar, etc.
/foo/*.log matches all files with the .log extension, such as /foo/bar.log. It does not match /foo/bar.txt or /foo/bar/test.log.

Note: A single dot (.) is not a wildcard. It matches a literal dot, so its regex equivalent is \.

For more specific matches, combine the ... and * wildcards. For example, /foo/.../bar/* matches any file in a bar directory at any depth under /foo.
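The regex equivalents listed above can be tried out with a short Python sketch. It is only a model of the documented equivalents ("..." becomes .* and "*" becomes [^/]*), not Splunk's own implementation. The function names wildcard_to_regex and matches are made up for this illustration, and the sketch treats every other character literally, so it ignores the metacharacter rules described in the next section.

```python
import re

def wildcard_to_regex(path: str) -> str:
    """Model of the documented equivalents: '...' -> '.*', '*' -> '[^/]*'.
    Every other character, including a single '.', is matched literally.
    (Hypothetical helper, not part of Splunk.)"""
    out = []
    i = 0
    while i < len(path):
        if path.startswith("...", i):
            out.append(".*")          # ellipsis: any depth of subdirectories
            i += 3
        elif path[i] == "*":
            out.append("[^/]*")       # asterisk: stays inside one path segment
            i += 1
        elif path[i] == "/":
            out.append("/")           # separator is kept as-is
            i += 1
        else:
            out.append(re.escape(path[i]))  # literal character, e.g. '.' -> '\.'
            i += 1
    return "".join(out)

def matches(pattern: str, path: str) -> bool:
    """True if the whole path matches the translated wildcard pattern."""
    return re.fullmatch(wildcard_to_regex(pattern), path) is not None

# Examples taken from the wildcard descriptions above.
print(matches("/foo/*/bar", "/foo/1/bar"))         # asterisk within one segment
print(matches("/foo/*/bar", "/foo/1/2/bar"))       # asterisk does not recurse
print(matches("/foo/.../bar", "/foo/1/2/bar"))     # ellipsis recurses
print(matches("/foo/*.log", "/foo/bar/test.log"))  # file is one level too deep
```

Running the sketch against your own directory layout is a quick way to check a path before putting it into inputs.conf, but the behavior of your Splunk version is the final authority.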

Caution: In Windows, you cannot currently use a wildcard at the root level. For example, this does not work:

[monitor://E:\...\foo\*.log]

Splunk logs an error and fails to index the desired files.

This is a known issue, described in the Known Issues topic of the Release Notes. Look there for details on all known issues.

Wildcards and regular expression metacharacters

When determining the set of files or directories to monitor, Splunk splits the path in a monitor stanza into segments, where a segment is the text between directory separator characters ("/" or "\") in the stanza definition. If you specify a monitor stanza that contains segments with both wildcards and regular expression (regex) metacharacters (such as (, ), [, ], and |), those characters behave differently depending on where the wildcard is in the stanza.

If a monitoring stanza contains a segment with regex metacharacters before a segment with wildcards, Splunk treats the metacharacters literally, as if you wanted to monitor files or directories with those characters in the files' or directories' names. For example:

[monitor:///var/log/log(a|b).log]

monitors the /var/log/log(a|b).log file. Splunk does not treat the (a|b) as a regular expression because there are no wildcards present.

[monitor:///var/log()/log*.log]

monitors all files in the /var/log()/ directory that begin with log and have the extension .log. Splunk does not treat the () as a regular expression because the metacharacters are in the segment before the wildcard.

If the regex metacharacters occur within or after a segment that contains a wildcard, Splunk treats the metacharacters as a regex and matches files to monitor accordingly. For example:

[monitor:///var/log()/log(a|b)*.log]

monitors all files in the /var/log()/ directory that begin with either loga or logb and have the extension .log. Splunk does not treat the first set of () as a regex because the wildcard is in the following segment. The second set of () gets treated as a regex because it is in the same segment as the wildcard '*'.

[monitor:///var/.../log(a|b).log]

monitors all files named loga.log or logb.log in any subdirectory of the /var/ directory. Splunk treats (a|b) as a regex because of the wildcard '...' in the previous stanza segment.

[monitor:///var/.../log[A-Z0-9]*.log]

monitors all files in any subdirectory of the /var/ directory which:

  • begin with log, then
  • contain a single capital letter (from A-Z) or number (from 0-9), then
  • contain any other characters, then
  • end in .log.

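Splunk treats [A-Z0-9] as a regex character class because of the wildcard '...' in the previous stanza segment. The '*' that follows it remains a wildcard, which is why any other characters can appear before .log.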
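The segment rule described in this section can be summarized in a few lines of Python. This is a sketch of the rule as stated here, not of Splunk's code: it splits a path into segments, finds the first segment that contains a wildcard, and reports that metacharacters are literal before that segment and regex from that segment onward. The function name classify_segments is hypothetical.

```python
import re

def classify_segments(path: str):
    """Return (segment, treated_as_regex) pairs for a monitor path.
    Metacharacters count as regex only in the first segment that
    contains a wildcard and in every segment after it.
    (Hypothetical helper that models the rule in the text.)"""
    # Segments are the text between "/" or "\" separators.
    segments = [s for s in re.split(r"[/\\]", path) if s]
    first_wild = next(
        (i for i, s in enumerate(segments) if "..." in s or "*" in s),
        None,  # no wildcard anywhere in the path
    )
    return [
        (s, first_wild is not None and i >= first_wild)
        for i, s in enumerate(segments)
    ]

# The stanza paths used as examples above.
for p in ("/var/log/log(a|b).log",
          "/var/log()/log(a|b)*.log",
          "/var/.../log(a|b).log"):
    print(p, classify_segments(p))
```

For /var/log()/log(a|b)*.log, only the last segment is flagged, which matches the explanation that the first () is literal and the second is a regex.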

Multiple stanzas with wildcards against a common directory

Specifying multiple stanzas with wildcards that each match a different subset of files in the same directory might result in inconsistent indexing behavior. To monitor multiple subsets of files in the same directory, create a single stanza at the top level of that directory and set the whitelist attribute to a regular expression that matches the files you want to index. Then, if needed, override the sourcetype by following the instructions at "Override source types on a per-event basis" in this manual.
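Before putting a combined whitelist into a single stanza, it can help to test the regular expression against sample file names. The sketch below does that in Python. The directory /var/log/app, the file names, and the pattern are all made up for illustration, and Python's re module stands in for PCRE, which behaves the same for a pattern this simple.

```python
import re

# Hypothetical stanza this whitelist would belong to:
#   [monitor:///var/log/app]
#   whitelist = /(access|error)[^/]*\.log$
WHITELIST = r"/(access|error)[^/]*\.log$"

def is_whitelisted(path: str) -> bool:
    """True if the candidate whitelist regex matches the file path."""
    return re.search(WHITELIST, path) is not None

# Made-up sample files: two subsets to keep, two files to skip.
for sample in ("/var/log/app/access_1.log",
               "/var/log/app/error.log",
               "/var/log/app/debug.log",
               "/var/log/app/access.log.gz"):
    print(sample, is_whitelisted(sample))
```

One regular expression with an alternation covers both subsets, so a single stanza can replace two overlapping wildcard stanzas on the same directory.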

Input examples

To monitor /apache/foo/logs, /apache/bar/logs, /apache/bar/1/logs, etc.:

[monitor:///apache/.../logs/*]

To monitor /apache/foo/logs, /apache/bar/logs, etc., but not /apache/bar/1/logs or /apache/bar/2/logs:

[monitor:///apache/*/logs]

To monitor any file directly under /apache/ that ends in .log:

[monitor:///apache/*.log]

To monitor any file under /apache/ (under any level of subdirectory) that ends in .log:

[monitor:///apache/.../*.log]

Wildcards and whitelisting

Important: In Splunk, whitelists and blacklists are defined with standard PCRE regex syntax, unlike the file input path syntax described in the previous sections.

When you specify wildcards in a file input path, Splunk creates an implicit whitelist for that stanza. The longest wildcard-free path becomes the monitor stanza, and the wildcards are translated into regular expressions, as listed in the table above. Additionally, the converted expression is anchored to the right end of the file path, so that the entire path must be matched.

Note: In Windows, whitelist and blacklist rules do not support regexes that include backslashes; you must use two backslashes \\ to escape wildcards.

For example, if you specify

[monitor:///foo/bar*.log]

Splunk translates this into

[monitor:///foo/]
whitelist = bar[^/]*\.log$
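The translation shown above can be modeled in Python as follows. This is a sketch built only from the description in this section (longest wildcard-free path, wildcard-to-regex equivalents, anchoring with $). It is not Splunk's implementation, and the function names implicit_whitelist and _to_regex are hypothetical.

```python
import re

def _to_regex(fragment: str) -> str:
    """Translate '...' to '.*' and '*' to '[^/]*'; escape everything else."""
    out, i = [], 0
    while i < len(fragment):
        if fragment.startswith("...", i):
            out.append(".*")
            i += 3
        elif fragment[i] == "*":
            out.append("[^/]*")
            i += 1
        elif fragment[i] == "/":
            out.append("/")
            i += 1
        else:
            out.append(re.escape(fragment[i]))
            i += 1
    return "".join(out)

def implicit_whitelist(path: str):
    """Split a wildcard path into (monitor path, whitelist regex).
    Returns (path, None) when the path contains no wildcard.
    (Hypothetical helper that models the documented translation.)"""
    hits = [i for i in (path.find("..."), path.find("*")) if i != -1]
    if not hits:
        return path, None
    first = min(hits)                    # position of the first wildcard
    cut = path.rfind("/", 0, first) + 1  # end of the wildcard-free prefix
    base, rest = path[:cut], path[cut:]
    return base, _to_regex(rest) + "$"   # anchored to the end of the path

base, whitelist = implicit_whitelist("/foo/bar*.log")
print(base)       # /foo/
print(whitelist)  # bar[^/]*\.log$
```

The printed values correspond to the [monitor:///foo/] stanza and its whitelist line in the example above.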


For more information on using whitelists with file inputs, see "Whitelist or blacklist specific incoming data".

This documentation applies to the following versions of Splunk: 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5, 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 5.0, 5.0.1, 5.0.2


Comments

It would be good to have more detailed Windows examples. I have c:\dir1\dir2\dir3\dir4\filename.txt and would like to index only the txt files in the dir4 directory:
[monitor://c:\dir1\…]
sourcetype = Test
whitelist = *\dir4\.txt$
This seems to take all files, whereas I only want the dir4 text files.

Any ideas?

Andykiely
October 4, 2012

Regarding vbumgarner's comment.

Both of the first two inputs will monitor either a file or a directory called foo; it simply depends upon what is actually on the disk when Splunk starts up.

However the second stanza clearly shows the intent to monitor a directory, so perhaps it is useful in terms of a self-documenting configuration.

Jrodman
September 7, 2011

More Windows examples:

[monitor://c:\Program Files\foo]
will look for a FILE called foo.

[monitor://c:\Program Files\foo\]
will look for all files in the directory foo.

[monitor://c:\Program Files\foo\*.log]
will look for files matching *.log in the directory foo.

Vbumgarner
July 29, 2011

We need examples for monitoring Windows directories, such as [monitor://c:\windows\system32\logfiles\w3svc1]

Pstraw
June 22, 2011
