
Whitelist or blacklist specific incoming data
Use whitelist and blacklist rules to explicitly tell Splunk which files to consume when monitoring directories. You can also apply these settings to batch inputs. When you define a whitelist, Splunk indexes only the files in that list. When you define a blacklist, Splunk ignores the files in that list and consumes everything else. You define whitelists and blacklists in the particular input's stanza in inputs.conf.
You don't have to define both a whitelist and a blacklist; they are independent settings. If you do define both and a file matches both of them, that file will not be indexed; blacklist will override whitelist.
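The precedence rule can be sketched in a few lines of Python. This is only an illustration of the logic described above, not Splunk's implementation: the function name should_index and the sample paths are hypothetical, and Python's re module stands in for Splunk's own regex engine.

```python
import re

def should_index(path, whitelist=None, blacklist=None):
    """Return True if a file path passes the optional whitelist and blacklist regexes."""
    # A blacklist match always excludes the file, even if it is also whitelisted.
    if blacklist is not None and re.search(blacklist, path):
        return False
    # With a whitelist defined, only matching paths are kept.
    if whitelist is not None:
        return re.search(whitelist, path) is not None
    # No whitelist: everything that is not blacklisted is consumed.
    return True

# Hypothetical sample paths, checked against both settings at once.
paths = ["/mnt/logs/app.log", "/mnt/logs/archive/app.log", "/mnt/logs/notes.txt"]
decisions = {p: should_index(p, whitelist=r"\.log$", blacklist=r"archive") for p in paths}
print(decisions)
```

The second path matches both rules and is excluded, which mirrors the statement that blacklist overrides whitelist.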
Whitelist and blacklist rules use regular expression syntax to define the match on the file name/path. Also, your rules must be contained within a configuration stanza, for example [monitor://<path>]; those outside a stanza (global entries) are ignored.
Instead of whitelisting or blacklisting your data inputs, you can filter specific events and send them to different queues or indexes. Read more about routing and filtering data. You can also use the crawl feature to predefine files you want Splunk to index or not index automatically when they are added to your file system.
Important: Define whitelist and blacklist entries with exact regex syntax; the "..." wildcard used for input paths (described here) is not supported.
Whitelist (allow) files
To define the files you want Splunk to exclusively index, add the following line to your monitor stanza in the /local/inputs.conf file for the app this input was defined in:
whitelist = <your_custom regex>
For example, if you want Splunk to monitor only files with the .log extension:
[monitor:///mnt/logs]
whitelist = \.log$
You can whitelist multiple files in one line, using the "|" (OR) operator. For example, to whitelist files whose names end with query.log OR my.log:
whitelist = query\.log$|my\.log$
Or, to whitelist only files named exactly query.log or my.log:
whitelist = /query\.log$|/my\.log$
Note: The "$" anchors the regex to the end of the file path. Do not add a space before or after the "|" operator.
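To see the difference between the two whitelist forms, the patterns can be tried against a few paths. The sketch below uses Python's re module as a stand-in for Splunk's regex engine, and the file paths are made up for illustration.

```python
import re

# The two whitelist patterns from the examples above.
loose = re.compile(r"query\.log$|my\.log$")
exact = re.compile(r"/query\.log$|/my\.log$")

# Hypothetical file paths under a monitored directory.
candidates = [
    "/mnt/logs/query.log",
    "/mnt/logs/old_query.log",
    "/mnt/logs/my.log",
    "/mnt/logs/query.log.1",
]

# search() looks for a match anywhere in the path; "$" pins it to the end.
loose_hits = [p for p in candidates if loose.search(p)]
exact_hits = [p for p in candidates if exact.search(p)]

print(loose_hits)
print(exact_hits)
```

Without the leading "/", old_query.log is also whitelisted because its name ends with query.log. The leading "/" restricts the match to the complete file name.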
For information on how whitelists interact with wildcards in input paths, see "Wildcards and whitelisting".
Blacklist (ignore) files
To define the files you want Splunk to exclude from indexing, add the following line to your monitor stanza in the /local/inputs.conf file for the app this input was defined in:
blacklist = <your_custom regex>
Important: If you create a blacklist line for each file you want to ignore, Splunk activates only the last filter. Combine the patterns into a single blacklist line with the "|" operator instead.
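Because only one blacklist line takes effect per stanza, several patterns have to be merged into a single regex. A minimal sketch of one way to do that follows. The helper combine_patterns and the sample patterns are hypothetical, and Python's re module is used only to check the result.

```python
import re

def combine_patterns(patterns):
    """Join several regexes into one alternation suitable for a single setting."""
    # Wrap each pattern in a non-capturing group so its anchors stay local to it.
    return "|".join(f"(?:{p})" for p in patterns)

# Hypothetical patterns that might otherwise have been written as separate lines.
separate = [r"\.txt$", r"\.gz$", r"/archive/"]
combined = combine_patterns(separate)

# The single line that would go into the monitor stanza.
print("blacklist = " + combined)
```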
If you want Splunk to ignore only files with the .txt extension:
[monitor:///mnt/logs]
blacklist = \.(txt)$
If you want Splunk to ignore all files with either the .txt extension OR the .gz extension (note that you use the "|" for this):
[monitor:///mnt/logs]
blacklist = \.(txt|gz)$
If you want Splunk to ignore entire directories beneath a monitor input, refer to this example:
[monitor:///mnt/logs]
blacklist = (archive|historical|\.bak$)
The above example tells Splunk to ignore all files under /mnt/logs/ within the archive or historical directories and all files ending in .bak.
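The words archive and historical in this regex are not anchored, so they match anywhere in the path, not only as directory names. The Python sketch below illustrates this with made-up paths. The tighter pattern with surrounding slashes is a hypothetical variant, not something this topic prescribes, and Python's re module stands in for Splunk's regex engine.

```python
import re

# The blacklist from the example above.
blacklist = re.compile(r"(archive|historical|\.bak$)")
# Hypothetical tighter variant that requires whole directory names.
stricter = re.compile(r"(/archive/|/historical/|\.bak$)")

# Hypothetical paths beneath the monitored directory.
paths = [
    "/mnt/logs/archive/web.log",
    "/mnt/logs/historical/2009/web.log",
    "/mnt/logs/web.log.bak",
    "/mnt/logs/archiver.log",
    "/mnt/logs/web.log",
]

ignored = [p for p in paths if blacklist.search(p)]
ignored_strict = [p for p in paths if stricter.search(p)]

print(ignored)
print(ignored_strict)
```

With the original pattern, archiver.log is ignored as well because its name contains the string archive.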
If you want Splunk to ignore files that contain a specific string, you could do something like this:
[monitor:///mnt/logs]
blacklist = 2009022[89]file\.txt$
The above example will ignore the webserver20090228file.txt and webserver20090229file.txt files under /mnt/logs/.
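A note on character classes: inside square brackets, "|" is a literal character rather than an OR operator, so a class such as [8|9] matches 8, 9, or a pipe, while [89] matches only the two digits. The Python sketch below compares the two forms on made-up file names, with Python's re module standing in for Splunk's regex engine.

```python
import re

# Character class that includes a literal pipe.
with_pipe = re.compile(r"2009022[8|9]file\.txt$")
# Character class with only the two digits.
digits_only = re.compile(r"2009022[89]file\.txt$")

# Hypothetical file names; the third contains a literal "|".
names = [
    "/mnt/logs/webserver20090228file.txt",
    "/mnt/logs/webserver20090229file.txt",
    "/mnt/logs/webserver2009022|file.txt",
    "/mnt/logs/webserver20090227file.txt",
]

pipe_hits = [n for n in names if with_pipe.search(n)]
digit_hits = [n for n in names if digits_only.search(n)]

print(pipe_hits)
print(digit_hits)
```

Both patterns ignore the two dated files named in the example. Only the class containing "|" also matches the file name with a literal pipe.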
This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7, 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18
Comments
For a Windows installation, this is terribly confusing. Am I doing this from inside the web-based interface? The "monitor stanza in the /local/inputs.conf file" means absolutely nothing to me. Do Windows expressions work? For example, if I wanted to omit .7z files, could I just put *.7z in the Blacklist options when setting up a new Data Input?
If you are going to whitelist multiple files then you need to group them in parentheses, i.e.: whitelist = (query\.log$|my\.log$). I only tested whitelists, and only on Linux 2.6 64-bit.
Can you apply filters to the contents of a compressed archive? e.g. file.tar.gz:/folder_1/folder_a etc...
Moreymic - While this topic describes how to specify whitelists/blacklists by directly editing the inputs.conf file, you can also specify a whitelist or blacklist in Splunk Manager (the web interface) when adding a file or directory input. Click on the "more settings" option on the "files & directories" data input page; you'll see the whitelist/blacklist fields at the end of the page. You need to specify a regex expression.