How input configuration works
This documentation does not apply to the most recent version of Splunk.
Splunk consumes any data you point it at. Before indexing data, you must add your data source as an input. The source is then listed as one of Splunk's default fields (whether it's a file, directory or network port).
Note: Splunk looks for the inputs it is configured to monitor every 24 hours starting from the time it was last restarted. This means that if you add a stanza to monitor a directory or file that doesn't exist yet, it could take up to 24 hours for Splunk to start indexing its contents.
Data input methods
Specify data inputs via the following methods:
- Splunk Web.
- Splunk's CLI.
- The inputs.conf configuration file.
- Data distribution.
Most data sources can be specified via Splunk Web. For more extensive configuration options, use inputs.conf. Changes made via Splunk Web or the Splunk CLI are written to $SPLUNK_HOME/etc/system/local/inputs.conf. Configure Windows inputs via inputs.conf as well.
Sources
Splunk accepts data inputs from a wide range of sources. Here's a basic overview of your options. Read on through the Data Inputs and Data Distribution sections of this manual for configuration specifics.
Files and directories
Many data inputs come directly from files and directories. For the most part, you can use Splunk's monitor processor to index data in files and directories. If you have a large archive of historical data, you may want to use batch. Data sent via batch is loaded once and the original files are deleted when Splunk is done indexing them. Keep this in mind when using batch input.
You can also configure Splunk's file system change monitor to watch for changes in your file system. However, you cannot currently use both monitor and file system change monitor to follow the same directory or file. If you want to see changes (e.g., file edits or ownership changes) in a directory, use file system change monitor. If you want to index new events (e.g., from log files) in a directory, use monitor.
To configure files and directories, see files and directories.
To configure file system change monitor, see the page on file system change monitor.
Monitor
Specify a path to a file or directory and Splunk's monitor processor consumes any new input. You can also specify a mounted or shared directory, as long as the Splunk server can see the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files. Splunk checks for new files and directories only when the Splunk server starts or restarts, so if you do not want to restart the server, add new sources explicitly as they become available. You can also use crawl to discover new sources.
When using monitor:
- Files can be open or closed for writing. Splunk consumes files even if they're still being written to by the operating system.
- Files or directories can be included or excluded via whitelists and blacklists. For more information, see "Whitelist and blacklist rules" in this manual.
- Upon restart, Splunk continues processing files where it left off.
- Splunk unpacks compressed archive files before it reads them. Splunk can handle the following common archive filetypes: tar, gz, bz2, tar.gz, tgz, tbz, tbz2, zip, and z, and it processes compressed files according to their extension. Keep in mind that unpacking large numbers of compressed files can cause performance issues, so you may want to store old archive files where they are not monitored by Splunk.
- Splunk detects log file rotation and does not process renamed files it has already indexed, with the exception of archive filetypes such as .tar and .gz, which it will not recognize as being the same as the uncompressed originals (you can exclude them with the blacklist functionality mentioned above). For more information see "Log file rotation" in this manual.
- The full path (directory plus filename) of a monitored file must not exceed 993 characters. Files with longer paths are still indexed, but the source key is truncated.
- Set the sourcetype to Automatic when you monitor a directory. If the directory contains multiple files of different formats, do not set a value for the source type manually. Manually setting a source type forces a single source type for all files in that directory.
- Removing an input does not stop Splunk from indexing files right away. The input will be disabled when the Splunk server is restarted. Additionally, some small amount of data already read from these files may be indexed after the restart.
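Two of the rules above, the archive extensions and the 993-character path limit, can be checked before you add an input. The following is a minimal Python sketch, not part of Splunk: the function name check_monitored_path and the case-insensitive extension match are assumptions made for illustration, while the extension list and the limit are taken from the list above.

```python
# Archive extensions listed in the monitor notes above.
ARCHIVE_EXTENSIONS = (
    ".tar.gz", ".tar", ".gz", ".bz2", ".tgz", ".tbz", ".tbz2", ".zip", ".z",
)

# Maximum path length before the source key is truncated.
MAX_SOURCE_PATH = 993


def check_monitored_path(path):
    """Report whether a path looks like an archive and whether its
    source key would be truncated (hypothetical helper)."""
    lower = path.lower()  # assumption: match extensions case-insensitively
    return {
        "is_archive": lower.endswith(ARCHIVE_EXTENSIONS),
        "source_truncated": len(path) > MAX_SOURCE_PATH,
    }
```

A path flagged as an archive is a candidate for a blacklist rule if the uncompressed original is already being indexed.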
Note: Splunk rescans the inputs it is configured to monitor every 24 hours starting from the time it was last restarted. This means that if you add a stanza to monitor a directory or file that doesn't exist yet, it could take up to 24 hours for Splunk to start indexing its contents.
Important: To avoid performance issues, Splunk recommends that you set followTail=1 in inputs.conf if you are deploying Splunk to systems containing significant quantities of historical data. Setting followTail=1 for a monitor input means that any new incoming data is indexed when it arrives, but anything already in files on the system when Splunk was first started will not be indexed.
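As an illustration of the followTail setting, the sketch below uses Python's standard configparser module to render a monitor stanza as text. It assumes the conventional [monitor://<path>] stanza header for inputs.conf; the function name render_monitor_stanza and the path /var/log/app are hypothetical. Review the output before copying it into your own inputs.conf.

```python
import configparser
import io


def render_monitor_stanza(path, follow_tail=True, sourcetype=None):
    """Return inputs.conf text for one monitor input (illustrative helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve attribute case, e.g. followTail
    section = f"monitor://{path}"  # assumed stanza header format
    parser[section] = {}
    if follow_tail:
        # Index only data that arrives after Splunk first starts.
        parser[section]["followTail"] = "1"
    if sourcetype:
        parser[section]["sourcetype"] = sourcetype
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_monitor_stanza("/var/log/app"))
```

Calling render_monitor_stanza("/var/log/app") produces a stanza headed [monitor:///var/log/app] that contains the single attribute line followTail = 1.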
For the curious, some detail on How Splunk Reads Input Files is available on the Community wiki.
Upload files
Upload files directly through Splunk Web. If necessary, Splunk decompresses files before indexing. Uploading files through Splunk Web places them in the spool directory $SPLUNK_HOME/var/spool/splunk.
Use the batch processor at the CLI to load files once and destructively. By default, Splunk's batch processor watches the spool directory $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and then deletes it. Use this method only for large archives of historical data. For most inputs, use monitor.
FIFO queues
Caution: Due to common issues with deadlock and data loss, the use of FIFOs is not recommended. Monitor is a more reliable, stable method. Support for FIFO inputs is deprecated and will be removed in a future release of Splunk.
A FIFO (AKA named pipe) is a queue of data maintained in memory. File systems can write log messages directly to a FIFO. Splunk then accesses the FIFO as though it were a file. FIFO access is very fast, but FIFOs are vulnerable when there are processing disruptions because the in-memory data may be lost.
To configure FIFO queues, see "FIFO" in this manual.
Network ports
You can configure Splunk with an Enterprise license to listen on any network port. This is the best method to send data to your Splunk server from any machine (see data distribution for more information). When configuring network ports, keep in mind that you cannot use privileged ports (i.e., any port lower than 1024) if you have not installed Splunk as root on Linux, Unix, Mac, or FreeBSD. Windows does not implement privileged ports, so Splunk can bind to any port when running under any user context.
To configure network ports, see "Network ports" in this manual.
UDP
UDP is a best-effort protocol, so you may experience data loss under certain conditions, such as high network or system utilization. Use UDP inputs only when the sending device does not support TCP.
Splunk with an Enterprise license can listen for data on any UDP port. When configured to listen on UDP port 514, Splunk eliminates the need to install and configure a syslog server to listen for syslog data sent from remote hosts.
TCP
TCP is a reliable, connection-oriented protocol that should be used instead of UDP to transmit and receive data whenever possible. Splunk with an Enterprise license can receive data on any TCP port, allowing Splunk to receive remote data from syslog-ng and other applications that transmit via TCP. TCP is the foundation of Splunk's data distribution architecture.
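To show what a TCP sender might look like, here is a minimal Python sketch that writes events to a listening TCP port. It assumes that a Splunk TCP input is already configured on the receiving host, that port 9999 is the port you chose (a hypothetical value), and that one event per newline-terminated line is acceptable for your data. The function names are illustrative.

```python
import socket


def frame_events(lines):
    """Encode events as UTF-8, one newline-terminated line per event."""
    return "".join(line.rstrip("\n") + "\n" for line in lines).encode("utf-8")


def send_events(lines, host="127.0.0.1", port=9999, timeout=5.0):
    """Send all events over one TCP connection; return the bytes sent.

    The host and port defaults are placeholders, not Splunk defaults.
    """
    payload = frame_events(lines)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
    return len(payload)
```

For example, send_events(["app started", "app ready"], host="splunk.example.com", port=9999) would deliver two events, where the host name is again a placeholder.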
Scripted inputs
Configure Splunk to run shell commands on a schedule, and then index whatever the command writes to standard output.
For example:
- vmstat, iostat, netstat, and any other network or system status commands.
- SQL DBI.
- HTTP and HTTPS requests.
- SNMP.
See configure scripted inputs for details on setting this up.
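A scripted input only needs to write events to standard output. The following Python sketch reports disk usage as a single timestamped line each time it runs. The key=value layout, the field names, and the choice of the root path are assumptions made for illustration, not a format that Splunk requires.

```python
import shutil
import time


def format_event(timestamp, total, used, free):
    """Build one timestamped key=value event line (field names are illustrative)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(timestamp))
    used_pct = 100.0 * used / total if total else 0.0
    return (f"{stamp} total_bytes={total} used_bytes={used} "
            f"free_bytes={free} used_pct={used_pct:.1f}")


def main(path="/"):
    """Print one event for the given path; Splunk would index this output."""
    usage = shutil.disk_usage(path)
    print(format_event(time.time(), usage.total, usage.used, usage.free))


if __name__ == "__main__":
    main()
```

Because the script prints one line and exits, the schedule you configure for the scripted input determines how often a new event is produced.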
Windows data sources
By default, Splunk for Windows indexes the Windows Application, System, and Security event logs. Splunk for Windows can also monitor and index changes to your registry and accept WMI data input. For more information on configuring Splunk for Windows, see "Windows inputs" in this manual.
Crawl
Discover new inputs automatically. Crawl uses rules you configure to traverse any directory structure. Splunk adds new inputs you find via crawl to inputs.conf.
Data processing
Once Splunk consumes data, it sends the data to the universal processing pipeline. Splunk can automatically learn event boundaries, classify events and sources, and extract timestamps. However, you may want to manually override Splunk's automatic processing. Change processing settings and indexing properties via props.conf.
Some attributes within props.conf can be customized by defining new stanzas in other configuration files. For example, transforms.conf defines regex-based rules for extracting fields, routing events, and performing other transformations. The files segmenters.conf and outputs.conf can also define attribute values referenced by props.conf.
Common use cases for custom indexing properties include:
- Define additional indexed or extracted fields.
- Override the value of host on a per-event basis, such as for syslog coming from multiple servers.
- Customize how Splunk recognizes timestamps.
- Change how Splunk recognizes multi-line event boundaries.
- Mask sensitive data in an event, such as social security numbers.
- Customize how Splunk segments events in its index.
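To make the masking use case concrete, the Python sketch below shows the kind of regex-based rewrite that such a rule performs on an event. It is only an illustration of the idea and not Splunk's implementation or configuration syntax. The pattern, the replacement text, and the function name mask_ssn are assumptions.

```python
import re

# Assumed format: three digits, two digits, four digits, separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssn(event):
    """Replace all but the last four digits of each matching number."""
    return SSN_PATTERN.sub(r"XXX-XX-\1", event)
```

Applied to the event text "user=bob ssn=123-45-6789 action=login", this returns "user=bob ssn=XXX-XX-6789 action=login".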
This documentation applies to the following versions of Splunk: 3.3, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.5, 3.4.6, 3.4.8, 3.4.9, 3.4.10, 3.4.11, 3.4.12, 3.4.13, 3.4.14.