How input configuration works
Specify data inputs via Splunk Web or the Splunk CLI. You can also edit inputs.conf directly (see the topic on configuring inputs via inputs.conf). Changes made via Splunk Web or the Splunk CLI are written to $SPLUNK_HOME/etc/bundles/local/inputs.conf. Windows inputs are also configured via inputs.conf.
Read on for a description of Splunk's data input types, including their purpose and behavior.
Files and directories
Data inputs can come from files and directories. Data in files can be processed in live or batch mode. Use tail input for active log files. Use batch input for closed or archived data.
Tail
Splunk's tail behaves like the UNIX tail command. Specify a path to a file or directory and Splunk's tail processor consumes any new input. If subdirectories exist within the specified directory, Splunk recursively examines them for log files. Splunk automatically adds any new files into the index.
In addition, when tailing:
- Files can be open or closed for writing. Splunk consumes files even if the operating system is still writing to them.
- Files or directories can be included or excluded via whitelists and blacklists. For more information, see "Whitelist and blacklist rules" in this manual.
- Upon restart, Splunk continues processing files where it left off.
- Splunk unpacks compressed archive files before it reads them. Splunk can handle the following common archive filetypes: tar, gz, bz2, tar.gz, tgz, tbz, tbz2, zip, and z, and it processes compressed files according to their extension. Keep in mind that unpacking large numbers of compressed files can cause performance issues, so you may want to store old archive files where Splunk does not monitor them.
- Splunk detects log file rotation and does not process renamed files it has already indexed. The exception is archive filetypes such as .tar and .gz, which Splunk does not recognize as being the same as the uncompressed originals; you can exclude them with the blacklist functionality mentioned above. For more information, see "Log file rotation" in this manual.
- The entire path dir/filename for a tailed file must not exceed 1024 characters.
- Set the sourcetype to Automatic when you monitor a directory. If the directory contains multiple files of different formats, do not set a value for the source type manually. Manually setting a source type forces a single source type for all files in that directory.
- Removing an input does not stop Splunk from indexing files it has already picked up. Instead, it stops Splunk from checking those files again. Splunk continues to index all the initial content. To stop all in-process data, you must restart the Splunk server.
Note: If the specified file or directory does not exist, the Splunk server will not check to see if it is created later. Splunk only checks for files and directories each time the Splunk server starts (or is restarted). So be sure to explicitly add new files as inputs when they become available if you don't want to restart the server.
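The "continues where it left off" behavior described above can be illustrated with a minimal Python sketch. This is not Splunk's implementation; the function name read_new_lines and the idea of tracking a byte offset are assumptions made for illustration only. It returns only complete lines, so a line that is still being written is picked up on a later call.

```python
def read_new_lines(path, offset=0):
    """Return (complete lines appended since offset, new offset).

    Illustrative only: a tail-style reader that remembers a byte offset.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Keep only complete lines; a partial trailing line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

If the caller saves the returned offset (for example, to a small state file) and passes it back after a restart, reading resumes at the first unread line instead of starting over.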
Batch upload and watch
Splunk's batch processing module watches any specified directory on the local Splunk server's file system and then processes the entirety of any new file that appears. You can also upload archived files directly into Splunk Web. If necessary, Splunk unpacks and uncompresses files before indexing. Keep in mind that Splunk needs adequate disk space to uncompress these files. This processing can take more time than processing a live or uncompressed file.
By default, Splunk's batch processor watches the directory $SPLUNK_HOME/var/spool/splunk. You can set up your own watch directory as well.
Note: This method does not watch files it has already seen, so it's not designed for live logfiles -- just rotated archive copies.
In addition, when batch uploading or watching, Splunk can:
- delete files, if you are copying files to the Splunk host and have no need to keep them on the server.
- make a copy of files, if you have mounted an existing log archive filesystem to your Splunk host via NFS, SMB or other network file sharing protocol.
- use a symlink, if your Splunk host is also your primary central log archive so all the archive files are local, or if you are mounting your existing log archive file system to your Splunk host via SAN.
FIFO queues
A FIFO (also known as a named pipe) is a queue of data maintained in memory. Applications can write log messages directly to a FIFO, and Splunk then accesses the FIFO as though it were a file. When choosing the FIFO data input method, consider the following:
- FIFO queues can be a high performance method to get data into Splunk, since the system does not have the I/O burden of writing to both a file on disk and Splunk's index on disk (like the tailing method).
- FIFO access is very fast, but FIFOs are vulnerable when there are processing disruptions because the in-memory data may be lost.
- You do not have to worry about log file rotation and archiving because the data goes straight from the logging application into Splunk via the queue. There is nothing on disk to manage except for Splunk's index.
- Most syslog implementations can write to FIFO queues in addition to or instead of files.
- Other applications can write to FIFO queues instead of files by just changing a logfile name parameter from a filename to a defined FIFO queue.
Note: FIFOs are not recommended for application servers forwarding data to Splunk in a distributed setting. Tail is a more reliable, stable method.
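To show what "writing to a FIFO instead of a file" looks like from the application side, here is a minimal Python sketch for UNIX-like systems. The function names send_via_fifo and demo and the file name app.fifo are hypothetical, and the reader thread only stands in for whatever process consumes the pipe; it is not Splunk's FIFO input.

```python
import os
import tempfile
import threading


def send_via_fifo(fifo_path, messages):
    """Write each message as one line to an existing named pipe."""
    # Opening a FIFO for writing blocks until a reader has it open.
    with open(fifo_path, "w") as fifo:
        for msg in messages:
            fifo.write(msg + "\n")


def demo():
    """Create a FIFO, read it in a background thread, and return what arrived."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "app.fifo")  # hypothetical name
    os.mkfifo(fifo_path)
    received = []

    def reader():
        # Stand-in for the consuming process; reads until the writer closes.
        with open(fifo_path) as fifo:
            for line in fifo:
                received.append(line.rstrip("\n"))

    t = threading.Thread(target=reader)
    t.start()
    send_via_fifo(fifo_path, ["event one", "event two"])
    t.join()
    os.remove(fifo_path)
    os.rmdir(workdir)
    return received
```

Note that nothing is stored on disk: if the reader is not running, the writer blocks, and data held in the pipe is lost if either side fails. This is the vulnerability described in the list above.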
Network ports
UDP and TCP ports can feed data into the Splunk server. UDP and TCP behave differently, and these behaviors affect how data arrives for processing. When configuring network ports, keep in mind that Splunk cannot listen on ports lower than 1024 unless it runs as root.
UDP
UDP is a best effort protocol. This means that you might not get messages if the network is clogged, or has a hiccup. You also can't be absolutely sure the messages aren't spoofed or altered in transit. UDP should be reserved for logging implementations focused on day-to-day troubleshooting rather than compliance or security.
Splunk with an Enterprise license can read directly from the network on any UDP port. Use this configuration to make Splunk act directly as a syslog server by reading remote syslog events on UDP port 514. You can also send any other UDP source of logging data, including SNMP.
Like all network streaming approaches, direct UDP input is higher performance than reading files from disk.
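As a rough illustration of a UDP logging source, the following Python sketch sends a single syslog-style datagram. The function name send_udp_event is hypothetical, and the default priority of 14 is an assumption (syslog facility "user", severity "info"). Because UDP is best effort, the sender gets no confirmation that the message arrived.

```python
import socket


def send_udp_event(host, port, message, priority=14):
    """Send one syslog-style datagram; return the number of bytes sent."""
    # "<14>" is an assumed priority: facility user (1) * 8 + severity info (6).
    payload = "<{}>{}".format(priority, message).encode("utf-8")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        return sock.sendto(payload, (host, port))
    finally:
        sock.close()
```

For example, send_udp_event("splunk.example.com", 514, "myapp: disk usage at 91%") would target a server listening on UDP port 514; the host name here is a placeholder.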
TCP
TCP is a reliable, high-performance choice for many situations, as this protocol includes checks to ensure that data has arrived safely and intact. Splunk with an Enterprise license can receive data on any TCP port, allowing Splunk to receive remote data from syslog-ng and other syslog implementations that use TCP for security or reliability. TCP is the foundation of Splunk's distributed data access.
Note: If the sending process buffers data such that events are broken into multiple pieces, Splunk may interpret the parts as multiple events. This is more likely if events are being generated intermittently, as there may be long pauses (several seconds or longer) between blocks of buffered data. If you notice truncated events, try forcing the process to send events atomically.
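One way to reduce the chance of split events on the sending side is to hand each complete, newline-terminated event to the socket in a single call rather than writing it in pieces. The Python sketch below assumes this approach; the function name send_tcp_events is hypothetical. TCP is a byte stream, so this does not guarantee that an event travels in one segment, but it avoids long pauses in the middle of an event caused by application-level buffering.

```python
import socket


def send_tcp_events(host, port, events):
    """Send each event as one complete newline-terminated write."""
    with socket.create_connection((host, port)) as conn:
        for event in events:
            # Build the whole event first, then send it in a single call.
            conn.sendall((event + "\n").encode("utf-8"))
```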
Scripted inputs
Configure Splunk to run shell commands on a schedule, and then index the output. For example:
- vmstat, iostat, netstat, and any other network or system status commands.
- SQL DBI
- HTTP and HTTPS requests
- SNMP
See Configure scripted inputs for details on how to set this up.
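A scripted input is simply a program that writes events to standard output. As a hedged illustration, the Python script below prints one timestamped line with the system load averages each time it runs. The function name format_event and the key=value layout are assumptions chosen for the example; os.getloadavg is available on UNIX-like systems only.

```python
import os
import time


def format_event(timestamp, load):
    """Format one event line from a UNIX timestamp and a 3-tuple of load averages."""
    one, five, fifteen = load
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(timestamp))
    return "{} load1={:.2f} load5={:.2f} load15={:.2f}".format(
        stamp, one, five, fifteen
    )


if __name__ == "__main__":
    # Each scheduled run emits a single event on standard output.
    print(format_event(time.time(), os.getloadavg()))
```

Starting each line with a timestamp and using key=value pairs is a design choice for readability of the output; the scheduling itself is configured separately, as described in the topic referenced above.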
Indexing properties
Splunk can process any data, regardless of format. It automatically learns event boundaries, classifies events and sources, and finds timestamps. However, sometimes you may want to customize Splunk's default processing. Change processing settings and indexing properties in props.conf.
Some attributes within props.conf can be customized by defining new stanzas in other configuration files. For example, transforms.conf defines regex-based rules for extracting fields, correlating events and performing other transformations. segmenters.conf and outputs.conf can also define attribute values referenced by props.conf.
Common use cases for custom indexing properties include:
- defining additional indexed or extracted fields.
- overriding the value of host on a per-event basis, such as for syslog coming from multiple servers.
- correcting or optimizing how Splunk recognizes timestamps.
- correcting or changing how Splunk recognizes multi-line event boundaries.
- masking sensitive data in an event, such as social security numbers.
- changing how Splunk segments events in its index.
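To make the idea of a regex-based transformation concrete, here is a small Python sketch of masking social security numbers in an event. It only illustrates the concept and is not Splunk configuration syntax; the pattern, the function name mask_ssn, and the choice to keep the last four digits are all assumptions.

```python
import re

# Assumed format: three digits, two digits, four digits, separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssn(event):
    """Replace all but the last four digits of anything that looks like an SSN."""
    return SSN_PATTERN.sub(r"XXX-XX-\1", event)
```

For example, mask_ssn("user=jdoe ssn=123-45-6789 action=login") yields "user=jdoe ssn=XXX-XX-6789 action=login", and events without a matching pattern pass through unchanged.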
This documentation applies to the following versions of Splunk: 3.2, 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5, 3.2.6.