Admin Manual

 



How input configuration works

This documentation does not apply to the most recent version of Splunk.


Splunk consumes any data you point it at. Before Splunk can index data, you must add the data source as an input. The source, whether it is a file, a directory, or a network port, is then listed as one of Splunk's default fields.

Note: Splunk looks for the inputs it is configured to monitor every 24 hours starting from the time it was last restarted. This means that if you add a stanza to monitor a directory or file that doesn't exist yet, it could take up to 24 hours for Splunk to start indexing its contents.

Data input methods

Specify data inputs via Splunk Web, the Splunk CLI, or inputs.conf.

Most data sources can be specified via Splunk Web. For more extensive configuration options, use inputs.conf. Changes made via Splunk Web or the Splunk CLI are written to $SPLUNK_HOME/etc/system/local/inputs.conf. Configure Windows inputs via inputs.conf as well.
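As a rough illustration of where such changes land, the following Python sketch builds the path to the local inputs.conf and renders a single stanza as text. The stanza name monitor:///var/log/messages and the example home directory /opt/splunk are assumptions chosen for illustration, and the helper names are hypothetical. This is not how Splunk itself writes the file.

```python
import configparser
import io
import os


def local_inputs_conf(splunk_home):
    """Return the path of the local inputs.conf under a given $SPLUNK_HOME."""
    return os.path.join(splunk_home, "etc", "system", "local", "inputs.conf")


def render_stanza(stanza, settings):
    """Render one INI-style stanza as text, preserving the case of keys."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys such as followTail case-sensitive
    parser[stanza] = settings
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Assumed example values; adjust to your own installation and data source.
    print(local_inputs_conf("/opt/splunk"))
    print(render_stanza("monitor:///var/log/messages", {"followTail": "1"}))
```

Printing the stanza before adding it by hand is a simple way to review it; restart Splunk or wait for the next rescan before expecting the new input to be picked up.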


Sources

Splunk accepts data inputs from a wide range of sources. Here's a basic overview of your options. Read on through the Data Inputs and Data Distribution sections of this manual for configuration specifics.

Files and directories

Many data inputs come directly from files and directories. For the most part, you can use Splunk's monitor processor to index data in files and directories. If you have a large archive of historical data, you may want to use batch. Data sent via batch is loaded once and the original files are deleted when Splunk is done indexing them. Keep this in mind when using batch input.

You can also configure Splunk's file system change monitor to watch for changes in your file system. However, you cannot currently use both monitor and file system change monitor to follow the same directory or file. If you want to see changes (e.g., file edits or ownership changes) in a directory, use file system change monitor. If you want to index new events (e.g., from log files) in a directory, use monitor.

To configure files and directories, see "Files and directories" in this manual.

To configure file system change monitor, see the page on file system change monitor.

Monitor

Specify a path to a file or directory and Splunk's monitor processor consumes any new input. You can also specify a mounted or shared directory, as long as the Splunk server can see the directory. If the specified directory contains subdirectories, Splunk recursively examines them for new files. Splunk checks for files and directories each time the Splunk server starts or restarts, and then rescans its configured inputs every 24 hours (see the note below), so be sure to add new sources when they become available if you don't want to restart the server. You can also use crawl to discover new sources.

When using monitor:

Note: Splunk rescans the inputs it is configured to monitor every 24 hours starting from the time it was last restarted. This means that if you add a stanza to monitor a directory or file that doesn't exist yet, it could take up to 24 hours for Splunk to start indexing its contents.

Important: To avoid performance issues, Splunk recommends that you set followTail=1 in inputs.conf if you are deploying Splunk to systems containing significant quantities of historical data. Setting followTail=1 for a monitor input means that any new incoming data is indexed when it arrives, but anything already in files on the system when Splunk was first started will not be indexed.
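The effect of followTail can be pictured with a short Python sketch. It is not Splunk's implementation; it only models the behavior described above. With follow_tail enabled, reading starts at the current end of the file, so existing content is skipped and only data appended afterward is returned. The function names are hypothetical.

```python
import os


def initial_offset(path, follow_tail):
    """Start at the end of the file if follow_tail is set, else at the beginning."""
    return os.path.getsize(path) if follow_tail else 0


def read_new_data(path, offset):
    """Return (text found after offset, new offset) for the given file."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    return data.decode("utf-8", errors="replace"), offset + len(data)
```

Calling initial_offset with follow_tail=False models the default case, in which everything already in the file is read as well.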

For the curious, some detail on How Splunk Reads Input Files is available on the Community wiki.

Upload files

Upload files directly through Splunk Web. If necessary, Splunk decompresses files before indexing. Uploading files through Splunk Web places them in the spool directory $SPLUNK_HOME/var/spool/splunk.

Use the batch processor at the CLI to load files once and destructively. By default, Splunk's batch processor watches the spool directory $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory, Splunk indexes it and then deletes it. Use batch only for large archives of historical data. For most inputs, use monitor.
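Because files placed in the spool directory are deleted after indexing, one cautious approach is to hand Splunk a copy and keep the original elsewhere. The Python sketch below assumes you have write access to the spool directory; the function name is hypothetical.

```python
import os
import shutil


def submit_copy_to_spool(source, spool_dir):
    """Copy source into spool_dir and return the new path.

    The original file is left in place, so only the copy is removed
    when the spool directory is processed.
    """
    destination = os.path.join(spool_dir, os.path.basename(source))
    shutil.copy2(source, destination)
    return destination
```

For example, you might pass the archive file as source and the spool directory under your own $SPLUNK_HOME as spool_dir.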

FIFO queues

Caution: Due to common issues with deadlock and data loss, the use of FIFOs is not recommended. Monitor is a more reliable, stable method. Support for FIFO inputs is deprecated and will be removed in a future release of Splunk.

A FIFO (also known as a named pipe) is a queue of data maintained in memory. Processes can write log messages directly to a FIFO. Splunk then accesses the FIFO as though it were a file. FIFO access is very fast, but FIFOs are vulnerable when there are processing disruptions because the in-memory data may be lost.

To configure FIFO queues, see "FIFO" in this manual.

Network ports

You can configure Splunk with an Enterprise license to listen on any network port. This is the best method to send data to your Splunk server from any machine (see data distribution for more information). When configuring network ports, keep in mind that you cannot use privileged ports (i.e., any port lower than 1024) if you have not installed Splunk as root on Linux, Unix, Mac, or FreeBSD. Windows does not implement privileged ports, so Splunk can bind to any port when running under any user context.
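The port restriction can be expressed as a small check. This Python sketch assumes the conventional Unix boundary, in which ports 0 through 1023 are privileged, and treats Windows as having no privileged ports, as stated above. The function names are hypothetical.

```python
import os

# Assumption: ports 0-1023 are privileged on Unix-like systems.
PRIVILEGED_PORT_LIMIT = 1024


def is_privileged(port):
    """Return True if the port falls in the privileged range."""
    return 0 <= port < PRIVILEGED_PORT_LIMIT


def can_bind_without_root(port):
    """Return True if a non-root process is expected to be able to bind the port."""
    if os.name == "nt":  # Windows does not implement privileged ports
        return True
    return not is_privileged(port)
```

Under these assumptions, the syslog port 514 requires root on Unix-like systems, while a port such as 9997 does not.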

To configure network ports, see "Network ports" in this manual.

UDP

UDP is a best-effort protocol, so you may experience data loss under certain conditions, such as high network or system utilization. Use UDP inputs only when the sending device does not support TCP.

Splunk with an Enterprise license can listen for data on any UDP port. When configured to listen on UDP port 514, Splunk eliminates the need to install and configure a syslog server to listen for syslog data sent from remote hosts.
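To see what best effort means in practice, consider this minimal Python sender. It transmits one datagram and returns immediately; nothing tells the sender whether the listener received it. The function name is hypothetical, and the host and port you pass are whatever your own Splunk UDP input listens on.

```python
import socket


def send_udp_event(message, host, port):
    """Send one UDP datagram and return the number of bytes handed to the network.

    There is no acknowledgement: a dropped datagram goes unnoticed by the sender.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(message.encode("utf-8"), (host, port))
```

A syslog-style message usually begins with a priority value in angle brackets, for example "<14>" for facility user and severity informational.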

TCP

TCP is a reliable, connection-oriented protocol that should be used instead of UDP to transmit and receive data whenever possible. Splunk with an Enterprise license can receive data on any TCP port, allowing Splunk to receive remote data from syslog-ng and other applications that transmit via TCP. TCP is the foundation of Splunk's data distribution architecture.
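By contrast with UDP, a TCP sender must first establish a connection, so an unreachable listener is reported as an error rather than silently ignored. The Python sketch below sends newline-terminated events over one connection. The function name is hypothetical, and the one-event-per-line framing is an assumption for illustration.

```python
import socket


def send_tcp_events(events, host, port, timeout=5.0):
    """Send each event as one newline-terminated line over a single TCP connection.

    Raises OSError if the connection cannot be established.
    Returns the number of bytes sent.
    """
    payload = "".join(event + "\n" for event in events).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
    return len(payload)
```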

Scripted inputs

Configure Splunk to run shell commands on a schedule, and then index whatever the command writes to standard output.

For example:

See "Configure scripted inputs" in this manual for details on setting this up.
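As an illustration of the kind of command a scripted input might run, the following Python script writes one line describing disk usage to standard output each time it is invoked. The script and its key=value output format are hypothetical; any command that writes to standard output would serve the same purpose.

```python
import shutil
import time


def format_event(timestamp, path, usage):
    """Build one timestamped key=value line from a disk usage result."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    return (
        f"{stamp} path={path} total_bytes={usage.total} "
        f"used_bytes={usage.used} free_bytes={usage.free}"
    )


if __name__ == "__main__":
    # One line per run; each scheduled run would contribute one event.
    print(format_event(time.time(), "/", shutil.disk_usage("/")))
```

Starting each line with a timestamp gives automatic timestamp extraction something unambiguous to work with.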


Windows data sources

By default, Splunk for Windows indexes the Windows Application, System, and Security event logs. Splunk for Windows can also monitor and index changes to your registry and accept WMI data input. For more information on configuring Splunk for Windows, see "Windows inputs" in this manual.


Crawl

Discover new inputs automatically. Crawl uses rules you configure to traverse any directory structure. Splunk adds new inputs you find via crawl to inputs.conf.
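The idea of rule-driven traversal can be sketched in a few lines of Python. This is only a conceptual model, not Splunk's crawl implementation: the "rules" here are simple file name patterns, and the function name is hypothetical.

```python
import fnmatch
import os


def discover_inputs(root, include_patterns):
    """Walk root recursively and return sorted paths whose names match any pattern."""
    found = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            if any(fnmatch.fnmatch(name, pattern) for pattern in include_patterns):
                found.append(os.path.join(directory, name))
    return sorted(found)
```

For example, passing a log directory as root and ["*.log"] as include_patterns returns every matching file at any depth beneath it.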


Data processing

Once Splunk consumes data, it sends the data to the universal processing pipeline. Splunk can automatically learn event boundaries, classify events and sources, and extract timestamps. However, you may want to manually override Splunk's automatic processing. Change processing settings and indexing properties via props.conf.

Some attributes within props.conf can be customized by defining new stanzas in other configuration files. For example, transforms.conf defines regex-based rules for extracting fields, routing events, and performing other transformations. The files segmenters.conf and outputs.conf can also define attribute values referenced by props.conf.
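To make the notion of a regex-based extraction rule concrete, here is a small Python sketch that pulls named fields out of an event. It uses Python's re module, not transforms.conf syntax, and the pattern, field names, and sample event are invented for illustration.

```python
import re

# Hypothetical rule: extract the user and src fields from a login event.
LOGIN_PATTERN = re.compile(r"user=(?P<user>\S+)\s+src=(?P<src>\S+)")


def extract_fields(pattern, event):
    """Return a dict of named groups matched in the event, or an empty dict."""
    match = pattern.search(event)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    print(extract_fields(LOGIN_PATTERN, "Jan 1 login ok user=alice src=10.0.0.5"))
```

An event that does not match the pattern yields no fields, which mirrors how an extraction rule leaves non-matching events untouched.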

Common use cases for custom indexing properties include:

This documentation applies to the following versions of Splunk: 3.3, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.5, 3.4.6, 3.4.8, 3.4.9, 3.4.10, 3.4.11, 3.4.12, 3.4.13, 3.4.14.

