props.conf

This documentation does not apply to the most recent version of Splunk.

The following are the spec and example files for props.conf.

props.conf.spec

# Copyright (C) 2005-2010 Splunk Inc.  All Rights Reserved.  Version 4.0 
#
# This file contains possible attribute/value pairs for configuring Splunk's processing properties
# via props.conf.
#
# There is a props.conf in $SPLUNK_HOME/etc/system/default/.  To set custom configurations, 
# place a props.conf in $SPLUNK_HOME/etc/system/local/. For help, see
# props.conf.example. 
# You can enable configuration changes made to props.conf by typing the following search string
# in Splunk Web:
#
# | extract reload=T 
#
# To learn more about configuration files (including precedence) please see the documentation 
# located at http://www.splunk.com/base/Documentation/latest/Admin/Aboutconfigurationfiles

[<spec>]
* This stanza enables properties for a given <spec>. 
* A props.conf file can contain multiple stanzas for any number of different <spec>.
* Follow this stanza name with any number of the following attribute/value pairs.
* If you do not set an attribute for a given <spec>, the default is used.

<spec> can be:
1. <sourcetype>, the source type of an event.
2. host::<host>, where <host> is the host for an event.
3. source::<source>, where <source> is the source for an event.
4. rule::<rulename>, where <rulename> is a unique name of a source type classification rule.
5. delayedrule::<rulename>, where <rulename> is a unique name of a delayed source type classification rule.  
These are only considered as a last resort before generating a new source type based on the source seen.
   
Precedence:

For settings that are specified in multiple categories of matching stanzas,
[host::<hostpattern>] spec settings override [<sourcetype>] spec settings.
Additionally, [source::<sourcepattern>] spec settings override both
[host::<hostpattern>] and [<sourcetype>] settings.

Patterns:

When setting a <spec>, use the following regex-type syntax:

... = recurses through directories until the match is met.

* = matches anything but / 0 or more times.

| = or 

( ) = used to limit scope of |.

Example: [source::....(?<!tar.)(gz|tgz)] 

Match language:

These match expressions must match the entire key value, not just a substring.

For those familiar with regular expressions, these are a full implementation
(PCRE) with the translation of ..., * and .
Thus . matches a period, * matches non-directory separators, and ... matches any number of any characters.
For more information see the wildcards section at: 

http://www.splunk.com/base/Documentation/latest/Admin/FilesAndDirectories#Inputs.conf
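
As a rough illustration of the pattern syntax described above, the following stanzas combine *, ... and |. The paths and source type names are hypothetical and chosen only for this sketch:

```
# Hypothetical paths and source type names, for illustration only.

# "*" stays within one path segment, so this matches
# /var/log/app/web/access.log but not /var/log/app/web/old/access.log.
[source::/var/log/app/*/access.log]
sourcetype = app_access

# "..." spans any number of characters, including "/", so this matches
# error.log or errors.log in any subdirectory below /var/log.
[source::/var/log/.../(error|errors).log]
sourcetype = app_error
```

Because a pattern must match the entire source value, neither stanza applies to a path that merely contains the pattern as a substring.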

Pattern collisions:

Suppose the source of a given input matches multiple source patterns. If the
stanzas for these patterns each supply distinct settings, all of these settings
are applied.

However, suppose two stanzas supply the same setting. In this case, we choose
the value to apply based on the ASCII order of the patterns in question.
For example, suppose we have a source:

    source::az

and the following colliding patterns:

    [source::...a...]
    sourcetype = a

    [source::...z...]
    sourcetype = z

In this case, the settings provided by the pattern "source::...a..." take
precedence over those provided by "source::...z...", and sourcetype will have
the value "a".

To override this default ASCII ordering, use the priority key:

    [source::...a...]
    sourcetype = a
    priority = 5

    [source::...z...]
    sourcetype = z
    priority = 10

Assigning a higher priority to the second stanza causes sourcetype to have the
value "z".

If not specified, the default value for the priority key is 0.

The priority key may also be used to resolve collisions between sourcetype
patterns and between host patterns. Note, however, that the priority key will
not affect precedence across spec types. For example, source patterns take
priority over host and sourcetype patterns, regardless of priority key values.
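
The same mechanism can be sketched for host patterns. The host names below are hypothetical; both patterns are assumed to match a host named "web-nyc-01":

```
# Hypothetical host patterns that both match "web-nyc-01".
[host::web-*]
TZ = US/Pacific
priority = 5

[host::*-nyc-*]
TZ = US/Eastern
# Higher priority wins the collision, so TZ resolves to US/Eastern.
priority = 10
```

A TZ value set in a matching [source::...] stanza would still override both, because the priority key does not cross spec types.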

#******************************************************************************
# The possible attributes/value pairs for props.conf, and their
# default values, are:
#******************************************************************************

# International characters

CHARSET = <string>
* When set, Splunk assumes the input from the given <spec> is in the specified encoding.  
* A list of valid encodings can be retrieved using the command "iconv -l" on most *nix systems.  
* If an invalid encoding is specified, a warning is logged during initial configuration and further input from that <spec> is discarded.  
* If the source encoding is valid, but some characters from the <spec> are not valid in the specified encoding, then the characters are escaped as hex (e.g. "\xF3").
* When set to "AUTO", Splunk attempts to automatically determine the character encoding and convert text from that encoding to UTF-8.  
* For a complete list of the character sets Splunk automatically detects, see the online documentation.
* Defaults to ASCII.
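
A minimal sketch of both ways to set CHARSET follows. The host and path names are hypothetical:

```
# Hypothetical host and path names.

# Input known to be encoded as ISO-8859-1:
[host::legacy-*]
CHARSET = ISO-8859-1

# Let Splunk attempt to detect the encoding and convert it to UTF-8:
[source::.../mixed_encodings/*.log]
CHARSET = AUTO
```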

#******************************************************************************
# Line breaking
#******************************************************************************

# Use the following attributes to define the length of a line.

TRUNCATE = <non-negative integer>
* Change the default maximum line length.  
* Set to 0 if you do not want truncation ever (very long lines are, however, often a sign of garbage data).
* Defaults to 10000.

LINE_BREAKER = <regular expression>
* Specifies a regex that determines how the raw text stream is broken into initial events, 
  before line merging takes place. (See SHOULD_LINEMERGE)
* Defaults to ([\r\n]+), meaning data is broken into an event for each line, delimited by \r or \n. 
* The regex must contain a matching group. 
* Wherever the regex matches, the start of the first matching group is considered the end of the 
  previous event, and the end of the first matching group is considered the start of the next event.
* The contents of the first matching group are ignored as event text.
* NOTE: There is a significant speed boost by using the LINE_BREAKER to delimit multiline events,
  rather than using line merging to reassemble individual lines into events.
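
As a sketch of the approach recommended in the note above, the following stanza uses LINE_BREAKER alone to delimit multi-line events. The source type name is hypothetical, and the log is assumed to start every event with a date such as 2010-03-15:

```
# Hypothetical source type; every event is assumed to begin with a date.
[my_multiline_app]
# Break only where one or more newlines are followed by a date. The
# newlines form the first matching group and are discarded; the date is
# outside the group, so it begins the next event.
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
# Events are already complete after line breaking, so skip line merging.
SHOULD_LINEMERGE = false
```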

LINE_BREAKER_LOOKBEHIND = <integer>
* Change the default lookbehind for the regex based linebreaker. 
* When there is leftover data from a previous raw chunk, this is how far before the end of the raw chunk (with the next chunk concatenated) Splunk begins applying the regex.
* Defaults to 100.

# Use the following attribute to define multi-line events with
# additional attributes and values.

SHOULD_LINEMERGE = true | false
* When set to true, Splunk combines several lines of data into a single event, based on the following configuration attributes.
* Defaults to true.
  	
# When SHOULD_LINEMERGE = True, use the following attributes to define the multi-line events.

BREAK_ONLY_BEFORE_DATE = true | false
* When set to true, Splunk creates a new event if and only if it encounters a new line with a date.
* Defaults to false.

BREAK_ONLY_BEFORE = <regular expression>
* When set, Splunk creates a new event if and only if it encounters a new line that matches the regular expression.
* Defaults to empty.

MUST_BREAK_AFTER = <regular expression>
* When set, and the regular expression matches the current line, Splunk creates a new event for the next input line.
* Splunk may still break before the current line if another rule matches.
* Defaults to empty.

MUST_NOT_BREAK_AFTER = <regular expression>
* When set and the current line matches the regular expression, Splunk does not break on any subsequent lines until the MUST_BREAK_AFTER expression matches.
* Defaults to empty.

MUST_NOT_BREAK_BEFORE = <regular expression>
* When set and the current line matches the regular expression, Splunk does not break the last event before the current line.
* Defaults to empty.

MAX_EVENTS = <integer>
* Specifies the maximum number of input lines to add to any event. 
* Splunk breaks after the specified number of lines are read.
* Defaults to 256.
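
The line merging attributes above might be combined as in the following sketch. The source type name is hypothetical, and the log is assumed to start each event with a bracketed level such as [ERROR]:

```
# Hypothetical source type for stack traces whose first line starts
# with a bracketed level such as [ERROR] or [INFO].
[my_stacktrace_app]
SHOULD_LINEMERGE = true
# Start a new event only at lines that begin with a bracketed level.
BREAK_ONLY_BEFORE = ^\[(ERROR|WARN|INFO|DEBUG)\]
# Allow long traces; the default cap is 256 lines per event.
MAX_EVENTS = 1000
```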
  	

#******************************************************************************
# Timestamp extraction configuration
#******************************************************************************

DATETIME_CONFIG = <filename relative to $SPLUNK_HOME>
* Specifies which file configures the timestamp extractor.
* This configuration may also be set to "NONE" to prevent the timestamp extractor from running or "CURRENT" to assign the current system time to each event.
* Defaults to /etc/datetime.xml (that is, $SPLUNK_HOME/etc/datetime.xml).
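
As a small sketch of the special values described above, the following stanza assigns the current system time to each event. The source path is hypothetical and stands for files that carry no timestamp of their own:

```
# Hypothetical path; these files are assumed to contain no timestamps.
[source::.../sensor_readings/*.txt]
DATETIME_CONFIG = CURRENT
```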

MAX_TIMESTAMP_LOOKAHEAD = <integer>
* Specifies how far (in characters) into an event Splunk should look for a timestamp.
* Defaults to 150.

TIME_PREFIX = <regular expression>
* Specifies the necessary condition for timestamp extraction. 
* The timestamping algorithm only looks for a timestamp after the first regex match.
* Defaults to empty.

TIME_FORMAT = <strptime-style format>
* Specifies a strptime format string to extract the date. 
* For more information on strptime see `man strptime` or "Configure timestamp recognition" in the Splunk Admin Manual.
* This method of date extraction does not support in-event timezones. 
* TIME_FORMAT starts reading after the TIME_PREFIX. 
* For good results, the <strptime-style format> should describe the day of the year and the time of day.
* Defaults to empty.
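
TIME_PREFIX and TIME_FORMAT are typically set together. The following is a sketch only: the source type name is hypothetical, and events are assumed to look like "id=42 ts=15/Mar/2010:14:22:07 action=login":

```
# Hypothetical source type. Assumed event shape:
#   id=42 ts=15/Mar/2010:14:22:07 action=login
[my_app_audit]
# Look for the timestamp only after the literal text "ts=".
TIME_PREFIX = ts=
# Day/abbreviated month/year, then hours, minutes and seconds.
TIME_FORMAT = %d/%b/%Y:%H:%M:%S
```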

TZ = <timezone identifier>
* The algorithm for determining the time zone for a particular event is as follows:
* If the event has a timezone in its raw text (e.g., UTC, -08:00), use that.
* If TZ is set to a valid timezone string, use that.
* Otherwise, use the timezone of the system that is running splunkd.
* Defaults to empty.

MAX_DAYS_AGO = <integer>
* Specifies the maximum number of days past, from the current date, for an extracted date to be valid.  
* If set to 10, for example, Splunk ignores dates that are older than 10 days ago.
* Defaults to 2000.
* IMPORTANT: If your data is older than 2000 days, change this setting.

MAX_DAYS_HENCE = <integer>
* Specifies the maximum number of days in the future, from the current date, for an extracted date to be valid.  
* If set to 3, for example, dates that are more than 3 days in the future are ignored.
* False positives are less likely with a tighter window.
* The default value includes dates from one day in the future.  
* If your servers have the wrong date set or are in a timezone that is one day ahead, increase this value to at least 3.
* Defaults to 2.

MAX_DIFF_SECS_AGO = <integer>
* If the event's timestamp is more than <integer> seconds BEFORE the previous timestamp, only accept it if it has the same exact time format as the majority of timestamps from the source.
* IMPORTANT: If your timestamps are wildly out of order, consider increasing this value.
* Defaults to 3600 (one hour).

MAX_DIFF_SECS_HENCE = <integer>
* If the event's timestamp is more than <integer> seconds AFTER the previous timestamp, only accept it if it has the same exact time format as the majority of timestamps from the source.
* IMPORTANT: If your timestamps are wildly out of order, or you have logs that are written less than once a week, consider increasing this value.
* Defaults to 604800 (one week).
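
The validity windows above might be widened for a one-time load of old data, as in the following sketch. The path is hypothetical and the values are illustrative rather than recommended:

```
# Hypothetical path holding archived logs that are several years old.
[source::/archive/...]
# Accept timestamps up to roughly ten years old (default is 2000 days).
MAX_DAYS_AGO = 3650
# Tolerate timestamps up to one day earlier than the previous event
# (default is 3600 seconds).
MAX_DIFF_SECS_AGO = 86400
```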

#******************************************************************************
# Transform configuration
#******************************************************************************

# Use the TRANSFORMS class to create indexed fields.  Use the REPORT class to create extracted fields.
# Please note that extracted fields are recommended as best practice.
# Note: Indexed fields have performance implications and are only recommended in specific circumstances.
# You may want to use indexed fields if you search for expressions like foo!="bar" or NOT foo="bar" and the field foo nearly always takes on the value bar. 
# Another common reason to use indexed fields is if the value of the field exists outside of the field more often than not. 
# For example, if you commonly search for foo="1", but 1 occurs in many events that do not have foo="1", you may want to index foo. 
# For more information, see documentation at: http://www.splunk.com/doc/latest/admin/ExtractFields
# For examples, see props.conf.example and transforms.conf.example.

Precedence rules for classes:

* For each class, Splunk takes the configuration from the highest precedence configuration block (see precedence rules at the beginning of this file).
* If a particular class is specified for a source and a sourcetype, the class for source wins out. 
* Similarly, if a particular class is specified in ../local/ for a <spec>, it overrides that class in ../default/.


TRANSFORMS-<value> = <unique_stanza_name>
* <unique_stanza_name> is the name of your stanza from transforms.conf.
* <value> is any value you want to give to your stanza to identify its name-space.
* To apply several transforms, list their stanza names separated by commas. Transforms are applied in the order listed.
* If you need to change the order, control it by rearranging the list.
  
REPORT-<value> = <unique_stanza_name>
* <unique_stanza_name> is the name of your stanza from transforms.conf.
* <value> is any value you want to give to your stanza to identify its name-space.
* To apply several transforms, list their stanza names separated by commas. Transforms are applied in the order listed.
* If you need to change the order, control it by rearranging the list.
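
A sketch of ordered transform lists follows. Every stanza name is hypothetical and would need a matching stanza in transforms.conf:

```
# All names below are hypothetical; each transform would need a
# matching stanza in transforms.conf.
[my_firewall]
# Index-time transforms, applied left to right.
TRANSFORMS-indexed = fw_extract_zone, fw_extract_rule_id
# Search-time extractions, applied left to right.
REPORT-fields = fw_header_fields, fw_payload_fields
```

Swapping the two names in either list would reverse the order in which those transforms run.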


EXTRACT-<class> = <regex> (in <src_field>)?
* Perform regex-based field extraction from the value of source field.
* The regex is required to have named capturing groups. 
* When the regex matches, the named capturing groups and their values are added to the event.
* Note: this extraction is performed at search time *only*.
* Where:
 * regex: a perl-compatible regex containing named capturing groups
 * src_field: name of the field to match the regex against (defaults to _raw)
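
Both forms of EXTRACT, with and without a source field, are sketched below. The source type and field names are hypothetical, and the second extraction assumes a "recipient" field is already available at search time:

```
# Hypothetical source type and field names.
[my_mail_log]
# Extract "status" from _raw, the default source field.
EXTRACT-status = status=(?<status>\d{3})
# Extract "domain" from the value of an existing "recipient" field.
EXTRACT-domain = @(?<domain>[\w.-]+)$ in recipient
```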


KV_MODE = none | auto | multi
* Specifies the key/value extraction mode for the data. 
* Set KV_MODE to one of the following:
 * none: if you want no key/value extraction to take place.
 * auto: extracts key/value pairs separated by equal signs.
 * multi: invokes multikv to expand a tabular event into multiple events.
* Defaults to auto.
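
For instance, automatic key/value extraction could be switched off for a source type whose events contain no meaningful key=value pairs. The source type name here is hypothetical:

```
# Hypothetical source type with free-form text and no key=value pairs.
[my_free_text]
KV_MODE = none
```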
  	
  	
CHECK_FOR_HEADER = true | false
* Set to true to enable header-based field extraction for a file.
* If the file has a list of columns and each event contains a field value (without field name), Splunk picks a suitable header line to use for extracting field names.
* Defaults to false.
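
A minimal sketch of enabling header-based extraction follows. The path is hypothetical and stands for files whose first lines name their columns:

```
# Hypothetical path to delimited files that start with a header line.
[source::.../exports/*.csv]
CHECK_FOR_HEADER = true
```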
     
SEDCMD-<class> = <sed script>
* Specifies a sed script to apply to the _raw field at index time *only*.
 * A sed script is a space-separated list of sed commands.
 * Currently the following subset of sed commands is supported: replace (s) and character substitution (y).
* Syntax:
 * replace    - s/regex/replacement/flags  
  * where regex is a perl regex (optionally containing capturing groups)
  * replacement is a string to replace the regex match, use \N for backreferences
  * flags can be either: g to replace all matches or a number to replace a specified match      
 * substitute - y/string1/string2/
  * substitutes the string1[i] with string2[i]
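
The two supported commands might be used as in the following sketch. The source type and class names are hypothetical:

```
# Hypothetical source type and class names.
[my_payment_log]
# Mask the first five digits of anything shaped like 123-45-6789 and
# keep the last four through the \1 backreference; g replaces all matches.
SEDCMD-mask_ssn = s/\d{3}-\d{2}-(\d{4})/XXX-XX-\1/g
# Character substitution: replace each a, b, c with A, B, C respectively.
SEDCMD-upper_abc = y/abc/ABC/
```

Because SEDCMD runs at index time only, the original text is not stored and cannot be recovered at search time.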
        
LOOKUP-<class> = $TRANSFORM (<match_field> (AS <match_field_in_event>)?)+ (OUTPUT|OUTPUTNEW (<output_field> (AS <output_field_in_event>)? )+ )?
* Specifies a specific lookup table and how to apply that lookup table to events.
* <match_field> specifies a field in the lookup table to match on.
* By default, Splunk looks for a field with the same name in the event to match against (if <match_field_in_event> is not provided).
* Multiple match fields may be provided; at least one is required.
* <output_field> specifies a field in the lookup entry to copy into each matching event, where it is placed in the field <output_field_in_event>.
* If <output_field_in_event> is not specified, <output_field> is used.
* A list of output fields is not required.
* If not provided, all fields in the lookup table except for the match fields (and the timestamp field if specified) are output for each matching event.
* If the output field list starts with the keyword "OUTPUTNEW" instead of "OUTPUT", then the lookup is only applied if *none* of the output fields already exist in the event. Otherwise, the output fields are always overridden. Any event that has all of the match_fields but no matching entry in the lookup table will end up clearing all of the output fields.

FIELDALIAS-<class> = (<orig_field> AS <new_field>)+
* A list of fields to alias as new fields.  
* Both fields will exist, i.e. the original field will not be removed.
* Field aliasing is performed after kv extraction but before lookups.  
* Therefore, it is possible to specify a lookup based on a field alias.  
* In addition, a field that is extracted at search time can be aliased.  		  
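
Because aliasing runs before lookups, the two can be chained. The following is a sketch only. All names are hypothetical, and "http_status_lookup" is assumed to be a lookup stanza in transforms.conf whose table has the columns "status" and "status_description":

```
# All names are hypothetical. "http_status_lookup" is assumed to be
# defined in transforms.conf with columns "status" and
# "status_description".
[my_web_log]
# The events carry the code in a field named "sc_status".
FIELDALIAS-status = sc_status AS status
# The alias exists by the time lookups run, so the lookup can match on
# "status". OUTPUTNEW leaves an existing status_description untouched.
LOOKUP-status_desc = http_status_lookup status OUTPUTNEW status_description
```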

#******************************************************************************
# Binary file configuration
#******************************************************************************

NO_BINARY_CHECK = true | false
* Can only be set for a [source::...] stanza.
* When set to true, Splunk processes binary files.
* By default, binary files are ignored.
* Defaults to false.

#******************************************************************************
# Segmentation configuration
#******************************************************************************

SEGMENTATION = <string>
* Specifies the segmenter from segmenters.conf to use at index time.
* Set segmentation for any of the <spec> outlined at the top of this file.

SEGMENTATION-<segment selection> = <string>
* Specifies that Splunk Web should use a specific segmenter (from segmenters.conf) for the given <segment selection> choice. 
* Default <segment selection> choices are: all, inner, outer, none.
    
    
#******************************************************************************
# File checksum configuration
#******************************************************************************

CHECK_METHOD = endpoint_md5 | entire_md5 | modtime
* Set to "endpoint_md5" to have Splunk checksum the first and last 256 bytes of a file. When matches are found, Splunk lists the file as already indexed and indexes only new data, or ignores it if there is no new data.
* Set this to "entire_md5" to use the checksum of the entire file.
* Alternatively, set this to "modtime" to check only the modification time of the file.
* Settings other than endpoint_md5 will cause Splunk to index the entire file for each detected change.
* Defaults to endpoint_md5.


#******************************************************************************
# Small file settings
#******************************************************************************

PREFIX_SOURCETYPE = true | false
* NOTE: this attribute is only relevant to the "[too_small]" sourcetype.
* Determines the sourcetype given to files smaller than 100 lines, and therefore not classifiable. 
* False sets the sourcetype to "too_small".
* True sets the sourcetype to "<sourcename>-too_small", where "<sourcename>" is a cleaned up version of the filename.
* The advantage of a True value is that not all small files are classified as the same sourcetype, and wildcard searching is often effective.
* For example, a Splunk search of "sourcetype=access*" will retrieve "access" files as well as "access-too_small" files.
* Defaults to true.
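
To classify every small file under the single "too_small" sourcetype instead, the default could be overridden as in this minimal sketch:

```
# Give all files smaller than 100 lines the plain "too_small" sourcetype.
[too_small]
PREFIX_SOURCETYPE = false
```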

    
#******************************************************************************
# Sourcetype configuration
#******************************************************************************

sourcetype = <string>
* Can only be set for a [source::<source>] stanza.
* Anything from that <source> is assigned the specified sourcetype.
* Defaults to empty.
    
# The following attribute/value pairs can only be set for a stanza
# that begins with [<sourcetype>]:

rename = <string>
* Renames <sourcetype> as <string>
* With renaming, you can search for the sourcetype with sourcetype=<string>
* To search for the original sourcetype without renaming, use the field _sourcetype
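
A short sketch of renaming follows. Both source type names are hypothetical:

```
# Hypothetical names: present events of "access_combined_v1" under the
# sourcetype "access_combined".
[access_combined_v1]
rename = access_combined
```

With this stanza, a search for sourcetype=access_combined finds these events, and a search on _sourcetype=access_combined_v1 finds them by their original sourcetype.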

invalid_cause = <string>
* Can only be set for a [<sourcetype>] stanza.
* Splunk does not index any data with invalid_cause set.
* Set <string> to "archive" to send the file to the archive processor (specified in unarchive_cmd).
* Set to any other string to throw an error in the splunkd.log if running Splunklogger in debug mode.
* Defaults to empty.
  	
is_valid = true | false
* Automatically set by invalid_cause.
* DO NOT SET THIS.
* Defaults to true.

unarchive_cmd = <string>
* Only called if invalid_cause is set to "archive".
* <string> specifies the shell command to run to extract an archived source.
* Must be a shell command that takes input on stdin and produces output on stdout.
* DOES NOT WORK ON BATCH PROCESSED FILES. Use preprocessing_script.
* Defaults to empty.

LEARN_MODEL = true | false
* For known sourcetypes, the fileclassifier adds a model file to the learned directory.
* To disable this behavior for diverse sourcetypes (such as sourcecode, where there is no good exemplar to make a sourcetype) set LEARN_MODEL = false.
* Defaults to empty.

maxDist = <integer>
* Determines how different a sourcetype model may be from the current file.  
* The larger the value, the more forgiving.
* For example, if the value is very small (e.g., 10), then files of the specified sourcetype should not vary much.
* A larger value indicates that files of the given sourcetype vary quite a bit.
* Defaults to 300.


# rule:: and delayedrule:: configuration

MORE_THAN<optional_unique_value>_<number> = <regular expression> (empty)
LESS_THAN<optional_unique_value>_<number> = <regular expression> (empty)

An example:

[rule::bar_some]
sourcetype = source_with_lots_of_bars
# if more than 80% of lines have "----", but fewer than 70% have "####"
# declare this a "source_with_lots_of_bars"
MORE_THAN_80 = ----
LESS_THAN_70 = ####

A rule can have many MORE_THAN and LESS_THAN patterns, and all are required for the rule to match.

#******************************************************************************
# Internal settings
#******************************************************************************

# NOT YOURS.  DO NOT SET.

_actions = <string>
* Internal field used for user-interface control of objects.
* Defaults to "new,edit,delete".

pulldown_type = <bool>
* Internal field used for user-interface control of sourcetypes.
* Defaults to empty.

props.conf.example

# Copyright (C) 2005-2010 Splunk Inc.  All Rights Reserved.  Version 4.0 
#
# The following are example props.conf configurations. Configure properties for your data.
#
# To use one or more of these configurations, copy the configuration block into
# props.conf in $SPLUNK_HOME/etc/system/local/. You must restart Splunk to enable configurations.
#
# To learn more about configuration files (including precedence) please see the documentation 
# located at http://www.splunk.com/base/Documentation/latest/Admin/Aboutconfigurationfiles


########
# Line merging settings
########

# The following example linemerges source data into multi-line events for apache_error sourcetype.

[apache_error]
SHOULD_LINEMERGE = True



########
# Settings for tuning
########

# The following example limits the number of characters indexed per event from host::small_events.

[host::small_events]
TRUNCATE = 256

# The following example turns off DATETIME_CONFIG (which can speed up indexing) from any path
# that ends in /mylogs/*.log.

[source::.../mylogs/*.log]
DATETIME_CONFIG = NONE


  
########
# Timestamp extraction configuration
########

# The following example sets Eastern Time Zone if host matches nyc*.

[host::nyc*]
TZ = US/Eastern


# The following example uses a custom datetime.xml that has been created and placed in a custom app
# directory. This sets all events coming in from hosts starting with dharma to use this custom file.

[host::dharma*]
DATETIME_CONFIG = /etc/apps/custom_time/datetime.xml



########
# Transform configuration
########

# The following example creates an indexed field for host::foo if tied to a stanza in transforms.conf.

[host::foo]
TRANSFORMS-foo=foobar

# The following example creates an extracted field for sourcetype access_combined
# if tied to a stanza in transforms.conf.

[access_combined]
REPORT-baz = foobaz


# The following stanza extracts an ip address from _raw
[my_sourcetype]
EXTRACT-extract_ip = (?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})

# The following example shows how to configure lookup tables
[my_lookuptype]
LOOKUP-foo = mylookuptable userid AS myuserid OUTPUT username AS myusername

# The following shows how to specify field aliases
FIELDALIAS-foo = user AS myuser id AS myid


########
# Sourcetype configuration
########

# The following example sets a sourcetype for the file web_access.log.

[source::.../web_access.log]
sourcetype = splunk_web_access 


# The following example decompresses gzipped syslog files.

[syslog]
invalid_cause = archive
unarchive_cmd = gzip -cd -
	

# The following example learns a custom sourcetype and limits the range between different examples
# with a smaller than default maxDist.

[custom_sourcetype]
LEARN_MODEL = true
maxDist = 30


# rule:: and delayedrule:: configuration
# The following examples create sourcetype rules for custom sourcetypes with regex.


[rule::bar_some]
sourcetype = source_with_lots_of_bars
MORE_THAN_80 = ----


[delayedrule::baz_some]
sourcetype = my_sourcetype
LESS_THAN_70 = ####


########	
# File configuration
########

# Binary file configuration
# The following example indexes binary files from any source under a sourcecode directory.

[source::.../sourcecode/...]
NO_BINARY_CHECK = true 
    

# File checksum configuration
# The following example checks the entirety of every file in the web_access dir rather than 
# skipping files that appear to be the same.

[source::.../web_access/*]
CHECK_METHOD = entire_md5

This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10, 4.0.11.

