Admin Manual

 


transforms.conf

NOTE - Splunk version 4.x reached its End of Life on October 1, 2013. Please see the migration information.

This documentation does not apply to the most recent version of Splunk.


The following are the spec and example files for transforms.conf.

transforms.conf.spec

# Copyright (C) 2005-2011 Splunk Inc. All Rights Reserved.  Version 4.3
#
# This file contains attributes and values that you can use to configure data transformations
# and event signing in transforms.conf.
#
# Transforms.conf is commonly used for:
# * Configuring regex-based host and source type overrides. 
# * Anonymizing certain types of sensitive incoming data, such as credit card or social 
#   security numbers. 
# * Routing specific events to a particular index, when you have multiple indexes. 
# * Creating new index-time field extractions. NOTE: We do not recommend adding to the set of 
#   fields that are extracted at index time unless it is absolutely necessary because there
#   are negative performance implications.
# * Creating advanced search-time field extractions that involve one or more of the following:
#		* Reuse of the same field-extracting regular expression across multiple sources, 
#		  source types, or hosts.
#		* Application of more than one regex to the same source, source type, or host.
#       * Using a regex to extract one or more values from the values of another field.
#		* Delimiter-based field extractions (they involve field-value pairs that are 
# 		  separated by commas, colons, semicolons, bars, or something similar).
#		* Extraction of multiple values for the same field (multivalued field extraction).
#		* Extraction of fields with names that begin with numbers or underscores.
#		* NOTE: Less complex search-time field extractions can be set up entirely in props.conf.
# * Setting up lookup tables that look up fields from external sources.
#
# All of the above actions require corresponding settings in props.conf.
#
# You can find more information on these topics by searching the Splunk documentation 
# (http://docs.splunk.com/Documentation)
#
# There is a transforms.conf file in $SPLUNK_HOME/etc/system/default/. To set custom 
# configurations, place a transforms.conf in $SPLUNK_HOME/etc/system/local/. For examples, see the 
# transforms.conf.example file.
#
# You can apply configuration changes made to transforms.conf by typing the following search 
# string in Splunk Web:
#
# | extract reload=t 
#
# To learn more about configuration files (including precedence) please see the documentation 
# located at http://docs.splunk.com/Documentation/Splunk/latest/Admin/Aboutconfigurationfiles


[<unique_transform_stanza_name>]
* Name your stanza. Use this name when you configure field extractions, lookup tables, and event 
  routing in props.conf. For example, if you are setting up an advanced search-time field 
  extraction, in props.conf you would add REPORT-<value> = <unique_transform_stanza_name> under 
  the [<spec>] stanza that corresponds with a stanza you've created in transforms.conf.
* Follow this stanza name with any number of the following attribute/value pairs, as appropriate
  for what you intend to do with the transform.  
* If you do not specify an entry for each attribute, Splunk uses the default value.
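* For example, here is a hypothetical search-time pairing (the stanza, source type, and 
  field names below are illustrative, not shipped defaults):
  	In transforms.conf:
  		[extract_user]
  		REGEX = user=(?<user>\w+)
  	In props.conf:
  		[my_sourcetype]
  		REPORT-extract_user = extract_user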

REGEX = <regular expression>
* Enter a regular expression to operate on your data. 
* NOTE: This attribute is valid for both index-time and search-time field extraction.
	* REGEX is required for all search-time transforms unless you are setting up a 
	  delimiter-based field extraction, in which case you use DELIMS (see the DELIMS attribute 
	  description, below).
	* REGEX is required for all index-time transforms.
* REGEX and the FORMAT attribute:
	* Name-capturing groups in the REGEX are extracted directly to fields. This means that you
	  do not need to specify the FORMAT attribute for simple field extraction cases (see the 
	  description of FORMAT, below).
	* If the REGEX extracts both the field name and its corresponding field value, you can use 
	  the following special capturing groups if you want to skip specifying the mapping in 
	  FORMAT: 
	  _KEY_<string>, _VAL_<string>. 
	* For example, the following are equivalent:
		* Using FORMAT:
			* REGEX  = ([a-z]+)=([a-z]+)
			* FORMAT = $1::$2
		* Without using FORMAT
			* REGEX  = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
	* When using either of the above formats, in a search-time extraction, the
	  regex will continue to match against the source text, extracting as many
	  fields as can be identified in the source text.
* Defaults to an empty string.

FORMAT = <string>
* NOTE: This option is valid for both index-time and search-time field extraction. However, FORMAT 
  behaves differently depending on whether the extraction is performed at index time or 
  search time.
* This attribute specifies the format of the event, including any field names or values you want 
  to add.
* FORMAT for index-time extractions:
	* Use $n (for example $1, $2, etc) to specify the output of each REGEX match. 
	* If REGEX does not have n groups, the matching fails. 
	* The special identifier $0 represents what was in the DEST_KEY before the REGEX was performed.
	* At index time only, you can use FORMAT to create concatenated fields:
		* FORMAT = ipaddress::$1.$2.$3.$4
	* When you create concatenated fields with FORMAT, "$" is the only special character. It is 
	  treated as a prefix for regex-capturing groups only if it is followed by a number and only 
	  if the number applies to an existing capturing group. So if REGEX has only one capturing 
	  group and its value is "bar", then:
		* "FORMAT = foo$1" yields "foobar"
		* "FORMAT = foo$bar" yields "foo$bar"
		* "FORMAT = foo$1234" yields "foo$1234"
		* "FORMAT = foo$1\$2" yields "foobar\$2"
	* At index-time, FORMAT defaults to <stanza-name>::$1
* FORMAT for search-time extractions:
	* The format of this field as used during search time extractions is as follows:
		* FORMAT = <field-name>::<field-value>( <field-name>::<field-value>)* 
			* where:
			* field-name  = [<string>|$<extracting-group-number>]
			* field-value = [<string>|$<extracting-group-number>]
	* Search-time extraction examples:
		* 1. FORMAT = first::$1 second::$2 third::other-value
		* 2. FORMAT = $1::$2
	* If the key-name of a FORMAT setting is varying, for example $1 in the
	  example 2 just above, then the regex will continue to match against
	  the source key to extract as many matches as are present in the text.
	* NOTE: You cannot create concatenated fields with FORMAT at search time. That 
	  functionality is only available at index time.
	* At search-time, FORMAT defaults to an empty string.

LOOKAHEAD = <integer>
* NOTE: This option is only valid for index-time field extractions.
* Optional. Specifies how many characters to search into an event.
* Defaults to 4096. You may want to increase this value if you have event line lengths that 
  exceed 4096 characters (before linebreaking).

WRITE_META = [true|false]
* NOTE: This attribute is only valid for index-time field extractions.
* Automatically writes the output of the REGEX to metadata.
* Required for all index-time field extractions except for those where DEST_KEY = _meta (see 
  the description of the DEST_KEY attribute, below).
* Use instead of DEST_KEY = _meta.
* Defaults to false.

DEST_KEY = <KEY>
* NOTE: This attribute is only valid for index-time field extractions.
* Specifies where Splunk stores the REGEX results.
* Required for index-time field extractions where WRITE_META = false or is not set.
* For index-time field extractions, DEST_KEY = _meta, which is where Splunk stores indexed 
  fields. For other potential DEST_KEY values see the KEYS section at the bottom of this file.
	* When you use DEST_KEY = _meta you should also add $0 to the start of your FORMAT attribute. 
	  $0 represents the DEST_KEY value before Splunk performs the REGEX (in other words, _meta).
	  	* The $0 value is in no way derived *from* the REGEX. (It does not represent a captured 
	  	  group.)
     * KEYs are case-sensitive, and should be used exactly as they appear in the KEYs list at
       the bottom of this file. (For example, you would say DEST_KEY = MetaData:Host, *not* 
       DEST_KEY = metadata:host .)	  	   	  
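* For example, this hypothetical index-time transform (stanza and field names are 
  illustrative) adds an "error_code" indexed field while preserving the existing contents 
  of _meta via $0:
  	[errcode_meta]
  	REGEX = err=(\d+)
  	FORMAT = $0 error_code::$1
  	DEST_KEY = _meta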

DEFAULT_VALUE = <string>
* NOTE: This attribute is only valid for index-time field extractions.
* Optional. Splunk writes the DEFAULT_VALUE to DEST_KEY if the REGEX fails.
* Defaults to empty.

SOURCE_KEY = <string>
* NOTE: This attribute is valid for both index-time and search-time field extractions.
* Optional. Defines the KEY that Splunk applies the REGEX to. 
* For search time extractions, you can use this attribute to extract one or more values from 
  the values of another field. You can use any field that is available at the time of the 
  execution of this field extraction.
* For index-time extractions use the KEYs described at the bottom of this file. 
     * KEYs are case-sensitive, and should be used exactly as they appear in the KEYs list at
       the bottom of this file. (For example, you would say SOURCE_KEY = MetaData:Host, *not* 
       SOURCE_KEY = metadata:host .)
* SOURCE_KEY is typically used in conjunction with REPEAT_MATCH in index-time field 
  transforms.
* Defaults to _raw, which means it is applied to the raw, unprocessed text of all events.
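* For example, this hypothetical search-time transform (stanza and field names are 
  illustrative) applies its REGEX to the values of an already-extracted "uri" field 
  instead of to _raw:
  	[user_from_uri]
  	SOURCE_KEY = uri
  	REGEX = user=(\w+)
  	FORMAT = user::$1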

REPEAT_MATCH = [true|false]
* NOTE: This attribute is only valid for index-time field extractions.
* Optional. When set to true Splunk runs the REGEX multiple times on the SOURCE_KEY. 
* REPEAT_MATCH starts wherever the last match stopped, and continues until no more matches are 
  found. Useful for situations where an unknown number of REGEX matches are expected per
  event.
* Defaults to false.
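* For example, this hypothetical index-time transform (stanza and field names are 
  illustrative) extracts every IP address in each event rather than just the first match:
  	[all_ips]
  	REGEX = (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
  	FORMAT = ip_addr::$1
  	REPEAT_MATCH = true
  	WRITE_META = true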

DELIMS = <quoted string list>
* NOTE: This attribute is only valid for search-time field extractions.
* IMPORTANT: If a value may contain an embedded unescaped double quote character, 
  such as "foo"bar", use REGEX, not DELIMS. An escaped double quote (\") is ok.
* Optional. Used in place of REGEX when dealing with delimiter-based field extractions, 
  where field values (or field/value pairs) are separated by delimiters such as colons, 
  spaces, line breaks, and so on.
* Sets delimiter characters, first to separate data into field/value pairs, and then to 
  separate field from value.
* Each individual character in the delimiter string is used as a delimiter to split the event.
* Delimiters must be quoted with " " (use \ to escape).
* When the event contains full delimiter-separated field/value pairs, you enter two sets of 
  quoted characters for DELIMS: 
	* The first set of quoted delimiters extracts the field/value pairs.
	* The second set of quoted delimiters separates the field name from its corresponding
	  value.
* When the event only contains delimiter-separated values (no field names) you use just one set
  of quoted delimiters to separate the field values. Then you use the FIELDS attribute to
  apply field names to the extracted values (see FIELDS, below).
  	* Alternately, Splunk reads even tokens as field names and odd tokens as field values.
* Splunk consumes consecutive delimiter characters unless you specify a list of field names.
* The following example of DELIMS usage applies to an event where field/value pairs are 
  separated by '|' symbols and the field names are separated from their corresponding values 
  by '=' symbols:
  	[pipe_eq]
  	DELIMS = "|", "="
* Defaults to "".  	
  
FIELDS = <quoted string list>
* NOTE: This attribute is only valid for search-time field extractions.
* Used in conjunction with DELIMS when you are performing delimiter-based field extraction 
  and only have field values to extract. 
* FIELDS enables you to provide field names for the extracted field values, in list format 
  according to the order in which the values are extracted.
* NOTE: If field names contain spaces or commas they must be quoted with " " (to escape, 
  use \).
* The following example is a delimiter-based field extraction where three field values appear
  in an event. They are separated by a comma and then a space.
  	[commalist]
  	DELIMS = ", "
  	FIELDS = field1, field2, field3
* Defaults to "".

MV_ADD = [true|false]
* NOTE: This attribute is only valid for search-time field extractions.
* Optional. Controls what the extractor does when it finds a field that already exists.
* If set to true, the extractor makes the field a multivalued field and appends the 
  newly found value; otherwise the newly found value is discarded.
* Defaults to false.

CLEAN_KEYS = [true|false]
* NOTE: This attribute is only valid for search-time field extractions.
* Optional. Controls whether Splunk "cleans" the keys (field names) extracted at search time. 
  "Key cleaning" is the practice of replacing any non-alphanumeric characters (characters other
  than those falling between the a-z, A-Z, or 0-9 ranges) in field names with underscores, as 
  well as the stripping of leading underscores and 0-9 characters from field names.
* Add CLEAN_KEYS = false to your transform if you need to extract field names that include 
  non-alphanumeric characters, or which begin with underscores or 0-9 characters.
* Defaults to true.
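* For example, assuming events contain pairs such as "_session=abc123" (an illustrative 
  field name), this hypothetical transform keeps the leading underscore in the extracted 
  field name:
  	[keep_underscore]
  	REGEX = (_\w+)=(\w+)
  	FORMAT = $1::$2
  	CLEAN_KEYS = false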

KEEP_EMPTY_VALS = [true|false]
* NOTE: This attribute is only valid for search-time field extractions.
* Optional. Controls whether Splunk keeps field/value pairs when the value is an empty string.
* This option does not apply to field/value pairs that are generated by Splunk's autokv 
  extraction. Autokv ignores field/value pairs with empty values.
* Defaults to false.

CAN_OPTIMIZE = [true|false]
* NOTE: This attribute is only valid for search-time field extractions.
* Optional. Controls whether Splunk can optimize this extraction out (another way of saying
  the extraction is disabled). 
* You might use this when you have field discovery turned off--it ensures that certain fields 
  are *always* discovered.
* Splunk only disables an extraction if it can determine that none of the fields identified by 
  the extraction will ever be needed for the successful evaluation of a search. 
* NOTE: This option should rarely be set to false.
* Defaults to true.


#*******
# Lookup tables
#*******
# NOTE: Lookup tables are used ONLY during search time

filename = <string>
* Name of static lookup file.  
* File should be in $SPLUNK_HOME/etc/<app_name>/lookups/ for some <app_name>, or in 
  $SPLUNK_HOME/etc/system/lookups/
* If file is in multiple 'lookups' directories, no layering is done.  
* Standard conf file precedence is used to disambiguate.
* Defaults to empty string.

max_matches = <integer>
* The maximum number of possible matches for each input lookup value.
* If the lookup is non-temporal (not time-bounded, meaning the time_field attribute is 
  not specified), Splunk uses the first <integer> entries, in file order.   
* If the lookup is temporal, Splunk uses the first <integer> entries in descending time order.
* Default = 100 if the lookup is not temporal, default = 1 if it is temporal.

min_matches = <integer>
* Minimum number of possible matches for each input lookup value.
* Default = 0 for both temporal and non-temporal lookups, which means that Splunk outputs 
  nothing if it cannot find any matches.
	* However, if min_matches > 0 and Splunk finds fewer than min_matches matches, then Splunk 
	  provides the default_match value (see below).

default_match = <string>
* If min_matches > 0 and Splunk has less than min_matches for any given input, it provides 
  this default_match value one or more times until the min_matches threshold is reached.
* Defaults to empty string.  

case_sensitive_match = <bool>
* If set to false, Splunk performs case-insensitive matching for all fields in the lookup 
  table.
* Defaults to true (case-sensitive matching).

match_type = <string>
* A comma- and space-delimited list of <match_type>(<field_name>) specifications that allow 
  non-exact matching.
* The available match_type values are WILDCARD, CIDR, and EXACT. EXACT is the default and 
  does not need to be specified. Only fields that should use WILDCARD or CIDR matching should 
  be specified in this list.
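* For example, this hypothetical lookup stanza (the file and field names are illustrative) 
  matches the "url" field using wildcards and the "ip" field against CIDR ranges:
  	[acl_lookup]
  	filename = acl.csv
  	match_type = WILDCARD(url), CIDR(ip)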

external_cmd = <string>
* Provides the command and arguments to invoke to perform a lookup. Use this for external 
  (or "scripted") lookups, where you interface with an external script rather than a 
  lookup table.
* This string is parsed like a shell command.
* The first argument is expected to be a python script located in 
  $SPLUNK_HOME/etc/<app_name>/bin (or ../etc/searchscripts).
* Presence of this field indicates that the lookup is external and command based.
* Defaults to empty string.

fields_list = <string>
* A comma- and space-delimited list of all fields that are supported by the external command.

external_type = python
* Type of external command.  
* Currently, only python is supported.
* Defaults to python.

time_field = <string>
* Used for temporal (time bounded) lookups. Specifies the name of the field in the lookup 
  table that represents the timestamp.
* Defaults to an empty string, meaning that lookups are not temporal by default.

time_format = <string>
* For temporal lookups this specifies the 'strptime' format of the timestamp field.
* You can include subseconds but Splunk will ignore them.
* Defaults to pure UTC time. 

max_offset_secs = <integer>
* For temporal lookups, this is the maximum time (in seconds) that the event timestamp can be 
  later than the lookup entry time for a match to occur.
* Default is 2000000000 (no maximum, effectively).

min_offset_secs = <integer>
* For temporal lookups, this is the minimum time (in seconds) that the event timestamp can be 
  later than the lookup entry timestamp for a match to occur.
* Defaults to 0.

batch_index_query = <bool>
* For large file-based lookups, this determines whether queries can be grouped to improve 
  search performance.
* No default is specified here; the global default, set in limits.conf, is true.

#*******
# KEYS:
#*******
* NOTE: Keys are case-sensitive. Use the following keys exactly as they appear.

queue : Specify which queue to send the event to (can be parsingQueue, nullQueue, indexQueue).
_raw  : The raw text of the event.
_done : If set to any string, this represents the last event in a stream.
_meta : A space-separated list of metadata for an event.
_time : The timestamp of the event, in seconds since 1/1/1970 UTC.
MetaData:FinalType  : The event type of the event.

MetaData:Host       : The host associated with the event.
                      The value must be prefixed by "host::"

_MetaData:Index     : The index where the event should be stored.

MetaData:Source     : The source associated with the event.
                      The value must be prefixed by "source::"

MetaData:Sourcetype : The sourcetype of the event.
                      The value must be prefixed by "sourcetype::"

_TCP_ROUTING        : Comma separated list of tcpout group names (from outputs.conf)
                      Defaults to groups present in 'defaultGroup' for [tcpout].

* NOTE: Any KEY (field name) prefixed by '_' is not indexed by Splunk, in general.

transforms.conf.example

# Copyright (C) 2005-2011 Splunk Inc. All Rights Reserved.  Version 4.3 
#
# This is an example transforms.conf.  Use this file to create regexes and rules for transforms.
# Use this file in tandem with props.conf.
#
# To use one or more of these configurations, copy the configuration block into transforms.conf 
# in $SPLUNK_HOME/etc/system/local/. You must restart Splunk to enable configurations.
#
# To learn more about configuration files (including precedence) please see the documentation 
# located at http://docs.splunk.com/Documentation/Splunk/latest/Admin/Aboutconfigurationfiles

# Note: These are examples.  Replace the values with your own customizations.


# Indexed field:

[netscreen-error]
REGEX = device_id=[^ ]+\s+\[w+\](.*)
FORMAT = err_code::$1
WRITE_META = true

# Extracted field:

[netscreen-error]
REGEX = device_id=[^ ]+\s+\[w+\](.*)
FORMAT = err_code::$1

# Override host:

[hostoverride]
DEST_KEY = MetaData:Host
REGEX = \s(\w*)$
FORMAT = host::$1



# Static lookup table

[mylookuptable]
filename = mytable.csv

# one-to-one lookup
# guarantees that we output a single lookup value for each input value; if no match exists, 
# we use the value of "default_match" ("nothing" in this example)
[mylook]
filename = mytable.csv
max_matches = 1
min_matches = 1
default_match = nothing

# external command lookup table

[myexternaltable]
external_cmd = testadapter.py blah
fields_list = foo bar

# Temporal based static lookup table

[staticwtime]
filename = mytable.csv
time_field = timestamp
time_format = %d/%m/%y %H:%M:%S


# Mask sensitive data:

[session-anonymizer]
REGEX = (?m)^(.*)SessionId=\w+(\w{4}[&"].*)$
FORMAT = $1SessionId=########$2
DEST_KEY = _raw


# Route to an alternate index:

[AppRedirect]
REGEX = Application
DEST_KEY = _MetaData:Index
FORMAT = Verbose


# Extract comma-delimited values into fields:

[extract_csv]
DELIMS = ","
FIELDS = "field1", "field2", "field3"

# This example assigns the extracted values from _raw to field1, field2 and field3 (in order of 
# extraction). If more than three values are extracted the values without a matching field name 
# are ignored.

[pipe_eq]
DELIMS = "|", "="

# The above example extracts key-value pairs which are separated by '|'
# while the key is delimited from value by '='.


[multiple_delims]
DELIMS = "|;", "=:"

# The above example extracts key-value pairs which are separated by '|' or ';'.
# while the key is delimited from value by '=' or ':'. 

This documentation applies to the following versions of Splunk: 4.3