transforms.conf

The following are the spec and example files for transforms.conf.

transforms.conf.spec

   Version 7.2.10

 This file contains settings and values that you can use to configure
 data transformations.  

 Transforms.conf is commonly used for:
 * Configuring host and source type overrides that are based on regular 
   expressions.
 * Anonymizing certain types of sensitive incoming data, such as credit
   card or social security numbers.
 * Routing specific events to a particular index, when you have multiple
   indexes.
 * Creating new index-time field extractions. NOTE: We do not recommend
   adding to the set of fields that are extracted at index time unless it
   is absolutely necessary because there are negative performance
   implications.
 * Creating advanced search-time field extractions that involve one or more
   of the following:
   * Reuse of the same field-extracting regular expression across multiple
     sources, source types, or hosts.
   * Application of more than one regular expression to the same source, 
     source type, or host.
   * Using a regular expression to extract one or more values from the values 
     of another field.
   * Delimiter-based field extractions, such as extractions where the
     field-value pairs are separated by commas, colons, semicolons, bars, or
     something similar.
   * Extraction of multiple values for the same field.
   * Extraction of fields with names that begin with numbers or
     underscores.
   * NOTE: Less complex search-time field extractions can be set up
           entirely in props.conf.
 * Setting up lookup tables that look up fields from external sources.

 All of the above actions require corresponding settings in props.conf.

 You can find more information on these topics by searching the Splunk
 documentation (http://docs.splunk.com/Documentation).

 There is a transforms.conf file in $SPLUNK_HOME/etc/system/default/. To
 set custom configurations, place a transforms.conf file in
 $SPLUNK_HOME/etc/system/local/. 

 For examples of transforms.conf configurations, see the 
 transforms.conf.example file.

 You can enable configuration changes made to transforms.conf by running this 
 search in Splunk Web:

 | extract reload=t

 To learn more about configuration files (including precedence) please see
 the documentation located at
 http://docs.splunk.com/Documentation/Splunk/latest/Admin/Aboutconfigurationfiles

GLOBAL SETTINGS


 Use the [default] stanza to define any global settings.
  * You can also define global settings outside of any stanza, at the top
    of the file.
  * Each conf file should have at most one default stanza. If there are
    multiple default stanzas, settings are combined. In the case of
    multiple definitions of the same setting, the last definition in the
    file wins.
  * If a setting is defined at both the global level and in a specific
    stanza, the value in the specific stanza takes precedence.


[<unique_transform_stanza_name>]
* Name your stanza. Use this name when you configure field extractions,
  lookup tables, and event routing in props.conf. For example, if you are
  setting up an advanced search-time field extraction, in props.conf you
  would add REPORT-<class> = <unique_transform_stanza_name> under the
  [<spec>] stanza that corresponds with a stanza you've created in
  transforms.conf.
* Follow this stanza name with any number of the following setting/value
  pairs, as appropriate for what you intend to do with the transform.
* If you do not specify an entry for each setting, Splunk software uses 
  the default value.

REGEX = <regular expression>
* Enter a regular expression to operate on your data.
* NOTE: This setting is valid for index-time and search-time field extraction.
* REGEX is required for all search-time transforms unless you are setting up
  an ASCII-only delimiter-based field extraction, in which case you can use
  DELIMS (see the DELIMS setting description, below).
* REGEX is required for all index-time transforms.
* REGEX and the FORMAT setting:
  * FORMAT must be used in conjunction with REGEX for index-time transforms.
    Use of FORMAT in conjunction with REGEX is optional for search-time
    transforms.
  * Name-capturing groups in the REGEX are extracted directly to fields.
    This means that you do not need to specify the FORMAT setting for
    simple search-time field extraction cases (see the description of FORMAT,
    below).
  * If the REGEX extracts both the field name and its corresponding field
    value, you can use the following special capturing groups if you want to
    skip specifying the mapping in FORMAT for search-time field extractions:
      _KEY_<string>, _VAL_<string>.
  * For example, the following are equivalent for search-time field extractions:
    * Using FORMAT:
      * REGEX  = ([a-z]+)=([a-z]+)
      * FORMAT = $1::$2
    * Without using FORMAT
      * REGEX  = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
    * When using either of the above formats, in a search-time extraction,
      the regular expression attempts to match against the source text, 
	  extracting as many fields as can be identified in the source text.
* Default: empty string

FORMAT = <string>
* NOTE: This option is valid for both index-time and search-time field
  extraction. Index-time field extraction configurations require the FORMAT
  setting. The FORMAT setting is optional for search-time field extraction
  configurations.
* This setting specifies the format of the event, including any field names or
  values you want to add.
* FORMAT is required for index-time extractions:
  * Use $n (for example $1, $2, etc) to specify the output of each REGEX
    match.
  * If REGEX does not have n groups, the matching fails.
  * The special identifier $0 represents what was in the DEST_KEY before the
    REGEX was performed.
  * At index time only, you can use FORMAT to create concatenated fields:
    * Example: FORMAT = ipaddress::$1.$2.$3.$4
  * When you create concatenated fields with FORMAT, "$" is the only special
    character. It is treated as a prefix for regular expression capturing 
	groups only if it is followed by a number and only if the number applies to 
	an existing capturing group. So if REGEX has only one capturing group and 
	its value is "bar", then:
      * "FORMAT = foo$1" yields "foobar"
      * "FORMAT = foo$bar" yields "foo$bar"
      * "FORMAT = foo$1234" yields "foo$1234"
      * "FORMAT = foo$1\$2" yields "foobar\$2"
  * At index-time, FORMAT defaults to <stanza-name>::$1
* FORMAT for search-time extractions:
  * The format of this field as used during search time extractions is as
    follows:
    * FORMAT = <field-name>::<field-value>( <field-name>::<field-value>)*
      where:
      * field-name  = [<string>|$<extracting-group-number>]
      * field-value = [<string>|$<extracting-group-number>]
  * Search-time extraction examples:
      * 1. FORMAT = first::$1 second::$2 third::other-value
      * 2. FORMAT = $1::$2
  * If you configure FORMAT with a variable <field-name>, such as in the second
    example above, the regular expression is repeatedly applied to the source 
	key to match and extract all field/value pairs in the event.
  * When you use FORMAT to set both the field and the value (such as FORMAT =
    third::other-value), and the value is not an indexed token, you must set the
    field to INDEXED_VALUE = false in fields.conf. Not doing so can cause 
    inconsistent search results.
  * NOTE: You cannot create concatenated fields with FORMAT at search time.
    That functionality is only available at index time.
  * At search-time, FORMAT defaults to an empty string.

MATCH_LIMIT = <integer>
* Only set in transforms.conf for REPORT and TRANSFORMS field extractions.
  For EXTRACT type field extractions, set this in props.conf.
* Optional. Limits the amount of resources that are spent by PCRE
  when running patterns that do not match.
* Use this to set an upper bound on how many times PCRE calls an internal
  function, match(). If set too low, PCRE may fail to correctly match a pattern.
* Default: 100000

DEPTH_LIMIT = <integer>
* Only set in transforms.conf for REPORT and TRANSFORMS field extractions.
   For EXTRACT type field extractions, set this in props.conf.
* Optional. Limits the amount of resources that are spent by PCRE
  when running patterns that do not match.
* Use this to limit the depth of nested backtracking in an internal PCRE
  function, match(). If set too low, PCRE might fail to correctly match a 
  pattern.
* Default: 1000

CLONE_SOURCETYPE = <string>
* This name is wrong; a transform with this setting actually clones and
  modifies events, and assigns the new events the specified source type.
* If CLONE_SOURCETYPE is used as part of a transform, the transform creates a 
  modified duplicate event for all events that the transform is applied to via 
  normal props.conf rules.
* Use this setting when you need to store both the original and a modified
  form of the data in your system, or when you need to to send the original and 
  a modified form to different outbound systems.
  * A typical example would be to retain sensitive information according to
    one policy and a version with the sensitive information removed
    according to another policy. For example, some events may have data
    that you must retain for 30 days (such as personally identifying
    information) and only 30 days with restricted access, but you need that
    event retained without the sensitive data for a longer time with wider
    access.
* Specifically, for each event handled by this transform, a near-exact copy
  is made of the original event, and the transformation is applied to the
  copy. The original event continues along normal data processing unchanged.
* The <string> used for CLONE_SOURCETYPE selects the source type that is used 
  for the duplicated events.
* The new source type MUST differ from the the original source type. If the
  original source type is the same as the target of the CLONE_SOURCETYPE,
  Splunk software makes a best effort to log warnings to splunkd.log, but this 
  setting is silently ignored at runtime for such cases, causing the transform 
  to be applied to the original event without cloning.
* The duplicated events receive index-time transformations & sed
  commands for all transforms that match its new host, source, or source type.
  * This means that props.conf matching on host or source will incorrectly be
    applied a second time.
* Can only be used as part of of an otherwise-valid index-time transform.  For
  example REGEX is required, there must be a valid target (DEST_KEY or
  WRITE_META), etc as above.

LOOKAHEAD = <integer>
* NOTE: This option is valid for all index time transforms, such as
  index-time field creation, or DEST_KEY modifications.
* Optional. Specifies how many characters to search into an event.
* Default: 4096
  * You may want to increase this value if you have event line lengths that
    exceed 4096 characters (before linebreaking).

WRITE_META = [true|false]
* NOTE: This setting is only valid for index-time field extractions.
* Automatically writes REGEX to metadata.
* Required for all index-time field extractions except for those where
  DEST_KEY = _meta (see the description of the DEST_KEY setting, below)
* Use instead of DEST_KEY = _meta.
* Default: false

DEST_KEY = <KEY>
* NOTE: This setting is only valid for index-time field extractions.
* Specifies where Splunk software stores the expanded FORMAT results in    
  accordance with the REGEX match.
* Required for index-time field extractions where WRITE_META = false or is
  not set.
* For index-time extractions, DEST_KEY can be set to a number of values
  mentioned in the KEYS section at the bottom of this file.
  * If DEST_KEY = _meta (not recommended) you should also add $0 to the
    start of your FORMAT setting.  $0 represents the DEST_KEY value before
    Splunk software performs the REGEX (in other words, _meta).
    * The $0 value is in no way derived *from* the REGEX match. (It
      does not represent a captured group.)
* KEY names are case-sensitive, and should be used exactly as they appear in
  the KEYs list at the bottom of this file. (For example, you would say
  DEST_KEY = MetaData:Host, *not* DEST_KEY = metadata:host .)

DEFAULT_VALUE = <string>
* NOTE: This setting is only valid for index-time field extractions.
* Optional. The Splunk software writes the DEFAULT_VALUE to DEST_KEY if the 
  REGEX fails.
* Default: empty string

SOURCE_KEY = <string>
* NOTE: This setting is valid for both index-time and search-time field
  extractions.
* Optional. Defines the KEY that Splunk software applies the REGEX to.
* For search time extractions, you can use this setting to extract one or
  more values from the values of another field. You can use any field that
  is available at the time of the execution of this field extraction
* For index-time extractions use the KEYs described at the bottom of this
  file.
  * KEYs are case-sensitive, and should be used exactly as they appear in
    the KEYs list at the bottom of this file. (For example, you would say
    SOURCE_KEY = MetaData:Host, *not* SOURCE_KEY = metadata:host .)
* If <string> starts with "field:" or "fields:" the meaning is changed.
  Instead of looking up a KEY, it instead looks up an already indexed field.
  For example, if a CSV field name "price" was indexed then
  "SOURCE_KEY = field:price" causes the REGEX to match against the contents
  of that field.  It's also possible to list multiple fields here with
  "SOURCE_KEY = fields:name1,name2,name3" which causes MATCH to be run
  against a string comprising of all three values, separated by space
  characters.
* SOURCE_KEY is typically used in conjunction with REPEAT_MATCH in
  index-time field transforms.
* Default: _raw
  * This means it is applied to the raw, unprocessed text of all events.

REPEAT_MATCH = [true|false]
* NOTE: This setting is only valid for index-time field extractions.
* Optional. When set to true, Splunk software runs the REGEX multiple 
  times on the SOURCE_KEY.
* REPEAT_MATCH starts wherever the last match stopped, and continues until
  no more matches are found. Useful for situations where an unknown number
  of REGEX matches are expected per event.
* Default: false

INGEST_EVAL = <comma-separated list of evaluator expressions>
* NOTE: This setting is only valid for index-time field extractions.
* Optional. When you set INGEST_EVAL, this setting overrides all of the other 
  index-time settings (such as REGEX, DEST_KEY, etc) and declares the 
  index-time extraction to be evaluator-based.
* The expression takes a similar format to the search-time "|eval" command.
  For example "a=b+c*d" Just like the search-time operator, you can
  string multiple expressions together, separated by commas like
  "len=length(_raw), length_category=floor(log(len,2))".
* Keys which are commonly used with DEST_KEY or SOURCE_KEY (like
  "_raw", "queue", etc) can be used directly in the expression.
  Also available are values which would be populated by default when
  this event is searched ("source", "sourcetype", "host", "splunk_server",
  "linecount", "index"). Search-time calculated fields (the "EVAL-" settings
  in props.conf) are NOT available.
* When INGEST_EVAL accesses the "_time" variable, subsecond information is 
  included. This is unlike regular-expression-based index-time extractions, 
  where  "_time" values are limited to whole seconds.
* By default, other variable names refer to index-time fields which are
  populated in "_meta" So an expression 'event_category=if(_raw LIKE "WARN %", 
  "warning", "normal")' would append a new indexed field to _meta like 
  "event_category::warning".
* You can force a variable to be treated as a direct KEY name by
  prefixing it with "pd:". You can force a variable to be always
  treated as a "_meta" field by prefixing it with "field:" Therefore
  the above expression could also be written as
  '$field:event_category$=if($pd:_raw$ LIKE "WARN %", "warning", "normal")'
* When writing to a _meta field, the default behavior is to add a new
  index-time field even if one exists with the same name, the same way
  WRITE_META works for regular-expression-based extractions. For example, "a=5, 
  a=a+2" adds two index-time fields to _meta: "a::5 a::7". You can change this 
  by using ":=" after the variable name. For example, setting "a=5, a:=a+2" 
  causes Splunk software to add a single "a::7" field.
* NOTE: Replacing index-time fields is slower than adding them. It is best to 
  only use ":=" when you need this behavior.
* The ":=" operator can also be used to remove existing fields in _meta
  by assigning the expression null() to them.
* When reading from an index-time field that occurs multiple times inside the
  _meta key, normally the first value is used. You can override this by 
  prefixing the name with "mv:" which returns all of the values into a 
  "multival" object. For example, if _meta contains the keys "v::a v::b" then 
  'mvjoin(v,",")' returns "a" while 'mvjoin($mv:v$,",")' returns "a,b".
* Note that this "mv:" prefix does not change behavior when it writes to a
  _meta field. If the value returned by an expression is a multivalue, it 
  always creates multiple index-time fields. For example, 
  'x=mvappend("a","b","c")' causes the string "x::a x::b x::c" to be appended 
  to the _meta key.
* Internally, the _meta key can hold values with various numeric types.
  Splunk software normally picks a type appropriate for the value that the 
  expression returned. However, you can override this this choice by specifying 
  a type in square brackets after the destination field name. For example, 
  'my_len[int]=length(source)' creates a new field named "my_len" and forces it 
  to be stored as a 64-bit integer inside _meta. You can force Splunk software 
  to store a number as floating point by using the type "[float]". You can 
  request a smaller, less-precise encoding by using "[float32]". If you want to 
  store the value as floating point but also ensure that the Splunk software 
  remembers the significant-figures information that the evaluation expression 
  deduced, use "[float-sf]" or "[float32-sf]". Finally, you can force the 
  result to be treated as a string by specifying "[string]".
* The capability of the search-time |eval operator to name the destination
  field based on the value of another field (like "| eval {destname}=1")
  is NOT available for index-time evaluations.
* Default: empty

DELIMS = <quoted string list>
* NOTE: This setting is only valid for search-time field extractions.
* IMPORTANT: If a value may contain an embedded unescaped double quote
  character, such as "foo"bar", use REGEX, not DELIMS. An escaped double
  quote (\") is ok. Non-ASCII delimiters also require the use of REGEX.
* Optional. Use DELIMS in place of REGEX when you are working with ASCII-only 
  delimiter-based field extractions, where field values (or field/value pairs) 
  are separated by delimiters such as colons, spaces, line breaks, and so on.
* Sets delimiter characters, first to separate data into field/value pairs,
  and then to separate field from value.
* Each individual ASCII character in the delimiter string is used as a
  delimiter to split the event.
* Delimiters must be specified within double quotes (eg. DELIMS="|,;").
  Special escape sequences are \t (tab), \n (newline), \r (carriage return),
  \\ (backslash) and \" (double quotes).
* When the event contains full delimiter-separated field/value pairs, you
  enter two sets of quoted characters for DELIMS:
* The first set of quoted delimiters extracts the field/value pairs.
* The second set of quoted delimiters separates the field name from its
  corresponding value.
* When the event only contains delimiter-separated values (no field names), 
  use just one set of quoted delimiters to separate the field values. Then use 
  the FIELDS setting to apply field names to the extracted values.
  * Alternately, Splunk software reads even tokens as field names and odd   
    tokens as field values.
* Splunk software consumes consecutive delimiter characters unless you 
  specify a list of field names.
* The following example of DELIMS usage applies to an event where
  field/value pairs are separated by '|' symbols and the field names are
  separated from their corresponding values by '=' symbols:
    [pipe_eq]
    DELIMS = "|", "="
* Default: ""

FIELDS = <quoted string list>
* NOTE: This setting is only valid for search-time field extractions.
* Used in conjunction with DELIMS when you are performing delimiter-based
  field extraction and only have field values to extract.
* FIELDS enables you to provide field names for the extracted field values,
  in list format according to the order in which the values are extracted.
* NOTE: If field names contain spaces or commas they must be quoted with " "
  To escape, use \.
* The following example is a delimiter-based field extraction where three
  field values appear in an event. They are separated by a comma and then a
  space.
    [commalist]
    DELIMS = ", "
    FIELDS = field1, field2, field3
* Default: ""

MV_ADD = [true|false]
* NOTE: This setting is only valid for search-time field extractions.
* Optional. Controls what the extractor does when it finds a field which
  already exists.
* If set to true, the extractor makes the field a multivalued field and
  appends the newly found value, otherwise the newly found value is
  discarded.
* Default: false

CLEAN_KEYS = [true|false]
* NOTE: This setting is only valid for search-time field extractions.
* Optional. Controls whether Splunk software "cleans" the keys (field names) it 
  extracts at search time. "Key cleaning" is the practice of replacing any 
  non-alphanumeric characters (characters other than those falling between the 
  a-z, A-Z, or 0-9 ranges) in field names with underscores, as well as the 
  stripping of leading underscores and 0-9 characters from field names.
* Add CLEAN_KEYS = false to your transform if you need to extract field
  names that include non-alphanumeric characters, or which begin with
  underscores or 0-9 characters.
* Default: true

KEEP_EMPTY_VALS = [true|false]
* NOTE: This setting is only valid for search-time field extractions.
* Optional. Controls whether Splunk software keeps field/value pairs when 
  the value is an empty string.
* This option does not apply to field/value pairs that are generated by
  Splunk software autokv extraction. Autokv ignores field/value pairs with 
  empty values.
* Default: false

CAN_OPTIMIZE = [true|false]
* NOTE: This setting is only valid for search-time field extractions.
* Optional. Controls whether Splunk software can optimize this extraction out
  (another way of saying the extraction is disabled).
* You might use this if you are running searches under a Search Mode setting
  that disables field discovery--it ensures that Software always discovers
  specific fields.
* Splunk software only disables an extraction if it can determine that none of 
  the fields identified by the extraction will ever be needed for the successful
  evaluation of a search.
* NOTE: This option should be rarely set to false.
* Default: true

Lookup tables

 NOTE: Lookup tables are used ONLY during search time

filename = <string>
* Name of static lookup file.
* File should be in $SPLUNK_HOME/etc/system/lookups/, or in
  $SPLUNK_HOME/etc/<app_name>/lookups/ if the lookup belongs to a specific app.
* If file is in multiple 'lookups' directories, no layering is done.
* Standard conf file precedence is used to disambiguate.
* Only file names are supported. Paths are explicitly not supported. If you
  specify a path, Splunk software strips the path to use the value after
  the final path separator.
* Splunk software then looks for this filename in
  $SPLUNK_HOME/etc/system/lookups/ or $SPLUNK_HOME/etc/<app_name>/lookups/.
* Default: empty string

collection = <string>
* Name of the collection to use for this lookup.
* Collection should be defined in $SPLUNK_HOME/etc/<app_name>/collections.conf
  for some <app_name>
* If collection is in multiple collections.conf file, no layering is done.
* Standard conf file precedence is used to disambiguate.
* Defaults to empty string (in which case the name of the stanza is used).

max_matches = <integer>
* The maximum number of possible matches for each input lookup value
  (range 1 - 1000).
* If the lookup is non-temporal (not time-bounded, meaning the time_field
  setting is not specified), Splunk software uses the first <integer> entries, 
  in file order.
* If the lookup is temporal, Splunk software uses the first <integer> entries 
  in descending time order. In other words, only <max_matches> lookup entries
  are allowed to match. If the number of lookup entries exceeds <max_matches>, 
  only the ones nearest to the lookup value are used.
* Default = 100 matches if the time_field setting is not specified for the
  lookup. If the time_field setting is specified for the lookup, the default is
  1 match.

min_matches = <integer>
* Minimum number of possible matches for each input lookup value.
* Default = 0 for both temporal and non-temporal lookups, which means that
  Splunk software outputs nothing if it cannot find any matches.
* However, if min_matches > 0, and Splunk software gets less than min_matches, 
  it provides the default_match value provided (see below).

default_match = <string>
* If min_matches > 0 and Splunk software has less than min_matches for any 
  given input, it provides this default_match value one or more times until the
  min_matches threshold is reached.
* Defaults to empty string.

case_sensitive_match = <bool>
* NOTE: To disable case-sensitive matching with input fields and values from 
  events, the KV Store lookup data must be entirely in lower case. The input 
  data can be of any case, but the KV Store data must be lower case.
* If set to false, case insensitive matching is performed for all fields in a 
  lookup table
* Defaults to true (case sensitive matching)

match_type = <string>
* A comma and space-delimited list of <match_type>(<field_name>)
  specification to allow for non-exact matching
* The available match_type values are WILDCARD, CIDR, and EXACT. Only fields 
  that should use WILDCARD or CIDR matching should be specified in this list.
* Default: EXACT 

external_cmd = <string>
* Provides the command and arguments to invoke to perform a lookup. Use this
  for external (or "scripted") lookups, where you interface with with an
  external script rather than a lookup table.
* This string is parsed like a shell command.
* The first argument is expected to be a python script (or executable file)
  located in $SPLUNK_HOME/etc/<app_name>/bin (or ../etc/searchscripts).
* Presence of this field indicates that the lookup is external and command
  based.
* Default: empty string

fields_list = <string>
* A comma- and space-delimited list of all fields that are supported by the
  external command.

index_fields_list = <string>
* A comma- and space-delimited list of fields that need to be indexed
  for a static .csv lookup file.
* The other fields are not indexed and not searchable. 
* Restricting the fields enables better lookup performance.
* Defaults to all fields that are defined in the .csv lookup file header. 

external_type = [python|executable|kvstore|geo]
* This setting describes the external lookup type. 
* Use 'python' for external lookups that use a python script.
* Use 'executable' for external lookups that use a binary executable, such as a 
  C++ executable. 
* Use 'kvstore' for KV store lookups.
* Use 'geo' for geospatial lookups.
* Default: python

python.version = {default|python|python2|python3}
* ******* FOR SPLUNK 8.0 BACKWARDS COMPATIBILITY ONLY ********
* In Splunk 8.0 this attribute allows you to select which Python version to use.
* In this version of Splunk, this attribute is IGNORED as only Python 2 is supported
* by the platform. Ignoring this attribute allows you to set flags in your apps
* in anticipation of moving to 8.0 without causing startup warnings.

time_field = <string>
* Used for temporal (time bounded) lookups. Specifies the name of the field
  in the lookup table that represents the timestamp.
* Default: empty string
  * This means that lookups are not temporal by default.

time_format = <string>
* For temporal lookups this specifies the 'strptime' format of the timestamp
  field.
* You can include subseconds but Splunk software ignores them.
* Default: %s.%Q (seconds from unix epoch in UTC and optional milliseconds)

max_offset_secs = <integer>
* For temporal lookups, this is the maximum time (in seconds) that the event
  timestamp can be later than the lookup entry time for a match to occur.
* Default: 2000000000 

min_offset_secs = <integer>
* For temporal lookups, this is the minimum time (in seconds) that the event
  timestamp can be later than the lookup entry timestamp for a match to
  occur.
* Default: 0

batch_index_query = <bool>
* For large file-based lookups, batch_index_query determines whether queries 
  can be grouped to improve search performance.
* Default is unspecified here, but defaults to true (at global level in
  limits.conf)

allow_caching = <bool>
* Allow output from lookup scripts to be cached
* Default: true

cache_size = <integer>
* Cache size to be used for a particular lookup. If a previously looked up 
  value is already present in the cache, it is applied.
* The cache size represents the number of input values for which to cache 
  output values from a lookup table. 
* Do not change this value unless you are advised to do so by Splunk Support or 
  a similar authority. 
* Default: 10000 

max_ext_batch = <integer>
* The maximum size of external batch (range 1 - 1000).
* This setting applies only to KV Store lookup configurations.
* Default: 300

filter = <string>
* Filter results from the lookup table before returning data. Create this filter
  like you would a typical search query using Boolean expressions and/or 
  comparison operators.
* For KV Store lookups, filtering is done when data is initially retrieved to 
  improve performance.
* For CSV lookups, filtering is done in memory.

feature_id_element = <string>
* If the lookup file is a kmz file, this field can be used to specify the xml 
  path from placemark down to the name of this placemark.
* This setting applies only to geospatial lookup configurations.
* Default: /Placemark/name

check_permission = <bool>
* Specifies whether the system can verify that a user has write permission to a 
  lookup file when that user uses the outputlookup command to modify that file. 
  If the user does not have write permissions, the system prevents the 
  modification.
* The check_permission setting is only respected when output_check_permission
  is set to "true" in limits.conf. 
* You can set lookup table file permissions in the .meta file for each lookup 
  file, or through the Lookup Table Files page in Settings. By default, only 
  users who have the admin or power role can write to a shared CSV lookup file.
* This setting applies only to CSV lookup configurations.
* Default: false

replicate = true|false
* Indicates whether to replicate CSV lookups to indexers.
* When false, the CSV lookup is replicated only to search heads in a search 
  head cluster so that input lookup commands can use this lookup on the search 
  heads.
* When true, the CSV lookup is replicated to both indexers and search heads.
* Only for CSV lookup files.
* Note that replicate=true works only if it is included in replication 
  whitelist, See distSearch.conf/[replicationWhitelist] option.
* Default: true

METRICS - STATSD DIMENSION EXTRACTION

Metrics


[statsd-dims:<unique_transforms_stanza_name>]
* 'statsd-dims' prefix indicates this stanza is applicable only to statsd metric
  type input data.
* This stanza is used to define regular expression to match and extract 
  dimensions out of statsd dotted name segments.
* By default, only the unmatched segments of the statsd dotted name segment
  become the metric_name.

REGEX = <regular expression>
* Splunk software supports a named capturing group extraction format to provide 
  dimension names of the corresponding values being extracted out. For example:
    (?<dim1>group)(?<dim2>group)..

REMOVE_DIMS_FROM_METRIC_NAME = <boolean>
* If set to false, the matched dimension values from the REGEX above would also
  be a part of the metric name.
* If true, the matched dimension values would not be a part of metric name.
* Default: true

[metric-schema:<unique_transforms_stanza_name>]
* The 'metric-schema' stanza transforms index-time field extractions from a
  single log event into metrics.
* Each metric created has its own metric_name and _value.
* The other fields extracted from the log event become dimensions in the
  generated metrics.
* You must provide one of the following two settings:
  METRIC-SCHEMA-MEASURES-<unique_metric_name_prefix> or METRIC-SCHEMA-MEASURES. These
  settings determine how values for the metric_name and _value fields are obtained.

METRIC-SCHEMA-MEASURES-<unique_metric_name_prefix> = <measure_field1>, <measure_field2>,...
* Optional.
* <unique_metric_name_prefix> should match the value of a field extracted from
  the event.
* <measure_field> should match the name of a field with a numeric value
  extracted from the event.
* If the value of the 'metric_name' index-time extraction matches with the
  <unique_metric_name_prefix>, the Splunk platform:
  * Creates a metric with a new metric_name for each <measure_field> where the
    metric_name value is the <measure_field> prefixed by the
    <unique_metric_name_prefix>.
  * Saves the corresponding numeric value for each <measure_field> as '_value'
    within each metric.
* The Splunk platform saves the remaining index-time field extractions as
  dimensions in each of the created metrics.
* Default: empty

METRIC-SCHEMA-BLACKLIST-DIMS-<unique_metric_name_prefix> = <dimension_field1>, <dimension_field2>,...
* Optional
* This configuration enables the Splunk platform to omit unnecessary dimensions
  when it transforms log data to metrics data. You might set this up if you
  have high-cardinality dimensions that are unnecessary for your metrics.
* Use this configuration in conjunction with a corresponding
  METRIC-SCHEMA-MEASURES-<unique_metric_name_prefix> configuration.
* <unique_metric_name_prefix> should match the value of a field extracted from
  the log event.
* <dimension_field> should match the name of a field in the log event that is
  not extracted as a <measure_field> in the corresponding METRIC-SCHEMA-
  MEASURES-<unique_metric_name_prefix> configuration.
* Default: empty

METRIC-SCHEMA-MEASURES = <measure_field1>, <measure_field2>,...
* Optional.
* This configuration has a lower precedence over METRIC-SCHEMA-MEASURES-<unique_metric_name_prefix>
  if event has a match for unique_metric_name_prefix
* When no prefix can be identified, this configuration is active
  to create a new metric for each <measure_field> in the event data.
* The Splunk platform saves the remaining index-time field extractions as
  dimensions in each of the created metrics.
* Default: empty

METRIC-SCHEMA-BLACKLIST-DIMS = <dimension_field1>, <dimension_field2>,...
* Optional
* This configuration enables the Splunk platform to omit unnecessary dimensions
  when it transforms log data to metrics data. You might set this up if you
  have high-cardinality dimensions that are unnecessary for your metrics.
* Use this configuration in conjunction with a corresponding
  METRIC-SCHEMA-MEASURES configuration.
* <dimension_field> should match the name of a field in the log event that is
  not extracted as a <measure_field> in the corresponding METRIC-SCHEMA-
  MEASURES configuration.
* Default: empty

KEYS:

* NOTE: Keys are case-sensitive. Use the following keys exactly as they
appear.

queue : Specify which queue to send the event to (can be nullQueue, indexQueue).
* indexQueue is the usual destination for events going through the
transform-handling processor.
* nullQueue is a destination which causes the events to be
dropped entirely.
_raw : The raw text of the event.
_meta : A space-separated list of metadata for an event.
_time : The timestamp of the event, in seconds since 1/1/1970 UTC.

MetaData:Host : The host associated with the event.
The value must be prefixed by "host::"

_MetaData:Index : The index where the event should be stored.

MetaData:Source : The source associated with the event.
The value must be prefixed by "source::"

MetaData:Sourcetype : The source type of the event.
The value must be prefixed by "sourcetype::"

_TCP_ROUTING : Comma separated list of tcpout group names (from
outputs.conf)
Defaults to groups present in 'defaultGroup' for [tcpout].

_SYSLOG_ROUTING : Comma separated list of syslog-stanza names (from
outputs.conf)
Defaults to groups present in 'defaultGroup' for [syslog].

* NOTE: Any KEY (field name) prefixed by '_' is not indexed by Splunk software, in general.

[accepted_keys]

* Modifies the list of valid SOURCE_KEY and DEST_KEY values. Splunk software
checks the SOURCE_KEY and DEST_KEY values in your transforms against this
list when it performs index-time field transformations.
* Add entries to [accepted_keys] to provide valid keys for specific
environments, apps, or similar domains.
* The 'name' element disambiguates entries, similar to -class entries in
props.conf.
* The 'name' element can be anything you choose, including a description of
the purpose of the key.
* The entire stanza defaults to not being present, causing all keys not
documented just above to be flagged.

transforms.conf.example

#   Version 7.2.10
#
# This is an example transforms.conf.  Use this file to create regexes and
# rules for transforms.  Use this file in tandem with props.conf.
#
# To use one or more of these configurations, copy the configuration block
# into transforms.conf in $SPLUNK_HOME/etc/system/local/. You must restart
# Splunk to enable configurations.
#
# To learn more about configuration files (including precedence) please see
# the documentation located at
# http://docs.splunk.com/Documentation/Splunk/latest/Admin/Aboutconfigurationfiles

# Note: These are examples.  Replace the values with your own customizations.


# Indexed field:

[netscreen-error]
REGEX =  device_id=\[w+\](?<err_code>[^:]+)
FORMAT = err_code::$1
WRITE_META = true


# Override host:

[hostoverride]
DEST_KEY = MetaData:Host
REGEX = \s(\w*)$
FORMAT = host::$1


# Extracted fields:

[netscreen-error-field]
REGEX = device_id=\[w+\](?<err_code>[^:]+)
FORMAT = err_code::$1

# Index-time evaluations:

[discard-long-lines]
INGEST_EVAL = queue=if(length(_raw) > 500, "nullQueue", "")

[split-into-sixteen-indexes-for-no-good-reason]
INGEST_EVAL = index="split_" . substr(md5(_raw),1,1)

[add-two-numeric-fields]
INGEST_EVAL = loglen_raw=ln(length(_raw)), loglen_src=ln(length(source))

# In this example we only create the new index-time field if the host
# had a dot in it; assigning null() to a new field is a no-op:
[add-hostdomain-field]
INGEST_EVAL = hostdomain=if(host LIKE "%.%", replace(host,"^[^\\.]+\\.",""), null())

# Static lookup table

[mylookuptable]
filename = mytable.csv

# one to one lookup
# guarantees that we output a single lookup value for each input value, if
# no match exists, we use the value of "default_match", which by default is
# "NONE"
[mylook]
filename = mytable.csv
max_matches = 1
min_matches = 1
default_match = nothing

# Lookup and filter results
[myfilteredlookup]
filename = mytable.csv
filter = id<500 AND color="red"

# external command lookup table

[myexternaltable]
external_cmd = testadapter.py blah
fields_list = foo bar

# Temporal based static lookup table

[staticwtime]
filename = mytable.csv
time_field = timestamp
time_format = %d/%m/%y %H:%M:%S


# Mask sensitive data:

[session-anonymizer]
REGEX = (?m)^(.*)SessionId=\w+(\w{4}[&"].*)$
FORMAT = $1SessionId=########$2
DEST_KEY = _raw


# Route to an alternate index:

[AppRedirect]
REGEX = Application
DEST_KEY = _MetaData:Index
FORMAT = Verbose


# Extract comma-delimited values into fields:

[extract_csv]
DELIMS = ","
FIELDS = "field1", "field2", "field3"

# This example assigns the extracted values from _raw to field1, field2 and
# field3 (in order of extraction). If more than three values are extracted
# the values without a matching field name are ignored.

[pipe_eq]
DELIMS = "|", "="

# The above example extracts key-value pairs which are separated by '|'
# while the key is delimited from value by '='.


[multiple_delims]
DELIMS = "|;", "=:"


# The above example extracts key-value pairs which are separated by '|' or
# ';', while the key is delimited from value by '=' or ':'.


###### BASIC MODULAR REGULAR EXPRESSIONS DEFINITION START ###########
# When adding a new basic modular regex PLEASE add a comment that lists
# the fields that it extracts (named capturing groups), or whether it
# provides a placeholder for the group name as:
# Extracts: field1, field2....
#

[all_lazy]
REGEX = .*?

[all]
REGEX = .*

[nspaces]
# matches one or more NON space characters
REGEX = \S+

[alphas]
# matches a string containing only letters a-zA-Z
REGEX = [a-zA-Z]+

[alnums]
# matches a string containing letters + digits
REGEX = [a-zA-Z0-9]+

[qstring]
# matches a quoted "string" - extracts an unnamed variable
# name MUST be provided as in [[qstring:name]]
# Extracts: empty-name-group (needs name)
REGEX = "(?<>[^"]*+)"

[sbstring]
# matches a string enclosed in [] - extracts an unnamed variable
# name MUST be provided as in [[sbstring:name]]
# Extracts: empty-name-group (needs name)
REGEX = \[(?<>[^\]]*+)\]

[digits]
REGEX = \d+

[int]
# matches an integer or a hex number
REGEX = 0x[a-fA-F0-9]+|\d+

[float]
# matches a float (or an int)
REGEX = \d*\.\d+|[[int]]

[octet]
# this would match only numbers from 0-255 (one octet in an ip)
REGEX = (?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)

[ipv4]
# matches a valid IPv4 optionally followed by :port_num the octets in the ip
# would also be validated 0-255 range
# Extracts: ip, port
REGEX = (?<ip>[[octet]](?:\.[[octet]]){3})(?::[[int:port]])?

[simple_url]
# matches a url of the form proto://domain.tld/uri
# Extracts: url, domain
REGEX = (?<url>\w++://(?<domain>[a-zA-Z0-9\-.:]++)(?:/[^\s"]*)?)

[url]
# matches a url of the form proto://domain.tld/uri
# Extracts: url, proto, domain, uri
REGEX = (?<url>[[alphas:proto]]://(?<domain>[a-zA-Z0-9\-.:]++)(?<uri>/[^\s"]*)?)

[simple_uri]
# matches a uri of the form /path/to/resource?query
# Extracts: uri, uri_path, uri_query
REGEX = (?<uri>(?<uri_path>[^\s\?"]++)(?:\\?(?<uri_query>[^\s"]+))?)

[uri]
# uri  = path optionally followed by query [/this/path/file.js?query=part&other=var]
# path = root part followed by file        [/root/part/file.part]
# Extracts: uri, uri_path, uri_root, uri_file, uri_query, uri_domain (optional if in proxy mode)
REGEX = (?<uri>(?:\w++://(?<uri_domain>[^/\s]++))?(?<uri_path>(?<uri_root>/+(?:[^\s\?;=/]*+/+)*)(?<uri_file>[^\s\?;=?/]*+))(?:\?(?<uri_query>[^\s"]+))?)

[hide-ip-address]
# Make a clone of an event with the sourcetype masked_ip_address.  The clone
# will be modified; its text changed to mask the ip address.
# The cloned event will be further processed by index-time transforms and
# SEDCMD expressions according to its new sourcetype.
# In most scenarios an additional transform would be used to direct the
# masked_ip_address event to a different index than the original data.
REGEX = ^(.*?)src=\d+\.\d+\.\d+\.\d+(.*)$
FORMAT = $1src=XXXXX$2
DEST_KEY = _raw
CLONE_SOURCETYPE = masked_ip_addresses

#Set REPEAT_MATCH to true to repeatedly match the regex in the data.
#example sample data - 1483382050 a=1 b=2 c=3 d=4 e=5
#Since REPEAT_MATCH is set to true, the regex will matched for a=1, then b=2, then c=3 and so on
#If REPEAT_MATCH is not set, the match will stop at a=1
#Since WRITE_META is set to true, these will added as indexed fields - a, b, c, d, e
[repeat_regex]
REGEX = ([a-z])=(\d+)
FORMAT = $1::$2
REPEAT_MATCH = true
WRITE_META = true

#Regex with $0 in FORMAT
#example sample data -
#    #52 +(6295)- [X]
#    <Tsk> idfdgdffdhgfjhgjsdfdsghkiuk;''''
#    <Tsk> sorry.. this is a test
[bash]
REGEX = #([0-9]+) \+\((-?[0-9]+)\)- \[X\]
FORMAT = $0 bash_quote_id::$1 bash_quote_score::$2
WRITE_META=true

###### BASIC MODULAR REGULAR EXPRESSIONS DEFINITION END ###########

# Statsd dimensions extraction

# For example, below two stanzas would extract dimensions as ipv4=10.2.3.4
# and os=windows from statsd data=mem.percent.used.10.2.3.4.windows:33|g
[statsd-dims:regex_stanza1]
REGEX = (?<ipv4>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})
REMOVE_DIMS_FROM_METRIC_NAME = true

[statsd-dims:regex_stanza2]
REGEX = \S+\.(?<os>\w+):
REMOVE_DIMS_FROM_METRIC_NAME = true

# In most cases we need only one regex to be run per sourcetype. By default
# Splunk would look for the sourcetype name in transforms.conf in such scenario.
# Hence, there is no need to provide STATSD-DIM-TRANSFORMS setting in props.conf.
[statsd-dims:metric_sourcetype_name]
# In this example, we extract both ipv4 and os dimension using a single regex
REGEX = (?<ipv4>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\.(?<os>\w+):
REMOVE_DIMS_FROM_METRIC_NAME = true


# In this metrics example, we start with this log line:
#
# 01-26-2018 07:49:49.030 -0800 INFO  Metrics - group=queue, name=aggqueue, max_size_kb=1024, current_size_kb=1,
# current_size=3, largest_size=49, smallest_size=0, dc_latitude=37.3187706, dc_longitude=-121.9515042
#
# The following stanza converts that single event into multiple metrics at
# index-time. It blacklists the dc_latitude and dc_longitude dimensions, which
# means they are omitted from the generated metrics.
[metric-schema:logtometrics]
METRIC-SCHEMA-MEASURES-queue = max_size_kb,current_size_kb,current_size,largest_size,smallest_size
METRIC-SCHEMA-BLACKLIST-DIMS-queue = dc_latitude,dc_longitude
# Here are the metrics generated by that stanza:
# {'metric_name' : 'queue.max_size_kb',    '_value' : 1024, 'name': 'aggqueue'},
# {'metric_name' : 'queue.current_size_kb, '_value' : 1,    'name': 'aggqueue'},
# {'metric_name' : 'queue.current_size',   '_value' : 3,    'name': 'aggqueue'},
# {'metric_name' : 'queue.largest_size',   '_value' : 49,   'name': 'aggqueue'},
# {'metric_name' : 'queue.smallest_size',  '_value' : 0,    'name': 'aggqueue'}

# In the sample log above, group=queue represents the unique metric name prefix. Hence, it needs to be
# formatted and saved as metric_name::queue for Splunk to identify queue as a metric name prefix.
[extract_group]
REGEX = group=(\w+)
FORMAT = metric_name::$1
WRITE_META = true

[extract_name]
REGEX = name=(\w+)
FORMAT = name::$1
WRITE_META = true

[extract_max_size_kb]
REGEX = max_size_kb=(\w+)
FORMAT = max_size_kb::$1
WRITE_META = true

[extract_current_size_kb]
REGEX = current_size_kb=(\w+)
FORMAT = current_size_kb::$1
WRITE_META = true

[extract_current_size]
REGEX = max_size_kb=(\w+)
FORMAT = max_size_kb::$1
WRITE_META = true

[extract_largest_size]
REGEX = largest_size=(\w+)
FORMAT = largest_size::$1
WRITE_META = true

[extract_smallest_size]
REGEX = smallest_size=(\w+)
FORMAT = smallest_size::$1
WRITE_META = true

Related answers from Splunk Community

transforms.conf

transforms.conf.spec

GLOBAL SETTINGS

Lookup tables

METRICS - STATSD DIMENSION EXTRACTION

Metrics

KEYS:

transforms.conf.example

Comments

transforms.conf

Was this topic useful?