Splunk® Enterprise

Getting Data In

Splunk version 4.x reached its End of Life on October 1, 2013. This documentation does not apply to the most recent version of Splunk.
Anonymize data

You might want to mask sensitive personal data when indexing log events. Credit card numbers and social security numbers are two examples of data that you might not want to appear in a Splunk index. This topic describes how to mask part of a confidential field so that privacy is protected but enough of the data remains to trace events.

Splunk lets you anonymize data in two ways:

  • Through a regex transform
  • Through a sed script

Through a regex transform

You can configure transforms.conf to mask data by means of regular expressions.

This example masks all but the last four characters of the SessionId and Ticket fields in an application server log.

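An example of the desired output (the FORMAT settings in this example write a fixed run of eight # characters, whatever the length of the original value):

SessionId=########7BEA&Ticket=########96EE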

A sample input:

"2006-09-21, 02:57:11.58",  122, 11, "Path=/LoginUser Query=CrmId=ClientABC&
ContentItemId=TotalAccess&SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&
SessionTime=25368&ReturnUrl=http://www.clientabc.com, Method=GET,IP=209.51.249.195,
Content=", ""
"2006-09-21, 02:57:11.60",  122, 15, "UserData:<User CrmId="clientabc" 
UserId="p12345678"><EntitlementList></EntitlementList></User>", ""
"2006-09-21, 02:57:11.60",  122, 15, "New Cookie: SessionId=3A1785URH117BEA&
Ticket=646A1DA4STF896EE&CrmId=clientabcUserId=p12345678&AccountId=&AgentHost=man&
AgentId=man, MANUser: Version=1&Name=&Debit=&Credit=&AccessTime=&BillDay=&Status=
&Language=&Country=&Email=&EmailNotify=&Pin=&PinPayment=&PinAmount=&PinPG=
&PinPGRate=&PinMenu=&", ""

To mask the data, modify the props.conf and transforms.conf files in your $SPLUNK_HOME/etc/system/local/ directory.

Configure props.conf

Edit $SPLUNK_HOME/etc/system/local/props.conf and add the following stanza:

[<spec>]
TRANSFORMS-anonymize = session-anonymizer, ticket-anonymizer

Note the following:

  • <spec> must be one of the following:
    • <sourcetype>, the source type of an event.
    • host::<host>, where <host> is the host of an event.
    • source::<source>, where <source> is the source of an event.
  • In this example, session-anonymizer and ticket-anonymizer are arbitrary TRANSFORMS class names whose actions are defined in stanzas in a corresponding transforms.conf file. Use the class names you create in transforms.conf.

Configure transforms.conf

In $SPLUNK_HOME/etc/system/local/transforms.conf, add your TRANSFORMS:

[session-anonymizer]
REGEX = (?m)^(.*)SessionId=\w+(\w{4}[&"].*)$
FORMAT = $1SessionId=########$2
DEST_KEY = _raw

[ticket-anonymizer]
REGEX = (?m)^(.*)Ticket=\w+(\w{4}&.*)$
FORMAT = $1Ticket=########$2
DEST_KEY = _raw

Note the following:

  • REGEX specifies the regular expression that matches the string in the event you want to anonymize.
  • FORMAT specifies the masked value. $1 is the text captured by the first group (everything in the event before the field name), and $2 is the text captured by the second group (the last four characters of the value and everything after them).
  • DEST_KEY = _raw writes the value from FORMAT to the raw text of the event, which modifies the event.

Note: The regex processor can't handle multi-line events unless you specify that the event is multi-line. To do so, place (?m) before the regular expression in transforms.conf.
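
As a rough illustration of what the two transforms do to an event, the following sketch reproduces them with Python's re module. It is not Splunk code: the helper name apply_transform and the shortened, single-line sample event are assumptions made for this example, and Python's regex engine only approximates the one Splunk uses.

```python
import re

# The REGEX / FORMAT pairs from transforms.conf, with $1 and $2 rewritten
# as Python's \1 and \2 back-references.
TRANSFORMS = [
    (r'(?m)^(.*)SessionId=\w+(\w{4}[&"].*)$', r'\1SessionId=########\2'),
    (r'(?m)^(.*)Ticket=\w+(\w{4}&.*)$', r'\1Ticket=########\2'),
]


def apply_transform(raw, regex, fmt):
    """Hypothetical helper: return the expanded FORMAT if REGEX matches,
    otherwise return the raw text unchanged."""
    match = re.search(regex, raw)
    return match.expand(fmt) if match else raw


# Shortened, single-line version of the sample input (an assumption for brevity).
event = ('Query=CrmId=ClientABC&SessionId=3A1785URH117BEA'
         '&Ticket=646A1DA4STF896EE&SessionTime=25368')

masked = event
for regex, fmt in TRANSFORMS:
    masked = apply_transform(masked, regex, fmt)

print(masked)
# Query=CrmId=ClientABC&SessionId=########7BEA&Ticket=########96EE&SessionTime=25368
```

Because FORMAT contains a fixed run of eight # characters, the mask has the same width however long the original value was.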

Through a sed script

You can also anonymize your data by using a sed script to replace or substitute strings in events.

Most UNIX users are familiar with sed, a UNIX utility that reads a file and modifies the input as specified by a list of commands. Splunk lets you use sed-like syntax in props.conf to anonymize your data.

Define the sed script in props.conf

Edit or create a copy of props.conf in $SPLUNK_HOME/etc/system/local.

Create a props.conf stanza that uses SEDCMD to indicate a sed script:

[<spec>]
SEDCMD-<class> = <sed script>

Note the following:

  • <spec> must be one of the following:
    • <sourcetype>, the source type of an event.
    • host::<host>, where <host> is the host of an event.
    • source::<source>, where <source> is the source of an event.
  • The sed script applies only to the _raw field at index time. Splunk currently supports the following subset of sed commands:
    • replace (s)
    • character substitution (y).

Note: After making changes to props.conf, restart Splunk to enable the configuration changes.

Replace strings with regex match

The syntax for a sed replace is:

SEDCMD-<class> = s/<regex>/<replacement>/flags

Note the following:

  • regex is a Perl-compatible regular expression.
  • replacement is a string to replace the regex match. It uses "\n" for back-references, where n is a single digit.
  • flags can be either "g" to replace all matches or a number to replace a specified match.
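
To make the difference between the two kinds of flag concrete, here is a small Python sketch. The function sed_replace is hypothetical, and it assumes that a numeric flag n replaces only the n-th match, as in standard sed; check the behaviour of your Splunk version before relying on that.

```python
import re


def sed_replace(text, pattern, replacement, flag):
    """Hypothetical stand-in for s/<regex>/<replacement>/<flag>.

    flag is "g" (replace every match) or a number given as a string
    (assumed here to replace only that match, counting from 1).
    """
    if flag == "g":
        return re.sub(pattern, replacement, text)

    target = int(flag)
    seen = 0

    def pick(match):
        # Expand the replacement for the chosen match; keep the others as-is.
        nonlocal seen
        seen += 1
        return match.expand(replacement) if seen == target else match.group(0)

    return re.sub(pattern, pick, text)


line = "id=1 id=2 id=3"
print(sed_replace(line, r"id=\d", "id=x", "g"))  # id=x id=x id=x
print(sed_replace(line, r"id=\d", "id=x", "2"))  # id=1 id=x id=3
```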

Example

Let's say you want to index data containing social security numbers and credit card numbers. At index time, you want to mask these values so that only the last four digits are visible in your events. Your props.conf stanza might look like this:

[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g

Now, in your accounts events, social security numbers appear as ssn=xxxxx6789 and credit card numbers appear as cc=xxxx-xxxx-xxxx-1234.
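
If you want to try the two expressions before putting them in props.conf, a sketch like the following applies the same patterns with Python's re module. The function name mask_accounts and the sample event are made up for illustration, and Python's regex engine is only an approximation of Splunk's.

```python
import re


def mask_accounts(raw):
    """Apply the two replacements from the SEDCMD-accounts example, in order."""
    # s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g
    raw = re.sub(r'ssn=\d{5}(\d{4})', r'ssn=xxxxx\1', raw)
    # s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g
    raw = re.sub(r'cc=(\d{4}-){3}(\d{4})', r'cc=xxxx-xxxx-xxxx-\2', raw)
    return raw


# Invented sample event, not taken from a real accounts.log.
print(mask_accounts('user=jdoe ssn=123456789 cc=1234-5678-9012-1234'))
# user=jdoe ssn=xxxxx6789 cc=xxxx-xxxx-xxxx-1234
```

Note that the credit card pattern uses \2 rather than \1, because the first group is the repeated four-digit block and the second group holds the last four digits.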

Substitute characters

The syntax for a sed character substitution is:

SEDCMD-<class> = y/<string1>/<string2>/

This substitutes each occurrence of the characters in string1 with the characters in string2.

Example

Let's say you have a file you want to index, abc.log, and you want to substitute the capital letters "A", "B", and "C" for every lowercase "a", "b", or "c" in your events. Add the following to your props.conf:

[source::.../abc.log]
SEDCMD-abc = y/abc/ABC/

Now, if you search for source="*/abc.log", you should not find the lowercase letters "a", "b", and "c" in your data at all. Splunk substituted "A" for each "a", "B" for each "b", and "C" for each "c".
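
The y command maps characters one for one by position. As a minimal sketch of the same idea outside Splunk, Python's str.translate does the equivalent mapping; the function name substitute and the sample string are assumptions for this example.

```python
# Position-by-position mapping, equivalent in spirit to y/abc/ABC/:
# 'a' -> 'A', 'b' -> 'B', 'c' -> 'C'; every other character is left alone.
TABLE = str.maketrans('abc', 'ABC')


def substitute(raw):
    """Hypothetical helper that applies the character mapping to an event."""
    return raw.translate(TABLE)


print(substitute('a black cab'))  # A BlACk CAB
```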

This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7, 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18


Comments

Q1: Can wildcard characters be used in "<spec>"? For example, if I want a transformation to apply to all source types, could I use "[*]" in props.conf?

Q2: Can I combine "<spec>" expressions? For example, how would I apply a transformation to a file named "abc.log" on only 2 of my 3 servers?

Bmonroe
February 27, 2013
