Splunk® Enterprise

Getting Data In

Anonymize data

This topic discusses how to anonymize data you are sending to your Splunk deployment, such as credit card and Social Security numbers.

You might want to mask sensitive personal data when indexing log events. Credit card numbers and Social Security numbers are two examples of data that you might not want to appear in an index. This topic describes how to mask parts of confidential fields to protect privacy while providing enough remaining data for use in tracking events.

You can anonymize data in two ways:

  • Through a regular expression (regex) transform.
  • Through a sed script.

If you're running Splunk Enterprise and want to anonymize data, configure your indexers or heavy forwarders as described in this topic. If you're forwarding data to Splunk Cloud and want to anonymize it, use a heavy forwarder, configured as described in this topic.

Anonymize data with a regular expression transform

You can configure transforms.conf to mask data by means of regular expressions.

This example masks all but the last four characters of the SessionId and Ticket fields in an application server log.

Here is an example of the desired output:

SessionId=###########7BEA&Ticket=############96EE

A sample input:

"2006-09-21, 02:57:11.58",  122, 11, "Path=/LoginUser Query=CrmId=ClientABC&
ContentItemId=TotalAccess&SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&
SessionTime=25368&ReturnUrl=http://www.clientabc.com, Method=GET,IP=209.51.249.195,
Content=", ""
"2006-09-21, 02:57:11.60",  122, 15, "UserData:<User CrmId="clientabc" 
UserId="p12345678"><EntitlementList></EntitlementList></User>", ""
"2006-09-21, 02:57:11.60",  122, 15, "New Cookie: SessionId=3A1785URH117BEA&
Ticket=646A1DA4STF896EE&CrmId=clientabcUserId=p12345678&AccountId=&AgentHost=man&
AgentId=man, MANUser: Version=1&Name=&Debit=&Credit=&AccessTime=&BillDay=&Status=
&Language=&Country=&Email=&EmailNotify=&Pin=&PinPayment=&PinAmount=&PinPG=
&PinPGRate=&PinMenu=&", ""

To mask the data, modify the props.conf and transforms.conf files in your $SPLUNK_HOME/etc/system/local/ directory.

Configure props.conf

1. Edit $SPLUNK_HOME/etc/system/local/props.conf and add the following stanza:

[<spec>]
TRANSFORMS-anonymize = session-anonymizer, ticket-anonymizer

In this stanza, <spec> must be one of the following:

  • <sourcetype>, the source type of an event.
  • host::<host>, where <host> is the host of an event.
  • source::<source>, where <source> is the source of an event.

In this example, session-anonymizer and ticket-anonymizer are arbitrary TRANSFORMS class names. You define the actions for these classes in corresponding stanzas in transforms.conf, so use the same class names in both files.
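
For example, if the application server events above use a hypothetical source type named app_server_log, the stanza might look like this:

[app_server_log]
TRANSFORMS-anonymize = session-anonymizer, ticket-anonymizer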

Configure transforms.conf

2. In $SPLUNK_HOME/etc/system/local/transforms.conf, add your TRANSFORMS:

[session-anonymizer]
REGEX = (?m)^(.*)SessionId=\w+(\w{4}[&"].*)$
FORMAT = $1SessionId=########$2
DEST_KEY = _raw

[ticket-anonymizer]
REGEX = (?m)^(.*)Ticket=\w+(\w{4}&.*)$
FORMAT = $1Ticket=########$2
DEST_KEY = _raw

In this transform:

  • REGEX specifies the regular expression that matches the string in the event that you want to anonymize, and FORMAT specifies the masked value that replaces it.
  • $1 is the text that the first capture group matches (everything in the event before the value being masked), and $2 is the text that the second capture group matches (the last four characters of the value and the remainder of the event).
  • DEST_KEY = _raw writes the value from FORMAT back to the raw event, thus modifying the event.

Note: The regular expression processor does not handle multiline events. As a workaround, specify that the event is multiline by placing (?m) before the regular expression in transforms.conf.

Anonymize data through a sed script

You can also anonymize data by using a sed script to replace or substitute strings in events.

Most UNIX users are familiar with sed, a utility that reads a file and modifies the input as specified by a list of commands. Splunk Enterprise lets you use sed-like syntax in props.conf to anonymize your data.

Define the sed script in props.conf

1. Edit or create a copy of props.conf in $SPLUNK_HOME/etc/system/local.

Create a props.conf stanza that uses SEDCMD to indicate a sed script:

[<spec>]
SEDCMD-<class> = <sed script>

In this stanza, <spec> must be one of the following:

  • <sourcetype>, the source type of an event.
  • host::<host>, where <host> is the host of an event.
  • source::<source>, where <source> is the source of an event.

The sed script applies only to the _raw field at index time. Splunk Enterprise supports the following subset of sed commands:

    • replace (s)
    • character substitution (y)

2. After making changes to props.conf, restart the Splunk instance to enable the configuration.
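
As a minimal sketch, a stanza that applies both kinds of sed command to a hypothetical log file might look like the following. The source path and class names are assumptions, and the s and y syntax is described in the next two sections.

[source::.../app_server.log]
SEDCMD-mask_session = s/SessionId=\w+(\w{4})/SessionId=########\1/g
SEDCMD-upcase_abc = y/abc/ABC/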

Replace strings with regular expression match

The syntax for a sed replace is:

SEDCMD-<class> = s/<regex>/<replacement>/<flags>

In this stanza:

  • regex is a Perl regular expression.
  • replacement is a string to replace the regular expression match. It uses "\n" for back-references, where n is a single digit.
  • flags can be either "g" to replace all matches or a number to replace a specified match.

Example

In the following example, you want to index data containing Social Security and credit card numbers. At index time, you want to mask these values so that only the last four digits are present in your events. Your props.conf stanza might look like this:

[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g

In your accounts events, Social Security numbers appear as ssn=xxxxx6789 and credit card numbers appear as cc=xxxx-xxxx-xxxx-1234.
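
To limit the replacement to a single match, use a numeric flag instead of g. For example, the following sketch (the class name is hypothetical) masks only the first credit card number in each event and leaves any later occurrences unchanged:

[source::.../accounts.log]
SEDCMD-first_cc = s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/1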

Substitute characters

The syntax for a sed character substitution is:

SEDCMD-<class> = y/<string1>/<string2>/

This substitutes each occurrence of a character in string1 with the corresponding character in string2.

Example

You have a file you want to index, abc.log, and you want to substitute the capital letters "A", "B", and "C" for every lowercase "a", "b", or "c" in your events. Add the following to your props.conf:

[source::.../abc.log]
SEDCMD-abc = y/abc/ABC/

When you search for source="*/abc.log", you should not find the lowercase letters "a", "b", and "c" in your data. Splunk Enterprise substituted "A" for each "a", "B" for each "b", and "C" for each "c".

Caveats for anonymizing data

Splunk indexers do not parse structured data

When you forward structured data to an indexer, the indexer does not parse it, even if you have configured props.conf on that indexer with INDEXED_EXTRACTIONS. Forwarded data skips the following queues on the indexer, which precludes any parsing of that data on the indexer:

  • parsing
  • aggregation
  • typing

The forwarded data must arrive at the indexer already parsed. To achieve this, you must also set up props.conf on the forwarder that sends the data. This includes configuration of INDEXED_EXTRACTIONS and any other parsing, filtering, anonymizing, and routing rules.

Universal forwarders can perform these tasks only on structured data. See Forward data extracted from structured data files.
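
For example, a forwarder-side props.conf stanza for a hypothetical CSV file might combine structured-data parsing with an anonymizing rule, as in the following sketch. The source path, class name, and Social Security number format are assumptions:

[source::.../accounts.csv]
INDEXED_EXTRACTIONS = csv
SEDCMD-mask_ssn = s/\d{3}-\d{2}-(\d{4})/xxx-xx-\1/g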
