
Anonymize data
This topic discusses how to anonymize data that comes into Splunk Enterprise, such as credit card and Social Security numbers.
You might want to mask sensitive personal data when indexing log events. Credit card numbers and Social Security numbers are two examples of data that you might not want to appear in an index. This topic describes how to mask parts of confidential fields to protect privacy while leaving enough of the remaining data to use in tracking events.
Splunk Enterprise lets you anonymize data in two ways:
- Through a regular expression (regex) transform
- Through a sed script
Anonymize data with a regular expression transform
You can configure transforms.conf to mask data by means of regular expressions. This example masks all but the last four characters of the SessionId and Ticket fields in an application server log.
An example of the desired output:
SessionId=###########7BEA&Ticket=############96EE
A sample input:
"2006-09-21, 02:57:11.58", 122, 11, "Path=/LoginUser Query=CrmId=ClientABC& ContentItemId=TotalAccess&SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE& SessionTime=25368&ReturnUrl=http://www.clientabc.com, Method=GET,IP=209.51.249.195, Content=", "" "2006-09-21, 02:57:11.60", 122, 15, "UserData:<User CrmId="clientabc" UserId="p12345678"><EntitlementList></EntitlementList></User>", "" "2006-09-21, 02:57:11.60", 122, 15, "New Cookie: SessionId=3A1785URH117BEA& Ticket=646A1DA4STF896EE&CrmId=clientabcUserId=p12345678&AccountId=&AgentHost=man& AgentId=man, MANUser: Version=1&Name=&Debit=&Credit=&AccessTime=&BillDay=&Status= &Language=&Country=&Email=&EmailNotify=&Pin=&PinPayment=&PinAmount=&PinPG= &PinPGRate=&PinMenu=&", ""
To mask the data, modify the props.conf and transforms.conf files in your $SPLUNK_HOME/etc/system/local/ directory.
Configure props.conf
Edit $SPLUNK_HOME/etc/system/local/props.conf and add the following stanza:
[<spec>]
TRANSFORMS-anonymize = session-anonymizer, ticket-anonymizer
Note the following:
- <spec> must be one of the following:
  - <sourcetype>, the source type of an event.
  - host::<host>, where <host> is the host of an event.
  - source::<source>, where <source> is the source of an event.
- In this example, session-anonymizer and ticket-anonymizer are arbitrary TRANSFORMS class names whose actions are defined in stanzas in a corresponding transforms.conf file. Use the class names you create in transforms.conf.
Configure transforms.conf
In $SPLUNK_HOME/etc/system/local/transforms.conf, add your transforms:
[session-anonymizer]
REGEX = (?m)^(.*)SessionId=\w+(\w{4}[&"].*)$
FORMAT = $1SessionId=########$2
DEST_KEY = _raw

[ticket-anonymizer]
REGEX = (?m)^(.*)Ticket=\w+(\w{4}&.*)$
FORMAT = $1Ticket=########$2
DEST_KEY = _raw
Note the following:
- REGEX specifies the regular expression that matches the string in the event that you want to anonymize.
- Note: The regex processor does not handle multi-line events. To work around this, specify that the event is multi-line by placing (?m) before the regular expression in transforms.conf.
- FORMAT specifies the masked values. $1 is all of the text leading up to the regex match, and $2 is all of the text of the event after the match.
- DEST_KEY = _raw writes the value from FORMAT to the raw value in the log, thus modifying the event.
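To see how REGEX and FORMAT interact, you can sketch the two transforms outside Splunk. The following is an illustrative Python approximation only; Splunk's regex engine is PCRE and FORMAT uses $1/$2 backreferences, whereas Python's re.sub uses \1/\2:

```python
import re

# Illustrative sketch of the session-anonymizer and ticket-anonymizer
# transforms. The sample event value is hypothetical, taken from the
# application server log excerpt above.
event = "SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE&SessionTime=25368"

# session-anonymizer: mask all but the last four characters of SessionId
event = re.sub(r'(?m)^(.*)SessionId=\w+(\w{4}[&"].*)$',
               r'\1SessionId=########\2', event)

# ticket-anonymizer: mask all but the last four characters of Ticket
event = re.sub(r'(?m)^(.*)Ticket=\w+(\w{4}&.*)$',
               r'\1Ticket=########\2', event)

print(event)
# SessionId=########7BEA&Ticket=########96EE&SessionTime=25368
```

Note how the greedy \w+ backtracks just enough to leave four word characters for the capture group, which is what preserves the trailing 7BEA and 96EE.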
Anonymize data through a sed script
You can also anonymize your data by using a sed script to replace or substitute strings in events. Most UNIX users are familiar with sed, a utility that reads a file and modifies the input as specified by a list of commands. Splunk Enterprise lets you use sed-like syntax in props.conf to anonymize your data.
Define the sed script in props.conf
Edit or create a copy of props.conf in $SPLUNK_HOME/etc/system/local. Create a props.conf stanza that uses SEDCMD to specify a sed script:
[<spec>]
SEDCMD-<class> = <sed script>
Note the following:
- <spec> must be one of the following:
  - <sourcetype>, the source type of an event.
  - host::<host>, where <host> is the host of an event.
  - source::<source>, where <source> is the source of an event.
- The sed script applies only to the _raw field at index time. Splunk Enterprise currently supports the following subset of sed commands:
  - replace (s)
  - character substitution (y)
Note: After making changes to props.conf, restart Splunk Enterprise to enable the configuration changes.
Replace strings with regex match
The syntax for a sed replace is:
SEDCMD-<class> = s/<regex>/<replacement>/flags
Note the following:
- <regex> is a Perl regular expression.
- <replacement> is a string to replace the regex match. It uses "\n" for backreferences, where n is a single digit.
- <flags> can be either "g", to replace all matches, or a number, to replace the specified match.
Example
In the following example, you want to index data containing Social Security and credit card numbers. At index time, you want to mask these values so that only the last four digits are evident in your events. Your props.conf stanza might look like this:
[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g
Now, in your accounts events, Social Security numbers appear as ssn=xxxxx6789 and credit card numbers appear as cc=xxxx-xxxx-xxxx-1234.
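The effect of the two s/// expressions above can be sketched outside Splunk with an equivalent pair of substitutions. This is an illustrative Python approximation with hypothetical sample values, not what the index-time processor literally runs:

```python
import re

# Illustrative sketch of the SEDCMD-accounts example: mask all but the
# last four digits of ssn and cc values (sample data is hypothetical).
line = "ssn=123456789 cc=1234-5678-9012-3456"

line = re.sub(r'ssn=\d{5}(\d{4})', r'ssn=xxxxx\1', line)
line = re.sub(r'cc=(\d{4}-){3}(\d{4})', r'cc=xxxx-xxxx-xxxx-\2', line)

print(line)
# ssn=xxxxx6789 cc=xxxx-xxxx-xxxx-3456
```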
Substitute characters
The syntax for a sed character substitution is:
SEDCMD-<class> = y/<string1>/<string2>/
This substitutes each occurrence of the characters in string1 with the corresponding characters in string2.
Example
Let's say you have a file you want to index, abc.log, and you want to substitute the capital letters "A", "B", and "C" for every lowercase "a", "b", or "c" in your events. Add the following to your props.conf:
[source::.../abc.log]
SEDCMD-abc = y/abc/ABC/
Now, if you search for source="*/abc.log", you should not find the lowercase letters "a", "b", and "c" in your data at all. Splunk Enterprise substituted "A" for each "a", "B" for each "b", and "C" for each "c".
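The y/// command maps characters positionally, much like str.translate in Python. A rough sketch of what y/abc/ABC/ does, using a hypothetical sample string:

```python
# Illustrative sketch of y/abc/ABC/: each character in the first set is
# replaced by the character at the same position in the second set.
table = str.maketrans("abc", "ABC")

event = "new cookie: crmid=clientabc"  # hypothetical sample text
print(event.translate(table))
# new Cookie: Crmid=ClientABC
```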
Caveats for anonymizing data
Splunk Enterprise does not parse structured data that has been forwarded to an indexer
When you forward structured data to an indexer, Splunk Enterprise does not parse this data once it arrives at the indexer, even if you have configured props.conf on that indexer with INDEXED_EXTRACTIONS. Forwarded data skips the following queues on the indexer, which precludes any parsing of that data on the indexer:
- parsing
- aggregation
- typing
The forwarded data must arrive at the indexer already parsed. To achieve this, you must also set up props.conf on the forwarder that sends the data. This includes configuration of INDEXED_EXTRACTIONS and any other parsing, filtering, anonymizing, and routing rules. Universal forwarders can perform these tasks only for structured data. See "Forward data extracted from structured data files".
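For example, a forwarder-side props.conf stanza that both parses a structured file and masks a field might look like the following sketch. The source path, the csv extraction, and the ssn field are illustrative assumptions, not taken from the example above:

```ini
[source::.../accounts.csv]
INDEXED_EXTRACTIONS = csv
SEDCMD-ssn = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g
```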
This documentation applies to the following versions of Splunk® Enterprise: 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.1.14, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.2.15