
Anonymize data
This topic discusses how to anonymize data you are sending to your Splunk deployment, such as credit card and Social Security numbers.
You might want to mask sensitive personal data when indexing log events. Credit card numbers and Social Security numbers are two examples of data that you might not want to appear in an index. This topic describes how to mask parts of confidential fields to protect privacy while providing enough remaining data for use in tracking events.
You can anonymize data in two ways:
- Through a regular expression (regex) transform.
- Through a sed script.
Configure your forwarder as described in this topic.
Anonymize data with a regular expression transform
You can configure transforms.conf to mask data by means of regular expressions. This example masks all but the last four characters of the SessionId and Ticket number fields in an application server log.
Here is an example of the desired output:
SessionId=########7BEA&Ticket=########96EE
A sample input:
"2006-09-21, 02:57:11.58", 122, 11, "Path=/LoginUser Query=CrmId=ClientABC& ContentItemId=TotalAccess&SessionId=3A1785URH117BEA&Ticket=646A1DA4STF896EE& SessionTime=25368&ReturnUrl=http://www.clientabc.com, Method=GET,IP=209.51.249.195, Content=", ""
"2006-09-21, 02:57:11.60", 122, 15, "UserData:<User CrmId="clientabc" UserId="p12345678"><EntitlementList></EntitlementList></User>", ""
"2006-09-21, 02:57:11.60", 122, 15, "New Cookie: SessionId=3A1785URH117BEA& Ticket=646A1DA4STF896EE&CrmId=clientabcUserId=p12345678&AccountId=&AgentHost=man& AgentId=man, MANUser: Version=1&Name=&Debit=&Credit=&AccessTime=&BillDay=&Status= &Language=&Country=&Email=&EmailNotify=&Pin=&PinPayment=&PinAmount=&PinPG= &PinPGRate=&PinMenu=&", ""
To mask the data, modify the props.conf and transforms.conf files in your $SPLUNK_HOME/etc/system/local/ directory.
Configure props.conf
1. Edit $SPLUNK_HOME/etc/system/local/props.conf and add the following stanza:
[<spec>]
TRANSFORMS-anonymize = session-anonymizer, ticket-anonymizer
In this stanza, <spec> must be one of the following:
- <sourcetype>, the source type of an event.
- host::<host>, where <host> is the host of an event.
- source::<source>, where <source> is the source of an event.
In this example, session-anonymizer and ticket-anonymizer are arbitrary TRANSFORMS class names whose actions you define in stanzas in a corresponding transforms.conf file. Use the class names you create in transforms.conf.
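For instance, if the application server events above carried a hypothetical source type named application_log, the props.conf stanza would look like this (the source type name is an assumption for illustration only):

```
[application_log]
TRANSFORMS-anonymize = session-anonymizer, ticket-anonymizer
```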
Configure transforms.conf
2. In $SPLUNK_HOME/etc/system/local/transforms.conf, add your transforms:
[session-anonymizer]
REGEX = (?m)^(.*)SessionId=\w+(\w{4}[&"].*)$
FORMAT = $1SessionId=########$2
DEST_KEY = _raw

[ticket-anonymizer]
REGEX = (?m)^(.*)Ticket=\w+(\w{4}&.*)$
FORMAT = $1Ticket=########$2
DEST_KEY = _raw
In these transforms:
- REGEX specifies the regular expression that matches the string in the event you want to anonymize.
- FORMAT specifies the masked values. $1 is all the text leading up to the regular expression match and $2 is all the text of the event after it.
- DEST_KEY = _raw specifies to write the value from FORMAT to the raw value in the log, thus modifying the event.
Note: The regular expression processor does not handle multiline events. As a workaround, specify that the event is multiline by placing (?m) before the regular expression in transforms.conf.
Anonymize data through a sed script
You can also anonymize data by using a sed script to replace or substitute strings in events.
Most UNIX users are familiar with sed, a utility that reads a file and modifies the input as specified by a list of commands. Splunk Enterprise lets you use sed-like syntax in props.conf to anonymize your data.
Define the sed script in props.conf
1. Edit or create a copy of props.conf in $SPLUNK_HOME/etc/system/local. Create a props.conf stanza that uses SEDCMD to indicate a sed script:
[<spec>]
SEDCMD-<class> = <sed script>
In this stanza, <spec> must be one of the following:
- <sourcetype>, the source type of an event.
- host::<host>, where <host> is the host of an event.
- source::<source>, where <source> is the source of an event.
The sed script applies only to the _raw field at index time. The following subset of sed commands is supported:
- replace (s)
- character substitution (y)
2. After making changes to props.conf, restart the Splunk instance to enable the configuration.
Replace strings with regular expression match
The syntax for a sed replace is:
SEDCMD-<class> = s/<regex>/<replacement>/flags
In this syntax:
- <regex> is a Perl regular expression.
- <replacement> is a string to replace the regular expression match. It uses "\n" for backreferences, where n is a single digit.
- flags can be either "g" to replace all matches or a number to replace a specific match.
Example
In the following example, you want to index data containing Social Security and credit card numbers. At index time, you want to mask these values so that only the last four digits are present in your events. Your props.conf stanza might look like this:
[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g
In your accounts events, Social Security numbers appear as ssn=xxxxx6789 and credit card numbers appear as cc=xxxx-xxxx-xxxx-1234.
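These two s-commands behave like regex substitutions, which can be sketched with Python's re module. This is an illustrative stand-in for Splunk's sed-like processor, run against a made-up sample event:

```python
import re

# Hypothetical sample event containing an SSN and a credit card number.
event = "login ssn=123456789 cc=4111-2222-3333-1234 status=ok"

# s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g : keep only the last four SSN digits.
event = re.sub(r'ssn=\d{5}(\d{4})', r'ssn=xxxxx\1', event)
# s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g : keep the last card group.
event = re.sub(r'cc=(\d{4}-){3}(\d{4})', r'cc=xxxx-xxxx-xxxx-\2', event)

print(event)  # login ssn=xxxxx6789 cc=xxxx-xxxx-xxxx-1234 status=ok
```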
Substitute characters
The syntax for a sed character substitution is:
SEDCMD-<class> = y/<string1>/<string2>/
This substitutes each occurrence of the characters in string1 with the characters in string2.
Example
You have a file you want to index, abc.log, and you want to substitute the capital letters "A", "B", and "C" for every lowercase "a", "b", or "c" in your events. Add the following to your props.conf:
[source::.../abc.log]
SEDCMD-abc = y/abc/ABC/
When you search for source="*/abc.log", you should not find the lowercase letters "a", "b", and "c" in your data. Splunk Enterprise substituted "A" for each "a", "B" for each "b", and "C" for each "c".
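The y command is a character-for-character mapping, equivalent to str.translate in Python. This sketch reproduces the substitution outside Splunk on a made-up string:

```python
# Build a translation table mapping a->A, b->B, c->C,
# the same mapping as SEDCMD-abc = y/abc/ABC/.
table = str.maketrans("abc", "ABC")

event = "abc 123 cba"            # hypothetical sample event text
masked = event.translate(table)  # apply the character substitution

print(masked)  # ABC 123 CBA
```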
Caveats for anonymizing data
Splunk indexers do not parse structured data
When you forward structured data to an indexer, the indexer does not parse it, even if you have configured props.conf on that indexer with INDEXED_EXTRACTIONS. Forwarded data skips the following queues on the indexer, which precludes any parsing of that data on the indexer:
- parsing
- aggregation
- typing
The forwarded data must arrive at the indexer already parsed. To achieve this, you must also set up props.conf on the forwarder that sends the data. This includes configuration of INDEXED_EXTRACTIONS and any other parsing, filtering, anonymizing, and routing rules.
Universal forwarders are capable of performing these tasks solely for structured data. See Forward data extracted from structured data files.
This documentation applies to the following versions of Splunk Cloud™: 7.2.4, 7.2.6, 7.2.7, 7.2.8, 7.2.9