Anonymize data with sed
This utility lets you anonymize your data at index time by using a sed script to replace or substitute strings in it.
Most Unix users are familiar with sed, a utility that reads a file and modifies the input as specified by a list of commands. You can use sed-like syntax in props.conf to anonymize your data.
Note: Edit or create a copy of props.conf in $SPLUNK_HOME/etc/system/local.
Define the sed script in props.conf
In a props.conf stanza, use SEDCMD to indicate a sed script:
[<stanza_name>]
SEDCMD-<class> = <sed script>
The stanza_name is restricted to the host, source, or sourcetype that you want to modify with your anonymization or transform.
The sed script applies only to the _raw field at index time. Splunk currently supports the following subset of sed commands: replace (s) and character substitution (y).
Note: You need to restart Splunk for the changes you made to props.conf to take effect.
Replace strings with regex match
The syntax for a sed replace is:
SEDCMD-<class> = s/<regex>/<replacement>/flags
- regex is a Perl regular expression.
- replacement is a string to replace the regex match. It uses "\n" for back-references, where n is a single digit.
- flags can be either "g" to replace all matches, or a number to replace a specified match.
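The difference between the "g" flag and a numeric flag can be sketched outside Splunk. The following Python snippet is only an emulation under stated assumptions, not Splunk's implementation: the function name sed_substitute and the sample text are hypothetical, and Python's re module stands in for the Perl regular expression engine.

```python
import re

def sed_substitute(text, pattern, replacement, flag):
    """Emulate s/<pattern>/<replacement>/<flag> on one string.

    flag is "g" (replace every match) or a number given as a string,
    such as "2" (replace only that match, counting from 1).
    Hypothetical helper for illustration only.
    """
    regex = re.compile(pattern)
    if flag == "g":
        return regex.sub(replacement, text)

    target = int(flag)
    seen = 0

    def replace_nth(match):
        nonlocal seen
        seen += 1
        # Expand back-references such as \1 only for the targeted match;
        # leave every other match unchanged.
        return match.expand(replacement) if seen == target else match.group(0)

    return regex.sub(replace_nth, text)

sample = "id=111 id=222 id=333"  # made-up sample text
print(sed_substitute(sample, r"id=\d+", "id=xxx", "g"))  # id=xxx id=xxx id=xxx
print(sed_substitute(sample, r"id=\d+", "id=xxx", "2"))  # id=111 id=xxx id=333
```
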
Example
Let's say you want to index data containing social security numbers and credit card numbers. At index time, you want to mask these values so that only the last four digits are evident in your events. Your props.conf stanza may look like this:
[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g
Now, in your accounts events, social security numbers appear as ssn=xxxxx6789 and credit card numbers appear as cc=xxxx-xxxx-xxxx-1234.
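To check the two expressions before you add them to props.conf, you can try them against a sample event outside Splunk. The sketch below is an approximation only: it uses Python's re module rather than Splunk's regex engine, and the function name mask_event and the sample event are made up for illustration.

```python
import re

# The two substitutions from the SEDCMD-accounts setting, applied in order.
SUBSTITUTIONS = [
    (r"ssn=\d{5}(\d{4})", r"ssn=xxxxx\1"),
    (r"cc=(\d{4}-){3}(\d{4})", r"cc=xxxx-xxxx-xxxx-\2"),
]

def mask_event(raw):
    """Apply each substitution to every match, like the "g" flag."""
    for pattern, replacement in SUBSTITUTIONS:
        raw = re.sub(pattern, replacement, raw)
    return raw

# Hypothetical event text, not taken from a real accounts.log.
event = "user=jdoe ssn=123456789 cc=1234-5678-9012-3456"
print(mask_event(event))
# user=jdoe ssn=xxxxx6789 cc=xxxx-xxxx-xxxx-3456
```

Note that the credit card replacement writes three masked groups and then the fourth captured group (\2), so the masked value has four groups in total.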
Substitute characters
The syntax for a sed character substitution is:
SEDCMD-<class> = y/<string1>/<string2>/
which substitutes each occurrence of the characters in string1 with the characters in string2.
Example
Let's say you have a file you want to index, abc.log, and you want to substitute the capital letters "A", "B", and "C" for every lowercase "a", "b", or "c" in your events. Add the following to your props.conf:
[source::.../abc.log]
SEDCMD-abc = y/abc/ABC/
Now, if you search for source="*/abc.log", you should not find the lowercase letters "a", "b", and "c" in your data at all. Splunk substituted "A" for each "a", "B" for each "b", and "C" for each "c".
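The character-by-character mapping that y performs can be illustrated with Python's str.translate. This is a sketch only, not Splunk's implementation: the function name sed_translate and the sample text are hypothetical, and the equal-length check follows standard sed behavior, which is assumed here to carry over.

```python
def sed_translate(text, string1, string2):
    """Emulate y/<string1>/<string2>/: map the i-th character of
    string1 to the i-th character of string2 everywhere in text."""
    if len(string1) != len(string2):
        # Standard sed rejects strings of unequal length (assumed here).
        raise ValueError("string1 and string2 must be the same length")
    return text.translate(str.maketrans(string1, string2))

# Made-up sample text; only a, b, and c change.
print(sed_translate("a cab backed up", "abc", "ABC"))
# A CAB BACked up
```

Unlike the s command, y takes no regular expression: each character is mapped independently, and all other characters pass through unchanged.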
This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10, 4.0.11.