Anonymize data with sed

This documentation does not apply to the most recent version of Splunk.

This utility allows you to anonymize your data by replacing or substituting strings in it at index time using a sed script.

Most UNIX users are familiar with sed, a utility that reads a file and modifies the input as specified by a list of commands. You can use sed-like syntax in props.conf to anonymize your data.

Note: Edit or create a copy of props.conf in $SPLUNK_HOME/etc/system/local.

Define the sed script in props.conf

In a props.conf stanza, use SEDCMD to indicate a sed script:

[<stanza_name>]
SEDCMD-<class> = <sed script>

The stanza_name is restricted to the host, source, or sourcetype that you want to modify with your anonymization or transform.

The sed script applies only to the _raw field at index time. Splunk currently supports the following subset of sed commands: replace (s) and character substitution (y).

Note: You need to restart Splunk for the changes you made to props.conf to take effect.

Replace strings with regex match

The syntax for a sed replace is:

SEDCMD-<class> = s/<regex>/<replacement>/flags

Example

Let's say you want to index data containing social security numbers and credit card numbers. At index time, you want to mask these values so that only the last four digits are evident in your events. Your props.conf stanza may look like this:

[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g

Now, in your accounts events, social security numbers appear as ssn=xxxxx6789 and credit card numbers appear as cc=xxxx-xxxx-xxxx-1234.
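To preview what the two replace commands do before you edit props.conf, you can approximate them outside Splunk. The following is a minimal sketch that uses Python's re module as a stand-in for Splunk's regex handling, which is an assumption. The sample event and the function name mask_accounts are made up for illustration and are not part of Splunk.

```python
import re

# Hypothetical sample event; the field values are invented for illustration.
raw_event = "user=jdoe ssn=123456789 cc=1234-5678-9012-3456"


def mask_accounts(raw: str) -> str:
    """Approximate the two s/// commands from the SEDCMD-accounts example."""
    # s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g
    # Keep only the last four digits of the social security number.
    raw = re.sub(r"ssn=\d{5}(\d{4})", r"ssn=xxxxx\1", raw)
    # s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g
    # Group 2 holds the last four digits of the credit card number.
    raw = re.sub(r"cc=(\d{4}-){3}(\d{4})", r"cc=xxxx-xxxx-xxxx-\2", raw)
    return raw


masked = mask_accounts(raw_event)
print(masked)  # user=jdoe ssn=xxxxx6789 cc=xxxx-xxxx-xxxx-3456
```

re.sub replaces every match by default, which plays the role of the g flag here. Note that the credit card replacement uses \2 rather than \1, because the first group captures the repeated four-digit block and the second captures the final four digits.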

Substitute characters

The syntax for a sed character substitution is:

SEDCMD-<class> = y/<string1>/<string2>/

which substitutes each occurrence of the characters in string1 with the characters in string2.

Example

Let's say you have a file you want to index, abc.log, and you want to substitute the capital letters "A", "B", and "C" for every lowercase "a", "b", or "c" in your events. Add the following to your props.conf:

[source::.../abc.log]
SEDCMD-abc = y/abc/ABC/

Now, if you search for source="*/abc.log", you should not find the lowercase letters "a", "b", and "c" in your data at all. Splunk substituted "A" for each "a", "B" for each "b", and "C" for each "c".
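The character substitution is positional: each character in string1 maps to the character at the same position in string2. As a rough illustration outside Splunk, the sketch below mimics y/abc/ABC/ with Python's str.maketrans and str.translate. The sample text and the function name substitute_chars are hypothetical.

```python
# y/abc/ABC/ maps "a" to "A", "b" to "B", and "c" to "C", position by position.
TRANSLATION_TABLE = str.maketrans("abc", "ABC")


def substitute_chars(raw: str) -> str:
    """Approximate the y/abc/ABC/ command from the SEDCMD-abc example."""
    return raw.translate(TRANSLATION_TABLE)


# Hypothetical event text, invented for illustration.
sample = "a quick cab ride back"
print(substitute_chars(sample))  # A quiCk CAB ride BACk
```

Characters that do not appear in string1 pass through unchanged, so only "a", "b", and "c" are affected.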

This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10, 4.0.11.

