Admin Manual

 


Anonymize data samples

Splunk includes an anonymize function. The anonymizer combs through sample log files or event files and replaces identifying data (usernames, IP addresses, domain names, and so on) with fictional values that keep the same word length and event type. For example, it may turn the string user=billg@microsoft.com into user=carol@adalberto.com. This lets Splunk users share log data without revealing confidential or personal information from their networks.


The anonymized file is written to the same directory as the source file, with ANON- prepended to its filename. For example, /tmp/messages will be anonymized as /tmp/ANON-messages.
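The naming rule above can be sketched as a small shell helper. This is only an illustration of the rule as described here; anon_output_path is a hypothetical name, not a Splunk command.

```shell
# anon_output_path is a hypothetical helper, not part of Splunk.
# It prints the path the anonymized copy is described as having:
# same directory as the source file, with ANON- prepended to the filename.
anon_output_path() {
    src="$1"
    dir=$(dirname "$src")
    base=$(basename "$src")
    printf '%s/ANON-%s\n' "$dir" "$base"
}

anon_output_path /tmp/messages
```

For /tmp/messages this prints /tmp/ANON-messages, matching the example above.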


You can anonymize files from Splunk's CLI. To use Splunk's CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command. You can also add Splunk to your path and use the splunk command.
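Adding Splunk to your path might look like the following in a Bourne-style shell. The /opt/splunk default is only a placeholder assumed for this sketch; substitute your actual install location.

```shell
# /opt/splunk is a placeholder default for this sketch, not a documented path.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"
export SPLUNK_HOME

# Prepend the Splunk bin directory so "splunk" resolves without the ./ prefix.
PATH="$SPLUNK_HOME/bin:$PATH"
export PATH
```

To make the change persist across sessions, the same lines can go in your shell's startup file.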


Simple method

The easiest way to anonymize a file is with the anonymizer tool's defaults, as shown in the session below. Note that you currently need to have $SPLUNK_HOME/bin as your current working directory; this will be fixed in an incremental release.


From the CLI, type the following:


# ./splunk anonymize [filename]

For example:

# cp -p /var/log/messages /tmp
# cd $SPLUNK_HOME/bin
# ./splunk anonymize /tmp/messages
Getting timestamp from: /opt/paul207/splunk/lib/python2.4/site-packages/splunk/timestamp.config
Processing files: ['/tmp/messages']
Getting named entities
        Processing /tmp/messages
Adding named entities to list of public terms: Set(['secErrStr', 'MD_SB_DISKS', 'TTY', 'target', 'precision ', 'lpj', 'ip', 'pci', 'hard', 'last bus', 'override with idebus', 'SecKeychainFindGenericPassword err', 'vector', 'USER', 'irq ', 'com  user', 'uid'])
        Processing /tmp/messages for terms.
        Calculating replacements for 4672 terms.
===================================================
Wrote dictionary scrubbed terms with replacements to "/tmp/INFO-mapping.txt"
Wrote suggestions for dictionary to "/tmp/INFO-suggestions.txt"
===================================================
Writing out /tmp/ANON-messages
Done.
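The steps in the session above (copy the log, change to $SPLUNK_HOME/bin, run the anonymizer) could be wrapped in a shell function along these lines. This is a sketch under assumptions: anonymize_copy is a hypothetical name, SPLUNK_HOME must already be set, and the work directory must be given as an absolute path because the command runs from a different directory.

```shell
# anonymize_copy is a hypothetical wrapper, not part of Splunk.
# Usage: anonymize_copy SOURCE_FILE WORK_DIR   (WORK_DIR must be an absolute path)
anonymize_copy() {
    src="$1"
    workdir="$2"
    copy="$workdir/$(basename "$src")"

    # Work on a copy so the original log is left untouched.
    cp -p "$src" "$copy" || return 1

    # Run from $SPLUNK_HOME/bin inside a subshell, so the caller's
    # working directory is unchanged when the function returns.
    ( cd "$SPLUNK_HOME/bin" && ./splunk anonymize "$copy" )
}
```

Calling anonymize_copy /var/log/messages /tmp would then correspond to the three commands in the session above.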

Advanced method

You can customize the anonymizer by telling it what terms to anonymize, what terms to leave alone, and what terms to use as replacements. The advanced form of the command is shown below.


# ./splunk anonymize [filename] [public_terms] [private_terms] [name_terms] [dictionary] [timestamp_config] [branding]
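
Sample entries for the first four term files, in argument order:

public_terms (terms to leave alone):
2003 2004 2005 2006 abort aborted am apr april aug august auth
authorize authorized authorizing bea certificate class com complete

private_terms (terms to anonymize):
481-51-6234
passw0rd

name_terms (names to use as replacements):
charlie
claire
desmond
jack

dictionary:
algol
ansi
arco
arpa
arpanet
ascii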

Output Files

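Splunk's anonymizer function creates three new files in the same directory as the source file: the anonymized copy (ANON- prepended to the source filename), INFO-mapping.txt (the scrubbed terms and their replacements), and INFO-suggestions.txt (suggestions for the dictionary).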


Sample contents of INFO-mapping.txt:

Replacement Mappings
--------------------
kb900485 --> LO200231
1718 --> 1608
transitions --> tstymnbkxno
reboot --> SPLUNK
cdrom --> pqyvi

Sample contents of INFO-suggestions.txt:

Terms to consider making private (currently not scrubbed):
['uid', 'pci', 'lpj', 'hard']
Terms to consider making public (currently scrubbed):
['jun', 'security', 'user', 'ariel', 'name', 'logon', 'for', 'process', 'domain', 'audit']
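
Because replacements are meant to keep the same word length as the original term, a mapping file can be spot-checked with a short awk script. The sketch below assumes the file uses the "original --> replacement" line format shown in the Replacement Mappings sample above; check_mapping_lengths is a hypothetical helper name, not a Splunk command.

```shell
# check_mapping_lengths is a hypothetical helper, not part of Splunk.
# It reads lines of the form "original --> replacement" and reports any
# pair whose replacement differs in length from the original term.
# Lines without the " --> " separator (such as the header) are ignored.
check_mapping_lengths() {
    awk -F' --> ' '
        NF == 2 && length($1) != length($2) {
            printf "length mismatch: %s --> %s\n", $1, $2
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Running check_mapping_lengths /tmp/INFO-mapping.txt prints nothing and returns success when every pair matches in length; otherwise it lists the mismatched pairs and returns a non-zero status.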

This documentation applies to the following versions of Splunk: 3.2, 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5, 3.2.6.

