Anonymize data samples to send to Support
Splunk contains an anonymize function. The anonymizer combs through sample log files or event files and replaces identifying data (usernames, IP addresses, domain names, and so on) with fictional values that keep the same word length and event type. For example, it may turn the string user=carol@adalberto.com into user=plums@wonderful.com. This lets Splunk users share log data without revealing confidential or personal information from their networks.
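If replacements really are length-preserving, every line of the anonymized file should be exactly as long as the matching line of the original. The following is a minimal sketch of such a sanity check, not part of Splunk; the function name same_line_lengths is hypothetical.

```shell
# Hypothetical check: succeed only if both files have the same number of
# lines and each pair of corresponding lines has the same length.
same_line_lengths() {
    a=$(awk '{ print length($0) }' "$1")
    b=$(awk '{ print length($0) }' "$2")
    [ "$a" = "$b" ]
}
```

For example, same_line_lengths /tmp/messages /tmp/ANON-messages returns success when the two files have the same shape. It says nothing about whether the right terms were scrubbed.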
The anonymized file is written to the same directory as the source file, with ANON- prepended to its filename. For example, /tmp/messages will be anonymized as /tmp/ANON-messages.
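The naming rule above can be sketched as a small shell helper, which is handy when a script needs to know where the output will land. The function name anon_path is hypothetical and is not a Splunk command.

```shell
# Hypothetical helper: print the path the anonymizer is documented to
# write, that is, the source directory with ANON- prepended to the filename.
anon_path() {
    src=$1
    printf '%s/ANON-%s\n' "$(dirname "$src")" "$(basename "$src")"
}
```

Called as anon_path /tmp/messages, it prints /tmp/ANON-messages.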
You can anonymize files from Splunk's CLI. To use Splunk's CLI, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command.
Simple method
The easiest way to anonymize a file is with the anonymizer tool's defaults, as shown in the session below. Note that you currently need to have $SPLUNK_HOME/bin as your current working directory; this will be fixed in an incremental release.
From the CLI, while you're in $SPLUNK_HOME/bin, type the following:
> ./splunk anonymize file -source </path/to/filename>
It's good practice to copy the file somewhere safe (like /tmp) first and anonymize the copy. For example:
> cp -p /var/log/messages /tmp
> cd $SPLUNK_HOME/bin
> ./splunk anonymize file -source /tmp/messages
Processing files: ['/tmp/messages']
Getting named entities
Processing /tmp/messages
Adding named entities to list of public terms: Set(['secErrStr', 'MD_SB_DISKS', 'TTY', 'target', 'precision ', 'lpj', 'ip', 'pci', 'hard', 'last bus', 'override with idebus', 'SecKeychainFindGenericPassword err', 'vector', 'USER', 'irq ', 'com user', 'uid'])
Processing /tmp/messages for terms.
Calculating replacements for 4672 terms.
===================================================
Wrote dictionary scrubbed terms with replacements to "/tmp/INFO-mapping.txt"
Wrote suggestions for dictionary to "/tmp/INFO-suggestions.txt"
===================================================
Writing out /tmp/ANON-messages
Done.
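The session above can be wrapped in a small function for repeated use. This is a sketch under stated assumptions: anonymize_copy is a hypothetical name, SPLUNK_HOME must be set, and the scratch directory must be an absolute path because the command runs from $SPLUNK_HOME/bin.

```shell
# Hypothetical wrapper: copy a log to a scratch directory, anonymize the
# copy, and print the path of the ANON- file if it was written.
anonymize_copy() {
    src=$1
    workdir=${2:-/tmp}
    cp -p "$src" "$workdir"/ || return 1
    copy=$workdir/$(basename "$src")
    # Run from $SPLUNK_HOME/bin; the subshell leaves the caller's
    # working directory unchanged.
    ( cd "$SPLUNK_HOME/bin" && ./splunk anonymize file -source "$copy" ) || return 1
    out=$workdir/ANON-$(basename "$src")
    [ -f "$out" ] && printf '%s\n' "$out"
}
```

For example, anonymize_copy /var/log/messages /tmp mirrors the session shown above and prints /tmp/ANON-messages on success.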
Advanced method
You can customize the anonymizer by telling it what terms to anonymize, what terms to leave alone, and what terms to use as replacements. The advanced form of the command is:
./splunk anonymize file -source <filename> [-public_terms <file>] [-private_terms <file>] [-name_terms <file>] [-dictionary <file>] [-timestamp_config <file>]
- filename
  - Default: None
  - Path and name of the file to anonymize.
- public_terms
  - Default: $SPLUNK_HOME/etc/anonymizer/public-terms.txt
  - A list of locally-used words that will not be anonymized if they are in the file. It serves as an appendix to the dictionary file.
  - Here is a sample entry:
2003 2004 2005 2006 abort aborted am apr april aug august auth authorize authorized authorizing bea certificate class com complete
- private_terms
  - Default: $SPLUNK_HOME/etc/anonymizer/private-terms.txt
  - A list of words that will be anonymized if found in the file, because they may denote confidential information.
  - Here is a sample entry:
401-51-6244 passw0rd
- name_terms
  - Default: $SPLUNK_HOME/etc/anonymizer/names.txt
  - A global list of common English personal names that Splunk uses to replace anonymized words.
  - Splunk always replaces a word with a name of the exact same length, to keep each event's data pattern the same.
  - Splunk uses each name in name_terms once to replace a character string of equal length throughout the file. After it runs out of names, it begins using randomized character strings, but still mapping each replaced pattern to one anonymized string.
  - Here is a sample entry:
charlie claire desmond jack
- dictionary
  - Default: $SPLUNK_HOME/etc/anonymizer/dictionary.txt
  - A global list of common words that will not be anonymized, unless overridden by entries in the private_terms file.
  - Here is a sample entry:
algol ansi arco arpa arpanet ascii
- timestamp_config
  - Default: $SPLUNK_HOME/etc/anonymizer/anonymizer-time.ini
  - Splunk's built-in file that determines how timestamps are parsed.
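The optional arguments above can be assembled in a wrapper that passes an override only when you supply one, so the documented defaults apply otherwise. This is a sketch, not a Splunk feature: the function name and the ANON_* environment variables are hypothetical, and the override paths should be absolute because the command runs from $SPLUNK_HOME/bin.

```shell
# Hypothetical wrapper around the advanced command form. A flag is added
# only when the matching ANON_* variable names a readable file.
anonymize_with_overrides() {
    src=$1
    set -- anonymize file -source "$src"
    [ -r "${ANON_PUBLIC_TERMS:-}" ]     && set -- "$@" -public_terms "$ANON_PUBLIC_TERMS"
    [ -r "${ANON_PRIVATE_TERMS:-}" ]    && set -- "$@" -private_terms "$ANON_PRIVATE_TERMS"
    [ -r "${ANON_NAME_TERMS:-}" ]       && set -- "$@" -name_terms "$ANON_NAME_TERMS"
    [ -r "${ANON_DICTIONARY:-}" ]       && set -- "$@" -dictionary "$ANON_DICTIONARY"
    [ -r "${ANON_TIMESTAMP_CONFIG:-}" ] && set -- "$@" -timestamp_config "$ANON_TIMESTAMP_CONFIG"
    # Run from $SPLUNK_HOME/bin without changing the caller's directory.
    ( cd "$SPLUNK_HOME/bin" && ./splunk "$@" )
}
```

For example, setting ANON_PRIVATE_TERMS to the path of your own terms file and calling anonymize_with_overrides /tmp/messages runs the anonymizer with -private_terms and leaves the other lists at their defaults.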
Output Files
Splunk's anonymizer function will create three new files in the same directory as the source file.
- ANON-filename
  - The anonymized version of the source file.
- INFO-mapping.txt
  - This file contains a list of which terms were anonymized into which strings.
  - Here is a sample entry:
Replacement Mappings
--------------------
kb900485 --> LO200231
1718 --> 1608
transitions --> tstymnbkxno
reboot --> SPLUNK
cdrom --> pqyvi
- INFO-suggestions.txt
  - A report of terms found in the file that, based on their appearance and frequency, you may want to add to private-terms.txt or to public-terms.txt for more accurate anonymization of your local data.
  - Here is a sample entry:
Terms to consider making private (currently not scrubbed): ['uid', 'pci', 'lpj', 'hard']
Terms to consider making public (currently scrubbed): ['jun', 'security', 'user', 'ariel', 'name', 'logon', 'for', 'process', 'domain', 'audit']
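To check what a particular term was replaced with, you can search the mapping file. The sketch below assumes INFO-mapping.txt holds one mapping per line in the form "original --> replacement", as the sample entry suggests; the function name lookup_replacement is hypothetical.

```shell
# Hypothetical helper: print the replacement recorded for a term.
# Exits non-zero if the term has no entry in the mapping file.
lookup_replacement() {
    term=$1
    mapping=$2
    # Appending "" forces a string comparison, so numeric-looking terms
    # such as 1718 are matched literally.
    awk -v t="$term" -F ' --> ' '
        ($1 "") == (t "") { print $2; found = 1 }
        END { exit !found }
    ' "$mapping"
}
```

For example, with the sample mapping shown above, lookup_replacement reboot /tmp/INFO-mapping.txt prints SPLUNK. Because the mapping file reveals the original terms, keep it private.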
Linux tip: Anonymize all log files from a diag at once
Follow these steps to generate a diag and then anonymize its logs.
1. Generate the diag:
cd $SPLUNK_HOME/bin
./splunk diag
2. Uncompress the diag.
tar xfz my-diag-hostname.tar.gz
3. Run anonymize on each file of the diag. For example, a Linux script for this would be:
cd $SPLUNK_HOME/bin
find pathtomyuncompresseddiag/ -name '*.log*' | xargs -I{} ./splunk anonymize file -source '{}'
4. Delete the mapping files and the original, non-anonymized log files. For example:
find pathtomyuncompresseddiag/ -name INFO-mapping.txt | xargs rm -rf
rm pathtomyuncompresseddiag/log/splunkd.log*
rm pathtomyuncompresseddiag/log/metrics.log*
5. Compress the diag.
tar cfz my-diag-hostname.tar.gz pathtomyuncompresseddiag
6. Upload the diag, adding it to the Support case.
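Steps 3 and 4 can be combined into one function for a diag that is already uncompressed. This is a sketch under assumptions, not a Splunk tool: anonymize_diag_logs is a hypothetical name, the diag path must be absolute because the command runs from $SPLUNK_HOME/bin, and it removes every original file matching *.log* (not only the two named in step 4) once its ANON- counterpart exists.

```shell
# Hypothetical helper for steps 3 and 4 of the procedure above.
anonymize_diag_logs() {
    diag=$1
    # Step 3: anonymize each *.log* file, skipping ANON- output files.
    # An original is deleted only after its ANON- counterpart is present.
    ( cd "$SPLUNK_HOME/bin" || exit 1
      find "$diag" -type f -name '*.log*' ! -name 'ANON-*' -exec sh -c '
          rc=0
          for f do
              ./splunk anonymize file -source "$f" &&
                  [ -f "$(dirname "$f")/ANON-$(basename "$f")" ] &&
                  rm -f "$f" || rc=1
          done
          exit "$rc"' sh {} +
    ) || return 1
    # Step 4: remove the mapping files, which list the original terms.
    find "$diag" -type f -name 'INFO-mapping.txt' -exec rm -f {} +
}
```

For example, anonymize_diag_logs /absolute/path/to/pathtomyuncompresseddiag leaves only ANON- versions of the logs, after which you can compress the directory as in step 5. Review the result before uploading it.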
This documentation applies to the following versions of Splunk: 4.2.3, 4.2.4, 4.2.5, 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 5.0, 5.0.1, 5.0.2