Anonymize data samples to send to Support
Splunk Enterprise has a few methods to anonymize data in files you send to Support. This lets Splunk Enterprise users share log data without revealing confidential or personal information from their networks.
Diag by default removes some types of sensitive information from search strings in diag files. Read about configuring search string redaction in server.conf.spec.
The anonymize function combs through sample log files or event files to replace identifying data, such as usernames, IP addresses, and domain names, with fictional values that maintain the same word length and event type. For example, it might turn the string user=carol@adalberto.com into user=plums@wonderful.com.
The anonymized file is written to the same directory as the source file, with ANON- prepended to its filename. For example, /tmp/messages will be anonymized as /tmp/ANON-messages. In Windows, a file \temp\messages becomes \temp\ANON-messages.
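The naming rule can be illustrated with a short shell sketch. The helper name `anon_path` is hypothetical and is not part of Splunk Enterprise; it only computes the path that the rule above implies for a given source file.

```shell
#!/bin/sh
# anon_path is a hypothetical helper, not a Splunk command.
# It prints the path implied by the naming rule: same directory as the
# source file, with "ANON-" prepended to the file name.
anon_path() {
    src=$1
    dir=$(dirname "$src")
    base=$(basename "$src")
    # Avoid a doubled slash for files that sit directly under "/".
    if [ "$dir" = "/" ]; then
        dir=""
    fi
    printf '%s/ANON-%s\n' "$dir" "$base"
}

anon_path /tmp/messages
```

A helper like this can be handy in scripts that need to locate the anonymized copy after the tool has run.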
Anonymize is controlled from the Splunk Enterprise CLI. See About the CLI for instructions on accessing the Splunk Enterprise CLI.
Simple method
The easiest way to anonymize a file is with the anonymizer tool's defaults, as shown in the session below.
From the CLI while you are in $SPLUNK_HOME/bin or %SPLUNK_HOME%\bin, type the following:
Unix/Linux | Windows |
---|---|
./splunk anonymize file -source </path/to/filename> | .\splunk anonymize file -source <\path\to\filename> |
It is good practice to copy the file somewhere safe (like /tmp or \temp) before performing this command.
Unix/Linux example:
> cp -p /var/log/messages /tmp
> cd $SPLUNK_HOME/bin
> ./splunk anonymize file -source /tmp/messages
Processing files: ['/tmp/messages']
Getting named entities
Processing /tmp/messages
Adding named entities to list of public terms: Set(['secErrStr', 'MD_SB_DISKS', 'TTY', 'target', 'precision ', 'lpj', 'ip', 'pci', 'hard', 'last bus', 'override with idebus', 'SecKeychainFindGenericPassword err', 'vector', 'USER', 'irq ', 'com user', 'uid'])
Processing /tmp/messages for terms.
Calculating replacements for 4672 terms.
===================================================
Wrote dictionary scrubbed terms with replacements to "/tmp/INFO-mapping.txt"
Wrote suggestions for dictionary to "/tmp/INFO-suggestions.txt"
===================================================
Writing out /tmp/ANON-messages
Done.
Windows example:
C:\>xcopy c:\apache\apache.error.log c:\temp
C:\apache\apache.error.log
1 File(s) copied
C:\>cd \program files\Splunk\bin
C:\Program Files\Splunk\bin>.\splunk anonymize file -source c:\temp\apache.error.log
Processing files: ['c:\\temp\\apache.error.log']
Getting named entities
Processing c:\temp\apache.error.log
Adding named entities to list of public terms: set([])
Processing c:\temp\apache.error.log for terms.
Calculating replacements for 44 terms.
===================================================
Wrote dictionary scrubbed terms with replacements to "c:\temp\INFO-mapping.txt"
Wrote suggestions for dictionary to "c:\temp\INFO-suggestions.txt"
===================================================
Writing out c:\temp\ANON-apache.error.log
Done.
Advanced method
You can customize the anonymizer by telling it what terms to anonymize, what terms to leave alone, and what terms to use as replacements.
On *nix:
./splunk anonymize file -source <filename> [-public_terms <file>] [-private_terms <file>] [-name_terms <file>] [-dictionary <file>] [-timestamp_config <file>]
On Windows:
.\splunk anonymize file -source <filename> [-public_terms <file>] [-private_terms <file>] [-name_terms <file>] [-dictionary <file>] [-timestamp_config <file>]
On both Windows and *nix, the parameters are defined as follows. Only filename is required; the parameters shown in square brackets are optional.
filename
- Default: None
- Path and name of the file to anonymize.
public_terms
- Default: $SPLUNK_HOME/etc/anonymizer/public-terms.txt or %SPLUNK_HOME%\etc\anonymizer\public-terms.txt
- A list of locally used words that will not be anonymized if they are in the file. It serves as an appendix to the dictionary file.
- Here is a sample entry:
2003 2004 2005 2006 abort aborted am apr april aug august auth authorize authorized authorizing bea certificate class com complete
private_terms
- Default: $SPLUNK_HOME/etc/anonymizer/private-terms.txt or %SPLUNK_HOME%\etc\anonymizer\private-terms.txt
- A list of words that will be anonymized if found in the file, because they may denote confidential information.
- Here is a sample entry:
401-51-6244 passw0rd
name_terms
- Default: $SPLUNK_HOME/etc/anonymizer/names.txt or %SPLUNK_HOME%\etc\anonymizer\names.txt
- A global list of common English personal names that Splunk software uses to replace anonymized words.
- Anonymize always replaces a word with a name of the exact same length, to keep each event's data pattern the same.
- Anonymize uses each name in name_terms once to replace a character string of equal length throughout the file. After it runs out of names, it begins using randomized character strings, but still maps each replaced pattern to one anonymized string.
- Here is a sample entry:
charlie claire desmond jack
dictionary
- Default: $SPLUNK_HOME/etc/anonymizer/dictionary.txt or %SPLUNK_HOME%\etc\anonymizer\dictionary.txt
- A global list of common words that will not be anonymized, unless overridden by entries in the private_terms file.
- Here is a sample entry:
algol ansi arco arpa arpanet ascii
timestamp_config
- Default: $SPLUNK_HOME/etc/anonymizer/anonymizer-time.ini or %SPLUNK_HOME%\etc\anonymizer\anonymizer-time.ini
- File built into Splunk software that determines how timestamps are parsed.
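As a rough illustration of the optional parameters, the sketch below writes a site-specific private terms file and passes it with -private_terms. Several details are assumptions rather than documented behavior: that the terms file holds one term per line, that /opt/splunk is a reasonable fallback when SPLUNK_HOME is unset, and the example terms themselves, which are made up. The anonymizer is only invoked if the splunk binary and the source file are actually present.

```shell
#!/bin/sh
# Sketch only: the terms below are invented, and the one-term-per-line
# layout is an assumption. Compare with the default private-terms.txt
# shipped in your installation before relying on it.
terms_file=$(mktemp)
printf '%s\n' 'passw0rd' 'acme-internal' > "$terms_file"

# /opt/splunk is an assumed fallback, not a documented default.
splunk_bin="${SPLUNK_HOME:-/opt/splunk}/bin/splunk"
source_file=/tmp/messages

# Run the anonymizer only when both the binary and the source file exist.
if [ -x "$splunk_bin" ] && [ -f "$source_file" ]; then
    "$splunk_bin" anonymize file \
        -source "$source_file" \
        -private_terms "$terms_file"
fi
```

This page does not say whether a custom file replaces or supplements the default list, so a cautious approach is to start from a copy of the default file and append your own terms to it.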
Output files
The anonymizer creates three new files in the same directory as the source file.
ANON-filename
- The anonymized version of the source file.
INFO-mapping.txt
- This file contains a list of which terms were anonymized into which strings.
- Here is a sample entry:
Replacement Mappings
--------------------
kb900485 --> LO200231
1718 --> 1608
transitions --> tstymnbkxno
reboot --> SPLUNK
cdrom --> pqyvi
INFO-suggestions.txt
- A report of terms found in the file that, based on their appearance and frequency, you may want to add to public-terms.txt or to private-terms.txt for more accurate anonymization of your local data.
- Here is a sample entry:
Terms to consider making private (currently not scrubbed): ['uid', 'pci', 'lpj', 'hard']
Terms to consider making public (currently scrubbed): ['jun', 'security', 'user', 'ariel', 'name', 'logon', 'for', 'process', 'domain', 'audit']
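Before sending an anonymized file, it can be worth checking that none of the terms you consider private survived. The sketch below is one way to do that with standard grep. The helper name `leaked_terms` is hypothetical and not part of Splunk Enterprise, and it assumes a terms file with one fixed string per line.

```shell
#!/bin/sh
# leaked_terms is a hypothetical helper, not a Splunk command.
# Arguments: a terms file (one fixed string per line) and an anonymized file.
# It prints each line of the anonymized file that still contains a listed
# term, prefixed with its line number. The exit status is grep's:
# 0 if at least one line matched, 1 if none did.
# Note: a blank line in the terms file matches every line, so keep the
# terms file free of empty lines.
leaked_terms() {
    terms_file=$1
    anon_file=$2
    grep -n -F -f "$terms_file" -- "$anon_file"
}
```

For example, `leaked_terms /tmp/my-terms.txt /tmp/ANON-messages` prints nothing when no listed term remains. Any output is a cue to add the term to your private terms file and run the anonymizer again.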
Linux tip: Anonymize all log files from a diag at once
Here are the steps to generate a diagnostic (diag file) and then anonymize the logs of that diag.
1. Generate the diag. For example:
cd $SPLUNK_HOME/bin
./splunk diag --exclude "*/passwd"
2. Uncompress the diag. For example:
cd <path_to_uncompressed_diag>/
tar xfz my-diag-hostname.tar.gz
3. Run anonymize on each file of the diag. If you run this command for all *.log files, make note of the log files that now have the ANON- prefix. Because step 2 changes the working directory, call splunk by its full path. For example:
find <absolute_path_to_uncompressed_diag>/ -name '*.log*' | xargs -I{} "$SPLUNK_HOME/bin/splunk" anonymize file -source '{}'
4. Keep the files that now have the ANON- prefix, and delete the non-anonymized versions from the diag directory.
5. Compress the diag. For example:
tar cfz my-diag-hostname.tar.gz <path_to_uncompressed_diag>
6. Upload the diag to the Support case with the ADD FILE button in the case.
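Step 4 above can be sketched as a small shell function. The name `prune_originals` is hypothetical and not part of Splunk Enterprise. The sketch assumes file names without embedded newlines, and it only deletes a file when an ANON- counterpart exists next to it.

```shell
#!/bin/sh
# prune_originals is a hypothetical helper, not a Splunk command.
# For every ANON-<name> file under the given directory, it deletes the
# sibling <name> file (the non-anonymized original) if one exists.
# Files without an ANON- counterpart are left untouched.
prune_originals() {
    diag_dir=$1
    find "$diag_dir" -type f -name 'ANON-*' | while IFS= read -r anon; do
        dir=$(dirname "$anon")
        base=$(basename "$anon")
        orig="$dir/${base#ANON-}"
        if [ -f "$orig" ]; then
            rm -- "$orig"
        fi
    done
}
```

You would call it as `prune_originals <absolute_path_to_uncompressed_diag>` before compressing the diag again. The sketch does not touch INFO-mapping.txt. Because that file lists original terms next to their replacements, you may want to review or remove it before you upload.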
This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.1.7, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.2.4, 9.3.0, 9.3.1, 9.3.2, 9.4.0