Train Splunk's source type autoclassifier
This documentation does not apply to the most recent version of Splunk.
Use these instructions to train Splunk to recognize a new source type, or to give it new samples so that it better recognizes a pre-trained source type. Autoclassification training enables Splunk to classify future event data with similar patterns as a specific source type. This is useful when Splunk indexes directories that contain data with a mix of source types (such as /var/log). Splunk ships "pre-trained," with the ability to assign sourcetype=syslog to most syslog files.
Note: Keep in mind that source type autoclassification training applies to future event data, not event data that has already been indexed.
You can also bypass autoclassification in favor of hardcoded configurations: override the source type for an input, override the source type for a source, or configure rule-based source type assignment.
If Splunk fails to recognize a common format, or applies an incorrect source type value, report the problem to Splunk support and send us a sample file. You can anonymize the file first using Splunk's built-in anonymizer utility.
via the CLI
Here's what you enter to train source types through the CLI:
# splunk train sourcetype $FILE_NAME $SOURCETYPE_NAME
Replace $FILE_NAME with the full path to your file. $SOURCETYPE_NAME is the custom source type you want to create.
It's usually a good idea to train on a few different samples for any new source type so that Splunk learns how varied a source type can be.
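Training on several samples can be scripted by repeating the command above once per file. The following is a minimal sketch, not an official Splunk utility: the function name train_samples and the SPLUNK_BIN variable are hypothetical names introduced here for illustration, and the sketch assumes the splunk executable is on your PATH (or that SPLUNK_BIN points to it).

```shell
#!/bin/sh
# Hypothetical helper (not part of Splunk): train one source type
# on several sample files by running the documented command per file.

# Assumption: "splunk" is on PATH; otherwise set SPLUNK_BIN to its full path.
SPLUNK_BIN="${SPLUNK_BIN:-splunk}"

# Usage: train_samples SOURCETYPE_NAME FILE_NAME [FILE_NAME ...]
train_samples() {
    sourcetype="$1"
    shift
    for sample in "$@"; do
        # Same form as the command shown above:
        #   splunk train sourcetype $FILE_NAME $SOURCETYPE_NAME
        "$SPLUNK_BIN" train sourcetype "$sample" "$sourcetype" || return 1
    done
}
```

For example, train_samples my_app_log /var/log/myapp/app.log /var/log/myapp/app.log.1 would train the (illustrative) source type my_app_log on both files, stopping at the first file for which the command fails.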
This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10, 4.0.11.