Normalize data

Events from different products and vendors are formatted in different ways, even if the events are semantically equivalent. An application's reports and correlation searches are designed to present a unified view across heterogeneous vendor data formats.

Traditional approaches provide this unified view by normalizing the data into a common schema at the time of data collection. Splunk instead uses search-time mappings to a common set of field names and tags, which can be defined at any time after the data has been captured, indexed, and made available for ad hoc search. Normalizing with late-binding searches instead of index-time parsing improves flexibility and reduces the risk of data loss.

Use lookups to normalize data

Lookups normalize event data by replacing field names or values with standardized names and values. A lookup can replace a field value (for example, "severity=med" becomes "severity=medium") or a field name (for example, "sev=high" becomes "severity=high"). The objective of normalization is for equivalent events from different sources or vendors to use the same names and values.
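The two kinds of replacement described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Splunk implements lookups; the table names FIELD_ALIASES and VALUE_LOOKUPS and the function normalize_event are hypothetical, and the only mappings shown are the ones from the examples in the text.

```python
# Hypothetical mapping tables, filled only with the examples from the text.
FIELD_ALIASES = {"sev": "severity"}               # vendor field name -> standard name
VALUE_LOOKUPS = {"severity": {"med": "medium"}}   # standard name -> {raw value: standard value}


def normalize_event(event):
    """Return a copy of the event with standardized field names and values."""
    normalized = {}
    for key, value in event.items():
        # Rename the field first, then look up the value under the new name.
        name = FIELD_ALIASES.get(key, key)
        normalized[name] = VALUE_LOOKUPS.get(name, {}).get(value, value)
    return normalized


print(normalize_event({"sev": "high"}))
print(normalize_event({"severity": "med"}))
```

Names and values that have no entry in either table pass through unchanged, so the mapping can be extended one vendor at a time.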

For example, NetScreen firewalls indicate a dropped packet with an action of Drop while a Cisco firewall may indicate the same event with an action of Deny. Additional firewall vendors may use "dropped", "block", "blocked", and so on.

The CIM requires that the action field for firewalls be either allowed or blocked. Normalizing this field makes it much simpler to report on all firewall data together, as opposed to reporting on it separately by vendor.
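As a rough sketch of how such a lookup table behaves, the following Python loads a small CSV table and maps vendor-specific actions to the two CIM values. The column names vendor_action and action, the table contents, and the function names are assumptions made for illustration; the blocked entries are the vendor values mentioned above, and unmatched values fall back to "unknown", following the convention described later in this topic.

```python
import csv
import io

# Hypothetical lookup table. Column names are illustrative, not a Splunk file.
ACTION_LOOKUP_CSV = """vendor_action,action
drop,blocked
deny,blocked
dropped,blocked
block,blocked
blocked,blocked
allowed,allowed
"""


def load_lookup(text):
    """Read the CSV into a dict keyed by the lowercased vendor action."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["vendor_action"].lower(): row["action"] for row in reader}


ACTION_LOOKUP = load_lookup(ACTION_LOOKUP_CSV)


def normalize_action(vendor_action):
    """Map a vendor action to allowed or blocked, or unknown if unmapped."""
    return ACTION_LOOKUP.get(vendor_action.strip().lower(), "unknown")


for raw in ("Drop", "Deny", "blocked", "teardown"):
    print(raw, "->", normalize_action(raw))
```

Matching is case-insensitive here so that Drop and drop resolve to the same row; whether a real lookup should match that way depends on how the lookup is configured.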

Fields and tags

The Common Information Model is based on the idea that you can recognize and apply two components from most log files and raw data:

fields
field category tags

With these two components, a knowledge manager can set up log files so that Splunk can process them easily, and can normalize noncompliant log files so that they follow a similar schema. The Common Information Model details the standard fields and field category tags that Splunk uses when it processes most IT data.

Normalize the standard event format

Use the following recommended format when events are generated or written to a system:

<timestamp> name="<name>" event_id=<event_id> <key>=<value>

Any number of field key-value pairs are allowed. For example:

2008-11-06 22:29:04 name="Failed Login" event_id=sshd:failure src_ip=10.2.3.4 src_port=12355 dest_ip=192.168.1.35 dest_port=22

The keys are the ones listed under "Standard fields" for each data model. The name and event_id fields are mandatory.
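For a system that writes its own events, the recommended format could be produced with a small helper like the one below. This is a minimal sketch: the function name format_event is hypothetical, and quoting only those values that contain a space is an assumption, since the text does not spell out a quoting rule for arbitrary fields.

```python
def format_event(timestamp, name, event_id, **fields):
    """Build one event line: <timestamp> name="<name>" event_id=<id> <key>=<value>..."""
    # name and event_id are mandatory in the recommended format.
    if not name or not event_id:
        raise ValueError("name and event_id are mandatory")
    parts = [timestamp, f'name="{name}"', f"event_id={event_id}"]
    for key, value in fields.items():
        text = str(value)
        # Assumption: quote a value only when it contains a space.
        if " " in text:
            text = f'"{text}"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


print(format_event(
    "2008-11-06 22:29:04", "Failed Login", "sshd:failure",
    src_ip="10.2.3.4", src_port=12355, dest_ip="192.168.1.35", dest_port=22,
))
```

Keyword arguments keep their order in Python 3.7 and later, so the fields are written in the order they are passed.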

When events from a Cisco PIX log are made compliant with the Common Information Model format, the following PIX event:

Sep  2 15:14:11 10.235.224.193 local4:warn|warning fw07 %PIX-4-106023: Deny icmp src internet:213.208.19.33 dst eservices-test-ses-public:193.8.50.70 (type 8, code 0) by access-group "internet_access_in"

looks like this:

2009-09-02 15:14:11 name="Deny icmp" event_id=106023 vendor=CISCO product=PIX log_level=4 dvc_ip=10.235.224.193 dv_host=fw07 syslog_facility=local4 syslog_priority=warn src_ip=213.208.19.33 dest_ip=193.8.50.70 src_network=internet dest_network=eservices-test-ses-public icmp_type=8 icmp_code=0 protocol=icmp rule_number="internet_access_in"
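The conversion above can be sketched as a single regular expression plus a reordering of the captured fields. This is an illustration only: the pattern is written against the one sample event shown here and is not a general PIX parser, the names PIX_106023 and pix_to_cim are hypothetical, and the year has to be supplied by the caller because the syslog timestamp does not carry one.

```python
import re

MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()

# Pattern fitted to the sample 106023 event above (an assumption, not a spec).
PIX_106023 = re.compile(
    r'^(?P<month>\w{3})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<dvc_ip>\S+) (?P<facility>\w+):(?P<priority>\w+)\|\w+ (?P<dv_host>\S+) '
    r'%PIX-(?P<log_level>\d)-(?P<event_id>\d+): '
    r'(?P<action>\w+) (?P<protocol>\w+) '
    r'src (?P<src_network>[^:]+):(?P<src_ip>\S+) '
    r'dst (?P<dest_network>[^:]+):(?P<dest_ip>\S+) '
    r'\(type (?P<icmp_type>\d+), code (?P<icmp_code>\d+)\) '
    r'by access-group "(?P<rule>[^"]+)"'
)


def pix_to_cim(line, year):
    """Rewrite a matching PIX event in the recommended format, or return None."""
    m = PIX_106023.match(line)
    if m is None:
        return None
    f = m.groupdict()
    month = MONTHS.index(f["month"]) + 1
    timestamp = f"{year:04d}-{month:02d}-{int(f['day']):02d} {f['time']}"
    name = '"' + f["action"] + " " + f["protocol"] + '"'
    pairs = [
        ("event_id", f["event_id"]), ("vendor", "CISCO"), ("product", "PIX"),
        ("log_level", f["log_level"]), ("dvc_ip", f["dvc_ip"]),
        ("dv_host", f["dv_host"]), ("syslog_facility", f["facility"]),
        ("syslog_priority", f["priority"]), ("src_ip", f["src_ip"]),
        ("dest_ip", f["dest_ip"]), ("src_network", f["src_network"]),
        ("dest_network", f["dest_network"]), ("icmp_type", f["icmp_type"]),
        ("icmp_code", f["icmp_code"]), ("protocol", f["protocol"]),
        ("rule_number", '"' + f["rule"] + '"'),
    ]
    return timestamp + " name=" + name + " " + " ".join(k + "=" + v for k, v in pairs)


SAMPLE = (
    'Sep  2 15:14:11 10.235.224.193 local4:warn|warning fw07 '
    '%PIX-4-106023: Deny icmp src internet:213.208.19.33 '
    'dst eservices-test-ses-public:193.8.50.70 (type 8, code 0) '
    'by access-group "internet_access_in"'
)
print(pix_to_cim(SAMPLE, 2009))
```

Note that this sketch rewrites the raw event text, which is the opposite of the search-time approach this topic recommends for data that is already indexed; it is shown only to make the field-by-field correspondence between the two events explicit.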

Data model fields and event category tags

The data model sections in this manual contain lists of standard fields that can be extracted from event data as custom search-time field extractions. Tags for event data are included with each category, if applicable.

We strongly recommend that all of these field extractions be performed at search time. There is no need to add these fields to the set of default fields that Splunk Enterprise extracts at index time.

For more information about the index time/search time distinction, see "Index time versus search time" in the Managing Indexers and Clusters manual. For more information about performing field extractions at search time, see "Data Interpretation: Fields and Field Extraction" in the Knowledge Manager Manual.

Some of these fields have a narrowly defined set of possible values. For example, in most cases an action field can have only two values: success or failure. Most fields, however, have a wide range of possible values. For example, affected_user_id, a six-digit user ID number, has a large number of possible values. Although the set of possible values for a six-digit user ID is finite, you would not try to list all of them.

We have grouped fields together by data models and object categories. In some cases the same field appears in several different categories. This is because the meaning of a field can change depending on the context of the event type it belongs to. For example, in an authentication event, the dest field represents the target of the authentication event (the thing being authenticated). But in an intrusion detection/prevention event, dest usually refers to the destination of the attack detected by the intrusion detection system (the target of the attack).

Note: When Expected values is blank for a field, any value fitting the field's data type can be used.

All fields that expect specific values should also allow a value of "unknown". If the data source does not contain a useful or clear map to an expected value, "unknown" should be used. For example, an action field may expect "allowed" or "blocked", but the data source may sometimes log something that is not understood or does not fit the model. In this case, the action field should be set to "unknown". This is not necessary for blank fields. If the model requires a field and the extracted value is null, a value of "unknown" will be calculated.
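The "unknown" rule can be sketched as a small post-processing step. This is one possible reading of the paragraph above, not Splunk's implementation: the function name apply_unknown and its parameters are hypothetical, and treating an empty string the same as a null value is an assumption.

```python
def apply_unknown(event, expected_values, required_fields):
    """Return a copy of the event with the "unknown" convention applied.

    expected_values: {field: set of expected values} for fields that list them.
    required_fields: fields the model requires to be present.
    """
    result = dict(event)
    # A value outside the expected set becomes "unknown".
    for field, allowed in expected_values.items():
        value = result.get(field)
        if value not in (None, "") and value not in allowed:
            result[field] = "unknown"
    # A required field that is missing or null also becomes "unknown".
    for field in required_fields:
        if result.get(field) in (None, ""):
            result[field] = "unknown"
    return result


expected = {"action": {"allowed", "blocked"}}
print(apply_unknown({"action": "teardown"}, expected, ["action"]))
print(apply_unknown({"action": "blocked"}, expected, ["action", "dest"]))
```

Fields that do not appear in expected_values are left as they are, which matches the note that any value fitting the data type is acceptable when no expected values are listed.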

Category tags correspond to the event categories described in the field tables. If a tag is listed as required for a particular event category, it should be present for all events that belong to that category. Other tags listed are optional.
