Use the CIM to normalize data at search time

If you are working with a new data source, you can manipulate your already-indexed data at search time so that it conforms to the common standard used by other Splunk applications and their dashboards. Your goal might be to create a new application or add-on specific to this data source for use with Splunk Enterprise Security or other existing applications, or you might just want to normalize the data for your own dashboards.

This topic guides you through the steps to normalize your data to the Common Information Model, following established best practices.

Before you start, keep in mind that someone else might have already built an add-on to normalize the data you have in mind. Check Splunkbase for CIM-compatible apps and add-ons that match your requirements.

1. Get your data in

If you have not already done so, get your data into the Splunk platform.

Do not be concerned about making your data conform to the CIM in the parsing or indexing phase. You normalize your data to be CIM compliant at search time. See Getting Data In if you need more direction for capturing and indexing your data.

2. Examine your data in the context of the CIM

Determine which data models are relevant for the data source you are working with.

Use the CIM reference tables to find fields that are relevant to your domain and your data. You might need to normalize data from a single event or source of events against more than one data model. For example, some events might track create, read, update, and delete (CRUD) changes to a system, while others might log login and logout activity for that same system. For each kind of event, look for data models that match the context of your data. For example, CRUD events map to the Change Analysis data model, and login events map to the Authentication data model.

Refer to How to use these reference tables for a description of how to compare the information in the reference tables with the data models in the Data Model Editor page in Splunk Web. Keep both the documentation and the Data Model Editor open for reference, because you need to refer to them in the following steps.

3. Configure CIM-compliant event tags

Apply tags to categorize your event data according to type.

Categorizing your data lets you specify the dashboards in which the data should appear, which field names and sources alone cannot always determine. Many of the CIM data models share field names, so the tags act as constraints that filter the data down to the events relevant to each model. Also, many different sources can produce events relevant to a particular data model. For example, web applications, VPN servers, and email servers all generate authentication events, yet the source and structure of these events differ considerably for each type of device. Tagging all of the authentication-related events appropriately lets your dashboards pull data from the correct events automatically.

To apply the CIM-compliant tags to your data, follow these steps.

1. Determine what tags are necessary for your data. Refer to the data models that use similar domain data to choose what tags from the Common Information Model are needed. Remember to look for inherited tags from parent datasets. See How to use these reference tables for more information.

2. Create the appropriate event types using the Event types manager in Splunk Web by accessing Settings > Event types. You can also edit the eventtypes.conf file directly. For detailed instructions, refer to the Data Classification: Event types and transactions chapter of the Knowledge Manager Manual, part of the Splunk Enterprise documentation.

3. Create the appropriate tags in Splunk Web. Click Settings > Event types, locate the event type that you want to tag and click on its name. On the detail page for the event type, add or edit tags in the Tags field, then click Save. You can also edit the tags.conf file directly. For example:

[eventtype=nessus]
vulnerability = enabled
report = enabled

For more detailed information about managing tags in Splunk Web, see Data normalization: tags and aliases in the Knowledge Manager Manual, part of the Splunk Enterprise documentation.

Repeat this process for each of the tags needed to map your events to the correct datasets in the data models. The event type and tag modifications that you make in Splunk Web are saved in $SPLUNK_HOME/etc/users/$USERNAME$/$APPNAME$/local/eventtypes.conf and $SPLUNK_HOME/etc/users/$USERNAME$/$APPNAME$/local/tags.conf.
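For illustration, the following is a minimal sketch of how the event types and tags from the steps above can fit together when one source type produces two kinds of events. It assumes a hypothetical source type my:app whose login events contain action=login and whose change events contain action=update; the event type names and search strings are placeholders. The authentication and change tags shown are the ones these models are typically constrained on, but confirm the exact tags, including inherited ones, in the reference tables for your target datasets.

```ini
# eventtypes.conf: one event type per kind of event.
# "my:app" and the action= search terms are hypothetical placeholders.
[my_app_login]
search = sourcetype=my:app action=login

[my_app_change]
search = sourcetype=my:app action=update

# tags.conf: attach CIM tags to each event type.
# Tag for the Authentication data model.
[eventtype=my_app_login]
authentication = enabled

# Tag for the Change Analysis data model (confirm in the reference tables).
[eventtype=my_app_change]
change = enabled
```

Splitting the events into separate event types keeps each set of events tagged only for the data model it belongs to.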

4. Verify tags

To verify that the data is tagged correctly, display the event type tags and review the events.

1. Search for the source type.

2. Use the field picker to display the field tag::eventtype at the bottom of each event.

3. Look at your events to verify that they are tagged correctly.

4. If you created more than one event type, also check that each event type is finding the events you intended.

5. Make your fields CIM-compliant

Examine the fields available in the data model, and look for the equivalent fields in your indexed data. Some, or perhaps many, of the fields may already be present with the correct field names and value types that match the expectations of the Common Information Model. If you are not certain whether your values match what the model expects, check the description of that field in the data model reference tables in this documentation.

Make note of all fields in the data model that do not correspond exactly to your event data. Some may be missing from your data, some may have different field names, and some may have the correct field names but values that do not match the type the model expects. One by one, normalize your data for each of these fields using a combination of field aliases, field extractions, and lookups.

a. Create field aliases to normalize field names

First, look for field alias opportunities. Determine whether any existing fields in your data have different names than the names expected by the data models. For example, the Web data model has a field called http_referrer. This field may be misspelled as http_referer in your source data. Define field aliases to capture the differently named field in your original data and map it to the field name that the CIM expects.

Also check for existing fields whose names match CIM field names but whose values do not match what the CIM expects, as described in the "Description" field in the data model reference tables. For example, your events might have an extracted field named id that refers to a completely different entity than the id field described in the CIM data model. Define a field alias to map the id field from your indexed data to a different name, such as vendor_id, to keep that data from spuriously appearing in reports and dashboards for which it is not intended. To capture the correct id field that you need for CIM compliance, either extract the field from elsewhere in your event or write a lookup to add that field from a CSV file.

See Add aliases to fields in the Splunk Enterprise documentation for more information about aliasing fields.
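As a minimal sketch, a search-time field alias for the http_referrer example above can be defined in props.conf. The source type name my:web and the class name cim_referrer are hypothetical placeholders.

```ini
# props.conf: search-time field alias.
# "my:web" is a hypothetical source type name.
[my:web]
# Syntax: FIELDALIAS-<class> = <original_field> AS <new_field>
# The alias adds http_referrer; the original http_referer field is kept.
FIELDALIAS-cim_referrer = http_referer AS http_referrer
```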

b. Create field extractions to extract new fields

After you have aliased all the fields that you can, start adding the fields that are missing. When the values that you need exist in the event data, extract the necessary fields using the field extraction capabilities of the Splunk platform. Be sure to name the fields to exactly match the field names in the CIM data models.

See Build field extractions with the field extractor and Create and maintain search-time field extractions through configuration files in the Splunk Enterprise documentation for more information.
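For example, an inline search-time extraction in props.conf uses a regular expression with a named capture group, and the group name becomes the field name. This is a minimal sketch that assumes, hypothetically, that raw events contain text such as src_ip=10.1.2.3 and that you want to populate the CIM field src.

```ini
# props.conf: inline search-time field extraction.
# "my:fw" and the "src_ip=" text in the raw events are hypothetical.
[my:fw]
# The capture group name (src) must match the CIM field name exactly.
EXTRACT-cim_src = src_ip=(?<src>\d{1,3}(?:\.\d{1,3}){3})
```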

c. Write lookups to add fields and normalize field values

After you have aliased or extracted all the fields that you can in your indexed data, you may have to create lookup files to finish normalizing your data.

There are two reasons to create lookup files:

Add fields that cannot be extracted from the event data. For example, your events may not contain the name of the vendor, product, or app of the system logging the event, but the data model you are mapping to expects all three of these fields. In this case, populate a CSV file with the source types generating the events and map each one to the appropriate vendor name, product name, and application name.
Normalize field values to make them compliant with the CIM. For example, the Network Traffic data model includes a rule field which expects string values that define the action taken in the network event. If your network traffic data contains a numeric value for the field rule, create a field alias for that field to something like rule_id so that it does not clash with the rule field expected by the data model, which must be a string. Then, add a lookup to map your rule_id values to a new rule field with their corresponding string values.
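To make the first case concrete, the following sketch defines a file-based lookup in transforms.conf and applies it automatically in props.conf. The lookup name my_vendor_lookup, its CSV file, and the source type my:fw are hypothetical; the CSV is assumed to live in the app's lookups directory with the header row sourcetype,vendor,product,app.

```ini
# transforms.conf: define a file-based lookup.
# my_vendor_lookup.csv is a hypothetical file with the header row:
# sourcetype,vendor,product,app
[my_vendor_lookup]
filename = my_vendor_lookup.csv

# props.conf: apply the lookup automatically at search time.
# Match on the event's sourcetype field and return vendor, product, and app.
[my:fw]
LOOKUP-cim_vendor = my_vendor_lookup sourcetype OUTPUT vendor product app
```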

See About lookups in the Knowledge Manager Manual for more information.
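The rule example above can be sketched the same way. This assumes a hypothetical network source type my:netfw with a numeric rule field and a hypothetical CSV file with the header row rule_id,rule that maps each numeric ID to its string value.

```ini
# props.conf for a hypothetical source type whose "rule" field is numeric.
[my:netfw]
# Field aliases are applied before lookups at search time, so rule_id
# already exists when the lookup below runs.
FIELDALIAS-rule_id = rule AS rule_id
# OUTPUT overwrites the existing rule field with the string from the lookup.
LOOKUP-rule_name = my_rule_lookup rule_id OUTPUT rule

# transforms.conf: hypothetical CSV with the header row: rule_id,rule
[my_rule_lookup]
filename = my_rule_lookup.csv
```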

d. Verify fields and values

After you finish normalizing your fields and values, validate that the fields appear in your events as you intended.

1. Search for the source type containing the data you are working to map to the CIM.

2. Use the field picker to select all the fields you just aliased, extracted, or looked up in your data.

3. Scan your events and verify that each field is populated correctly.

4. If a normalized field has an incorrect value, fix the extraction, field alias, or lookup file that produces it.

6. Validate your data against the data model

After you have added your event tags and normalized your data by adding fields, aliasing fields, and writing lookups, the data from your source type should map to the CIM data models that you targeted. You can validate that your data is fully CIM compliant by using the data model itself, either via Pivot or by searching using the datamodel command.

a. Validate using Pivot

Validate your data using Pivot with specific goals in mind. For each field that you normalized within each unique event type, think of a useful metric that you can build with Pivot to assess whether your data appears as you expect.

For example, if you are testing your authentication data, you might use Pivot to check whether your own login activity appears in your data.

1. In the Search and Reporting app, click Pivot.

2. Select the data model against which you want to validate your data, then select a relevant dataset in the model. For the example above, select Authentication, then Successful Authentication.

3. Set a narrow time range to speed up the search. For the example above, set it to Last 15 minutes if you recently logged in to the system.

4. Apply a filter to match your source type.

5. Split rows and columns by other relevant attributes in the model. For example, you might split the rows by user to see a list of usernames that have logged in during the past 15 minutes.

b. Validate using the `datamodel` command

1. Open the Search and Reporting app.

2. Construct a search using the datamodel command, a filter for your source type, the table command, and the fieldsummary command. Here is the recommended structure:

| datamodel <Data_Model> <Data_Model_Dataset> search
| search sourcetype=<your:sourcetype>
| table *
| fields - <List any fields that you do not want to include in the summary>
| fieldsummary

This structure, when applied to check that Cisco ISE data is mapping as expected to the Authentication data model for successful login activities, looks like this:

| datamodel Authentication Successful_Authentication search
| search sourcetype=cisco:ise*
| table *
| fields - date_* host index punct _raw time* splunk_server sourcetype source eventtype linecount
| fieldsummary

3. Observe the results. The search retrieves the events that the data model dataset returns for your source type, and the fieldsummary command produces one row per field, with summary statistics and sample output from your data in the values column. The | fields - portion of the search string removes fields that you do not want in the summary. To flag problems with your field normalizations, scan this table for empty values, incorrect values, or statistics that do not match your expectations.

Here is the result using the example search string above.

For more information about the datamodel command, see the datamodel entry in the Search Reference manual.

7. (Optional) Package your configurations as an add-on

Now that you have tested your field extractions, lookups, and tags, you can choose to package the search-time configurations as an add-on and publish it to the community. Using your add-on, other Splunk platform users with the same data source can map their data to the CIM without having to repeat the steps you completed above.

See Package and publish a Splunk app on the Developer Portal.
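As a starting point, the following sketch shows an app.conf for a hypothetical add-on named TA-example. It assumes you copy the eventtypes.conf, tags.conf, props.conf, and transforms.conf settings you created into the add-on's default directory and place any lookup CSV files in its lookups directory. All names and values shown are placeholders.

```ini
# TA-example/default/app.conf (hypothetical add-on name and values).
[install]
is_configured = false

[package]
id = TA-example

[ui]
# Add-ons that only ship configuration typically stay out of the app menu.
is_visible = false
label = Example Add-on for CIM

[launcher]
author = Your Name
version = 1.0.0
description = Search-time CIM mappings for my:app data
```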
