Search Reference

 


correlate

correlate

Description

Calculates the correlation between different fields.

You can use the correlate command to see an overview of the co-occurrence between fields in your data. The results are presented in a matrix format, where the cross tabulation of two fields is a cell value. The cell value represents the percentage of times that the two fields exist in the same events.

The field the result is specific to is named in the value of the RowField field, while the fields it is compared against are the names of the other fields.

Note: This command looks at the relationship among all the fields in a set of search results. If you want to analyze the relationship between the values of fields, refer to the contingency command, which counts the co-ocurrence of pairs of field values in events.

Syntax

correlate

Limits

There is a limit on the number of fields that correlate considers in a search. From limits.conf, stanza [correlate], the maxfields sets this ceiling. The default is 1000.

If more than this many fields are encountered, the correlate command continues to process data for the first N (eg thousand) field names encountered, but ignores data for additional fields. If this occurs, the notification from the search or alert contains a message "correlate: input fields limit (N) reached. Some fields may have been ignored."

As with all designed-in limits, adjusting this might have significant memory or cpu costs.

Examples

Example 1:

Look at the co-occurrence between all fields in the _internal index.

index=_internal | correlate

Here is a snapshot of the results.

Ex correlate.png

Because there are different types of logs in the _internal, you can expect to see that many of the fields do not co-occur.

Example 2:

Calculate the co-occurrences between all fields in Web access events.

sourcetype=access_* | correlate

You expect all Web access events to share the same fields: clientip, referer, method, and so on. But, because the sourcetype=access_* includes both access_common and access_combined Apache log formats, you should see that the percentages of some of the fields are less than 1.0.

Example 3:

Calculate the co-occurrences between all the fields in download events.

eventtype=download | correlate

The more narrow your search is before you pass the results into correlate, the more likely it is that all the field value pairs have a correlation of 1.0. A correlation of 1.0 means the values co-occur in 100% of the search results. For these download events, you might be able to spot an issue depending on which pairs have less than 1.0 co-occurrence.

See also

associate, contingency

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the correlate command.

This documentation applies to the following versions of Splunk: 4.1 , 4.1.1 , 4.1.2 , 4.1.3 , 4.1.4 , 4.1.5 , 4.1.6 , 4.1.7 , 4.1.8 , 4.2 , 4.2.1 , 4.2.2 , 4.2.3 , 4.2.4 , 4.2.5 , 4.3 , 4.3.1 , 4.3.2 , 4.3.3 , 4.3.4 , 4.3.5 , 4.3.6 , 4.3.7 , 5.0 , 5.0.1 , 5.0.2 , 5.0.3 , 5.0.4 , 5.0.5 , 5.0.6 , 5.0.7 , 5.0.8 , 5.0.9 , 5.0.10 , 5.0.11 , 5.0.12 , 5.0.13 , 6.0 , 6.0.1 , 6.0.2 , 6.0.3 , 6.0.4 , 6.0.5 , 6.0.6 , 6.0.7 , 6.0.8 , 6.0.9 , 6.1 , 6.1.1 , 6.1.2 , 6.1.3 , 6.1.4 , 6.1.5 , 6.1.6 , 6.1.7 , 6.1.8 , 6.2.0 , 6.2.1 , 6.2.2 , 6.2.3 View the Article History for its revisions.


You must be logged into splunk.com in order to post comments. Log in now.

Was this documentation topic helpful?

If you'd like to hear back from us, please provide your email address:

We'd love to hear what you think about this topic or the documentation as a whole. Feedback you enter here will be delivered to the documentation team.

Feedback submitted, thanks!