associate
This documentation does not apply to the most recent version of Splunk.
The associate command tries to find a relationship between pairs of fields by calculating a change in entropy based on their values. This entropy represents whether knowing the value of one field helps to predict the value of another field.
In Information Theory, entropy is defined as a measure of the uncertainty associated with a random variable. In this case, if a field has only one unique value, it has an entropy of zero. If it has multiple values, the more evenly those values are distributed, the higher the entropy.
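The entropy definition above can be illustrated with a minimal Python sketch. This is not Splunk's implementation: the function name `shannon_entropy` is hypothetical, and the base-2 logarithm is an assumption, since this page does not state which base the associate command uses.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Entropy (in bits, an assumed unit) of the observed values of one field."""
    counts = Counter(values)
    total = len(values)
    # Sum p * log2(1/p) over each distinct value; an empty list yields 0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# One unique value: no uncertainty.
print(shannon_entropy(["GET"] * 5))
# Two evenly distributed values: the maximum for two values.
print(shannon_entropy(["GET", "POST"]))
# Two unevenly distributed values: lower than the even case.
print(shannon_entropy(["GET"] * 9 + ["POST"]))
```

A field with a single value scores zero, and for a fixed number of distinct values the score is highest when those values are evenly distributed.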
Synopsis
Identifies correlations between fields.
Syntax
associate [associate-option]* [field-list]
Optional arguments
- associate-option
- Syntax: supcnt | supfreq | improv
- Description: Options for the associate command.
- field-list
- Syntax: <field>, ...
- Description: List of fields, non-wildcarded. If a list of fields is provided, analysis will be restricted to only those fields. By default all fields are used.
Associate options
- supcnt
- Syntax: supcnt=<num>
- Description: Specify the minimum number of times that the "reference key=reference value" combination must appear. Must be a non-negative integer. Defaults to 100.
- supfreq
- Syntax: supfreq=<num>
- Description: Specify the minimum frequency of "reference key=reference value" combination as a fraction of the number of total events. Defaults to 0.1.
- improv
- Syntax: improv=<num>
- Description: Specify a limit, or minimum entropy improvement, for the "target key". The calculated entropy improvement is the difference between the unconditional entropy (the entropy of the target key) and the conditional entropy (the entropy of the target key when the reference key is the reference value). This improvement must be greater than or equal to the limit. Defaults to 0.5.
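As a rough illustration of how the three options interact, the following Python sketch applies them to a list of events represented as dictionaries. It only models the descriptions above and is not Splunk's code: the helper names `entropy` and `passes_thresholds` are hypothetical, and the base-2 logarithm and the handling of events that lack the target field are assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Entropy in bits (assumed base 2) of a list of observed values."""
    total = len(values)
    counts = Counter(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def passes_thresholds(events, ref_key, ref_value, target_key,
                      supcnt=100, supfreq=0.1, improv=0.5):
    """Return True if the reference key=value pair meets all three limits."""
    if not events:
        return False
    matching = [e for e in events if e.get(ref_key) == ref_value]
    # supcnt: minimum number of events with "reference key=reference value".
    if len(matching) < supcnt:
        return False
    # supfreq: the same count as a fraction of the total number of events.
    if len(matching) / len(events) < supfreq:
        return False
    # improv: unconditional entropy minus conditional entropy of the target.
    # Assumption: events without the target field are simply skipped.
    unconditional = entropy([e[target_key] for e in events if target_key in e])
    conditional = entropy([e[target_key] for e in matching if target_key in e])
    return unconditional - conditional >= improv


events = (
    [{"method": "POST", "status": "302"}] * 3
    + [{"method": "GET", "status": s} for s in ("301", "304", "404")]
)
# Passes with a low support count, fails with the default supcnt of 100.
print(passes_thresholds(events, "method", "POST", "status", supcnt=3))
print(passes_thresholds(events, "method", "POST", "status"))
```

With only six events, the default supcnt of 100 filters out every pair, which is why small data sets usually need a lower supcnt.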
Description
The associate command outputs a table with columns that include the fields that are analyzed (Reference_Key, Reference_Value, and Target_Key), the entropy that is calculated for each pair of field values (Unconditional_Entropy, Conditional_Entropy, and Entropy_Improvement), and a message that summarizes the relationship between the field values, deduced from the entropy calculation (Description).
The Description is intended as a user-friendly representation of the result, and is written in the format: "When the 'Reference_Key' has the value 'Reference_Value', the entropy of 'Target_Key' decreases from Unconditional_Entropy to Conditional_Entropy."
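To make the output columns concrete, here is a small Python sketch that builds one such row for a given reference key, reference value, and target key. It is an illustration only: `associate_row` and `entropy` are hypothetical names, the base-2 logarithm and the three-decimal number formatting are assumptions, and Splunk's actual output may differ.

```python
import math
from collections import Counter


def entropy(values):
    """Entropy in bits (assumed base 2) of a list of observed values."""
    total = len(values)
    counts = Counter(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def associate_row(events, ref_key, ref_value, target_key):
    """Build one output row using the column names listed above."""
    matching = [e for e in events if e.get(ref_key) == ref_value]
    unconditional = entropy([e[target_key] for e in events if target_key in e])
    conditional = entropy([e[target_key] for e in matching if target_key in e])
    return {
        "Reference_Key": ref_key,
        "Reference_Value": ref_value,
        "Target_Key": target_key,
        "Unconditional_Entropy": unconditional,
        "Conditional_Entropy": conditional,
        "Entropy_Improvement": unconditional - conditional,
        # Number formatting here is an assumption, not Splunk's exact output.
        "Description": (
            f"When the '{ref_key}' has the value '{ref_value}', "
            f"the entropy of '{target_key}' decreases from "
            f"{unconditional:.3f} to {conditional:.3f}."
        ),
    }


events = [
    {"method": "POST", "status": "302"},
    {"method": "POST", "status": "302"},
    {"method": "GET", "status": "301"},
    {"method": "GET", "status": "404"},
]
row = associate_row(events, "method", "POST", "status")
print(row["Description"])
```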
Examples
Example 1: This example demonstrates how you might analyze the relationship of fields in your web access logs.
sourcetype=access_* NOT status=200 | fields method, status | associate | table Reference_Key, Reference_Value, Target_Key, Top_Conditional_Value, Description
The first part of this search retrieves web access events that returned a status other than 200. Web access data contains many fields, and you can use the associate command to see the relationships between all pairs of fields and values in your data. To simplify this example, we restrict the search to two fields: method and status. The associate command also outputs a number of columns (see Description) that we won't go into for now, so we use the table command to display only the columns we want to see. The result looks something like this:
For this particular result set (you can see this in the Fields area, to the left of the results area), there are:
- two method values: POST and GET
- five status values: 301, 302, 304, 404, and 503
The first row of the results tells you that when method=POST, the status field is 302 for all of those events. The associate command concludes that, if method=POST, the status is likely to be 302. You can see this same conclusion in the third row, which references status=302 to predict the value of method.
The Reference_Key and Reference_Value are correlated with the Target_Key. The Top_Conditional_Value field states three things: the most common value of the Target_Key for the given Reference_Value, the frequency of the Reference_Value for that field in the dataset, and the frequency of that most common Target_Key value among the events that have the specific Reference_Value in that Reference_Key. It is formatted "CV (FRV% -> FCV%)", where CV is the conditional value, FRV is the percentage occurrence of the reference value, and FCV is the percentage occurrence of that conditional value among the events that have the reference value.
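The "CV (FRV% -> FCV%)" format can be sketched in Python as follows. This is a hedged illustration, not Splunk's implementation: the function name `top_conditional_value` is hypothetical, and the two-decimal percentage formatting and the treatment of events that lack the target field are assumptions.

```python
from collections import Counter


def top_conditional_value(events, ref_key, ref_value, target_key):
    """Format the most common target value as "CV (FRV% -> FCV%)"."""
    matching = [e for e in events if e.get(ref_key) == ref_value]
    targets = [e[target_key] for e in matching if target_key in e]
    if not targets:
        return None
    # CV: most common target value among events with the reference value.
    cv, cv_count = Counter(targets).most_common(1)[0]
    # FRV: share of all events that have the reference value.
    frv = 100.0 * len(matching) / len(events)
    # FCV: share of those matching events that have the conditional value.
    fcv = 100.0 * cv_count / len(targets)
    return f"{cv} ({frv:.2f}% -> {fcv:.2f}%)"


events = [{"method": "POST", "status": "302"}] * 3 + [
    {"method": "GET", "status": "404"}
]
print(top_conditional_value(events, "method", "POST", "status"))
```

In this sample, three of the four events have method=POST, and all of them have status=302, so the reference value covers 75% of events and the conditional value covers 100% of those.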
Note: This example uses sample data from the Splunk Tutorial, which you can download and add to Splunk to run this search and see these results. For more information, refer to the "Add data tutorial" in the User Manual.
Example 2: Return results associated with each other (that have at least 3 references to each other).
index=_internal sourcetype=splunkd | associate supcnt=3
Example 3: Analyze all events from host "reports" and return results associated with each other.
host="reports" | associate supcnt=50 supfreq=0.2 improv=0.5
See also
Answers
Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the associate command.
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8, 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5, 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6
