The associate command identifies correlations between fields. The command tries to find a relationship between pairs of fields by calculating a change in entropy based on their values. This change in entropy indicates how much knowing the value of one field helps to predict the value of another field.
In information theory, entropy is defined as a measure of the uncertainty associated with a random variable. In this case, if a field has only one unique value, the field has an entropy of zero. If the field has multiple values, the more evenly those values are distributed, the higher the entropy.
The associate command uses Shannon entropy (log base 2), so the unit is bits.
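To make the definition concrete, here is a minimal Python sketch of Shannon entropy (log base 2) over the observed values of a single field. The function name field_entropy and the sample value lists are hypothetical; the sketch illustrates the formula only and is not how Splunk implements the associate command.

```python
import math
from collections import Counter


def field_entropy(values):
    """Shannon entropy, in bits, of the observed values of one field.

    Hypothetical helper for illustration: H = sum(p * log2(1 / p)) over
    the relative frequency p of each distinct value.
    """
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# One unique value: no uncertainty, so the entropy is 0 bits.
print(field_entropy(["GET", "GET", "GET", "GET"]))
# Two values, evenly distributed: exactly 1 bit.
print(field_entropy(["GET", "POST", "GET", "POST"]))
# The same two values, unevenly distributed: less than 1 bit.
print(field_entropy(["GET", "GET", "GET", "POST"]))
```

The three calls mirror the prose: a single value gives zero entropy, and an even split gives more entropy than an uneven one.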
Syntax
associate [<associate-options>...] [field-list]
Optional arguments
associate-options
- Syntax: supcnt | supfreq | improv
- Description: Options for the associate command. See the Associate-options section.
field-list
- Syntax: <field> ...
- Description: A list of one or more fields. You cannot use wildcard characters in the field list. If you specify a list of fields, the analysis is restricted to only those fields.
- Default: All fields are analyzed.
Associate-options
- Syntax: supcnt=<num>
- Description: Specifies the minimum number of times that the "reference key=reference value" combination must appear. Must be a non-negative integer.
- Default: 100
- Syntax: supfreq=<num>
- Description: Specifies the minimum frequency of the "reference key=reference value" combination, as a fraction of the total number of events.
- Default: 0.1
- Syntax: improv=<num>
- Description: Specifies a limit, or minimum entropy improvement, for the "target key". The calculated entropy improvement must be greater than or equal to this limit.
- Default: 0.5
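The three options act as minimum thresholds on candidate results. The Python sketch below shows one plausible reading of the descriptions above, assuming a result is kept only when all three thresholds are met. The function name passes_thresholds and its parameters are hypothetical and are not part of Splunk.

```python
def passes_thresholds(ref_count, total_events, improvement,
                      supcnt=100, supfreq=0.1, improv=0.5):
    """Return True if a candidate result meets all three minimums.

    Assumption: supcnt, supfreq, and improv must all be satisfied.
    ref_count: events where reference key == reference value
    total_events: total number of events analyzed
    improvement: entropy improvement (bits) for the target key
    """
    if ref_count < supcnt:                  # minimum number of occurrences
        return False
    if ref_count / total_events < supfreq:  # minimum fraction of all events
        return False
    return improvement >= improv            # "greater than or equal to"


# With the defaults: 150 of 1000 events (15%) and a 0.6-bit improvement pass.
print(passes_thresholds(150, 1000, 0.6))
# 80 occurrences fall below the default supcnt of 100.
print(passes_thresholds(80, 1000, 0.6))
```

Because improv uses "greater than or equal to", an improvement exactly equal to the limit still passes in this sketch.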
Columns in the output table
The associate command outputs a table with the following columns.
|Field|Description|
|Reference_Key|The name of the first field in a pair of fields.|
|Reference_Value|The value in the first field in a pair of fields.|
|Target_Key|The name of the second field in a pair of fields.|
|Unconditional_Entropy|The entropy of the target key.|
|Conditional_Entropy|The entropy of the target key when the reference key is the reference value.|
|Entropy_Improvement|The difference between the unconditional entropy and the conditional entropy.|
|Description|A message that summarizes the relationship between the field values, based on the entropy calculations.|
|Support|Specifies how often the reference field is the reference value, relative to the total number of events. For example, how often field A is equal to value X, out of the total number of events.|
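The Python sketch below computes the entropy-based columns for one reference key=reference value and target key pair, following the definitions in the table. Events are modeled as dictionaries, and the function name associate_pair and the toy events are hypothetical. It assumes every event contains both fields; Splunk's own computation may differ in details such as how missing fields are handled.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits (log base 2) of a list of values."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def associate_pair(events, ref_key, ref_value, target_key):
    """Hypothetical sketch of one output row of the associate command.

    Assumption: every event contains both ref_key and target_key.
    """
    matching = [e for e in events if e[ref_key] == ref_value]
    unconditional = entropy([e[target_key] for e in events])
    conditional = entropy([e[target_key] for e in matching])
    return {
        "Reference_Key": ref_key,
        "Reference_Value": ref_value,
        "Target_Key": target_key,
        "Unconditional_Entropy": unconditional,
        "Conditional_Entropy": conditional,
        "Entropy_Improvement": unconditional - conditional,
        "Support": len(matching) / len(events),
    }


# Hypothetical toy events: every POST returns 302, while GETs vary.
events = [
    {"method": "POST", "status": "302"},
    {"method": "POST", "status": "302"},
    {"method": "GET", "status": "404"},
    {"method": "GET", "status": "304"},
]
# Unconditional_Entropy=1.5, Conditional_Entropy=0.0,
# Entropy_Improvement=1.5, Support=0.5
print(associate_pair(events, "method", "POST", "status"))
```

In this toy data, knowing that method is POST removes all uncertainty about status, so the conditional entropy drops to 0 and the improvement equals the full unconditional entropy.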
1. Analyze the relationship between fields in web access log files
This example demonstrates one way to analyze the relationship of fields in your web access logs.
sourcetype=access_* status!=200 | fields method, status | associate | table Reference_Key, Reference_Value, Target_Key, Top_Conditional_Value, Description
The first part of this search retrieves web access events that returned a status that is not 200. Web access data contains many fields. You can use the associate command to see a relationship between all pairs of fields and values in your data. To simplify this example, restrict the search to two fields: method and status.
Also, to simplify the output, use the table command to display only select columns.
For this particular result set, the Fields area, to the left of the results area, shows that there are:
- Two method values: POST and GET
- Five status values: 301, 302, 304, 404, and 503
From the first row of results, you can see that when method has the value listed in the Reference_Value column, the status field is 302 for all of those events. The associate command concludes that, for that method value, status is likely to be 302. You can see this same conclusion in the third row, which references status=302 to predict the value of method.
The Reference_Key and Reference_Value are correlated with the Target_Key.
The Top_Conditional_Value field states three things:
- The most common value of the Target_Key for the given Reference_Value
- The frequency of the Reference_Value for that field in the dataset
- The frequency of the most common associated value in the Target_Key, among the events that have the specific Reference_Value in that Reference_Key
It is formatted as "CV (FRV% -> FCV%)", where CV is the conditional value, FRV is the percentage occurrence of the reference value, and FCV is the percentage occurrence of that conditional value among the events that have the reference value.
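As an illustration of the "CV (FRV% -> FCV%)" format, this Python sketch builds the string for one reference key=reference value pair. The function name top_conditional_value, the toy events, and the two-decimal rounding are assumptions made for the example; the exact number formatting Splunk uses may differ.

```python
from collections import Counter


def top_conditional_value(events, ref_key, ref_value, target_key):
    """Hypothetical sketch of the "CV (FRV% -> FCV%)" string.

    Assumes every event contains both ref_key and target_key.
    """
    matching = [e for e in events if e[ref_key] == ref_value]
    # CV: the most common target value among events with the reference value.
    cv, cv_count = Counter(e[target_key] for e in matching).most_common(1)[0]
    # FRV: percentage of all events in which the reference value occurs.
    frv = 100 * len(matching) / len(events)
    # FCV: percentage of the reference-value events in which CV occurs.
    fcv = 100 * cv_count / len(matching)
    return f"{cv} ({frv:.2f}% -> {fcv:.2f}%)"


# Hypothetical toy events.
events = [
    {"method": "POST", "status": "302"},
    {"method": "POST", "status": "302"},
    {"method": "POST", "status": "404"},
    {"method": "GET", "status": "200"},
]
print(top_conditional_value(events, "method", "POST", "status"))
# 302 (75.00% -> 66.67%)
```

Here POST appears in 3 of 4 events (FRV = 75%), and 302 is the status in 2 of those 3 events (FCV is about 66.67%).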
Note: This example uses sample data from the Splunk Tutorial, which you can download and add to run this search and see these results. For more information, refer to "Upload the tutorial data" in the Search Tutorial.
2. Return results that have at least 3 references to each other
Return results associated with each other, where each reference key=reference value combination appears at least 3 times (supcnt=3).
index=_internal sourcetype=splunkd | associate supcnt=3
3. Analyze events from a host
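Analyze all events from host "reports" and return results associated with each other, where each reference key=reference value combination appears at least 50 times and in at least 20% of the events, with an entropy improvement of at least 0.5.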
host="reports" | associate supcnt=50 supfreq=0.2 improv=0.5
This documentation applies to the following versions of Splunk Cloud™: 7.0.0, 6.5.1, 6.5.1612, 6.6.0, 6.6.1, 6.6.3, 6.5.0