Splunk® Enterprise

Search Reference

associate

Description

The associate command identifies correlations between fields. The command tries to find a relationship between pairs of fields by calculating a change in entropy based on their values. This entropy represents whether knowing the value of one field helps to predict the value of another field.

In Information Theory, entropy is defined as a measure of the uncertainty associated with a random variable. In this case if a field has only one unique value, the field has an entropy of zero. If the field has multiple values, the more evenly those values are distributed, the higher the entropy.

The associate command uses Shannon entropy (log base 2). The unit is in bits.

Syntax

associate [<associate-options>...] [field-list]

Required arguments

None.

Optional arguments

associate-option
Syntax: supcnt | supfreq | improv
Description: Options for the associate command. See the Associate-options section.
field-list
Syntax: <field> ...
Description: A list of one or more fields. You cannot use wildcard characters in the field list. If you specify a list of fields, the analysis is restricted to only those fields.
Default: All fields are analyzed.

Associate-options

supcnt
Syntax: supcnt=<num>
Description: Specifies the minimum number of times that the "reference key=reference value" combination must appear. Must be a non-negative integer.
Default: 100
supfreq
Syntax: supfreq=<num>
Description: Specifies the minimum frequency of "reference key=reference value" combination as a fraction of the number of total events.
Default: 0.1
improv
Syntax: improv=<num>
Description: Specifies a limit, or minimum entropy improvement, for the "target key". The calculated entropy improvement must be greater than or equal to this limit.
Default: 0.5

Columns in the output table

The associate command outputs a table with columns containing the following fields.

Field Description
Reference_Key The name of the first field in a pair of fields.
Reference_Value The value in the first field in a pair of fields.
Target_Key The name of the second field in a pair of fields.
Unconditional_Entropy The entropy of the target key.
Conditional_Entropy The entropy of the target key when the reference key is the reference value.
Entropy_Improvement The difference between the unconditional entropy and the conditional entropy.
Description A message that summarizes the relationship between the field values that is based on the entropy calculations. The Description is a textual representation of the result. It is written in the format: "When the 'Reference_Key' has the value 'Reference_Value', the entropy of 'Target_Key' decreases from Unconditional_Entropy to Conditional_Entropy."
Support Specifies how often the reference field is the reference value, relative to the total number of events. For example, how often field A is equal to value X, in the total number of events.

Examples

1. Analyze the relationship between fields in web access log files

This example uses the sample data from the Search Tutorial. To try this example on your own Splunk instance, you must download the sample data and follow the instructions to get the tutorial data into Splunk. Use the time range Yesterday when you run the search.

This example demonstrates one way to analyze the relationship of fields in your web access logs.

sourcetype=access_* status!=200 | fields method, status | associate improv=0.05 | table Reference_Key, Reference_Value, Target_Key, Top_Conditional_Value, Description

The first part of this search retrieves web access events that returned a status that is not 200. Web access data contains many fields. You can use the associate command to see a relationship between all pairs of fields and values in your data. To simplify this example, restrict the search to two fields: method and status.

Because the associate command adds many columns to the output, this search uses the table command to display only select columns.

The results appear on the Statistics tab and look something like this:

Reference_Key Reference_Value Target_Key Top_Conditional_Value Description
method POST status 503 (17.44% -> 33.96%) When 'method' has the value 'POST', the entropy of 'status' decreases from 2.923 to 2.729.
status 400 method GET (76.37% -> 83.45%) When 'status' has the value '400', the entropy of 'method' decreases from 0.789 to 0.647.
status 404 method GET (76.37% -> 81.27%) When 'status' has the value '404', the entropy of 'method' decreases from 0.789 to 0.696.
status 406 method GET (76.37% -> 81.69%) When 'status' has the value '406', the entropy of 'method' decreases from 0.789 to 0.687.
status 408 method GET (76.37% -> 80.00%) When 'status' has the value '408', the entropy of 'method' decreases from 0.789 to 0.722.
status 500 method GET (76.37% -> 80.73%) When 'status' has the value '500', the entropy of 'method' decreases from 0.789 to 0.707.

In the results you can see that there is one method and five status values in the results.

From the first row of results, you can see that when method=POST, the status field is 503 for those events. The associate command concludes that, if method=POST, the Top_Conditional_Value is likely to be 503 as much as 33% of the time.

The Reference_Key and Reference_Value are being correlated to the Target_Key.

The Top_Conditional_Value field states three things:

  • The most common value for the given Reference_Value.
  • The frequency of the Reference_Value for that field in the dataset, sometimes referred to as FRV.
  • The frequency of the most common associated value in the Target_Key for the events that have the specific Reference_Value in that Reference Key. Sometimes referred to as the FCV.

The values in the Top_Conditional_Value field are formatted as "CV (FRV% -> FCV%)", for example GET (76.37% -> 83.45%).

2. Return results that have at least 3 references to each other

Return results associated with each other (that have at least 3 references to each other).

index=_internal sourcetype=splunkd | associate supcnt=3

3. Analyze events from a host

Analyze all events from host "reports" and return results associated with each other.

host="reports" | associate supcnt=50 supfreq=0.2 improv=0.5

See also

arules, correlate, contingency

Last modified on 27 June, 2024
arules   autoregress

This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.9, 8.0.10, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.2.0, 9.2.1, 9.2.2, 9.3.0, 8.0.8, 8.1.0, 8.1.10, 8.1.11


Was this topic useful?







You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters