Splunk Cloud

Search Reference


contingency

Description

In statistics, contingency tables are used to record and analyze the relationship between two or more (usually categorical) variables. Many measures of association or independence, such as the phi coefficient or Cramer's V, can be calculated from contingency tables.

You can use the contingency command to build a contingency table, which in this case is a co-occurrence matrix for the values of two fields in your data. The first row and first column of the table are made up of the values of the two fields. Each cell contains the count of events in which that row's value and that column's value both occur.
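The co-occurrence counting described above can be sketched in Python. This is a hedged illustration of the idea, not Splunk's implementation; the sample events and field names are hypothetical:

```python
# Minimal sketch of a contingency (co-occurrence) table: count events
# in which each pair of field values appears together.
from collections import Counter

# Hypothetical sample events with two categorical fields.
events = [
    {"log_level": "INFO",  "component": "Metrics"},
    {"log_level": "INFO",  "component": "Metrics"},
    {"log_level": "WARN",  "component": "TcpInput"},
    {"log_level": "ERROR", "component": "TcpInput"},
]

def contingency_table(events, field1, field2):
    """Return {row_value: {col_value: count}} for the two fields."""
    counts = Counter((e[field1], e[field2]) for e in events)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    return {r: {c: counts.get((r, c), 0) for c in cols} for r in rows}

table = contingency_table(events, "log_level", "component")
# table["INFO"]["Metrics"] is 2; pairs that never co-occur count 0.
```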

If a relationship or pattern exists between the two fields, you can spot it easily just by analyzing the information in the table. For example, if the column values vary significantly between rows (or vice versa), there is a contingency between the two fields (they are not independent). If there is no contingency, then the two fields are independent.
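As an illustration of the association measures mentioned earlier, the following pure-Python sketch computes the chi-square statistic and Cramer's V from a table of counts (given as a list of rows). Note this calculation is not part of the contingency command itself, which only produces the table:

```python
# Hedged sketch: Cramer's V from a contingency table of counts.
# V is 0 for independent fields and 1 for perfectly dependent fields.
from math import sqrt

def cramers_v(table):
    """table: list of rows of counts. Returns Cramer's V in [0, 1]."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-square: sum of (observed - expected)^2 / expected.
    chi2 = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(table))
        for j in range(len(table[0]))
    )
    k = min(len(table), len(table[0]))
    return sqrt(chi2 / (n * (k - 1)))

# Perfectly dependent fields -> V == 1.0; independent -> V == 0.0.
print(cramers_v([[10, 0], [0, 10]]))  # 1.0
print(cramers_v([[5, 5], [5, 5]]))    # 0.0
```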


Syntax

contingency [<contingency-options>...] <field1> <field2>

Required arguments

<field1>
Syntax: <field>
Description: Any field. You cannot specify wildcard characters in the field name.
<field2>
Syntax: <field>
Description: Any field. You cannot specify wildcard characters in the field name.

Optional arguments

contingency-options
Syntax: <maxopts> | <mincover> | <usetotal> | <totalstr>
Description: Options for the contingency table.

Contingency options

maxopts
Syntax: maxrows=<int> | maxcols=<int>
Description: Specify the maximum number of rows or columns to display. If the number of distinct values of the field exceeds this maximum, the least common values are ignored. A value of 0 means no explicit limit; the maximum is then taken from the maxvalues setting in the [ctable] stanza of limits.conf.
Default: maxrows and maxcols both default to the maxvalues setting in limits.conf, which is 1000.
mincover
Syntax: mincolcover=<num> | minrowcover=<num>
Description: Specify a percentage of values per column or row that you would like represented in the output table. As the table is constructed, enough rows or columns are included to reach this ratio of displayed values to total values for each row or column. The maximum rows or columns take precedence if those values are reached.
Default: mincolcover=1.0 and minrowcover=1.0
usetotal
Syntax: usetotal=<bool>
Description: Specify whether to add row totals, column totals, and a grand total to the table.
Default: true
totalstr
Syntax: totalstr=<field>
Description: Field name for the totals row and column.
Default: TOTAL

Usage

This command builds a contingency table for two fields. If you have fields with many values, you can restrict the number of rows and columns using the maxrows and maxcols parameters. By default, the contingency table displays the row totals, column totals, and a grand total for the counts of events that are represented in the table.

Values that are empty strings ("") are represented in the table as EMPTY_STR.
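For example, the following search (with illustrative field names) limits the table size and suppresses the totals row and column:

index=_internal | contingency maxrows=10 maxcols=10 usetotal=false log_level component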

Limits

The maximum value for maxrows or maxcols is 1000, which is also the default. This means that no more than 1000 distinct values for either field are used.

Examples

Example 1

Build a contingency table to see if there is a relationship between the values of log_level and component.

index=_internal | contingency log_level component maxcols=5

[Screenshot: contingency table of log_level by component values]


These results show you any components that might be causing issues in your Splunk deployment. The component field has many values (more than 50), so this example uses maxcols to show only five of them.

Example 2

Build a contingency table to see the installer download patterns from users based on the platform they are running.

host="download" | contingency name platform

[Screenshot: contingency table of installer name by platform values]

This is pretty straightforward because you don't expect users running one platform to download an installer file for another platform. Here, the contingency command just confirms that these particular fields are not independent. If this chart showed otherwise, for example if a great number of Windows users downloaded the OSX installer, you might want to take a look at your web site to make sure the download resource is correct.

Example 3

This example uses recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.

You can download a current CSV file from the USGS Earthquake Feeds and add it as an input.

Earthquakes occurring at a depth of less than 70 km are classified as shallow-focus earthquakes, while those with a focal-depth between 70 and 300 km are commonly termed mid-focus earthquakes. In subduction zones, deep-focus earthquakes may occur at much greater depths (ranging from 300 up to 700 kilometers).

Build a contingency table to look at the relationship between the magnitudes and depths of recent earthquakes.

index=recentquakes | contingency mag depth | sort mag

This search is simple, but because there is quite a range of values for the mag and depth fields, the results form a very large matrix. Before building the table, reformat the values of those fields:

source=usgs 
| eval Magnitude=case(mag<=1, "0.0 - 1.0", mag>1 AND mag<=2, "1.1 - 2.0", mag>2 AND mag<=3, "2.1 - 3.0", mag>3 AND mag<=4, "3.1 - 4.0", mag>4 AND mag<=5, "4.1 - 5.0", mag>5 AND mag<=6, "5.1 - 6.0", mag>6 AND mag<=7, "6.1 - 7.0", mag>7, "7.0+") 
| eval Depth=case(depth<=70, "Shallow", depth>70 AND depth<=300, "Mid", depth>300 AND depth<=700, "Deep") 
| contingency Magnitude Depth 
| sort Magnitude

Now, the search uses the eval command with the case() function to redefine the values of Magnitude and Depth, bucketing them into a range of values. For example, the Depth values are redefined as "Shallow", "Mid", or "Deep". This creates a more readable table:

[Screenshot: contingency table of Magnitude by Depth values]
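The case()-style bucketing of depth used in the search above can be sketched in Python. This is a hedged illustration; the function name and thresholds follow the depth classification described earlier:

```python
# Sketch of the eval case() bucketing: map a numeric focal depth (km)
# to the "Shallow" / "Mid" / "Deep" labels used in the search.
def depth_label(depth_km):
    """Classify earthquake focal depth in kilometers."""
    if depth_km <= 70:
        return "Shallow"
    elif depth_km <= 300:
        return "Mid"
    elif depth_km <= 700:
        return "Deep"
    return None  # like case(), unmatched values get no label

print(depth_label(35))   # Shallow
print(depth_label(150))  # Mid
print(depth_label(500))  # Deep
```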

There were a lot of quakes in this two-week period. Do higher magnitude earthquakes occur at greater depths than lower magnitude earthquakes? Not really. The table shows that the majority of the recent earthquakes in all magnitude ranges were shallow, and there are significantly fewer earthquakes in the mid-to-high magnitude ranges. In this data set, the deep-focus quakes were all in the mid-range of magnitudes.

See also

associate, correlate

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has about using the contingency command.


This documentation applies to the following versions of Splunk Cloud: 6.5.0, 6.5.1, 6.5.1612, 6.6.0, 6.6.1, 6.6.3


Comments

Based on the docs for both commands, I believe that `cofilter` should be in the `see also` section. I am not sure because I cannot get `cofilter` to work in some test. Perhaps the `cofilter` command doesn't work or is deprecated?

Woodcock
April 8, 2017

Hello Woodcock
Yes, the example you provide is very similar to the contingency command, although the row and column sorting might be different.

Lstewart splunk, Splunker
May 31, 2016

It is worth noting that this command is identical to the following command:

chart count BY <field1> <field2> | addtotals col=t row=t labelfield=<field1>

Woodcock
May 28, 2016
