contingency
In statistics, contingency tables are used to record and analyze the relationship between two or more (usually categorical) variables. Many measures of association or independence, such as the phi coefficient or Cramér's V, can be calculated from a contingency table.
You can use the contingency command to build a contingency table, which in this case is a co-occurrence matrix for the values of two fields in your data. The first row and first column of the table are made up of the values of the two fields. Each cell contains the count of events in which both of the cross-tabulated field values for that row and column combination exist.
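The counting described above can be sketched outside of Splunk in a few lines of Python. This is only an illustration of the idea of a co-occurrence matrix, not how the command is implemented; the events, field names, and the contingency() helper are all hypothetical.

```python
from collections import Counter

# Hypothetical events; the field names and values are made up for illustration.
events = [
    {"log_level": "INFO", "component": "Metrics"},
    {"log_level": "INFO", "component": "Metrics"},
    {"log_level": "WARN", "component": "DateParser"},
    {"log_level": "ERROR", "component": "DateParser"},
    {"log_level": "INFO"},  # has no column field, so it is not counted
]

def contingency(events, row_field, col_field):
    """Count the events that contain each (row value, column value) pair."""
    counts = Counter()
    for event in events:
        if row_field in event and col_field in event:
            counts[(event[row_field], event[col_field])] += 1
    return counts

table = contingency(events, "log_level", "component")
rows = sorted({r for r, _ in table})
cols = sorted({c for _, c in table})

# First row: values of the column field. First column: values of the row field.
print("\t".join(["log_level"] + cols))
for r in rows:
    print("\t".join([r] + [str(table[(r, c)]) for c in cols]))
```

Pairs that never occur together simply show a count of 0 in the printed matrix.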
If a relationship or pattern exists between the two fields, you can often spot it by inspecting the table. For example, if the distribution of column values varies significantly between rows (or vice versa), there is a contingency between the two fields (they are not independent). If there is no contingency, then the two fields are independent.
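To put a number on "not independent", one of the measures mentioned earlier, Cramér's V, can be computed from the cell counts. The following is a minimal sketch done outside of Splunk; the contingency command itself only produces the counts. The cramers_v() helper and both sample tables are hypothetical, and the sketch assumes that no row or column total is zero.

```python
import math

def cramers_v(table):
    """Cramér's V for a table given as a list of rows of counts (no totals).

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two fields were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

dependent = [[50, 0], [0, 50]]      # each row value pairs with one column value
independent = [[25, 25], [25, 25]]  # column distribution is the same in every row

print(cramers_v(dependent))    # 1.0
print(cramers_v(independent))  # 0.0
```

A value near 0 suggests the fields are independent, and a value near 1 suggests a strong association.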
Synopsis
Builds a contingency table for two fields.
Syntax
contingency [<contingency-option>]* <field> <field>
Required arguments
- <field>
- Syntax: <field>
- Description: Any field. Wildcards are not allowed in the field name.
Optional arguments
- contingency-option
- Syntax: <maxopts> | <mincover> | <usetotal> | <totalstr>
- Description: Options for the contingency table.
Contingency option
- maxopts
- Syntax: maxrows=<int> | maxcols=<int>
- Description: Specify the maximum number of rows or columns to display. If the number of distinct values of the field exceeds this maximum, the least common values will be ignored. A value of 0 means unlimited rows or columns. By default, maxrows=0 and maxcols=0.
- mincover
- Syntax: mincolcover=<num> | minrowcover=<num>
- Description: Specify the minimum percentage of values for the row or column field. If the number of entries needed to cover the required percentage of values exceeds maxrows or maxcols, maxrows or maxcols takes precedence. By default, mincolcover=1.0 and minrowcover=1.0.
- usetotal
- Syntax: usetotal=<bool>
- Description: Specify whether or not to add row and column totals. Default is usetotal=true.
- totalstr
- Syntax: totalstr=<field>
- Description: Field name for the totals row and column. Default is totalstr=TOTAL.
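The effect of maxrows and maxcols ("the least common values will be ignored") can be pictured with the following Python sketch. It is an assumption-laden illustration, not Splunk's implementation: the limit_values() helper is hypothetical, and ranking values by their total event count is an assumption about what "least common" means.

```python
from collections import Counter

def limit_values(counts, maxrows=0, maxcols=0):
    """Keep only cells whose row and column values are among the most common.

    counts maps (row_value, col_value) -> event count. A limit of 0 means
    unlimited, mirroring the documented default.
    """
    row_freq, col_freq = Counter(), Counter()
    for (r, c), n in counts.items():
        row_freq[r] += n
        col_freq[c] += n
    # most_common(None) returns every value, which implements "unlimited".
    keep_rows = {r for r, _ in row_freq.most_common(maxrows or None)}
    keep_cols = {c for c, _ in col_freq.most_common(maxcols or None)}
    return {(r, c): n for (r, c), n in counts.items()
            if r in keep_rows and c in keep_cols}

# Hypothetical cell counts.
counts = {
    ("INFO", "Metrics"): 40,
    ("INFO", "DateParser"): 5,
    ("WARN", "DateParser"): 3,
    ("ERROR", "TcpInput"): 1,
}
trimmed = limit_values(counts, maxcols=2)
print(trimmed)
```

With maxcols=2, only the two most frequent column values (Metrics and DateParser in this made-up data) remain, and the TcpInput column is dropped.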
Description
This command builds a contingency table for two fields. If you have fields with many values, you can restrict the number of rows and columns using the maxrows and maxcols parameters. By default, the contingency table displays the row totals, column totals, and a grand total for the counts of events that are represented in the table.
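The row totals, column totals, and grand total described above amount to simple sums over the cell counts. A minimal Python sketch follows, with a hypothetical add_totals() helper whose totalstr parameter plays the role of the totalstr option. The exact layout of the real output, including the label of the top-left cell, is an assumption.

```python
def add_totals(rows, cols, counts, totalstr="TOTAL"):
    """Return the table as a list of lists with a totals row and column appended."""
    header = [""] + cols + [totalstr]
    body = []
    for r in rows:
        cells = [counts.get((r, c), 0) for c in cols]
        body.append([r] + cells + [sum(cells)])  # row total
    col_sums = [sum(counts.get((r, c), 0) for r in rows) for c in cols]
    footer = [totalstr] + col_sums + [sum(col_sums)]  # column totals and grand total
    return [header] + body + [footer]

# Hypothetical cell counts.
counts = {
    ("INFO", "Metrics"): 40,
    ("INFO", "DateParser"): 5,
    ("ERROR", "DateParser"): 3,
}
table = add_totals(["ERROR", "INFO"], ["DateParser", "Metrics"], counts)
for line in table:
    print(line)
```

Skipping the call to add_totals() corresponds to usetotal=false, and passing a different totalstr changes only the label.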
Examples
Example 1
Build a contingency table to see if there is a relationship between the values of log_level and component.
index=_internal | contingency log_level component maxcols=5
These results show you at a glance which components, if any, may be causing issues in your Splunk instance. The component field has many values (>50), so this example uses maxcols to show only five of the values.
Example 2
Build a contingency table to see the installer download patterns from users based on the platform they are running.
host="download" | contingency name platform
This is pretty straightforward because you don't expect users running one platform to download an installer file for another platform. Here, the contingency command just confirms that these particular fields are not independent. If this chart showed otherwise, for example if a great number of Windows users downloaded the OSX installer, you might want to take a look at your web site to make sure the download resource is correct.
Example 3
This example uses recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma-separated ASCII text file that contains the source network (Src), ID (Eqid), version, date, location, magnitude, depth (km), and number of reporting stations (NST) for each earthquake over the last 7 days.
Download the text file, M 2.5+ earthquakes, past 7 days, save it as a CSV file, and upload it to Splunk. Splunk should extract the fields automatically. Note that you'll be seeing data from the 7 days previous to your download, so your results will vary from the ones displayed below. (Here, the CSV file is uploaded to the custom index
Earthquakes occurring at a depth of less than 70 km are classified as shallow-focus earthquakes, while those with a focal depth between 70 and 300 km are commonly termed mid-focus earthquakes. In subduction zones, deep-focus earthquakes may occur at much greater depths (ranging from 300 up to 700 kilometers).
Build a contingency table to look at the relationship between the magnitudes and depths of recent earthquakes.
index=recentquakes | contingency Magnitude Depth | sort Magnitude
This search is very simple. But because there is quite a range of values for the Magnitude and Depth fields, the result is a very large matrix. Before building the table, we want to reformat the values of the fields:
index=recentquakes | eval Magnitude=case(Magnitude<=1, "0.0 - 1.0", Magnitude>1 AND Magnitude<=2, "1.1 - 2.0", Magnitude>2 AND Magnitude<=3, "2.1 - 3.0", Magnitude>3 AND Magnitude<=4, "3.1 - 4.0", Magnitude>4 AND Magnitude<=5, "4.1 - 5.0", Magnitude>5 AND Magnitude<=6, "5.1 - 6.0", Magnitude>6 AND Magnitude<=7, "6.1 - 7.0", Magnitude>7, "7.0+") | eval Depth=case(Depth<=70, "Shallow", Depth>70 AND Depth<=300, "Mid", Depth>300 AND Depth<=700, "Deep") | contingency Magnitude Depth | sort Magnitude
Now, the search uses the eval command with the case() function to redefine the values of Magnitude and Depth, bucketing them into a range of values. For example, the Depth values are redefined as "Shallow", "Mid", or "Deep". This creates a more readable table:
There were a lot of quakes in this period. Do higher magnitude earthquakes have a greater depth than lower magnitude earthquakes? Not really. The table shows that the majority of the recent earthquakes in all magnitude ranges were shallow. There are also significantly fewer earthquakes in the mid-to-high magnitude range. In this data set, the deep-focus quakes were all in the mid-range of magnitudes.
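The bucketing done by the two case() calls in this example can be mirrored in Python to check which label a given value receives. This is a sketch for illustration only; the function names are hypothetical, and returning None for a depth above 700 km assumes that case() yields no value when none of its conditions match.

```python
# Upper bound (inclusive) and label for each magnitude bucket, copied from the search.
MAGNITUDE_BUCKETS = [
    (1, "0.0 - 1.0"), (2, "1.1 - 2.0"), (3, "2.1 - 3.0"), (4, "3.1 - 4.0"),
    (5, "4.1 - 5.0"), (6, "5.1 - 6.0"), (7, "6.1 - 7.0"),
]

def magnitude_bucket(magnitude):
    """Return the range label that the Magnitude case() expression assigns."""
    for upper, label in MAGNITUDE_BUCKETS:
        if magnitude <= upper:
            return label
    return "7.0+"  # label used by the search for Magnitude > 7

def depth_bucket(depth_km):
    """Return "Shallow", "Mid", or "Deep", or None when no condition matches."""
    if depth_km <= 70:
        return "Shallow"
    if depth_km <= 300:
        return "Mid"
    if depth_km <= 700:
        return "Deep"
    return None  # assumption: no matching condition means no value

print(magnitude_bucket(2.5), depth_bucket(35.0))
print(magnitude_bucket(6.4), depth_bucket(410.0))
```

Because each condition is checked in order, a boundary value such as a depth of exactly 70 km falls into the lower bucket ("Shallow").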
Answers
Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the contingency command.
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8, 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5, 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 5.0, 5.0.1, 5.0.2.


