2x2 Chi-squared contingency table

Chi-Square 2 x 2 Contingency Table

A contingency table shows the frequencies of the category combinations of two dichotomous (two-category) nominal variables. The Chi-Square test examines whether the two variables are related.

Requirements:

Every observation must be assignable unambiguously to exactly one cell.
The expected frequency should not fall below 7 in any cell. If any expected frequency is below 7, Fisher’s exact test should be used instead.

Hypotheses:

H0: The two variables are independent

H1: The two variables are dependent

 

Under the assumption that the two variables are independent (H0), the probability of each of the four cells is the product of the corresponding marginal probabilities:

$$p(A_i \cap B_j) = p(A_i)\,p(B_j), \qquad i, j \in \{1, 2\}$$

Under H0, the expected frequency of a cell is obtained by multiplying its cell probability by the total number of observations.
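The multiplication rule above can be sketched in a few lines of Python. This is a minimal illustration, not BrightStat code; the function name `expected_from_probabilities` and the probabilities in the usage line are hypothetical, and the sketch assumes the marginal probabilities are known in advance rather than estimated from the sample.

```python
def expected_from_probabilities(p_a, p_b, n_total):
    """Expected 2x2 cell frequencies under H0 for known marginal probabilities.

    p_a -- (p(A1), p(A2)), p_b -- (p(B1), p(B2)); each pair should sum to 1.
    Returns [[fe(A1B1), fe(A1B2)], [fe(A2B1), fe(A2B2)]].
    """
    # Under independence p(Ai ∩ Bj) = p(Ai) * p(Bj); scale by the sample size.
    return [[n_total * pa * pb for pb in p_b] for pa in p_a]


# Hypothetical marginal probabilities, not taken from the text
print(expected_from_probabilities((0.3, 0.7), (0.9, 0.1), 1000))
```

Because each pair of marginal probabilities sums to 1, the four expected frequencies add up to the total number of observations.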

Probability Table:

                      Var. B
                      Cat. 1       Cat. 2
Var. A    Cat. 1      p(A1∩B1)     p(A1∩B2)     p(A1)
          Cat. 2      p(A2∩B1)     p(A2∩B2)     p(A2)
                      p(B1)        p(B2)

Frequency Table:

                      Var. B
                      Cat. 1       Cat. 2
Var. A    Cat. 1      fo(A1B1)     fo(A1B2)     fo(A1)
                      fe(A1B1)     fe(A1B2)
          Cat. 2      fo(A2B1)     fo(A2B2)     fo(A2)
                      fe(A2B1)     fe(A2B2)
                      fo(B1)       fo(B2)       ftotal

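When the marginal probabilities are estimated from the sample, i.e. p(Ai) = fo(Ai)/ftotal and p(Bj) = fo(Bj)/ftotal, the expected frequency of each cell is

$$f_e(A_iB_j) = f_{total}\,\frac{f_o(A_i)}{f_{total}}\,\frac{f_o(B_j)}{f_{total}} = \frac{f_o(A_i)\,f_o(B_j)}{f_{total}}$$

where fe denotes the expected frequency and fo the observed frequency.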
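A minimal Python sketch of this estimate, applied to the observed accident frequencies from the example further below. The function names are hypothetical; `meets_minimum_expected` simply encodes the rule of thumb from the requirements (no expected frequency below 7).

```python
def expected_frequencies(observed):
    """Estimate fe(AiBj) = fo(Ai) * fo(Bj) / ftotal for a 2x2 table [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in observed]        # fo(A1), fo(A2)
    col_totals = [sum(col) for col in zip(*observed)]  # fo(B1), fo(B2)
    total = sum(row_totals)                            # ftotal
    return [[r * c / total for c in col_totals] for r in row_totals]


def meets_minimum_expected(expected, minimum=7):
    """True if every expected frequency reaches `minimum` (rule of thumb from the requirements)."""
    return all(fe >= minimum for row in expected for fe in row)


observed = [[320, 56], [831, 51]]  # rows: Heavy, Slight; columns: Alcohol 0, >0
expected = expected_frequencies(observed)
print([[round(fe, 2) for fe in row] for row in expected])  # [[344.02, 31.98], [806.98, 75.02]]
print(meets_minimum_expected(expected))                    # True
```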

The following statistic measures the deviation between the observed and expected frequencies; under H0 it is approximately Chi-Square distributed:

$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{\left(f_o(A_iB_j) - f_e(A_iB_j)\right)^2}{f_e(A_iB_j)}$$

In words, Chi-Square is the sum over all cells of the squared residual (fo - fe) divided by the expected frequency fe.
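The sum above can be sketched directly in Python. This is a minimal, dependency-free illustration with a hypothetical function name; if SciPy is available, `scipy.stats.chi2_contingency(observed, correction=False)` should report the same uncorrected statistic.

```python
def chi_square(observed):
    """Uncorrected Chi-Square statistic for a 2x2 table [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / total  # expected frequency of cell (i, j)
            statistic += (fo - fe) ** 2 / fe            # squared residual divided by fe
    return statistic


print(round(chi_square([[320, 56], [831, 51]]), 2))  # 28.12
```

A table whose rows are exactly proportional, such as [[10, 20], [30, 60]], gives a statistic of 0 because every observed frequency equals its expected frequency.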

Degrees of freedom are determined as follows:

  • If p(A1), p(A2), p(B1) and p(B2) are known in advance:
    df = number of cells - 1 = 3
  • If p(A1), p(A2), p(B1) and p(B2) are estimated from the sample:
    df = (number of rows - 1)*(number of columns - 1) = 1
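The two cases above, and the step from a Chi-Square value to a p-value, can be sketched with the Python standard library. This assumes df = 1: a Chi-Square variable with one degree of freedom is the square of a standard normal variable, so its upper tail equals erfc(sqrt(x / 2)). For other degrees of freedom a statistics library would be needed (for example SciPy's `scipy.stats.chi2.sf`). The function names are hypothetical.

```python
import math


def degrees_of_freedom(n_rows=2, n_cols=2, margins_known=False):
    """df = cells - 1 if the marginal probabilities are known, else (rows - 1)*(cols - 1)."""
    if margins_known:
        return n_rows * n_cols - 1
    return (n_rows - 1) * (n_cols - 1)


def p_value_df1(chi2_value):
    """Upper-tail probability P(X > chi2_value) for X ~ Chi-Square with df = 1.

    X = Z**2 for a standard normal Z, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2_value / 2))


print(degrees_of_freedom())                    # 1
print(degrees_of_freedom(margins_known=True))  # 3
print(round(p_value_df1(3.84), 3))             # 0.05
```

The last line shows why 3.84 is the 5 % critical value for df = 1: its upper-tail probability is about 0.05.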

 

Continuity Correction after Yates:

The Yates continuity correction takes into account that observed frequencies are discrete, whereas the Chi-Square distribution is continuous.

The Chi-Square value is corrected as follows:

$$\chi^2_{corr} = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{\left(\left|f_o(A_iB_j) - f_e(A_iB_j)\right| - 0.5\right)^2}{f_e(A_iB_j)}$$
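A minimal Python sketch of the Yates-corrected statistic, shown on the accident data from the example further below. The function name is hypothetical, and the sketch applies the correction formula literally, which assumes |fo - fe| is at least 0.5 in every cell.

```python
def chi_square_yates(observed):
    """Yates-corrected Chi-Square for a 2x2 table [[a, b], [c, d]].

    Sums (|fo - fe| - 0.5)**2 / fe over the four cells.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / total  # expected frequency of cell (i, j)
            statistic += (abs(fo - fe) - 0.5) ** 2 / fe
    return statistic


print(round(chi_square_yates([[320, 56], [831, 51]]), 2))  # 26.96
```

For these data the corrected value (about 26.96) is somewhat smaller than the uncorrected one (about 28.12), so the correction makes the test slightly more conservative.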

 

  

 

Interpretation of a significant result:

The relationship between the two variables is expressed by how strongly the observed percentages in the cells deviate from the corresponding total row or column percentages.

The standardized residual is another measure of how strongly a cell's observed frequency deviates from its expected frequency. For each cell it is calculated as follows:

$$\text{Std. Res.}(A_iB_j) = \frac{f_o(A_iB_j) - f_e(A_iB_j)}{\sqrt{f_e(A_iB_j)}}$$

For sufficiently large samples the standardized residual is approximately standard normally distributed, i.e. comparable to a z-value. As a rule of thumb, a standardized residual of –2 or less indicates that the cell's observed frequency is significantly lower than its expected frequency, and a standardized residual of +2 or more indicates that it is significantly higher.
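The formula and the ±2 rule of thumb can be sketched in Python as follows. The function names and the wording of the labels are hypothetical; the threshold of 2 is the rule of thumb from the text, not a formal test.

```python
import math


def standardized_residuals(observed):
    """(fo - fe) / sqrt(fe) for every cell of a 2x2 table [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    residuals = []
    for i, row in enumerate(observed):
        residual_row = []
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / total  # expected frequency of cell (i, j)
            residual_row.append((fo - fe) / math.sqrt(fe))
        residuals.append(residual_row)
    return residuals


def interpret(residual, threshold=2.0):
    """Rule of thumb: a standardized residual of at least +/-2 marks a notable deviation."""
    if residual <= -threshold:
        return "significantly lower than expected"
    if residual >= threshold:
        return "significantly higher than expected"
    return "no notable deviation"


for row in standardized_residuals([[320, 56], [831, 51]]):
    print([f"{r:.2f} ({interpret(r)})" for r in row])
```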


Example of a 2 x 2 contingency table and chi-squared test

Accidents were recorded and classified according to the severity of the accident (Heavy, Slight) and the blood alcohol level of the driver (No Alcohol, Alcohol). Heavy accidents are accidents with fatalities or injuries. Slight accidents are accidents with property damage only. Drivers with a blood alcohol level of 0 were assigned to the No Alcohol group, and drivers with a blood alcohol level greater than 0 were assigned to the Alcohol group.

The observed frequencies are displayed in the contingency table below:

                      Alcohol
Accident              0         >0        Total
Heavy                 320       56        376
Slight                831       51        882
Total                 1151      107       1258

The expected frequencies are estimated from the sample and displayed in the following table:

                      Alcohol
Accident              0         >0        Total
Heavy                 344.02    31.98     376
Slight                806.98    75.02     882
Total                 1151      107       1258

The residuals (fo - fe) are displayed in the following table:

                      Alcohol
Accident              0         >0
Heavy                 -24.02    24.02
Slight                24.02     -24.02

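For Chi-Square we get:

$$\chi^2 = \frac{(-24.02)^2}{344.02} + \frac{24.02^2}{31.98} + \frac{24.02^2}{806.98} + \frac{(-24.02)^2}{75.02} \approx 1.68 + 18.04 + 0.71 + 7.69 = 28.12$$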

df = (2-1)(2-1) = 1

Critical Chi-Square value (α = 5 %, df = 1) = 3.84

The obtained Chi-Square value is greater than the critical value, so H0 is rejected at the 5 % level: there is a statistically significant relationship between the two variables.

The contingency table is displayed again. The cells now also include the row and column percentages and the standardized residuals:

                                Alcohol
Accident   Statistic            0            >0           Total
Heavy      Observed             320          56           376
           Expected             344.02       31.98        376
           Row%                 85.11%       14.89%       100%
           Column%              27.80%       52.34%       29.89%
           Std. Res.            -1.29        4.25
Slight     Observed             831          51           882
           Expected             806.98       75.02        882
           Row%                 94.22%       5.78%        100%
           Column%              72.20%       47.66%       70.11%
           Std. Res.            0.85         -2.77
Total      Observed             1151         107          1258
           Expected             1151         107          1258
           Row%                 91.49%       8.51%        100%
           Column%              100%         100%         100%

One way to interpret the result is to compare the column percentages of each cell with the total column percentages. Overall, 29.89 % of the accidents are heavy and 70.11 % are slight. In the No Alcohol column the column percentages match these totals quite well (27.80 % compared to 29.89 % and 72.20 % compared to 70.11 %), whereas in the Alcohol column they deviate strongly (52.34 % compared to 29.89 % and 47.66 % compared to 70.11 %). Looking at the Alcohol cells, it is obvious that there are more heavy accidents and fewer slight accidents than expected. The row percentages point in the same direction: 14.89 % of the heavy accidents but only 5.78 % of the slight accidents involved a driver with a blood alcohol level above 0.
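The percentages used in this comparison can be reproduced with a short Python sketch. The function names are hypothetical; "total column percentages" here means each row total as a share of all observations, matching the Total column of the table.

```python
def row_percentages(observed):
    """Each cell as a percentage of its row total."""
    return [[100 * fo / sum(row) for fo in row] for row in observed]


def column_percentages(observed):
    """Each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*observed)]
    return [[100 * fo / ct for fo, ct in zip(row, col_totals)] for row in observed]


def total_column_percentages(observed):
    """Row totals as a percentage of all observations (the Total column)."""
    total = sum(sum(row) for row in observed)
    return [100 * sum(row) / total for row in observed]


observed = [[320, 56], [831, 51]]  # rows: Heavy, Slight; columns: Alcohol 0, >0
print([[round(p, 2) for p in row] for row in column_percentages(observed)])  # [[27.8, 52.34], [72.2, 47.66]]
print([round(p, 2) for p in total_column_percentages(observed)])             # [29.89, 70.11]
```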

The standardized residuals confirm this finding. In the “Heavy Accident x Alcohol” cell the standardized residual is 4.25, indicating that this cell contains far more observations than expected. In the “Slight Accident x Alcohol” cell the standardized residual is –2.77, indicating that it contains fewer observations than expected.

In short: accidents involving drivers with a blood alcohol level above 0 are significantly more often heavy accidents.

 

BrightStat Output of this example

 

This is a fictitious example.




References

Yates, F. (1934). Contingency tables involving small numbers and the χ² test. Supplement to the Journal of the Royal Statistical Society, 1(2), 217–235. JSTOR 2983604.

Bortz, J. (2005). Statistik für Human- und Sozialwissenschaftler (6th ed.). Heidelberg: Springer Medizin Verlag.

Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). Wiley.




 
