Chi-Square Tests | Statistics Cheat Sheet

Overview

Chi-square tests are used for categorical data to test goodness-of-fit (single variable) and independence (two variables).

\chi^2 = \sum \frac{(O - E)^2}{E}

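Where $O$ = observed frequency, $E$ = expected frequency, and the sum runs over all categories (or all cells of a table).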

The goodness-of-fit test checks whether the observed frequencies of a single categorical variable match the expected frequencies.

E = n \times p

Where $n$ = total observations, $p$ = hypothesized proportion

df = k - 1

Where $k$ = number of categories

The test of independence checks whether two categorical variables are associated.

E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}

df = (r - 1)(c - 1)

Where $r$ = rows, $c$ = columns

Testing if a die is fair (600 rolls):

Face	1	2	3	4	5	6
$O$	92	108	97	103	88	112
$E$	100	100	100	100	100	100

\chi^2 = \frac{(92-100)^2}{100} + \frac{(108-100)^2}{100} + \frac{(97-100)^2}{100} + \frac{(103-100)^2}{100} + \frac{(88-100)^2}{100} + \frac{(112-100)^2}{100}

= 0.64 + 0.64 + 0.09 + 0.09 + 1.44 + 1.44 = 4.34

df = 6 - 1 = 5, \quad \chi^2_{\text{crit}} (\alpha=0.05) = 11.07

4.34 < 11.07 \Rightarrow \text{Fail to reject } H_0

There is not enough evidence at the 5% level to conclude that the die is unfair.
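The die calculation above can be reproduced with a short Python sketch. The function name `chi_square_gof` is illustrative, not from any library, and the critical value 11.07 is copied from the example rather than computed.

```python
def chi_square_gof(observed, proportions):
    """Return (chi2, df) for a goodness-of-fit test.

    observed    -- observed counts, one per category
    proportions -- hypothesized proportion for each category
    """
    n = sum(observed)
    expected = [n * p for p in proportions]  # E = n * p
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                   # df = k - 1
    return chi2, df


# Die example: 600 rolls, fair die means p = 1/6 for every face.
observed = [92, 108, 97, 103, 88, 112]
chi2, df = chi_square_gof(observed, [1 / 6] * 6)

CRITICAL_5PCT_DF5 = 11.07  # taken from the worked example (alpha = 0.05, df = 5)

print(round(chi2, 2), df)  # 4.34 5
print("reject H0" if chi2 > CRITICAL_5PCT_DF5 else "fail to reject H0")
```

Passing the hypothesized proportions instead of the expected counts keeps the sketch usable when the categories are not equally likely.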

Survey: Gender vs Product Preference (300 people)

	Product A	Product B	Product C	Total
Male	50	40	60	150
Female	30	50	70	150
Total	80	90	130	300

Expected values:

E(\text{Male, A}) = \frac{150 \times 80}{300} = 40

E(\text{Male, B}) = \frac{150 \times 90}{300} = 45

E(\text{Male, C}) = \frac{150 \times 130}{300} = 65

The Female row has the same expected values (40, 45, 65) because both row totals are 150.

\chi^2 = \frac{(50-40)^2}{40} + \frac{(40-45)^2}{45} + \frac{(60-65)^2}{65} + \frac{(30-40)^2}{40} + \frac{(50-45)^2}{45} + \frac{(70-65)^2}{65}

= 2.5 + 0.56 + 0.38 + 2.5 + 0.56 + 0.38 = 6.88

df = (2-1)(3-1) = 2, \quad \chi^2_{\text{crit}} (\alpha=0.05) = 5.99

6.88 > 5.99 \Rightarrow \text{Reject } H_0

There is evidence at the 5% level that gender and product preference are associated.
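As a check on the survey example, the following Python sketch builds the expected counts from the row and column totals and then sums the chi-square terms. The helper name `chi_square_independence` is an assumption made for illustration, and the critical value 5.99 is copied from the example.

```python
def chi_square_independence(table):
    """Return (chi2, df, expected) for a table of observed counts.

    table -- list of rows, each row a list of observed counts
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # E = (row total * column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Survey example: rows are Male, Female; columns are Product A, B, C.
survey = [
    [50, 40, 60],
    [30, 50, 70],
]
chi2, df, expected = chi_square_independence(survey)

CRITICAL_5PCT_DF2 = 5.99  # taken from the worked example (alpha = 0.05, df = 2)

print(expected)            # [[40.0, 45.0, 65.0], [40.0, 45.0, 65.0]]
print(round(chi2, 2), df)  # 6.88 2
print("reject H0" if chi2 > CRITICAL_5PCT_DF2 else "fail to reject H0")
```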

The test of homogeneity uses the same calculation as the test of independence, but it asks whether the distribution of one categorical variable is the same across several groups.

Effect size for the test of independence (Cramér's V):

V = \sqrt{\frac{\chi^2}{n \times \min(r-1, c-1)}}
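To make the formula concrete, here is a minimal Python sketch that applies it to the survey example (chi-square 6.88, n = 300, a 2×3 table). The function name `cramers_v` is illustrative, not a library call.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V for an n_rows x n_cols table with n observations."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


# Values from the gender vs product preference example.
v = cramers_v(chi2=6.88, n=300, n_rows=2, n_cols=3)
print(round(v, 3))  # 0.151
```

V ranges from 0 (no association) to 1 (perfect association), so a value near 0.15 points to a fairly weak association even though the test result is significant.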

Yates' continuity correction for 2×2 tables:

\chi^2 = \sum \frac{(\lvert O - E \rvert - 0.5)^2}{E}

The correction reduces the Type I error rate for small samples, although it can make the test conservative.
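The sketch below compares the uncorrected and corrected statistics on a 2×2 table. The counts are hypothetical and chosen only for illustration; they do not come from this sheet. The function name `chi_square_2x2` is also an assumption. The critical value 3.84 is the standard chi-square value for df = 1 at alpha = 0.05.

```python
def chi_square_2x2(table, yates=False):
    """Chi-square statistic for a 2x2 table, optionally with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            diff = abs(observed - expected)
            if yates:
                diff -= 0.5  # (|O - E| - 0.5), as in the formula above
            chi2 += diff ** 2 / expected
    return chi2


# Hypothetical counts, for illustration only.
table = [
    [12, 8],
    [5, 15],
]
uncorrected = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)

CRITICAL_5PCT_DF1 = 3.84  # chi-square critical value for df = 1, alpha = 0.05

print(round(uncorrected, 2), uncorrected > CRITICAL_5PCT_DF1)  # 5.01 True
print(round(corrected, 2), corrected > CRITICAL_5PCT_DF1)      # 3.68 False
```

With these counts the uncorrected statistic rejects $H_0$ while the corrected one does not, which shows how the correction makes the test stricter.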