Tests for categorical data: goodness-of-fit and test of independence.
Overview
Chi-square tests are used for categorical data to test goodness-of-fit (single variable) and independence (two variables).
Test Statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
O = observed frequency
E = expected frequency
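As a minimal sketch, the statistic can be computed directly from paired observed and expected counts. The function name `chi_square_statistic` and the sample counts are illustrative assumptions and do not come from these notes.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (illustrative only).
observed = [18, 22, 20]
expected = [20, 20, 20]
statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))
```

Each category contributes its squared deviation scaled by the expected count, so a large deviation in a category with a small expected count has the most influence on the total.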
Goodness-of-Fit Test
Tests whether the observed frequencies of a single categorical variable match the frequencies expected under a hypothesized distribution.
Hypotheses
H0: The data follows the specified distribution
H1: The data does not follow the specified distribution
Expected Frequencies
E=n×p
Where n = total observations, p = hypothesized proportion
Degrees of Freedom
df=k−1
Where k = number of categories
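The goodness-of-fit steps (expected counts from n and p, then the statistic and its degrees of freedom) might be sketched as follows. The function name `goodness_of_fit`, the observed counts, and the hypothesized proportions are assumptions chosen for illustration.

```python
def goodness_of_fit(observed, proportions):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    if len(observed) != len(proportions):
        raise ValueError("observed and proportions must have the same length")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)                         # total observations
    expected = [n * p for p in proportions]   # E = n * p
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                    # df = k - 1
    return statistic, df


# Hypothetical data: 200 observations, hypothesized split 50% / 30% / 20%.
stat, df = goodness_of_fit([110, 55, 35], [0.5, 0.3, 0.2])
print(f"chi-square = {stat:.3f}, df = {df}")
```

The resulting statistic would then be compared with the critical value of the chi-square distribution for the computed degrees of freedom.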
Test of Independence
Tests whether two categorical variables are associated.
Hypotheses
H0: The variables are independent
H1: The variables are associated (dependent)
Expected Frequencies
$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$
Degrees of Freedom
df=(r−1)(c−1)
Where r = rows, c = columns
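For the test of independence, the expected count in each cell comes from that cell's row and column totals. The sketch below assumes the contingency table is given as a list of rows; the function name `independence_test` and the 2 x 2 counts are hypothetical.

```python
def independence_test(table):
    """Return (expected counts, chi-square statistic, df) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = (row total * column total) / grand total, for every cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # df = (r - 1)(c - 1)
    return expected, statistic, df


# Hypothetical 2 x 2 table of counts (illustrative only).
table = [[30, 20], [20, 30]]
expected, stat, df = independence_test(table)
print(expected, stat, df)
```

Here every row and column total is 50 out of 100, so each expected count is 25 and each cell contributes 25 / 25 = 1 to the statistic.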
Conditions
Random sampling
Independent observations
Expected frequency ≥ 5 in every cell
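Of these conditions, only the expected-frequency rule can be checked from the counts themselves; random sampling and independence depend on how the data were collected. A small sketch of that check, assuming the expected counts are stored as a list of rows (the helper name `meets_expected_count_condition` and the sample values are hypothetical):

```python
def meets_expected_count_condition(expected, minimum=5):
    """Return True if every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected for e in row)


# Hypothetical expected counts (illustrative only).
ok_table = [[25.0, 25.0], [25.0, 25.0]]
sparse_table = [[3.2, 6.8], [4.8, 10.2]]
print(meets_expected_count_condition(ok_table))      # True
print(meets_expected_count_condition(sparse_table))  # False
```

For a goodness-of-fit test, the single list of expected counts can be passed as a one-row table, for example `[[100, 100, 100]]`.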
Examples
Example 1: Goodness-of-Fit
Testing whether a die is fair (600 rolls, so each face is expected E = 600 × 1/6 = 100 times):
Face           1     2     3     4     5     6
Observed (O)   92    108   97    103   88    112
Expected (E)   100   100   100   100   100   100
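$$\chi^2 = \frac{(92-100)^2}{100} + \frac{(108-100)^2}{100} + \frac{(97-100)^2}{100} + \frac{(103-100)^2}{100} + \frac{(88-100)^2}{100} + \frac{(112-100)^2}{100}$$
$$= 0.64 + 0.64 + 0.09 + 0.09 + 1.44 + 1.44 = 4.34$$
$$df = 6 - 1 = 5, \quad \chi^2_{\text{crit}}(\alpha = 0.05) = 11.07$$
$$4.34 < 11.07 \Rightarrow \text{Fail to reject } H_0$$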
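The arithmetic in Example 1 can be reproduced with a few lines of Python. This is only a check of the worked numbers: the critical value 11.07 is copied from the example rather than computed, and the variable names are illustrative.

```python
observed = [92, 108, 97, 103, 88, 112]   # counts for faces 1 through 6
n = sum(observed)                        # 600 rolls in total
expected = [n / 6] * 6                   # a fair die gives 100 per face

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

critical_value = 11.07                   # taken from the example (alpha = 0.05, df = 5)
reject_h0 = statistic > critical_value

print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
# prints: chi-square = 4.34, df = 5, reject H0: False
```

Because the statistic falls below the critical value, the data give no evidence at the 5% level that the die is unfair.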