Overview
The chi-square ($\chi^2$) distribution is used for inference about a population variance and for categorical data analysis, including goodness-of-fit and independence tests.
Definition
If $Z_1, Z_2, \dots, Z_k$ are independent standard normal random variables, then
$$\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_k^2$$
follows a chi-square distribution with $k$ degrees of freedom.
Properties
| Property | Value |
|---|---|
| Range | 0 to ∞ (non-negative) |
| Mean | df |
| Variance | 2×df |
| Skewness | Positive (right-skewed) |
| Mode | df−2 (for df≥2) |
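
The definition and the mean and variance rows above can be checked by simulation. The following is a minimal sketch using only the Python standard library; the function name `simulate_chi_square`, the seed, and the choice of 5 degrees of freedom with 20,000 draws are illustrative assumptions, not part of the original notes.

```python
import random
import statistics

def simulate_chi_square(k, n_draws, seed=0):
    """Draw n_draws values of Z_1^2 + ... + Z_k^2 with each Z_i ~ N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_draws)
    ]

k = 5  # degrees of freedom (illustrative choice)
draws = simulate_chi_square(k, n_draws=20_000)

sample_mean = statistics.fmean(draws)    # theory: df = 5
sample_var = statistics.variance(draws)  # theory: 2 * df = 10

print(f"sample mean = {sample_mean:.2f}, sample variance = {sample_var:.2f}")
```

With this many draws the sample mean and variance should land near 5 and 10, and every draw is non-negative, matching the Range row.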
Shape
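[Figure: chi-square density curves for df = 2, df = 5, and df = 10. The df = 2 curve decreases from its start at 0; the df = 5 and df = 10 curves are single-peaked, with the peak moving right and flattening as df grows.]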
- Always right-skewed; the skewness is $\sqrt{8/df}$, so it shrinks as df increases
- Approaches a symmetric, bell-shaped curve for large df
Notation
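$\chi^2_{\alpha,\,df}$ denotes the value that cuts off a right-tail area of $\alpha$ under the chi-square distribution with df degrees of freedom.
Example: $\chi^2_{0.05,\,10} = 18.307$ (right-tail area = 0.05, df = 10)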
Critical Values Table
| df | $\chi^2_{0.995}$ | $\chi^2_{0.99}$ | $\chi^2_{0.05}$ | $\chi^2_{0.025}$ | $\chi^2_{0.01}$ |
|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | 3.841 | 5.024 | 6.635 |
| 5 | 0.412 | 0.554 | 11.070 | 12.833 | 15.086 |
| 10 | 2.156 | 2.558 | 18.307 | 20.483 | 23.209 |
| 15 | 4.601 | 5.229 | 24.996 | 27.488 | 30.578 |
| 20 | 7.434 | 8.260 | 31.410 | 34.170 | 37.566 |
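
Table entries like these can be recomputed rather than looked up. The sketch below is one possible way to do it with only the standard library: it evaluates the chi-square CDF through the series for the regularized lower incomplete gamma function and then finds the critical value by bisection. The helper names `chi2_cdf` and `chi2_critical` are hypothetical, and the series is only intended for the moderate df and x values used in these notes.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for X ~ chi-square(df), via the incomplete-gamma series."""
    if x <= 0:
        return 0.0
    a = df / 2.0
    z = x / 2.0
    # Series: sum over n of z^n / (a * (a+1) * ... * (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    # Prefactor z^a * e^(-z) / Gamma(a), computed in log space
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_critical(alpha, df):
    """Value with right-tail area alpha, found by bisection on the CDF."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Recompute the df = 10 row of the table for comparison
row_df10 = [round(chi2_critical(a, 10), 3) for a in (0.995, 0.99, 0.05, 0.025, 0.01)]
print(row_df10)
```

Bisection is slow compared with a dedicated quantile routine but easy to verify, since the CDF is monotone in x.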
Applications
1. Variance Testing
Test statistic for $\sigma^2$ under $H_0: \sigma^2 = \sigma_0^2$:
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$
with $df = n - 1$
2. Goodness-of-Fit Test
Tests if observed frequencies match expected frequencies:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
- $O$ = observed frequency
- $E$ = expected frequency
- $df = (\text{number of categories}) - 1$
3. Test of Independence
Tests association between categorical variables:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
- $E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$
- $df = (\text{rows} - 1) \times (\text{columns} - 1)$
Examples
Example 1: Goodness-of-Fit
Test whether a die is fair at $\alpha = 0.05$. The die is rolled 60 times, so each face is expected 10 times.
| Face | O | E | $(O-E)^2/E$ |
|---|---|---|---|
| 1 | 8 | 10 | 0.4 |
| 2 | 12 | 10 | 0.4 |
| 3 | 9 | 10 | 0.1 |
| 4 | 11 | 10 | 0.1 |
| 5 | 10 | 10 | 0.0 |
| 6 | 10 | 10 | 0.0 |
$\chi^2 = 1.0$, $df = 6 - 1 = 5$
Critical value: $\chi^2_{0.05,\,5} = 11.07$
$1.0 < 11.07 \Rightarrow$ fail to reject $H_0$ (the die appears fair)
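
The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch; the function name `chi_square_gof` is hypothetical, and the critical value is copied from the table earlier in these notes rather than computed.

```python
def chi_square_gof(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

die_observed = [8, 12, 9, 11, 10, 10]
# Fair die: every face gets an equal share of the 60 rolls
die_expected = [sum(die_observed) / len(die_observed)] * len(die_observed)

gof_stat = chi_square_gof(die_observed, die_expected)
gof_df = len(die_observed) - 1
gof_critical = 11.070  # right-tail area 0.05, df = 5, taken from the table

gof_reject = gof_stat > gof_critical
print(f"chi-square = {gof_stat:.1f}, df = {gof_df}, reject H0: {gof_reject}")
```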
Example 2: Independence Test
Survey of 200 people on product preference by gender:
| | Product A | Product B | Total |
|---|---|---|---|
| Male | 40 | 60 | 100 |
| Female | 60 | 40 | 100 |
| Total | 100 | 100 | 200 |
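Expected (for each cell): $E = \frac{100 \times 100}{200} = 50$
$$\chi^2 = \frac{(40-50)^2}{50} + \frac{(60-50)^2}{50} + \frac{(60-50)^2}{50} + \frac{(40-50)^2}{50} = 2 + 2 + 2 + 2 = 8$$
$df = (2-1)(2-1) = 1$, $\chi^2_{0.05,\,1} = 3.841$
$8 > 3.841 \Rightarrow$ reject $H_0$ (preference depends on gender)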
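
The expected counts and the statistic for a contingency table follow mechanically from the row and column totals. The sketch below assumes the table is given as a list of rows without the totals; the names `expected_counts` and `chi_square_independence` are illustrative.

```python
def expected_counts(table):
    """Expected cell counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return the chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: Male, Female; columns: Product A, Product B
survey = [[40, 60], [60, 40]]
ind_stat, ind_df = chi_square_independence(survey)
ind_reject = ind_stat > 3.841  # critical value for right-tail area 0.05, df = 1

print(f"chi-square = {ind_stat:.1f}, df = {ind_df}, reject H0: {ind_reject}")
```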
Example 3: Variance Test
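Testing $H_0: \sigma^2 = 25$ vs $H_1: \sigma^2 \neq 25$ (two-sided, $\alpha = 0.05$)
Sample: $n = 20$, $s^2 = 40$
$$\chi^2 = \frac{(20-1)(40)}{25} = 30.4, \quad df = 19$$
Lower critical: $\chi^2_{0.975,\,19} = 8.907$
Upper critical: $\chi^2_{0.025,\,19} = 32.852$
$8.907 < 30.4 < 32.852 \Rightarrow$ fail to reject $H_0$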
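
The two-sided decision rule used here can be written out as a short check. This is a sketch only: `variance_test_statistic` is a hypothetical helper name, and the two critical values are the ones quoted in the example, not computed.

```python
def variance_test_statistic(n, sample_var, sigma0_sq):
    """(n - 1) * s^2 / sigma_0^2, compared against chi-square with n - 1 df."""
    return (n - 1) * sample_var / sigma0_sq

var_stat = variance_test_statistic(n=20, sample_var=40, sigma0_sq=25)
var_df = 20 - 1

lower_critical = 8.907   # right-tail area 0.975, df = 19
upper_critical = 32.852  # right-tail area 0.025, df = 19

# Two-sided test: reject when the statistic falls in either tail
var_reject = var_stat < lower_critical or var_stat > upper_critical
print(f"chi-square = {var_stat:.1f}, df = {var_df}, reject H0: {var_reject}")
```

For a one-sided alternative, only one tail would be checked, with the full $\alpha$ placed in that tail.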
Assumptions
- Random sampling
- Independence of observations
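- Expected frequency of at least 5 in every category or cell (a rule of thumb for the categorical tests)
- Normally distributed population (for the variance test)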