Chi-Square Distribution

Distribution for variance and categorical data: degrees of freedom and critical values.

Overview

The chi-square ($\chi^2$) distribution is used for inference about a population variance and for categorical data analysis, including goodness-of-fit and independence tests.

Definition

If $Z_1, Z_2, \ldots, Z_k$ are independent standard normal random variables, then

$$\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_k^2$$

follows a chi-square distribution with $k$ degrees of freedom.
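
The definition can be checked by simulation. The sketch below is only an illustration: the function name `simulate_chi_square`, the seed, and the sample size are assumptions chosen for this example. It sums $k$ squared standard normal draws many times and compares the sample mean and variance with the theoretical values $k$ and $2k$.

```python
import random
import statistics

def simulate_chi_square(k, n_samples, seed=0):
    """Draw n_samples values of Z_1^2 + ... + Z_k^2 with independent standard normal Z_i."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_samples)
    ]

samples = simulate_chi_square(k=5, n_samples=20000)
sample_mean = statistics.fmean(samples)     # should land near k = 5
sample_var = statistics.variance(samples)   # should land near 2k = 10
print(f"mean = {sample_mean:.2f}, variance = {sample_var:.2f}")
```

The printed values vary with the seed and sample size, so they should be read as approximate.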

Properties

| Property | Value |
| --- | --- |
| Range | $0$ to $\infty$ (non-negative) |
| Mean | $df$ |
| Variance | $2 \times df$ |
| Skewness | Positive (right-skewed) |
| Mode | $df - 2$ (for $df \geq 2$) |

Shape

df = 2    df = 5    df = 10
  ╲         ╱╲        ╱─╲
   ╲       ╱  ╲      ╱   ╲
────╲─────╱────╲────╱─────╲────
  • Always right-skewed (less so for large $df$)
  • Becomes more symmetric as $df$ increases
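
To make the shape concrete, the following sketch evaluates the standard chi-square density $f(x) = \frac{x^{df/2 - 1} e^{-x/2}}{2^{df/2}\,\Gamma(df/2)}$ for $x > 0$ and locates its peak on a grid. The helper names `chi2_pdf` and `numeric_mode`, the grid step, and the upper limit are assumptions made for illustration.

```python
import math

def chi2_pdf(x, df):
    """Chi-square density at x for df degrees of freedom (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    half = df / 2.0
    # Work on the log scale to avoid overflow for larger df.
    log_density = ((half - 1.0) * math.log(x) - x / 2.0
                   - half * math.log(2.0) - math.lgamma(half))
    return math.exp(log_density)

def numeric_mode(df, step=0.01, upper=40.0):
    """Grid point in (0, upper] where the density is largest."""
    grid = [i * step for i in range(1, int(upper / step) + 1)]
    return max(grid, key=lambda x: chi2_pdf(x, df))

for df in (5, 10):
    # The located peak should sit close to df - 2.
    print(df, round(numeric_mode(df), 2))
```

The grid search is deliberately crude; it only confirms that the peak moves right as $df$ grows.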

Notation

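$\chi^2_{\alpha, df}$ denotes the critical value that leaves a right-tail area of $\alpha$ under the chi-square distribution with $df$ degrees of freedom.

Example: $\chi^2_{0.05, 10} = 18.307$ (right-tail area = 0.05, $df = 10$)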

Critical Values Table

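| $df$ | $\chi^2_{0.995}$ | $\chi^2_{0.99}$ | $\chi^2_{0.05}$ | $\chi^2_{0.025}$ | $\chi^2_{0.01}$ |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.000 | 0.000 | 3.841 | 5.024 | 6.635 |
| 5 | 0.412 | 0.554 | 11.070 | 12.833 | 15.086 |
| 10 | 2.156 | 2.558 | 18.307 | 20.483 | 23.209 |
| 15 | 4.601 | 5.229 | 24.996 | 27.488 | 30.578 |
| 20 | 7.434 | 8.260 | 31.410 | 34.170 | 37.566 |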
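
Table entries like these can be reproduced numerically. The sketch below is one possible approach, not the method used to build the table: it computes the cumulative probability from the series expansion of the regularized lower incomplete gamma function and then finds the critical value by bisection. The names `chi2_cdf` and `chi2_critical` are hypothetical helpers, and only the Python standard library is assumed.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable, using the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:      # stop once new terms no longer matter
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_critical(alpha, df):
    """Value c with right-tail area P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, float(df)
    while 1.0 - chi2_cdf(hi, df) > alpha:   # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid                         # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2.0

for alpha, df in [(0.05, 1), (0.05, 10), (0.995, 20)]:
    print(alpha, df, round(chi2_critical(alpha, df), 3))
```

The series is adequate for the moderate $df$ and $x$ values tabulated here; a statistics library would be a more robust choice for extreme tails.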

Applications

1. Variance Testing

Test statistic for $\sigma^2$:

$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$$

with $df = n - 1$.

2. Goodness-of-Fit Test

Tests if observed frequencies match expected frequencies:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:

  • $O$ = observed frequency
  • $E$ = expected frequency
  • $df$ = (number of categories) $- 1$

3. Test of Independence

Tests association between categorical variables:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:

  • $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
  • $df = (\text{rows} - 1) \times (\text{columns} - 1)$

Examples

Example 1: Goodness-of-Fit

Testing if a die is fair. Roll 60 times, expect 10 per face.

| Face | $O$ | $E$ | $(O-E)^2/E$ |
| --- | --- | --- | --- |
| 1 | 8 | 10 | 0.4 |
| 2 | 12 | 10 | 0.4 |
| 3 | 9 | 10 | 0.1 |
| 4 | 11 | 10 | 0.1 |
| 5 | 10 | 10 | 0.0 |
| 6 | 10 | 10 | 0.0 |

$$\chi^2 = 1.0, \quad df = 6 - 1 = 5$$
$$\text{Critical value } \chi^2_{0.05, 5} = 11.07$$
$$1.0 < 11.07 \Rightarrow \text{fail to reject } H_0 \text{ (die appears fair)}$$
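
The arithmetic in this example can be reproduced with a few lines of code. This is a minimal sketch: `chi_square_statistic` is a hypothetical helper name, and the critical value 11.07 is copied from the example rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 9, 11, 10, 10]        # observed counts for faces 1 to 6
expected = [sum(observed) / 6] * 6       # fair die: 60 rolls / 6 faces = 10 each

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1
critical_value = 11.07                   # chi^2_{0.05, 5}, taken from the example
reject = statistic > critical_value
print(f"chi2 = {statistic:.1f}, df = {df}, reject H0: {reject}")
# prints: chi2 = 1.0, df = 5, reject H0: False
```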

Example 2: Independence Test

Survey of 200 people on product preference by gender:

| | Product A | Product B | Total |
| --- | --- | --- | --- |
| Male | 40 | 60 | 100 |
| Female | 60 | 40 | 100 |
| Total | 100 | 100 | 200 |

Expected (for each cell): $E = \frac{100 \times 100}{200} = 50$

$$\chi^2 = \frac{(40-50)^2}{50} + \frac{(60-50)^2}{50} + \frac{(60-50)^2}{50} + \frac{(40-50)^2}{50}$$
$$= 2 + 2 + 2 + 2 = 8$$
$$df = (2-1)(2-1) = 1, \quad \chi^2_{0.05, 1} = 3.841$$
$$8 > 3.841 \Rightarrow \text{reject } H_0 \text{ (preference depends on gender)}$$
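
The same calculation can be sketched in code for any contingency table. The function names `expected_counts` and `independence_statistic` are assumptions for illustration; the table values are those of the survey above, with margins recomputed from the cell counts.

```python
def expected_counts(table):
    """Expected cell counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def independence_statistic(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

table = [[40, 60],   # Male:   Product A, Product B
         [60, 40]]   # Female: Product A, Product B
statistic, df = independence_statistic(table)
print(statistic, df)   # prints: 8.0 1
```

Comparing the statistic with the critical value 3.841 quoted in the example leads to the same rejection of $H_0$.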

Example 3: Variance Test

Testing $H_0: \sigma^2 = 25$ vs $H_1: \sigma^2 \neq 25$ (two-sided, $\alpha = 0.05$)

Sample: $n = 20$, $s^2 = 40$

$$\chi^2 = \frac{(20-1)(40)}{25} = 30.4, \quad df = 19$$
$$\text{Lower critical: } \chi^2_{0.975, 19} = 8.907$$
$$\text{Upper critical: } \chi^2_{0.025, 19} = 32.852$$
$$8.907 < 30.4 < 32.852 \Rightarrow \text{fail to reject } H_0$$
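
A short sketch of the two-sided decision rule follows. The helper name `variance_test_statistic` is hypothetical, and the two critical values are copied from the example rather than computed.

```python
def variance_test_statistic(n, sample_variance, sigma0_squared):
    """Return (n - 1) * s^2 / sigma_0^2 together with its df = n - 1."""
    return (n - 1) * sample_variance / sigma0_squared, n - 1

statistic, df = variance_test_statistic(n=20, sample_variance=40, sigma0_squared=25)
lower, upper = 8.907, 32.852   # critical values quoted in the example for df = 19
# Two-sided test: reject when the statistic falls outside [lower, upper].
reject = statistic < lower or statistic > upper
print(f"chi2 = {statistic:.1f}, df = {df}, reject H0: {reject}")
# prints: chi2 = 30.4, df = 19, reject H0: False
```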

Assumptions

  • Random sampling
  • Independence of observations
  • Expected frequencies $\geq 5$ (for categorical tests)
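
The expected-frequency condition is easy to check before running a categorical test. The helper below is a hypothetical convenience function, and the second input is made-up data used only to show a failing case.

```python
def small_expected_cells(expected, minimum=5):
    """Return (index, value) pairs for expected counts below the minimum."""
    return [(i, e) for i, e in enumerate(expected) if e < minimum]

print(small_expected_cells([10, 10, 10, 10, 10, 10]))  # fair-die example: prints []
print(small_expected_cells([12.5, 4.0, 3.5]))          # made-up counts: prints [(1, 4.0), (2, 3.5)]
```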