Central Limit Theorem

The foundation of inference: sample means approach normality regardless of population shape.

Overview

The Central Limit Theorem (CLT) is one of the most important results in statistics. It states that the sampling distribution of the sample mean approaches a normal distribution as sample size increases, regardless of the population's distribution.

Statement

For a population with mean $\mu$ and finite variance $\sigma^2$, the sampling distribution of $\bar{x}$:

$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{as } n \to \infty$$

Or equivalently, the standardized mean:

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
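The statement can be checked numerically. The sketch below is illustrative only: it assumes an Exponential(1) population (a choice made here, not one from the text), for which $\mu = \sigma = 1$, and the function name `standardized_means` is hypothetical. It draws many samples, standardizes each sample mean, and summarizes the resulting $Z$ values.

```python
import random
import statistics

def standardized_means(n, reps, rate=1.0, seed=0):
    """Draw `reps` samples of size n from an Exponential(rate) population
    and return the standardized means Z = (xbar - mu) / (sigma / sqrt(n))."""
    rng = random.Random(seed)
    mu = 1.0 / rate      # mean of Exponential(rate)
    sigma = 1.0 / rate   # standard deviation of Exponential(rate)
    se = sigma / n ** 0.5
    zs = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(rate) for _ in range(n))
        zs.append((xbar - mu) / se)
    return zs

# ASSUMPTION: n = 50 and 4000 repetitions are arbitrary illustrative choices.
zs = standardized_means(n=50, reps=4000)
z_mean = statistics.fmean(zs)
z_sd = statistics.stdev(zs)
share_within_196 = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(f"mean of Z = {z_mean:.3f}, sd of Z = {z_sd:.3f}, "
      f"share within ±1.96 = {share_within_196:.3f}")
```

If the normal approximation is reasonable, the mean of $Z$ should be near 0, its standard deviation near 1, and roughly 95% of values should fall within $\pm 1.96$.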

Conditions

  1. Independence: Observations are independent
  2. Sample size: $n$ is "large enough" (typically $n \geq 30$)
  3. Finite variance: Population has finite mean and variance

How Large is "Large Enough"?

| Population Shape | Recommended $n$ |
|---|---|
| Normal | Any $n$ |
| Symmetric | $n \geq 15$ |
| Moderate skewness | $n \geq 30$ |
| Highly skewed | $n \geq 50$ or more |
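One way to see why skewed populations need larger samples: for independent draws, the skewness of the sample mean equals the population skewness divided by $\sqrt{n}$, so it shrinks slowly. The sketch below assumes an Exponential(1) population (skewness 2), chosen here purely for illustration; `skewness` and `mean_skewness` are hypothetical helper names.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: average cubed z-score of the values."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

def mean_skewness(n, reps=5000, seed=1):
    """Skewness of the simulated sampling distribution of the mean for
    samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return skewness(means)

# ASSUMPTION: the sample sizes mirror the thresholds in the table above.
skew_by_n = {n: mean_skewness(n) for n in (5, 15, 30, 50)}
for n, s in skew_by_n.items():
    print(f"n = {n:>2}: simulated skewness {s:.2f}, theory {2 / n ** 0.5:.2f}")
```

The simulated values are noisy, but they should track the $2/\sqrt{n}$ pattern and move toward 0 (the skewness of a normal distribution) as $n$ grows.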

Visual Demonstration

Population (any shape):
    The CLT says: For large n,
     /\         the sampling distribution
    /  \        of x̄ becomes:
___/    \___          ∩
                    ╱    ╲
                  ╱        ╲
                ─┴──────────┴─
                   Normal

Key Points

  1. Works for any shape: Population can be skewed, uniform, bimodal, etc.
  2. Larger n → better approximation: The larger each sample, the closer the distribution of $\bar{x}$ is to normal
  3. Focus on $\bar{x}$, not $X$: Individual observations keep the population distribution
  4. Foundation of inference: Enables z-tests and confidence intervals

Applications

Confidence Intervals

When $n$ is large:

$$\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
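The interval formula translates directly into a few lines of code. This is a minimal sketch that assumes $\sigma$ is known; the function name `z_confidence_interval` and the observed mean of 3.6 are hypothetical, while $\sigma = 1.71$ and $n = 40$ are borrowed from the die-roll example.

```python
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Large-sample z interval: xbar ± z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1.0 - confidence
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for 95%
    margin = z_crit * sigma / n ** 0.5
    return xbar - margin, xbar + margin

# ASSUMPTION: xbar = 3.6 is a made-up observed mean of 40 die rolls.
low, high = z_confidence_interval(xbar=3.6, sigma=1.71, n=40)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

In practice $\sigma$ is rarely known; the sketch follows the formula as written rather than substituting the sample standard deviation.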

Hypothesis Testing

Test statistic (large $n$):

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

Under the null hypothesis, this $Z$ approximately follows $N(0,1)$ by the CLT.
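As a rough sketch, the test statistic and a two-sided p-value can be computed as follows. The function name `z_test` and the observed mean of 3.9 are hypothetical; the null value $\mu_0 = 3.5$, $\sigma = 1.71$, and $n = 40$ reuse the die-roll setting.

```python
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n):
    """Return (z, two-sided p-value) for H0: mu = mu0 with known sigma."""
    se = sigma / n ** 0.5
    z = (xbar - mu0) / se
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# ASSUMPTION: xbar = 3.9 is a made-up observed mean of 40 die rolls.
z, p = z_test(xbar=3.9, mu0=3.5, sigma=1.71, n=40)
print(f"z = {z:.3f}, two-sided p-value = {p:.3f}")
```

The p-value relies on the CLT approximation, so it is only as trustworthy as the normal approximation is for the given $n$ and population shape.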

Examples

Example 1: Non-Normal Population

Die rolls (uniform distribution, $\mu = 3.5$, $\sigma = 1.71$):

For $n = 40$ die rolls, find $P(\bar{x} > 3.7)$:

$$SE = \frac{1.71}{\sqrt{40}} = 0.27$$

$$Z = \frac{3.7 - 3.5}{0.27} = 0.74$$

$$P(Z > 0.74) = 1 - 0.7704 = 0.23$$
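The arithmetic above can be reproduced without rounding at intermediate steps. The sketch below derives $\mu$ and $\sigma$ from the six faces of a fair die and then applies the same normal approximation; the variable names are illustrative.

```python
from statistics import NormalDist, fmean, pstdev

faces = [1, 2, 3, 4, 5, 6]
mu = fmean(faces)       # 3.5 for a fair die
sigma = pstdev(faces)   # population standard deviation, sqrt(35/12)
n = 40

se = sigma / n ** 0.5                  # standard error of the mean
z = (3.7 - mu) / se                    # standardized cutoff
p_upper = 1.0 - NormalDist().cdf(z)    # normal approximation to P(xbar > 3.7)
print(f"sigma = {sigma:.4f}, SE = {se:.4f}, z = {z:.4f}, P = {p_upper:.4f}")
```

Small differences from the hand calculation come only from rounding $\sigma$, $SE$, and $Z$ to two decimals.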

Example 2: Skewed Population (Incomes)

Income: $\mu = \$60{,}000$, $\sigma = \$25{,}000$ (right-skewed)

For $n = 100$, find $P(\bar{x} < \$55{,}000)$:

$$SE = \frac{25000}{\sqrt{100}} = 2500$$

$$Z = \frac{55000 - 60000}{2500} = -2.0$$

$$P(Z < -2.0) = 0.0228$$
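A simulation can give a feel for how well the approximation holds for a skewed population. The example only says incomes are right-skewed, so the sketch below assumes a lognormal population matched to the stated mean and standard deviation; that distributional choice, the seed, and the repetition count are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

MU, SIGMA, N = 60_000.0, 25_000.0, 100

# ASSUMPTION: incomes are lognormal. Pick the log-scale parameters so the
# population has mean MU and standard deviation SIGMA.
s2 = math.log(1.0 + (SIGMA / MU) ** 2)
ln_sigma = math.sqrt(s2)
ln_mu = math.log(MU) - s2 / 2.0

rng = random.Random(42)
reps = 5000
below = 0
for _ in range(reps):
    xbar = fmean(rng.lognormvariate(ln_mu, ln_sigma) for _ in range(N))
    below += xbar < 55_000   # count samples whose mean falls below the cutoff

p_sim = below / reps
p_clt = NormalDist(MU, SIGMA / math.sqrt(N)).cdf(55_000)
print(f"simulated P = {p_sim:.4f}, CLT approximation = {p_clt:.4f}")
```

The simulated frequency should land in the neighborhood of the CLT value without matching it exactly, since the residual skewness of $\bar{x}$ at $n = 100$ and simulation noise both contribute to the gap.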

Example 3: Sum of Random Variables

By the CLT, for large $n$, the sum $S_n = X_1 + X_2 + \cdots + X_n$ of independent, identically distributed variables with mean $\mu$ and variance $\sigma^2$ satisfies:

$$S_n \sim N(n\mu, n\sigma^2) \quad \text{approximately}$$

This applies to totals such as aggregate insurance claims or total sales.
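The sum version can be sketched the same way. The helper name `sum_distribution` and the threshold of 150 are hypothetical; the per-roll values $\mu = 3.5$ and $\sigma = 1.71$ come from the die-roll example.

```python
from statistics import NormalDist

def sum_distribution(n, mu, sigma):
    """Approximate distribution of S_n = X_1 + ... + X_n: N(n*mu, n*sigma^2)."""
    # NormalDist takes a standard deviation, so pass sigma * sqrt(n).
    return NormalDist(n * mu, sigma * n ** 0.5)

# ASSUMPTION: "total of 40 die rolls exceeds 150" is a made-up question.
total = sum_distribution(n=40, mu=3.5, sigma=1.71)
p_total_over_150 = 1.0 - total.cdf(150)
print(f"mean = {total.mean:.1f}, sd = {total.stdev:.2f}, "
      f"P(S_n > 150) = {p_total_over_150:.4f}")
```

Because $S_n = n\bar{x}$, asking whether the total exceeds 150 is the same as asking whether $\bar{x}$ exceeds 3.75, and both forms give the same probability.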

Relationship to Other Concepts

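| Concept | CLT Connection |
|---|---|
| Standard Error | $SE = \sigma/\sqrt{n}$ is the standard deviation of the normal approximation for $\bar{x}$ |
| Confidence Intervals | Justified by CLT |
| Z-tests | Work because of CLT |
| Sample Size | Larger $n$ improves the normal approximation |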

Common Misconceptions

| Misconception | Reality |
|---|---|
| "Population becomes normal" | No, only $\bar{x}$'s distribution does |
| "Works for any $n$" | Need $n$ large enough |
| "Individual values are normal" | No, only sample means are |
| "Exact normality" | It's an approximation |

Why It Matters

  1. Enables inference: Most statistical procedures rely on CLT
  2. Practical flexibility: Don't need to know population distribution
  3. Universal application: Works for many types of data
  4. Quality control: Basis for control charts
  5. Polling/surveys: Justifies sample-based conclusions