Overview
The Central Limit Theorem (CLT) is one of the most important results in statistics. It states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population's distribution, provided the population has finite variance.
Statement
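For a population with mean $\mu$ and finite variance $\sigma^2$, the sampling distribution of $\bar{x}$ for large $n$ is approximately:
$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
Or equivalently, the standardized mean converges in distribution to a standard normal:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$$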
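The statement can be checked empirically. The following is a minimal sketch, assuming an Exponential(1) population (mean 1, standard deviation 1) and illustrative choices of sample size, repetition count, and seed; none of these values come from the text above. It standardizes many sample means and reports how closely they behave like a standard normal.

```python
import math
import random
import statistics


def standardized_means(n, reps, rng):
    """Draw `reps` samples of size n from Exponential(1) and standardize each mean."""
    mu, sigma = 1.0, 1.0  # Exponential(1) has mean 1 and standard deviation 1
    se = sigma / math.sqrt(n)
    return [
        (statistics.fmean(rng.expovariate(1.0) for _ in range(n)) - mu) / se
        for _ in range(reps)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable (assumed value)
z_values = standardized_means(n=50, reps=4000, rng=rng)

# If the normal approximation is good, roughly 95% of z-values lie in [-1.96, 1.96].
coverage = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)
print(f"mean of z: {statistics.fmean(z_values):.3f}")
print(f"sd of z:   {statistics.stdev(z_values):.3f}")
print(f"share within +/-1.96: {coverage:.3f}")
```

A mean near 0, a standard deviation near 1, and a share near 0.95 are what the theorem leads one to expect; small departures reflect both simulation noise and the residual skewness at this sample size.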
Conditions
- Independence: Observations are independent
- Sample size: $n$ is "large enough" (typically $n \geq 30$)
- Finite variance: Population has finite mean and variance
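To illustrate why the finite-variance condition matters, here is a hedged sketch that compares sample means from an exponential population (finite variance) with sample means from a standard Cauchy population (no finite mean or variance). The helper names, sample sizes, repetition count, and seed are assumptions made for this illustration.

```python
import math
import random
import statistics


def iqr_of_sample_means(draw, n, reps, rng):
    """Interquartile range of `reps` sample means, each computed from n draws."""
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


def draw_exponential(rng):
    return rng.expovariate(1.0)  # finite variance, so the CLT applies


def draw_cauchy(rng):
    # Standard Cauchy via the inverse CDF; it has no finite mean or variance.
    return math.tan(math.pi * (rng.random() - 0.5))


rng = random.Random(1)  # assumed seed for repeatability
results = {}
for n in (10, 100, 1000):
    exp_iqr = iqr_of_sample_means(draw_exponential, n, 500, rng)
    cauchy_iqr = iqr_of_sample_means(draw_cauchy, n, 500, rng)
    results[n] = (exp_iqr, cauchy_iqr)
    print(f"n={n:5d}  exponential IQR={exp_iqr:.3f}  Cauchy IQR={cauchy_iqr:.3f}")
```

The spread of the exponential means shrinks as $n$ grows, while the spread of the Cauchy means does not, because the mean of $n$ standard Cauchy draws is itself standard Cauchy.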
How Large is "Large Enough"?
| Population Shape | Recommended $n$ |
|---|---|
| Normal | Any |
| Symmetric | |
| Moderate skewness | |
| Highly skewed | or more |
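One way to see why more skewed populations call for larger samples: for independent, identically distributed draws, the skewness of the sample mean equals the population skewness divided by $\sqrt{n}$. The sketch below tabulates this for an exponential population (skewness 2); the function name and the listed sample sizes are illustrative assumptions.

```python
import math


def skewness_of_sample_mean(population_skewness, n):
    """Skewness of the mean of n i.i.d. draws: population skewness / sqrt(n)."""
    return population_skewness / math.sqrt(n)


# Exponential populations have skewness 2, a strongly right-skewed case.
for n in (5, 15, 30, 50, 100):
    value = skewness_of_sample_mean(2.0, n)
    print(f"n={n:3d}  skewness of sample mean = {value:.3f}")
```

A symmetric population starts at skewness 0, so only the remaining shape differences have to wash out, which is consistent with symmetric populations needing smaller samples.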
Visual Demonstration
Diagram: starting from a population of any shape, the CLT says that for large $n$ the sampling distribution of $\bar{x}$ becomes approximately normal (a bell curve).
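The same picture can be produced numerically. This is a rough sketch, assuming an Exponential(1) population, samples of size 30, and a hypothetical `histogram_counts` helper; it prints a text histogram of the sample means, which should look roughly bell-shaped even though the population is strongly right-skewed.

```python
import random
import statistics


def histogram_counts(values, low, high, bins):
    """Count values per equal-width bin on [low, high); values outside are skipped."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v < high:
            index = min(int((v - low) / width), bins - 1)  # guard against rounding
            counts[index] += 1
    return counts


rng = random.Random(2)  # assumed seed for repeatability
n, reps = 30, 5000      # assumed sample size and number of samples
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

counts = histogram_counts(means, low=0.4, high=1.6, bins=12)
for i, c in enumerate(counts):
    left = 0.4 + i * 0.1
    print(f"{left:4.1f} | " + "#" * (c // 25))
```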
Key Points
- Works for any shape: Population can be skewed, uniform, bimodal, etc.
- Larger $n$ → better approximation: A larger sample size brings the distribution of $\bar{x}$ closer to normal
- Focus on $\bar{x}$, not $x$: Individual observations keep the population distribution
- Foundation of inference: Enables z-tests and confidence intervals
Applications
Confidence Intervals
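When $n$ is large:
$$\bar{x} \pm z^{*} \frac{\sigma}{\sqrt{n}}$$
where $z^{*}$ is the standard normal critical value for the chosen confidence level (the sample standard deviation $s$ is substituted when $\sigma$ is unknown).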
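As a small illustration of the large-sample interval, the sketch below computes $\bar{x} \pm z^{*} \cdot \text{sd}/\sqrt{n}$ using the standard library's `statistics.NormalDist`. The function name `z_confidence_interval` and the input numbers are hypothetical, chosen only to show the arithmetic.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Large-sample interval: sample_mean +/- z* * sd / sqrt(n)."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    margin = z_star * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical inputs, not taken from the text.
low, high = z_confidence_interval(sample_mean=52.0, sd=10.0, n=100)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The interval is only as trustworthy as the normal approximation behind it, so the sample-size guidance above still applies.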
Hypothesis Testing
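Test statistic (large $n$):
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
Under the null hypothesis $\mu = \mu_0$, this statistic approximately follows $N(0, 1)$ by the CLT.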
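A minimal sketch of the corresponding z-test follows, assuming a known (or well-estimated) standard deviation and a two-sided alternative. The function name `z_test` and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, sd, n):
    """Return (z, two-sided p-value) for H0: mu = mu0 under the normal approximation."""
    z = (sample_mean - mu0) / (sd / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical inputs, not taken from the text.
z, p = z_test(sample_mean=52.0, mu0=50.0, sd=10.0, n=100)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```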
Examples
Example 1: Non-Normal Population
Die rolls (discrete uniform distribution, $\mu = 3.5$, $\sigma \approx 1.71$):
$$\bar{x} \sim N\left(3.5, \frac{1.71^2}{n}\right)$$
For die rolls, find :
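A calculation of this kind might look like the sketch below. It derives $\mu$ and $\sigma$ for a fair six-sided die, then applies the normal approximation to the mean of the rolls. The number of rolls (30) and the cutoff (4.0) are illustrative assumptions, not values taken from the example.

```python
import math
from statistics import NormalDist

faces = [1, 2, 3, 4, 5, 6]
mu = sum(faces) / len(faces)  # population mean of a fair die
sigma = math.sqrt(sum((f - mu) ** 2 for f in faces) / len(faces))  # population sd

n = 30           # assumed number of rolls
threshold = 4.0  # assumed cutoff for the sample mean

# CLT approximation: the mean of n rolls is roughly N(mu, sigma^2 / n).
sampling_dist = NormalDist(mu, sigma / math.sqrt(n))
p_above = 1 - sampling_dist.cdf(threshold)

print(f"mu = {mu}, sigma = {sigma:.4f}")
print(f"P(mean of {n} rolls > {threshold}) is approximately {p_above:.4f}")
```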
Example 2: Skewed Population (Incomes)
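Income: $\mu = \$60{,}000$, $\sigma = \$25{,}000$ (right-skewed)
For a sample of size $n$, find $P(\bar{x} < \$55{,}000)$:
$$P(\bar{x} < 55{,}000) \approx P\left(Z < \frac{55{,}000 - 60{,}000}{25{,}000/\sqrt{n}}\right) = \Phi\left(-0.2\sqrt{n}\right)$$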
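The probability depends on the sample size through the standard error. The sketch below evaluates it for a few sample sizes using the population mean of 60,000 and standard deviation of 25,000 from this example; the sample sizes themselves and the function name `prob_mean_below` are assumptions for illustration.

```python
import math
from statistics import NormalDist

MU, SIGMA = 60_000, 25_000  # population mean and sd from the income example


def prob_mean_below(cutoff, n):
    """Normal-approximation probability that the mean of n incomes is below cutoff."""
    z = (cutoff - MU) / (SIGMA / math.sqrt(n))
    return NormalDist().cdf(z)


for n in (30, 50, 100):  # assumed sample sizes
    print(f"n={n:3d}  P(mean < 55,000) is approximately {prob_mean_below(55_000, n):.4f}")
```

Because incomes are right-skewed, the approximation is likely to be rougher at the smaller sample sizes than at the larger ones.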
Example 3: Sum of Random Variables
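By the CLT, for large $n$, the sum $S_n = X_1 + \cdots + X_n$ of independent, identically distributed variables with mean $\mu$ and variance $\sigma^2$ is approximately:
$$S_n \sim N\left(n\mu, n\sigma^2\right)$$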
This applies to totals such as insurance claims or total sales.
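As a hedged illustration of the sum version, the sketch below approximates the total of many independent claims with a normal distribution whose mean is $n\mu$ and whose standard deviation is $\sigma\sqrt{n}$. The portfolio numbers and the function name `sum_distribution` are hypothetical.

```python
import math
from statistics import NormalDist


def sum_distribution(mu, sigma, n):
    """Approximate distribution of the sum of n i.i.d. terms: N(n*mu, n*sigma^2)."""
    return NormalDist(n * mu, sigma * math.sqrt(n))


# Hypothetical portfolio: 400 independent claims, each with mean 1,000 and sd 500.
total = sum_distribution(mu=1_000, sigma=500, n=400)
p_exceed = 1 - total.cdf(420_000)

print(f"total claims: mean {total.mean:.0f}, sd {total.stdev:.0f}")
print(f"P(total > 420,000) is approximately {p_exceed:.4f}")
```

Real claim amounts are often heavy-tailed, so a sketch like this is best read as a first approximation rather than a pricing tool.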
Relationship to Other Concepts
| Concept | CLT Connection |
|---|---|
| Standard Error | $\sigma/\sqrt{n}$ is the standard deviation of the CLT's normal approximation |
| Confidence Intervals | Justified by CLT |
| Z-tests | Work because of CLT |
| Sample Size | Larger $n$ means the CLT approximation is better |
Common Misconceptions
| Misconception | Reality |
|---|---|
| "Population becomes normal" | No, only $\bar{x}$'s distribution does |
| "Works for any $n$" | Need $n$ large enough |
| "Individual values are normal" | No, only sample means are |
| "Exact normality" | It's an approximation |
Why It Matters
- Enables inference: Many standard procedures, such as z-tests and confidence intervals for means, rely on the CLT
- Practical flexibility: Don't need to know population distribution
- Universal application: Works for many types of data
- Quality control: Basis for control charts
- Polling/surveys: Justifies sample-based conclusions