
Sampling Distributions

Distribution of sample statistics: sampling variability and the sampling distribution of the mean.

Overview

A sampling distribution is the probability distribution of a statistic (like the sample mean) calculated from all possible samples of a given size from a population.

Key Concepts

| Term | Definition |
|---|---|
| Population | The entire group of interest |
| Sample | A subset of the population |
| Parameter | A numerical characteristic of a population ($\mu$, $\sigma$) |
| Statistic | A numerical characteristic of a sample ($\bar{x}$, $s$) |
| Sampling distribution | Distribution of a statistic over all possible samples |

Sampling Distribution of the Mean

If we take all possible samples of size $n$ from a population and calculate $\bar{x}$ for each:

Mean of $\bar{x}$

$$\mu_{\bar{x}} = \mu$$

The mean of sample means equals the population mean.

Standard Error of the Mean

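$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

The standard deviation of the sample means, called the standard error, decreases as $n$ increases. The formula assumes independent observations (sampling with replacement, or from a population much larger than $n$).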
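The shrinkage can be checked by simulation. The sketch below is a minimal Monte Carlo check, assuming a normal population with hypothetical parameters $\mu = 100$ and $\sigma = 20$ and independent draws; `simulate_se` is an illustrative helper, not a standard function. It draws many samples of each size and compares the spread of their means with $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics

def simulate_se(mu, sigma, n, reps=10_000, seed=0):
    """Standard deviation of x̄ across `reps` simulated samples of size n."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.pstdev(means)

MU, SIGMA = 100, 20  # hypothetical normal population
for n in (4, 16, 64):
    print(f"n={n:3d}  simulated SE={simulate_se(MU, SIGMA, n):6.3f}  "
          f"sigma/sqrt(n)={SIGMA / math.sqrt(n):6.3f}")
```

The simulated values should land close to 10, 5 and 2.5: each fourfold increase in $n$ roughly halves the spread of the sample means.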

Standard Error

The standard error (SE) is the standard deviation of a statistic's sampling distribution, so it measures sampling variability. For the sample mean:

$$SE = \frac{\sigma}{\sqrt{n}} \quad \text{(if } \sigma \text{ known)}$$

$$SE = \frac{s}{\sqrt{n}} \quad \text{(if } \sigma \text{ estimated by } s \text{)}$$
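Both versions of the formula translate directly into code. In this sketch the function names are hypothetical; `statistics.stdev` from Python's standard library computes the sample standard deviation $s$ with the $n - 1$ divisor, and the five-value sample is made-up data.

```python
import math
import statistics

def se_known_sigma(sigma, n):
    """SE = sigma / sqrt(n) when the population SD is known."""
    return sigma / math.sqrt(n)

def se_estimated(sample):
    """SE = s / sqrt(n), with s the sample SD (n - 1 divisor)."""
    if len(sample) < 2:
        raise ValueError("need at least two observations to estimate s")
    return statistics.stdev(sample) / math.sqrt(len(sample))

print(se_known_sigma(20, 25))              # 4.0
print(se_estimated([12, 15, 11, 14, 13]))  # s = sqrt(2.5), so SE = sqrt(0.5) ≈ 0.7071
```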

Interpretation

Smaller SE means:

  • Sample means cluster more tightly around $\mu$
  • Estimates of $\mu$ are more precise

Larger samples give smaller SE, because $n$ appears under the square root in the denominator.

Properties

| Property | Formula |
|---|---|
| Mean of $\bar{x}$ | $\mu$ |
| Variance of $\bar{x}$ | $\sigma^2 / n$ |
| Standard error | $\sigma / \sqrt{n}$ |
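For a small population, these properties can be verified exactly by listing every possible sample. This sketch assumes sampling with replacement, so all $N^n$ ordered samples are equally likely and draws are independent (without replacement, the variance picks up an extra finite-population factor). The four-value population is hypothetical; `Fraction` keeps the arithmetic exact.

```python
import itertools
from fractions import Fraction

population = [2, 4, 6, 8]  # hypothetical tiny population
n = 2                      # sample size

N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N  # population variance (divide by N)

# Every ordered sample of size n drawn with replacement is equally likely.
means = [Fraction(sum(s), n) for s in itertools.product(population, repeat=n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

print(mean_of_means == mu)         # True
print(var_of_means == sigma2 / n)  # True
```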

Effect of Sample Size

| $n$ | SE ($\sigma/\sqrt{n}$) |
|---|---|
| 1 | $\sigma$ |
| 4 | $\sigma/2$ |
| 9 | $\sigma/3$ |
| 25 | $\sigma/5$ |
| 100 | $\sigma/10$ |

Quadrupling $n$ cuts the SE in half, since SE is proportional to $1/\sqrt{n}$ and $\sqrt{4} = 2$.
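The table comes from evaluating $1/\sqrt{n}$, the SE expressed as a multiple of $\sigma$. A short sketch; `se_ratio` is a hypothetical helper name.

```python
import math

def se_ratio(n):
    """SE as a multiple of sigma: (sigma / sqrt(n)) / sigma = 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

for n in (1, 4, 9, 25, 100):
    print(f"n = {n:3d}   SE = sigma/{math.sqrt(n):g} = {se_ratio(n):.4f} * sigma")

# Quadrupling n halves the SE for any starting n (here n = 7).
print(se_ratio(4 * 7) / se_ratio(7))  # ≈ 0.5
```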

Other Sampling Distributions

Sample Proportion

For the sample proportion $\hat{p}$ from samples of size $n$:

$$E(\hat{p}) = p$$

$$SE = \sqrt{\frac{p(1-p)}{n}}$$
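A simulation can confirm both formulas. This sketch assumes each of the $n$ observations is an independent success/failure trial with probability $p$ (sampling with replacement, or from a large population); the values $p = 0.3$ and $n = 50$ and the helper name `simulate_phat` are hypothetical.

```python
import math
import random
import statistics

def simulate_phat(p, n, reps=20_000, seed=1):
    """Sample proportions from `reps` simulated samples of n independent trials."""
    rng = random.Random(seed)
    # rng.random() < p is True with probability p; summing counts the successes.
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.3, 50  # hypothetical true proportion and sample size
phats = simulate_phat(p, n)
print("mean of p-hat:", round(statistics.fmean(phats), 4), " vs p =", p)
print("SD of p-hat:  ", round(statistics.pstdev(phats), 4),
      " vs sqrt(p(1-p)/n) =", round(math.sqrt(p * (1 - p) / n), 4))
```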

Difference of Means

For $\bar{x}_1 - \bar{x}_2$ from independent samples:

$$\text{Mean: } \mu_1 - \mu_2$$

$$SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
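The two variances add because the samples are independent. This sketch computes the SE and then checks it against simulated differences, assuming two normal populations with hypothetical parameters ($\mu_1 = 50$, $\sigma_1 = 10$, $n_1 = 25$; $\mu_2 = 45$, $\sigma_2 = 6$, $n_2 = 36$); `se_diff` is an illustrative name.

```python
import math
import random
import statistics

def se_diff(sigma1, n1, sigma2, n2):
    """SE of x̄1 - x̄2 for independent samples with known sigmas."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

print(se_diff(10, 25, 6, 36))  # sqrt(100/25 + 36/36) = sqrt(5) ≈ 2.236

# Monte Carlo check: simulate many pairs of independent samples.
rng = random.Random(2)
diffs = [
    statistics.fmean(rng.gauss(50, 10) for _ in range(25))
    - statistics.fmean(rng.gauss(45, 6) for _ in range(36))
    for _ in range(10_000)
]
print(statistics.fmean(diffs), statistics.pstdev(diffs))  # near 5 and 2.236
```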

Sampling Variability

Population
    ↓
┌─────────────────────────────┐
│  Sample 1 → x̄₁             │
│  Sample 2 → x̄₂             │  → Sampling Distribution
│  Sample 3 → x̄₃             │     of x̄
│  ...                         │
│  Sample k → x̄ₖ             │
└─────────────────────────────┘

Each sample yields a different $\bar{x}$; the sampling distribution describes how these values vary from sample to sample.
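The process in the diagram can be mimicked directly: fix one population, draw several samples, and record each $\bar{x}$. In this sketch the population of 10,000 values is simulated from a normal distribution purely for illustration, and each sample is drawn without replacement with `random.Random.sample`; $k = 5$ and $n = 25$ are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)
population = [rng.gauss(100, 20) for _ in range(10_000)]  # hypothetical population

k, n = 5, 25  # number of samples, size of each sample
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(k)]

for i, xbar in enumerate(sample_means, start=1):
    print(f"Sample {i}: x̄ = {xbar:.2f}")
print(f"Population mean μ = {statistics.fmean(population):.2f}")
```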

Examples

Example 1: Standard Error

Population: $\mu = 100$, $\sigma = 20$

Sample size $n = 25$:

$$SE = \frac{20}{\sqrt{25}} = \frac{20}{5} = 4$$

Sample size $n = 100$:

$$SE = \frac{20}{\sqrt{100}} = \frac{20}{10} = 2$$

Example 2: Probability Using SE

Population: $\mu = 500$, $\sigma = 100$; sample size $n = 64$

Find $P(\bar{x} > 520)$.

$$SE = \frac{100}{\sqrt{64}} = 12.5$$

$$Z = \frac{520 - 500}{12.5} = 1.6$$

$$P(Z > 1.6) = 1 - 0.9452 = 0.0548$$
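Python's standard-library `statistics.NormalDist` (Python 3.8+) can stand in for the Z-table lookup. A minimal sketch reproducing this example:

```python
from statistics import NormalDist

mu, sigma, n = 500, 100, 64
se = sigma / n ** 0.5              # 100 / 8 = 12.5
z = (520 - mu) / se                # 1.6
p_upper = 1 - NormalDist().cdf(z)  # P(Z > 1.6) from the standard normal CDF
print(se, z, round(p_upper, 4))    # 12.5 1.6 0.0548
```

The exact CDF gives about 0.0548, matching the table-based answer. Equivalently, `1 - NormalDist(mu, se).cdf(520)` skips the explicit standardization step.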

Example 3: Required Sample Size

To cut the SE in half from its current value, you need:

$$n_{\text{new}} = 4 \times n_{\text{current}}$$

To reduce the SE from 10 to 5 when $\sigma = 50$:

$$\text{Current: } 10 = \frac{50}{\sqrt{n}} \Rightarrow n = 25$$

$$\text{Target: } 5 = \frac{50}{\sqrt{n_{\text{new}}}} \Rightarrow n_{\text{new}} = 100$$
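Solving $SE = \sigma/\sqrt{n}$ for $n$ gives $n = (\sigma / SE)^2$, rounded up to a whole number of observations. A sketch with a hypothetical helper `required_n`, checked against this example:

```python
import math

def required_n(sigma, target_se):
    """Smallest whole n with sigma / sqrt(n) <= target_se: ceil((sigma / target_se)^2)."""
    return math.ceil((sigma / target_se) ** 2)

print(required_n(50, 10))  # 25
print(required_n(50, 5))   # 100
```

Rounding up guarantees the target SE is met rather than just missed. With floating-point inputs, a ratio that should square to an exact integer can land slightly above it, so rounding to a few decimals before `ceil` may be worthwhile in practice.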

Importance

  1. Inference foundation: Understanding sampling variability enables hypothesis testing and confidence intervals
  2. Precision planning: Calculate required sample sizes
  3. Estimating parameters: Quantify uncertainty in estimates