Type I and Type II Errors

False positives and negatives: significance level α, power, and the tradeoff between errors.

Overview

In hypothesis testing, we can make two types of errors when drawing conclusions about a population based on sample data.

Error Types

| Decision | $H_0$ True | $H_0$ False |
|---|---|---|
| Reject $H_0$ | Type I Error ($\alpha$) | Correct Decision (Power) |
| Fail to Reject $H_0$ | Correct Decision | Type II Error ($\beta$) |

Type I Error ($\alpha$)

Definition

Rejecting $H_0$ when it is actually true.

Also Called

  • False positive
  • False alarm
  • $\alpha$ error

Probability

$$P(\text{Type I Error}) = \alpha = \text{significance level}$$

Example

Concluding a drug works when it actually doesn't.
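
To make the link between the significance level and the false positive rate concrete, here is a minimal simulation sketch. It assumes a right-tailed one-sample z-test with known standard deviation; the function name `type_i_error_rate`, the sample size, and the number of trials are illustrative choices, not part of the text above.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=30, trials=20_000, seed=0):
    """Estimate how often a right-tailed z-test rejects a true H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # critical value for the z statistic
    rejections = 0
    for _ in range(trials):
        # Data are generated with H0 true: mean 0, known sigma 1 (assumed values).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # (sample mean - 0) / (1 / sqrt(n))
        if z > z_crit:
            rejections += 1  # a false positive, since H0 is true by construction
    return rejections / trials

rate = type_i_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

Because every sample is drawn with $H_0$ true, each rejection is a Type I error, and the estimated rate should land close to the chosen $\alpha$.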

Type II Error ($\beta$)

Definition

Failing to reject $H_0$ when it is actually false.

Also Called

  • False negative
  • Missed detection
  • $\beta$ error

Probability

$$P(\text{Type II Error}) = \beta$$

Example

Concluding a drug doesn't work when it actually does.

Power

Definition

The probability of correctly rejecting a false $H_0$.

$$\text{Power} = 1 - \beta = P(\text{Reject } H_0 \mid H_0 \text{ is false})$$

Desirable Values

  • Power $\geq 0.80$ is a common standard
  • Higher power = better ability to detect effects
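
For one simple case, power can be computed in closed form. The sketch below assumes a right-tailed one-sample z-test with known standard deviation and a positive true shift in the mean; the function name `z_test_power` and the input values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a right-tailed one-sample z-test with known sigma.

    delta is the true shift in the mean under H1 (assumed positive).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold for z
    shift = delta * sqrt(n) / sigma          # mean of the z statistic under H1
    beta = std_normal.cdf(z_crit - shift)    # P(fail to reject | H1 true)
    return 1 - beta

power = z_test_power(delta=0.5, sigma=1.0, n=25)
print(f"power = {power:.3f}")
```

With these illustrative inputs the result sits just above the conventional 0.80 threshold.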

Relationships

$$\alpha \downarrow \Rightarrow \beta \uparrow \quad \text{(for fixed } n \text{)}$$

$$n \uparrow \Rightarrow \beta \downarrow \quad \text{(} \alpha \text{ constant)}$$

$$\text{Effect size} \uparrow \Rightarrow \beta \downarrow$$
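
One way to see why these relationships hold is to write $\beta$ out for a specific test. The expression below assumes a right-tailed one-sample z-test with known $\sigma$ and a true mean shift $\delta > 0$. The symbols $\delta$, $\Phi$ (the standard normal CDF), and $z_{1-\alpha}$ (the upper critical value) are introduced here for illustration and are not part of the original notation.

```latex
\beta = \Phi\!\left( z_{1-\alpha} - \frac{\delta \sqrt{n}}{\sigma} \right)
```

Lowering $\alpha$ raises $z_{1-\alpha}$, which increases the argument of $\Phi$ and therefore $\beta$. Increasing $n$ or $\delta$ enlarges the subtracted term, which decreases $\beta$.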

Visual Representation

       H₀ true           H₁ true
       distribution      distribution
          ↓                   ↓
       ╭───╮              ╭───╮
      ╱     ╲            ╱     ╲
     ╱       ╲          ╱       ╲
────┴─────────┴────────┴─────────┴────
              │                 │
              │    Rejection    │
              │←── Region ─────→│
              Critical Value

For a right-tailed test, the rejection region is everything to the right of the critical value.

Area under H₀ curve in rejection region = α
Area under H₁ curve NOT in rejection region = β
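
The two areas in the picture can be computed directly once the two sampling distributions are specified. This is a sketch under assumed values: the means, standard deviation, and sample size below are hypothetical, and the test is taken to be right-tailed on the sample mean.

```python
from math import sqrt
from statistics import NormalDist

# Illustrative values (assumptions, not taken from the text above).
mu0, mu1, sigma, n, alpha = 100.0, 103.0, 10.0, 50, 0.05

se = sigma / sqrt(n)            # standard error of the sample mean
h0_dist = NormalDist(mu0, se)   # sampling distribution of the mean if H0 is true
h1_dist = NormalDist(mu1, se)   # sampling distribution of the mean if H1 is true

critical_value = h0_dist.inv_cdf(1 - alpha)    # reject H0 above this value
alpha_area = 1 - h0_dist.cdf(critical_value)   # H0 area inside the rejection region
beta_area = h1_dist.cdf(critical_value)        # H1 area outside the rejection region

print(f"critical value: {critical_value:.2f}")
print(f"alpha: {alpha_area:.3f}, beta: {beta_area:.3f}, power: {1 - beta_area:.3f}")
```

With these numbers the critical value is about 102.33 and $\beta$ is roughly 0.32, so the two curves overlap substantially and the test misses a real effect about a third of the time.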

Tradeoff

| Choice | Effect |
|---|---|
| Lower $\alpha$ | Higher $\beta$ (less power) |
| Higher $\alpha$ | Lower $\beta$ (more power) |
| Larger $n$ | Lower $\beta$ (keeping $\alpha$ the same) |

Factors Affecting Power

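| Factor | Effect on Power |
|---|---|
| Larger sample size ($n$) | ↑ Increases |
| Larger effect size | ↑ Increases |
| Lower variance ($\sigma^2$) | ↑ Increases |
| Higher $\alpha$ | ↑ Increases |
| One-tailed vs. two-tailed | One-tailed has more power when the true effect lies in the hypothesized direction |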
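
The one-tailed versus two-tailed comparison can be checked numerically. The sketch below assumes one-sample z-tests with known standard deviation; the function names and input values are hypothetical. It also evaluates the one-tailed test when the true effect points the opposite way, which is the case where the one-tailed advantage disappears.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_one_tailed(delta, sigma, n, alpha=0.05):
    """Right-tailed z-test power when the true mean shift is delta."""
    shift = delta * sqrt(n) / sigma
    return 1 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1 - alpha) - shift)

def power_two_tailed(delta, sigma, n, alpha=0.05):
    """Two-tailed z-test power: alpha is split across both tails."""
    shift = delta * sqrt(n) / sigma
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper = 1 - STD_NORMAL.cdf(z_crit - shift)  # rejections in the upper tail
    lower = STD_NORMAL.cdf(-z_crit - shift)     # rejections in the lower tail
    return upper + lower

one = power_one_tailed(0.5, 1.0, 25)
two = power_two_tailed(0.5, 1.0, 25)
wrong_direction = power_one_tailed(-0.5, 1.0, 25)  # effect opposite to H1

print(f"one-tailed: {one:.3f}, two-tailed: {two:.3f}")
print(f"one-tailed, effect in the opposite direction: {wrong_direction:.5f}")
```

The one-tailed test is more powerful here only because the assumed effect matches the direction it tests; against an effect in the other direction its power is close to zero.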

Examples

Example 1: Courtroom Analogy

$H_0$: Defendant is innocent

  • Type I Error: Convicting an innocent person ($\alpha$)
  • Type II Error: Acquitting a guilty person ($\beta$)

The justice system sets $\alpha$ very low ("beyond reasonable doubt").

Example 2: Medical Screening

$H_0$: Patient does not have disease

  • Type I Error: False positive (unnecessary treatment)
  • Type II Error: False negative (missed diagnosis)

Which error is worse depends on the disease and treatment.

Example 3: Quality Control

$H_0$: Product meets specifications

  • Type I Error: Rejecting good products (waste)
  • Type II Error: Accepting bad products (customer complaints)

Controlling Errors

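To Reduce $\alpha$

  • Choose a stricter (smaller) significance level before running the test, for example 0.01 instead of 0.05
  • Tradeoff: for a fixed sample size, this increases $\beta$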

To Reduce $\beta$ (Increase Power)

  • Increase sample size
  • Increase $\alpha$ (if acceptable)
  • Reduce measurement error
  • Target a larger effect size where the study design allows, since larger effects are easier to detect

Power Analysis

Before conducting a study, solve for the sample size as a function of the design choices:

$$n = f(\alpha, \text{power}, \text{effect size}, \sigma)$$

This gives the sample size required to detect a meaningful effect at the chosen $\alpha$ and power.
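
As one concrete instance of such a function, the sketch below solves for $n$ in a right-tailed one-sample z-test with known standard deviation. The helper name `required_sample_size` and the inputs are hypothetical; other tests (t-tests, two-sample designs, two-tailed alternatives) need different formulas.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n for a right-tailed one-sample z-test to reach the target power.

    Assumes known sigma and a true mean shift of delta > 0.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # critical value for the test
    z_power = std_normal.inv_cdf(power)      # quantile matching the target power
    return ceil(((z_alpha + z_power) * sigma / delta) ** 2)

n_needed = required_sample_size(delta=0.5, sigma=1.0)
print(n_needed)  # 25
```

Halving the effect size to 0.25 roughly quadruples the requirement (99 under the same assumptions), which is why the smallest effect worth detecting should be fixed before data collection.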

Practical Significance vs Statistical Significance

  • Statistical significance: $p \leq \alpha$
  • Practical significance: Effect is large enough to matter

A very large sample can detect statistically significant but practically unimportant effects.
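
A short numerical sketch shows how this happens. It assumes a right-tailed one-sample z-test with known standard deviation and a fixed, hypothetical observed difference of 0.01 standard deviations; the function name and the sample sizes are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_p_value(observed_diff, sigma, n):
    """p-value of a right-tailed z-test for an observed mean difference."""
    z = observed_diff * sqrt(n) / sigma
    return 1 - NormalDist().cdf(z)

# The same tiny observed difference (0.01 sigma) at two sample sizes.
p_small_n = right_tailed_p_value(0.01, 1.0, n=100)
p_large_n = right_tailed_p_value(0.01, 1.0, n=100_000)

print(f"n = 100:     p = {p_small_n:.4f}")
print(f"n = 100,000: p = {p_large_n:.5f}")
```

The effect is identical in both cases, yet the p-value falls from well above 0.05 to below 0.001 purely because of the larger sample. Whether a difference of 0.01 standard deviations matters is a separate, subject-matter question.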