## Overview
Bayes' Theorem allows us to update our beliefs about the probability of an event based on new evidence. It's the foundation of Bayesian statistics and has wide applications in medicine, machine learning, and decision-making.
## The Theorem

P(A∣B) = [P(B∣A) × P(A)] / P(B)

Or more explicitly, expanding the marginal P(B):

P(A∣B) = [P(B∣A) × P(A)] / [P(B∣A) × P(A) + P(B∣A′) × P(A′)]
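The explicit form translates directly into code. A minimal sketch in Python (the function name `bayes_posterior` is just for illustration, not from any library):

```python
def bayes_posterior(prior, likelihood, likelihood_complement):
    """Posterior P(A|B) via the explicit form of Bayes' Theorem.

    prior                 -- P(A)
    likelihood            -- P(B|A)
    likelihood_complement -- P(B|A')
    """
    # Marginal P(B) = P(B|A) * P(A) + P(B|A') * P(A')
    marginal = likelihood * prior + likelihood_complement * (1 - prior)
    return likelihood * prior / marginal

# With a 50% prior, evidence 4x as likely under A as under A' yields an 80% posterior.
print(round(bayes_posterior(0.5, 0.8, 0.2), 3))  # 0.8
```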
## Terminology
| Term | Symbol | Description |
|---|---|---|
| Prior | P(A) | Initial probability before evidence |
| Likelihood | P(B∣A) | Probability of evidence given A |
| Posterior | P(A∣B) | Updated probability after evidence |
| Marginal | P(B) | Total probability of evidence |
## Extended Form

For multiple hypotheses H1, H2, …, Hn:

P(Hi∣E) = [P(E∣Hi) × P(Hi)] / ∑j [P(E∣Hj) × P(Hj)]
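With more than two hypotheses, the same computation just normalizes the likelihood-weighted priors. A sketch in Python (the helper name `posteriors` is hypothetical):

```python
def posteriors(priors, likelihoods):
    """Posterior P(Hi|E) for each hypothesis Hi via the extended form."""
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(E|Hi) * P(Hi)
    evidence = sum(joint)                                 # sum over j of P(E|Hj) * P(Hj)
    return [j / evidence for j in joint]

# Three equally likely hypotheses; the evidence strongly favors the first.
post = posteriors([1/3, 1/3, 1/3], [0.9, 0.05, 0.05])
print([round(p, 2) for p in post])  # [0.9, 0.05, 0.05]
```

Because the priors are equal here, the posteriors are just the normalized likelihoods.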
## The Bayesian Approach

Posterior ∝ Likelihood × Prior

P(Hypothesis∣Data) ∝ P(Data∣Hypothesis) × P(Hypothesis)
## Key Insights
- Prior matters: Initial beliefs affect the posterior
- Evidence updates: Strong evidence shifts the posterior
- Rare events: Even with good tests, rare events often yield surprising results
- Sequential updating: Can apply Bayes repeatedly as new evidence arrives
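The sequential-updating point can be made concrete: today's posterior becomes tomorrow's prior. A sketch with a hypothetical `update` helper, using a 95%-sensitive test with a 10% false-positive rate and a 1% prior:

```python
def update(prior, likelihood, likelihood_complement):
    """One Bayesian update: returns the posterior, which can serve as the next prior."""
    marginal = likelihood * prior + likelihood_complement * (1 - prior)
    return likelihood * prior / marginal

# Two positive test results applied in sequence.
belief = 0.01                            # 1% prior
for _ in range(2):
    belief = update(belief, 0.95, 0.10)  # yesterday's posterior is today's prior
print(round(belief, 3))  # 0.477 -- one positive gives ~0.088; a second pushes it near 50%
```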
## Examples

### Example 1: Medical Diagnosis
A disease affects 1% of the population. A test has:
- 95% sensitivity: positive in 95% of cases where the disease is present
- 90% specificity: negative in 90% of cases where the disease is absent
If you test positive, what's the probability you have the disease?
- P(D) = 0.01 (prior: disease prevalence)
- P(D′) = 0.99 (no disease)
- P(+∣D) = 0.95 (true positive rate)
- P(+∣D′) = 0.10 (false positive rate: 1 − specificity)
P(D∣+) = [P(+∣D) × P(D)] / [P(+∣D) × P(D) + P(+∣D′) × P(D′)]

P(D∣+) = (0.95 × 0.01) / [(0.95 × 0.01) + (0.10 × 0.99)]

P(D∣+) = 0.0095 / (0.0095 + 0.099) = 0.0095 / 0.1085 ≈ 0.088, or 8.8%
Only an 8.8% chance of disease despite a positive test: because the disease is rare, false positives outnumber true positives.
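The arithmetic can be checked in a few lines of Python:

```python
prior = 0.01           # P(D): disease prevalence
sensitivity = 0.95     # P(+|D)
false_positive = 0.10  # P(+|D') = 1 - specificity

evidence = sensitivity * prior + false_positive * (1 - prior)  # P(+)
posterior = sensitivity * prior / evidence                     # P(D|+)
print(round(posterior, 3))  # 0.088
```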
### Example 2: Spam Filter
- P(spam) = 0.30 (30% of emails are spam)
- P("lottery" ∣ spam) = 0.15 (15% of spam contains "lottery")
- P("lottery" ∣ not spam) = 0.01 (1% of legitimate emails contain "lottery")
If an email contains "lottery," probability it's spam:
P(spam∣lottery) = (0.15 × 0.30) / [(0.15 × 0.30) + (0.01 × 0.70)]

= 0.045 / (0.045 + 0.007) = 0.045 / 0.052 ≈ 0.87, or 87%
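The same check for the spam example:

```python
p_spam = 0.30    # P(spam)
p_w_spam = 0.15  # P("lottery" | spam)
p_w_ham = 0.01   # P("lottery" | not spam)

p_word = p_w_spam * p_spam + p_w_ham * (1 - p_spam)  # P("lottery")
p_spam_word = p_w_spam * p_spam / p_word             # P(spam | "lottery")
print(round(p_spam_word, 2))  # 0.87
```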
### Example 3: Two Defective Machines

Machine A produces 60% of items, with a 2% defect rate.
Machine B produces 40% of items, with a 5% defect rate.
If an item is defective, probability it came from Machine B:
P(B∣def) = [P(def∣B) × P(B)] / [P(def∣A) × P(A) + P(def∣B) × P(B)]

= (0.05 × 0.40) / [(0.02 × 0.60) + (0.05 × 0.40)] = 0.02 / (0.012 + 0.02) = 0.02 / 0.032 = 0.625, or 62.5%
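And the machine example:

```python
p_a, p_b = 0.60, 0.40          # production shares P(A), P(B)
p_def_a, p_def_b = 0.02, 0.05  # defect rates P(def|A), P(def|B)

p_def = p_def_a * p_a + p_def_b * p_b  # overall defect probability P(def)
p_b_def = p_def_b * p_b / p_def        # P(B | defective)
print(round(p_b_def, 3))  # 0.625
```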
## Common Applications
| Field | Application |
|---|---|
| Medicine | Diagnostic testing |
| Email | Spam filtering |
| Legal | Evaluating evidence |
| Finance | Risk assessment |
| AI/ML | Naive Bayes classifiers |
| Science | Updating hypotheses |
## Base Rate Fallacy
A common error is ignoring the prior (base rate). Example:
If a rare disease test is 99% accurate but the disease affects only 0.1% of people, most positive tests are false positives. Always consider the base rate!
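The numbers above can be sketched directly (assuming "99% accurate" means 99% sensitivity and 99% specificity):

```python
prior = 0.001          # 0.1% prevalence
sens, spec = 0.99, 0.99

p_pos = sens * prior + (1 - spec) * (1 - prior)  # P(+)
p_d_pos = sens * prior / p_pos                   # P(disease | +)
print(round(p_d_pos, 2))  # 0.09 -- roughly 91% of positives are false positives
```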