Overview
Correlation measures the strength and direction of the linear relationship between two quantitative variables.
Pearson Correlation Coefficient (r)
Formula
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \times \sum (y_i - \bar{y})^2}}$$
Or equivalently:
$$r = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{[n\sum x_i^2 - (\sum x_i)^2][n\sum y_i^2 - (\sum y_i)^2]}}$$
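As a quick check that the two forms agree, here is a minimal sketch that computes r both ways. The function names and the five data points are made up for illustration and are not part of the notes.

```python
import math

def pearson_deviation(xs, ys):
    """Pearson r from deviations about the means (definitional form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pearson_sums(xs, ys):
    """Pearson r from raw sums (computational form)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_dev = pearson_deviation(xs, ys)
r_sum = pearson_sums(xs, ys)
print(round(r_dev, 4), round(r_sum, 4))
```

The deviation form is usually the safer choice numerically. The raw-sum form is handy for hand calculation but can lose precision when the values are large relative to their spread.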
Properties
- Range: −1≤r≤1
- r=1: Perfect positive linear relationship
- r=−1: Perfect negative linear relationship
- r=0: No linear relationship
Interpretation
| $\lvert r \rvert$ | Strength |
|---|---|
| 0.00 - 0.19 | Very weak |
| 0.20 - 0.39 | Weak |
| 0.40 - 0.59 | Moderate |
| 0.60 - 0.79 | Strong |
| 0.80 - 1.00 | Very strong |
Direction
| Sign of r | Direction | Interpretation |
|---|---|---|
| r>0 | Positive | As X increases, Y tends to increase |
| r<0 | Negative | As X increases, Y tends to decrease |
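The strength and direction tables can be applied mechanically. The sketch below uses a hypothetical helper, `describe_r`, and assumes the band boundaries fall at 0.20, 0.40, 0.60 and 0.80. The tables list values to two decimals only, so how to treat a value such as 0.195 is an assumption here.

```python
def describe_r(r):
    """Return (strength, direction) labels for a correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    size = abs(r)
    # Assumed cutoffs: each band starts at its lower bound in the table.
    if size < 0.20:
        strength = "very weak"
    elif size < 0.40:
        strength = "weak"
    elif size < 0.60:
        strength = "moderate"
    elif size < 0.80:
        strength = "strong"
    else:
        strength = "very strong"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_r(0.692))
print(describe_r(-0.15))
```

These labels are conventions, not fixed rules. What counts as a "strong" correlation varies by field.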
Spearman's Rank Correlation (ρ)
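For ordinal data, or for monotonic relationships that are not linear:
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
Where $d_i$ is the difference between the ranks of the $i$-th pair of values and $n$ is the number of pairs. This shortcut formula is exact only when there are no tied ranks.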
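A minimal sketch of the rank-difference formula, assuming no tied values. The helper names `ranks` and `spearman_rho` and the sample data are illustrative only.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)); no-ties case only."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical data: y grows monotonically with x, but not linearly.
xs = [10, 20, 30, 40, 50]
ys_cubic = [1, 8, 27, 64, 125]
ys_mixed = [3, 1, 4, 2, 5]

rho_cubic = spearman_rho(xs, ys_cubic)
rho_mixed = spearman_rho(xs, ys_mixed)
print(rho_cubic, rho_mixed)
```

Because only the ranks enter the formula, any strictly increasing relationship gives ρ = 1 even when Pearson's r is below 1. With tied values, a common approach is to assign average ranks and compute Pearson's r on the ranks.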
Assumptions (Pearson)
- Continuous data
- Linear relationship
- Bivariate normality (for inference)
- No significant outliers
Hypothesis Testing
Hypotheses
- H0: ρ=0 (no linear correlation in the population; here ρ is the population Pearson correlation, not Spearman's coefficient)
- H1: ρ≠0 (or one-tailed alternative)
Test Statistic
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
With df=n−2
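The test statistic is simple to compute directly. In this sketch the function name `correlation_t` and the inputs (r = 0.5, n = 27) are hypothetical. The critical value 2.060 is the two-tailed 5% value for 25 degrees of freedom from a standard t-table, entered by hand, not computed.

```python
import math

def correlation_t(r, n):
    """Return (t, df) for testing H0: rho = 0 given sample r and size n."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that df = n - 2 >= 1")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2

# Hypothetical inputs for illustration.
t, df = correlation_t(0.5, 27)

# Critical value looked up in a t-table (two-tailed, alpha = 0.05, df = 25).
t_crit = 2.060
reject_h0 = abs(t) > t_crit
print(round(t, 3), df, reject_h0)
```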
Examples
Example 1: Calculating r
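$$\sum x = 25,\quad \sum y = 35,\quad \sum xy = 193$$
$$\sum x^2 = 151,\quad \sum y^2 = 271,\quad n = 5$$
$$r = \frac{5(193) - (25)(35)}{\sqrt{[5(151) - 625][5(271) - 1225]}}$$
$$= \frac{965 - 875}{\sqrt{(130)(130)}} = \frac{90}{130} = 0.692$$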
Strong positive correlation.
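The arithmetic in Example 1 can be re-checked from the summary sums alone. This is a small sketch; the function name `pearson_from_sums` is an illustrative choice.

```python
import math

def pearson_from_sums(n, sx, sy, sxy, sx2, sy2):
    """Pearson r from n and the five summary sums (computational form)."""
    numerator = n * sxy - sx * sy
    x_term = n * sx2 - sx ** 2
    y_term = n * sy2 - sy ** 2
    return numerator, x_term, y_term, numerator / math.sqrt(x_term * y_term)

# Summary sums taken from Example 1.
numerator, x_term, y_term, r = pearson_from_sums(
    n=5, sx=25, sy=35, sxy=193, sx2=151, sy2=271
)
print(numerator, x_term, y_term, round(r, 3))
```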
Example 2: Testing Significance
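r=0.65, n=20, α=0.05 (two-tailed)
$$t = \frac{0.65\sqrt{20-2}}{\sqrt{1-0.65^2}} = \frac{0.65 \times 4.243}{\sqrt{0.5775}} = \frac{2.76}{0.76} = 3.63$$
$$df = 18,\quad t_{crit} = 2.101$$
$$3.63 > 2.101 \Rightarrow \text{Reject } H_0$$
The correlation is statistically significant at the 5% level.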
Important Cautions
Correlation ≠ Causation
Just because X and Y are correlated does NOT mean:
- X causes Y
- Y causes X
- There's any causal connection
A third variable may explain both (confounding).
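Confounding is easy to reproduce in a simulation. In the sketch below, X and Y are each generated from a shared variable Z plus independent noise, so neither causes the other. The distributions, noise level, seed, and the helper `pearson_r` are all assumptions made for the illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]    # the confounder
x = [zi + random.gauss(0, 0.5) for zi in z]   # X depends only on Z
y = [zi + random.gauss(0, 0.5) for zi in z]   # Y depends only on Z

r_xy = pearson_r(x, y)

# Remove Z's contribution: what is left of X and Y is independent noise.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
r_resid = pearson_r(x_resid, y_resid)

print(round(r_xy, 2), round(r_resid, 2))
```

X and Y come out strongly correlated, yet once Z is accounted for the remaining association is close to zero.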
Restricted Range
Limiting the range of X or Y typically shrinks r toward 0, so a sample drawn from a narrow range can understate the correlation over the full range.
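A small deterministic sketch of range restriction: the same made-up data set is correlated once over the full range of X and once over a narrow slice. The data, the slice boundaries, and the helper `pearson_r` are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a linear trend plus a fixed +/-3 wobble.
xs = list(range(20))
ys = [x + (3 if x % 2 == 0 else -3) for x in xs]

r_full = pearson_r(xs, ys)

# Keep only the pairs whose x lies in a narrow band.
pairs = [(x, y) for x, y in zip(xs, ys) if 8 <= x <= 12]
xs_narrow, ys_narrow = zip(*pairs)
r_restricted = pearson_r(xs_narrow, ys_narrow)

print(round(r_full, 3), round(r_restricted, 3))
```

The wobble is the same size in both cases, but within the narrow band the trend in X covers far less ground compared with the noise, so r drops.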
Outliers
A single outlier can dramatically change r: it can inflate it, deflate it, or even reverse its sign.
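To see how much one point can matter, the sketch below adds a single extreme pair to otherwise perfectly linear, made-up data. The helper `pearson_r` and the values are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data lying exactly on the line y = x.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 2, 3, 4, 5, 6]
r_clean = pearson_r(xs, ys)

# Same data plus one extreme point far below the line.
r_with_outlier = pearson_r(xs + [7], ys + [-10])

print(round(r_clean, 3), round(r_with_outlier, 3))
```

Plotting the data before computing r is the usual safeguard, since a scatterplot makes a point like this obvious.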
Nonlinear Relationships
r measures only linear association. A perfectly deterministic curved relationship can still have r≈0.
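A standard illustration, sketched here with made-up values, is y = x² over x values symmetric about zero. Y is completely determined by X, yet the positive and negative halves cancel in the numerator of r.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # exact parabola, no noise

r_parabola = pearson_r(xs, ys)
print(r_parabola)
```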
Coefficient of Determination
$r^2$ = coefficient of determination
Interpretation: The proportion of variance in Y explained by its linear relationship with X.
Example: r=0.8 ⇒ $r^2$=0.64 ⇒ 64% of variance in Y is explained by X.
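The "variance explained" reading can be checked numerically. For a least-squares line, r² equals 1 − SSE/SST, where SSE is the residual sum of squares and SST is the total sum of squares of Y. The sketch below uses made-up data; the variable names are illustrative.

```python
# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)  # SST: total variation in Y

r_squared = sxy ** 2 / (sxx * syy)

# Least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = my - slope * mx

# SSE: variation in Y left unexplained by the line.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - sse / syy

print(round(r_squared, 3), round(explained, 3))
```

Both quantities come out equal, which is what justifies describing r² as the share of Y's variation accounted for by the fitted line.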