Regression & Correlation: Topic #33 of 33

Inference in Regression

Testing regression coefficients: standard errors, confidence intervals, and hypothesis tests for β.

Overview

Regression inference involves testing hypotheses and constructing confidence intervals for regression coefficients and predictions.

Standard Error of the Slope

$$SE(\beta_1) = \frac{s}{\sqrt{\sum(x_i - \bar{x})^2}}$$

where $s$ is the standard error of the estimate:

$$s = \sqrt{\frac{SSE}{n - 2}}$$
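Both quantities fall out of an ordinary least-squares fit. A minimal sketch in Python, using only the standard library (the sample data below is hypothetical, for illustration only):

```python
import math

# Hypothetical sample data (illustration only)
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# SSE and the standard error of the estimate: s = sqrt(SSE / (n - 2))
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope: SE(b1) = s / sqrt(sum (x_i - x_bar)^2)
se_b1 = s / math.sqrt(sxx)
```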

Hypothesis Test for Slope

Hypotheses

Testing if there's a linear relationship:

  • $H_0$: $\beta_1 = 0$ (no linear relationship)
  • $H_1$: $\beta_1 \neq 0$ (linear relationship exists)

Test Statistic

$$t = \frac{\beta_1}{SE(\beta_1)}$$

with $df = n - 2$.

Decision

If $\lvert t \rvert > t_{\text{critical}}$ or the p-value $< \alpha$, reject $H_0$.
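The full decision rule can be sketched in a few lines. The slope, standard error, and sample size below are hypothetical, and the critical value is hardcoded from a t-table to keep the sketch dependency-free:

```python
# Hypothetical slope estimate, its standard error, and sample size
b1 = 2.0
se_b1 = 0.5
n = 25

# Test statistic for H0: beta_1 = 0
t_stat = b1 / se_b1   # 4.0
df = n - 2            # 23

# Two-sided critical value at alpha = 0.05 with df = 23,
# taken from a t-table (hardcoded to avoid external dependencies)
t_crit = 2.069

# Reject H0 when |t| exceeds the critical value
reject_h0 = abs(t_stat) > t_crit
```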

Confidence Interval for Slope

$$\beta_1 \pm t_{\alpha/2,\, n-2} \times SE(\beta_1)$$

Interpretation: we are $(1-\alpha) \times 100\%$ confident that the true slope lies in this interval.
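The interval is just the point estimate plus or minus a margin of error. A short sketch with hypothetical values (the critical value again comes from a t-table for $df = 23$):

```python
# Hypothetical slope estimate and standard error
b1 = 2.0
se_b1 = 0.5
t_crit = 2.069   # t_{0.025, 23} from a t-table (df = 23 assumed)

# 95% confidence interval for the slope
margin = t_crit * se_b1
ci = (b1 - margin, b1 + margin)
```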

Standard Error of the Intercept

$$SE(\beta_0) = s \times \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum(x_i - \bar{x})^2}}$$

Confidence Interval for Mean Response

For a given $x_0$, the CI for $E(Y \mid X = x_0)$:

$$\hat{y} \pm t_{\alpha/2,\, n-2} \times s \times \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}}$$

Prediction Interval

For a single new observation at $x_0$:

$$\hat{y} \pm t_{\alpha/2,\, n-2} \times s \times \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}}$$

Note: Prediction intervals are wider than confidence intervals.
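The two formulas share everything except the extra $1$ under the square root, which is what makes the prediction interval wider. A sketch with hypothetical regression summary values:

```python
import math

# Hypothetical regression summary values
y_hat = 10.0    # fitted value at x0
s = 2.0         # standard error of the estimate
n = 20
x0 = 6.0
x_bar = 5.0
sxx = 40.0      # sum of (x_i - x_bar)^2
t_crit = 2.101  # t_{0.025, 18} from a t-table

# Term shared by both intervals
h = 1 / n + (x0 - x_bar) ** 2 / sxx

# Margin for the CI of the mean response E(Y | X = x0)
ci_margin = t_crit * s * math.sqrt(h)

# Margin for the prediction interval (extra "1 +" widens it)
pi_margin = t_crit * s * math.sqrt(1 + h)

ci = (y_hat - ci_margin, y_hat + ci_margin)
pi = (y_hat - pi_margin, y_hat + pi_margin)
```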

Comparison

Interval Type          What It Estimates            Width
CI for mean response   Average Y for a given X      Narrower
Prediction interval    Individual Y for a given X   Wider

ANOVA Approach to Regression

ANOVA Table

Source       df     SS    MS                F
Regression   1      SSR   MSR = SSR/1       MSR/MSE
Error        n-2    SSE   MSE = SSE/(n-2)
Total        n-1    SST

F-Test for Overall Significance

$$F = \frac{MSR}{MSE}$$

For simple linear regression, $F = t^2$, where $t$ is the slope test statistic.
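The ANOVA quantities can be assembled directly from the sums of squares. A sketch with hypothetical values, confirming the $F = t^2$ identity numerically:

```python
# Hypothetical sums of squares from a simple regression with n = 20
ssr = 150.0
sse = 90.0
n = 20

msr = ssr / 1         # regression mean square (df = 1)
mse = sse / (n - 2)   # error mean square (df = 18)
f_stat = msr / mse

# For simple linear regression, the slope t statistic satisfies t^2 = F
t_stat = f_stat ** 0.5
```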

Example

Regression output:

  • $n = 20$
  • $\beta_1 = 2.5$
  • $SE(\beta_1) = 0.8$
  • $s = 3.2$
  • $\sum(x_i - \bar{x})^2 = 16$

Testing $H_0$: $\beta_1 = 0$:

$$t = \frac{2.5}{0.8} = 3.125$$
$$df = 18, \quad t_{\text{crit}}(\alpha = 0.05) = 2.101$$
$$3.125 > 2.101 \Rightarrow \text{Reject } H_0$$

A significant linear relationship exists.

95% CI for Slope

$$2.5 \pm 2.101 \times 0.8 = 2.5 \pm 1.68 = (0.82, 4.18)$$

We're 95% confident the true slope is between 0.82 and 4.18.
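The whole worked example can be replayed in a few lines of Python, using the values given above:

```python
# Values from the example: slope, its SE, and t_{0.025, 18}
b1 = 2.5
se_b1 = 0.8
t_crit = 2.101

# Hypothesis test for H0: beta_1 = 0
t_stat = b1 / se_b1                 # 3.125
reject_h0 = abs(t_stat) > t_crit    # True: 3.125 > 2.101

# 95% confidence interval for the slope
margin = t_crit * se_b1
ci = (b1 - margin, b1 + margin)     # about (0.82, 4.18)
```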

Assumptions for Valid Inference

  1. Linearity: Check with residual plot
  2. Independence: Random sampling
  3. Normality of errors: Q-Q plot, Shapiro-Wilk test
  4. Homoscedasticity: Constant variance (residual plot)

Residual Analysis

What to Look For

  • Random scatter: Assumptions met
  • Pattern/curvature: Nonlinearity
  • Funnel shape: Non-constant variance
  • Clusters: Possible subgroups

Standardized Residuals

$$\text{Standardized residual} = \frac{e_i}{s}$$

Values beyond $\pm 2$ or $\pm 3$ may indicate outliers.
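This screening rule is easy to apply in code. A sketch with hypothetical residuals and a hypothetical $s$, flagging values beyond $\pm 2$:

```python
# Hypothetical residuals and standard error of the estimate
residuals = [0.5, -1.2, 3.8, 0.9, -4.1, 1.1]
s = 1.5

# Standardized residuals: e_i / s
standardized = [e / s for e in residuals]

# Flag potential outliers beyond +/- 2
outliers = [z for z in standardized if abs(z) > 2]
```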