Regression & Correlation: Topic #33 of 33

Inference in Regression

Testing regression coefficients: standard errors, confidence intervals, and hypothesis tests for β.

Overview

Regression inference involves testing hypotheses and constructing confidence intervals for regression coefficients and predictions.

Standard Error of the Slope

$$SE(\beta_1) = \frac{s}{\sqrt{\sum(x_i - \bar{x})^2}}$$

where $s$ is the standard error of the estimate:

$$s = \sqrt{\frac{SSE}{n - 2}}$$
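Both quantities fall out of an ordinary least-squares fit. A minimal sketch in Python, using only the standard library (the sample data below is hypothetical, for illustration only):

```python
import math

# Hypothetical sample data (illustration only)
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# SSE and the standard error of the estimate: s = sqrt(SSE / (n - 2))
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope: SE(b1) = s / sqrt(sum (x_i - x_bar)^2)
se_b1 = s / math.sqrt(sxx)
```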

Hypothesis Test for Slope

Hypotheses

Testing if there's a linear relationship:

  • $H_0$: $\beta_1 = 0$ (no linear relationship)
  • $H_1$: $\beta_1 \neq 0$ (linear relationship exists)

Test Statistic

$$t = \frac{\beta_1}{SE(\beta_1)}$$

with $df = n - 2$.

Decision

If $\lvert t \rvert > t_{\text{critical}}$ or the p-value $< \alpha$, reject $H_0$.
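The full decision rule can be sketched in a few lines. The slope, standard error, and sample size below are hypothetical, and the critical value is hardcoded from a t-table to keep the sketch dependency-free:

```python
# Hypothetical slope estimate, its standard error, and sample size
b1 = 2.0
se_b1 = 0.5
n = 25

# Test statistic for H0: beta_1 = 0
t_stat = b1 / se_b1   # 4.0
df = n - 2            # 23

# Two-sided critical value at alpha = 0.05 with df = 23,
# taken from a t-table (hardcoded to avoid external dependencies)
t_crit = 2.069

# Reject H0 when |t| exceeds the critical value
reject_h0 = abs(t_stat) > t_crit
```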

Confidence Interval for Slope

$$\beta_1 \pm t_{\alpha/2,\, n-2} \times SE(\beta_1)$$

Interpretation: we are $(1-\alpha) \times 100\%$ confident that the true slope lies in this interval.
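The interval is just the point estimate plus or minus a margin of error. A short sketch with hypothetical values (the critical value again comes from a t-table for $df = 23$):

```python
# Hypothetical slope estimate and standard error
b1 = 2.0
se_b1 = 0.5
t_crit = 2.069   # t_{0.025, 23} from a t-table (df = 23 assumed)

# 95% confidence interval for the slope
margin = t_crit * se_b1
ci = (b1 - margin, b1 + margin)
```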

Standard Error of the Intercept

$$SE(\beta_0) = s \times \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum(x_i - \bar{x})^2}}$$

Confidence Interval for Mean Response

For a given $x_0$, the CI for $E(Y \mid X = x_0)$:

$$\hat{y} \pm t_{\alpha/2,\, n-2} \times s \times \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}}$$

Prediction Interval

For a single new observation at $x_0$:

$$\hat{y} \pm t_{\alpha/2,\, n-2} \times s \times \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}}$$

Note: Prediction intervals are wider than confidence intervals.
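The two formulas share everything except the extra $1$ under the square root, which is what makes the prediction interval wider. A sketch with hypothetical regression summary values:

```python
import math

# Hypothetical regression summary values
y_hat = 10.0    # fitted value at x0
s = 2.0         # standard error of the estimate
n = 20
x0 = 6.0
x_bar = 5.0
sxx = 40.0      # sum of (x_i - x_bar)^2
t_crit = 2.101  # t_{0.025, 18} from a t-table

# Term shared by both intervals
h = 1 / n + (x0 - x_bar) ** 2 / sxx

# Margin for the CI of the mean response E(Y | X = x0)
ci_margin = t_crit * s * math.sqrt(h)

# Margin for the prediction interval (extra "1 +" widens it)
pi_margin = t_crit * s * math.sqrt(1 + h)

ci = (y_hat - ci_margin, y_hat + ci_margin)
pi = (y_hat - pi_margin, y_hat + pi_margin)
```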

Comparison

Interval Type          What It Estimates            Width
CI for mean response   Average Y for a given X      Narrower
Prediction interval    Individual Y for a given X   Wider

ANOVA Approach to Regression

ANOVA Table

Source       df     SS    MS                F
Regression   1      SSR   MSR = SSR/1       MSR/MSE
Error        n-2    SSE   MSE = SSE/(n-2)
Total        n-1    SST

F-Test for Overall Significance

$$F = \frac{MSR}{MSE}$$

For simple linear regression, $F = t^2$, where $t$ is the slope test statistic.
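The ANOVA quantities can be assembled directly from the sums of squares. A sketch with hypothetical values, confirming the $F = t^2$ identity numerically:

```python
# Hypothetical sums of squares from a simple regression with n = 20
ssr = 150.0
sse = 90.0
n = 20

msr = ssr / 1         # regression mean square (df = 1)
mse = sse / (n - 2)   # error mean square (df = 18)
f_stat = msr / mse

# For simple linear regression, the slope t statistic satisfies t^2 = F
t_stat = f_stat ** 0.5
```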

Example

Regression output:

  • $n = 20$
  • $\beta_1 = 2.5$
  • $SE(\beta_1) = 0.8$
  • $s = 3.2$
  • $\sum(x_i - \bar{x})^2 = 16$

Testing $H_0$: $\beta_1 = 0$:

$$t = \frac{2.5}{0.8} = 3.125$$
$$df = 18, \quad t_{\text{crit}}(\alpha = 0.05) = 2.101$$
$$3.125 > 2.101 \Rightarrow \text{Reject } H_0$$

A significant linear relationship exists.

95% CI for Slope

$$2.5 \pm 2.101 \times 0.8 = 2.5 \pm 1.68 = (0.82, 4.18)$$

We're 95% confident the true slope is between 0.82 and 4.18.
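The whole worked example can be replayed in a few lines of Python, using the values given above:

```python
# Values from the example: slope, its SE, and t_{0.025, 18}
b1 = 2.5
se_b1 = 0.8
t_crit = 2.101

# Hypothesis test for H0: beta_1 = 0
t_stat = b1 / se_b1                 # 3.125
reject_h0 = abs(t_stat) > t_crit    # True: 3.125 > 2.101

# 95% confidence interval for the slope
margin = t_crit * se_b1
ci = (b1 - margin, b1 + margin)     # about (0.82, 4.18)
```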

Assumptions for Valid Inference

  1. Linearity: Check with residual plot
  2. Independence: Random sampling
  3. Normality of errors: Q-Q plot, Shapiro-Wilk test
  4. Homoscedasticity: Constant variance (residual plot)

Residual Analysis

What to Look For

  • Random scatter: Assumptions met
  • Pattern/curvature: Nonlinearity
  • Funnel shape: Non-constant variance
  • Clusters: Possible subgroups

Standardized Residuals

$$\text{Standardized residual} = \frac{e_i}{s}$$

Values beyond $\pm 2$ or $\pm 3$ may indicate outliers.
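This screening rule is easy to apply in code. A sketch with hypothetical residuals and a hypothetical $s$, flagging values beyond $\pm 2$:

```python
# Hypothetical residuals and standard error of the estimate
residuals = [0.5, -1.2, 3.8, 0.9, -4.1, 1.1]
s = 1.5

# Standardized residuals: e_i / s
standardized = [e / s for e in residuals]

# Flag potential outliers beyond +/- 2
outliers = [z for z in standardized if abs(z) > 2]
```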