Variance and Standard Deviation

Measures of spread and dispersion: population vs sample variance, standard deviation interpretation.

Overview

Variance and standard deviation measure the spread or dispersion of data around the mean. They tell us how much individual values typically differ from the average.

Key Concepts

| Measure | Symbol | Description |
|---|---|---|
| Variance | $\sigma^2$ (population), $s^2$ (sample) | Average squared deviation |
| Standard Deviation | $\sigma$ (population), $s$ (sample) | Square root of variance |

Key Formulas

Population Variance

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

Sample Variance

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Note: For samples we divide by $n - 1$ rather than $n$ (Bessel's correction) so that $s^2$ is an unbiased estimate of the population variance.
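One way to see why the $n - 1$ divisor gives an unbiased estimate is to enumerate every possible sample from a small population and average the resulting estimates. The sketch below is illustrative only: it treats the values 2, 4, 4, 4, 5, 5, 7, 9 as a complete population (an assumption made for the demonstration), draws all ordered samples of size 2 with replacement, and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Assumption for illustration: these eight values ARE the whole population.
population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean, as an exact fraction."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)


n = 2
# Every ordered sample of size n, drawn with replacement (N**n samples).
samples = list(product(population, repeat=n))

# Average of the estimates when dividing by n - 1 versus dividing by n.
mean_unbiased = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
mean_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(sigma2, mean_unbiased, mean_biased)  # 4 4 2
```

Averaged over all samples, dividing by $n - 1$ recovers the population variance exactly, while dividing by $n$ underestimates it by the factor $(n - 1)/n$.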

Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

Sample Standard Deviation

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
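As a quick check of the four formulas, Python's standard `statistics` module provides one function for each of them. The data values here are chosen for illustration; which function is appropriate depends on whether the data is the full population or a sample from it.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

pop_var = statistics.pvariance(data)   # population variance, divides by N
samp_var = statistics.variance(data)   # sample variance, divides by n - 1
pop_sd = statistics.pstdev(data)       # square root of the population variance
samp_sd = statistics.stdev(data)       # square root of the sample variance

print(f"population: variance={pop_var:.2f}, sd={pop_sd:.2f}")
print(f"sample:     variance={samp_var:.2f}, sd={samp_sd:.2f}")
```

The sample versions are always slightly larger than the population versions for the same data, because the divisor is smaller.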

Computational Formulas

These are algebraically equivalent to the definitional formulas and are often easier to calculate by hand:

Variance (Computational Form)

$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$$

Standard Deviation

$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}}$$
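The equivalence of the two forms can be checked numerically. The following is a minimal sketch; the function names are hypothetical and the data values are illustrative.

```python
import math


def sample_variance_computational(data):
    """Sample variance from the running sums of x and x squared."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def sample_variance_definitional(data):
    """Sample variance from squared deviations around the mean."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


def sample_sd_computational(data):
    """Sample standard deviation via the computational form."""
    return math.sqrt(sample_variance_computational(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
print(sample_variance_computational(data))
print(sample_variance_definitional(data))
```

The computational form needs only one pass over the data, which is why it is convenient by hand. In floating-point arithmetic it subtracts two large, nearly equal numbers when the mean is large relative to the spread, so the definitional form is usually the safer choice in software.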

Calculation Steps

  1. Calculate the mean ($\bar{x}$)
  2. Find each deviation from the mean ($x_i - \bar{x}$)
  3. Square each deviation
  4. Sum the squared deviations
  5. Divide by $n - 1$ (sample) or $N$ (population)
  6. For SD, take the square root
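The six steps above translate almost line for line into code. This is a sketch rather than a reference implementation; the function name `variance_and_sd` and its `sample` flag are hypothetical, and the data values are illustrative.

```python
import math


def variance_and_sd(data, sample=True):
    """Follow the calculation steps and return (variance, standard deviation)."""
    n = len(data)
    if n < 2 and sample:
        raise ValueError("sample variance needs at least two values")
    if n < 1:
        raise ValueError("variance needs at least one value")

    mean = sum(data) / n                        # step 1: the mean
    deviations = [x - mean for x in data]       # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]      # step 3: square each deviation
    total = sum(squared)                        # step 4: sum the squares
    divisor = n - 1 if sample else n            # step 5: n - 1 (sample) or N (population)
    variance = total / divisor
    sd = math.sqrt(variance)                    # step 6: square root for the SD
    return variance, sd


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
print(variance_and_sd(data, sample=True))
print(variance_and_sd(data, sample=False))
```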

Properties

Variance Properties

  • Always non-negative ($\sigma^2 \geq 0$)
  • Zero only when all values are identical
  • Units are squared (e.g., meters² for length data)
  • Sensitive to outliers, because each deviation is squared before averaging

Standard Deviation Properties

  • Same units as the original data
  • For normally distributed data, approximately 68% of values lie within 1 SD of the mean
  • Approximately 95% lie within 2 SD
  • Approximately 99.7% lie within 3 SD
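The three percentages for normal distributions can be reproduced from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Probability that a normal value falls within k SDs of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

These figures hold for any mean and standard deviation, but only when the data is approximately normal.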

Transformations

For $Y = aX + b$:

$$\text{Var}(Y) = a^2 \cdot \text{Var}(X)$$

$$\text{SD}(Y) = \lvert a \rvert \cdot \text{SD}(X)$$

Note: Adding a constant $b$ shifts every value by the same amount, so it doesn't change the spread; only the scale factor $a$ does.
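The transformation rules can be checked numerically. In this sketch the constants `a` and `b` and the data values are hypothetical, with a negative `a` chosen to show why the SD rule uses the absolute value.

```python
import math
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values
a, b = -3, 10                   # hypothetical scale and shift

y = [a * xi + b for xi in x]        # Y = aX + b
shifted = [xi + b for xi in x]      # shift only, no scaling

var_x, var_y = statistics.variance(x), statistics.variance(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

print(math.isclose(var_y, a ** 2 * var_x))                # Var(Y) = a^2 Var(X)
print(math.isclose(sd_y, abs(a) * sd_x))                  # SD(Y) = |a| SD(X)
print(math.isclose(statistics.variance(shifted), var_x))  # shift leaves spread unchanged
```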

Coefficient of Variation

A relative measure of variability:

$$CV = \frac{s}{\bar{x}} \times 100\%$$

Useful for comparing variability across datasets with different units or scales.
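A short sketch of how the coefficient of variation allows a comparison across different units. The function name and both datasets (heights in centimetres, weights in kilograms) are made up for illustration. The guard against a zero mean is an added assumption, since the ratio is undefined in that case.

```python
import statistics


def coefficient_of_variation(data):
    """Sample SD as a percentage of the mean."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(data) / mean * 100


# Hypothetical measurements in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 60, 70, 80, 85]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"heights: CV = {cv_heights:.2f}%")
print(f"weights: CV = {cv_weights:.2f}%")
```

The standard deviations themselves (in cm and kg) cannot be compared directly, but the two percentages can: here the weights vary more, relative to their mean, than the heights do.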

Examples

Example 1: Basic Calculation

Data: 2, 4, 4, 4, 5, 5, 7, 9

$$\bar{x} = \frac{40}{8} = 5$$

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 2 | -3 | 9 |
| 4 | -1 | 1 |
| 4 | -1 | 1 |
| 4 | -1 | 1 |
| 5 | 0 | 0 |
| 5 | 0 | 0 |
| 7 | 2 | 4 |
| 9 | 4 | 16 |
| Sum | 0 | 32 |

$$s^2 = \frac{32}{8-1} = \frac{32}{7} \approx 4.57$$

$$s = \sqrt{4.57} \approx 2.14$$

Example 2: Computational Formula

Same data: 2, 4, 4, 4, 5, 5, 7, 9

$$\sum x_i = 40, \quad \sum x_i^2 = 4 + 16 + 16 + 16 + 25 + 25 + 49 + 81 = 232$$

$$s^2 = \frac{232 - \frac{40^2}{8}}{7} = \frac{232 - 200}{7} = \frac{32}{7} \approx 4.57$$

Example 3: Comparing Datasets

| Dataset | Mean | SD | CV |
|---|---|---|---|
| A | 100 | 15 | 15% |
| B | 50 | 10 | 20% |

Dataset B has higher relative variability despite lower absolute SD.

Population vs Sample

| Aspect | Population | Sample |
|---|---|---|
| Symbol | $\sigma^2$, $\sigma$ | $s^2$, $s$ |
| Divisor | $N$ | $n - 1$ |
| Mean | $\mu$ | $\bar{x}$ |
| Use | Known full population | Estimating from sample |