Regular Expressions | Discrete Mathematics Cheat Sheet

Definition

A regular expression (regex) is a pattern that describes a set of strings (a language) over an alphabet $\Sigma$.

Building Blocks

Expression	Matches
$\emptyset$	Nothing (empty language)
$\epsilon$	Empty string only
$a$ (symbol)	Just the string "a"

Operations

Operation	Notation	Meaning
Union	$R_1 \cup R_2$ or $R_1 \mid R_2$	Either $R_1$ or $R_2$
Concatenation	$R_1 R_2$	$R_1$ followed by $R_2$
Kleene Star	$R^*$	Zero or more of $R$

Precedence (highest to lowest)

Kleene star (*)
Concatenation
Union (|)

Use parentheses to override.
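As a quick illustration of these precedence rules, the sketch below uses Python's `re` module (an assumption, since the cheat sheet is language-neutral), where union is written `|`. The helper name `matches` is made up for this example.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# Star binds tighter than concatenation: ab* means a(b*), not (ab)*.
star_vs_concat = [matches(r"ab*", s) for s in ("a", "abbb", "abab")]

# Concatenation binds tighter than union: ab|c means (ab)|c, not a(b|c).
concat_vs_union = [matches(r"ab|c", s) for s in ("ab", "c", "ac")]

# Parentheses override the default grouping.
overridden = [matches(r"(ab)*", "abab"), matches(r"a(b|c)", "ac")]

print(star_vs_concat)   # [True, True, False]
print(concat_vs_union)  # [True, True, False]
print(overridden)       # [True, True]
```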

Examples

Basic Patterns

Regex	Language
$a$	$\{a\}$
$ab$	$\{ab\}$
$a \cup b$	$\{a, b\}$
$a^*$	$\{\epsilon, a, aa, aaa, \ldots\}$
$(ab)^*$	$\{\epsilon, ab, abab, \ldots\}$
$a^* b^*$	$\{a^i b^j \mid i, j \geq 0\}$
$(a \cup b)^*$	All strings over $\{a, b\}$
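To make the languages in this table concrete, the following sketch lists every short string that a pattern accepts. It assumes Python's `re` syntax (`|` for union) and a hypothetical helper named `language`; the length bound is arbitrary.

```python
import re
from itertools import product

def language(pattern: str, alphabet: str = "ab", max_len: int = 3) -> list[str]:
    """List every string over `alphabet` up to `max_len` that fully matches `pattern`."""
    compiled = re.compile(pattern)
    words = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if compiled.fullmatch(word):
                words.append(word)
    return words

print(language(r"a|b"))               # ['a', 'b']
print(language(r"a*"))                # ['', 'a', 'aa', 'aaa']
print(language(r"(ab)*", max_len=4))  # ['', 'ab', 'abab']
print(language(r"a*b*", max_len=2))   # ['', 'a', 'b', 'aa', 'ab', 'bb']
```

The empty string `''` plays the role of $\epsilon$ in the output.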

More Complex Examples

Regex	Description
$(0 \cup 1)^* 1$	Binary strings ending in 1
$0^* 1 0^*$	Exactly one 1
$(0 \cup 1)^* 00 (0 \cup 1)^*$	Contains 00
$(\epsilon \cup 1)(01)^*(\epsilon \cup 0)$	Alternating 0s and 1s, starting with either symbol
$1^* (01^+)^* (0 \cup \epsilon)$	No consecutive 0s
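A description like "contains 00" can be checked against its regex by brute force on short inputs. The sketch below does this for the first three rows of the table, assuming Python's `re` syntax; the names `CASES` and `disagreements` are illustrative only, and agreement on short strings is a sanity check rather than a proof.

```python
import re
from itertools import product

# Each entry pairs a pattern (Python syntax) with a plain-Python predicate
# expressing the same description.
CASES = {
    "ends in 1": (r"(0|1)*1", lambda w: w.endswith("1")),
    "exactly one 1": (r"0*10*", lambda w: w.count("1") == 1),
    "contains 00": (r"(0|1)*00(0|1)*", lambda w: "00" in w),
}

def disagreements(pattern, predicate, max_len=8):
    """Return the binary strings up to `max_len` on which regex and predicate differ."""
    bad = []
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            if (re.fullmatch(pattern, w) is not None) != predicate(w):
                bad.append(w)
    return bad

report = {name: disagreements(p, f) for name, (p, f) in CASES.items()}
print(report)  # {'ends in 1': [], 'exactly one 1': [], 'contains 00': []}
```

The same harness is a convenient way to catch a regex that silently misses edge cases such as the empty string or a trailing symbol.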

Additional Operators

These are shorthand (not fundamental):

Operator	Meaning	Equivalent
$R^+$	One or more	$RR^*$
$R?$	Zero or one	$R \cup \epsilon$
$R\{n\}$	Exactly $n$	$RR\cdots R$ ($n$ times)
$R\{n,m\}$	Between $n$ and $m$ (inclusive)	$R\{n\}\,(R?)\{m-n\}$
$[abc]$	Character class	$a \cup b \cup c$
$.$	Any character	$\Sigma$
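One way to see that $R\{n,m\}$ is only shorthand is to expand it mechanically into $n$ required copies followed by $m - n$ optional copies. The sketch below does this as string rewriting in Python's `re` syntax; `expand_bounded` is a hypothetical helper, and `(?:...)` is Python's non-capturing group.

```python
import re

def expand_bounded(r: str, n: int, m: int) -> str:
    """Rewrite r{n,m} using only concatenation and union with the empty string."""
    required = f"(?:{r})" * n        # R R ... R, n times
    optional = f"(?:{r}|)" * (m - n)  # (R or epsilon), m - n times
    return required + optional

expanded = expand_bounded("ab", 1, 3)
shorthand = r"(?:ab){1,3}"

# Compare both forms on 0 to 4 repetitions of "ab".
agreement = [
    (re.fullmatch(expanded, "ab" * k) is not None)
    == (re.fullmatch(shorthand, "ab" * k) is not None)
    for k in range(5)
]
print(expanded)   # (?:ab)(?:ab|)(?:ab|)
print(agreement)  # [True, True, True, True, True]
```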

Equivalence with Finite Automata

Theorem (Kleene)

A language is regular if and only if it can be described by a regular expression.

Regular expressions ↔ NFAs ↔ DFAs

All three describe exactly the regular languages.

Regex to NFA

Thompson's Construction

Base cases:

$\epsilon$ :

→(i)──ε──→((f))

$a$ :

→(i)──a──→((f))

Union ( $R_1 \cup R_2$ ):

        ε     [N1]     ε
→(i)──────→       ──────→((f))
        ε     [N2]     ε

Concatenation ( $R_1 R_2$ ):

→(i)──→[N1]──→[N2]──→((f))

Kleene Star ( $R^*$ ):

        ε
→(i)──────────────────→((f))
     ε      ε
     ↓      ↑
    [N]──→──┘
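The diagrams above can be turned into code fairly directly. The following is a minimal sketch of Thompson's construction, not a full regex compiler: there is no parser, so expressions are built by calling the hypothetical combinators `symbol`, `union`, `concat` and `star`, and the class and function names are assumptions made for this example. Each fragment gets a fresh start state and a fresh accept state, as in the diagrams.

```python
from collections import defaultdict
from itertools import count

EPS = None       # label used for epsilon-transitions
_ids = count()   # source of fresh state numbers

class NFA:
    """An NFA fragment with one start state and one accept state."""
    def __init__(self):
        self.start = next(_ids)
        self.accept = next(_ids)
        self.edges = defaultdict(list)  # state -> [(label, target), ...]

    def add(self, src, label, dst):
        self.edges[src].append((label, dst))

    def absorb(self, other):
        """Copy all transitions of `other` into this fragment."""
        for src, out in other.edges.items():
            self.edges[src].extend(out)

def symbol(a):
    n = NFA()
    n.add(n.start, a, n.accept)
    return n

def union(n1, n2):
    n = NFA()
    for part in (n1, n2):
        n.absorb(part)
        n.add(n.start, EPS, part.start)
        n.add(part.accept, EPS, n.accept)
    return n

def concat(n1, n2):
    n = NFA()
    n.absorb(n1)
    n.absorb(n2)
    n.add(n.start, EPS, n1.start)
    n.add(n1.accept, EPS, n2.start)
    n.add(n2.accept, EPS, n.accept)
    return n

def star(n1):
    n = NFA()
    n.absorb(n1)
    n.add(n.start, EPS, n.accept)    # zero repetitions
    n.add(n.start, EPS, n1.start)    # enter the loop
    n.add(n1.accept, EPS, n1.start)  # repeat
    n.add(n1.accept, EPS, n.accept)  # leave the loop
    return n

def closure(nfa, states):
    """All states reachable from `states` using only epsilon-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, dst in nfa.edges[s]:
            if label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, text):
    """Simulate the NFA on `text` by tracking the set of active states."""
    current = closure(nfa, {nfa.start})
    for ch in text:
        moved = {dst for s in current for label, dst in nfa.edges[s] if label == ch}
        current = closure(nfa, moved)
    return nfa.accept in current

# (0 ∪ 1)* 1 : binary strings ending in 1
ends_in_one = concat(star(union(symbol("0"), symbol("1"))), symbol("1"))
results = {w: accepts(ends_in_one, w) for w in ("", "1", "0", "0101", "110")}
print(results)  # {'': False, '1': True, '0': False, '0101': True, '110': False}
```

Build a fresh fragment for every occurrence of a subexpression; reusing one `NFA` object in two places would share its states.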

NFA to Regex

State Elimination Method

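Add a new start state with an ε-edge to the old start state, and a new accept state with ε-edges from every old accept state
Eliminate the original states one by one
When eliminating state $q$, replace each path $p \to q \to r$ by an edge $p \to r$ labeled $R_{pr} \cup R_{pq} R_{qq}^* R_{qr}$, where $R_{xy}$ is the current label on the edge $x \to y$
When only the new start and accept states remain, the label on the edge between them is the regex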
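The four steps above can be sketched in Python as follows. This is an illustrative implementation under several assumptions: edge labels are strings in Python `re` syntax, `None` stands for $\emptyset$ (no edge), the empty string stands for $\epsilon$, and the new start and accept states are named `"S"` and `"F"` (names that must not clash with existing states). The output is correct but not simplified.

```python
import re
from itertools import product

def union(r, s):
    """Union of two edge labels; None stands for the empty language."""
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def concat(*parts):
    """Concatenation; an empty-language factor makes the whole label empty."""
    if any(p is None for p in parts):
        return None
    return "".join(f"(?:{p})" if p else "" for p in parts)  # "" is epsilon

def star(r):
    """Kleene star; the star of the empty language or of epsilon is epsilon."""
    return "" if not r else f"(?:{r})*"

def eliminate(states, edges, start, accept_states):
    """Return a regex for the automaton's language, or None if it is empty."""
    edges = dict(edges)
    # Step 1: new start and accept states, connected by epsilon-edges.
    edges[("S", start)] = ""
    for q in accept_states:
        edges[(q, "F")] = ""
    remaining = list(states)
    # Steps 2-3: remove one state at a time, rerouting paths through it.
    while remaining:
        q = remaining.pop()
        loop = star(edges.get((q, q)))
        for p in ["S"] + remaining:
            for r in remaining + ["F"]:
                via = concat(edges.get((p, q)), loop, edges.get((q, r)))
                label = union(edges.get((p, r)), via)
                if label is not None:
                    edges[(p, r)] = label
        edges = {k: v for k, v in edges.items() if q not in k}
    # Step 4: the single remaining edge carries the regex.
    return edges.get(("S", "F"))

# DFA for binary strings ending in 1: q0 is the start state, q1 accepts.
dfa_edges = {("q0", "q0"): "0", ("q0", "q1"): "1",
             ("q1", "q0"): "0", ("q1", "q1"): "1"}
regex = eliminate(["q0", "q1"], dfa_edges, "q0", ["q1"])

# Sanity check against the intended language on all strings up to length 6.
agrees = all(
    (re.fullmatch(regex, "".join(bits)) is not None) == "".join(bits).endswith("1")
    for n in range(7) for bits in product("01", repeat=n)
)
print(agrees)  # True
```

For this DFA the result has the shape $(0 \cup 11^*0)^* 11^*$, which describes the same language as $(0 \cup 1)^* 1$.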

Algebraic Properties

Identities

$R \cup \emptyset = R$
$R \cdot \epsilon = \epsilon \cdot R = R$
$R \cdot \emptyset = \emptyset \cdot R = \emptyset$
$R \cup R = R$
$(R^*)^* = R^*$
$\emptyset^* = \epsilon$
$\epsilon^* = \epsilon$

Distributive Laws

$R(S \cup T) = RS \cup RT$
$(R \cup S)T = RT \cup ST$
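Laws like these can be spot-checked by comparing two patterns on every short string. The sketch below assumes Python's `re` syntax and a hypothetical helper `same_language`; passing on bounded-length strings is evidence, not a proof, and the last entry is a deliberate non-law included to show that the check can fail.

```python
import re
from itertools import product

def same_language(p: str, q: str, alphabet: str = "abc", max_len: int = 4) -> bool:
    """Compare two patterns on every string over `alphabet` up to `max_len`."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(re.fullmatch(p, w)) != bool(re.fullmatch(q, w)):
                return False
    return True

checks = {
    "R(S|T) = RS|RT": same_language(r"a(b|c)", r"ab|ac"),
    "(R|S)T = RT|ST": same_language(r"(a|b)c", r"ac|bc"),
    "R|R = R": same_language(r"ab|ab", r"ab"),
    "non-law: (ab)* vs a*b*": same_language(r"(ab)*", r"a*b*"),
}
print(checks)
```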

Arden's Lemma

If $X = AX \cup B$ and $\epsilon \notin L(A)$, then the unique solution is $X = A^* B$.

Useful for converting DFAs to regexes: write one equation per state and solve the system by substitution.
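As a worked example (the DFA is an assumption chosen for illustration), take the two-state DFA for binary strings ending in 1: from $q_0$ (start) and from $q_1$ (accepting), reading 0 leads to $q_0$ and reading 1 leads to $q_1$. Let $X_i$ be the set of strings accepted when starting in $q_i$. Applying Arden's Lemma twice gives:

$$
\begin{aligned}
X_0 &= 0X_0 \cup 1X_1 \\
X_1 &= 0X_0 \cup 1X_1 \cup \epsilon \\
X_1 &= 1^*(0X_0 \cup \epsilon) && \text{(lemma with } A = 1,\ B = 0X_0 \cup \epsilon\text{)} \\
X_0 &= 0X_0 \cup 11^*(0X_0 \cup \epsilon) = (0 \cup 11^*0)X_0 \cup 11^* \\
X_0 &= (0 \cup 11^*0)^* 11^* && \text{(lemma with } A = 0 \cup 11^*0,\ B = 11^*\text{)}
\end{aligned}
$$

In both applications $\epsilon \notin L(A)$, so the lemma applies. The result describes the same language as $(0 \cup 1)^* 1$.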

Practical Regex (Programming)

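Modern regex engines add convenience syntax, and some features (such as backreferences) go beyond regular languages:

Shorthand Character Classes

\d = digit = [0-9]
\w = word character = [a-zA-Z0-9_]
\s = whitespace = [ \t\n\r\f\v]

POSIX bracket expressions spell these as [[:digit:]], [[:alnum:]] (which excludes _), and [[:space:]].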

Anchors

^ = start of string
$ = end of string
\b = word boundary
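A short Python sketch of how the anchors change what matches; the sample text is made up for illustration.

```python
import re

text = "cat concatenate cat"

# Without anchors, "cat" is also found inside "concatenate".
unanchored = re.findall(r"cat", text)

# \b restricts matches to standalone words.
whole_words = re.findall(r"\bcat\b", text)

# ^ and $ tie the match to the start and end of the string.
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_con = re.search(r"con$", text) is not None

print(len(unanchored), len(whole_words))  # 3 2
print(starts_with_cat, ends_with_con)     # True False
```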

Backreferences

(.)\1 matches a doubled character, such as "aa"
(.*)\1 matches any string of the form $ww$ (NOT regular!)
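The sketch below tries backreferences in Python's `re` module. The second pattern, `(.+)\1`, is an addition for this example: it matches strings of the form $ww$, the usual example of a language that no finite automaton recognizes.

```python
import re

doubled = re.compile(r"(.)\1")   # some character immediately repeated
squares = re.compile(r"(.+)\1")  # a non-empty string followed by a copy of itself

print(bool(doubled.search("hello")))      # True  ("ll")
print(bool(doubled.search("world")))      # False
print(bool(squares.fullmatch("abcabc")))  # True  ("abc" + "abc")
print(bool(squares.fullmatch("abcab")))   # False
```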

Lookahead/Lookbehind

(?=...) = positive lookahead
(?!...) = negative lookahead
(?<=...) = positive lookbehind
(?<!...) = negative lookbehind
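Lookarounds check the surrounding context without consuming it, so the context is not part of the match. A small Python sketch with made-up sample strings; the lookbehind form `(?<=...)` is included to match the section heading.

```python
import re

prices = "10 USD, 20 EUR, 30 USD"

# Positive lookahead: digits that are followed by " USD".
usd_amounts = re.findall(r"\d+(?= USD)", prices)

# Negative lookahead: whole numbers that are NOT followed by " USD".
# The \b stops the engine from matching a shorter prefix such as "1" of "10".
non_usd = re.findall(r"\d+\b(?! USD)", prices)

# Positive lookbehind: word characters preceded by "#".
issue_ids = re.findall(r"(?<=#)\w+", "fix #123 and #456")

print(usd_amounts)  # ['10', '30']
print(non_usd)      # ['20']
print(issue_ids)    # ['123', '456']
```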

Common Patterns

Pattern	Regex
Email (simplified)	`[\w.]+@[\w.]+\.\w+`
Phone (US)	`\d{3}-\d{3}-\d{4}`
IPv4 address (does not check the 0-255 range)	`\d{1,3}(\.\d{1,3}){3}`
Integer	`-?[0-9]+`
Identifier	`[a-zA-Z_][a-zA-Z0-9_]*`
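The patterns in this table can be tried out with Python's `re.fullmatch`, which requires the whole input to match. This is a sketch for experimentation rather than production validation: the dictionary keys and the helper `is_valid` are hypothetical, and the patterns are deliberately loose (the IP pattern, for instance, accepts out-of-range octets).

```python
import re

PATTERNS = {
    "email": r"[\w.]+@[\w.]+\.\w+",
    "phone_us": r"\d{3}-\d{3}-\d{4}",
    "ip": r"\d{1,3}(\.\d{1,3}){3}",
    "integer": r"-?[0-9]+",
    "identifier": r"[a-zA-Z_][a-zA-Z0-9_]*",
}

def is_valid(kind: str, text: str) -> bool:
    """Check whether the whole of `text` matches the pattern registered for `kind`."""
    return re.fullmatch(PATTERNS[kind], text) is not None

results = {
    "alice@example.com": is_valid("email", "alice@example.com"),
    "555-123-4567": is_valid("phone_us", "555-123-4567"),
    "192.168.0.1": is_valid("ip", "192.168.0.1"),
    "999.999.999.999": is_valid("ip", "999.999.999.999"),  # accepted: no range check
    "-42": is_valid("integer", "-42"),
    "4.2": is_valid("integer", "4.2"),
    "_tmp1": is_valid("identifier", "_tmp1"),
    "1abc": is_valid("identifier", "1abc"),
}
print(results)
```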

Limitations

Regular expressions (in the formal sense) cannot match:

Balanced parentheses, or the related language $\{a^n b^n \mid n \geq 0\}$
Palindromes
Arbitrarily nested structures
Anything requiring unbounded counting or memory

For these, use context-free grammars/parsers.
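For balanced parentheses, the extra power needed is small: a single unbounded counter, which is exactly what a finite automaton lacks. A minimal Python sketch follows; the function name `is_balanced` is an assumption, and characters other than parentheses are simply ignored.

```python
def is_balanced(text: str) -> bool:
    """Check that every ')' closes an earlier '(' and nothing is left open."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                return False
    return depth == 0

print(is_balanced("(()())"))  # True
print(is_balanced("(()"))     # False
print(is_balanced(")("))      # False
```

Handling several bracket types or full nested syntax needs a stack rather than a counter, which is where a parser comes in.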