Definition
A context-free grammar (CFG) is a 4-tuple where:
- = finite set of variables (non-terminals)
- = finite set of terminals (disjoint from )
- = finite set of production rules of form
- (single variable on left)
- (string of variables and terminals)
- = start symbol
Example Grammar
Grammar for balanced parentheses:
Notation Conventions
- Capital letters: variables ()
- Lowercase letters: terminals ()
- : empty string
- : separates alternatives
Derivations
A derivation shows how to generate a string from the start symbol.
One-Step Derivation ()
if is a rule.
Multi-Step Derivation ()
Zero or more derivation steps.
Example
Grammar:
Derivation of "aabb":
Language of a Grammar
All terminal strings derivable from the start symbol.
Context-Free Languages
A language is context-free if some CFG generates it.
Examples
| Language | Grammar |
|---|---|
| Balanced parentheses | |
| Palindromes | |
| , |
Parse Trees
A parse tree visualizes a derivation:
- Root = start symbol
- Internal nodes = variables
- Leaves = terminals (or )
- Children of node match right side of some rule
Yield
Reading leaves left-to-right gives the derived string.
Leftmost and Rightmost Derivations
Leftmost derivation: Always expand the leftmost variable.
Rightmost derivation: Always expand the rightmost variable.
Every parse tree corresponds to exactly one leftmost and one rightmost derivation.
Ambiguity
A grammar is ambiguous if some string has two or more parse trees.
Example: Ambiguous Grammar
String "" has two parse trees:
- (add first)
- (multiply first)
Resolving Ambiguity
Rewrite grammar to enforce precedence/associativity:
Inherently Ambiguous Languages
Some CFLs have no unambiguous grammar.
Example:
Chomsky Normal Form (CNF)
Every rule has form:
- (two variables), or
- (single terminal)
(Start symbol may have )
Conversion to CNF
- Eliminate -rules (except from start)
- Eliminate unit rules ()
- Replace terminals in long rules with variables
- Break long rules into binary rules
Importance
- Simplifies parsing algorithms (CYK)
- Guarantees parsing
Greibach Normal Form (GNF)
Every rule has form:
where is a terminal and is a (possibly empty) string of variables.
Pumping Lemma for CFLs
If is context-free, there exists such that:
For any with , we can write where:
- For all :
Example: is not context-free
Choose . Any pumping produces imbalance.
Closure Properties
| Operation | CFLs Closed? |
|---|---|
| Union | Yes |
| Concatenation | Yes |
| Kleene star | Yes |
| Intersection | No |
| Complement | No |
| Intersection with regular | Yes |
Parsing Algorithms
| Algorithm | Time | Notes |
|---|---|---|
| CYK | Any CFG in CNF | |
| Earley | Any CFG | |
| LL(k) | Restricted grammars | |
| LR(k) | Most programming languages |
Applications
Programming Languages
- Syntax defined by CFGs
- Parsed by compiler front-end
Natural Language Processing
- Sentence structure analysis
- Machine translation
Markup Languages
- HTML, XML structure
- JSON format