The Chomsky Hierarchy
Classification of Formal Grammars
The Chomsky hierarchy, introduced by Noam Chomsky in 1956, classifies formal grammars into four types based on their generative power. This classification creates a containment hierarchy where each grammar type generates a corresponding class of formal languages.
Formal Grammars
A formal grammar is defined as a 4-tuple:
Where:
- is a finite set of non-terminal symbols
- is a finite set of terminal symbols (the alphabet), where
- is a finite set of production rules of the form
- is the start symbol
The Four Types of Grammars
Type-0: Unrestricted Grammars
- Production rule form: where contains at least one non-terminal
- Language class: Recursively enumerable languages
- Computational model: Turing machine
- Constraints: None except that the left-hand side must contain at least one non-terminal
Example production:
Type-1: Context-Sensitive Grammars
- Production rule form: where
- Language class: Context-sensitive languages
- Computational model: Linear-bounded automaton
- Constraints: The length of the right-hand side must be greater than or equal to the length of the left-hand side
Example production:
Type-2: Context-Free Grammars
- Production rule form: where is a single non-terminal
- Language class: Context-free languages
- Computational model: Pushdown automaton
- Constraints: The left-hand side must be a single non-terminal
Example production:
Type-3: Regular Grammars
- Production rule form: where are non-terminals and is a terminal
- Language class: Regular languages
- Computational model: Finite-state automaton
- Constraints: The right-hand side must be either a single terminal or a terminal followed by a single non-terminal
Example production:
Hierarchy Visualization
Mathematical Relationships
Chomsky Normal Form (CNF)
For context-free grammars, the Chomsky Normal Form is a special form where all productions are of the form:
- (a non-terminal produces exactly two non-terminals)
- (a non-terminal produces a terminal)
- (only if doesn't appear on the right-hand side of any production)
Where are non-terminals, is a terminal, and is the start symbol.
Every context-free grammar can be transformed into an equivalent grammar in Chomsky Normal Form.
Computational Power and Decidability
Type-0 (Recursively Enumerable)
- Decidability: The membership problem is undecidable
- Example problem: The halting problem
Type-1 (Context-Sensitive)
- Decidability: The membership problem is decidable but PSPACE-complete
- Example problem: Linear bounded automaton acceptance
Type-2 (Context-Free)
- Decidability: The membership problem is decidable in polynomial time
- Example problem: Recognition of programming language syntax
Type-3 (Regular)
- Decidability: The membership problem is decidable in linear time
- Example problem: Pattern matching with regular expressions
Applications in Computer Science
-
Type-3 (Regular):
- Lexical analysis in compilers
- Pattern matching
- Protocol specification
-
Type-2 (Context-Free):
- Syntax of programming languages
- Parsing in compilers
- XML and HTML document validation
-
Type-1 (Context-Sensitive):
- Natural language processing
- Some programming language features (e.g., declarations before use)
-
Type-0 (Unrestricted):
- Theoretical computation models
- Universal computation
Theoretical Significance
The Chomsky hierarchy provides a framework for understanding:
- The expressive power of different formal systems
- The computational complexity of language recognition
- The trade-off between expressiveness and decidability
- The relationship between formal languages and automata
In the next section, we'll explore context-free grammars in greater depth, as they are particularly important in programming language design and compiler construction.