Skip to main content

The Chomsky Hierarchy

Classification of Formal Grammars

The Chomsky hierarchy, introduced by Noam Chomsky in 1956, classifies formal grammars into four types based on their generative power. This classification creates a containment hierarchy where each grammar type generates a corresponding class of formal languages.

Formal Grammars

A formal grammar GG is defined as a 4-tuple:

G=(V,Σ,R,S)G = (V, \Sigma, R, S)

Where:

  • VV is a finite set of non-terminal symbols
  • Σ\Sigma is a finite set of terminal symbols (the alphabet), where VΣ=V \cap \Sigma = \emptyset
  • RR is a finite set of production rules of the form αβ\alpha \rightarrow \beta
  • SVS \in V is the start symbol

The Four Types of Grammars

Type-0: Unrestricted Grammars

  • Production rule form: αβ\alpha \rightarrow \beta where α\alpha contains at least one non-terminal
  • Language class: Recursively enumerable languages
  • Computational model: Turing machine
  • Constraints: None except that the left-hand side must contain at least one non-terminal

Example production: aAbaaBbaAb \rightarrow aaBb

Type-1: Context-Sensitive Grammars

  • Production rule form: αAβαγβ\alpha A \beta \rightarrow \alpha \gamma \beta where γ1|\gamma| \geq 1
  • Language class: Context-sensitive languages
  • Computational model: Linear-bounded automaton
  • Constraints: The length of the right-hand side must be greater than or equal to the length of the left-hand side

Example production: aAbaabbaAb \rightarrow aabb

Type-2: Context-Free Grammars

  • Production rule form: AγA \rightarrow \gamma where AA is a single non-terminal
  • Language class: Context-free languages
  • Computational model: Pushdown automaton
  • Constraints: The left-hand side must be a single non-terminal

Example production: AaAbabA \rightarrow aAb \mid ab

Type-3: Regular Grammars

  • Production rule form: AaBaA \rightarrow aB \mid a where A,BA, B are non-terminals and aa is a terminal
  • Language class: Regular languages
  • Computational model: Finite-state automaton
  • Constraints: The right-hand side must be either a single terminal or a terminal followed by a single non-terminal

Example production: AaBaA \rightarrow aB \mid a

Hierarchy Visualization

Mathematical Relationships

RegularContext-FreeContext-SensitiveRecursively Enumerable\text{Regular} \subset \text{Context-Free} \subset \text{Context-Sensitive} \subset \text{Recursively Enumerable}

Chomsky Normal Form (CNF)

For context-free grammars, the Chomsky Normal Form is a special form where all productions are of the form:

  1. ABCA \rightarrow BC (a non-terminal produces exactly two non-terminals)
  2. AaA \rightarrow a (a non-terminal produces a terminal)
  3. SεS \rightarrow \varepsilon (only if SS doesn't appear on the right-hand side of any production)

Where A,B,CA, B, C are non-terminals, aa is a terminal, and SS is the start symbol.

Every context-free grammar can be transformed into an equivalent grammar in Chomsky Normal Form.

Computational Power and Decidability

Type-0 (Recursively Enumerable)

  • Decidability: The membership problem is undecidable
  • Example problem: The halting problem

Type-1 (Context-Sensitive)

  • Decidability: The membership problem is decidable but PSPACE-complete
  • Example problem: Linear bounded automaton acceptance

Type-2 (Context-Free)

  • Decidability: The membership problem is decidable in polynomial time
  • Example problem: Recognition of programming language syntax

Type-3 (Regular)

  • Decidability: The membership problem is decidable in linear time
  • Example problem: Pattern matching with regular expressions

Applications in Computer Science

  1. Type-3 (Regular):

    • Lexical analysis in compilers
    • Pattern matching
    • Protocol specification
  2. Type-2 (Context-Free):

    • Syntax of programming languages
    • Parsing in compilers
    • XML and HTML document validation
  3. Type-1 (Context-Sensitive):

    • Natural language processing
    • Some programming language features (e.g., declarations before use)
  4. Type-0 (Unrestricted):

    • Theoretical computation models
    • Universal computation

Theoretical Significance

The Chomsky hierarchy provides a framework for understanding:

  1. The expressive power of different formal systems
  2. The computational complexity of language recognition
  3. The trade-off between expressiveness and decidability
  4. The relationship between formal languages and automata

In the next section, we'll explore context-free grammars in greater depth, as they are particularly important in programming language design and compiler construction.