Abstract Syntax Trees: Representing and Traversing Program Structure
After parsing the source code, compilers use Abstract Syntax Trees (ASTs) to represent the hierarchical structure of a program. ASTs are the foundation for most compiler operations like semantic analysis, optimization, and code generation.
What is an Abstract Syntax Tree?
An Abstract Syntax Tree (AST) is a tree representation of the abstract syntactic structure of source code. Unlike parse trees, ASTs omit syntactic details like parentheses and focus on the essential structure and meaning of the code.
AST Node Types
Common AST node types include:
- Program: The root node representing the entire program
- Statements: Declarations, assignments, conditionals, loops
- Expressions: Binary operations, literals, identifiers, function calls
- Literals: Numbers, strings, booleans
- Identifiers: Variable and function names
Example AST Structure
For the expression (a + b) * c
, the AST would look like:
AST Traversal
There are several ways to traverse an AST:
- Depth-First Traversal: Visit all children before siblings
- Breadth-First Traversal: Visit all nodes at the same level before going deeper
- Visitor Pattern: Define separate visitor objects for each node type
Implementing AST Traversal in JavaScript
Let's implement a visitor pattern to traverse our AST:
class ASTVisitor {
visitProgram(node) {
for (const statement of node.body) {
this.visit(statement);
}
}
visitBinaryExpression(node) {
this.visit(node.left);
this.visit(node.right);
}
visitVariableDeclaration(node) {
if (node.initializer) {
this.visit(node.initializer);
}
}
visitExpressionStatement(node) {
this.visit(node.expression);
}
visitIdentifier(node) {
// Leaf node, nothing to traverse
}
visitLiteral(node) {
// Leaf node, nothing to traverse
}
visitBlockStatement(node) {
for (const statement of node.body) {
this.visit(statement);
}
}
visitGroupExpression(node) {
this.visit(node.expression);
}
visit(node) {
switch (node.type) {
case 'Program': return this.visitProgram(node);
case 'BinaryExpression': return this.visitBinaryExpression(node);
case 'VariableDeclaration': return this.visitVariableDeclaration(node);
case 'ExpressionStatement': return this.visitExpressionStatement(node);
case 'Identifier': return this.visitIdentifier(node);
case 'Literal': return this.visitLiteral(node);
case 'BlockStatement': return this.visitBlockStatement(node);
case 'GroupExpression': return this.visitGroupExpression(node);
default:
throw new Error(`Unknown node type: ${node.type}`);
}
}
}
// Example: AST Printer Visitor
class ASTPrinter extends ASTVisitor {
constructor() {
super();
this.indentLevel = 0;
this.result = '';
}
indent() {
return ' '.repeat(this.indentLevel * 2);
}
visitProgram(node) {
this.result += `${this.indent()}Program\n`;
this.indentLevel++;
super.visitProgram(node);
this.indentLevel--;
}
visitBinaryExpression(node) {
this.result += `${this.indent()}BinaryExpression: ${node.operator}\n`;
this.indentLevel++;
super.visitBinaryExpression(node);
this.indentLevel--;
}
visitVariableDeclaration(node) {
this.result += `${this.indent()}VariableDeclaration: ${node.kind} ${node.name}\n`;
this.indentLevel++;
super.visitVariableDeclaration(node);
this.indentLevel--;
}
visitExpressionStatement(node) {
this.result += `${this.indent()}ExpressionStatement\n`;
this.indentLevel++;
super.visitExpressionStatement(node);
this.indentLevel--;
}
visitIdentifier(node) {
this.result += `${this.indent()}Identifier: ${node.name}\n`;
}
visitLiteral(node) {
this.result += `${this.indent()}Literal: ${node.value}\n`;
}
visitBlockStatement(node) {
this.result += `${this.indent()}BlockStatement\n`;
this.indentLevel++;
super.visitBlockStatement(node);
this.indentLevel--;
}
visitGroupExpression(node) {
this.result += `${this.indent()}GroupExpression\n`;
this.indentLevel++;
super.visitGroupExpression(node);
this.indentLevel--;
}
print(ast) {
this.visit(ast);
return this.result;
}
}
// Example usage
const sourceCode = `
let x = 10;
let y = (x + 5) * 2;
`;
const lexer = new Lexer(sourceCode); // Assuming Lexer is defined
const tokens = lexer.tokenize();
const parser = new Parser(tokens); // Assuming Parser is defined
const ast = parser.parse();
const printer = new ASTPrinter();
console.log(printer.print(ast));
Output:
Program
VariableDeclaration: let x
Literal: 10
VariableDeclaration: let y
BinaryExpression: *
GroupExpression
BinaryExpression: +
Identifier: x
Literal: 5
Literal: 2
Semantic Analysis with AST
A key use of ASTs is for semantic analysis, which checks for errors that syntax alone can't catch:
class SemanticAnalyzer extends ASTVisitor {
constructor() {
super();
this.symbolTable = new Map(); // Track variable declarations
this.errors = [];
}
visitVariableDeclaration(node) {
// Check for redeclaration
if (this.symbolTable.has(node.name)) {
this.errors.push(`Variable '${node.name}' is already declared`);
}
// Add to symbol table
this.symbolTable.set(node.name, {
kind: node.kind,
initialized: node.initializer !== null
});
super.visitVariableDeclaration(node);
}
visitIdentifier(node) {
// Check for undefined variables
if (!this.symbolTable.has(node.name)) {
this.errors.push(`Variable '${node.name}' is not defined`);
}
}
analyze(ast) {
this.visit(ast);
return this.errors;
}
}
// Usage:
const analyzer = new SemanticAnalyzer();
const semanticErrors = analyzer.analyze(ast);
if (semanticErrors.length > 0) {
console.error('Semantic errors found:');
semanticErrors.forEach(error => console.error(` - ${error}`));
} else {
console.log('No semantic errors found.');
}
AST Transformation
ASTs can be transformed for various purposes such as optimization:
class ConstantFoldingOptimizer extends ASTVisitor {
visitBinaryExpression(node) {
// First recursively optimize the children
super.visitBinaryExpression(node);
// If both operands are literals, fold the expression
if (node.left.type === 'Literal' && node.right.type === 'Literal') {
let result;
switch (node.operator) {
case '+': result = node.left.value + node.right.value; break;
case '-': result = node.left.value - node.right.value; break;
case '*': result = node.left.value * node.right.value; break;
case '/': result = node.left.value / node.right.value; break;
default: return node; // Can't fold
}
// Replace the binary expression with a literal
return {
type: 'Literal',
value: result
};
}
return node;
}
visit(node) {
const originalNodeType = node.type;
const optimizedNode = super.visit(node);
// Return the potentially transformed node
return optimizedNode || node;
}
optimize(ast) {
return this.visit(JSON.parse(JSON.stringify(ast))); // Deep clone the AST
}
}
AST Visualization
Common AST Operations
- Symbol Table Construction: Track variables, functions, and their scopes
- Type Checking: Ensure operations are performed on compatible types
- Dead Code Elimination: Remove unreachable or unused code
- Constant Propagation: Replace variables with their constant values
- Code Generation: Translate the AST into target code
Conclusion
Abstract Syntax Trees are a powerful representation of program structure that enables semantic analysis, optimization, and code generation. By using visitor patterns and other traversal techniques, we can manipulate and analyze our program's structure effectively.
In the next section, we'll look at how to generate code from our AST to complete the compilation process.