Skip to main content

Abstract Syntax Trees: Representing and Traversing Program Structure

After parsing the source code, compilers use Abstract Syntax Trees (ASTs) to represent the hierarchical structure of a program. ASTs are the foundation for most compiler operations like semantic analysis, optimization, and code generation.

What is an Abstract Syntax Tree?

An Abstract Syntax Tree (AST) is a tree representation of the abstract syntactic structure of source code. Unlike parse trees, ASTs omit syntactic details like parentheses and focus on the essential structure and meaning of the code.

AST Node Types

Common AST node types include:

  • Program: The root node representing the entire program
  • Statements: Declarations, assignments, conditionals, loops
  • Expressions: Binary operations, literals, identifiers, function calls
  • Literals: Numbers, strings, booleans
  • Identifiers: Variable and function names

Example AST Structure

For the expression (a + b) * c, the AST would look like:

AST Traversal

There are several ways to traverse an AST:

  1. Depth-First Traversal: Visit all children before siblings
  2. Breadth-First Traversal: Visit all nodes at the same level before going deeper
  3. Visitor Pattern: Define separate visitor objects for each node type

Implementing AST Traversal in JavaScript

Let's implement a visitor pattern to traverse our AST:

class ASTVisitor {
visitProgram(node) {
for (const statement of node.body) {
this.visit(statement);
}
}

visitBinaryExpression(node) {
this.visit(node.left);
this.visit(node.right);
}

visitVariableDeclaration(node) {
if (node.initializer) {
this.visit(node.initializer);
}
}

visitExpressionStatement(node) {
this.visit(node.expression);
}

visitIdentifier(node) {
// Leaf node, nothing to traverse
}

visitLiteral(node) {
// Leaf node, nothing to traverse
}

visitBlockStatement(node) {
for (const statement of node.body) {
this.visit(statement);
}
}

visitGroupExpression(node) {
this.visit(node.expression);
}

visit(node) {
switch (node.type) {
case 'Program': return this.visitProgram(node);
case 'BinaryExpression': return this.visitBinaryExpression(node);
case 'VariableDeclaration': return this.visitVariableDeclaration(node);
case 'ExpressionStatement': return this.visitExpressionStatement(node);
case 'Identifier': return this.visitIdentifier(node);
case 'Literal': return this.visitLiteral(node);
case 'BlockStatement': return this.visitBlockStatement(node);
case 'GroupExpression': return this.visitGroupExpression(node);
default:
throw new Error(`Unknown node type: ${node.type}`);
}
}
}

// Example: AST Printer Visitor
class ASTPrinter extends ASTVisitor {
constructor() {
super();
this.indentLevel = 0;
this.result = '';
}

indent() {
return ' '.repeat(this.indentLevel * 2);
}

visitProgram(node) {
this.result += `${this.indent()}Program\n`;
this.indentLevel++;
super.visitProgram(node);
this.indentLevel--;
}

visitBinaryExpression(node) {
this.result += `${this.indent()}BinaryExpression: ${node.operator}\n`;
this.indentLevel++;
super.visitBinaryExpression(node);
this.indentLevel--;
}

visitVariableDeclaration(node) {
this.result += `${this.indent()}VariableDeclaration: ${node.kind} ${node.name}\n`;
this.indentLevel++;
super.visitVariableDeclaration(node);
this.indentLevel--;
}

visitExpressionStatement(node) {
this.result += `${this.indent()}ExpressionStatement\n`;
this.indentLevel++;
super.visitExpressionStatement(node);
this.indentLevel--;
}

visitIdentifier(node) {
this.result += `${this.indent()}Identifier: ${node.name}\n`;
}

visitLiteral(node) {
this.result += `${this.indent()}Literal: ${node.value}\n`;
}

visitBlockStatement(node) {
this.result += `${this.indent()}BlockStatement\n`;
this.indentLevel++;
super.visitBlockStatement(node);
this.indentLevel--;
}

visitGroupExpression(node) {
this.result += `${this.indent()}GroupExpression\n`;
this.indentLevel++;
super.visitGroupExpression(node);
this.indentLevel--;
}

print(ast) {
this.visit(ast);
return this.result;
}
}

// Example usage
const sourceCode = `
let x = 10;
let y = (x + 5) * 2;
`;

const lexer = new Lexer(sourceCode); // Assuming Lexer is defined
const tokens = lexer.tokenize();
const parser = new Parser(tokens); // Assuming Parser is defined
const ast = parser.parse();

const printer = new ASTPrinter();
console.log(printer.print(ast));

Output:

Program
VariableDeclaration: let x
Literal: 10
VariableDeclaration: let y
BinaryExpression: *
GroupExpression
BinaryExpression: +
Identifier: x
Literal: 5
Literal: 2

Semantic Analysis with AST

A key use of ASTs is for semantic analysis, which checks for errors that syntax alone can't catch:

class SemanticAnalyzer extends ASTVisitor {
constructor() {
super();
this.symbolTable = new Map(); // Track variable declarations
this.errors = [];
}

visitVariableDeclaration(node) {
// Check for redeclaration
if (this.symbolTable.has(node.name)) {
this.errors.push(`Variable '${node.name}' is already declared`);
}

// Add to symbol table
this.symbolTable.set(node.name, {
kind: node.kind,
initialized: node.initializer !== null
});

super.visitVariableDeclaration(node);
}

visitIdentifier(node) {
// Check for undefined variables
if (!this.symbolTable.has(node.name)) {
this.errors.push(`Variable '${node.name}' is not defined`);
}
}

analyze(ast) {
this.visit(ast);
return this.errors;
}
}

// Usage:
const analyzer = new SemanticAnalyzer();
const semanticErrors = analyzer.analyze(ast);

if (semanticErrors.length > 0) {
console.error('Semantic errors found:');
semanticErrors.forEach(error => console.error(` - ${error}`));
} else {
console.log('No semantic errors found.');
}

AST Transformation

ASTs can be transformed for various purposes such as optimization:

class ConstantFoldingOptimizer extends ASTVisitor {
visitBinaryExpression(node) {
// First recursively optimize the children
super.visitBinaryExpression(node);

// If both operands are literals, fold the expression
if (node.left.type === 'Literal' && node.right.type === 'Literal') {
let result;
switch (node.operator) {
case '+': result = node.left.value + node.right.value; break;
case '-': result = node.left.value - node.right.value; break;
case '*': result = node.left.value * node.right.value; break;
case '/': result = node.left.value / node.right.value; break;
default: return node; // Can't fold
}

// Replace the binary expression with a literal
return {
type: 'Literal',
value: result
};
}

return node;
}

visit(node) {
const originalNodeType = node.type;
const optimizedNode = super.visit(node);

// Return the potentially transformed node
return optimizedNode || node;
}

optimize(ast) {
return this.visit(JSON.parse(JSON.stringify(ast))); // Deep clone the AST
}
}

AST Visualization

Common AST Operations

  1. Symbol Table Construction: Track variables, functions, and their scopes
  2. Type Checking: Ensure operations are performed on compatible types
  3. Dead Code Elimination: Remove unreachable or unused code
  4. Constant Propagation: Replace variables with their constant values
  5. Code Generation: Translate the AST into target code

Conclusion

Abstract Syntax Trees are a powerful representation of program structure that enables semantic analysis, optimization, and code generation. By using visitor patterns and other traversal techniques, we can manipulate and analyze our program's structure effectively.

In the next section, we'll look at how to generate code from our AST to complete the compilation process.