Structure of Our Compiler

So, now that we have a so-so idea of the language we are compiling, let's design our compiler and think about how we want to implement it.

A basic compiler consists of several pieces. These pieces can be very independent or very intertwined. I'll try to keep them independent here, so we can build and test each one separately.

The compiler will take an input source file and output some sort of executable. The approach I'll take is to generate pseudo-machine code and, at the end, write a simple interpreter to run it. This is similar to the Pascal P-Code machines and to the approach the Java Virtual Machine takes.

Let's break the compiler down into tasks:

Lexical Analysis

First, the compiler will read the input file and group the characters into tokens. With PCCTS, this job is done by DLG, the lexical analyzer generator that comes with PCCTS. We'll write a DLG specification describing our tokens, and DLG will generate the scanner from it.
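Since DLG generates the scanner for us, we won't write one by hand. To make the idea concrete, though, here is a minimal hand-written sketch of the same job for a hypothetical toy expression language (not XL). It groups characters into tokens, taking the longest run of digits or letters it can. The names `TokKind`, `Token`, and `tokenize` are made up for illustration; DLG's generated code looks quite different.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token kinds for a toy expression language (not XL).
enum class TokKind { Number, Ident, Plus, Star, LParen, RParen, End };

struct Token {
    TokKind kind;
    std::string text;  // the characters that were grouped together
};

// Group the characters of `src` into tokens, skipping whitespace.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t start = i;
        if (std::isdigit(c)) {
            // Longest match: keep consuming digits.
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({TokKind::Number, src.substr(start, i - start)});
        } else if (std::isalpha(c)) {
            // Identifiers start with a letter and continue with letters or digits.
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({TokKind::Ident, src.substr(start, i - start)});
        } else {
            TokKind k = TokKind::End;
            switch (c) {
                case '+': k = TokKind::Plus; break;
                case '*': k = TokKind::Star; break;
                case '(': k = TokKind::LParen; break;
                case ')': k = TokKind::RParen; break;
                default:
                    throw std::runtime_error("unexpected character: " + std::string(1, src[i]));
            }
            ++i;
            out.push_back({k, src.substr(start, 1)});
        }
    }
    out.push_back({TokKind::End, ""});  // end-of-input marker for the parser
    return out;
}
```

For example, `tokenize("x1 + 42*(y)")` yields eight tokens: `x1`, `+`, `42`, `*`, `(`, `y`, `)`, and the end marker.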

Parsing and Semantic Analysis

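Next, we write grammar rules to feed into ANTLR, which generates a parser from them. The parser checks that the tokens coming from the scanner form valid sentences of the language. These rules will have action code (C++ code) attached to them to specify what to do when the parser recognizes certain patterns of tokens in the input file.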
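ANTLR generates a recursive-descent parser, with roughly one function per grammar rule. The sketch below is a hand-written parser of that general shape for a hypothetical toy grammar (not XL's), assuming the input has already been split into token strings. The statements inside each rule stand in for action code; here they simply compute the value of the expression. The class and method names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hand-written recursive-descent parser for a hypothetical toy grammar.
// Each rule becomes one member function; the arithmetic inside the
// rules plays the role of the action code attached to a rule.
//
//   expr   : term ( "+" term )* ;
//   term   : factor ( "*" factor )* ;
//   factor : NUMBER | "(" expr ")" ;
class Parser {
public:
    explicit Parser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // Parse a complete expression and return its value.
    long parse() {
        long v = expr();
        if (pos_ != toks_.size()) throw std::runtime_error("trailing input");
        return v;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    // Look at the current token without consuming it ("" means end of input).
    const std::string& peek() const {
        static const std::string end;
        return pos_ < toks_.size() ? toks_[pos_] : end;
    }
    // Consume the expected token or report a syntax error.
    void match(const std::string& t) {
        if (peek() != t) throw std::runtime_error("expected '" + t + "'");
        ++pos_;
    }

    long expr() {
        long v = term();
        while (peek() == "+") { match("+"); v += term(); }  // action: add
        return v;
    }
    long term() {
        long v = factor();
        while (peek() == "*") { match("*"); v *= factor(); }  // action: multiply
        return v;
    }
    long factor() {
        if (peek() == "(") { match("("); long v = expr(); match(")"); return v; }
        const std::string& t = peek();
        if (t.empty() || t.find_first_not_of("0123456789") != std::string::npos)
            throw std::runtime_error("expected a number");
        ++pos_;
        return std::stol(t);  // action: convert the token text to a value
    }
};
```

With this parser, the tokens `2 + 3 * 4` evaluate to 14, because `term` handles the `*` before `expr` sees the `+`. That is how the grammar's structure, rather than extra code, encodes operator precedence.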

Tree Generation

This actually occurs along with the semantic analysis. We'll generate an Abstract Syntax Tree (AST) using PCCTS' built-in tree generation routines. This tree will act as the communication device between the parser and the code generator.
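To make the idea of an AST concrete, here is a minimal sketch of a tree node in a child-sibling layout: each node points to its first child and to its next sibling, so a node can have any number of children. The `AST`, `leaf`, `node`, and `toString` names are hypothetical and are not PCCTS's tree API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST node in a child-sibling layout: each node owns its
// first child (`down`) and its next sibling (`right`).
struct AST {
    std::string text;           // token text, e.g. "+" or "42"
    std::unique_ptr<AST> down;  // first child
    std::unique_ptr<AST> right; // next sibling

    explicit AST(std::string t) : text(std::move(t)) {}
};

// A leaf node, such as a number or identifier.
std::unique_ptr<AST> leaf(const std::string& t) { return std::make_unique<AST>(t); }

// An operator node with two children: the first child hangs off `down`,
// and the second child is the first child's `right` sibling.
std::unique_ptr<AST> node(const std::string& op, std::unique_ptr<AST> lhs, std::unique_ptr<AST> rhs) {
    assert(lhs && rhs && "operator needs two operands");
    auto n = std::make_unique<AST>(op);
    lhs->right = std::move(rhs);
    n->down = std::move(lhs);
    return n;
}

// Render the tree in a LISP-like prefix form: (op child1 child2 ...)
std::string toString(const AST& n) {
    if (!n.down) return n.text;
    std::string s = "(" + n.text;
    for (const AST* c = n.down.get(); c; c = c->right.get()) s += " " + toString(*c);
    return s + ")";
}
```

Building the tree for `2 + 3 * 4` with `node("+", leaf("2"), node("*", leaf("3"), leaf("4")))` and printing it gives `(+ 2 (* 3 4))`. The precedence the parser worked out is now explicit in the tree's shape, so the code generator doesn't have to rediscover it.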

Code Generation

Once we have a tree, we'll walk it and write out code. (SAS note: Should I use Sorcerer, the tree-parser generator that comes with PCCTS, or a hand-written tree walk, or both? TBD; I'll decide when I get to that point.)
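Whichever way the walk ends up being written, the core idea for stack-based pseudo-code is a post-order traversal. The sketch below assumes a hypothetical binary expression tree and a made-up stack-machine instruction set (`PUSH`, `ADD`, `MUL`); XL's actual pseudo-machine code may differ. Code for both operands is emitted before the operator, so the operands are already on the stack when the operator runs.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Minimal hypothetical expression tree: a leaf holds a number, an
// interior node holds an operator and two children.
struct Node {
    std::string text;
    std::unique_ptr<Node> left, right;
};

std::unique_ptr<Node> num(int v) {
    auto n = std::make_unique<Node>();
    n->text = std::to_string(v);
    return n;
}

std::unique_ptr<Node> op(const std::string& o, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->text = o;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Post-order walk: generate code for both operands, then the operator.
void gen(const Node& n, std::vector<std::string>& code) {
    if (!n.left) {  // leaf: push the constant
        code.push_back("PUSH " + n.text);
        return;
    }
    gen(*n.left, code);
    gen(*n.right, code);
    if (n.text == "+") code.push_back("ADD");
    else if (n.text == "*") code.push_back("MUL");
    else throw std::runtime_error("unknown operator: " + n.text);
}
```

For `2 + 3 * 4`, the walk emits `PUSH 2`, `PUSH 3`, `PUSH 4`, `MUL`, `ADD`.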


Interpreter

The interpreter for XL is really simple, and will let us test our compiled output without having to learn a real machine's instruction set.
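An interpreter for stack-based pseudo-code can be little more than a fetch-execute loop over an operand stack. The sketch below assumes a made-up instruction set (`Push`, `Add`, `Mul`, `Print`), not XL's real one, and collects printed values in a vector so they are easy to check.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical pseudo-machine instructions (not XL's real instruction set).
enum class Op { Push, Add, Mul, Print };

struct Instr {
    Op op;
    long arg = 0;  // only used by Push
};

// Run a program on an operand stack. Print pops the top value and
// appends it to `out`.
void run(const std::vector<Instr>& prog, std::vector<long>& out) {
    std::vector<long> stack;
    auto pop = [&stack]() {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        long v = stack.back();
        stack.pop_back();
        return v;
    };
    for (const Instr& in : prog) {  // fetch-execute loop
        switch (in.op) {
            case Op::Push: stack.push_back(in.arg); break;
            case Op::Add: { long b = pop(), a = pop(); stack.push_back(a + b); break; }
            case Op::Mul: { long b = pop(), a = pop(); stack.push_back(a * b); break; }
            case Op::Print: out.push_back(pop()); break;
        }
    }
}
```

Running `Push 2`, `Push 3`, `Push 4`, `Mul`, `Add`, `Print` appends 14 to `out`. A real interpreter would also need jumps and variable storage, but the loop keeps the same shape.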

Structure of this Tutorial

Now, how will we build this compiler? Should we try to do everything at once so you get good and confused? Of course not!

We'll break the work up into steps:

  1. Build a Recognizer
  2. Add a symbol table
  3. Add type checking
  4. Build an AST
  5. Write a Tree Walker to generate code
  6. Write an Interpreter
  7. Test the output code

Let's start our recognizer!