backend compiler
Writing a whole compiler from scratch is a lot of work. There's so many parts to it from SSA to instruction fitting to register allocation optimisation and constant folding and and and...
I don't really want to do all that. The simplest solution is to transpile the program in to something simple like C. However as we established in the last post we need to our own fancy things when calling functions and returning from them. It's not enough to let C do calls and returns. Not without introducing wild inefficiencies.
Instead we want to pick up some other kind of backend that is lower level than C but high enough level that it's doing SSA and all those optimisations for us. There's two decent contenders out there without going full on LLVM.
- QBE (http://c9x.me/compile/)
- Tilde which is part of Cuik (https://github.com/RealNeGate/Cuik)
QBE comes with an example called 'mini c' while Cuik comes with an example of Forth. They're both quite interesting. QBE is a delightful looking mess in the code base but what it does is absolute magic. Tilde is more traditionally written and probably the better choice.
But I have a soft spot for QBE. It looks like assembler rather than a stack machine. I'm going to indulge myself and pick QBE as the backend initially. That initial might remain so forever.
Why QBE or Tilde instead of LLVM? Well, LLVM is gigantic. Its compilation times are huge and its runtime is big too. I would want the compilation of an stz program to be as instant as possible. These smaller backends promise just that at the cost of some of the crunchier impressive optimisations that LLVM can do.
On to QBE... we are introduced to the basics of it on the homepage:
function w $add(w %a, w %b) { # Define a function add
@start
%c =w add %a, %b # Adds the 2 arguments
ret %c # Return the result
}
export function w $main() { # Main function
@start
%r =w call $add(w 1, w 1) # Call add(1, 1)
call $printf(l $fmt, ..., w %r) # Show the result
ret 0
}
data $fmt = { b "One and one make %d!\n", b 0 }(I'm going to start optimistically tagging code snippets as stz and qbe in case I ever get around to writing a Prismjs syntax highlighter for them)
For the first pass at compilation it makes sense to generate QBE intermediate code and pass that to the qbe compiler for the sake of debugging. The second pass would be to use QBE api to call it directly. It might therefore make sense to have pluggable backends for the stz compiler (stzc).
w here means 32-bit value, while l would mean 64-bit value. There's also s for 32-bit float and d for 64-bit float.
When dig in to the documentation we discover that we cannot jump anywhere but a block inside of a function. Now - we could compile the entire program as one big function but that'd likely cause extremely poor compilation results (assigning variables to registers and avoiding spilling becomes hard when you have a million things in one function).
There are two kinds of results from a closure - continue or return. Instead of passing continuation pointers like I had hoped we will instead need to unwind the stack. When the closure returns we check its special return value for continue (do nothing) or return (jump to the exit label of our function).
This is where things get a little interesting. There's multiple blocks potentially passed to us and we could be N-calls deep before we get back to the original closure in, say, the main method. Passing a closure around therefore has two parameters: the-closure, the-closure-return-state. The closure itself is two variables: the-captured-variables, the-function-pointer.
When we return from a function where we passed a closure we have to check the closure-return-state for that closure and see if it says return. If it does we jump to our @exit label.
Passing an extra parameter might sound expensive but on a 64-bit machine with stdcall you have ~7 parameters before you spill to the stack. QBE doesn't seem to let you specify calling conventions. It'd be awesome if we could make a custom one for stz versus calling foreign functions. But having everything use the platform calling convention means we pick up platform debugger tools for free. That's a good thing.
Let's look at an example that uses a closure. Let's call then:
[ main | () -> int |
true then: [^100]
^200
]When we translate that, by hand, in to QBE we should get something like this
// ^100
type :main_1_return = { l, l }
export function :main_1_return $main_1() {
ret { 100, 1 }
}
export function l $main {
@start
// true then: [^100]
%r =l call $core_boolean_then(l 1, l )
%r_offs =l add %r, 8
%closure_return =l loadl %r
%closure_result =l loadl %r_offs
jnz %closure_return, @exit, @closure_continue
@closure_continue
// ^200
ret 200
@closure_exit
// return closure result
ret %closure_result
}* I have not tested this at all...
Allocations are for the stack in QBE. There's probably all kinds of surprises in here when it comes to trying to implement these things. But given we have to return the closure state that does mean we could implement the same thing in C.
A C backend would allow us to pick up the best state of the art compilation if we wanted it. The least amount of effort to get to a working language is the most desirable. One of the biggest impediments to a working compiler though is that the compiler needs to be an interpreter.
Now that we've determined we will need multiple backends and unless we get real low level we won't be doing continuations we can think about the frontend and what is involved in interpreting stz...