The Lobster WASM code generator implementation

NOTE: this document contains detail on how code generation for the WebAssembly backend works internally, if you just want to USE this backend, please read the sections on WebAssembly here.

This document is also meant to be useful for implementors of other languages that may wish to do something similar (and could re-use Lobster's WASM binary writer ).

Structure of the backend

Lobster is structured such that the front-end outputs Lobster bytecode, which sits in a FlatBuffer file together with all its metadata. That file is stand-alone (does not need any further information from the front-end or source code) and can be interpreted by the VM directly, or passed to any of the native code backends, currently C++ or WebAssembly.

The backend is composed of these 2 files:

Design of the binary writer

Most languages, like Lobster, come with a runtime in C or C++ that you wouldn't want to translate to wasm by hand (or worse, emit instruction by instruction from a backend), so it is important that generated wasm code can be linked correctly against a runtime compiled to object files by the C/C++ compiler. In this case we're piggy-backing on Clang, LLD and Binaryen as used in the Emscripten toolchain.

This means generating a wasm module with linking information, and supplying the right relocations. Many instructions in wasm refer to functions and other things by index, but the linker has to merge many such index spaces into the final module, which means renumbering all of these indices. Luckily, the binary writer takes care of all this automatically, and allows us the generate code as-if we're the only object file.

Streaming interface.

The binary writer is a "streaming" API (as opposed to an object API), meaning that it immediately writes out the wasm module data with no or few intermediary data structures. This has the advantage that it is low in memory usage and is very fast, with the downside being that most things have to be supplied to it in order. This should not be very difficult for most code generators, however.

A wasm module should contain sections in order, and the API enforces this same order:

Type     1    Function signature declarations
Import   2    Import declarations
Function 3    Function declarations
Table    4    Indirect function table and other tables
Memory   5    Memory attributes
Global   6    Global declarations
Export   7    Exports
Start    8    Start function declaration
Element  9    Elements section
Code    10    Function bodies (code)
Data    11    Data segments

What is not shown here is custom sections, which may appear before and after each of these sections. One such custom section is the linking section, which appears at the end of all of the above. Its contents is generated automatically by the binary writer as a consequence of all the things emitted in the above section (primarily the Code section).

Simple example.

A minimal usage of the API, where we generate code for "hello world", assuming the printing function is in the runtime.

Note that this assumes a minimum familiarity with wasm in general. I am not going to explain the "why" of this design, if you end up using this API then reading up on the wasm module format would be useful.

vector<uint8_t> bytes;
WASM::BinaryWriter bw(bytes);

// Write a (function) type section, to be referred to by functions below.
// For any of these sections, if you write them out of order, or don't match
// begin/end, you'll get an assert.
// As with everything, to refer to things in wasm, use a 0 based index.
// A list of arguments followed by a list of return values.
// You don't have to use the return value of AddType, but it may make referring
// to these types easier.
auto type_ii_i = bw.AddType({ WASM::I32, WASM::I32 }, { WASM::I32 });  // 0
auto type_i_v = bw.AddType({ WASM::I32 }, {});  // 1

// Import some functions, from the runtime compiled in other object files.
// For our example that will just be the printing function.
// Note: we assume this function has been declared with: extern "C"
// You can link against C++ functions as well if you don't mind dealing
// with name mangling.
auto import_print = bw.AddImportLinkFunction("print", type_i_v);  // 0

// Declare all the functions we will generate. Note this is just the type,
// the body of the code will follow below.
bw.AddFunction(type_ii_i);  // main()

// Declare the linear memory we want to use, with 1 initial page.

// Here we'd normally declare a "Start" section, but the linker will
// take care for that for us.

// Now the exciting part: emitting function bodies.

// A list of 0 local types,
bw.AddCode({}, "main", false);
// Refer to data segment 0 at offset 0 below. This emits an i32.const
// instruction, whose immediate value will get relocated to refer to the
// data correctly.
bw.EmitI32ConstDataRef(0, 0);
bw.EmitI32Const(0);  // Return value.

// Here, call AddCode..EmitEndFunction for more functions.


// Add all our static data.
// This is our first segment, we referred to this above as 0.
auto hello = "Hello, World\n\0"sv;
// Data, name, and alignment.
bw.AddData(hello, "hello", 0);

// This call does all the remaining work of generating the linking
// information, and wrapping up the file.

// Here, we can write the contents of "bytes" above to a file.
// Then, using emcc to link print.c and generated.o should actually
// produce a valid module!

The binary writer API contains functionality for many more instructions and sections, best to have a browse through wasm_binary_writer.h and wasm_binary_writer_test.h which are both fairly small, so should give a good impression what else you can generate.

The Lobster generator.

A more complex example using the binary writer is the Lobster generator in towasm.cpp, which follows the same pattern as the above simple example.

In terms of imports, it imports one function for each of Lobsters bytecodes, which means a Lobster bytecode can be directly mapped to a function id. Then there are a couple of additional functions in the runtime it needs, most importantly EngineRunCompiledCodeMain which it calls with the starting compiled function, such that runtime can initialize itself and then call back into the Lobster compiled code. This structure was chosen such that the maximum amount of code can live in the runtime, and the minimum needs to be explicitly generated.

It generates Lobster code not on a per function basis, but per basic block. This is because Lobster has more complicated control flow features (such as non-local returns and co-routines) which (at the time of writing) wasm can't support natively. So it uses a "trampoline" system where each basic block returns the function address of the next basic block it wants to go to. Some of these transitions can be optimized (and the generator does use tail calls in many cases), but in the most general case this has to be an indirect jump.

The actual instructions generated are pretty simple, with it mostly emitting a call per VM instruction, with the exception of control flow operations which have more special purpose implementation.

You would expect it to directly emit wasm instructions like i32.add for simple Lobster bytecode instructions like IL_IADD, but it doesn't, for various reasons. First, we rely on Binaryen to do whole program optimisation (similar to LTO) after linking (since the we're calling the implementation of bytecode instruction that sits in a different object file). This means that if the implementation of a bytecode instruction is trivial, it will get inlined, saving us the trouble of doing so. The second reason is more complicated: Lobster currently has all of its variables on its own stack (where it can do custom management of it, required for e.g. its memory management, non local return features). Instructions like i32.add instead expects values on the wasm stack. So maximum benefit of emitting such instructions directly would only be achieved if (a subset of) Lobster variables could be moved into wasm locals first. That is a more extensive optimization that may be added in the future.

Speed of code produced by this backend is almost competitive with code that instead goes through the C++ backend (which is quite a feat, as the C++ code goes through the entire LLVM optimisation pipeline, and the WASM backend does not). One area where it can seriously slow down is exceptions, which are currently not natively implemented in WASM (they go through an indirection in JS) and the Lobster runtime relies on them for error handling, and even frame termination. This problem will be solved when WASM gains native exceptions, or when the runtime is able to execute without exceptions (which would cause some overhead of its own).