Introduction to WebAssembly and Emscripten

What is WebAssembly?

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. [source]

Yeah, but what is it really?

Wasm means high-level languages like C, C++ or Rust running in the browser and interoperating nicely with JavaScript, at near-native speed!
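To make “a binary instruction format for a stack-based virtual machine” a bit more concrete, here is a minimal sketch that needs no Emscripten at all: a tiny module, hand-assembled byte by byte for illustration, that exports one add function and is loaded with the standard WebAssembly JavaScript API (it runs in Node.js or a browser). The function body pushes both parameters onto the stack with local.get, and i32.add pops them and pushes their sum.

```javascript
// A hand-assembled Wasm module exporting add(a, b) -> a + b (both i32).
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  // Type section: one function type (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // Function section: one function using type 0
  0x03, 0x02, 0x01, 0x00,
  // Export section: export function 0 under the name "add"
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // Code section: no locals; local.get 0, local.get 1, i32.add, end
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

// Compile and instantiate the bytes, then hand back the exported function.
async function loadAdd() {
  const { instance } = await WebAssembly.instantiate(wasmBytes);
  return instance.exports.add;
}

loadAdd().then((add) => console.log(add(2, 3))); // prints 5
```

The JavaScript wrapper generated by Emscripten does the same kind of loading for you, just with a much bigger binary and a lot of runtime support code around it.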

Why is that interesting?

You might say JavaScript is already super fast, so why do we need something even faster? No matter how fast it gets, in many areas JavaScript will not match the performance of a compiled C program. Think about computation-heavy work like multimedia or cryptography. Or blockchain!

On the one hand, this means you can hyper-optimize the heavy part of a JavaScript application with Wasm and leave the rest untouched. On the other hand, you can take an existing high-performance C library off the shelf, compile it to WebAssembly and use it without even knowing how to write C code.

First steps

To start, we need something to compile to Wasm and something to compile it with. The C/C++ to Wasm compiler toolchain is called Emscripten and can be downloaded from here. Follow the instructions on the same page to set it up.

Let’s also find something to compile! I am working on a multimedia project that uses ffmpeg.js (also compiled with Emscripten) to encode audio data into AAC format, so I wondered how easy it would be to port the standalone AAC encoder of the FDK AAC project to the browser. I will use it as the example for the rest of this post.

Any C/C++ project will do, as long as it uses the standard ./configure; make; make install toolchain, like (probably) 99% of the programs written for UNIX/Linux. If you have never used these tools, here is a brief explanation.

Compiling the project to Wasm

Once Emscripten is set up, we can run the compiler on the C/C++ project with the following commands:

# Not every project has autogen.sh, FDK AAC does
./autogen.sh
# Note that we run emconfigure ./configure not just ./configure
emconfigure ./configure
# Depending on the project size, this might take a while:
emmake make

These steps might differ slightly from project to project, so it is best to consult the project documentation for compilation instructions and then replace ./configure with emconfigure ./configure and make with emmake make.
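The substitution rule is mechanical enough to write down as a tiny Node.js helper. This is a hypothetical convenience for a build script (emscriptenWrap is not part of Emscripten), and it only handles the two substitutions described above:

```javascript
// Hypothetical helper: apply the emconfigure/emmake substitution rule to a
// native build command given as an argv array.
function emscriptenWrap(argv) {
  const [cmd, ...rest] = argv;
  const name = cmd.split('/').pop(); // "./configure" -> "configure"
  if (name === 'configure') return ['emconfigure', cmd, ...rest];
  if (name === 'make') return ['emmake', cmd, ...rest];
  return argv; // any other command runs unchanged
}

console.log(emscriptenWrap(['./configure']).join(' ')); // prints "emconfigure ./configure"
console.log(emscriptenWrap(['make', '-j4']).join(' ')); // prints "emmake make -j4"
```

In a Node.js build script, the resulting array could then be run with child_process.spawnSync(args[0], args.slice(1), { stdio: 'inherit' }).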

If the last command succeeds, you should have compiled LLVM bitcode (a kind of intermediate format before the final Wasm binary) somewhere. Figuring out exactly where is a bit tricky, as it depends on the project setup. Usually the last lines printed by the emmake make command give a good hint. In my case, the result is at .libs/aac-enc. You can make sure you are looking at the right format by running file on the output file, as in my case:

$ file .libs/aac-enc
.libs/aac-enc: LLVM IR bitcode
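A rough Node.js equivalent of that file check looks at the first four bytes: raw LLVM bitcode starts with “BC” followed by 0xC0 0xDE, and the bitcode wrapper format (used on Darwin) starts with the magic 0x0B17C0DE stored little-endian. A minimal sketch; looksLikeLlvmBitcode is a hypothetical name, and the check is only a heuristic on the header, not a full parse:

```javascript
const fs = require('fs');

// True if the bytes start with the raw LLVM bitcode magic ("BC" 0xC0 0xDE)
// or with the bitcode wrapper magic 0x0B17C0DE stored little-endian.
function looksLikeLlvmBitcode(bytes) {
  if (bytes.length < 4) return false;
  const raw = bytes[0] === 0x42 && bytes[1] === 0x43 && bytes[2] === 0xc0 && bytes[3] === 0xde;
  const wrapper = bytes[0] === 0xde && bytes[1] === 0xc0 && bytes[2] === 0x17 && bytes[3] === 0x0b;
  return raw || wrapper;
}

// Usage: node check-bitcode.js .libs/aac-enc
const path = process.argv[2];
if (path) {
  const ok = looksLikeLlvmBitcode(fs.readFileSync(path));
  console.log(`${path}: ${ok ? 'looks like LLVM bitcode' : 'not LLVM bitcode'}`);
}
```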

The last step is producing a Wasm binary and a JavaScript wrapper from the LLVM bitcode. It usually goes like this:

emcc project.bc -o project.js

Here project.bc is the bitcode file produced in the previous step. If it does not have a .bc extension, just rename it; otherwise emcc will complain.

If emcc complains about “undefined symbols”, you probably have to include the shared dependencies of the “main” binary on the command line. This step requires familiarity with the build tools and/or the project, or some fiddling and intuition. If it’s not obvious what the dependencies are, look for more files of the “LLVM IR bitcode” type near the root folder of the project. In the case of the FDK AAC project, the complete command is:

emcc .libs/aac-enc.bc .libs/libfdk-aac.2.dylib -o aac-enc.js

This should result in a JavaScript file and a Wasm file, ready to be executed from Node.js or a web browser. For example, when run without arguments, the AAC encoder prints its usage:

$ node aac-enc.js
aac-enc.js [-r bitrate] [-t aot] [-a afterburner] [-s sbr] [-v vbr] in.wav out.aac
Supported AOTs:
  2	AAC-LC
  5	HE-AAC
  29	HE-AAC v2
  23	AAC-LD
  39	AAC-ELD
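To double-check that emcc really produced a Wasm binary next to the JavaScript file, the header can be inspected and the whole module validated with the standard WebAssembly.validate function. A sketch for Node.js; the file name aac-enc.wasm assumes emcc's default of naming the Wasm output after the -o target, and hasWasmHeader and checkWasmFile are hypothetical names:

```javascript
const fs = require('fs');

// Cheap header check: "\0asm" magic followed by version 1 (little-endian u32).
function hasWasmHeader(bytes) {
  const expected = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00];
  return bytes.length >= 8 && expected.every((b, i) => bytes[i] === b);
}

// Full check: WebAssembly.validate decodes and type-checks the entire module.
function checkWasmFile(path) {
  const bytes = fs.readFileSync(path);
  return hasWasmHeader(bytes) && WebAssembly.validate(bytes);
}

if (fs.existsSync('aac-enc.wasm')) {
  console.log('aac-enc.wasm valid:', checkWasmFile('aac-enc.wasm'));
}
```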

If you load aac-enc.js in a browser using a <script> tag (or build an HTML page instead by passing -o aac-enc.html to emcc and open that), you get the same output logged to the console. Cool, but not very useful.

Making use of the compiled code

Encoding WAV files with the compiled Wasm code requires a little more work: we need to pass in files and arguments to the executable.

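First we need to recompile the module with the File System API exposed:

emcc .libs/aac-enc.bc .libs/libfdk-aac.2.dylib -o aac-enc.js \
  -s "EXTRA_EXPORTED_RUNTIME_METHODS=['FS']"

The quotes keep the shell from interpreting the brackets. Newer Emscripten releases deprecate EXTRA_EXPORTED_RUNTIME_METHODS in favor of EXPORTED_RUNTIME_METHODS, which takes the same list.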

With the recompiled module at hand, we have to write a bit of glue code. Interacting with the generated code is possible via the exposed Module object.

Add this before loading aac-enc.js, both to keep the encoder from running once with no arguments at startup and to actually expose the module:

var Module = {
  noInitialRun: true
}

Note: the generated JavaScript file is neither a CommonJS module nor an AMD or ES6 one. It relies on a global Module object that can optionally be created beforehand to pass in configuration options. It is possible to emit a UMD module instead using the -s MODULARIZE=1 flag of the emcc command, or -s EXPORT_ES6=1 (together with MODULARIZE) for ES6 modules. See more in settings.js.
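With -s MODULARIZE=1 the generated file exports a factory function instead of relying on a global, and in recent Emscripten versions calling that factory with the configuration object returns a Promise of the module. A minimal sketch, assuming such a build; createEncoderModule is a hypothetical name, and the factory is passed in so the same code works whether it was obtained with require() in Node.js or from a script global in the browser:

```javascript
// Hypothetical wrapper around the factory emitted by emcc -s MODULARIZE=1.
async function createEncoderModule(factory) {
  // Configuration goes into the factory call instead of a pre-declared global.
  const Module = await factory({ noInitialRun: true });
  // Fail early if the build did not export the file system API.
  if (!Module.FS) {
    throw new Error('Module.FS is missing: rebuild with FS in the exported runtime methods');
  }
  return Module;
}

// Node.js usage, assuming a CommonJS build of aac-enc.js:
// const factory = require('./aac-enc.js');
// createEncoderModule(factory).then((Module) => Module.callMain(['input.wav', 'output.aac']));
```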

Next we will have to set up an input file and copy it to the “virtual” file system of the module:

Module.FS.writeFile('input.wav', new Uint8Array(inputBuffer))

The second argument to writeFile is the content of the file in the form of a TypedArray. Obtaining inputBuffer depends on the environment and where the input comes from. Examples:

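// Note: await works inside an async function or an ES module.

// If you have a File instance, e.g. from a form input:
inputBuffer = await new Response(inputFile).arrayBuffer()

// If you have a URL (await the Promise, otherwise you get a Promise, not a buffer):
inputBuffer = await fetch(inputUrl).then((res) => res.arrayBuffer())

// In Node.js with a file path:
const fs = require('fs')
inputBuffer = fs.readFileSync(inputPath)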

Now we can invoke the main function of the compiled application:

Module.callMain(['input.wav', 'output.aac'])
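Instead of hard-coding the arguments, they can be built from an options object. The flag letters below come from the usage output of the encoder shown earlier; buildEncoderArgs and the option names are hypothetical, and the sketch only formats values without checking whether the encoder accepts them:

```javascript
// Map of hypothetical option names to the flags listed in the encoder's usage line.
const FLAGS = { bitrate: '-r', aot: '-t', afterburner: '-a', sbr: '-s', vbr: '-v' };

// Turn an options object into the argv array passed to Module.callMain.
function buildEncoderArgs(options, input, output) {
  const args = [];
  for (const [name, flag] of Object.entries(FLAGS)) {
    if (options[name] !== undefined) args.push(flag, String(options[name]));
  }
  return [...args, input, output]; // positional file names come last
}

console.log(buildEncoderArgs({ bitrate: 128000, aot: 2 }, 'input.wav', 'output.aac').join(' '));
// prints "-r 128000 -t 2 input.wav output.aac"
```

The result can be handed straight to Module.callMain(...).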

And the last step is to read the output back from the file system:

// Read the virtual file into a TypedArray
const output = Module.FS.readFile('output.aac')
// Wrap the TypedArray in a File (a browser API) for further use
const outputFile = new File([output], 'output.aac', {type: 'audio/aac'})

The outputFile can then be turned into a blob URL with URL.createObjectURL, which can be downloaded from the browser or used directly as the src of an <audio> element. In Node.js, the output TypedArray can be passed directly to fs.writeFile or fs.writeFileSync.

The above code is available as a working example here.
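Putting these steps together, a reusable function might look like the sketch below. encodeWav is a hypothetical name; the sketch assumes the module was built with FS exported and loaded with noInitialRun: true, and that the encoder's main can be called more than once, which holds only if the C code does not depend on global state being reset between runs. The finally block removes both virtual files so repeated calls do not pile up memory.

```javascript
// Hypothetical glue: run the encoder once on an in-memory WAV file and
// return the encoded bytes as a Uint8Array.
function encodeWav(Module, inputBytes, args = []) {
  const inPath = 'input.wav';
  const outPath = 'output.aac';
  Module.FS.writeFile(inPath, new Uint8Array(inputBytes));
  try {
    Module.callMain([...args, inPath, outPath]);
    // readFile throws if the encoder did not create the output file;
    // an empty output file is treated as a failure as well.
    const output = Module.FS.readFile(outPath);
    if (output.length === 0) throw new Error('encoder produced an empty file');
    return output;
  } finally {
    // Clean up the virtual file system; a file may not exist after a failure.
    for (const p of [inPath, outPath]) {
      try {
        Module.FS.unlink(p);
      } catch (e) {
        // ignore missing files
      }
    }
  }
}
```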

Where to continue from here?

The simple example above is only the beginning. In real life you might want to turn it into something reusable, which needs somewhat more elaborate glue code to account for the initialization phase of the Wasm module and to cover error scenarios. I started turning the FDK AAC encoder example into a more elaborate library; the work-in-progress state is available here.
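One way to handle the initialization phase and errors is to wrap the module's lifecycle hooks in a Promise. This is a sketch, not the library mentioned above: it assumes the classic (non-MODULARIZE) build, where the generated script reuses a global Module object declared beforehand, calls its onRuntimeInitialized hook once the Wasm binary is ready, and calls onAbort if the runtime aborts. prepareModule is a hypothetical name.

```javascript
// Sketch for the classic (non-MODULARIZE) build: the generated script picks up
// a global Module object declared beforehand and calls its hooks.
function prepareModule() {
  let resolveReady;
  let rejectReady;
  const ready = new Promise((resolve, reject) => {
    resolveReady = resolve;
    rejectReady = reject;
  });
  const config = {
    noInitialRun: true,
    // Called by the runtime once the Wasm binary is compiled and instantiated.
    onRuntimeInitialized() {
      resolveReady(config);
    },
    // Called if the runtime aborts.
    onAbort(what) {
      rejectReady(new Error(`Wasm module aborted: ${what}`));
    },
  };
  return { config, ready };
}

// Browser usage: set the global before the generated script runs.
// const { config, ready } = prepareModule();
// window.Module = config;
// const script = document.createElement('script');
// script.src = 'aac-enc.js';
// document.head.appendChild(script);
// const Module = await ready; // Module.FS and Module.callMain are usable now
```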

Some codebases you might want to port do more than read command line arguments and files, e.g. they use threads, graphics or audio, which need special treatment. The Emscripten documentation has an entire chapter on porting.

Even in our arguments-and-files use case, we could do better. The underlying library exposes C functions to encode AAC in a buffered way, without using files directly. This has the advantage of more fine-grained control over execution and potentially lower memory use, but the glue code will be more complicated. It also requires some knowledge of C programming. A good start is the Calling compiled C functions from JavaScript section in the Emscripten docs.
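As a rough idea of what that glue involves, the sketch below copies a JavaScript byte array into the module's linear memory and passes a pointer to a C function through cwrap. The C function encode_chunk is hypothetical (FDK AAC's real encoder API is more involved), and the build is assumed to export _malloc, _free and _encode_chunk, plus the cwrap runtime method (and HEAPU8 in newer Emscripten versions). Module.HEAPU8 is read after _malloc because growing memory can replace the underlying buffer.

```javascript
// Sketch of the buffered approach with a HYPOTHETICAL exported C function:
//   int encode_chunk(const uint8_t *data, int length);
function makeChunkEncoder(Module) {
  // cwrap returns a JS function that calls the C symbol with numeric arguments.
  const encodeChunk = Module.cwrap('encode_chunk', 'number', ['number', 'number']);

  return function encode(bytes) {
    // Reserve space in the module's linear memory for the input bytes.
    const ptr = Module._malloc(bytes.length);
    try {
      // Copy the bytes in; HEAPU8 is looked up after _malloc on purpose.
      Module.HEAPU8.set(bytes, ptr);
      // Pass the pointer and length to C and return its integer result.
      return encodeChunk(ptr, bytes.length);
    } finally {
      // In this sketch, C does not keep the buffer after the call.
      Module._free(ptr);
    }
  };
}
```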

It is also important to note that the Emscripten commands above generate non-optimized code (emcc defaults to -O0). Make sure you read and follow the Optimization section before using Emscripten and Wasm in production.

Further reading