Introduction to WebAssembly and Emscripten
What is WebAssembly?
WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. [source]
Yeah, but what is it really?
Wasm means high-level languages like C, C++ or Rust running in the browser and interoperating nicely with JavaScript. At near-native speed!
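To make the "binary instruction format" part concrete, here is a minimal sketch that hand-assembles a tiny Wasm module exporting a single add function and calls it from JavaScript. The bytes are written out by hand purely for illustration; in practice a compiler such as Emscripten produces them, and the variable names are my own.

```javascript
// A complete Wasm module, byte by byte. It exports one function, add(a, b),
// that returns the sum of two 32-bit integers.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number: "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  // Type section: one signature, (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // Function section: one function that uses signature 0
  0x03, 0x02, 0x01, 0x00,
  // Export section: export function 0 under the name "add"
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // Code section: local.get 0, local.get 1, i32.add, end
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

// Synchronous compilation is fine for a module this small; larger modules
// should go through the asynchronous WebAssembly.instantiate instead.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});
const add = wasmInstance.exports.add;

console.log(add(2, 3)); // 5
```

The two local.get instructions push the arguments onto the stack and i32.add pops them and pushes the result, which is what "stack-based virtual machine" means in practice.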
Why is that interesting?
You might say JavaScript is already super fast, so why do we need something even faster? No matter how fast it is, JavaScript will never beat the performance of a compiled C program in many areas. Think about computation-heavy stuff like multimedia or cryptography. Or blockchain!
On one hand, that means you can hyper-optimize the heavy part of a JavaScript application using Wasm and keep the rest untouched. On the other hand, you can take an existing high-performance C library off the shelf, compile it to WebAssembly and use it without even knowing how to write C code.
First steps
To start we will need something to compile to Wasm and something to compile with. The C/C++ to Wasm compiler toolkit is called Emscripten and can be downloaded from here. Follow the instructions on the same page to set it up.
Let’s also find something to compile! Because I am working on a multimedia project that uses ffmpeg.js (also compiled with Emscripten) for encoding audio data into AAC format, I was wondering how easy it would be to port a standalone AAC encoder, FDK AAC, to the browser. I will use it as an example for the rest of this post.
Any C/C++ project will do as long as it uses the standard ./configure; make; make install toolchain like (probably) 99% of the programs written for UNIX/Linux. If you have never used these tools, here is a brief explanation.
Compiling the project to Wasm
Once Emscripten is set up we can run the compiler on the C/C++ project with the following commands:
# Not every project has autogen.sh, FDK AAC does
./autogen.sh
# Note that we run emconfigure ./configure not just ./configure
emconfigure ./configure
# Depending on the project size, this might take a while:
emmake make
These steps might be slightly different for a given project. It is recommended to consult the project documentation for compilation instructions and then substitute ./configure with emconfigure ./configure and make with emmake make.
If the last command succeeds you should have compiled LLVM bitcode (sort of an intermediate format before the final Wasm binary) somewhere. Figuring out exactly where it is can be a bit tricky as it depends on the project setup. Usually the last lines printed by the emmake make command give a good hint. In my case, the result is at .libs/aac-enc. You can make sure you are looking at the right format by running file on the output, like in my case:
$ file .libs/aac-enc
.libs/aac-enc: LLVM IR bitcode
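If the file command is not available, the same check can be sketched in a few lines of JavaScript, since both LLVM bitcode and Wasm binaries start with fixed "magic" bytes. The sniffFormat helper below is a hypothetical name of my own, not part of Emscripten, and it only looks at the first four bytes.

```javascript
// Identify a build artifact by its leading "magic" bytes.
// `sniffFormat` is a hypothetical helper, not an Emscripten API.
function sniffFormat(bytes) {
  const startsWith = (magic) => magic.every((b, i) => bytes[i] === b);
  if (startsWith([0x00, 0x61, 0x73, 0x6d])) return 'wasm'; // "\0asm"
  if (startsWith([0x42, 0x43, 0xc0, 0xde])) return 'llvm-bitcode'; // "BC" 0xC0DE
  if (startsWith([0xde, 0xc0, 0x17, 0x0b])) return 'llvm-bitcode'; // bitcode wrapper header
  return 'unknown';
}

// In Node.js the bytes would come from the file system, for example:
//   sniffFormat(require('fs').readFileSync('.libs/aac-enc'))
// Here we use in-memory samples so the sketch runs on its own.
console.log(sniffFormat(new Uint8Array([0x42, 0x43, 0xc0, 0xde, 0x35]))); // llvm-bitcode
console.log(sniffFormat(new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01]))); // wasm
```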
The last step is producing a Wasm binary and a JavaScript wrapper from the LLVM bitcode. It usually goes like this:
emcc project.bc -o project.js
Where project.bc is the bitcode file produced in the previous step. If it does not have a .bc extension, just rename it, otherwise emcc will complain.
If emcc complains about “undefined symbols” you probably have to include the shared dependencies of the “main” binary on the command line. This step requires familiarity with the build tools and/or the project, or some fiddling and intuition. If it’s not obvious what the dependencies are, try to find more files of the “LLVM IR bitcode” type near the root folder of the project. In the case of the FDK AAC project the complete command is:
emcc .libs/aac-enc.bc .libs/libfdk-aac.2.dylib -o aac-enc.js
This should result in a JavaScript file and a Wasm file ready to be executed from Node.js or a web browser. For example the AAC encoder prints out the usage:
$ node aac-enc.js
aac-enc.js [-r bitrate] [-t aot] [-a afterburner] [-s sbr] [-v vbr] in.wav out.aac
Supported AOTs:
2 AAC-LC
5 HE-AAC
29 HE-AAC v2
23 AAC-LD
39 AAC-ELD
If you load the same file in a browser using a <script> tag (or have emcc generate an HTML page with -o aac-enc.html and open that) you get the same output logged to the console. Cool, but not very useful.
Making use of the compiled code
Encoding WAV files with the compiled Wasm code requires a little more work: we need to pass in files and arguments to the executable.
First we need to recompile the module with the File System API exposed:
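# Quote the setting so the shell leaves the brackets and quotes alone.
# Newer Emscripten releases call this setting EXPORTED_RUNTIME_METHODS.
emcc .libs/aac-enc.bc .libs/libfdk-aac.2.dylib -o aac-enc.js \
-s "EXTRA_EXPORTED_RUNTIME_METHODS=['FS']"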
With the recompiled module at hand we have to write a bit of glue code. Interacting with the generated code is possible via the exposed Module object.
Add this before loading aac-enc.js to prevent running the encoder with no arguments initially and to actually expose the module:
var Module = {
noInitialRun: true
}
Note: the generated JavaScript file is neither a CommonJS module nor an AMD or ES6 one. It’s a global that can optionally be created beforehand for passing in configuration options. It is possible to emit a UMD module instead using the -s MODULARIZE=1 flag of the emcc command, or -s EXPORT_ES6=1 for ES6 modules. See more in settings.js.
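One caveat with the global Module approach: the Wasm binary is loaded asynchronously, so the file system is not guaranteed to be usable the moment the script has loaded. A minimal sketch of one way to wait for it, assuming the documented onRuntimeInitialized callback of the Module object; createModuleConfig is a hypothetical helper name of my own.

```javascript
// Build the configuration object that has to exist as the global `Module`
// before aac-enc.js is loaded. `createModuleConfig` is a hypothetical helper.
function createModuleConfig() {
  const config = { noInitialRun: true };
  // Emscripten calls onRuntimeInitialized once the Wasm binary is compiled
  // and the runtime, including the virtual file system, is ready to use.
  const ready = new Promise((resolve) => {
    config.onRuntimeInitialized = () => resolve(config);
  });
  return { config, ready };
}

const { config, ready } = createModuleConfig();

// In a page this would be wired up roughly like so:
//   var Module = config;   // before the <script src="aac-enc.js"> tag
//   ready.then((Module) => { /* safe to call Module.FS.writeFile(...) */ });
```

This relies on the generated script reusing the pre-existing global object, which is how the non-modularized output described in the note above behaves.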
Next we will have to set up an input file and copy it to the “virtual” file system of the module:
Module.FS.writeFile('input.wav', new Uint8Array(inputBuffer))
The second argument to writeFile is the content of the file in the form of a TypedArray. Obtaining inputBuffer depends on the environment and where the input comes from. Examples:
// If you have a File instance eg. from a form input:
inputBuffer = await (new Response(inputFile).arrayBuffer())
// If you have a URL
inputBuffer = await fetch(inputUrl).then((res) => res.arrayBuffer())
// In Node.js with a file path
inputBuffer = fs.readFileSync(inputPath)
Now we can invoke the main function of the compiled application:
Module.callMain(['input.wav', 'output.aac'])
And the last step is to read the output back from the file system:
// Read virtual file into a TypedArray
const output = Module.FS.readFile('output.aac')
// Turn the TypedArray into a File and return it
const outputFile = new File([output], 'output.aac', {type: 'audio/aac'})
The outputFile can then be turned into a blob URL with URL.createObjectURL and downloaded from the browser or used directly as the src of an <audio> element. When using Node.js, output can be passed directly to fs.writeFile.
The above code is available as a working example here.
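Put together, the write, run and read steps can be sketched as a single function. The encodeWav helper and the fake module below are hypothetical and only there so the snippet runs on its own; with the real build you would pass in the Module object created by aac-enc.js. FS.unlink is part of the Emscripten File System API.

```javascript
// Run the file-in, file-out flow against an Emscripten-style module object.
// `encodeWav` is a hypothetical helper; `module` is expected to expose
// FS.writeFile, FS.readFile, FS.unlink and callMain as described above.
function encodeWav(module, wavBytes, extraArgs = []) {
  module.FS.writeFile('input.wav', wavBytes);
  try {
    // Options such as ['-r', '64000'] go before the file names
    module.callMain([...extraArgs, 'input.wav', 'output.aac']);
    // Throws if the encoder did not produce an output file
    return module.FS.readFile('output.aac');
  } finally {
    // Remove the input so repeated calls do not pile up data in memory
    module.FS.unlink('input.wav');
  }
}

// Stand-in for the real module: a Map-backed file system and a "main"
// that simply copies the input file to the output path.
function createFakeModule() {
  const files = new Map();
  return {
    FS: {
      writeFile: (path, data) => files.set(path, data),
      readFile: (path) => {
        if (!files.has(path)) throw new Error(`No such file: ${path}`);
        return files.get(path);
      },
      unlink: (path) => files.delete(path),
    },
    callMain: (args) => {
      const [input, output] = args.slice(-2);
      files.set(output, files.get(input));
    },
  };
}

const fakeModule = createFakeModule();
const encoded = encodeWav(fakeModule, new Uint8Array([1, 2, 3]), ['-r', '64000']);
console.log(encoded.length); // 3
```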
Where to continue from here?
The simple example above is only the beginning. In real life you might want to turn the code above into something reusable, which needs slightly more complicated glue to account for the initialization phase of the Wasm module and to cover error scenarios. I started turning the FDK AAC encoder example into a more elaborate library; the work-in-progress state is available here.
Some to-be-ported codebases might do more than read command line arguments and files, e.g. use threads, graphics or audio, which needs special treatment. The Emscripten documentation has an entire chapter on porting.
Even in our arguments-and-files use case, we could do better. The underlying library exposes C functions to encode AAC in a buffered way, without using files directly. This has the advantage of more fine-grained control of execution and potentially lower memory use, but the glue code will be more complicated. It also requires some knowledge of C programming. A good start is the Calling compiled C functions from JavaScript section in the Emscripten docs.
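To give a rough idea of what that glue looks like, here is a sketch of the usual pattern for handing a buffer to a compiled C function: allocate space in the module's memory, copy the bytes in, call the function with the pointer, and free the memory afterwards. It assumes the build exports _malloc, _free and HEAPU8. The withHeapBuffer helper, the fake module and its _sum_bytes function are all hypothetical and have nothing to do with the real FDK AAC API.

```javascript
// Copy bytes into the module's linear memory, run `fn` with the pointer,
// and always free the memory again. `withHeapBuffer` is a hypothetical helper.
function withHeapBuffer(module, bytes, fn) {
  const ptr = module._malloc(bytes.length);
  try {
    // Read HEAPU8 after allocating: the view can be replaced when memory grows
    module.HEAPU8.set(bytes, ptr);
    return fn(ptr, bytes.length);
  } finally {
    module._free(ptr);
  }
}

// Stand-in for a compiled module so the sketch runs on its own: a plain
// byte array as memory, a bump allocator, and a pretend C function.
function createFakeHeapModule() {
  const heap = new Uint8Array(1024);
  let next = 8; // keep address 0 unused, like a null pointer
  return {
    HEAPU8: heap,
    _malloc: (size) => { const ptr = next; next += size; return ptr; },
    _free: () => {},
    // Hypothetical "C function": int sum_bytes(const uint8_t *data, int len)
    _sum_bytes: (ptr, len) => heap.subarray(ptr, ptr + len).reduce((a, b) => a + b, 0),
  };
}

const heapModule = createFakeHeapModule();
const total = withHeapBuffer(heapModule, new Uint8Array([1, 2, 3]), (ptr, len) =>
  heapModule._sum_bytes(ptr, len)
);
console.log(total); // 6
```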
It is also important to note that the Emscripten commands above generate non-optimized code. Make sure you read and follow the Optimization section before using Emscripten and Wasm in production.