Binary Encoding

小牛编辑
2023-12-01

This document describes the portable binary encoding of WebAssembly modules.

The binary encoding is a dense representation of module information that enables small files, fast decoding, and reduced memory usage. See the rationale document for more detail.

:unicorn: = Planned future feature

The encoding is split into three layers:

  • Layer 0 is a simple binary encoding of the bytecode instructions and related data structures. The encoding is dense and trivial to interact with, making it suitable for scenarios like JIT, instrumentation tools, and debugging.
  • Layer 1 provides structural compression on top of layer 0, exploiting specific knowledge about the nature of the syntax tree and its nodes. The structural compression introduces more efficient encoding of values, rearranges values within the module, and prunes structurally identical tree nodes.
  • Layer 2 :unicorn: applies generic compression algorithms, like gzip and Brotli, that are already available in browsers and other tooling.

Most importantly, the layering approach allows development and standardization to occur incrementally. For example, Layer 1 and Layer 2 encoding techniques can be tried out by decompressing at the application level to the layer below. As compression techniques stabilize, they can be standardized and moved into native implementations.

See proposed layer 1 compression for a proposal for layer 1 structural compression.

Numbers

uintN

An unsigned integer of N bits, represented in N/8 bytes in little endian order. N is either 8, 16, or 32.

varuintN

A LEB128 variable-length integer, limited to N bits (i.e., the values [0, 2^N-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 bytes.

Note: Currently, the only sizes used are varuint1, varuint7, and varuint32, where the former two are used for compatibility with potential future extensions.
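
The varuintN rule can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names encode_varuint and decode_varuint are hypothetical, and the decoder assumes that encodings longer than ceil(N/7) bytes and values outside [0, 2^N-1] should be rejected.

```python
def encode_varuint(value, bits=32):
    """Encode value as unsigned LEB128 in its shortest form (no padding)."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in %d bits" % bits)
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varuint(data, pos=0, bits=32):
    """Decode a varuintN at data[pos:]; returns (value, next position)."""
    max_bytes = -(-bits // 7)        # ceil(N/7)
    result = 0
    for i in range(max_bytes):
        if pos + i >= len(data):
            raise ValueError("unexpected end of input")
        byte = data[pos + i]
        result |= (byte & 0x7F) << (7 * i)
        if not (byte & 0x80):        # last byte of this integer
            if result >= (1 << bits):
                raise ValueError("value exceeds %d bits" % bits)
            return result, pos + i + 1
    raise ValueError("encoding longer than ceil(N/7) bytes")
```

For instance, 624485 encodes to the three bytes e5 8e 26, and the padded form 83 80 00 still decodes to 3.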

varintN

A Signed LEB128 variable-length integer, limited to N bits (i.e., the values [-2^(N-1), +2^(N-1)-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 or 0xFF bytes.

Note: Currently, the only sizes used are varint7, varint32 and varint64.
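
The signed variant differs only in sign extension. The sketch below uses hypothetical names (encode_varint, decode_varint) and assumes the same strictness as described for the unsigned case: at most ceil(N/7) bytes and a result inside [-2^(N-1), 2^(N-1)-1].

```python
def encode_varint(value, bits=32):
    """Encode value as signed LEB128 in its shortest form (no padding)."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in %d bits" % bits)
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7                    # arithmetic shift keeps the sign
        sign_bit = bool(byte & 0x40)
        # Stop once the remaining value is pure sign extension of this byte.
        done = (value == 0 and not sign_bit) or (value == -1 and sign_bit)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def decode_varint(data, pos=0, bits=32):
    """Decode a varintN at data[pos:]; returns (value, next position)."""
    max_bytes = -(-bits // 7)          # ceil(N/7)
    result = 0
    shift = 0
    for i in range(max_bytes):
        if pos + i >= len(data):
            raise ValueError("unexpected end of input")
        byte = data[pos + i]
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            if byte & 0x40:
                result -= 1 << shift   # sign-extend from the final byte
            if not -(1 << (bits - 1)) <= result < (1 << (bits - 1)):
                raise ValueError("value exceeds %d bits" % bits)
            return result, pos + i + 1
    raise ValueError("encoding longer than ceil(N/7) bytes")
```

Here -1 as a varint7 is the single byte 0x7f, and the padded two-byte form ff 7f also decodes to -1.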

Instruction Opcodes

In the MVP, the opcodes of instructions are all encoded in a single byte since there are fewer than 256 opcodes. Future features like SIMD and atomics will bring the total count above 256 and so an extension scheme will be necessary, designating one or more single-byte values as prefixes for multi-byte opcodes.

Language Types

All types are distinguished by a negative varint7 value that is the first byte of their encoding (representing a type constructor):

Opcode | Type constructor
-0x01 (i.e., the byte 0x7f) | i32
-0x02 (i.e., the byte 0x7e) | i64
-0x03 (i.e., the byte 0x7d) | f32
-0x04 (i.e., the byte 0x7c) | f64
-0x10 (i.e., the byte 0x70) | anyfunc
-0x20 (i.e., the byte 0x60) | func
-0x40 (i.e., the byte 0x40) | pseudo type for representing an empty block_type

Some of these will be followed by additional fields, see below.

Note: Gaps are reserved for future extensions. The use of a signed scheme is so that types can coexist in a single space with (positive) indices into the type section, which may be relevant for future extensions of the type system.
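
The correspondence between the negative values and their bytes follows from the single-byte varint7 encoding. The sketch below shows the arithmetic; the dictionary and function names are hypothetical and exist only for this illustration.

```python
# Type constructors keyed by their (negative) varint7 value.
TYPE_CONSTRUCTORS = {
    -0x01: "i32",
    -0x02: "i64",
    -0x03: "f32",
    -0x04: "f64",
    -0x10: "anyfunc",
    -0x20: "func",
    -0x40: "empty block_type",
}


def type_byte(value):
    """Single-byte varint7 encoding of a value in [-64, 63]."""
    if not -64 <= value <= 63:
        raise ValueError("not representable as varint7")
    return value & 0x7F              # keep the low 7 bits, continuation bit clear


def type_from_byte(byte):
    """Inverse of type_byte: sign-extend bit 6 of a single-byte varint7."""
    if byte & 0x80:
        raise ValueError("not a single-byte varint7")
    return byte - 0x80 if byte & 0x40 else byte
```

So type_byte(-0x01) gives 0x7f, and the byte 0x70 maps back to -0x10, the anyfunc constructor.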

value_type

A varint7 indicating a value type. One of:

  • i32
  • i64
  • f32
  • f64

as encoded above.

block_type

A varint7 indicating a block signature. These types are encoded as:

  • either a value_type indicating a signature with a single result
  • or -0x40 (i.e., the byte 0x40) indicating a signature with 0 results.

elem_type

A varint7 indicating the types of elements in a table. In the MVP, only one type is available:

  • anyfunc

Note: In the future, other element types may be allowed.

func_type

The description of a function signature. Its type constructor is followed by an additional description:

Field | Type | Description
form | varint7 | the value for the func type constructor as defined above
param_count | varuint32 | the number of parameters to the function
param_types | value_type* | the parameter types of the function
return_count | varuint1 | the number of results from the function
return_type | value_type? | the result type of the function (if return_count is 1)

Note: In the future, return_count and return_type might be generalised to allow multiple values.
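
As an illustration of the func_type layout, the following sketch decodes one entry. The helper read_varuint and the function decode_func_type are hypothetical names, and the reader skips the N-bit range checks for brevity.

```python
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}
FUNC_FORM = 0x60  # byte encoding of the func type constructor (-0x20)


def read_varuint(data, pos):
    """Hypothetical helper: read an unsigned LEB128 value (no range checks)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos


def decode_func_type(data, pos=0):
    """Return (param_types, result_types, next position)."""
    if data[pos] != FUNC_FORM:
        raise ValueError("expected the func type constructor")
    pos += 1
    param_count, pos = read_varuint(data, pos)
    params = []
    for _ in range(param_count):
        params.append(VALUE_TYPES[data[pos]])  # KeyError on an unknown type
        pos += 1
    return_count, pos = read_varuint(data, pos)
    if return_count > 1:
        raise ValueError("return_count is a varuint1")
    results = []
    if return_count == 1:
        results.append(VALUE_TYPES[data[pos]])
        pos += 1
    return params, results, pos
```

The six bytes 60 02 7f 7f 01 7f therefore describe a function taking two i32 parameters and returning one i32.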

Other Types

global_type

The description of a global variable.

Field | Type | Description
content_type | value_type | type of the value
mutability | varuint1 | 0 if immutable, 1 if mutable

table_type

The description of a table.

Field | Type | Description
element_type | elem_type | the type of elements
limits | resizable_limits | see below

memory_type

The description of a memory.

Field | Type | Description
limits | resizable_limits | see below

external_kind

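A single-byte unsigned integer indicating the kind of definition being imported or defined:

  • 0 indicating a Function import or definition
  • 1 indicating a Table import or definition
  • 2 indicating a Memory import or definition
  • 3 indicating a Global import or definition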

resizable_limits

A packed tuple that describes the limits of a table or memory:

Field | Type | Description
flags | varuint1 | 1 if the maximum field is present, 0 otherwise
initial | varuint32 | initial length (in units of table elements or wasm pages)
maximum | varuint32? | only present if specified by flags

Note: In the future, the “flags” field may be changed to varuint32, e.g., to include a flag for sharing between threads.
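
The optional maximum field can be handled as in the sketch below. The names read_varuint and decode_resizable_limits are hypothetical, and the reader omits the N-bit range checks.

```python
def read_varuint(data, pos):
    """Hypothetical helper: read an unsigned LEB128 value (no range checks)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos


def decode_resizable_limits(data, pos=0):
    """Return (initial, maximum or None, next position)."""
    flags, pos = read_varuint(data, pos)
    if flags not in (0, 1):
        raise ValueError("flags is a varuint1")
    initial, pos = read_varuint(data, pos)
    maximum = None
    if flags == 1:                   # maximum is present only when flagged
        maximum, pos = read_varuint(data, pos)
    return initial, maximum, pos
```

The bytes 01 01 10 decode to an initial size of 1 with a maximum of 16, while 00 02 decodes to an initial size of 2 with no maximum.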

init_expr

The encoding of an initializer expression is the normal encoding of the expression followed by the end opcode as a delimiter.

Note that get_global in an initializer expression can only refer to immutable imported globals and all uses of init_expr can only appear after the Imports section.
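
A deliberately partial sketch of reading an initializer expression follows. It handles only i32.const (0x41) and get_global (0x23), followed by the end opcode (0x0b); the function names are hypothetical, and a complete decoder would also accept the other constant operators.

```python
def read_leb128(data, pos, signed):
    """Hypothetical helper: read a LEB128 value (no range checks)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            if signed and (byte & 0x40):
                result -= 1 << shift     # sign-extend
            return result, pos


def decode_init_expr(data, pos=0):
    """Return ((operator name, immediate), next position)."""
    opcode = data[pos]
    pos += 1
    if opcode == 0x41:                   # i32.const value : varint32
        value, pos = read_leb128(data, pos, signed=True)
        expr = ("i32.const", value)
    elif opcode == 0x23:                 # get_global global_index : varuint32
        index, pos = read_leb128(data, pos, signed=False)
        expr = ("get_global", index)
    else:
        raise NotImplementedError("opcode 0x%02x not handled in this sketch" % opcode)
    if data[pos] != 0x0B:                # the end opcode delimits the expression
        raise ValueError("init_expr must be terminated by end")
    return expr, pos + 1
```

For example, the bytes 41 2a 0b decode to an i32.const of 42.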

The following documents the current prototype format. This format is based on and supersedes the v8-native prototype format, originally in a public design doc.

High-level structure

The module starts with a preamble of two fields:

Field | Type | Description
magic number | uint32 | Magic number 0x6d736100 (i.e., ‘\0asm’)
version | uint32 | Version number, 0x1
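
A minimal sketch of checking the preamble, using Python's struct module to read the two little-endian uint32 fields. The function name read_preamble is hypothetical.

```python
import struct

MAGIC = 0x6D736100  # the bytes 00 61 73 6d, i.e. '\0asm', read as a little-endian uint32


def read_preamble(data):
    """Validate the magic number and return the version field."""
    if len(data) < 8:
        raise ValueError("module is shorter than the 8-byte preamble")
    magic, version = struct.unpack_from("<II", data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    return version
```

Because uintN values are little endian, the magic number appears in the file as the ASCII sequence \0asm.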

The module preamble is followed by a sequence of sections. Each section is identified by a 1-byte section code that encodes either a known section or a custom section. The section length and payload data then follow. Known sections have non-zero ids, while custom sections have a 0 id followed by an identifying string as part of the payload.

Field | Type | Description
id | varuint7 | section code
payload_len | varuint32 | size of this section in bytes
name_len | varuint32 ? | length of name in bytes, present if id == 0
name | bytes ? | section name: valid UTF-8 byte sequence, present if id == 0
payload_data | bytes | content of this section, of length payload_len - sizeof(name) - sizeof(name_len)

Each known section is optional and may appear at most once. Custom sections all have the same id (0), and can be named non-uniquely (all bytes composing their names may be identical).
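
The section framing can be walked with a loop like the one sketched here. The names read_varuint and iter_sections are hypothetical; the sketch assumes the 8-byte preamble has already been validated and does not check section ordering.

```python
def read_varuint(data, pos):
    """Hypothetical helper: read an unsigned LEB128 value (no range checks)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos


def iter_sections(module):
    """Yield (id, name or None, payload_data) for each section in a module."""
    pos = 8                                  # skip magic number and version
    while pos < len(module):
        section_id, pos = read_varuint(module, pos)
        payload_len, pos = read_varuint(module, pos)
        end = pos + payload_len              # payload_len covers name_len and name too
        if end > len(module):
            raise ValueError("section extends past the end of the module")
        name = None
        if section_id == 0:                  # custom section: name comes first
            name_len, pos = read_varuint(module, pos)
            name = module[pos:pos + name_len].decode("utf-8")
            pos += name_len
        yield section_id, name, module[pos:end]
        pos = end
```

Since every section carries its length, a reader can skip any section it does not understand by jumping to end.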

Custom sections are intended to be used for debugging information, future evolution, or third party extensions. For MVP, we use a specific custom section (the Name Section) for debugging information.

If a WebAssembly implementation interprets the payload of any custom section during module validation or compilation, errors in that payload must not invalidate the module.

Known sections from the list below may not appear out of order, while custom sections may be interspersed before, between, as well as after any of the elements of the list, in any order. Certain custom sections may have their own ordering and cardinality requirements. For example, the Name section is expected to appear at most once, immediately after the Data section. Violation of such requirements may at most cause an implementation to ignore the section, while not invalidating the module.

The content of each section is encoded in its payload_data.

Section Name | Code | Description
Type | 1 | Function signature declarations
Import | 2 | Import declarations
Function | 3 | Function declarations
Table | 4 | Indirect function table and other tables
Memory | 5 | Memory attributes
Global | 6 | Global declarations
Export | 7 | Exports
Start | 8 | Start function declaration
Element | 9 | Elements section
Code | 10 | Function bodies (code)
Data | 11 | Data segments

The end of the last present section must coincide with the last byte of the module. The shortest valid module is 8 bytes (magic number, version, followed by zero sections).
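
Because the known sections are listed in increasing code order and each may appear at most once, the ordering rule reduces to checking that non-zero section codes are strictly increasing. A small sketch, with a hypothetical function name:

```python
def known_sections_in_order(section_ids):
    """True if known (non-zero) section codes are strictly increasing.

    Custom sections (code 0) may appear anywhere and are ignored here.
    """
    last = 0
    for section_id in section_ids:
        if section_id == 0:
            continue
        if section_id > 11:          # codes 1..11 are the known sections listed above
            return False
        if section_id <= last:       # out of order, or a repeated known section
            return False
        last = section_id
    return True
```

For example, the sequence 1, 0, 3, 10, 0 is acceptable, while 3, 1 and 1, 1 are not.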

Type section

The type section declares all function signatures that will be used in the module.

Field | Type | Description
count | varuint32 | count of type entries to follow
entries | func_type* | repeated type entries as described above

Note: In the future, this section may contain other forms of type entries as well, which can be distinguished by the form field of the type encoding.

Import section

The import section declares all imports that will be used in the module.

Field | Type | Description
count | varuint32 | count of import entries to follow
entries | import_entry* | repeated import entries as described below

Import entry

Field | Type | Description
module_len | varuint32 | length of module_str in bytes
module_str | bytes | module name: valid UTF-8 byte sequence
field_len | varuint32 | length of field_str in bytes
field_str | bytes | field name: valid UTF-8 byte sequence
kind | external_kind | the kind of definition being imported

Followed by, if the kind is Function:

Field | Type | Description
type | varuint32 | type index of the function signature

or, if the kind is Table:

Field | Type | Description
type | table_type | type of the imported table

or, if the kind is Memory:

Field | Type | Description
type | memory_type | type of the imported memory

or, if the kind is Global:

Field | Type | Description
type | global_type | type of the imported global

Note that, in the MVP, only immutable global variables can be imported.

Function section

The function section declares the signatures of all functions in the module (their definitions appear in the code section).

Field | Type | Description
count | varuint32 | count of signature indices to follow
types | varuint32* | sequence of indices into the type section

Table section

The encoding of a Table section:

Field | Type | Description
count | varuint32 | indicating the number of tables defined by the module
entries | table_type* | repeated table_type entries as described above

In the MVP, the number of tables must be no more than 1.

Memory section

ID: memory

The encoding of a Memory section:

Field | Type | Description
count | varuint32 | indicating the number of memories defined by the module
entries | memory_type* | repeated memory_type entries as described above

Note that the initial/maximum fields are specified in units of WebAssembly pages.

In the MVP, the number of memories must be no more than 1.

Global section

The encoding of the Global section:

Field | Type | Description
count | varuint32 | count of global variable entries
globals | global_variable* | global variables, as described below

Global Entry

Each global_variable declares a single global variable of a given type, mutability and with the given initializer.

Field | Type | Description
type | global_type | type of the variable
init | init_expr | the initial value of the global

Note that, in the MVP, only immutable global variables can be exported.

Export section

The encoding of the Export section:

Field | Type | Description
count | varuint32 | count of export entries to follow
entries | export_entry* | repeated export entries as described below

Export entry

Field | Type | Description
field_len | varuint32 | length of field_str in bytes
field_str | bytes | field name: valid UTF-8 byte sequence
kind | external_kind | the kind of definition being exported
index | varuint32 | the index into the corresponding index space

For example, if the “kind” is Function, then “index” is a function index. Note that, in the MVP, the only valid index value for a memory or table export is 0.
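
As an illustration, the payload of an Export section can be decoded as sketched below. The names read_varuint and decode_export_section are hypothetical; the kind is returned as its raw external_kind byte, and index validity is not checked.

```python
def read_varuint(data, pos):
    """Hypothetical helper: read an unsigned LEB128 value (no range checks)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos


def decode_export_section(payload):
    """Return a list of (field name, kind byte, index) tuples."""
    count, pos = read_varuint(payload, 0)
    exports = []
    for _ in range(count):
        field_len, pos = read_varuint(payload, pos)
        field = payload[pos:pos + field_len].decode("utf-8")
        pos += field_len
        kind = payload[pos]              # external_kind is a single byte
        pos += 1
        index, pos = read_varuint(payload, pos)
        exports.append((field, kind, index))
    if pos != len(payload):
        raise ValueError("trailing bytes after the last export entry")
    return exports
```

A payload of 01 03 61 64 64 00 00 thus describes a single export named "add" with kind byte 0 and index 0.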

Start section

The start section declares the start function.

Field | Type | Description
index | varuint32 | start function index

Element section

The encoding of the Elements section:

Field | Type | Description
count | varuint32 | count of element segments to follow
entries | elem_segment* | repeated element segments as described below

An elem_segment is:

Field | Type | Description
index | varuint32 | the table index (0 in the MVP)
offset | init_expr | an i32 initializer expression that computes the offset at which to place the elements
num_elem | varuint32 | number of elements to follow
elems | varuint32* | sequence of function indices

Code section

ID: code

The code section contains a body for every function in the module. The number of functions declared in the function section must equal the number of function bodies defined in this section, and the ith declaration corresponds to the ith function body.

Field | Type | Description
count | varuint32 | count of function bodies to follow
bodies | function_body* | sequence of Function Bodies

Data section

The data section declares the initialized data that is loaded into the linear memory.

Field | Type | Description
count | varuint32 | count of data segments to follow
entries | data_segment* | repeated data segments as described below

A data_segment is:

Field | Type | Description
index | varuint32 | the linear memory index (0 in the MVP)
offset | init_expr | an i32 initializer expression that computes the offset at which to place the data
size | varuint32 | size of data (in bytes)
data | bytes | sequence of size bytes

Name section

Custom section name field: "name"

The name section is a custom section. It is therefore encoded with id 0 followed by the name string "name". Like all custom sections, this section being malformed does not cause the validation of the module to fail. It is up to the implementation how it handles a malformed or partially malformed name section. The WebAssembly implementation is also free to choose to read and process this section lazily, after the module has been instantiated, should debugging be required.

The name section may appear only once, and only after the Data section. The expectation is that, when a binary WebAssembly module is viewed in a browser or other development environment, the data in this section will be used as the names of functions and locals in the text format.

The name section contains a sequence of name subsections:

Field | Type | Description
name_type | varuint7 | code identifying type of name contained in this subsection
name_payload_len | varuint32 | size of this subsection in bytes
name_payload_data | bytes | content of this section, of length name_payload_len

Since name subsections have a given length, unknown or unwanted subsections can be skipped over by an engine. The current list of valid name_type codes is:

Name Type | Code | Description
Module | 0 | Assigns a name to the module
Function | 1 | Assigns names to functions
Local | 2 | Assigns names to locals in functions

When present, subsections must appear in this order and at most once. The end of the last subsection must coincide with the last byte of the name section to be a well-formed name section.

Module name

The module name subsection assigns a name to the module itself. It simply consists of a single string:

Field | Type | Description
name_len | varuint32 | length of name_str in bytes
name_str | bytes | UTF-8 encoding of the name

Name Map

In the following subsections, a name_map is encoded as:

Field | Type | Description
count | varuint32 | number of naming entries in names
names | naming* | sequence of naming entries sorted by index

where a naming is encoded as:

Field | Type | Description
index | varuint32 | the index which is being named
name_len | varuint32 | length of name_str in bytes
name_str | bytes | UTF-8 encoding of the name
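
A name_map can be decoded as in the following sketch. The names read_varuint and decode_name_map are hypothetical. The sketch assumes that indices must be strictly increasing, which combines the "sorted by index" requirement with the rule that an index may be named at most once; a real engine might instead ignore a malformed name section entirely.

```python
def read_varuint(data, pos):
    """Hypothetical helper: read an unsigned LEB128 value (no range checks)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos


def decode_name_map(data, pos=0):
    """Return (list of (index, name), next position)."""
    count, pos = read_varuint(data, pos)
    names = []
    previous = -1
    for _ in range(count):
        index, pos = read_varuint(data, pos)
        if index <= previous:            # unsorted or duplicate index
            raise ValueError("name_map indices must be strictly increasing")
        previous = index
        name_len, pos = read_varuint(data, pos)
        name = data[pos:pos + name_len].decode("utf-8")
        pos += name_len
        names.append((index, name))
    return names, pos
```

Note that the same name string may legitimately appear under several different indices.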

Function names

The function names subsection is a name_map which assigns names to a subset of the function index space (both imports and module-defined).

Each function may be named at most once. Naming a function more than once results in the section being malformed.

However, names need not be unique. The same name may be given for multiple functions. This is common for C++ programs where the multiple compilation units that comprise a binary can contain local functions with the same name.

Local names

The local names subsection assigns name_maps to a subset of functions in the function index space (both imports and module-defined). The name_map for a given function assigns names to a subset of local variable indices.

Field | Type | Description
count | varuint32 | count of local_names in funcs
funcs | local_names* | sequence of local_names sorted by index

where a local_names entry is encoded as:

Field | Type | Description
index | varuint32 | the index of the function whose locals are being named
local_map | name_map | assignment of names to local indices

Function Bodies

Function bodies consist of a sequence of local variable declarations followed by bytecode instructions. Instructions are encoded as an opcode followed by zero or more immediates as defined by the tables below. Each function body must end with the end opcode.

Field | Type | Description
body_size | varuint32 | size of function body to follow, in bytes
local_count | varuint32 | number of local entries
locals | local_entry* | local variables
code | byte* | bytecode of the function
end | byte | 0x0b, indicating the end of the body

Local Entry

Each local entry declares a number of local variables of a given type. It is legal to have several entries with the same type.

Field | Type | Description
count | varuint32 | number of local variables of the following type
type | value_type | type of the variables
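
Putting the function body and local entry layouts together, a body can be split into its local declarations and bytecode as sketched below. The names read_varuint and decode_function_body are hypothetical, and the bytecode is returned undecoded.

```python
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}


def read_varuint(data, pos):
    """Hypothetical helper: read an unsigned LEB128 value (no range checks)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos


def decode_function_body(data, pos=0):
    """Return (list of (count, type), code bytes without final end, next position)."""
    body_size, pos = read_varuint(data, pos)
    end = pos + body_size                # body_size counts everything after itself
    if end > len(data):
        raise ValueError("function body extends past the input")
    local_count, pos = read_varuint(data, pos)
    local_entries = []
    for _ in range(local_count):
        count, pos = read_varuint(data, pos)
        local_entries.append((count, VALUE_TYPES[data[pos]]))
        pos += 1
    if pos >= end or data[end - 1] != 0x0B:
        raise ValueError("function body must end with the end opcode")
    return local_entries, data[pos:end - 1], end
```

For example, the body 06 01 02 7f 20 00 0b declares two i32 locals and contains the single instruction get_local 0.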

Control flow operators (described here)

Name | Opcode | Immediates | Description
unreachable | 0x00 | | trap immediately
nop | 0x01 | | no operation
block | 0x02 | sig : block_type | begin a sequence of expressions, yielding 0 or 1 values
loop | 0x03 | sig : block_type | begin a block which can also form control flow loops
if | 0x04 | sig : block_type | begin if expression
else | 0x05 | | begin else expression of if
end | 0x0b | | end a block, loop, or if
br | 0x0c | relative_depth : varuint32 | break that targets an outer nested block
br_if | 0x0d | relative_depth : varuint32 | conditional break that targets an outer nested block
br_table | 0x0e | see below | branch table control flow construct
return | 0x0f | | return zero or one value from this function

The sig fields of block and if operators specify function signatures which describe their use of the operand stack.

The br_table operator has an immediate operand which is encoded as follows:

Field | Type | Description
target_count | varuint32 | number of entries in the target_table
target_table | varuint32* | target entries that indicate an outer block or loop to which to break
default_target | varuint32 | an outer block or loop to which to break in the default case

The br_table operator implements an indirect branch. It accepts an optional value argument (like other branches) and an additional i32 expression as input, and branches to the block or loop at the given offset within the target_table. If the input value is out of range, br_table branches to the default target.
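
The br_table immediate and its target selection can be sketched as follows. The function names are hypothetical, and select_target assumes the i32 input has already been converted to a non-negative Python integer.

```python
def read_varuint(data, pos):
    """Hypothetical helper: read an unsigned LEB128 value (no range checks)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos


def decode_br_table(data, pos=0):
    """Decode the immediate that follows opcode 0x0e.

    Returns (target_table, default_target, next position).
    """
    target_count, pos = read_varuint(data, pos)
    targets = []
    for _ in range(target_count):
        depth, pos = read_varuint(data, pos)
        targets.append(depth)
    default_target, pos = read_varuint(data, pos)
    return targets, default_target, pos


def select_target(targets, default_target, index):
    """Pick the branch depth for an input value, falling back to the default."""
    if 0 <= index < len(targets):
        return targets[index]
    return default_target
```

With the immediate 03 00 01 02 03, an input of 1 selects depth 1, and any input of 3 or more selects the default depth 3.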

Note: Gaps in the opcode space, here and elsewhere, are reserved for future extensions.

Call operators (described here)

Name | Opcode | Immediates | Description
call | 0x10 | function_index : varuint32 | call a function by its index
call_indirect | 0x11 | type_index : varuint32, reserved : varuint1 | call a function indirect with an expected signature

The call_indirect operator takes a list of function arguments and as the last operand the index into the table. Its reserved immediate is for future use and must be 0 in the MVP.

Parametric operators (described here)

Name | Opcode | Immediates | Description
drop | 0x1a | | ignore value
select | 0x1b | | select one of two values based on condition

Variable access (described here)

Name | Opcode | Immediates | Description
get_local | 0x20 | local_index : varuint32 | read a local variable or parameter
set_local | 0x21 | local_index : varuint32 | write a local variable or parameter
tee_local | 0x22 | local_index : varuint32 | write a local variable or parameter and return the same value
get_global | 0x23 | global_index : varuint32 | read a global variable
set_global | 0x24 | global_index : varuint32 | write a global variable

Memory-related operators (described here)

Name | Opcode | Immediate | Description
i32.load | 0x28 | memory_immediate | load from memory
i64.load | 0x29 | memory_immediate | load from memory
f32.load | 0x2a | memory_immediate | load from memory
f64.load | 0x2b | memory_immediate | load from memory
i32.load8_s | 0x2c | memory_immediate | load from memory
i32.load8_u | 0x2d | memory_immediate | load from memory
i32.load16_s | 0x2e | memory_immediate | load from memory
i32.load16_u | 0x2f | memory_immediate | load from memory
i64.load8_s | 0x30 | memory_immediate | load from memory
i64.load8_u | 0x31 | memory_immediate | load from memory
i64.load16_s | 0x32 | memory_immediate | load from memory
i64.load16_u | 0x33 | memory_immediate | load from memory
i64.load32_s | 0x34 | memory_immediate | load from memory
i64.load32_u | 0x35 | memory_immediate | load from memory
i32.store | 0x36 | memory_immediate | store to memory
i64.store | 0x37 | memory_immediate | store to memory
f32.store | 0x38 | memory_immediate | store to memory
f64.store | 0x39 | memory_immediate | store to memory
i32.store8 | 0x3a | memory_immediate | store to memory
i32.store16 | 0x3b | memory_immediate | store to memory
i64.store8 | 0x3c | memory_immediate | store to memory
i64.store16 | 0x3d | memory_immediate | store to memory
i64.store32 | 0x3e | memory_immediate | store to memory
current_memory | 0x3f | reserved : varuint1 | query the size of memory
grow_memory | 0x40 | reserved : varuint1 | grow the size of memory

The memory_immediate type is encoded as follows:

Name | Type | Description
flags | varuint32 | a bitfield which currently contains the alignment in the least significant bits, encoded as log2(alignment)
offset | varuint32 | the value of the offset

As implied by the log2(alignment) encoding, the alignment must be a power of 2. As an additional validation criterion, the alignment must be less than or equal to the natural alignment. The bits after the log2(memory-access-size) least-significant bits must be set to 0. These bits are reserved for future use (e.g., for shared memory ordering requirements).

The reserved immediate to the current_memory and grow_memory operators is for future use and must be 0 in the MVP.
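
A rough sketch of decoding a memory_immediate and checking the alignment rule follows. The function names and the small ACCESS_SIZE table are hypothetical; the sketch assumes that flags holds only log2(alignment) and that the natural alignment of an operator equals its access size in bytes.

```python
# Access size in bytes for a few operators, keyed by opcode (illustrative subset).
ACCESS_SIZE = {
    0x28: 4,  # i32.load
    0x29: 8,  # i64.load
    0x2C: 1,  # i32.load8_s
    0x3B: 2,  # i32.store16
}


def read_varuint(data, pos):
    """Hypothetical helper: read an unsigned LEB128 value (no range checks)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos


def decode_memory_immediate(data, pos=0):
    """Return (flags, offset, next position)."""
    flags, pos = read_varuint(data, pos)
    offset, pos = read_varuint(data, pos)
    return flags, offset, pos


def alignment_is_valid(opcode, flags):
    """True if 2**flags does not exceed the operator's natural alignment."""
    return (1 << flags) <= ACCESS_SIZE[opcode]
```

For i32.load, flags of 2 (4-byte alignment) passes the check, while flags of 3 (8-byte alignment) exceeds the natural alignment and fails.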

Constants (described here)

Name | Opcode | Immediates | Description
i32.const | 0x41 | value : varint32 | a constant value interpreted as i32
i64.const | 0x42 | value : varint64 | a constant value interpreted as i64
f32.const | 0x43 | value : uint32 | a constant value interpreted as f32
f64.const | 0x44 | value : uint64 | a constant value interpreted as f64
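
Unlike the integer constants, the floating-point immediates are fixed-width. The sketch below reads them with Python's struct module, assuming the uint32 and uint64 immediates hold little-endian IEEE 754 bit patterns; the function names are hypothetical.

```python
import struct


def decode_f32_const(data, pos=0):
    """Read the 4-byte immediate of f32.const; returns (value, next position)."""
    return struct.unpack_from("<f", data, pos)[0], pos + 4


def decode_f64_const(data, pos=0):
    """Read the 8-byte immediate of f64.const; returns (value, next position)."""
    return struct.unpack_from("<d", data, pos)[0], pos + 8
```

For example, the immediate bytes 00 00 80 3f (the uint32 0x3f800000) represent the f32 value 1.0.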

Comparison operators (described here)

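Name | Opcode | Immediate | Description
i32.eqz | 0x45 | |
i32.eq | 0x46 | |
i32.ne | 0x47 | |
i32.lt_s | 0x48 | |
i32.lt_u | 0x49 | |
i32.gt_s | 0x4a | |
i32.gt_u | 0x4b | |
i32.le_s | 0x4c | |
i32.le_u | 0x4d | |
i32.ge_s | 0x4e | |
i32.ge_u | 0x4f | |
i64.eqz | 0x50 | |
i64.eq | 0x51 | |
i64.ne | 0x52 | |
i64.lt_s | 0x53 | |
i64.lt_u | 0x54 | |
i64.gt_s | 0x55 | |
i64.gt_u | 0x56 | |
i64.le_s | 0x57 | |
i64.le_u | 0x58 | |
i64.ge_s | 0x59 | |
i64.ge_u | 0x5a | |
f32.eq | 0x5b | |
f32.ne | 0x5c | |
f32.lt | 0x5d | |
f32.gt | 0x5e | |
f32.le | 0x5f | |
f32.ge | 0x60 | |
f64.eq | 0x61 | |
f64.ne | 0x62 | |
f64.lt | 0x63 | |
f64.gt | 0x64 | |
f64.le | 0x65 | |
f64.ge | 0x66 | |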

Numeric operators (described here)

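Name | Opcode | Immediate | Description
i32.clz | 0x67 | |
i32.ctz | 0x68 | |
i32.popcnt | 0x69 | |
i32.add | 0x6a | |
i32.sub | 0x6b | |
i32.mul | 0x6c | |
i32.div_s | 0x6d | |
i32.div_u | 0x6e | |
i32.rem_s | 0x6f | |
i32.rem_u | 0x70 | |
i32.and | 0x71 | |
i32.or | 0x72 | |
i32.xor | 0x73 | |
i32.shl | 0x74 | |
i32.shr_s | 0x75 | |
i32.shr_u | 0x76 | |
i32.rotl | 0x77 | |
i32.rotr | 0x78 | |
i64.clz | 0x79 | |
i64.ctz | 0x7a | |
i64.popcnt | 0x7b | |
i64.add | 0x7c | |
i64.sub | 0x7d | |
i64.mul | 0x7e | |
i64.div_s | 0x7f | |
i64.div_u | 0x80 | |
i64.rem_s | 0x81 | |
i64.rem_u | 0x82 | |
i64.and | 0x83 | |
i64.or | 0x84 | |
i64.xor | 0x85 | |
i64.shl | 0x86 | |
i64.shr_s | 0x87 | |
i64.shr_u | 0x88 | |
i64.rotl | 0x89 | |
i64.rotr | 0x8a | |
f32.abs | 0x8b | |
f32.neg | 0x8c | |
f32.ceil | 0x8d | |
f32.floor | 0x8e | |
f32.trunc | 0x8f | |
f32.nearest | 0x90 | |
f32.sqrt | 0x91 | |
f32.add | 0x92 | |
f32.sub | 0x93 | |
f32.mul | 0x94 | |
f32.div | 0x95 | |
f32.min | 0x96 | |
f32.max | 0x97 | |
f32.copysign | 0x98 | |
f64.abs | 0x99 | |
f64.neg | 0x9a | |
f64.ceil | 0x9b | |
f64.floor | 0x9c | |
f64.trunc | 0x9d | |
f64.nearest | 0x9e | |
f64.sqrt | 0x9f | |
f64.add | 0xa0 | |
f64.sub | 0xa1 | |
f64.mul | 0xa2 | |
f64.div | 0xa3 | |
f64.min | 0xa4 | |
f64.max | 0xa5 | |
f64.copysign | 0xa6 | |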

Conversions (described here)

Name | Opcode | Immediate | Description
i32.wrap/i64 | 0xa7 | |
i32.trunc_s/f32 | 0xa8 | |
i32.trunc_u/f32 | 0xa9 | |
i32.trunc_s/f64 | 0xaa | |
i32.trunc_u/f64 | 0xab | |
i64.extend_s/i32 | 0xac | |
i64.extend_u/i32 | 0xad | |
i64.trunc_s/f32 | 0xae | |
i64.trunc_u/f32 | 0xaf | |
i64.trunc_s/f64 | 0xb0 | |
i64.trunc_u/f64 | 0xb1 | |
f32.convert_s/i32 | 0xb2 | |
f32.convert_u/i32 | 0xb3 | |
f32.convert_s/i64 | 0xb4 | |
f32.convert_u/i64 | 0xb5 | |
f32.demote/f64 | 0xb6 | |
f64.convert_s/i32 | 0xb7 | |
f64.convert_u/i32 | 0xb8 | |
f64.convert_s/i64 | 0xb9 | |
f64.convert_u/i64 | 0xba | |
f64.promote/f32 | 0xbb | |

Reinterpretations (described here)

Name | Opcode | Immediate | Description
i32.reinterpret/f32 | 0xbc | |
i64.reinterpret/f64 | 0xbd | |
f32.reinterpret/i32 | 0xbe | |
f64.reinterpret/i64 | 0xbf | |