Binary Encoding

优质

小牛编辑

148浏览

2023-12-01

This document describes the portable binary encoding of the WebAssembly modules.

The binary encoding is a dense representation of module information that enables small files, fast decoding, and reduced memory usage. See the rationale document for more detail.

:unicorn: = Planned future feature

The encoding is split into three layers:

Layer 0 is a simple binary encoding of the bytecode instructions and related data structures. The encoding is dense and trivial to interact with, making it suitable for scenarios like JIT, instrumentation tools, and debugging.
Layer 1 provides structural compression on top of layer 0, exploiting specific knowledge about the nature of the syntax tree and its nodes. The structural compression introduces more efficient encoding of values, rearranges values within the module, and prunes structurally identical tree nodes.
Layer 2 :unicorn: Layer 2 applies generic compression algorithms, like gzip and Brotli, that are already available in browsers and other tooling.

Most importantly, the layering approach allows development and standardization to occur incrementally. For example, Layer 1 and Layer 2 encoding techniques can be experimented with by application-level decompressing to the layer below. As compression techniques stabilize, they can be standardized and moved into native implementations.

See proposed layer 1 compression for a proposal for layer 1 structural compression.

Numbers

`uintN`

An unsigned integer of N bits, represented in N/8 bytes in little endian order. N is either 8, 16, or 32.

`varuintN`

A LEB128 variable-length integer, limited to N bits (i.e., the values [0, 2^N-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 bytes.

Note: Currently, the only sizes used are varuint1, varuint7, and varuint32, where the former two are used for compatibility with potential future extensions.

`varintN`

A Signed LEB128 variable-length integer, limited to N bits (i.e., the values [-2^(N-1), +2^(N-1)-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 or 0xFF bytes.

Note: Currently, the only sizes used are varint7, varint32 and varint64.

Instruction Opcodes

In the MVP, the opcodes of instructions are all encoded in a single byte since there are fewer than 256 opcodes. Future features like SIMD and atomics will bring the total count above 256 and so an extension scheme will be necessary, designating one or more single-byte values as prefixes for multi-byte opcodes.

Language Types

All types are distinguished by a negative varint7 values that is the first byte of their encoding (representing a type constructor):

Opcode	Type constructor
`-0x01` (i.e., the byte `0x7f`)	`i32`
`-0x02` (i.e., the byte `0x7e`)	`i64`
`-0x03` (i.e., the byte `0x7d`)	`f32`
`-0x04` (i.e., the byte `0x7c`)	`f64`
`-0x10` (i.e., the byte `0x70`)	`anyfunc`
`-0x20` (i.e., the byte `0x60`)	`func`
`-0x40` (i.e., the byte `0x40`)	pseudo type for representing an empty `block_type`

Some of these will be followed by additional fields, see below.

Note: Gaps are reserved for future extensions. The use of a signed scheme is so that types can coexist in a single space with (positive) indices into the type section, which may be relevant for future extensions of the type system.

`value_type`

A varint7 indicating a value type. One of:

i32
i64
f32
f64

as encoded above.

`block_type`

A varint7 indicating a block signature. These types are encoded as:

either a value_type indicating a signature with a single result
or -0x40 (i.e., the byte 0x40) indicating a signature with 0 results.

`elem_type`

A varint7 indicating the types of elements in a table. In the MVP, only one type is available:

anyfunc

Note: In the future, other element types may be allowed.

`func_type`

The description of a function signature. Its type constructor is followed by an additional description:

Field	Type	Description
form	`varint7`	the value for the `func` type constructor as defined above
param_count	`varuint32`	the number of parameters to the function
param_types	`value_type*`	the parameter types of the function
return_count	`varuint1`	the number of results from the function
return_type	`value_type?`	the result type of the function (if return_count is 1)

Note: In the future, return_count and return_type might be generalised to allow multiple values.

Other Types

`global_type`

The description of a global variable.

Field	Type	Description
content_type	`value_type`	type of the value
mutability	`varuint1`	`0` if immutable, `1` if mutable

`table_type`

The description of a table.

Field	Type	Description
element_type	`elem_type`	the type of elements
limits	`resizable_limits`	see below

`memory_type`

The description of a memory.

Field	Type	Description
limits	`resizable_limits`	see below

`external_kind`

A single-byte unsigned integer indicating the kind of definition being imported or defined:

0 indicating a Function import or definition
1 indicating a Table import or definition
2 indicating a Memory import or definition
3 indicating a Global import or definition

`resizable_limits`

A packed tuple that describes the limits of a table or memory:

Field	Type	Description
flags	`varuint1`	`1` if the maximum field is present, `0` otherwise
initial	`varuint32`	initial length (in units of table elements or wasm pages)
maximum	`varuint32`?	only present if specified by `flags`

Note: In the future, the “flags” field may be changed to varuint32, e.g., to include a flag for sharing between threads.

`init_expr`

The encoding of an initializer expression is the normal encoding of the expression followed by the end opcode as a delimiter.

Note that get_global in an initializer expression can only refer to immutable imported globals and all uses of init_expr can only appear after the Imports section.

The following documents the current prototype format. This format is based on and supersedes the v8-native prototype format, originally in a public design doc.

High-level structure

The module starts with a preamble of two fields:

Field	Type	Description
magic number	`uint32`	Magic number `0x6d736100` (i.e., ‘\0asm’)
version	`uint32`	Version number, `0x1`

The module preamble is followed by a sequence of sections. Each section is identified by a 1-byte section code that encodes either a known section or a custom section. The section length and payload data then follow. Known sections have non-zero ids, while custom sections have a 0 id followed by an identifying string as part of the payload.

Field	Type	Description
id	`varuint7`	section code
payload_len	`varuint32`	size of this section in bytes
name_len	`varuint32` ?	length of `name` in bytes, present if `id == 0`
name	`bytes` ?	section name: valid UTF-8 byte sequence, present if `id == 0`
payload_data	`bytes`	content of this section, of length `payload_len - sizeof(name) - sizeof(name_len)`

Each known section is optional and may appear at most once. Custom sections all have the same id (0), and can be named non-uniquely (all bytes composing their names may be identical).

Custom sections are intended to be used for debugging information, future evolution, or third party extensions. For MVP, we use a specific custom section (the Name Section) for debugging information.

If a WebAssembly implementation interprets the payload of any custom section during module validation or compilation, errors in that payload must not invalidate the module.

Known sections from the list below may not appear out of order, while custom sections may be interspersed before, between, as well as after any of the elements of the list, in any order. Certain custom sections may have their own ordering and cardinality requirements. For example, the Name section is expected to appear at most once, immediately after the Data section. Violation of such requirements may at most cause an implementation to ignore the section, while not invalidating the module.

The content of each section is encoded in its payload_data.

Section Name	Code	Description
Type	`1`	Function signature declarations
Import	`2`	Import declarations
Function	`3`	Function declarations
Table	`4`	Indirect function table and other tables
Memory	`5`	Memory attributes
Global	`6`	Global declarations
Export	`7`	Exports
Start	`8`	Start function declaration
Element	`9`	Elements section
Code	`10`	Function bodies (code)
Data	`11`	Data segments

The end of the last present section must coincide with the last byte of the module. The shortest valid module is 8 bytes (magic number, version, followed by zero sections).

Type section

The type section declares all function signatures that will be used in the module.

Field	Type	Description
count	`varuint32`	count of type entries to follow
entries	`func_type*`	repeated type entries as described above

Note: In the future, this section may contain other forms of type entries as well, which can be distinguished by the form field of the type encoding.

Import section

The import section declares all imports that will be used in the module.

Field	Type	Description
count	`varuint32`	count of import entries to follow
entries	`import_entry*`	repeated import entries as described below

Import entry

Field	Type	Description
module_len	`varuint32`	length of `module_str` in bytes
module_str	`bytes`	module name: valid UTF-8 byte sequence
field_len	`varuint32`	length of `field_str` in bytes
field_str	`bytes`	field name: valid UTF-8 byte sequence
kind	`external_kind`	the kind of definition being imported

Followed by, if the kind is Function:

Field	Type	Description
type	`varuint32`	type index of the function signature

or, if the kind is Table:

Field	Type	Description
type	`table_type`	type of the imported table

or, if the kind is Memory:

Field	Type	Description
type	`memory_type`	type of the imported memory

or, if the kind is Global:

Field	Type	Description
type	`global_type`	type of the imported global

Note that, in the MVP, only immutable global variables can be imported.

Function section

The function section declares the signatures of all functions in the module (their definitions appear in the code section).

Field	Type	Description
count	`varuint32`	count of signature indices to follow
types	`varuint32*`	sequence of indices into the type section

Table section

The encoding of a Table section:

Field	Type	Description
count	`varuint32`	indicating the number of tables defined by the module
entries	`table_type*`	repeated `table_type` entries as described above

In the MVP, the number of tables must be no more than 1.

Memory section

ID: memory

The encoding of a Memory section:

Field	Type	Description
count	`varuint32`	indicating the number of memories defined by the module
entries	`memory_type*`	repeated `memory_type` entries as described above

Note that the initial/maximum fields are specified in units of WebAssembly pages.

In the MVP, the number of memories must be no more than 1.

Global section

The encoding of the Global section:

Field	Type	Description
count	`varuint32`	count of global variable entries
globals	`global_variable*`	global variables, as described below

Global Entry

Each global_variable declares a single global variable of a given type, mutability and with the given initializer.

Field	Type	Description
type	`global_type`	type of the variables
init	`init_expr`	the initial value of the global

Note that, in the MVP, only immutable global variables can be exported.

Export section

The encoding of the Export section:

Field	Type	Description
count	`varuint32`	count of export entries to follow
entries	`export_entry*`	repeated export entries as described below

Export entry

Field	Type	Description
field_len	`varuint32`	length of `field_str` in bytes
field_str	`bytes`	field name: valid UTF-8 byte sequence
kind	`external_kind`	the kind of definition being exported
index	`varuint32`	the index into the corresponding index space

For example, if the “kind” is Function, then “index” is a function index. Note that, in the MVP, the only valid index value for a memory or table export is 0.

Start section

The start section declares the start function.

Field	Type	Description
index	`varuint32`	start function index

Element section

The encoding of the Elements section:

Field	Type	Description
count	`varuint32`	count of element segments to follow
entries	`elem_segment*`	repeated element segments as described below

a elem_segment is:

Field	Type	Description
index	`varuint32`	the table index (0 in the MVP)
offset	`init_expr`	an `i32` initializer expression that computes the offset at which to place the elements
num_elem	`varuint32`	number of elements to follow
elems	`varuint32*`	sequence of function indices

Code section

ID: code

The code section contains a body for every function in the module. The count of function declared in the function section and function bodies defined in this section must be the same and the ith declaration corresponds to the ith function body.

Field	Type	Description
count	`varuint32`	count of function bodies to follow
bodies	`function_body*`	sequence of Function Bodies

Data section

The data section declares the initialized data that is loaded into the linear memory.

Field	Type	Description
count	`varuint32`	count of data segments to follow
entries	`data_segment*`	repeated data segments as described below

a data_segment is:

Field	Type	Description
index	`varuint32`	the linear memory index (0 in the MVP)
offset	`init_expr`	an `i32` initializer expression that computes the offset at which to place the data
size	`varuint32`	size of `data` (in bytes)
data	`bytes`	sequence of `size` bytes

Name section

Custom section name field: "name"

The name section is a custom section. It is therefore encoded with id 0 followed by the name string "name". Like all custom sections, this section being malformed does not cause the validation of the module to fail. It is up to the implementation how it handles a malformed or partially malformed name section. The WebAssembly implementation is also free to choose to read and process this section lazily, after the module has been instantiated, should debugging be required.

The name section may appear only once, and only after the Data section. The expectation is that, when a binary WebAssembly module is viewed in a browser or other development environment, the data in this section will be used as the names of functions and locals in the text format.

The name section contains a sequence of name subsections:

Field	Type	Description
name_type	`varuint7`	code identifying type of name contained in this subsection
name_payload_len	`varuint32`	size of this subsection in bytes
name_payload_data	`bytes`	content of this section, of length `name_payload_len`

Since name subsections have a given length, unknown or unwanted subsections can be skipped over by an engine. The current list of valid name_type codes are:

Name Type	Code	Description
Module	`0`	Assigns a name to the module
Function	`1`	Assigns names to functions
Local	`2`	Assigns names to locals in functions

When present, subsections must appear in this order and at most once. The end of the last subsection must coincide with the last byte of the name section to be a well-formed name section.

Module name

The module name subsection assigns a name to the module itself. It simply consists of a single string:

Field	Type	Description
name_len	`varuint32`	length of `name_str` in bytes
name_str	`bytes`	UTF-8 encoding of the name

Name Map

In the following subsections, a name_map is encoded as:

Field	Type	Description
count	`varuint32`	number of `naming` in names
names	`naming*`	sequence of `naming` sorted by index

where a naming is encoded as:

Field	Type	Description
index	`varuint32`	the index which is being named
name_len	`varuint32`	length of `name_str` in bytes
name_str	`bytes`	UTF-8 encoding of the name

Function names

The function names subsection is a name_map which assigns names to a subset of the function index space (both imports and module-defined).

Each function may be named at most once. Naming a function more than once results in the section being malformed.

However, names need not be unique. The same name may be given for multiple functions. This is common for C++ programs where the multiple compilation units that comprise a binary can contain local functions with the same name.

Local names

The local names subsection assigns name_maps to a subset of functions in the function index space (both imports and module-defined). The name_map for a given function assigns names to a subset of local variable indices.

Field	Type	Description
count	`varuint32`	count of `local_names` in funcs
funcs	`local_names*`	sequence of `local_names` sorted by index

where a local_name is encoded as:

Field	Type	Description
index	`varuint32`	the index of the function whose locals are being named
local_map	`name_map`	assignment of names to local indices

Function bodies consist of a sequence of local variable declarations followed by bytecode instructions. Instructions are encoded as an opcode followed by zero or more immediates as defined by the tables below. Each function body must end with the end opcode.

Field	Type	Description
body_size	`varuint32`	size of function body to follow, in bytes
local_count	`varuint32`	number of local entries
locals	`local_entry*`	local variables
code	`byte*`	bytecode of the function
end	`byte`	`0x0b`, indicating the end of the body

Local Entry

Each local entry declares a number of local variables of a given type. It is legal to have several entries with the same type.

Field	Type	Description
count	`varuint32`	number of local variables of the following type
type	`value_type`	type of the variables

Control flow operators (described here)

Name	Opcode	Immediates	Description
`unreachable`	`0x00`		trap immediately
`nop`	`0x01`		no operation
`block`	`0x02`	sig : `block_type`	begin a sequence of expressions, yielding 0 or 1 values
`loop`	`0x03`	sig : `block_type`	begin a block which can also form control flow loops
`if`	`0x04`	sig : `block_type`	begin if expression
`else`	`0x05`		begin else expression of if
`end`	`0x0b`		end a block, loop, or if
`br`	`0x0c`	relative_depth : `varuint32`	break that targets an outer nested block
`br_if`	`0x0d`	relative_depth : `varuint32`	conditional break that targets an outer nested block
`br_table`	`0x0e`	see below	branch table control flow construct
`return`	`0x0f`		return zero or one value from this function

The sig fields of block and if operators specify function signatures which describe their use of the operand stack.

The br_table operator has an immediate operand which is encoded as follows:

Field	Type	Description
target_count	`varuint32`	number of entries in the target_table
target_table	`varuint32*`	target entries that indicate an outer block or loop to which to break
default_target	`varuint32`	an outer block or loop to which to break in the default case

The br_table operator implements an indirect branch. It accepts an optional value argument (like other branches) and an additional i32 expression as input, and branches to the block or loop at the given offset within the target_table. If the input value is out of range, br_table branches to the default target.

Note: Gaps in the opcode space, here and elsewhere, are reserved for future extensions.

Call operators (described here)

Name	Opcode	Immediates	Description
`call`	`0x10`	function_index : `varuint32`	call a function by its index
`call_indirect`	`0x11`	type_index : `varuint32`, reserved : `varuint1`	call a function indirect with an expected signature

The call_indirect operator takes a list of function arguments and as the last operand the index into the table. Its reserved immediate is for future use and must be 0 in the MVP.

Parametric operators (described here)

Name	Opcode	Immediates	Description
`drop`	`0x1a`		ignore value
`select`	`0x1b`		select one of two values based on condition

Variable access (described here)

Name	Opcode	Immediates	Description
`get_local`	`0x20`	local_index : `varuint32`	read a local variable or parameter
`set_local`	`0x21`	local_index : `varuint32`	write a local variable or parameter
`tee_local`	`0x22`	local_index : `varuint32`	write a local variable or parameter and return the same value
`get_global`	`0x23`	global_index : `varuint32`	read a global variable
`set_global`	`0x24`	global_index : `varuint32`	write a global variable

Memory-related operators (described here)

Name	Opcode	Immediate	Description
`i32.load`	`0x28`	`memory_immediate`	load from memory
`i64.load`	`0x29`	`memory_immediate`	load from memory
`f32.load`	`0x2a`	`memory_immediate`	load from memory
`f64.load`	`0x2b`	`memory_immediate`	load from memory
`i32.load8_s`	`0x2c`	`memory_immediate`	load from memory
`i32.load8_u`	`0x2d`	`memory_immediate`	load from memory
`i32.load16_s`	`0x2e`	`memory_immediate`	load from memory
`i32.load16_u`	`0x2f`	`memory_immediate`	load from memory
`i64.load8_s`	`0x30`	`memory_immediate`	load from memory
`i64.load8_u`	`0x31`	`memory_immediate`	load from memory
`i64.load16_s`	`0x32`	`memory_immediate`	load from memory
`i64.load16_u`	`0x33`	`memory_immediate`	load from memory
`i64.load32_s`	`0x34`	`memory_immediate`	load from memory
`i64.load32_u`	`0x35`	`memory_immediate`	load from memory
`i32.store`	`0x36`	`memory_immediate`	store to memory
`i64.store`	`0x37`	`memory_immediate`	store to memory
`f32.store`	`0x38`	`memory_immediate`	store to memory
`f64.store`	`0x39`	`memory_immediate`	store to memory
`i32.store8`	`0x3a`	`memory_immediate`	store to memory
`i32.store16`	`0x3b`	`memory_immediate`	store to memory
`i64.store8`	`0x3c`	`memory_immediate`	store to memory
`i64.store16`	`0x3d`	`memory_immediate`	store to memory
`i64.store32`	`0x3e`	`memory_immediate`	store to memory
`current_memory`	`0x3f`	reserved : `varuint1`	query the size of memory
`grow_memory`	`0x40`	reserved : `varuint1`	grow the size of memory

The memory_immediate type is encoded as follows:

Name	Type	Description
flags	`varuint32`	a bitfield which currently contains the alignment in the least significant bits, encoded as `log2(alignment)`
offset	`varuint32`	the value of the offset

As implied by the log2(alignment) encoding, the alignment must be a power of 2. As an additional validation criteria, the alignment must be less or equal to natural alignment. The bits after the log(memory-access-size) least-significant bits must be set to 0. These bits are reserved for future use (e.g., for shared memory ordering requirements).

The reserved immediate to the current_memory and grow_memory operators is for future use and must be 0 in the MVP.

Constants (described here)

Name	Opcode	Immediates	Description
`i32.const`	`0x41`	value : `varint32`	a constant value interpreted as `i32`
`i64.const`	`0x42`	value : `varint64`	a constant value interpreted as `i64`
`f32.const`	`0x43`	value : `uint32`	a constant value interpreted as `f32`
`f64.const`	`0x44`	value : `uint64`	a constant value interpreted as `f64`

Comparison operators (described here)

Name	Opcode	Immediate	Description
`i32.eqz`	`0x45`
`i32.eq`	`0x46`
`i32.ne`	`0x47`
`i32.lt_s`	`0x48`
`i32.lt_u`	`0x49`
`i32.gt_s`	`0x4a`
`i32.gt_u`	`0x4b`
`i32.le_s`	`0x4c`
`i32.le_u`	`0x4d`
`i32.ge_s`	`0x4e`
`i32.ge_u`	`0x4f`
`i64.eqz`	`0x50`
`i64.eq`	`0x51`
`i64.ne`	`0x52`
`i64.lt_s`	`0x53`
`i64.lt_u`	`0x54`
`i64.gt_s`	`0x55`
`i64.gt_u`	`0x56`
`i64.le_s`	`0x57`
`i64.le_u`	`0x58`
`i64.ge_s`	`0x59`
`i64.ge_u`	`0x5a`
`f32.eq`	`0x5b`
`f32.ne`	`0x5c`
`f32.lt`	`0x5d`
`f32.gt`	`0x5e`
`f32.le`	`0x5f`
`f32.ge`	`0x60`
`f64.eq`	`0x61`
`f64.ne`	`0x62`
`f64.lt`	`0x63`
`f64.gt`	`0x64`
`f64.le`	`0x65`
`f64.ge`	`0x66`

Numeric operators (described here)

Name	Opcode	Immediate	Description
`i32.clz`	`0x67`
`i32.ctz`	`0x68`
`i32.popcnt`	`0x69`
`i32.add`	`0x6a`
`i32.sub`	`0x6b`
`i32.mul`	`0x6c`
`i32.div_s`	`0x6d`
`i32.div_u`	`0x6e`
`i32.rem_s`	`0x6f`
`i32.rem_u`	`0x70`
`i32.and`	`0x71`
`i32.or`	`0x72`
`i32.xor`	`0x73`
`i32.shl`	`0x74`
`i32.shr_s`	`0x75`
`i32.shr_u`	`0x76`
`i32.rotl`	`0x77`
`i32.rotr`	`0x78`
`i64.clz`	`0x79`
`i64.ctz`	`0x7a`
`i64.popcnt`	`0x7b`
`i64.add`	`0x7c`
`i64.sub`	`0x7d`
`i64.mul`	`0x7e`
`i64.div_s`	`0x7f`
`i64.div_u`	`0x80`
`i64.rem_s`	`0x81`
`i64.rem_u`	`0x82`
`i64.and`	`0x83`
`i64.or`	`0x84`
`i64.xor`	`0x85`
`i64.shl`	`0x86`
`i64.shr_s`	`0x87`
`i64.shr_u`	`0x88`
`i64.rotl`	`0x89`
`i64.rotr`	`0x8a`
`f32.abs`	`0x8b`
`f32.neg`	`0x8c`
`f32.ceil`	`0x8d`
`f32.floor`	`0x8e`
`f32.trunc`	`0x8f`
`f32.nearest`	`0x90`
`f32.sqrt`	`0x91`
`f32.add`	`0x92`
`f32.sub`	`0x93`
`f32.mul`	`0x94`
`f32.div`	`0x95`
`f32.min`	`0x96`
`f32.max`	`0x97`
`f32.copysign`	`0x98`
`f64.abs`	`0x99`
`f64.neg`	`0x9a`
`f64.ceil`	`0x9b`
`f64.floor`	`0x9c`
`f64.trunc`	`0x9d`
`f64.nearest`	`0x9e`
`f64.sqrt`	`0x9f`
`f64.add`	`0xa0`
`f64.sub`	`0xa1`
`f64.mul`	`0xa2`
`f64.div`	`0xa3`
`f64.min`	`0xa4`
`f64.max`	`0xa5`
`f64.copysign`	`0xa6`

Conversions (described here)

Name	Opcode	Immediate	Description
`i32.wrap/i64`	`0xa7`
`i32.trunc_s/f32`	`0xa8`
`i32.trunc_u/f32`	`0xa9`
`i32.trunc_s/f64`	`0xaa`
`i32.trunc_u/f64`	`0xab`
`i64.extend_s/i32`	`0xac`
`i64.extend_u/i32`	`0xad`
`i64.trunc_s/f32`	`0xae`
`i64.trunc_u/f32`	`0xaf`
`i64.trunc_s/f64`	`0xb0`
`i64.trunc_u/f64`	`0xb1`
`f32.convert_s/i32`	`0xb2`
`f32.convert_u/i32`	`0xb3`
`f32.convert_s/i64`	`0xb4`
`f32.convert_u/i64`	`0xb5`
`f32.demote/f64`	`0xb6`
`f64.convert_s/i32`	`0xb7`
`f64.convert_u/i32`	`0xb8`
`f64.convert_s/i64`	`0xb9`
`f64.convert_u/i64`	`0xba`
`f64.promote/f32`	`0xbb`

Reinterpretations (described here)

Name	Opcode	Immediate	Description
`i32.reinterpret/f32`	`0xbc`
`i64.reinterpret/f64`	`0xbd`
`f32.reinterpret/i32`	`0xbe`
`f64.reinterpret/i64`	`0xbf`