Briar: Serialisation
This document describes the serialisation format used by the Briar messaging protocol. The format is broadly similar to other space-efficient binary formats such as MessagePack.
The format's primitive types are boolean, uint7, int8, int16, int32, int64, float32, float64, string, raw and null. Lists, maps and structs can be constructed from these primitives, and can be nested.
All integers use big-endian two's complement representation, all floating point numbers use IEEE 754, and all strings use UTF-8.
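The three conventions above map directly onto standard library facilities in most languages. As a minimal illustration in Python (chosen here for convenience; it is not part of the Briar specification), the `>` prefix of the `struct` module selects big-endian byte order:

```python
import struct

# Big-endian two's complement: -1 as a 16-bit integer is FF FF.
int16_bytes = struct.pack(">h", -1)

# Big-endian 32-bit integer: 1 is 00 00 00 01.
int32_bytes = struct.pack(">i", 1)

# IEEE 754 single precision, big-endian: 1.0 is 3F 80 00 00.
float32_bytes = struct.pack(">f", 1.0)

# UTF-8: "é" (U+00E9) occupies two bytes, C3 A9.
utf8_bytes = "é".encode("utf-8")

print(int16_bytes.hex(), int32_bytes.hex(), float32_bytes.hex(), utf8_bytes.hex())
```

These are only the payload bytes; in the serialised form each value is preceded by a tag identifying its type.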
Tags
The type of each serialised object can be identified by examining its first byte, which is called a tag:
- 0xFF: Boolean false.
- 0xFE: Boolean true.
- 0xFD: Int8. The following byte contains a signed 8-bit integer.
- 0xFC: Int16. The following two bytes contain a signed 16-bit integer.
- 0xFB: Int32. The following four bytes contain a signed 32-bit integer.
- 0xFA: Int64. The following eight bytes contain a signed 64-bit integer.
- 0xF9: Float32. The following four bytes contain a 32-bit floating point number.
- 0xF8: Float64. The following eight bytes contain a 64-bit floating point number.
- 0xF7: String. The tag is followed by a length specification and the specified number of UTF-8 bytes. A length specification is an integer between 0 and 2^31 - 1, inclusive (the largest value a signed int32 can hold), which must be encoded using the shortest possible integer type; thus a length specification will always be encoded as a uint7, int16, or int32. An int8 is never used, because every non-negative int8 value also fits in a uint7.
- 0xF6: Raw. The tag is followed by a length specification and the specified number of 'raw' bytes, which are not interpreted by the parser.
- 0xF5: List. The tag is followed by zero or more serialised objects, which are the list's elements, followed by an end tag (0xF3).
- 0xF4: Map. The tag is followed by zero or more pairs of serialised objects, which are the map's key-value pairs, followed by an end tag (0xF3).
- 0xF3: End. Used to mark the end of a list or map.
- 0xF2: Null. Used to indicate that an optional object is absent.
- 0xF1: Struct. The tag is followed by a byte identifying one of 256 possible structs. The interpretation of the following bytes, including the determination of where the struct ends, depends upon the struct's type definition (see below).
- 0x00 - 0x7F: Uint7. The least significant seven bits of the tag contain an unsigned 7-bit integer representing a value between 0 and 127 inclusive.
Examples:
- 0x00: The integer zero, encoded as a uint7.
- 0xFC FFFF: The integer -1, encoded as an int16.
- 0xF7 00: The empty string, with its length of zero encoded as a uint7.
- 0xF7 01 20: The string " ", with its length of one encoded as a uint7.
- 0xF5 F700 F700 F3: A list containing two empty strings.
- 0xF5 F4 F70120 F2 F3 F3: A list containing a map containing a key of " " and a value of null.
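The tag rules above can be sketched as a small encoder. This is an illustrative sketch rather than Briar's implementation: the function names `encode` and `encode_int` and the mapping from Python types to serialised types are assumptions made for the example, and floats are always written as float64 for simplicity.

```python
import struct

# Tag values taken from the list above.
FALSE, TRUE = 0xFF, 0xFE
INT8, INT16, INT32, INT64 = 0xFD, 0xFC, 0xFB, 0xFA
FLOAT64 = 0xF8
STRING, RAW, LIST, MAP, END, NULL = 0xF7, 0xF6, 0xF5, 0xF4, 0xF3, 0xF2


def encode_int(n: int) -> bytes:
    """Encode an integer using the shortest integer type that can hold it."""
    if 0 <= n <= 0x7F:
        return bytes([n])  # uint7: the tag itself carries the value
    if -0x80 <= n <= 0x7F:
        return bytes([INT8]) + struct.pack(">b", n)
    if -0x8000 <= n <= 0x7FFF:
        return bytes([INT16]) + struct.pack(">h", n)
    if -0x80000000 <= n <= 0x7FFFFFFF:
        return bytes([INT32]) + struct.pack(">i", n)
    return bytes([INT64]) + struct.pack(">q", n)


def encode(obj) -> bytes:
    """Serialise a Python value (hypothetical type mapping, for illustration)."""
    if obj is None:
        return bytes([NULL])
    if isinstance(obj, bool):  # must precede the int check: bool subclasses int
        return bytes([TRUE if obj else FALSE])
    if isinstance(obj, int):
        return encode_int(obj)
    if isinstance(obj, float):
        return bytes([FLOAT64]) + struct.pack(">d", obj)
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        # For non-negative lengths encode_int yields a uint7, int16 or int32.
        return bytes([STRING]) + encode_int(len(data)) + data
    if isinstance(obj, bytes):
        return bytes([RAW]) + encode_int(len(obj)) + obj
    if isinstance(obj, list):
        return bytes([LIST]) + b"".join(encode(e) for e in obj) + bytes([END])
    if isinstance(obj, dict):
        body = b"".join(encode(k) + encode(v) for k, v in obj.items())
        return bytes([MAP]) + body + bytes([END])
    raise TypeError(f"unsupported type: {type(obj).__name__}")


print(encode([{" ": None}]).hex())  # f5f4f70120f2f3f3
```

Because this sketch always picks the shortest integer type, `encode(-1)` produces FD FF (an int8) rather than the int16 form shown in the examples; only length specifications are required to use the shortest type.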
Structs
A struct is an ordered sequence of other types, which are called its components. A struct's type definition lists the order and types of its components, and whether each component is optional or mandatory. If an optional component is absent from a struct then its place must be marked by a null. The type definition may place constraints on the values of components.
Structs can contain primitive types, lists, maps, and other structs. Up to 256 structs can be defined, each of which is assigned an identifier between 0 and 255 inclusive.
As an example, we define two structs: cat and lolcat. A cat consists of a mandatory name, represented as a string of 1 - 30 UTF-8 bytes; an optional age in days, represented as a positive integer; and an optional sex, represented as a boolean, where true represents male and false represents female. A lolcat consists of a mandatory cat and a mandatory caption; the caption is represented as a string of 1 - 100 UTF-8 bytes. The struct cat is assigned the identifier 0, while lolcat is assigned the identifier 1.
Examples:
- 0xF1 00 F703666F6F 7F F2: A cat with the name "foo", age 127 days and unknown sex.
- 0xF1 01 F1 00 F703666F6F 7F F2 F703626172: A lolcat consisting of the same cat with the caption "bar".
- 0xF1 00 F700 F2 F2: A cat with an empty name, unknown age and unknown sex. This object is invalid because the type definition specifies that the name must contain 1 - 30 UTF-8 bytes.
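The cat and lolcat definitions can be sketched as follows. The helper names (`encode_uint`, `encode_string`, `encode_cat`, `encode_lolcat`) are hypothetical and chosen for this example only, and the sketch assumes ages fit in an int32. Optional components that are absent are written as a null, and the constraints from the type definitions are checked before encoding.

```python
import struct
from typing import Optional

STRUCT, STRING, NULL, TRUE, FALSE = 0xF1, 0xF7, 0xF2, 0xFE, 0xFF
CAT_ID, LOLCAT_ID = 0, 1  # identifiers assigned in the text above


def encode_uint(n: int) -> bytes:
    """Encode a non-negative integer as the shortest of uint7, int16, int32."""
    if n < 0 or n > 0x7FFFFFFF:
        raise ValueError("out of range for this sketch")
    if n <= 0x7F:
        return bytes([n])
    if n <= 0x7FFF:
        return b"\xFC" + struct.pack(">h", n)
    return b"\xFB" + struct.pack(">i", n)


def encode_string(s: str, min_len: int, max_len: int) -> bytes:
    """Encode a string, enforcing a length constraint in UTF-8 bytes."""
    data = s.encode("utf-8")
    if not min_len <= len(data) <= max_len:
        raise ValueError(f"string must be {min_len} - {max_len} UTF-8 bytes")
    return bytes([STRING]) + encode_uint(len(data)) + data


def encode_cat(name: str, age_days: Optional[int] = None,
               male: Optional[bool] = None) -> bytes:
    out = bytes([STRUCT, CAT_ID]) + encode_string(name, 1, 30)
    if age_days is None:
        out += bytes([NULL])  # absent optional component
    else:
        if age_days <= 0:
            raise ValueError("age must be a positive integer")
        out += encode_uint(age_days)
    if male is None:
        out += bytes([NULL])
    else:
        out += bytes([TRUE if male else FALSE])
    return out


def encode_lolcat(cat: bytes, caption: str) -> bytes:
    """Wrap an already-encoded cat together with its caption."""
    return bytes([STRUCT, LOLCAT_ID]) + cat + encode_string(caption, 1, 100)


foo = encode_cat("foo", age_days=127)
print(foo.hex())                        # f100f703666f6f7ff2
print(encode_lolcat(foo, "bar").hex())  # f101f100f703666f6f7ff2f703626172
```

Calling `encode_cat("")` raises `ValueError`, mirroring the invalid example above: the bytes could be written, but they would violate the type definition.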
Compact Encodings
Short strings, raws, lists and maps can be represented in a compact way by encoding the length in the tag, so the object is only one byte larger than its contents.
Structs with identifiers less than 32 can be encoded in a similar way, so identifiers less than 32 should be used for structs that are expected to be used frequently.
- 0xC0 - 0xDF: Short struct. A compact encoding for the first 32 structs. The least significant five bits of the tag contain an unsigned 5-bit integer identifying a struct.
- 0xB0 - 0xBF: Short map. A compact encoding for maps with fewer than 16 key-value pairs. The least significant four bits of the tag contain an unsigned 4-bit integer representing the number of key-value pairs in the map.
- 0xA0 - 0xAF: Short list. A compact encoding for lists with fewer than 16 elements. The least significant four bits of the tag contain an unsigned 4-bit integer representing the number of elements in the list.
- 0x90 - 0x9F: Short raw. A compact encoding for raws of fewer than 16 bytes. The least significant four bits of the tag contain an unsigned 4-bit integer representing the number of raw bytes.
- 0x80 - 0x8F: Short string. A compact encoding for UTF-8 strings of fewer than 16 bytes. The least significant four bits of the tag contain an unsigned 4-bit integer representing the number of UTF-8 bytes.
Examples:
- 0x80: The empty string, encoded as a short string.
- 0x81 20: The string " ", encoded as a short string.
- 0xA2 80 80: A list containing two empty strings.
- 0xA1 B1 8120 F2: A list containing a map containing a key of " " and a value of null.
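The compact tags can be sketched in the same style. As before, the names `short_tag`, `short_struct_tag` and `encode_compact` are assumptions made for illustration. The sketch handles only values that fit the compact forms and raises an error otherwise; a complete encoder would presumably fall back to the full-length tags instead.

```python
# Base tags for the compact encodings listed above; the low bits hold a length or id.
SHORT_STRING, SHORT_RAW, SHORT_LIST, SHORT_MAP, SHORT_STRUCT = 0x80, 0x90, 0xA0, 0xB0, 0xC0
NULL = 0xF2


def short_tag(base: int, n: int, limit: int = 16) -> bytes:
    """Pack n into the low bits of a compact tag, rejecting values that do not fit."""
    if not 0 <= n < limit:
        raise ValueError(f"{n} does not fit in this compact encoding")
    return bytes([base | n])


def short_struct_tag(struct_id: int) -> bytes:
    """Tag for one of the first 32 structs (identifier in the low five bits)."""
    return short_tag(SHORT_STRUCT, struct_id, limit=32)


def encode_compact(obj) -> bytes:
    """Encode None, str, bytes, list and dict using only compact forms."""
    if obj is None:
        return bytes([NULL])
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return short_tag(SHORT_STRING, len(data)) + data
    if isinstance(obj, bytes):
        return short_tag(SHORT_RAW, len(obj)) + obj
    if isinstance(obj, list):
        # The element count is in the tag, so no end tag is written.
        return short_tag(SHORT_LIST, len(obj)) + b"".join(encode_compact(e) for e in obj)
    if isinstance(obj, dict):
        body = b"".join(encode_compact(k) + encode_compact(v) for k, v in obj.items())
        return short_tag(SHORT_MAP, len(obj)) + body
    raise TypeError(f"unsupported type: {type(obj).__name__}")


print(encode_compact(["", ""]).hex())        # a28080
print(encode_compact([{" ": None}]).hex())   # a1b18120f2
```

Compared with the full-length encodings of the same values, the list of two empty strings shrinks from six bytes to three, and the nested list and map from eight bytes to five.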