akwizgran created page: BSP

2026-05-09 16:54:14 +02:00 · 2015-04-01 10:51:42 +00:00
parent aa80e8894e
commit 97bde15ee3
1 changed files with 39 additions and 12 deletions
@@ -10,29 +10,56 @@ We use || to denote concatenation, double quotes to denote an ASCII string, int(

 ### Crypto primitives

-BSP uses a cryptographic hash function, H(m), with an output length of HASH_LEN bytes.
+BSP uses a cryptographic hash function, H(m), with an output length of HASH_LEN bytes, and a random number generator, R(n), with an output length of n bytes. R(n) may be a true random number generator or a cryptographically secure pseudo-random number generator. We use H(m) to define a multi-argument hash function:
+
+HASH(x_1, ..., x_n) == H(len(x_1) || x_1 || ... || len(x_n) || x_n)
+
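As an illustration only, the multi-argument hash can be sketched in Python. The spec does not name H(m) or the width of len(x) in this section, so SHA-256 and a 32-bit big-endian length prefix are assumptions made here, as is HASH_LEN = 32.

```python
import hashlib
import struct

HASH_LEN = 32  # assumption: SHA-256 stands in for H(m)


def H(m: bytes) -> bytes:
    """Stand-in for the spec's hash function H(m)."""
    return hashlib.sha256(m).digest()


def HASH(*args: bytes) -> bytes:
    """HASH(x_1, ..., x_n) = H(len(x_1) || x_1 || ... || len(x_n) || x_n)."""
    # Assumption: len(x) is encoded as a 32-bit big-endian integer.
    data = b"".join(struct.pack(">I", len(x)) + x for x in args)
    return H(data)
```

The length prefixes keep argument boundaries unambiguous, so `HASH(b"ab", b"c")` and `HASH(b"a", b"bc")` hash different byte strings.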

 ### Channel identifiers

-Each channel has a unique identifier HASH_LEN bytes long. This identifier is supplied by the application and is not interpreted by BSP. To prevent collisions, the identifier must either be random, or be the cryptographic hash of an application data structure describing the channel. If a hash is used, a random application identifier HASH_LEN bytes long must be prepended to the application data structure before hashing to prevent collisions between applications with similar data structures.
+Each channel has a unique identifier HASH_LEN bytes long. This identifier is supplied by the application and is not interpreted by BSP. However, to prevent collisions between applications, BSP specifies how channel identifiers may be generated. The channel identifier may be random:
+
+channel_id = R(HASH_LEN)
+
+Alternatively, the channel identifier may be the hash of a random application identifier HASH_LEN bytes long and an application data structure describing the channel:
+
+app_id = R(HASH_LEN)
+channel_id = HASH(app_id, app_data_structure)
+
+Including the application identifier in the hash prevents collisions between applications with similar data structures.
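
The two ways of generating a channel identifier might look like the following sketch. SHA-256, HASH_LEN = 32, the 32-bit length prefix and the function names are assumptions for illustration; `secrets.token_bytes` plays the role of R(n).

```python
import hashlib
import secrets
import struct

HASH_LEN = 32  # assumption: SHA-256 stands in for H(m)


def HASH(*args: bytes) -> bytes:
    # Assumption: each argument is prefixed with a 32-bit big-endian length.
    data = b"".join(struct.pack(">I", len(x)) + x for x in args)
    return hashlib.sha256(data).digest()


def R(n: int) -> bytes:
    """Cryptographically secure random bytes, standing in for R(n)."""
    return secrets.token_bytes(n)


def random_channel_id() -> bytes:
    """channel_id = R(HASH_LEN)"""
    return R(HASH_LEN)


def derived_channel_id(app_id: bytes, app_data_structure: bytes) -> bytes:
    """channel_id = HASH(app_id, app_data_structure)"""
    return HASH(app_id, app_data_structure)
```

An application would generate `app_id = R(HASH_LEN)` once and reuse it for every channel it derives, so two applications that serialise identical data structures still obtain different channel identifiers.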

 ### Message format

-Each message consists of one or more blocks. Each block is BLOCK_LEN bytes long, except the last block of the message, which may be shorter. (We require that BLOCK_LEN <= 2^15 .) The blocks form the leaves of a binary hash tree. Each parent node consists of the concatenated hashes of its children. If the number of blocks in the message is not a power of two, some parent nodes will only have one child.
+Each message consists of one or more blocks. Each block is BLOCK_LEN bytes long, except the last block of the message, which may be shorter. (We require that BLOCK_LEN <= 2^15.)

-The message's unique identifier is calculated by hashing a message header concatenated with the root hash of the tree. The message header consists of the channel identifier, a timestamp, the message length and the message type.
+message = block_1 || ... || block_n
+
+The blocks form the leaves of a binary hash tree. Each parent node consists of the concatenated hashes of its children. If the number of blocks in the message is not a power of two, some parent nodes will only have one child. The hash of the root node is called the root hash of the tree.
+
+block_hash = HASH("BLOCK", block)
+parent_node = only_child_hash
+parent_node = left_child_hash || right_child_hash
+parent_hash = HASH("TREE", parent_node)
+
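A rough sketch of the root hash calculation follows. It assumes SHA-256 for H(m), a 32-bit length prefix in HASH, nodes paired left to right with an unpaired last node becoming an only child, and a single-block message whose root is the block itself. None of these details are fixed by the text above.

```python
import hashlib
import struct
from typing import Sequence


def HASH(*args: bytes) -> bytes:
    # Assumption: SHA-256 with 32-bit big-endian length prefixes.
    data = b"".join(struct.pack(">I", len(x)) + x for x in args)
    return hashlib.sha256(data).digest()


def split_blocks(message: bytes, block_len: int) -> list:
    """message = block_1 || ... || block_n; only the last block may be short."""
    return [message[i:i + block_len] for i in range(0, len(message), block_len)]


def block_hash(block: bytes) -> bytes:
    return HASH(b"BLOCK", block)


def root_hash(blocks: Sequence[bytes]) -> bytes:
    if not blocks:
        raise ValueError("a message has at least one block")
    level = [block_hash(b) for b in blocks]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            # One or two child hashes, concatenated in order.
            parent_node = b"".join(level[i:i + 2])
            parents.append(HASH(b"TREE", parent_node))
        level = parents
    return level[0]
```

With three blocks, the third block's hash is the only child of its parent, and that parent is then paired with the parent of the first two blocks.
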
+The message's unique identifier is calculated by hashing a message header and the root hash. The message header consists of the channel identifier, a timestamp, the message length and the message type.
+
+message_header = channel_id || timestamp || message_length || message_type
+message_id = HASH(message_header, root_hash)

 The timestamp is a 64-bit integer representing seconds since the Unix epoch. (All integers in BSP are big-endian.) The message length is a 64-bit integer representing the length of the message in bytes. The message type is a single byte that is supplied by the application and is not interpreted by BSP.
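
The header layout and message identifier can be sketched as below. The field widths come from the text; treating the integers as unsigned, using SHA-256 with a 32-bit length prefix for HASH, and the function names are assumptions.

```python
import hashlib
import struct

HASH_LEN = 32  # assumption: SHA-256 stands in for H(m)


def HASH(*args: bytes) -> bytes:
    # Assumption: each argument is prefixed with a 32-bit big-endian length.
    data = b"".join(struct.pack(">I", len(x)) + x for x in args)
    return hashlib.sha256(data).digest()


def message_header(channel_id: bytes, timestamp: int,
                   message_length: int, message_type: int) -> bytes:
    """channel_id || timestamp || message_length || message_type"""
    if len(channel_id) != HASH_LEN:
        raise ValueError("channel_id must be HASH_LEN bytes long")
    # Two big-endian 64-bit integers followed by a single type byte.
    return channel_id + struct.pack(">QQB", timestamp, message_length, message_type)


def message_id(header: bytes, root_hash: bytes) -> bytes:
    """message_id = HASH(message_header, root_hash)"""
    return HASH(header, root_hash)
```

Under these assumptions the header is always HASH_LEN + 17 bytes long, so it can be parsed without a length field.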

-Each block has a unique identifier, which is calculated by hashing the message header concatenated with a block header and the hash of the block itself. The block header consists of the block number and a list of hashes called the path.
+Each block has a unique identifier, which is calculated by hashing the message header, a block header and the hash of the block itself. The block header consists of the block number and a list of hashes called the path.

-The block number is a 64-bit integer starting from zero for the first block of the message. The path consists of the hashes of the siblings of the block's ancestors in the hash tree. If the number of blocks in the message is not a power of two, some of the block's ancestors may not have siblings. The positions of any such ancestors can be calculated from the message length and the block number, so the number of hashes in the path does not need to be recorded explicitly.
+block_header = block_number || path_1 || ... || path_n
+block_id = HASH("BLOCK_ID", message_header, block_header, block_hash)
+
+The block number is a 64-bit integer starting from zero for the first block of the message. The path consists of the hashes of the siblings of the block's ancestors in the hash tree. If the number of blocks in the message is not a power of two, some of the block's ancestors may not have siblings. The positions of any such ancestors can be calculated from the message length and the block number.

 A block accompanied by its message and block headers is called a portable block. A portable block is valid if the message length, block number, path and block length are consistent with each other. A valid portable block contains all the information needed to calculate the message identifier. Any valid portable blocks that produce the same message identifier are guaranteed to be consistent with each other.

 ### Records

-The local and remote peers synchronise data by sending simplex streams of bytes to each other. A stream consists of a series of records, each of which starts with a record header with the following format:
+The local and remote peers synchronise data by sending simplex streams of bytes to each other. Each stream consists of a series of records. Each record starts with a record header that is 4 bytes long, with the following format:

 * Bits 0-7: Protocol version
 * Bits 8-15: Record type
@@ -40,7 +67,7 @@ The local and remote peers synchronise data by sending simplex streams of bytes

 A stream may contain records of any type in any order. If the recipient does not recognise a record's protocol version or record type, the recipient skips to the next record in the stream.

-The current version of the protocol is 1, with five record types:
+The current version of the protocol is 1, which has five record types:

 **0: OFFER** - The payload consists of one or more block identifiers. This record informs the recipient that the sender holds the listed blocks and asks the recipient whether to send them.

@@ -62,7 +89,7 @@ The local peer stores the following synchronisation state for each block it hold
 * Ack flag - Set to 1 if the local peer needs to acknowledge the block, otherwise 0
 * Request flag - Set to 1 if the remote peer has requested the block since it was last sent, otherwise 0
 * Send count - The number of times the block has been offered or sent to the remote peer
-* Send time - A timestamp indicating when the block can next be offered or sent to the remote peer; measured in seconds since the Unix epoch and initialised to 0
+* Send time - A timestamp indicating when the block can next be offered or sent to the remote peer, measured in seconds since the Unix epoch and initialised to 0

 The local peer also stores a list of message identifiers that have been offered by the remote peer and not yet acknowledged or requested by the local peer. The length of this list should be bounded, and the local peer should discard the oldest identifiers if the maximum length is reached.

@@ -70,7 +97,7 @@ The local peer also stores a list of message identifiers that have been offered

 The sender of each stream decides whether the stream will use interactive mode or batch mode. Interactive mode uses less bandwidth than batch mode, but needs two round-trips for synchronisation, whereas batch mode needs one. The sender's choice may depend on prior knowledge or measurements of the underlying transport. BSP does not specify how to decide which mode to use - the local and remote peers may use different criteria, and peers may change their criteria at any time.

-In interactive mode, blocks are offered before being sent. The sender does the following (in any order):
+In interactive mode, blocks are offered before being sent. The sender does the following, in any order:

 * Acknowledge any blocks sent by the recipient
 * Acknowledge any blocks offered by the recipient that the sender holds
@@ -78,7 +105,7 @@ In interactive mode, blocks are offered before being sent. The sender does the f
 * Offer any blocks that the sender does not know whether the recipient holds
 * Send any blocks requested by the recipient

-In batch mode, blocks are sent without being offered. The sender does the following (in any order):
+In batch mode, blocks are sent without being offered. The sender does the following, in any order:

 * Acknowledge any blocks sent by the recipient
 * Acknowledge any blocks offered by the recipient that the sender holds
@@ -90,7 +117,7 @@ A block is not offered or sent until its send time is reached.

 ### Retransmission

-Whenever the local peer offers or sends a block it updates the block's send count and send time. BSP does not specify how the send time should be updated, except that the amount by which it is updated should increase exponentially with the send count. The local peer may base the updates on measurements of the transport's round-trip time and round-trip time variance, as in TCP, or it may use any other method.
+Whenever the local peer offers or sends a block it increases the block's send count and send time. BSP does not specify how the send time should be increased, except that the size of the increase should grow exponentially with the send count. The local peer may base the increase on measurements of the transport's round-trip time and round-trip time variance, as in TCP, or it may use any other method.
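
One possible backoff policy, shown only as a sketch: the delay before the next offer or send doubles with each send count. The `BlockState` class, the `record_send` function and the 10-second base interval are hypothetical; BSP leaves the actual policy to the implementation.

```python
from dataclasses import dataclass


@dataclass
class BlockState:
    send_count: int = 0  # times the block has been offered or sent
    send_time: int = 0   # earliest next offer/send, seconds since the Unix epoch


def record_send(state: BlockState, now: int, base_interval: int = 10) -> None:
    """Update a block's state after it has been offered or sent at time `now`."""
    state.send_count += 1
    # Hypothetical policy: the delay doubles with every offer or send.
    delay = base_interval * 2 ** (state.send_count - 1)
    state.send_time = now + delay


def can_send(state: BlockState, now: int) -> bool:
    """A block is not offered or sent until its send time is reached."""
    return now >= state.send_time
```

A peer that measures round-trip times could replace the fixed `base_interval` with an estimate derived from those measurements.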

 ### Resetting