cip: 19
title: Shwap Protocol
description: Shwap - a new messaging framework for DA and sampling
author: Hlib Kanunnikov (@Wondertan)
discussions-to: https://forum.celestia.org/t/cip-shwap-protocol/1551
status: Review
type: Standards Track
category: Data Availability, Networking
created: 2024-02-02

Abstract

This document specifies Shwap (a portmanteau of share and swap), a simple, expressive, and extensible messaging framework that aims to solve critical inefficiencies and standardize messaging in Celestia’s Data Availability p2p network.

Shwap defines messages to be exchanged over the DA p2p network in a trust-minimized way without enforcing transport (QUIC/TCP or IP) or application-layer protocol semantics (e.g., HTTP/x). Using this framework, Shwap declares the most common messages and provides options for stacking them with lower-level protocols. Shwap can be stacked together with application protocols like HTTP/x, KadDHT, Bitswap, or any custom protocol.

Motivation

The current Data Availability Sampling (DAS) network protocol is inefficient. A single sample operation takes log₂(k) network roundtrips (where k is the extended square size). This is not practical and does not scale for the theoretically unlimited data square that the Celestia network enables. The main motivation is a protocol with O(1) roundtrips for multiple samples, while preserving the 1/N honest-peer assumption, i.e., that at least one connected peer possesses the data.

Initially, Bitswap and IPLD were adopted as the basis for the DA network protocols, including DAS, block synchronization (BS), and blob/namespace data retrieval (ND). They provided battle-tested protocols and tooling with pluggability to rapidly scaffold Celestia’s DA network. However, this came at the price of scalability limits and extra roundtrips, resulting in BS being slower than block production. Before the network launch, we transitioned to the optimized ShrEx protocol for BS and integrated CAR- and DAGStore-based storage, optimizing BS and ND. However, DAS was left untouched, preserving its weak scalability and roundtrip inefficiency.

Shwap messaging stacked together with the Bitswap protocol directly addresses the described inefficiency and provides a foundation for efficient communication for BS, ND, and beyond.

Rationale

The atomic primitive of Celestia’s DA network is the share. Shwap standardizes messaging and serialization for shares. Shares are grouped together, forming more complex data types (Rows, Blobs, etc.). These data types are encapsulated in containers. For example, a row container groups the shares of a particular row. Containers can be identified with the share identifiers in order to request, advertise or index the containers. The combination of containers and identifiers provides an extensible and expressive messaging framework for groups of shares and enables efficient single roundtrip request-response communication.

Many share groups or containers are known in the Celestia network, and systemizing them is the main reason for setting up this simple messaging framework. A single place with all the possible Celestia DA messages must be defined, which node software and protocol researchers can rely on and coordinate around. Besides, this framework is designed to sustain changes in the core protocol’s data structures and proving system as long as shares stay the de facto atomic data type.

Besides, there needs to be systematization and a joint knowledge base covering all the edge cases of possible protocol compositions of Shwap with lower-level protocols such as Bitswap, KadDHT, or ShrEx, which Shwap aims to describe.

Specification

Terms and Definitions

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Commonly used terms in this document are described below.

Shwap: The protocol described by this document. Shwap is a portmanteau name of words share and swap.

Share: The core data structure of DataSquare “swapped” between peers.

DataSquare: The DA square format used by Celestia DA network.

DAH: The Data Availability Header with Row and Column commitments.

Namespace: The namespace grouping sets of shares.

Peer: An entity that can participate in a Shwap protocol. There are three types of peers: client, server, and node.

Client: The Peer that requests content by share identifiers over Shwap.

Server: The Peer that responds with shares over Shwap.

Node: The peer that is both the client and the server.

Proof: A Merkle inclusion proof of the data in the DataSquare.

Message Framework

This section defines Shwap’s messaging framework. Every group of shares that needs to be exchanged over the network MUST define its share identifier and share container and follow the rules described for them. Every identifier and container MUST define its serialization format, which MAY differ from those of other identifiers and containers.

Share Containers

Share containers encapsulate a set of data shares with proof. Share containers are identified by share identifiers.

Containers SHOULD contain shares within a single DataSquare, and those shares need not be adjacent. Containers MUST have a DAH inclusion proof field defined.

Serialization for Share Containers

Share containers are RECOMMENDED to use protobuf (proto3) encoding, and other formats MAY be used for serialization. A container MAY define multiple serialization formats.

Share Identifiers

Share identifiers identify share containers. Identifiers are not collision-resistant and there MAY be different identifiers referencing the same container. There SHOULD NOT be multiple containers identified by the same identifier.

Identifiers MAY embed each other to narrow down the scope of needed shares. For example, SampleID embeds RowID, as every sample lies on a particular row.

Serialization for Share Identifiers

Share identifiers MUST be serialized by concatenating big-endian representations of fields in the order defined by their respective formatting section. Serialized identifiers SHOULD have constant and predetermined lengths in bytes.

Versioning

If a defined share container or identifier requires an incompatible change, the new message type MAY be introduced suffixed with a new major version starting from v1. E.g., if the Row message needs a revision, RowV1 is created.

Messages

This section defines all the supported Shwap messages, including share identifiers and containers. All the new future messages should be described in this section.

EdsID

EdsID identifies the DataSquare.

EdsID identifiers are formatted as shown below:

EdsID {
    Height: u64;
}

The fields with validity rules that form EdsID are:

Height: A uint64 representing the chain height with the data square. It MUST be greater than zero.

Serialized EdsID MUST have a length of 8 bytes.
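
To illustrate the serialization rule, below is a minimal Go sketch that encodes and decodes an EdsID as 8 big-endian bytes. The EdsID struct and method names are illustrative assumptions, not the reference implementation’s API.

package shwap

import (
	"encoding/binary"
	"fmt"
)

// EdsID sketch; field names mirror the identifier format above.
type EdsID struct {
	Height uint64
}

// MarshalBinary serializes the EdsID as 8 big-endian bytes.
func (id EdsID) MarshalBinary() ([]byte, error) {
	if id.Height == 0 {
		return nil, fmt.Errorf("EdsID: Height must be greater than zero")
	}
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, id.Height)
	return buf, nil
}

// UnmarshalBinary parses the 8-byte big-endian form back into an EdsID.
func (id *EdsID) UnmarshalBinary(data []byte) error {
	if len(data) != 8 {
		return fmt.Errorf("EdsID: expected 8 bytes, got %d", len(data))
	}
	id.Height = binary.BigEndian.Uint64(data)
	if id.Height == 0 {
		return fmt.Errorf("EdsID: Height must be greater than zero")
	}
	return nil
}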

Eds Container

Eds containers encapsulate the DataSquare. Internally, they only keep the original data (1st quadrant) of the EDS with redundant data (2nd, 3rd and 4th quadrants) computable from the original data.

Eds containers MUST be formatted by serializing the original data square (ODS) left-to-right, share-by-share, in row-major order.

Due to the ever-growing nature of the DataSquare, Eds containers SHOULD be streamed over reliable links using the share-by-share formatting above.
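
As a hedged sketch of this formatting, the Go snippet below streams an in-memory ODS to an io.Writer share-by-share in row-major order; the [][][]byte layout (rows of shares) is an assumption made for illustration.

package shwap

import (
	"fmt"
	"io"
)

// WriteODS streams the original data square (ODS) share-by-share in
// row-major order, matching the Eds container formatting above.
// The rows-of-shares layout ([][][]byte) is illustrative only.
func WriteODS(w io.Writer, ods [][][]byte) error {
	for rowIdx, row := range ods {
		for colIdx, share := range row {
			if _, err := w.Write(share); err != nil {
				return fmt.Errorf("writing share (%d, %d): %w", rowIdx, colIdx, err)
			}
		}
	}
	return nil
}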

RowID

RowID identifies the Row shares container in a DataSquare.

RowID identifiers are formatted as shown below:

RowID {
    EdsID;
    RowIndex: u16;
}

The fields with validity rules that form RowID are:

EdsID: An EdsID of the Row Container. It MUST follow EdsID formatting and field validity rules.

RowIndex: A uint16 representing the row index, pointing to a particular row. The 16-bit limit fits data squares of up to 2TB. It MUST NOT exceed the number of DAH Row roots reduced by one.

Serialized RowID MUST have a length of 10 bytes.

Row Container

Row containers encapsulate the rows of the DataSquare. Internally, they only keep the left (original) half of the row, with the right (redundant) half recomputable from the left half.

Row containers are protobuf formatted using the following proto3 schema:

syntax = "proto3";

message Row {
  repeated Share shares_half = 1;
  HalfSide half_side = 2;
  
  enum HalfSide {
    LEFT = 0;
    RIGHT = 1;
  }
}

The fields with validity rules that form Row containers are:

SharesHalf: A variable-size Share array representing either the left or the right half of a row; which half it is, is indicated by the HalfSide field. Its length MUST be equal to the number of Column roots in the DAH divided by two. The opposite half is computed using Leopard Reed-Solomon erasure coding. The Leopard algorithm MUST operate over 8-bit Galois Fields for rows with a total size less than or equal to 256 shares, or 16-bit Galois Fields otherwise. Afterward, the NMT is built over both halves, and the computed NMT root MUST be equal to the respective Row root in the DAH.

HalfSide: An enum defining which side of the row the SharesHalf field contains. It MUST be either LEFT or RIGHT.
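
The following Go sketch outlines the Row container validity check described above: the missing half is recomputed via erasure coding, and the resulting NMT root is compared against the DAH Row root. The extendRow and computeNMTRoot function variables are stand-ins for the Leopard Reed-Solomon encoder and the NMT builder; they are assumptions for illustration, not the reference implementation’s API.

package shwap

import (
	"bytes"
	"fmt"
)

// Stand-ins for the Leopard Reed-Solomon encoder and the NMT builder;
// their signatures are illustrative assumptions.
var (
	extendRow      func(half [][]byte, leftSide bool) ([][]byte, error)
	computeNMTRoot func(fullRow [][]byte) ([]byte, error)
)

// VerifyRow recomputes the missing half of a Row container and checks the
// resulting NMT root against the respective Row root from the DAH.
func VerifyRow(sharesHalf [][]byte, leftSide bool, dahRowRoot []byte) error {
	// Recompute the full row from the provided half via erasure coding.
	fullRow, err := extendRow(sharesHalf, leftSide)
	if err != nil {
		return fmt.Errorf("extending row: %w", err)
	}
	// Build the NMT over the full row and compare against the DAH Row root.
	root, err := computeNMTRoot(fullRow)
	if err != nil {
		return fmt.Errorf("computing NMT root: %w", err)
	}
	if !bytes.Equal(root, dahRowRoot) {
		return fmt.Errorf("row root does not match the DAH Row root")
	}
	return nil
}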

SampleID

SampleID identifies a Sample container of a single share in a DataSquare.

SampleID identifiers are formatted as shown below:

SampleID {
    RowID;
    ColumnIndex: u16; 
}

The fields with validity rules that form SampleID are:

RowID: A RowID of the sample. It MUST follow RowID formatting and field validity rules.

ColumnIndex: A uint16 representing the column index of the sampled share; in other words, the share index within the row. The 16-bit limit fits data squares of up to 2TB. It MUST NOT exceed the number of DAH Column roots reduced by one.

Serialized SampleID MUST have a length of 12 bytes.
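
Building on the EdsID sketch above, the hedged Go snippet below shows how SampleID embeds RowID (which in turn embeds EdsID) and serializes to 12 bytes by concatenating big-endian fields in spec order; struct and method names are illustrative assumptions.

package shwap

import "encoding/binary"

// RowID and SampleID sketches; Go struct embedding mirrors the identifier layout.
type RowID struct {
	EdsID
	RowIndex uint16
}

type SampleID struct {
	RowID
	ColumnIndex uint16
}

// MarshalBinary concatenates the big-endian fields in spec order:
// Height (8 bytes) | RowIndex (2 bytes) | ColumnIndex (2 bytes) = 12 bytes.
func (id SampleID) MarshalBinary() ([]byte, error) {
	buf := make([]byte, 0, 12)
	buf = binary.BigEndian.AppendUint64(buf, id.Height)
	buf = binary.BigEndian.AppendUint16(buf, id.RowIndex)
	buf = binary.BigEndian.AppendUint16(buf, id.ColumnIndex)
	return buf, nil
}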

Sample Container

Sample containers encapsulate single shares of the DataSquare.

Sample containers are protobuf formatted using the following proto3 schema:

syntax = "proto3";

message Sample {
    Share share = 1;
    Proof proof = 2;
    AxisType proof_type = 3;
}

The fields with validity rules that form Sample containers are:

Share: A Share of a sample.

Proof: A protobuf-formatted NMT proof of share inclusion. It MUST follow NMT proof verification and be verified against the respective root from the Row or Column axis in the DAH. The axis is defined by the ProofType field.

AxisType: An enum defining which axis root the Proof field is coming from. It MUST be either ROW or COL.
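
A minimal sketch of these validity rules follows, assuming a plain DAH struct and a stand-in verifyInclusion function for NMT proof verification (both illustrative assumptions): the proof is checked against the Row or Column root selected by the proof_type field.

package shwap

import "fmt"

// DAH sketch holding Row and Column roots; verifyInclusion stands in for
// NMT inclusion proof verification. Both are illustrative assumptions.
type DAH struct {
	RowRoots    [][]byte
	ColumnRoots [][]byte
}

var verifyInclusion func(proof, share, root []byte) bool

// VerifySample checks a Sample's NMT proof against the DAH root selected by
// the proof_type (AxisType) field.
func VerifySample(dah DAH, rowIdx, colIdx int, share, proof []byte, proofIsRow bool) error {
	var root []byte
	if proofIsRow {
		root = dah.RowRoots[rowIdx] // proof_type == ROW
	} else {
		root = dah.ColumnRoots[colIdx] // proof_type == COL
	}
	if !verifyInclusion(proof, share, root) {
		return fmt.Errorf("sample inclusion proof does not verify against the DAH root")
	}
	return nil
}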

RowNamespaceDataID

RowNamespaceDataID identifies a RowNamespaceData container of shares within a single Row. That is, namespace shares spanning multiple Rows are identified with multiple identifiers.

RowNamespaceDataID identifiers are formatted as shown below:

RowNamespaceDataID {
    RowID;
    Namespace;
}

The fields with validity rules that form RowNamespaceDataID are:

RowID: A RowID of the namespace data. It MUST follow RowID formatting and field validity rules.

Namespace: A fixed-size 29-byte array representing the Namespace of interest. It MUST follow Namespace formatting and its validity rules.

Serialized RowNamespaceDataID MUST have a length of 39 bytes.

RowNamespaceData Container

RowNamespaceData containers encapsulate user-submitted data under namespaces within a single DataSquare row.

RowNamespaceData containers are protobuf formatted using the following proto3 schema:

syntax = "proto3";

message RowNamespaceData {
    repeated Share shares = 1;
    Proof proof = 2;
}

The fields with validity rules that form RowNamespaceData containers are:

Shares: A variable size Share array representing data of a namespace in a row.

Proof: A protobuf-formatted NMT proof of share inclusion. It MUST follow NMT proof verification and be verified against the respective root from the Row axis in the DAH.

Namespace data may span multiple rows, in which case all the data is encapsulated in multiple RowNamespaceData containers. This enables parallelization of namespace data retrieval, and certain compositions may take advantage of that by requesting containers of a single namespace from multiple servers simultaneously.
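
To show how this per-row container layout enables parallel retrieval, here is a hedged Go sketch that requests every RowNamespaceData container of a namespace concurrently; fetchRowNamespaceData is a hypothetical client call, not an API defined by this document.

package shwap

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// fetchRowNamespaceData stands in for a Shwap client request of a single
// RowNamespaceData container; its signature is an illustrative assumption.
var fetchRowNamespaceData func(ctx context.Context, height uint64, rowIdx uint16, namespace []byte) ([][]byte, error)

// GetNamespaceData requests all rows containing the namespace in parallel,
// which the per-row container layout makes possible.
func GetNamespaceData(ctx context.Context, height uint64, rows []uint16, namespace []byte) ([][][]byte, error) {
	out := make([][][]byte, len(rows))
	g, ctx := errgroup.WithContext(ctx)
	for i, row := range rows {
		i, row := i, row // capture loop variables for the goroutine
		g.Go(func() error {
			shares, err := fetchRowNamespaceData(ctx, height, row, namespace)
			if err != nil {
				return err
			}
			out[i] = shares
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return out, nil
}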

Core Structures

This section covers messages that do not fit into the Identifier or Container categories but have to be strictly specified for use across the categories and beyond.

AxisType

The data square consists of rows and columns of shares that are committed to in Namespaced Merkle Trees (NMTs). Consequently, there are two commitments over any share in the square. AxisType helps to point to either of them in different contexts.

syntax = "proto3";

enum AxisType {
    ROW = 0;
    COL = 1;
}

Share

Share defines the atomic primitive of the data square.

syntax = "proto3";

message Share {
   bytes data = 1;
}

The fields with validity rules that form the Share message are:

Data: A variable size byte array representing a share. It MUST follow share formatting and validity rules.

Protocol Compositions

This section specifies compositions of Shwap with other protocols. While Shwap is transport agnostic, there are rough edges on the protocol integration, which every composition specification has to describe.

Bitswap

Bitswap is an application-level protocol for sharing verifiable data across peer-to-peer networks. Bitswap operates as a dynamic want-list exchange among peers in a network. Peers continuously update and share their want lists of desired data in real time. If at least one connected peer has the needed data, it is promptly fetched. This ongoing exchange ensures that as soon as any peer acquires the sought-after data, it can instantly share it with those in need.

Shwap is designed to be synergistic with Bitswap, as that is the primary composition to be deployed in Celestia’s DA network. Bitswap provides the 1/N peers guarantee and can parallelize fetching across multiple peers. Both of these properties significantly contribute to Celestia’s efficient DAS protocol.

Bitswap runs over the libp2p stack, which provides QUIC transport integration. Subsequently, Shwap will benefit from features libp2p provides together with transport protocol advancements introduced in QUIC.

Multihashes and CIDs

Bitswap is tightly coupled with Multihash and CID notions, establishing the content addressability property. Bitswap operates over Blocks of data that are addressed and verified by CIDs. Based on that, Shwap integrates into Bitswap by complying with both of these interfaces. The Share Containers are Blocks that are identified via Share Identifiers.

Even though Shwap takes inspiration from content addressability, it breaks free from the hash-based model to optimize message sizes and data request patterns. In a way, it hacks the multihash abstraction to make it carry data that is not, in fact, a hash. Furthermore, the protocol does not include hash digests in the multihashes. The authentication of the messages happens using an externally provided data commitment.

However, breaking free from hashes creates issues that need to be solved at the implementation level, particularly in the reference Golang implementation, if forking and substantially diverging from the upstream is not an option. CIDs are required to have fixed and deterministic sizes. Making share identifiers compliant with CID prevents protobuf usage due to its reliance on varints and dynamic byte-array serialization.

The naive question would be: “Why not verify the content after Bitswap provides it back over its API?” Intuitively, this would simplify a lot and would not require “hacking” the CID. However, this has an important downside: Bitswap, in such a case, would consider the request finalized and the content fetched and valid, sending a DONT_WANT message to its peers, while the message might still be invalid according to the verification rules.

Bitswap still requires multihashes and CID codecs to be registered. Therefore, we provide a table for the required share identifiers with their respective multihash and CID codec codes. This table should be extended whenever any new share identifier or new version of an existing identifier is added.

Name                 Multihash   Codec
EdsID*               N/A         N/A
RowID                0x7801      0x7800
SampleID             0x7811      0x7810
RowNamespaceDataID   0x7821      0x7820

*EdsID and its container are excluded from the Bitswap composition. Bitswap is limited to messages of 256kb in size, while EDSes are expected to be bigger. Also, it is more efficient to parallelize EDS retrieval by rows.
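
As a hedged sketch of the CID “hack” described above, the Go snippet below wraps a serialized SampleID into a CIDv1 using the go-cid and go-multihash libraries and the codes from the table; it reuses the SampleID sketch from earlier, and the reference implementation may construct and register these differently.

package shwap

import (
	cid "github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

// Codes from the table above.
const (
	sampleMultihashCode = 0x7811
	sampleCodec         = 0x7810
)

// SampleIDToCID wraps a serialized SampleID into a CIDv1, treating the raw
// identifier bytes as the multihash "digest". No hashing happens here, which
// is the intentional deviation from content addressing described above.
// Note that the multihash code and codec still have to be registered for
// Bitswap to accept such CIDs.
func SampleIDToCID(id SampleID) (cid.Cid, error) {
	data, err := id.MarshalBinary()
	if err != nil {
		return cid.Undef, err
	}
	mhBytes, err := mh.Encode(data, sampleMultihashCode)
	if err != nil {
		return cid.Undef, err
	}
	return cid.NewCidV1(sampleCodec, mh.Multihash(mhBytes)), nil
}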

Blocks

Bitswap operates over IPFS blocks (not to be confused with Celestia or other blockchain blocks). An IPFS block is a blob of arbitrary bytes addressed and identified by a CID. An IPFS block must have a CID encoded into it, such that the CID can either be computed by hashing the block or extracted from the block data itself.

In order for the composition to work, Shwap has to comply with the block format, and for this we introduce a general adapter block type. As Shwap container identifiers are not hash-based and cannot be computed from the container data, we have to encode CIDs into the block adapter for the containers.

The block adapter is protobuf encoded with the following schema:

syntax = "proto3";

message Block {
  bytes cid = 1;
  bytes container = 2;
}

The fields with validity rules that form the Block are:

CID: A variable size byte array representing a CID. It MUST follow CIDv1 specification. The encoded multihash and codec codes inside of the CID MUST be from one of the registered IDs defined in the table.

Container: A variable size byte array representing a protobuf-serialized Shwap Container. It MUST be of the type defined by the multihash and codec in the CID field. It MUST be validated according to the validity rules of the respective Shwap Container.

Backwards Compatibility

Shwap is incompatible with the old sampling protocol.

After a rigorous investigation, the celestia-node team decided against implementing backward compatibility with the old protocol in the node client due to the immense complications it brings. Instead, the simple and time-efficient strategy is to temporarily deploy infrastructure for both the old and new versions, allowing network participants to migrate gradually to the latest version. We will first deprecate the old version, and once the majority has migrated, we will terminate the old infrastructure.

Considerations

Security

Shwap does not change the security model of Celestia’s Data Availability network; it only changes the underlying protocol for data retrieval.

Essentially, the network and its codebase get simplified and require less code and infrastructure to operate. This, in turn, reduces the surface for implementation vulnerabilities, DoS vectors, message amplification, and resource exhaustion attacks. However, new bugs may be introduced, as with any new protocol.

Protobuf Serialization

Protobuf is recommended for serializing share containers. It is a widely adopted serialization format and is already used within Celestia’s protocols. This was an obvious choice for consistency reasons, even though other, more efficient and advanced formats like Cap’n Proto could have been chosen.

Constant-size Identifier Serialization

Share identifiers should be of a constant size according to the spec. This is needed to support the Bitswap composition, which has an implementation-level limitation that enforces constant-size identifiers. Ideally, this should be avoided, as Shwap aims to be protocol agnostic, and future iterations of Shwap may introduce dynamically sized identifiers if constant sizing ever becomes problematic.

Sampling and Reconstruction

Shwap deliberately avoids specifying sampling and reconstruction logic. Sampling concerns such as randomness selection and sample picking are out of Shwap’s scope and a matter for subsequent CIPs. Shwap only provides the messaging for sampling (via SampleID and the Sample container).

Reference Implementation

Copyright

Copyright and related rights waived via CC0.