-
Notifications
You must be signed in to change notification settings - Fork 953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spec: Shwap #3205
base: main
Are you sure you want to change the base?
spec: Shwap #3205
Conversation
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3205 +/- ##
==========================================
- Coverage 51.92% 51.59% -0.34%
==========================================
Files 178 178
Lines 11316 11316
==========================================
- Hits 5876 5838 -38
- Misses 4942 4971 +29
- Partials 498 507 +9 ☔ View full report in Codecov by Sentry. |
specs/src/shwap/spec.md
Outdated
Identifiers are embeddable to narrow down to the needed content. (TODO: Describe better) | ||
|
||
Identifiers MUST have a fixed size for their fields. Subsequently, protobuf can't be used for CID serialization due to | ||
varint usage. Instead, identifiers use simple binary big endian serialization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your current Shwap PR little endian is used. We also implemented it with little endian. Which one should we use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will use the big endian. I did a slight research on that and found out that its is used in IP and QUIC with rationale that its just faster on most architectures, so decided to follow that here.
what would you say for enforcing the |
@zvolin, I find it unnecessary, but might missing some context from other languages. |
so it usually affects the namespace in which the generated types will be. In our case we are generating and exposing all protos in a single crate ( |
specs/src/shwap/spec.md
Outdated
### Bitswap CID integration | ||
|
||
The naive question would be: "Why not to make content verification after Bitswap provided it back over its API?" | ||
Intuitively, this would simplify a lot and wouldn't require "hacking" CID. However, this has an important downside - | ||
the Bitswap in such case would consider the content as fetched and valid, sending DONT_WANT message to its peers, while | ||
the message might be invalid according to the verification rules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the verification is done in the Bitswap level, the only way knowing that data sampling failed and block should be rejected, is to have a timeout. So, how many seconds the timeout should be? Should we define it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting thought. I think this timeout is out of Shwap's scope. The actual sampling logic - how, how many samples are selected, and what the timeout for sampling deserves as separate spec/cip that depends on Shwap protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shwap is a transport for specialized DA messaging. The sampling logic/details are out of its scope
@zvolin, my hesitation with Making all the proto follow the same package forces all the types to be generated in a single place, like what the core and lumina are doing. Node on the other hand is more modular and we keep proto-types together with the protocols, where each protocol is its own independent module/package. This is an example of an architecture decision that implementers are free to make and enforcing package structure through It's nice to be consistent and I see why consistency is good to follow in general, but not in this case. |
specs/src/shwap/spec.md
Outdated
The current Data Availability Sampling (DAS) network protocol is inefficient. A _single_ sample operation takes log2(k) network | ||
round-trips(where k is the square size). This is not practical and does not scale for the theoretically unlimited data | ||
square that the Celestia network enables. The main motive here is a protocol with O(1) round-trip for _multiple_ samples, preserving | ||
the assumption of having 1/n honest peers connected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "1/n" means that there needs to be at least one peer out of N, or something else?
Also, clarification is needed on whether peers must be both honest and possess the requested data shares. For example, an honest light node might lack the relevant data shares, impacting sampling efficiency. Clarifying these points would strengthen the protocol's specifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "1/n" means that there needs to be at least one peer out of N, or something else?
1 honest peer.
Also, clarification is needed on whether peers must be both honest and possess the requested data shares
1/n is a pretty common assumption in p2p networks. In this context honesty assumes possession. Nevertheless, clarifying it makes. Thanks for pointing this out.
specs/src/shwap/spec.md
Outdated
### Multihashes and CID | ||
|
||
Shwap takes inspiration from content addressability, but breaks-free from hash-based only model to optimize message sizes | ||
and data request patterns. In some way, it hacks into multihash abstraction to make it contain data that isn't in fact a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and data request patterns. In some way, it hacks into multihash abstraction to make it contain data that isn't in fact a | |
and data request patterns. In some way, it hacks into multihash abstraction to make it contain arbitrary message fields that isn't in fact a |
We might want to be more specific here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this change anything to message I tried to convery. Arbitrary message fields are also quite general and can mean anything.
Maybe something like "identification data" may improve it
specs/src/shwap/spec.md
Outdated
|
||
Identifiers are embeddable to narrow down to the needed content. (TODO: Describe better) | ||
|
||
Identifiers MUST have a fixed size for their fields. Subsequently, protobuf can't be used for CID serialization due to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it is worth to explain that the reason why Identifiers MUST have a fixed size is bitswap internal architecture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I really would like not to have bitswap as a rationale for this, as this breaks Shwap protocol independence. Ideally, we would find a way to avoid this limitation until the CIP is finalized or find another rationale for why this is necessary.
specs/src/shwap/spec.md
Outdated
| Name | Multihash | Codec | | ||
|----------|-----------|--------| | ||
| RowID | 0x7811 | 0x7810 | | ||
| SampleID | 0x7801 | 0x7800 | | ||
| DataID | 0x7821 | 0x7820 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for nice aligment, why not those not ordered by desc multihash/codec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I wanted to keep the order of IDs as defined in the spec and I didn't want to break what was already implemented, but as we break other things, it may not hurt to break this one.
specs/src/shwap/spec.md
Outdated
|
||
Shwap takes inspiration from content addressability, but breaks-free from hash-based only model to optimize message sizes | ||
and data request patterns. In some way, it hacks into multihash abstraction to make it contain data that isn't in fact a | ||
hash. Furthermore, the protocol does not include hash digests in the multihashes. The authentication of the messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hash. Furthermore, the protocol does not include hash digests in the multihashes. The authentication of the messages | |
hash. Furthermore, the protocol does not include hash digests in the multihashes. The verification of the messages |
My understanding is that authentication is verifying who provided data, rather than content in response. What do you think to replace authentication -> verification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both terms work in this case. I've seen data/message authentication used to describe data that is not being tampered with and is coming from the right source. In this case, we authenticate messages against DAH, which assures us the data is correct and is coming from/produced by consensus/validators
Noting that spec fully migrated to celestiaorg/CIPs#77, so all the comments and discussions should move there as well. Check the description for more on new changes here |
@@ -1 +1,2 @@ | |||
book | |||
theme |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rootulp, the theme dir can be similarly .gitignored in the app's repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! Created celestiaorg/celestia-app#3151
Introduces Shwap page that points to the CIP. This PR integrates it nicely with mdbook engine like it is a local document in the node. It integrates multiple mdbook preprocessors into rendering pipeline and GH CI.
Rendered on GH pages