Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XDM with MMR proofs: 1 #2508

Merged
merged 10 commits into from
Feb 23, 2024
Merged

XDM with MMR proofs: 1 #2508

merged 10 commits into from
Feb 23, 2024

Conversation

vedhavyas
Copy link
Member

@vedhavyas vedhavyas commented Feb 5, 2024

This is the first of multiple PRs that updates XDM protocol to use MMR proofs for verification. This PR brings more if not all the type changes required for complete migration to MMR proofs. The goal was to introduce new types and add necessary infra to verify and generate the XDM proofs while the Domains are not yet enabled..

This PR also handles the verification part of the XDM but I did not update the tests yet since plan was to do focus on the infra.

Some notes:

  • Storage changes to pallet-domains
  • Host function updates and new additions for Consensus chain and Domains
  • Added a new crate for messenger host functions instead of sp-messenger due to cyclic depenedencies and changing that would be too invasive which I'm not quiet excited about yet. Once we have initial changes in, we can think of refactoring if need arises.

Overall, recommend going commit by commit for a better reviewing expereince

Removed an existing now redundant storage

Code contributor checklist:

@vedhavyas vedhavyas added execution Subspace execution need to audit This change needs to be audited labels Feb 5, 2024
@vedhavyas vedhavyas force-pushed the xdm_mmr branch 6 times, most recently from 2f2ace6 to 5ee10ec Compare February 6, 2024 08:01
Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the idea here, but I do not believe host functions are necessary. Left some other minor comments and didn't review last 3 commits at all.

crates/pallet-domains/src/block_tree.rs Outdated Show resolved Hide resolved
crates/pallet-domains/src/lib.rs Outdated Show resolved Hide resolved
@@ -1231,9 +1233,10 @@ where

#[cfg(test)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not about this PR, but tests should generally be in their own module. It is a better dev experience (library doesn't recompile when tests change) and cleaner overall.

domains/primitives/messenger/src/messages.rs Show resolved Hide resolved
domains/primitives/messenger/src/messages.rs Outdated Show resolved Hide resolved
opaque_leaf: EncodableOpaqueLeaf,
proof: Proof<mmr::Hash>,
) -> Option<Hash> {
let leaf: mmr::Leaf = opaque_leaf.into_opaque_leaf().try_decode()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way it is written works and is correct, but I find placing types where they come from rather than making compiler infer them backwards easier to read, looks more linear:

Suggested change
let leaf: mmr::Leaf = opaque_leaf.into_opaque_leaf().try_decode()?;
let leaf = opaque_leaf.into_opaque_leaf().try_decode::<mmr::Leaf>()?;

There are also other places like this in the code.

This is similarly to let x = X::from(y) being more readable than let x: X = y.into(), just try to read it out loud, it is harder to do smoothly with .into(). In most cases : Type is not necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm that is fair. For me this approach always made a lot of sense from reading the perspective since I know what is the final type that is being assigned to the variable instead of going through till the end to know what this type is.

This is similarly to let x = X::from(y) being more readable than let x: X = y.into(),

This I agree since the position of the final type is same in both the scenarios and readability wise, its much more cleaner to read.

Having said that, I dont have a strong opinion but rather a personal preference

let leaf: MmrLeaf<ConsensusBlockNumber, ConsensusBlockHash> =
opaque_leaf.into_opaque_leaf().try_decode()?;
let state_root = leaf.state_root();
verify_mmr_proof(vec![EncodableOpaqueLeaf::from_leaf(&leaf)], proof.encode())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using Mmr::verify_leaves here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is domain runtime and they do not have access to Mmr::verify_leaves unlike Consensus chain runtime. Hence, we have the host functions to verify such proofs for domains

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not have right now, but what prevents us from including it as a dependency in domain runtime?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I completely follow you here. I have explained above in other comment why we I chose to go this route of using host function to verify the proof

crates/sp-domains/src/lib.rs Outdated Show resolved Hide resolved
domains/pallets/messenger/src/benchmarking.rs Outdated Show resolved Hide resolved
Comment on lines +93 to +95
StorageKeyRequest::ConfirmedDomainBlockStorageKey(domain_id) => runtime_api
.confirmed_domain_block_storage_key(best_hash, domain_id)
.map(Some),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand this host function either, comment is the same as for previous hash functions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This host function dates back some time when we had a discussion on assumed storage keys where we were defining the mocked storages and deriving their storage keys to verify the storage proof.

Here in messenger, when a message comes from Domain_a to Domain_b, previously Domain_b was deriving the storage key from the its own storage. This made to have the pallet messenger definition same across all the different runtimes.

What this host function does here is, Domain_b uses the the Domain_a runtime from the consensus chain and then use its Stateless API to get the storage key. This approach does not hold us back with such strict assumptions where pallets are defined exactly the same across runtimes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the benefits of decoupling, but I'm confused about what this has to do with domains, this seems to call consensus client runtime API 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the domain_proof is a storage proof derived from LatestConfirmedDomainBlock on consensus runtime. We are using consensus runtime api to get that storage key while verifying the domain_proof and the use src domain runtime to fetch the Outbox and InboxResponse storage keys to complete the XDM message verification

Copy link
Member Author

@vedhavyas vedhavyas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not believe host functions are necessary.

I tried to explain why the host functions are required over the comments itself but we can do a Sync on this PR specifically if it helps you

crates/pallet-domains/src/lib.rs Outdated Show resolved Hide resolved
crates/sp-domains/src/lib.rs Outdated Show resolved Hide resolved
@@ -56,4 +63,16 @@ where
extrinsics_root: H256::from_slice(header.extrinsics_root().as_ref()),
})
}

fn verify_mmr_proof(&self, leaves: Vec<EncodableOpaqueLeaf>, encoded_proof: Vec<u8>) -> bool {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why host function is needed for proof generation, but why is it needed for proof verification and why for proof verification you're calling runtime API anyway?

Okay I see the confusion here. I have decided to re-use this implementation of extension for Domain host function. See this definition DomainMmrRuntimeInterface. The proof verification for consensus chain is directly done on the consensus chain itself. But the proof verification for Domains on the other hand would require host function that uses consensus chain runtime_api to do such verification.

@@ -56,4 +63,16 @@ where
extrinsics_root: H256::from_slice(header.extrinsics_root().as_ref()),
})
}

fn verify_mmr_proof(&self, leaves: Vec<EncodableOpaqueLeaf>, encoded_proof: Vec<u8>) -> bool {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, you're requesting something about best block, which may not even belong to the same fork as the block that made host function call.

For MMR, the proof is strictly self contained in the sense we do not need to use the same consensus chain hash that generated the proof. The reason for such is the proof hold the total leaf count at the time of proof generation. So while verifying, the mmr root is calculated with that leaf count since it is strictly appending. Also, the leaves before a block is finalized will be stored in fork aware fashion and runtime picks what it assumes to be the canonical fork while verifying. So proof will either be valid or invalid.

opaque_leaf: EncodableOpaqueLeaf,
proof: Proof<mmr::Hash>,
) -> Option<Hash> {
let leaf: mmr::Leaf = opaque_leaf.into_opaque_leaf().try_decode()?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm that is fair. For me this approach always made a lot of sense from reading the perspective since I know what is the final type that is being assigned to the variable instead of going through till the end to know what this type is.

This is similarly to let x = X::from(y) being more readable than let x: X = y.into(),

This I agree since the position of the final type is same in both the scenarios and readability wise, its much more cleaner to read.

Having said that, I dont have a strong opinion but rather a personal preference

Comment on lines +93 to +95
StorageKeyRequest::ConfirmedDomainBlockStorageKey(domain_id) => runtime_api
.confirmed_domain_block_storage_key(best_hash, domain_id)
.map(Some),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This host function dates back some time when we had a discussion on assumed storage keys where we were defining the mocked storages and deriving their storage keys to verify the storage proof.

Here in messenger, when a message comes from Domain_a to Domain_b, previously Domain_b was deriving the storage key from the its own storage. This made to have the pallet messenger definition same across all the different runtimes.

What this host function does here is, Domain_b uses the the Domain_a runtime from the consensus chain and then use its Stateless API to get the storage key. This approach does not hold us back with such strict assumptions where pallets are defined exactly the same across runtimes

domains/primitives/messenger/src/messages.rs Show resolved Hide resolved
domains/primitives/messenger/src/messages.rs Outdated Show resolved Hide resolved
domains/primitives/messenger/src/messages.rs Outdated Show resolved Hide resolved
let leaf: MmrLeaf<ConsensusBlockNumber, ConsensusBlockHash> =
opaque_leaf.into_opaque_leaf().try_decode()?;
let state_root = leaf.state_root();
verify_mmr_proof(vec![EncodableOpaqueLeaf::from_leaf(&leaf)], proof.encode())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is domain runtime and they do not have access to Mmr::verify_leaves unlike Consensus chain runtime. Hence, we have the host functions to verify such proofs for domains

@vedhavyas
Copy link
Member Author

Seems like this PR might have broken the operator tests though I'm not sure if that is the case yet. For now cancelling these CI runs until I can figure out what is what

@vedhavyas
Copy link
Member Author

@nazar-pc I was just verifying how the state_pruning and block_pruning works live with @jfrank-summit. This is currently jeremy's setup

state_pruning: archive-canonical
block_pruning: 256

We decided to pick block #10 on gemini-3h and as expected, the state for block #10 was available but block body was not available. Header was still available and this reminds me of this issue from you paritytech/substrate#14758 and since shamil is handling the pruning of headers as well, so reference implementation will have a growing size of state from each state transition. I'm assuming this was unintentional but I was under the assumption that it was indeed intentional.

Now coming to the crust of the PR and our discussion earlier, we can make the mmr verification stateless by storing the mmr roots on the consensus runtime and prune them on a rolling window, like you pointed out there are some edge cases when it comes to domains where they would need to access such mmr roots to verify these proofs which might not be available when they are required if for example domain halted for long enough period while consensus chain continued to progress.

Overall, looks like we would need some form of time limit for XDM by which they should included but if they are not, then that particular channel is completely halted. There will be way to unhalt it by submitting some extension proofs but have not discussed or spec'ed this out yet.

I'll try to put some more points again looking at other options but at the moment, I feel the usage of host functions to fetch the offchain data for verification seems more viable.

@nazar-pc
Copy link
Member

nazar-pc commented Feb 6, 2024

We decided to pick block #10 on gemini-3h and as expected, the state for block #10 was available but block body was not available

Hm... this is very problematic then 😞 We'll need to fix it. And this also explains a lot, thanks!

I'm assuming this was unintentional but I was under the assumption that it was indeed intentional.

This was not intentional, but there is no API in Substrate to deliberately prune things we want to prune, rather than everything at a fixed distance from the tip of the chain.

Now coming to the crust of the PR and our discussion earlier, we can make the mmr verification stateless by storing the mmr roots on the consensus runtime and prune them on a rolling window, like you pointed out there are some edge cases when it comes to domains where they would need to access such mmr roots to verify these proofs which might not be available when they are required if for example domain halted for long enough period while consensus chain continued to progress.

Overall, looks like we would need some form of time limit for XDM by which they should included but if they are not, then that particular channel is completely halted. There will be way to unhalt it by submitting some extension proofs but have not discussed or spec'ed this out yet.

I'll try to put some more points again looking at other options but at the moment, I feel the usage of host functions to fetch the offchain data for verification seems more viable.

I think time limit is necessary either way due to the fact that we are pruning MMR data in offchain storage anyway and we will prune state once we can as well.

@NingLin-P
Copy link
Member

NingLin-P commented Feb 6, 2024

I have spent some time to understand how MMR proof generation and verification work in more detail and catch up with the above discussion.

I think host function is inevitable for verifying XDM on the domain, because the dest domain has to verify the XDM comes from a confirmed block of the src domain, and only the consensus chain have knowledge about whether a domain block (or ER) is confirmed thus domain have to query from the consensus chain when verifying the XDM.

we can make the mmr verification stateless by storing the mmr roots on the consensus runtime and prune them on a rolling window

I'm not sure the whole idea of storing multiple MMR roots on the consensus runtime but it feels like we can archive the same by using an old consensus hash (instead of the best hash) to call the consensus runtime API and get the expected MMR root.

@vedhavyas
Copy link
Member Author

I'm not sure the whole idea of storing multiple MMR roots on the consensus runtime but it feels like we can archive the same by using an old consensus hash (instead of the best hash) to call the consensus runtime API and get the expected MMR root.

You are making an assumption here that there exists the state for such old hash for every consensus node that is currently importing / doing state transition. This is not required to be true and current state_pruning being either archive/archive-canonical just happens to be unintentional change like we discussed above.

I think time limit is necessary either way due to the fact that we are pruning MMR data in offchain storage anyway and we will prune state once we can as well.

True but we can make this change specifically for MMR offchain rather than relying on state to be available since this would be self-contained unlike dependency on state to be available

These todos are cleared in the next coming commits
@vedhavyas
Copy link
Member Author

@nazar-pc while tracking MMR roots might be one possible solution, doing such introduces bunch of edge cases, some we already know and some we might not have known yet, and also defeats the purpose of introducing MMR since if we are tracking those roots, then we would rather just track state_roots instead and completely eliminate MMR for XDM. Personally, not super excited about this though.

As for the expiration time for XDM, it would still remain the same and we just need to prune the offchain leaves. I initially thought it would be invasive but it actually quite straightforward to prune a leaf and this is self contained would not need to relay on state or block body for any verification.

I have updated the PR with review feedback earlier. PTAL and let me know your thoughts ?

@vedhavyas vedhavyas requested a review from nazar-pc February 14, 2024 07:22
@nazar-pc
Copy link
Member

@nazar-pc while tracking MMR roots might be one possible solution, doing such introduces bunch of edge cases, some we already know and some we might not have known yet, and also defeats the purpose of introducing MMR since if we are tracking those roots, then we would rather just track state_roots instead and completely eliminate MMR for XDM. Personally, not super excited about this though.

I believe MMR is useful beyond immediate need of XDM. While I also think storing roots in the runtime is preferred comparing to the client (might allow for some interesting proofs in the future), it is not a blocker.

Please don't let this PR be blocked by me, @NingLin-P can you take a look here, please?

@vedhavyas
Copy link
Member Author

Conflicts are not a blocker for PR review. Will fix conflicts when in a moment. Thanks!

nazar-pc
nazar-pc previously approved these changes Feb 22, 2024
Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense overall, but the way force pushes were done after initial review with zero explanation (especially the first one) makes it really hard to understand what has actually changed since last time I reviewed earlier version of this PR.

sp-transaction-pool = { version = "4.0.0-dev", git = "https://github.com/subspace/polkadot-sdk", rev = "d6b500960579d73c43fc4ef550b703acfa61c4c8" }
subspace-core-primitives = { version = "0.1.0", path = "../../crates/subspace-core-primitives" }
subspace-runtime-primitives = { version = "0.1.0", path = "../../crates/subspace-runtime-primitives" }
subspace-service = { version = "0.1.0", path = "../../crates/subspace-service" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like this change. This adds a huge subspace-service to domains-service significantly reducing compilation parallelism and likely increasing compilation time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is fair. If not this, then I would have to introduce another crate just for Domain specific host functions and extension factory. Did not feel justified enough but if that is preferrable, I can do that instead 👍🏼

NingLin-P
NingLin-P previously approved these changes Feb 22, 2024
Copy link
Member Author

@vedhavyas vedhavyas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way force pushes were done after initial review with zero explanation (especially the first one) makes it really hard to understand what has actually changed since last time I reviewed earlier version of this PR

Apologies for no explanation. I picked out some commits, specifically first one, in a seperate PR that was already landed in main. Github's diff was all messed up when I initially went with a merge so I ended rebasing it for a cleaner diff while taking the opportunity to address the feedback provided earlier.

sp-transaction-pool = { version = "4.0.0-dev", git = "https://github.com/subspace/polkadot-sdk", rev = "d6b500960579d73c43fc4ef550b703acfa61c4c8" }
subspace-core-primitives = { version = "0.1.0", path = "../../crates/subspace-core-primitives" }
subspace-runtime-primitives = { version = "0.1.0", path = "../../crates/subspace-runtime-primitives" }
subspace-service = { version = "0.1.0", path = "../../crates/subspace-service" }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is fair. If not this, then I would have to introduce another crate just for Domain specific host functions and extension factory. Did not feel justified enough but if that is preferrable, I can do that instead 👍🏼

@vedhavyas vedhavyas dismissed stale reviews from NingLin-P and nazar-pc via e4d9d28 February 23, 2024 04:49
@nazar-pc
Copy link
Member

Apologies for no explanation. I picked out some commits, specifically first one, in a seperate PR that was already landed in main. Github's diff was all messed up when I initially went with a merge so I ended rebasing it for a cleaner diff while taking the opportunity to address the feedback provided earlier.

It was typically done in the service file, but I see that you included it in subspace-service because it is needed there as well, not just in domain's service. Separate crate works for me.

Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes make sense, but I'd like to avoid re-exports both in subspace-service and in domains-service.

@vedhavyas
Copy link
Member Author

The changes make sense, but I'd like to avoid re-exports both in subspace-service and in domains-service.

I'm not sure why that is preferable here since this is mono-repo crate and would not have any issues with semvar bumps unlike the external crate but I don't have a strong stance on this. Should be updated now

@vedhavyas vedhavyas requested a review from nazar-pc February 23, 2024 12:51
@vedhavyas vedhavyas enabled auto-merge February 23, 2024 12:51
Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why that is preferable here since this is mono-repo crate and would not have any issues with semvar bumps unlike the external crate but I don't have a strong stance on this. Should be updated now

That way it is clearer what are explicit dependencies, also public re-exports sometimes result in dependencies that remain even when no longer used simply due to public re-export. IMO it is a good default, the only exception I'm aware of in our repo is the whole libp2p re-exported from subspace-networking, but arguably it makes a lot of sense there.

@vedhavyas vedhavyas added this pull request to the merge queue Feb 23, 2024
Merged via the queue into main with commit ab65eba Feb 23, 2024
11 checks passed
@vedhavyas vedhavyas deleted the xdm_mmr branch February 23, 2024 14:03
BlockInfo {
block_number: number,
block_hash: hash,
// TODO: Derive correct domain proof
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there tracking for this TODO? please put a link if so. if not - well there should be :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vanhauser-thc
Copy link
Collaborator

LGTM on the surface but we will review XDM once the implementation is stable.

@vanhauser-thc vanhauser-thc added audited This change was audited and removed need to audit This change needs to be audited labels Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
audited This change was audited execution Subspace execution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants