Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazily process bad ER after fraud proof is submitted #2560

Merged
merged 7 commits into from
Mar 4, 2024

Conversation

NingLin-P
Copy link
Member

close #2538

This PR implements #2538, it contains changes of:

  • Process only the first bad ER submit_fraud_proof and lazily process other descendant bad ERs in submit_bundle
  • Introduce a PendingSlash operator status and reject the bundle/staking request of the operator with such status
  • Reject the fraud proof that targetting the bad ER that is already reported by a previous fraud proof and is under lazily processing (i.e. the descendant bad ER)
  • Reject the switch_domain request if the operator has unconfirmed ER in the current domain
    • This is not mentioned in Lazily process bad ER after fraud proof is submitted #2538 but is necessary, because the LatestSubmittedER that we used to determine the pending slash operator is indexed by (domain_id, operator_id) and can only check if the operator has submitted bad ER in the current domain, if the operator switch domain then we can't slash the operator for bad ER it submitted in the previous domain.

Code contributor checklist:

…s the descendants ER

Signed-off-by: linning <linningde25@gmail.com>
…er submitted by an operator for a specific domain

Signed-off-by: linning <linningde25@gmail.com>
…ending to slash

Signed-off-by: linning <linningde25@gmail.com>
Signed-off-by: linning <linningde25@gmail.com>
Signed-off-by: linning <linningde25@gmail.com>
Error::OperatorNotRegistered
);

// Reject switching domain if there is unconfirmed ER submitted by this operator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC to be able to switch the operator has to basically go offline (stop producing bundles, but still stake) for a whole challenge period, and then come online, submit switch extrinsic and go offline again until it's processed?
If so, that is quite subpar. I imagine, ideally, we would accept the switch extrinsic, "freeze" the operator stake on the current domain, so they don't produce new bundles and don't reduce other operators' chances and transfer the stake after their last block is out of challenge period.
This said, it can be handled in a separate issue.

Copy link
Member

@vedhavyas vedhavyas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the changes but left some questions/nits

// Construct and add a new domain block to the block tree
let er_hash = execution_receipt.hash::<DomainHashingFor<T>>();
let domain_block_number = execution_receipt.domain_block_number;

ensure!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check feels unnecessary since the codepath ensure the Node is pruned and this is not exposed outside of this crate. No strong opinion though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is unnecessary but just try to make it robust in case any accidental change breaks the assumption.

crates/pallet-domains/src/block_tree.rs Outdated Show resolved Hide resolved
/// not assigned to this field directlt, thus MUST use the `status()` method to query the status
/// instead.
/// TODO: update the filed to `_status` to avoid accidental access in next network reset
status: OperatorStatus<DomainBlockNumber>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why leave a TODO to do it later. Can't we not update the var name here already ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to avoid migration code to keep compatible with 3h


#[cfg(test)]
impl<Balance: Zero, Share: Zero, DomainBlockNumber> Operator<Balance, Share, DomainBlockNumber> {
pub(crate) fn dummy(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PLease do not leave such conditional compiled code everytwhere. We seem to do this all the time. Instead define the necessary dummy function where it is used instead. Would also appreciate if you remove the other instances that we created for benchmarks too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead define the necessary dummy function where it is used instead.

The reason to introduce fn dummy in this PR is the Operator:: status field is changed to private now, thus we have to define the func here.

Would also appreciate if you remove the other instances that we created for benchmarks too.

Sure, will do that in the incoming PR for updating benchmark result of submit_fraud_proof.

// Reject switching domain if there is unconfirmed ER submitted by this operator
// on the `current_domain_id`
ensure!(
!LatestSubmittedER::<T>::contains_key((operator.current_domain_id, operator_id)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the reason why we do not want to allow switching the operators? If the operator continues to produce the block, then essentially, they will never be able to switch unless they go offline for a good period like daria mentioned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking on this further, there are two things we can do.

  • Get operators for all unconfirmed ERs, either through Host function or through storage and then slash them immediately. We still prune the bad ers lazily though
  • Or completely remove switching feature between domains

@jfrank-summit i'm leaning towards removing this switching the domains since with this change, its no different from operator deregistring, waiting and then registering to new domain unless we have a hard requirement to have it though, then we need to do what I suggested in the first point of slashing all operators immediately so that Lazy slashing wont be necessary

Copy link
Member Author

@NingLin-P NingLin-P Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the reason why we do not want to allow switching the operators?

We are using the LatestSubmittedER to check if an operator is pending to slash, LatestSubmittedER is indexed by (domain_id, operator_id) we can only check if the operator has submitted bad ER in the current domain but can't know if the operator has submitted bad ER in the previous domain.

I planned to handle the switching domain in a separate PR after this one is merged with what Daria suggested in #2560 (comment), but if we plan to remove switching domain then we are all good and it will be great.

then we need to do what I suggested in the first point of slashing all operators immediately so that Lazy slashing wont be necessary

To slash operator immediately we need to iterate at most 14_400 number of ERs, duplicate the operator, and then slash all of them. It will still cause a lot of weight.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with removing domain switching for operators.

…mains

Signed-off-by: linning <linningde25@gmail.com>
switch_domain is not supported currently due to incompatible with lazily slashing, this
commit only remove the entry of switch_domain and comment out related tests/benchmark to
avoid invasive change to the rest of the code and also we may support it in the future.

Signed-off-by: linning <linningde25@gmail.com>
@NingLin-P NingLin-P added this pull request to the merge queue Mar 4, 2024
Merged via the queue into main with commit e4bf312 Mar 4, 2024
9 checks passed
@NingLin-P NingLin-P deleted the lazy-handling-bad-er branch March 4, 2024 11:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lazily process bad ER after fraud proof is submitted
4 participants