-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow second sync committee message to propagate if first one is stale #4015
base: dev
Are you sure you want to change the base?
Conversation
The sync committee message signature does not sign over the slot, making it possible for stale messages to be adjusted and replayed, possibly preventing new sync committee messages to propagate. If such a message for an older head is received, allow a second message to propagate that refers to the current head as selected by fork choice, overriding the previous one. Also add a couple sanity checks, as in, disallowing messages referring to unknown or invalid blocks to spread, in line with existing rules for blob sidecars, parent block roots and attestations.
Implemented in Nimbus: status-im/nimbus-eth2#4953 |
- _[IGNORE]_ There has been no other valid sync committee message for the declared `slot` for the validator referenced by `sync_committee_message.validator_index` | ||
- _[IGNORE]_ The block being signed (`sync_committee_message.beacon_block_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue sync committee messages for processing once block is retrieved). | ||
- _[REJECT]_ The block being signed (`sync_committee_message.beacon_block_root`) passes validation. | ||
- _[IGNORE]_ There has been no other valid sync committee message for the declared `slot` for the validator referenced by `sync_committee_message.validator_index`, unless the block being signed (`beacon_block_root`) matches the local head as selected by fork choice, and the earlier valid sync committee message does not match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about slightly different rule? Unless the beacon_block_root
refers to a block with a slot greater than the one that has already been seen. This could remove the dependency on the local view to the head of the chain and make it more resilient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that allow spamming, i.e., by sending a message with slot 5, then 6, then 7, then 8.
And in some reorgs, the parent block of the next block may have a lower slot than the highest one observed across all branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that allow spamming
There is the block.slot
that should increase to avoid discarding the message, not the message.slot
. I don’t think it has any room for spamming as the beacon_block_root
is signed.
And in some reorgs, the parent block of the next block may have a lower slot than the highest one observed across all branches.
Right, but in this case the next block should have a greater slot. There are edge cases when re-org happens to a block with a lower slot than the slot of the previous head, but this should be a rare case on mainnet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, block.slot
would work there as the spam protection as well. The edge cases with reorg to a lower parent slot would get worse, though, compared to proposed solution. It's also still depending on the local view of the chain as a whole, as in, one still needs to have the block available for the second message to propagate. From a reward perspective, the only message that has any value is one that signs the correct parent root of the next slot's block.
I guess to decide which strategy is best, maybe @potuz ? Maybe there's an even better way :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, no, the spam problem is still there. Someone could go extreme and sign beacon block from 2 weeks ago with message.slot
set to the current slot. and then send another message that signs the beacon block 1 after that. and so on, until the present. and all of those messages would propagate, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The split view is not really prevented though, as long as the fix incorporates that the block header being signed has to be known. The honest signatures should only come ~4s into the slot (or at the very least after the block, similar to attestations, there are incentives to vote for the most likely head as selected by forkchoice); so at that stage fork choice is already done.
btw, as your suggestion doesn't support the reorg to lower slot case, couldn't it be simplified to, instead of requiring monotonical increase, just allow a second message to sign a block header for the current slot? that would fix the spam potential but keep the edge case problem open.
If one wants to get rid of fork choice dependency, the rules could also be changed to allow a vote for a single additional unknown/invalid block root to propagate. i.e., allow an initial message to spread (whatever), potentially allow a second message to spread if it refers to an unknown/invalid block root (to cover the split view case), and allow a final message to spread if it pertains to the head root (as selected by forkchoice) as proposed (to cover reorg to lower slot etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose, there is a block A
with two children B
and C
, C.slot = B.slot + 1
. C
is timely only for a half of nodes and weights are distributed in a way that C
is the winner for those who has received it in time (proposer_boost_weight > get_weight(B)
), so a half of sync committee participants sends committee message with C
as the head and another half with B
. Both block are eventually seen and correct but nodes with head=B
can’t propagate the messages with beacon_block_root = htr(C)
and vice versa.
doesn't support the reorg to lower slot case, couldn't it be simplified to, instead of requiring monotonical increase, just allow a second message to sign a block header for the current slot
It’s not strictly the same as validator could send a previous message for a block in slot N-2
and now the head is the block from slot N-1
. In this case monotonically increasing constraint would allow the second message while same slot rule would not.
I think ideally there should be a SignedSyncCommitteeMessage
to prevent slot
tampering. So the original problem is resolved without a workaround, as it seems any workaround has its own corner cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. btw the workaround only triggers if there is active slot
tampering going on. In any case, the initial message per validator propagates regardless of restrictions.
Your example could be mitigated by allowing in each slot for (1) the first message to spread, (2) a message pertaining to an unknown root to spread, (3) a message pertaining to the local head to spread.
Your case is essentially triggering (2), as the local node does not know the slot of the signed root at the time it receives the sync message in that scenario. So at that stage, it would result in IGNORE and the message is already stopped before the blocks end up being seen. And IGNORE is cached on the libp2p message-id, so the message still won't propagate lateron once the block is seen.
The problem with (2) is, that one has to also cache a list of all unknown roots that have been signed. Otherwise, if there is an old orphaned block somewhere that is not available in local DB and also is no longer available on the network, an attacker could just replay that outdated message with tampered slot
. So, the attacker could waste both extra messages with slot
tampering (the initial one for free, and the second one by tampering a sig for an unknown orphaned root) and then the correct message may still not propagate (as it does not pertain to the local head).
Another interesting case is checkpoint sync, where the old data may be locally unavailable even if canonical; but that's at least not affecting a large count of validators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. There seem to be edge cases in whatever solution, but the proposed solution is a strict improvement regardless. So if we can’t have validator to sign an envelope to prevent slot
tampering then verifying a block against the local head is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I mean, we can fix it in a future fork, but it still needs to be fixed / documented for current mainnet forks regardless, and it's probably good enough with one of the proposed fixes as it sufficiently disincentivizes the slot
tampering attack, at least until sync committees are removed with beam anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! But I’d like to see approves from other CL devs before merging
specs/altair/p2p-interface.md
Outdated
@@ -161,7 +163,9 @@ The following validations MUST pass before forwarding the `sync_committee_messag | |||
- _[IGNORE]_ The message's slot is for the current slot (with a `MAXIMUM_GOSSIP_CLOCK_DISPARITY` allowance), i.e. `sync_committee_message.slot == current_slot`. | |||
- _[REJECT]_ The `subnet_id` is valid for the given validator, i.e. `subnet_id in compute_subnets_for_sync_committee(state, sync_committee_message.validator_index)`. | |||
Note this validation implies the validator is part of the broader current sync committee along with the correct subcommittee. | |||
- _[IGNORE]_ There has been no other valid sync committee message for the declared `slot` for the validator referenced by `sync_committee_message.validator_index` | |||
- _[IGNORE]_ The block being signed (`sync_committee_message.beacon_block_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue sync committee messages for processing once block is retrieved). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably pedantic, but via both gossip and non-gossip
i think more accurately would be reflected as via gossip or non-gossip sources
I can't imagine a case where we'd ever want fetching every block that we receive over gossip also over RPC which this currently kind of implies...
This is also repeated above, I think we just change it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, i'll raise a separate PR for that, I still think it's misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed the other instances in #4049
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, have synchronized the wording to match the one from dev
.
specs/altair/p2p-interface.md
Outdated
@@ -140,6 +140,8 @@ def get_sync_subcommittee_pubkeys(state: BeaconState, subcommittee_index: uint64 | |||
- _[REJECT]_ `contribution_and_proof.selection_proof` selects the validator as an aggregator for the slot -- i.e. `is_sync_committee_aggregator(contribution_and_proof.selection_proof)` returns `True`. | |||
- _[REJECT]_ The aggregator's validator index is in the declared subcommittee of the current sync committee -- | |||
i.e. `state.validators[contribution_and_proof.aggregator_index].pubkey in get_sync_subcommittee_pubkeys(state, contribution.subcommittee_index)`. | |||
- _[IGNORE]_ The block being signed (`contribution_and_proof.contribution.beacon_block_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue sync committee contributions for processing once block is received) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about adding this in as a condition. It's not uncommon to see a fair bit of spread around the time that a block is received across different beacon nodes on the same server, let alone across multiple servers. Generating and signing sync committee messages is a pretty quick process, so if a beacon node discards these messages because it has yet to see the relevant block it could well be failing to hold on to a correct message.
How about if the beacon node will only ignore the message only if it has seen a block for the current slot? That gives protection against this common situation. If there is a concern about flooding the network with bogus messages by slower nodes then limit it to a single unknown beacon block root per beacon node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same condition as for attestations, aggregates, and EIP-7732 payloads / attestations.
consensus-specs/specs/phase0/p2p-interface.md
Lines 464 to 465 in 001caab
- _[IGNORE]_ The block being voted for (`attestation.data.beacon_block_root`) has been seen (via both gossip and non-gossip sources)
In practice, sync messages and attestations are sent around ~4s into the slot, and sync contributions and aggregates are sent around ~8s into the slot.
If the condition is a problem here, it is also a problem for attestations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is not the specific time that sync messages show up, it's that there can be a fair spread of time over which some nodes have the current slot's block and others do not.
From my understanding, at least some of the beacon nodes queue attestations when they do not have the block for the current slot rather than discard them (not sure if they forward them or not, that's another question). But having nodes ignore sync committee messages for the current slot when the node doesn't yet have the current slot's block doesn't seem to be an optimal solution to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is only that the processing should be done consistently for attestations as well as sync messages. If this is a problem for sync messages, it is also a problem for attestations.
For attestations, the current spec wording has been in place since phase0. Implementations for example ensure that attestations are delayed a bit to ensure block propagation; and they use unknown roots as a trigger to req/resp the unknown block. Sync messages follow the same code path, as in, are practically sent with a delay after processing the block, or at 4s into the slot. It could be that the wording for attestations also does not properly reflect the reality; but the PR here solely ensures consistent handling of the two messages (attestations / sync messages) that sign over the head, it does not introduce new problems that may or may not be present with the existing logic.
The sync committee message signature does not sign over the slot, making it possible for stale messages to be adjusted and replayed, possibly preventing new sync committee messages to propagate. If such a message for an older head is received, allow a second message to propagate that refers to the current head as selected by fork choice, overriding the previous one.
Also add a couple sanity checks, as in, disallowing messages referring to unknown or invalid blocks to spread, in line with existing rules for blob sidecars, parent block roots and attestations.