This repository has been archived by the owner on Mar 23, 2024. It is now read-only.
Current Status
SignCTRL automatically shuts itself down if the height contained in the SignVoteRequest/SignProposalRequest differs too much ({threshold}+1, to be precise) from the height persisted in the signctrl_state.json file. This is a safety measure to prevent validators from starting on a rank that might have become outdated while they were unavailable and missing blocks. As a result, recovering a node that has shut itself down requires manual intervention.
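To make the shutdown rule concrete, here is a minimal sketch in Go. It is not SignCTRL's actual code: the function name `exceedsThreshold` and its parameters are hypothetical, and it assumes that "differs too much" means the requested height is at least {threshold}+1 ahead of the persisted height.

```go
package main

// exceedsThreshold reports whether the requested height is at least
// threshold+1 ahead of the height persisted in the state file.
// Assumption: only a requested height that runs ahead of the persisted
// one is considered; the real check in SignCTRL may differ.
func exceedsThreshold(persistedHeight, requestedHeight, threshold int64) bool {
	return requestedHeight-persistedHeight >= threshold+1
}
```

With a threshold of 3 and a persisted height of 6, a request for height 9 would still be accepted under this reading, while a request for height 10 would trigger the shutdown.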
Goal
The goal is to reduce manual intervention as much as possible. SignCTRL should be able to figure out by itself whether its last persisted rank is still up to date or needs updating. To do so, SignCTRL would have to look into the commit signatures of each block it missed and update its rank accordingly.
Implementation
Here's my idea on how to implement this:
If SignCTRL gets a sign request, it should check whether the difference between the height of the last sign request it received and the currently requested height is greater than 1.
Example
The last sign request SignCTRL received was for height 5.
SignCTRL receives a new sign request from the validator for height 6.
In this case, requested_height - last_height > 1 is false (6 - 5 = 1, which is not greater than 1), so no blocks were missed.
If the validator becomes unavailable for some reason and misses blocks, SignCTRL notices this on the next sign request.
Example
The last sign request SignCTRL received was for height 6.
SignCTRL receives a new sign request from the validator for height 10.
In this case, requested_height - last_height > 1 (10-6 = 4 > 1) is true, so at least one block was missed.
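The gap check from the two examples above could be sketched as follows. The function name `hasGap` is hypothetical; only the condition `requested_height - last_height > 1` comes from the proposal.

```go
package main

// hasGap reports whether at least one height lies between the height of
// the last sign request and the height of the new sign request.
func hasGap(lastHeight, requestedHeight int64) bool {
	return requestedHeight-lastHeight > 1
}
```

For the examples above, `hasGap(5, 6)` is false and `hasGap(6, 10)` is true.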
It will then have to check the blocks it missed for the validator entity's signature and update its rank accordingly. Instead of checking each and every block, the process can be sped up significantly by checking only every {threshold}th block and, whenever one of those blocks turns out to have been missed, checking the range from 2 blocks below it to 2 blocks above it.
Example
If the last sign request was from height 6, and the latest sign request is for height 15, that means that blocks 7-15 were missed, so 9 in total.
Let's say we have a threshold of 3 missed blocks. SignCTRL now has to look into every third block from its current height, so 9, 12 and 15.
SignCTRL notices that block 12 was missed. It now needs to check blocks 10-14 and see if the two blocks before or after block 12 were also missed. If so, it needs to update its counter for blocks missed in a row and update its rank.
There is an edge case in which one node has missed so many blocks that going back and checking them takes longer than the block time, which means that a new block is created before SignCTRL has finished checking the previous ones.
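The sampling idea can be sketched roughly like this. Everything named here is hypothetical: `findMissedRuns` is not an existing SignCTRL function, and `signed` stands in for whatever lookup tells us whether the validator entity's signature is in the commit for a given height. The sketch also assumes the window around a missed sample is {threshold}-1 blocks on either side, which gives the 2 blocks from the example when the threshold is 3. How the rank is then updated is left out.

```go
package main

// findMissedRuns samples every threshold-th height after lastHeight, up to
// and including requestedHeight. For each sampled height that was missed,
// it scans the surrounding window and records the sampled height if the
// window contains at least threshold consecutive missed blocks.
func findMissedRuns(lastHeight, requestedHeight, threshold int64, signed func(int64) bool) []int64 {
	var hits []int64
	if threshold < 1 {
		return hits
	}
	for h := lastHeight + threshold; h <= requestedHeight; h += threshold {
		if signed(h) {
			continue // the sampled block was signed, nothing to inspect
		}
		// Window of threshold-1 blocks on either side, clamped to the
		// range of heights that lie after lastHeight.
		lo, hi := h-(threshold-1), h+(threshold-1)
		if lo <= lastHeight {
			lo = lastHeight + 1
		}
		if hi > requestedHeight {
			hi = requestedHeight
		}
		var run, longest int64
		for i := lo; i <= hi; i++ {
			if signed(i) {
				run = 0
				continue
			}
			run++
			if run > longest {
				longest = run
			}
		}
		if longest >= threshold {
			hits = append(hits, h)
		}
	}
	return hits
}
```

Sampling is enough to find such runs because any {threshold} consecutive heights contain exactly one sampled height, so a run of {threshold} missed blocks cannot slip between two samples.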
Example
The last sign request SignCTRL received was for height 11.
SignCTRL receives a new sign request from the validator for height 1,000.
While SignCTRL checks blocks 12-1,000, a bunch of new sign requests have come in, from height 1,001 to 1,020.
So, once SignCTRL is done checking one set of missed blocks, it needs to check if any additional sign requests have come in and check those too.
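One way to handle this edge case is a loop that keeps checking until it has caught up with the newest sign request. This is only a sketch: `catchUp`, `latestRequested` and `checkRange` are hypothetical names, with `checkRange` standing in for the missed-block check described above.

```go
package main

// catchUp checks missed heights in batches until no newer sign request
// has arrived during the last batch. It returns the last checked height.
func catchUp(lastHeight int64, latestRequested func() int64, checkRange func(from, to int64)) int64 {
	checked := lastHeight
	for {
		target := latestRequested()
		if target <= checked {
			return checked // nothing new came in while checking
		}
		checkRange(checked+1, target)
		checked = target
	}
}
```

In the example above, the first pass would cover heights 12 to 1,000 and the second pass heights 1,001 to 1,020, after which the loop ends because no newer request is pending.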