This repository has been archived by the owner on Mar 23, 2024. It is now read-only.

Autonomous rank updates on resync #23

Open
freshfab opened this issue Apr 13, 2021 · 0 comments
Labels
enhancement New feature or request

Comments


freshfab commented Apr 13, 2021

Current Status

SignCTRL automatically shuts itself down if the height contained in the SignVoteRequest/SignProposalRequest differs too much ({threshold}+1 to be precise) from the height persisted in the signctrl_state.json file. This is a safety measure to prevent validators from starting on a rank that might have become outdated while they were unavailable and missing blocks. As a result, recovering a node that has shut itself down requires manual intervention.
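A minimal sketch of the current safety check, for reference. The function name is hypothetical, and using the absolute difference is an assumption, since the description only says the two heights "differ" by {threshold}+1.

```python
def should_shut_down(requested_height: int, persisted_height: int, threshold: int) -> bool:
    """Hypothetical helper: True if the requested height is {threshold}+1 or more
    blocks away from the height persisted in signctrl_state.json."""
    # abs() is an assumption: the issue only states that the heights "differ".
    return abs(requested_height - persisted_height) >= threshold + 1
```

With a threshold of 3, a persisted height of 6 and a request for height 10 (a difference of 4) triggers a shutdown, while a request for height 9 does not.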

Goal

The goal is to reduce manual intervention as much as possible. SignCTRL should be able to determine by itself whether its last persisted rank is still up to date or needs updating. To do so, SignCTRL would have to inspect the commit signatures of each block it missed and update its rank accordingly.

Implementation

Here's my idea on how to implement this:

If SignCTRL gets a sign request, it should check whether the difference between the height of its last received sign request and the currently requested height is greater than 1.

Example

  1. The last sign request SignCTRL received was for height 5.
  2. SignCTRL receives a new sign request from the validator for height 6.
    In this case, requested_height - last_height > 1 is false (6 - 5 = 1, which is not greater than 1), so no blocks were missed.

Now, should the validator become unavailable for some reason, and thus miss blocks, SignCTRL notices this.

Example

  1. The last sign request SignCTRL received was for height 6.
  2. SignCTRL receives a new sign request from the validator for height 10.
    In this case, requested_height - last_height > 1 is true (10 - 6 = 4, which is greater than 1), so at least one block was missed.
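The gap check from both examples can be sketched as a small tracker that remembers the last requested height. The class and method names are hypothetical; this is only an illustration of the comparison, not SignCTRL's actual code.

```python
class HeightTracker:
    """Hypothetical helper that remembers the height of the last sign request."""

    def __init__(self, last_height: int) -> None:
        self.last_height = last_height

    def on_sign_request(self, requested_height: int) -> bool:
        """Return True if at least one block was missed since the last request."""
        missed = requested_height - self.last_height > 1
        # Remember this request so the next gap is measured from here.
        self.last_height = requested_height
        return missed
```

Starting at height 5, a request for height 6 reports no gap; a following request for height 10 reports that blocks were missed.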

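It will then have to check the blocks it missed for the validator entity's signature and update its rank accordingly. Instead of checking each and every block, the process can be sped up significantly by checking only every {threshold}th block and, whenever one of those sampled blocks turns out to be missed, checking the range from h-({threshold}-1) to h+({threshold}-1), where h is that block's height. This is sufficient because any run of {threshold} consecutive missed blocks must contain at least one sampled block.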
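A minimal sketch of the sampling step, assuming (as in the example that follows) that the requested height itself is included among the heights to check. Both function names are hypothetical.

```python
def sampled_heights(last_height: int, requested_height: int, threshold: int) -> list[int]:
    """Every threshold-th height after last_height, up to and including requested_height."""
    return list(range(last_height + threshold, requested_height + 1, threshold))


def window_around(height: int, threshold: int) -> range:
    """Heights that could belong to the same run of `threshold` consecutive misses
    as `height`, i.e. height-(threshold-1) through height+(threshold-1)."""
    return range(height - (threshold - 1), height + threshold)
```

For a last height of 6, a requested height of 15 and a threshold of 3, `sampled_heights` yields 9, 12 and 15, and `window_around(12, 3)` covers heights 10 through 14. Only the heights inside a window need a full check, instead of every missed block.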

Example

  1. If the last sign request was for height 6 and the latest sign request is for height 15, blocks 7-15 were missed, so 9 in total.
  2. Let's say we have a threshold of 3 missed blocks. SignCTRL now has to look into every third block, counting from its last height, so blocks 9, 12 and 15.
  3. SignCTRL notices that block 12 was missed. It now needs to check blocks 10-14 to see whether the two blocks before or after block 12 were also missed. If so, it needs to update its counter for blocks missed in a row and update its rank.
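Step 3 can be sketched as follows. The function names are hypothetical, the set of missed heights is example input, and the rank rule (moving one rank toward rank 1 for every full run of {threshold} consecutive misses) is an assumption; the issue only says that the counter and the rank must be updated.

```python
def longest_run(heights: range, missed: set[int]) -> int:
    """Length of the longest run of consecutive missed heights within `heights`."""
    best = current = 0
    for height in heights:
        # Extend the current run if this block lacks the validator's signature, else reset it.
        current = current + 1 if height in missed else 0
        best = max(best, current)
    return best


def update_rank(rank: int, run_length: int, threshold: int) -> int:
    """Hypothetical rule: move up one rank (toward rank 1) per full run of `threshold` misses."""
    return max(1, rank - run_length // threshold)
```

If blocks 11, 12 and 13 lack the validator's signature, the window 10-14 (`range(10, 15)`) contains a run of 3, so a node on rank 3 with a threshold of 3 would move to rank 2 under this assumed rule.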

There is an edge case in which a node has missed so many blocks that going back and checking them takes longer than the block time. In that case, new blocks are created before SignCTRL has finished checking the previous ones.

Example

  1. The last sign request SignCTRL received was for height 11.
  2. SignCTRL receives a new sign request from the validator for height 1,000.
  3. While SignCTRL checks blocks 12-1,000, new sign requests for heights 1,001 to 1,020 come in.

So, once SignCTRL has finished checking one set of missed blocks, it needs to check whether additional sign requests have come in and, if so, check the blocks covered by those as well.
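A minimal sketch of that catch-up loop, assuming queued sign requests are collected in a list that may grow while a check is running. The function name and the `check_blocks` callback are hypothetical stand-ins for whatever actually inspects commit signatures.

```python
from typing import Callable


def catch_up(checked_height: int, requests: list[int],
             check_blocks: Callable[[int, int], None]) -> int:
    """Check every missed range, repeating until no new sign requests are queued.

    `requests` may grow while `check_blocks` runs (new blocks arriving), so the
    loop looks at the queue again after each pass.
    """
    while requests:
        newest = max(requests)
        requests.clear()
        if newest > checked_height:
            # Check everything between the last checked height and the newest request.
            check_blocks(checked_height + 1, newest)
            checked_height = newest
    return checked_height
```

In the example above, the first pass checks blocks 12-1,000; because requests for 1,001 to 1,020 arrive during that pass, a second pass checks 1,001-1,020 before the loop ends.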

freshfab added the enhancement (New feature or request) label on Apr 13, 2021