Consensus halt fix #94

Merged
merged 1 commit into from
Apr 24, 2024

@goran-ethernal (Collaborator) commented Apr 23, 2024

Description

Some clients reported an issue where consensus halts from time to time when nodes are under heavy load.

Because the nodes were under a stress test, a round lasted almost as long as the round timeout, so the round expiry and the proposal insertion happened at the same time. Unfortunately, the round-expired channel was notified while the proposal was being inserted into the chain. This led to a strange state: the block was inserted into the chain, but the node kept changing rounds afterwards, even though the proposal had already been inserted. Consensus then halted and stopped producing blocks.

The solution was provided by @lazartravica, and the fix was quite simple. When the quorum of commit messages is reached and the node is in the fin state (meaning it accepted the proposal and will now insert it into the chain), the roundDone channel is signalled immediately, and the proposal is inserted after that. As a result, the round timeout can no longer fire while we are inserting the proposal. Before this fix, roundDone was signalled only after the block was inserted, so if InsertProposal took a bit longer, the round timeout could fire while the node was still inserting the block.

Changes include

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)

Checklist

  • I have assigned this PR to myself
  • I have added at least 1 reviewer
  • I have added the relevant labels
  • I have added sufficient documentation in code


Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@goran-ethernal self-assigned this Apr 23, 2024
@goran-ethernal added the "bug fix" label (Functionality that fixes a bug) Apr 23, 2024
@goran-ethernal merged commit 00f7637 into main Apr 24, 2024
5 checks passed
@goran-ethernal deleted the consensus-halt-issue branch April 24, 2024 09:30
3 participants