consensus liveness of the implementation #36

Open
Lbqds opened this issue Jun 14, 2021 · 1 comment

Lbqds commented Jun 14, 2021

Hi, I'm diving into the implementation of the Rapid membership protocol, and I have a question about consensus liveness. Consider the following scenario:

There are 9 nodes in the cluster, so FastPaxos needs at least 7 votes. Assume that 3 nodes are unreachable, which means every consensus instance will fall back to classic Paxos. Now:

  1. One of the nodes (say node A) starts classic Paxos because of the fast round timeout. Assume that node A has
    the largest node index among the remaining 6 nodes.
  2. Node A sends phase1a messages to the other nodes.
  3. The other nodes handle the phase1a message and set their rnd to Rank(2, A.NodeIndex).
  4. Node A now crashes for some unknown reason, leaving 5 nodes in the cluster.
  5. The remaining 5 nodes continue with classic Paxos, but none of them accepts a phase1a message from the others, because every node already holds a larger rnd, so the consensus gets stuck.

Am I misunderstanding something?
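
The scenario above can be modeled in a few lines. This is only a sketch under stated assumptions, not the Rapid code: the `Rank`, `Acceptor`, `newAcceptors`, and `countPromises` names are hypothetical, ranks are assumed to compare by round first and node index second, and an acceptor is assumed to promise only to a strictly higher rank.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the stuck-round scenario; not taken from the Rapid code base. */
final class StuckRoundSketch {
    /** Assumed ordering: compare by round first, then by node index. */
    static final class Rank implements Comparable<Rank> {
        final int round;
        final int nodeIndex;

        Rank(int round, int nodeIndex) {
            this.round = round;
            this.nodeIndex = nodeIndex;
        }

        @Override
        public int compareTo(Rank other) {
            int byRound = Integer.compare(round, other.round);
            return byRound != 0 ? byRound : Integer.compare(nodeIndex, other.nodeIndex);
        }
    }

    /** Acceptor state reduced to the highest rank it has promised so far. */
    static final class Acceptor {
        private Rank rnd = new Rank(0, 0);

        /** Handles phase1a: promise only if the incoming rank is strictly higher. */
        boolean onPhase1a(Rank incoming) {
            if (incoming.compareTo(rnd) > 0) {
                rnd = incoming;
                return true;
            }
            return false;
        }
    }

    /** Creates n fresh acceptors. */
    static List<Acceptor> newAcceptors(int n) {
        List<Acceptor> acceptors = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            acceptors.add(new Acceptor());
        }
        return acceptors;
    }

    /** Sends phase1a with the given rank to every acceptor and counts the promises. */
    static int countPromises(List<Acceptor> acceptors, Rank rank) {
        int promises = 0;
        for (Acceptor acceptor : acceptors) {
            if (acceptor.onPhase1a(rank)) {
                promises++;
            }
        }
        return promises;
    }
}
```

In this model, once the five survivors have promised to `Rank(2, A.NodeIndex)`, a phase1a from any survivor with round 2 and a smaller node index collects no promises. Only a higher round number gets through, which is why a single round is not enough.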

@lalithsuresh
Owner

@Lbqds good catch! You are correct that the retry-forever part isn't implemented yet.

This piece of code here is where a process would need to keep retrying with new rounds until a decision is reached. Right now, it starts just one round and hopes that it will succeed:

void startClassicPaxosRound() {
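
As a rough illustration of "keep retrying with new rounds", the loop below starts rounds with increasing round numbers until one reports a decision. It is a sketch only and not the project's implementation: `retryUntilDecided` and the `tryRound` callback are hypothetical names, the callback stands in for running a full classic Paxos round, and a real version would presumably wait for a timeout (ideally with some randomized backoff) between attempts.

```java
import java.util.function.IntPredicate;

/** Hypothetical retry loop; the names here are illustrative, not Rapid APIs. */
final class RetryingProposerSketch {
    /**
     * Starts rounds firstRound, firstRound + 1, ... until tryRound reports a decision.
     *
     * @param firstRound round number of the first attempt
     * @param maxRounds  upper bound on attempts, so the sketch always terminates
     * @param tryRound   stand-in for running one classic round; returns true on a decision
     * @return the round that decided, or -1 if no round decided within maxRounds attempts
     */
    static int retryUntilDecided(int firstRound, int maxRounds, IntPredicate tryRound) {
        for (int attempt = 0; attempt < maxRounds; attempt++) {
            int round = firstRound + attempt;
            if (tryRound.test(round)) {
                return round;
            }
        }
        return -1;
    }
}
```

For example, if the acceptors have already promised to a crashed proposer in round 2, a callback that succeeds only for rounds above 2 makes `retryUntilDecided(2, 10, r -> r > 2)` return 3.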
