Hi, I'm diving into the implementation of the Rapid membership protocol, and I have a question about consensus liveness. Consider the following scenario:
There are 9 nodes in the cluster, so FastPaxos needs at least 7 votes. Assume that 3 nodes are unreachable, which means every consensus instance will fall back to classic Paxos. Now:
One of the nodes (say node A) starts classic Paxos because of the fast-round timeout; assume that node A has the largest node index among the remaining 6 nodes.
Node A sends phase1a messages to the other nodes.
The other nodes handle the phase1a message and set their rnd to Rank(2, A.NodeIndex).
Node A now crashes for some unknown reason, leaving 5 nodes in the cluster.
The remaining 5 nodes continue with classic Paxos, but none of them accepts a phase1a message from the others, because every node already holds a larger rnd, and the consensus gets stuck.
Am I misunderstanding something?
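To make the scenario concrete, here is a minimal Python sketch of it. This is not Rapid's code: `Rank`, `Acceptor`, and `promises_after_crash` are hypothetical stand-ins, and the sketch assumes that ranks are compared by round first and node index second, and that each surviving node starts exactly one classic round numbered 2.

```python
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class Rank:
    """Hypothetical rank: ordered by round number, then by node index."""
    round_no: int
    node_index: int


class Acceptor:
    """Hypothetical acceptor that tracks only the highest rank it promised."""

    def __init__(self) -> None:
        self.rnd = Rank(0, 0)

    def on_phase1a(self, rank: Rank) -> bool:
        # Promise only if the incoming rank is strictly larger than rnd.
        if rank > self.rnd:
            self.rnd = rank
            return True
        return False


def promises_after_crash(a_index: int = 6, survivors: int = 5) -> dict:
    """Count the promises each survivor collects after node A has crashed."""
    acceptors = {i: Acceptor() for i in range(1, survivors + 1)}

    # Node A (largest index) sends phase1a for round 2, then crashes.
    for acc in acceptors.values():
        acc.on_phase1a(Rank(2, a_index))

    # Each survivor now starts a single classic round with Rank(2, own index).
    return {
        i: sum(acc.on_phase1a(Rank(2, i)) for acc in acceptors.values())
        for i in acceptors
    }


if __name__ == "__main__":
    print(promises_after_crash())
```

Under these assumptions every survivor collects zero promises, since `Rank(2, i)` is smaller than `Rank(2, 6)` for every remaining index `i`.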
@Lbqds good catch! You are correct that the retry forever part isn't implemented yet.
This piece of code here is where a process would need to keep retrying with new rounds until a decision is reached. Right now, it starts just one round and hopes that it will succeed.
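As a rough illustration of what "keep retrying with new rounds" could mean, here is a small Python sketch. It is not Rapid's code and not a proposed patch: `Acceptor`, `retry_phase1`, and the parameters `cluster_size`, `start_round`, and `max_rounds` are all hypothetical. It covers only phase 1, models ranks as `(round, node_index)` tuples, and omits the timeouts and backoff a real implementation would need between rounds.

```python
from typing import List, Optional, Tuple

Rank = Tuple[int, int]  # (round number, node index), compared lexicographically


class Acceptor:
    """Hypothetical acceptor that tracks only the highest rank it promised."""

    def __init__(self, rnd: Rank = (0, 0)) -> None:
        self.rnd = rnd

    def on_phase1a(self, rank: Rank) -> bool:
        # Promise only if the incoming rank is strictly larger than rnd.
        if rank > self.rnd:
            self.rnd = rank
            return True
        return False


def retry_phase1(
    node_index: int,
    acceptors: List[Acceptor],
    cluster_size: int,
    start_round: int = 2,
    max_rounds: int = 10,
) -> Optional[Rank]:
    """Retry phase 1 with increasing round numbers until a majority promises.

    Returns the rank that gathered a majority, or None if max_rounds
    attempts were not enough.
    """
    majority = cluster_size // 2 + 1
    for round_no in range(start_round, start_round + max_rounds):
        rank = (round_no, node_index)
        promises = sum(acc.on_phase1a(rank) for acc in acceptors)
        if promises >= majority:
            return rank
        # Otherwise fall through and try again with the next round number.
    return None


if __name__ == "__main__":
    # Five survivors that already promised to the crashed node's Rank(2, 6).
    survivors = [Acceptor((2, 6)) for _ in range(5)]
    print(retry_phase1(node_index=3, acceptors=survivors, cluster_size=9))
```

In this sketch the first attempt with rank `(2, 3)` is rejected by every acceptor, and the second attempt with `(3, 3)` gathers all 5 promises, which is a majority of the 9-node membership.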