consensus liveness of the implementation #36

Lbqds · 2021-06-14T09:35:11Z

Hi, I'm diving into the implementation of the Rapid membership protocol, and I have a question about consensus liveness. Consider the following scenario:

There are 9 nodes in the cluster, and Fast Paxos needs at least 7 votes. Assume that 3 nodes are unreachable, so at most 6 votes can ever arrive and every consensus instance falls back to classic Paxos. Now:

1. One of the nodes (say, node A) starts classic Paxos because the fast round timed out. Assume node A has the largest node index among the 6 remaining nodes.
2. Node A sends phase1a messages to the other nodes.
3. The other nodes handle the phase1a message and set their rnd to Rank(2, A.NodeIndex).
4. Node A then crashes for some unknown reason, leaving 5 live nodes in the cluster.
5. The remaining 5 nodes continue with classic Paxos, but none of them accepts a new phase1a message: every proposal carries a rank lower than the rnd they already hold (A's rank, since A has the largest node index). Consensus gets stuck.
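To make the scenario concrete, here is a small Java model of it. It is a sketch, not Rapid's code: the `Rank` and `Acceptor` classes, the helper names, and the node indices 8 and 3 are hypothetical. Ranks are assumed to compare by round first and node index second (as the `Rank(2, A.NodeIndex)` notation suggests), and the fast quorum is assumed to be N - floor((N - 1) / 4), which reproduces the 7-of-9 figure above. The model shows that the 5 survivors still form a classic majority of 9, so the stall comes from the stale rank rather than from a missing quorum.

```java
// Hypothetical model of the scenario above; not taken from Rapid's sources.
class StuckRoundDemo {
    // Assumed Rapid-style fast quorum: N - floor((N - 1) / 4).
    static int fastQuorum(int n) {
        return n - (n - 1) / 4;   // integer division acts as floor for n >= 1
    }

    // Classic Paxos quorum: a simple majority.
    static int classicQuorum(int n) {
        return n / 2 + 1;
    }

    // Rank(round, nodeIndex): compare by round, then by nodeIndex as a tie-breaker.
    static final class Rank implements Comparable<Rank> {
        final int round;
        final int nodeIndex;

        Rank(int round, int nodeIndex) {
            this.round = round;
            this.nodeIndex = nodeIndex;
        }

        @Override
        public int compareTo(Rank other) {
            int byRound = Integer.compare(round, other.round);
            return byRound != 0 ? byRound : Integer.compare(nodeIndex, other.nodeIndex);
        }

        @Override
        public String toString() {
            return "Rank(" + round + ", " + nodeIndex + ")";
        }
    }

    // Acceptor that promises only to a phase1a whose rank is higher than its rnd.
    static final class Acceptor {
        Rank rnd = new Rank(0, 0);

        boolean onPhase1a(Rank proposed) {
            if (proposed.compareTo(rnd) <= 0) {
                return false;   // rank not higher than rnd: no phase1b reply
            }
            rnd = proposed;     // promise to ignore lower ranks from now on
            return true;
        }
    }

    // Every survivor first promises to A; after A crashes, another node sends `retry`.
    static int promisesAfterCrash(Rank fromA, Rank retry, int survivors) {
        int promises = 0;
        for (int i = 0; i < survivors; i++) {
            Acceptor acc = new Acceptor();
            acc.onPhase1a(fromA);
            if (acc.onPhase1a(retry)) {
                promises++;
            }
        }
        return promises;
    }

    public static void main(String[] args) {
        int n = 9;
        int reachable = n - 3;                                  // 3 nodes unreachable
        System.out.println(reachable >= fastQuorum(n));         // false: 6 < 7
        System.out.println(reachable - 1 >= classicQuorum(n));  // true: 5 >= 5 after A crashes
        Rank fromA = new Rank(2, 8);   // A has the largest live index (8 is hypothetical)
        Rank fromB = new Rank(2, 3);   // survivor B retries round 2 with a smaller index
        System.out.println(promisesAfterCrash(fromA, fromB, 5)); // 0: no survivor promises
    }
}
```

Bumping B's round to 3 (`new Rank(3, 3)`) would make all 5 survivors promise, since a higher round outranks A's rank regardless of node index.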

Am I misunderstanding something?


lalithsuresh · 2021-06-14T17:14:45Z

@Lbqds good catch! You are correct that the retry-forever part isn't implemented yet.

This piece of code here is where a process would need to keep retrying with new rounds until a decision is reached. Right now, it starts just one round and hopes that it will succeed:

rapid/rapid/src/main/java/com/vrg/rapid/FastPaxos.java, line 189 in 0aef896:

void startClassicPaxosRound() {
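A minimal sketch of the retry-forever behaviour described in the comment above, under stated assumptions: a rejection is assumed to reveal the acceptor's current rank (a NACK-style reply, which the current message flow may not include; without it, a proposer can simply bump its own round on each timeout). The names `RetryingProposer` and `runUntilPromised`, and the 0-49 ms randomized backoff, are hypothetical. Only phase 1 is modelled; a real round would still have to run phase 2 and could itself be pre-empted and restart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of "keep starting new classic rounds until a decision";
// names and message shapes are illustrative, not Rapid's API.
class RetryingProposer {
    // Rank(round, nodeIndex), ordered by round and then by nodeIndex.
    static final class Rank implements Comparable<Rank> {
        final int round;
        final int nodeIndex;

        Rank(int round, int nodeIndex) {
            this.round = round;
            this.nodeIndex = nodeIndex;
        }

        @Override
        public int compareTo(Rank other) {
            int byRound = Integer.compare(round, other.round);
            return byRound != 0 ? byRound : Integer.compare(nodeIndex, other.nodeIndex);
        }

        @Override
        public String toString() {
            return "Rank(" + round + ", " + nodeIndex + ")";
        }
    }

    // Acceptor that adopts any higher rank and always reports its current rnd
    // (assumption: a rejection tells the proposer which rank beat it).
    static final class Acceptor {
        Rank rnd = new Rank(0, 0);

        Rank onPhase1a(Rank proposed) {
            if (proposed.compareTo(rnd) > 0) {
                rnd = proposed;
            }
            return rnd;
        }
    }

    // Retries phase 1 with ever-higher rounds until a majority of the
    // clusterSize nodes has promised, then returns the winning rank.
    static Rank runUntilPromised(List<Acceptor> reachable, int clusterSize,
                                 int myIndex, int firstRound, Random rng) {
        int majority = clusterSize / 2 + 1;
        int round = firstRound;
        while (true) {
            Rank mine = new Rank(round, myIndex);
            int promises = 0;
            int highestSeen = round;
            for (Acceptor acc : reachable) {
                Rank current = acc.onPhase1a(mine);
                if (current.compareTo(mine) == 0) {
                    promises++;
                } else {
                    highestSeen = Math.max(highestSeen, current.round);
                }
            }
            if (promises >= majority) {
                return mine;   // next step would be phase 2a with this rank
            }
            try {
                Thread.sleep(rng.nextInt(50));   // randomized backoff (assumed 0-49 ms)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while backing off", e);
            }
            // A round above every round seen outranks them regardless of node index.
            round = highestSeen + 1;
        }
    }

    public static void main(String[] args) {
        List<Acceptor> survivors = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            Acceptor acc = new Acceptor();
            acc.onPhase1a(new Rank(2, 8));   // all promised to the crashed node A (index assumed)
            survivors.add(acc);
        }
        Rank won = runUntilPromised(survivors, 9, 3, 2, new Random(42));
        System.out.println("quorum obtained with " + won);   // Rank(3, 3)
    }
}
```

Starting from the stuck state (every survivor holding Rank(2, 8)), node 3's first attempt in round 2 is rejected and its retry in round 3 gathers all 5 promises. If fewer than a majority of the 9 nodes were reachable, the loop would keep retrying, which is expected: no Paxos variant can decide without a quorum. The randomized backoff is the usual way to keep two retrying proposers from repeatedly pre-empting each other.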
