-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partition detection #27
Comments
I'm leaning towards the core idea behind option 1 (the node leaving), but in a different way: I'd much rather the failure detector implementation have its own custom logic for a node to leave based on some heuristics. Mainly because two-way partition detection seems to be a very specific kind of failure to detect -- so it shouldn't be part of the core protocol. The model for Rapid only assumes that messages between correct processes will eventually be delivered. Nodes in a minority partition are by definition not correct processes. We therefore can't assume the minority partition will be able to send or receive any messages at all (alerts, Fast Paxos, fallback Paxos messages or any additional messages at all). This makes it so that we can't rely on a timeout around the consensus steps failing (what if the node never even gets to the consensus step?), nor expect messages that minority partition nodes can send/receive to determine which partition they are in (because every node in the minority partition could be isolated into its own island). The extensions to the reinforcement mechanism (like observing the number of reinforced alerts) -- IIUC -- has the same drawback: it still requires that the minority nodes are are able to receive broadcasts or other messages. Back to leaving based on heuristics: I think we should extend the failure detector API with the ability to define custom "should I Leave" logic. As you say, this logic should be a singleton for each node (and not in separate edge failure detectors). Rapid users can experiment with any heuristic they want in there. A simple starting point for any implementation would be to do something like "I haven't heard from my observers nor subjects in a while, so maybe i'll probe some other nodes and if the situation looks bad** -- i'll voluntarily leave" -- ** for some application-specific idea of bad. |
Yes, having that mechanism be pluggable definitely is something I had in mind as well. The reason I want to involve the multi-node cut detector is because I like the fact that it acts as a qualitative filter to instability. As I see it, using the state of the multi-node cut detector as a trigger to the "should I leave" logic would be a superior approach to simply relying on the single-node view of observers and subjects which may be subject to glitches at times. From there on, the "should I leave" logic could be provided with:
in the last 2* Based on that input, the custom logic could apply its own filter to decide whether to probe further. This could be triggered as soon as a subject has been in unstable reporting mode for some timeout (for example, the same timeout as for starting reinforcement). |
Sounds good. One way to do this is to make the "should I leave logic" get a notification when an alert modifies the cut-detector (the notification can be parameterized by the alert message itself, the configuration ID and a read-only view of the cut-detector). |
Summarizing the discussion I've had with @lalithsuresh per e-mail on this topic so far:
In the event of a 2-way network partition, consensus can be reached on the majority but not on the minority side. Note that in order to do so reliable, the reinforcement mechanism described in the last paragraph of section 4.2 of the paper needs to be implemented.
On the minority side, no consensus can be reached. I currently have two ideas that would allow to make progress (where progress means informing the application that we are out).
Consensus timeout
After fast-paxos times out and paxos times out, conclude that we can't reach consensus fast enough and leave. Note that I am not sure yet that even with the reinforcement mechanism as described in the paper there'd be a way to reach a cut detection under those circumstances.
Extend the reinforcement mechanism to include partition detection
I think it may be possible to evolve the reinforcement mechanisms so that:
I think there could be a way for the system to detect that there has been a partition and go into "partition consensus" mode. For example I think that by counting the reinforced-REMOVEs it would be possible to confirm that there's only as many nodes left in the same side of the partition as inferred by the broadcast count. I.e. if I broadcast to m nodes and I also receive at most m reinforced-REMOVE alerts then that should give me a high confidence that we're the only ones left on that side. If I receive more than m, then a two-way partition cannot be assumed. (note that I think that
reinforced-REMOVE
needs to become its own alert type to be able to distinguish late REMOVEs from the reinforced ones).Once a minority partition has been detected the minority side can proceed to shut down.
I have not yet thought this through for 3-way partitions.
The text was updated successfully, but these errors were encountered: