spec: revisit the "trust" relation between consensus logic and driver #600
Replies: 2 comments
-
The design we have chosen has a clear separation of concerns
Indeed, one is able to implement a different driver and re-use the same consensus state machine. But there is a contract between the driver and the consensus. and we have not been incredibly explicit about. For instance, for safety:
You have rightly observed that it is much easier to check that the consensus state machine fulfills its side of the contract. For the driver it is much harder due to the complexity. However, I don't think the solution should be to add more checks to consensus. Rather, I would like to add more confidence about our driver design being correct. Interestingly, this is one of the motivations of #143. The first example transition invariant that is mentioned there is
The idea is to specify the contract between consensus and the driver in several invariants. I think these invariants are super easy to verify. Perhaps we are even able to add assertions to the (Rust) code that encode these invariants. Not sure what is the best software engineering approach there. So, I agree with your concern. This is why I would like to eventually start to address #143, or a version of that issue that is more focussed around the consensus/driver interface requirements. |
Beta Was this translation helpful? Give feedback.
-
Thank for your comments. In the examples you provided ( I do not propose to have consensus processing individual events and the driver should indeed have an important role and be correct. But I think that, also for the sake of validating the implementation, consensus could store some "pending" events (e.g., a value that is valid but we don't have yet enough votes for it) then check if the "composite" event is indeed valid. |
Beta Was this translation helpful? Give feedback.
-
In Brussels, some people asked us why we decided to split the consensus logic in two components, the
consensus
state-machine, and the so-calleddriver
.In my mind, the idea of the driver is to collect single events received from the network (e.g., single votes) and deliver them to consensus when they become a complex event, composed by multiple single events (e.g., votes from a quorum of validators).
But as the modelling and implementation has evolved, I noticed that the role of the consensus component has became less and less relevant because the "complex event" have became more and more complex. For instance, if we see a
Polka
on the driver, we don't deliver it to consensus if we don't have the associated full-value, because otherwise it will not trigger a state transition (it might schedule a timeout, true, but this is not my point here).My point here is that the events provided by the driver to the consensus component have become very opaque. It is like the consensus module (blindly?) trust that we have a
Polka
forv
at roundr
without having any access to the actual set ofPrevote
messages that produced thePolka
forid(v)
not even the proof that the full valuev
was received and validated.And this somehow worries me. It is not that I want a new Comet with 2893748 levels of checking for the same information (we really have a quorum, is this really a valid value?) but I think that the consensus module should be less lenient to possible bugs in the driver implementation. By the way, my rationale here is based on the Quint modelling, not in the actual Rust code.
I brought that discussion to @josef-widder, that pointed me out to something really relevant in this separation of concerns. The consensus pseudo-code contains multiple
upon
causes that might become true at the same time, and the role of the driver, on one of which, is to prioritize events. For instance, if we can decide a value, why to bother the consensus model with information from previous rounds or round steps? This is very good point.But from this perspective, we might have to consider multiple possible implementations of the driver, in terms of how it handles priorities. And I wonder whether the consensus module has enough logic and information to work properly in the case the driver implementation is not optimal, or, even worse, it is wrong by design or have a bug.
So my general question is: Shouldn't we be able to test the consensus logic by providing to it events that our current design of the driver will not deliver to it (at a given state)? Should the consensus logic store some partial information, that by itself does not produce a state transition (e.g., a
Polka
forid(v)
) and be able to use this partial information to produce state transition (e.g., if afterwards it receives an event indicating thatv
was received and it is valid)?This separation of concerns is worrying me somehow. As I see how complex the logic of the driver is (again, having Quint models in mind) and how simple (can I say naïve or even dumb) is the consensus logic. More specifically here, it appears that we have the driver producing all events that lead to a pseudo-code
upon
clause to be processed. And the consensus logic just... applies the complex event, without basically any check, and produces the state-transition.I hope I am wrong on this rationale and only don't have enough knowledge of the model and implementation. If so, we can just close this issue. Otherwise, we should consider that the driver may have issues and that consensus is able to identify them.
Beta Was this translation helpful? Give feedback.
All reactions