spec: revisit the "trust" relation between consensus logic and driver #600

cason · 2024-09-09T10:21:09Z

cason
Sep 9, 2024
Collaborator

In Brussels, some people asked us why we decided to split the consensus logic in two components, the consensus state-machine, and the so-called driver.

In my mind, the idea of the driver is to collect single events received from the network (e.g., single votes) and deliver them to consensus when they become a complex event, composed by multiple single events (e.g., votes from a quorum of validators).

But as the modelling and implementation has evolved, I noticed that the role of the consensus component has became less and less relevant because the "complex event" have became more and more complex. For instance, if we see a Polka on the driver, we don't deliver it to consensus if we don't have the associated full-value, because otherwise it will not trigger a state transition (it might schedule a timeout, true, but this is not my point here).

My point here is that the events provided by the driver to the consensus component have become very opaque. It is like the consensus module (blindly?) trust that we have a Polka for v at round r without having any access to the actual set of Prevote messages that produced the Polka for id(v) not even the proof that the full value v was received and validated.

And this somehow worries me. It is not that I want a new Comet with 2893748 levels of checking for the same information (we really have a quorum, is this really a valid value?) but I think that the consensus module should be less lenient to possible bugs in the driver implementation. By the way, my rationale here is based on the Quint modelling, not in the actual Rust code.

I brought that discussion to @josef-widder, that pointed me out to something really relevant in this separation of concerns. The consensus pseudo-code contains multiple upon causes that might become true at the same time, and the role of the driver, on one of which, is to prioritize events. For instance, if we can decide a value, why to bother the consensus model with information from previous rounds or round steps? This is very good point.

But from this perspective, we might have to consider multiple possible implementations of the driver, in terms of how it handles priorities. And I wonder whether the consensus module has enough logic and information to work properly in the case the driver implementation is not optimal, or, even worse, it is wrong by design or have a bug.

So my general question is: Shouldn't we be able to test the consensus logic by providing to it events that our current design of the driver will not deliver to it (at a given state)? Should the consensus logic store some partial information, that by itself does not produce a state transition (e.g., a Polka for id(v)) and be able to use this partial information to produce state transition (e.g., if afterwards it receives an event indicating that v was received and it is valid)?

This separation of concerns is worrying me somehow. As I see how complex the logic of the driver is (again, having Quint models in mind) and how simple (can I say naïve or even dumb) is the consensus logic. More specifically here, it appears that we have the driver producing all events that lead to a pseudo-code upon clause to be processed. And the consensus logic just... applies the complex event, without basically any check, and produces the state-transition.

I hope I am wrong on this rationale and only don't have enough knowledge of the model and implementation. If so, we can just close this issue. Otherwise, we should consider that the driver may have issues and that consensus is able to identify them.

josef-widder · 2024-09-19T14:06:13Z

josef-widder
Sep 19, 2024
Collaborator

The design we have chosen has a clear separation of concerns

the consensus state machine encapsulates Tendermint consensus. It is so simple that anyone can look at it next to the arXiv paper to check that this indeed is the same algorithm (nothing obfuscates the implementation... when I write implementation, I mean Quint, but I think Rust is super similar)
the driver resolves the non-determinism of the rules (as you mentioned)

Indeed, one is able to implement a different driver and re-use the same consensus state machine. But there is a contract between the driver and the consensus. and we have not been incredibly explicit about. For instance, for safety:

the driver needs to ensure, e.g. that
- it emits a ProposalAndPolkaAndValidCInput(Value) event, only if it has seen the proposal, and enough prevotes, and if the value is valid.
- if consensus emits a VoteOutput(Vote), then it doesn't change the value within the vote before sending it to the others.
consensus needs to ensure, e.g., upon receiving a ProposalAndCommitAndValidCInput((r, v)) it doesn't decide on v' and doesn't emit DecidedOutput((r, v'))

You have rightly observed that it is much easier to check that the consensus state machine fulfills its side of the contract. For the driver it is much harder due to the complexity.

However, I don't think the solution should be to add more checks to consensus. Rather, I would like to add more confidence about our driver design being correct. Interestingly, this is one of the motivations of #143. The first example transition invariant that is mentioned there is

the driver never issues a Polka event, if the votekeeper does not have 2f+1 corresponding prevotes stored

The idea is to specify the contract between consensus and the driver in several invariants. I think these invariants are super easy to verify. Perhaps we are even able to add assertions to the (Rust) code that encode these invariants. Not sure what is the best software engineering approach there.

So, I agree with your concern. This is why I would like to eventually start to address #143, or a version of that issue that is more focussed around the consensus/driver interface requirements.

0 replies

cason · 2024-10-04T07:40:53Z

cason
Oct 4, 2024
Collaborator Author

Thank for your comments.

In the examples you provided (ProposalAndPolkaAndValidCInput(Value) and ProposalAndCommitAndValidCInput(r, v) I just wonder why consensus should not be aware of the two "main" components of it. Namely, it could process the "valid value received" event, then process the Polka or Commit event. Notice that we already keep on the driver the information that we have one of them (votes or value) and we are missing the other, so that to produce the "composite" event.

I do not propose to have consensus processing individual events and the driver should indeed have an important role and be correct. But I think that, also for the sake of validating the implementation, consensus could store some "pending" events (e.g., a value that is valid but we don't have yet enough votes for it) then check if the "composite" event is indeed valid.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spec: revisit the "trust" relation between consensus logic and driver #600

{{title}}

Replies: 2 comments

{{title}}

{{title}}

Select a reply

spec: revisit the "trust" relation between consensus logic and driver #600

cason Sep 9, 2024 Collaborator

Replies: 2 comments

josef-widder Sep 19, 2024 Collaborator

cason Oct 4, 2024 Collaborator Author

cason
Sep 9, 2024
Collaborator

josef-widder
Sep 19, 2024
Collaborator

cason
Oct 4, 2024
Collaborator Author