Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: Add skip round logic #75

Merged
merged 19 commits into from
Nov 24, 2023
Merged

spec: Add skip round logic #75

merged 19 commits into from
Nov 24, 2023

Conversation

josef-widder
Copy link
Member

@josef-widder josef-widder commented Nov 17, 2023

I guess I will need to wait until we merge #74 until I can finish completing it. (In the current version, the votekeeper does not produce the proper skip event)

  • We need to make sure that when we skip to a round to check what messages are already there. Perhaps we are missing events there. I started with writing a test that brings three processes to round 1 and I kept v4 in round 0. Then the jump should happen

@josef-widder josef-widder changed the title kip round logic skip round logic Nov 17, 2023
@romac
Copy link
Member

romac commented Nov 17, 2023

I just merged #74

@romac romac marked this pull request as ready for review November 22, 2023 10:12
@romac
Copy link
Member

romac commented Nov 22, 2023

We need to make sure that when we skip to a round to check what messages are already there

@josef-widder Do you want to address this in this PR, or do it in a follow-up PR?

@romac romac changed the title skip round logic spec: Add skip round logic Nov 22, 2023
Comment on lines 162 to 163
// this should work, once the bookkeeper skip event is fixed
assert(N4F0::system.get("v4").es.cs.round == 1),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romac could you please have a look here. This test fails, although as far as I understand if the bookkeeper would generate the skip event properly, it should go through. Here v4

  • lags behind in round 0
  • then receives 3 round 1 messages (two precommits and one prevote). The messages are stored in the bookkeeper
  • at some point a skip event should bring v4 to round 1 but it never happens

Copy link
Member

@romac romac Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that sounds right. But the test appears to be passing, or am I misunderstanding what the test is checking?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, locally it doesn't. Perhaps I have messed something up with branches locally. If the test passes, let's not block on this, and we can have a look (and add more tests later)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does pass locally for me either. Not sure what happened with the CI but let's trust our eyes for now. I'll look into it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be due to non-determinism? We might have gotten lucky on the CI this one time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure. This is the check <...> that you already added, right?

Yeah I added the check while I was writing the comment.

Copy link
Member

@romac romac Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the point is that even if the votekeeper generates events for round less than the current round, the executor has to figure out whether it still wants to do in some rounds (and sometimes does).

Can you please expand on that? I am not sure I am following what you are getting at.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I am following what you are getting at.

It was just an observation that in the interface between the bookkeeper and the executor, we most of time don't need to pass the current round. That is, only for generating the skip rule we actually need the current round, while for the rest of the bookkeeper logic, the round that is passed as part of the vote is sufficient. Perhaps it is not important, it was just for me to understand ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah gotcha! Yeah it's a shame this is the only case where we need the current round, maybe this hints at a better way to model the code and decouple things further but I am not seeing it right now. If the current state of things is fine with you then let's leave it at that for now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can leave it for now.

If I think longer about it, the current solution also becomes more natural: the consensus algorithm forces us to ignore the current round in several places. In the arXiv paper

  • for line 28 we may need prevotes from the previous round,
  • for line 49 we may need precommits from previous (in fact any) rounds.

So actually, perhaps it is natural that most of the time we don't care about the current round when doing bookkeeping. So perhaps it is just intuition that is failing me here: while we think about messages from the current round a lot for lucky path, etc., the mentioned lines require us to look at messages from all rounds.

So, I guess it is good. Again, thanks for fixing the issue!

@josef-widder
Copy link
Member Author

josef-widder commented Nov 22, 2023

We need to make sure that when we skip to a round to check what messages are already there

@josef-widder Do you want to address this in this PR, or do it in a follow-up PR?

I think there is nothing to be done if the skip works correctly. if we skip on the f+1st message, no messages can be around. If we skip on 2f+1 I think it is also fine, but I can have a look and do a follow-up PR if necessary.

But there is the failing test I just mentioned a minute ago.

@romac romac mentioned this pull request Nov 24, 2023
@romac romac merged commit ce7a130 into main Nov 24, 2023
2 checks passed
@romac romac deleted the josef/skipRoundLogic branch November 24, 2023 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants