Collation fairness in backing subsystem #7114
Draft · tdimitrov wants to merge 27 commits into master from tsv-backing-fairness
+3,117 −1,295
#4880 implements collation fairness in the collator protocol subsystem, but that approach has some drawbacks: the collator protocol has no single view of the claim queue across group rotations, and it doesn't see candidates seconded by other backing groups, so its local claim queue state can go stale.

To overcome these problems a solution was proposed in #5079 (comment). In a nutshell, the idea is to move collation tracking into the backing subsystem, since it has a better view of which candidates are being fetched and seconded. How does this solve the problems outlined above? One of the backing subsystem's goals is to keep track of all candidates that are getting backed, whether or not they originate from the validator's own backing group (if it has one). This naturally solves both problems: first, we can track candidates across group rotations because we keep a single view of the claim queue; second, we receive statements from all backing groups, so we can keep the local claim queue state relatively up to date.
Implementation notes
The PR touches the collator protocol (validator side), the backing subsystem and `ClaimQueueState` (introduced with #4880).

ClaimQueueState

`ClaimQueueState` was created on the fly in #4880: each time we wanted to know the state of the claim queue, we built a fresh `ClaimQueueState` instance. In this PR we want to keep a single view of the claim queue for each leaf and each core and continuously update it by adding new leaves and candidates and dropping old ones. For this reason `ClaimQueueState` is extended to keep track of the candidate hash occupying each slot. This is needed because we want to be able to drop claims for failed fetches, invalid candidates, etc. So `claimed` is no longer a `bool` but an enum (see the sketch after the next paragraph).

Note that the ordering here is not perfect. Let's say there are candidates A1->A2->A3 for a parachain, built on top of each other. If we see them in the order A2, A3, A1, we will insert them into the claim queue in that same (wrong) order. This is fine because (a) we need the candidate hash only to be able to release claims and (b) the exact order of candidates is known by the prospective parachains subsystem.
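The enum itself lives in the diff rather than this description; the following is only a minimal sketch of what such a per-slot state could look like, with all names assumed for illustration, based on the pending/seconded/release behaviour described in this PR:

```rust
// Hypothetical sketch of the per-slot claim state; the variant and
// type names are assumptions, not the PR's actual definitions.
type CandidateHash = [u8; 32];

enum ClaimState {
    // The slot is free and can be claimed.
    Free,
    // A candidate is being fetched for this slot; keeping the hash
    // lets us release the claim if the fetch fails or the candidate
    // turns out to be invalid.
    Pending(CandidateHash),
    // The candidate occupying this slot has been seconded.
    Seconded(CandidateHash),
}
```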
The second change in this module is related to leaves. The claim queue has a different state on each fork and we need to be able to track this. For this reason `PerLeafClaimQueueState` is introduced, which wraps a `HashMap` of `ClaimQueueState` per leaf (see the sketch below). The non-trivial logic here is adding a new leaf (`add_leaf`), which needs to handle three different cases. ❗ All these cases are covered with unit tests, but I'd appreciate extra attention from the reviewers here. ❗
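To make the structure concrete, here is a rough sketch under assumed names; the actual fields, and the exact three `add_leaf` cases, are in the PR itself:

```rust
// Hypothetical shape of the fork-aware wrapper; names are assumptions.
use std::collections::HashMap;

type Hash = [u8; 32];

// Stand-in for the claim queue view of a single leaf (see the
// ClaimState sketch above for the per-slot state).
struct ClaimQueueState;

struct PerLeafClaimQueueState {
    // One claim queue view per active leaf, so each fork keeps
    // its own state.
    per_leaf: HashMap<Hash, ClaimQueueState>,
}

impl PerLeafClaimQueueState {
    // The non-trivial operation: a new leaf may extend a leaf we
    // already track, fork away from one, or be unrelated to all of
    // them (a guess at the three cases mentioned above).
    fn add_leaf(&mut self, leaf: Hash, _parent: Hash) {
        // Placeholder: the real logic would reuse/advance the
        // parent's state where possible instead of starting fresh.
        self.per_leaf.entry(leaf).or_insert(ClaimQueueState);
    }
}
```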
backing subsystem
The backing subsystem holds an instance of `PerLeafClaimQueueState` for each core and updates it by keeping track of new candidates. There are three different sets of candidates, and each set is handled differently.
To achieve all this, three new messages are introduced:
- `CanClaim` - returns `true` if a claim can be made for a specific para id. This message is needed to handle collator protocol v1 advertisements, which don't contain a candidate hash. More on this in the next section.
- `PendingSlots` - returns all pending claims from the claim queue.
- `DropClaims` - drops claims for one or more candidates.

All these messages are used by the collator protocol and their application is explained there.
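Since the rest of the description leans on these messages, here is a self-contained sketch of what they might look like; the names, fields and channel types are assumptions for illustration, not the PR's actual definitions:

```rust
use std::sync::mpsc::Sender;

type Hash = [u8; 32];
type CandidateHash = Hash;
type ParaId = u32;

// Hypothetical shapes of the three new backing messages.
enum ClaimQueueMessage {
    // Can a claim currently be made for `para_id`? Used for v1
    // advertisements, which carry no candidate hash.
    CanClaim {
        relay_parent: Hash,
        para_id: ParaId,
        response: Sender<bool>,
    },
    // All claims still in pending state (i.e. pending fetches),
    // ordered by priority.
    PendingSlots {
        relay_parent: Hash,
        response: Sender<Vec<(ParaId, CandidateHash)>>,
    },
    // Release the claims held by these candidates, e.g. after a
    // failed fetch or an invalid candidate.
    DropClaims(Vec<CandidateHash>),
}
```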
In a nutshell, the following changes are made in backing: keep an instance of `PerLeafClaimQueueState` per core and keep it up to date. This includes adding new leaves, removing stale leaves, making claims for candidates and releasing claims for bad candidates. These should be easy to spot in the PR.

collator protocol
The most controversial changes are in this subsystem. Since the backing subsystem keeps track of the claim queue state, the collator protocol no longer builds `ClaimQueueState` on the fly. The price to pay is additional messages exchanged with backing.

Generally speaking, the collator protocol follows these steps (a rough sketch of the flow follows the list):
- If the advertisement is over protocol version 2+, a `CanSecond` message is sent to the backing subsystem. The latter gets the leaves where the candidate can be seconded and makes a claim for each leaf. If a claim can't be made, the leaf is dropped.
- If it is over protocol version 1, we can't claim a slot in the claim queue since we don't know the candidate hash. I opted not to support generic claims since they complicate the implementation, and hopefully we will deprecate v1 at some point. So what we do is send `CanClaim` to the backing subsystem and get a confirmation that, at this point, there is a free spot for the candidate in the claim queue. There is a potential race here because the spot is not actually claimed and can be occupied by another candidate while we do the fetch. I think this risk is acceptable: worst case we might have a few extra candidates.
- If the fetch is successful, the candidate is sent with `Seconded` to backing and the claim is marked as `Seconded`. If the fetch is unsuccessful, we use `DropClaims` to notify backing that the claim should be released.
- `PendingSlots` returns all claims in pending state (these are all pending fetches). We check whether we have a pending candidate for the para id and fetch it. The claims in the result are ordered by priority, so candidates are fetched in the order given by `PendingSlots`.
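For illustration only, here is a compressed sketch of the advertisement flow described above; every name and signature is an assumption, not the PR's API:

```rust
// Illustrative pseudo-Rust for the flow above.

type Hash = [u8; 32];
type CandidateHash = Hash;
type ParaId = u32;

// Stand-ins for messages exchanged with the backing subsystem.
fn can_second(_c: CandidateHash) -> bool { true }      // v2+: claims slots per leaf
fn can_claim(_p: ParaId) -> bool { true }              // v1: free-spot check only
fn drop_claims(_c: Vec<CandidateHash>) {}              // release claims
fn send_seconded(_c: CandidateHash) {}                 // hand candidate to backing
fn fetch(_p: ParaId) -> Option<CandidateHash> { None } // collation fetch

fn handle_v2_advertisement(candidate: CandidateHash, para_id: ParaId) {
    // Backing makes a claim on every leaf where the candidate can be
    // seconded; leaves without a free slot are dropped.
    if can_second(candidate) {
        match fetch(para_id) {
            // Successful fetch: the claim moves to Seconded.
            Some(c) => send_seconded(c),
            // Failed fetch: release the claim.
            None => drop_claims(vec![candidate]),
        }
    }
}

fn handle_v1_advertisement(para_id: ParaId) {
    // No candidate hash, so no claim is made up front; we only check
    // that a spot is currently free. This is the racy path mentioned
    // above: the spot may be taken while the fetch is in flight.
    if can_claim(para_id) {
        let _ = fetch(para_id);
    }
}
```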
The collator protocol also keeps track of candidates which are blocked by backing (`CanSecond` has returned a `BlockedByBacking` error for them). These candidates hold pending claims, and if their relay parent goes out of view they are dropped.

❗ Note for the reviewer: This might be abused by making a lot of unbackable advertisements which fill up the claim queue. The alternative is to release the claims for these candidates and make a new claim when they are unblocked. This would introduce additional complexity but might be worth doing. I'd love to get some feedback on this. ❗
TODOs
Testing for this PR is still work in progress, but I consider the functionality done.