KEP 3589: Uniformly filter manageJobsWithoutQueueNames by namespace #3595
Conversation
In other words, the VAP must implement the same functionality as `jobframework.IsOwnerManagedByKueue`
and cross-check controller references against managed kinds. This approach may be challenging to scale
as the number of Kueue-managed kinds increases and ancestor trees become deeper.
For example, in the not-so-distant future we might see: AppWrapper->PyTorchJob->JobSet->Job->Pod.
Wouldn't it be enough to do the same as is done in `jobframework.IsOwnerManagedByKueue`
and just check whether the parent is manageable by Kueue? Then the ancestor tree size wouldn't matter.
I will try to phrase this better. It isn't that the check has to actually crawl up the ancestor tree. It's that the check has to consider each of the kinds that could be Kueue-managed parents. So it needs a longer list of kinds and it has to somehow know that a particular Kind's Kueue integration is enabled. For kinds like Job or JobSet that could have multiple Kueue-managed parent Kinds, each has to be covered by the check.
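To make the concern above concrete, here is a minimal, self-contained Go sketch of this style of check. The types and the kind list are illustrative stand-ins, not Kueue's actual code: the real `jobframework.IsOwnerManagedByKueue` works against the Kubernetes API types and the configured integrations. The point the sketch shows is that the check never crawls the ancestor tree; it only inspects the immediate controller owner, but the set of managed kinds must enumerate every kind that could be a Kueue-managed parent, and must track which integrations are enabled.

```go
package main

import "fmt"

// OwnerRef is a simplified stand-in for metav1.OwnerReference.
type OwnerRef struct {
	APIVersion string
	Kind       string
	Controller bool
}

// managedKinds stands in for "the set of parent kinds whose Kueue
// integration is currently enabled". As more integrations are added,
// this set grows; kinds like Job can have several possible parents here.
var managedKinds = map[string]bool{
	"batch/v1/Job":                    true,
	"jobset.x-k8s.io/v1alpha2/JobSet": true,
	"kubeflow.org/v1/PyTorchJob":      true,
}

// isOwnerManagedByKueue reports whether the object's controlling owner
// is a kind Kueue manages. Only one level of ownership is examined.
func isOwnerManagedByKueue(owners []OwnerRef) bool {
	for _, o := range owners {
		if o.Controller && managedKinds[o.APIVersion+"/"+o.Kind] {
			return true
		}
	}
	return false
}

func main() {
	child := []OwnerRef{{APIVersion: "batch/v1", Kind: "Job", Controller: true}}
	fmt.Println(isOwnerManagedByKueue(child)) // true: parent kind is managed
	fmt.Println(isOwnerManagedByKueue(nil))   // false: no controller owner
}
```

A VAP expressing the same logic in CEL would need this kind list inlined into its expression, which is why the list's growth is the scaling concern rather than the ancestor-tree depth.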
As a practical matter, Pods running in system namespaces must be exempted from being
managed by Kueue even if `manageJobsWithoutQueueName` is true. Therefore Kueue's Pod
integration limits the scope of `manageJobsWithoutQueueName` by restricting its effect to
Pods in namespaces that match `integrations.podOptions.namespaceSelector`.
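For illustration, a Kueue `Configuration` fragment along these lines excludes the system namespaces from the Pod integration (the excluded namespace names here are an example, not a prescribed default):

```yaml
integrations:
  frameworks:
  - pod
  podOptions:
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values: [kube-system, kueue-system]
```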
I would like to use this proposal to deprecate `integrations.podOptions.namespaceSelector`,
as it seems redundant now. We could keep it on by default in the next release, disable it in the release after that, and drop it for v1beta2.
I think that we can just mark podNamespaceSelector as deprecated.
If we move forward with this KEP we can. Otherwise we still need it to express how the Pod, Deployment, and StatefulSet integrations are constrained.
However, while exploring this idea further, a complication arose. The ValidatingAdmissionPolicy
must be able to distinguish between top-level resources without queue names (which should be denied)
and child resources that are owned by Kueue-managed parents (which must be admitted even if they do not have a queue name).
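A hedged sketch of what such a policy might look like for `batch/v1` Jobs, assuming the standard `kueue.x-k8s.io/queue-name` label. The policy name and the hard-coded parent-kind list are hypothetical; in practice that list would have to be kept in sync with the enabled integrations, which is exactly the complication described above:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-queue-name   # hypothetical name
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: ["batch"]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["jobs"]
  validations:
  - expression: >-
      (has(object.metadata.labels) &&
       'kueue.x-k8s.io/queue-name' in object.metadata.labels) ||
      (has(object.metadata.ownerReferences) &&
       object.metadata.ownerReferences.exists(o,
         has(o.controller) && o.controller &&
         o.kind in ['JobSet', 'PyTorchJob', 'AppWrapper']))
    message: "Jobs without a queue name are denied unless owned by a Kueue-managed parent"
```

Each resource kind with multiple possible Kueue-managed parents needs its own enumerated list in the CEL expression.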
Yes, I can understand why the VAP becomes complicated when there are long chains of CRDs. I think the picture may be even more complex than in this example if we have multiple parent CRDs which can own a batch/v1 Job.
Also, introducing `ManagedJobsNamespaceSelector`
isn't really adding new complexity, assuming we deprecate and drop `podOptions.namespaceSelector` over time (see the other comment).
LGTM overall. I would just like to use this work to deprecate `podOptions.namespaceSelector`, so we don't need to maintain both APIs in the long run. Then the API is just a more generic replacement for what we currently have, and I don't consider it much of a complication, as we already do it for Pods.
Landing on here /cc
### Non-Goals
Consider any scope for filtering `manageJobsWithoutQueueName` that is not based on namespaces.
This is just a comment.
My primary concern for this KEP was this one.
I would not like to accept such a feature in any case, since it would re-implement Gatekeeper, VAP, and MAP.
Yes, I fully agree we don't want this mechanism to be more complex, and +1 to rejecting features not based on namespaces. But for namespaces we already have the complication in `podOptions.namespaceSelector`, and making it uniformly supported seems like a good idea now that Deployments and StatefulSets are supported.
I agree with the direction of using VAP and MAP, but this will probably take many releases, as MAP is still alpha in 1.32, and we need to propose details for VAP and MAP generation to avoid issues with hierarchical jobs (as described in the KEP).
...
// ManageJobsWithoutQueueName controls whether or not Kueue reconciles
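Putting the two fields side by side, the KEP's proposed `Configuration` usage might look roughly like this YAML sketch (the `managedJobsNamespaceSelector` field name is the one discussed in this review and could still change; the selector values are illustrative):

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
manageJobsWithoutQueueName: true
# Proposed in this KEP: uniformly scopes manageJobsWithoutQueueName
# across all integrations, not just the Pod integration.
managedJobsNamespaceSelector:
  matchExpressions:
  - key: kubernetes.io/metadata.name
    operator: NotIn
    values: [kube-system, kueue-system]
```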
This is just a comment and not an objection to this feature.
I was wondering if we could just deprecate and remove `manageJobsWithoutQueueName`
and instead introduce more informative validations so that batch users can easily find the quota violation.
I think the current `manageJobsWithoutQueueName` does not have a good user experience: there is no way to find out why jobs are not runnable when users accidentally submit Jobs without queue names,
because Kueue automatically suspends those Jobs.
However, on further consideration, `manageJobsWithoutQueueName`
has already been released. So even if we do not implement this, we must follow the appropriate deprecation steps.
Additionally, if the cluster administrator provides an automatic queue-name insertion mechanism, that could be helpful for batch users.
Furthermore, such an informative validation mechanism could be an alternative new feature, since validation and mutation cannot cover all of each other's use cases, and I think both are valuable.
As a practical matter, Pods running in system namespaces must be exempted from being
managed by Kueue even if `manageJobsWithoutQueueName` is true. Therefore Kueue's Pod
integration limits the scope of `manageJobsWithoutQueueName` by restricting its effect to
Pods in namespaces that match `integrations.podOptions.namespaceSelector`.
I think that we can just mark podNamespaceSelector as deprecated.
Regarding deprecation: I noticed that PR #3610 (comment) would also end up deprecating it. I don't have the context here, but it seems the consensus is that this field needs to go, and a few competing PRs seem to be doing that.
I think this is the key point. Either way, I can rework this KEP to focus on just the generalization of
We migrate away from using `podOptions.namespaceSelector` over the course of several releases.
Ideally we remove it from the `Configuration` struct as part of going to the `v1` API level.
Sgtm. I think we can say that in the transition period the actual namespace selector used for the Pod integration (and thus StatefulSet and Deployment) will be the intersection of `managedJobsNamespaceSelector` and `podOptions.namespaceSelector`.
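A minimal Go sketch of the transition-period semantics suggested above, under the assumption that "intersection" means a namespace is in scope only if it matches both selectors. The `selector` type is a deliberately simplified stand-in (matchLabels only) for `metav1.LabelSelector`; real Kueue code would use the apimachinery selector types:

```go
package main

import "fmt"

// selector is a simplified stand-in for metav1.LabelSelector,
// restricted to exact-match labels for illustration.
type selector map[string]string

// matches reports whether the namespace labels satisfy the selector.
func matches(s selector, nsLabels map[string]string) bool {
	for k, v := range s {
		if nsLabels[k] != v {
			return false
		}
	}
	return true
}

// effectiveMatch sketches the proposed transition-period rule: a
// namespace is in scope for the Pod integration (and thus Deployment
// and StatefulSet) only if it matches BOTH managedJobsNamespaceSelector
// and podOptions.namespaceSelector.
func effectiveMatch(managed, podOpts selector, nsLabels map[string]string) bool {
	return matches(managed, nsLabels) && matches(podOpts, nsLabels)
}

func main() {
	managed := selector{"team": "batch"}
	podOpts := selector{"env": "prod"}
	both := map[string]string{"team": "batch", "env": "prod"}
	one := map[string]string{"team": "batch"}
	fmt.Println(effectiveMatch(managed, podOpts, both)) // true: matches both selectors
	fmt.Println(effectiveMatch(managed, podOpts, one))  // false: fails podOptions selector
}
```

Once `podOptions.namespaceSelector` is dropped, `effectiveMatch` would reduce to checking `managedJobsNamespaceSelector` alone.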
LGTM overall; I left a remark asking to explain how the effective namespace selector is computed in the transition period, before we drop `podOptions.namespaceSelector`.
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com> Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
Now that there is actually a KEP for 2936, I've updated this KEP accordingly (in particular the drawbacks and alternatives sections). The TL;DR is that if we pursue 2936 and it implicitly filters by namespace (which seems likely, since it looks for a default local queue in the job's namespace), then we don't need to pursue this KEP. We may still want to pursue this KEP as a tactical move if we think it will take a significant number of releases to get 2936 to a usable state and really remove
I think this KEP is still useful as a generalization of
Maybe one day, with LQ defaulting and a generated VAP, we could drop
@dgrove-oss do you think, if we accept the KEP, you could provide the implementation for 0.10? This is planned for 3-4 Dec.
I have the same opinion as Michal. Regardless of #2936, this sounds useful.
I have time on Monday/Tuesday to work on an implementation. If it goes smoothly, that may be enough. I can give it a try.
/lgtm
LGTM label has been added. Git tree hash: c366d59358f16cfb761860ba3e9315c334627d97
Thank you!
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dgrove-oss, tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
What type of PR is this?
/kind feature
What this PR does / why we need it:
KEP to propose a design for #3589
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?