KEP 3589: Uniformly filter manageJobsWithoutQueueNames by namespace #3595

Merged: 9 commits merged into kubernetes-sigs:main on Nov 29, 2024

Conversation

@dgrove-oss (Contributor) opened this pull request:

What type of PR is this?

/kind feature

What this PR does / why we need it:

KEP to propose a design for #3589

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. labels Nov 20, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 20, 2024

netlify bot commented Nov 20, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Latest commit: 44d6ce2
Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/67476435b3cf2a000877fdf9

@dgrove-oss changed the title from "KEP 3985: Uniformly filter manageJobsWithoutQueueNames by namespace" to "KEP 3589: Uniformly filter manageJobsWithoutQueueNames by namespace" on Nov 20, 2024
In other words, the VAP must implement the same functionality as `jobframework.IsOwnerManagedByKueue`
and cross check controller-references against managed kinds. This approach may be challenging to scale
as the number of Kueue managed kinds increases and ancestor trees become deeper.
For example, in the not-so-distant future we might see: AppWrapper->PyTorchJob->JobSet->Job->Pod.
Contributor:
Wouldn't it be enough to do the same as is done in jobframework.IsOwnerManagedByKueue and just check if the parent is manageable by Kueue? So ancestor tree size wouldn't matter.

Contributor Author (@dgrove-oss):

I will try to phrase this better. It isn't that the check has to actually crawl up the ancestor tree. It's that the check has to consider each of the kinds that could be Kueue-managed parents. So it needs a longer list of kinds and it has to somehow know that a particular Kind's Kueue integration is enabled. For kinds like Job or JobSet that could have multiple Kueue-managed parent Kinds, each has to be covered by the check.
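
To make this concrete, here is a minimal sketch (not from the KEP; the policy name, kind list, and match rules are illustrative assumptions, and a ValidatingAdmissionPolicyBinding would also be needed) of the kind of CEL check being discussed: it denies a batch/v1 Job with no queue-name label unless one of its owners is an enumerated Kueue-managed kind. The point is that the list of parent kinds has to be hard-coded and kept in sync with the enabled integrations.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-queue-name   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["batch"]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["jobs"]
  validations:
  # Admit when the Job carries a queue name, or when it is owned by one of the
  # (assumed) Kueue-managed parent kinds; otherwise deny.
  - expression: >-
      (has(object.metadata.labels) && 'kueue.x-k8s.io/queue-name' in object.metadata.labels) ||
      (has(object.metadata.ownerReferences) && object.metadata.ownerReferences.exists(ref, ref.kind in ['JobSet', 'PyTorchJob', 'AppWrapper']))
    message: "Jobs without a queue name are only admitted when owned by a Kueue-managed parent kind"
```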

keps/3589-manage-jobs-selectively/README.md (thread resolved)
As a practical matter, Pods running in system namespaces must be exempted from being
managed by Kueue even if `manageJobsWithoutQueueName` is true. Therefore Kueue's Pod
integration limits the scope of `manageJobsWithoutQueueName` by restricting its effect to
Pods that are in namespaces that match against the `integrations.podOptions.namespaceSelector`.
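
For reference, the selector referred to above is configured roughly as follows (a sketch; the field paths follow the text above, while the namespace values are illustrative rather than the shipped defaults):

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
integrations:
  frameworks:
  - "pod"
  podOptions:
    # Only Pods in namespaces matching this selector are subject to
    # manageJobsWithoutQueueName; system namespaces are excluded.
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values: [kube-system, kueue-system]
```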
Contributor:
I would like to use this proposal to deprecate integrations.podOptions.namespaceSelector as it seems redundant now. We may still have it by default on in the next release, but then in the next disable, and for v1beta2 we could drop it.

Member:
I think that we can just mark podNamespaceSelector as deprecated.

Contributor:
If we move forward with this KEP we can. Otherwise we still need it to constrain the Pod, Deployment, and StatefulSet integrations.

Comment on lines +183 to +201
However, while exploring this idea further a complication arose. The ValidatingAdmissionPolicy
must be able to distinguish between top-level resources without queue names (which should be denied)
and child resources that are owned by Kueue-managed parents (which must be admitted even if they do not have a queue name).
Contributor:
Yes, I can understand why VAP becomes complicated when there are long chains of CRDs. I think this picture may be even more complex than in this example if we have multiple parent CRDs which can own batch/v1 Job.

Also, introducing ManagedJobsNamespaceSelector isn't really adding new complexity, assuming we deprecate and drop podOptions.namespaceSelector over time (see my other comment).

@mimowo (Contributor) commented Nov 25, 2024:
LGTM overall, I would just like to use this work to deprecate the podOptions.namespaceSelector, so we don't need to maintain both APIs in the long run. Then, I think the API is just a more generic replacement for what we currently have, but I don't consider it much of a complication as we already do it for pods.

@tenzen-y (Member):
Just landing here.

/cc

keps/3589-manage-jobs-selectively/README.md (two outdated threads, resolved)

### Non-Goals

Consider any scope for filtering `manageJobsWithoutQueueName` that is not based on namespaces.
Member:
This is just a comment.

My primary concern for this KEP was this one. I would not want to accept filtering that is not based on namespaces in any case, since that would re-implement what Gatekeeper, VAP, and MAP already provide.

Contributor:
Yes, I fully agree we don't want this mechanism to become more complex, and +1 to rejecting filtering that is not based on namespaces. But for namespaces we already have this complication in podOptions.namespaceSelector, and making it uniformly supported seems like a good idea now that Deployments and StatefulSets are supported.

I agree with the direction of using VAP and MAP, but this will probably take many releases, as MAP is still alpha in 1.32 and we still need to work out the details of VAP and MAP generation to avoid issues with hierarchical jobs (as described in the KEP).

Member:
Thanks for expanding on that. If anyone wants details on introducing VAP and MAP, please refer to #3636 and #3637.


...

// ManageJobsWithoutQueueName controls whether or not Kueue reconciles
Member:
This is just a comment and not an objection to this feature.

I was wondering if we could just deprecate and remove manageJobsWithoutQueueName and introduce more informative validation, so that batch users can easily discover quota violations.

I think the current manageJobsWithoutQueueName does not provide a good user experience, since there is no way for users to find out why their Jobs are not runnable when they accidentally submit them without queue names.

That is because Kueue automatically suspends the Jobs.

However, manageJobsWithoutQueueName has already been released, so even if we do not implement this, we must follow the appropriate deprecation steps.

Additionally, an automatic queue-name insertion mechanism provided by the cluster administrator could be helpful for batch users.
Furthermore, such an informative validation mechanism could still be a separate new feature, since validation and mutation cannot each cover all of the other's use cases, and I think both are valuable.


@kannon92 (Contributor) commented Nov 25, 2024:

For deprecating manageJobsWithoutQueueName, it seems #2936 is related to this discussion.

I noticed that PR #3610 (comment) would also end up deprecating manageJobsWithoutQueueName.

I don't have the full context here, but it seems the consensus is that this field needs to go, and a few competing PRs seem to be working toward that.

@dgrove-oss (Contributor, author) commented Nov 25, 2024:

> LGTM overall, I would just like to use this work to deprecate the podOptions.namespaceSelector, so we don't need to maintain both APIs in the long run. Then, I think the API is just a more generic replacement for what we currently have, but I don't consider it much of a complication as we already do it for pods.

I think this is the key point. For either manageJobsWithoutQueueName or the default LocalQueue name being pursued in #3610 to be usable, there has to be a way to control which namespaces they apply to for at least the Pod, StatefulSet, and Deployment integrations. On OpenShift, which tends to use Job for system operations, the namespace control needs to cover the Job integration as well.

I can rework this KEP to focus on just the generalization of integrations.podOptions.namespaceSelector to managedJobsNamespaceSelector. I think we need it independently of what happens to manageJobsWithoutQueueName.

Comment on lines 83 to 84
We migrate away from using `podOptions.namespaceSelector` over the course of several releases.
Ideally we remove it from the `Configuration` struct as part of going to the `v1` API level.
Contributor:
SGTM. I think we can say that during the transition period the effective namespace selector used for the Pod integration (and thus StatefulSets and Deployments) will be the intersection of managedJobsNamespaceSelector and podOptions.namespaceSelector.
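
A sketch of what such a transition-period configuration might look like (the managedJobsNamespaceSelector field name follows the KEP proposal; the namespace values and framework names are illustrative), where the Pod integration only acts in namespaces matched by both selectors:

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
manageJobsWithoutQueueName: true
# Proposed generalization: applies to all enabled integrations.
managedJobsNamespaceSelector:
  matchExpressions:
  - key: kubernetes.io/metadata.name
    operator: NotIn
    values: [kube-system, kueue-system]
integrations:
  frameworks: ["batch/job", "pod", "deployment", "statefulset"]
  podOptions:
    # Deprecated selector, still honored during the transition; for Pods
    # (and Deployments/StatefulSets) the effective scope is the intersection
    # of this selector and managedJobsNamespaceSelector.
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values: [kube-system]
```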

@mimowo (Contributor) left a review:

LGTM overall; I left a remark asking to explain how the effective namespace selector works in the transition period, before we drop podOptions.namespaceSelector.

dgrove-oss and others added 2 commits November 27, 2024 12:17
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
@dgrove-oss (Contributor, author) commented Nov 27, 2024:

Now that there is actually a KEP for #2936, I've updated this KEP accordingly (in particular the Drawbacks and Alternatives sections).

The TL;DR is that if we pursue #2936, and it implicitly filters by namespace (which seems likely, since it looks for a default LocalQueue in the job's namespace), then we don't need to pursue this KEP.

We may still want to pursue this KEP as a tactical move if we think it will take a significant number of releases to get #2936 to a usable state and really remove manageJobsWithoutQueueName. I think the implementation effort for this KEP is fairly low (and, as discussed in Alternatives, it could also be done without an API change if we are OK with even odder semantics for podOptions.namespaceSelector until we remove it entirely).

/cc @mimowo @tenzen-y @kannon92 @yaroslava-serdiuk

@mimowo (Contributor) commented Nov 28, 2024:

I think this KEP is still useful as a generalization of podOptions.namespaceSelector, which is needed as long as we support manageJobsWithoutQueueName.

Maybe one day with LQ defaulting and generated VAP we could drop manageJobsWithoutQueueName: true, but this seems a long way off. So, I think this PR makes sense as a tactical move.

@dgrove-oss do you think that, if we accept the KEP, you could provide the implementation for 0.10? It is planned for Dec 3-4.

@tenzen-y (Member):
> I think this KEP is still useful as a generalization of podOptions.namespaceSelector, which is needed as long as we support manageJobsWithoutQueueName.
>
> Maybe one day with LQ defaulting and generated VAP we could drop manageJobsWithoutQueueName: true, but this seems a long way off. So, I think this PR makes sense as a tactical move.

I have the same opinion as Michal. Regardless of #2936, this sounds useful.
In addition to the use cases Michal mentioned, I can imagine situations where the cluster admin wants to use a dedicated LQ name separate from "default" and wants to make it obvious to batch users that all Jobs should be enqueued to a specific LocalQueue.

@dgrove-oss (Contributor, author):
I have time on Monday/Tuesday to work on an implementation. If it goes smoothly, that may be enough. I can give it a try.

@mimowo (Contributor) commented Nov 29, 2024:

/lgtm
Leaving approval tag to @tenzen-y

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 29, 2024
@k8s-ci-robot (Contributor):

LGTM label has been added.

Git tree hash: c366d59358f16cfb761860ba3e9315c334627d97

@tenzen-y (Member) left a review:

Thank you!
/approve

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrove-oss, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 29, 2024
@k8s-ci-robot k8s-ci-robot merged commit cac26f0 into kubernetes-sigs:main Nov 29, 2024
7 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.10 milestone Nov 29, 2024
@dgrove-oss dgrove-oss deleted the kep-3589 branch November 29, 2024 14:32
Labels:
- `approved`: Indicates a PR has been approved by an approver from all required OWNERS files.
- `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
- `kind/feature`: Categorizes issue or PR as related to a new feature.
- `lgtm`: "Looks good to me", indicates that a PR is ready to be merged.
- `release-note-none`: Denotes a PR that doesn't merit a release note.
- `size/L`: Denotes a PR that changes 100-499 lines, ignoring generated files.
6 participants