Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter manageJobsWithoutQueueNames using a namespaceSelector for all integrations #3589

Closed
3 tasks done
Tracked by #3599
dgrove-oss opened this issue Nov 18, 2024 · 14 comments
Closed
3 tasks done
Tracked by #3599
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@dgrove-oss
Copy link
Contributor

dgrove-oss commented Nov 18, 2024

What would you like to be added:

Add a manageJobsWithoutQueueNameNamespaceSelector to the Kueue configuration API and use it to restrict the jobs to which manageJobsWithoutQueueName is applied across all integrations.

We can then deprecate podOptions.namepaceSelector and eventually remove it from the configuration API.

Why is this needed:

As discussed in the review of #3520, because the Deployment and StatefulSet integrations are built on top of the Pod integration, they implicitly restrict the scope of manageJobsWithoutQueueName by filtering it with podOptions.namepaceSelector. This results in an irregular API that may be confusing to users.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@dgrove-oss dgrove-oss added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 18, 2024
@dgrove-oss
Copy link
Contributor Author

/assign

I will work on drafting a KEP.

@mimowo
Copy link
Contributor

mimowo commented Nov 19, 2024

cc @mwielgus @tenzen-y @mwysokin

@tenzen-y
Copy link
Member

tenzen-y commented Nov 19, 2024

IIRC, we did not introduce this mechanism in the previous discussions: #2119

Instead of introducing this knob, shouldn't we decouple the Stateful and Deployment integration from Pod integration?
Actually, the StatefulSet integration has dedicated controller, and I'm not sure the reason why we keep depending on the Pod integrations.

@dgrove-oss
Copy link
Contributor Author

dgrove-oss commented Nov 19, 2024

This was discussed before (#2119), but what changed since then is the addition of the Deployment and StatefulSet integrations.

Deployment and StatefulSet have the same fundamental issue with manageJobsWithoutQueueNames as Pod (independent of how we choose to implement the integrations). If manageJobsWithoutQueueNames is true and you don't restrict its scope somehow things will break because system namespaces that use Deployments or StatefulSets will unexpectedly be suspended, breaking the system's functionality.

So, the proposal is to put together a KEP to see what it would take to have a uniform API to modulate manageJobsWithoutQueueNames.

An alternative that we can discuss in the KEP is to instead deprecate and then remove manageJobsWithoutQueueNames entirely in favor of a ValidatingAdmissionPolicy. I've implemented such a policy as part of our MLBatch project (see admissionPolicy.yaml) and it was subtler than was suggested in the discussion of #2119. In particular, dealing with child Jobs requires teaching the VAP about ownership links and (at least for me) that was tricky and I'm not convinced it is less complex than the namespaceSelector option. Note that my VAP only deals with one level of ownership (because that's all I needed for our restricted use case). A general one would have to deal with crawling several links to get to the "top-level" Job that was being managed by Kueue.

@mimowo
Copy link
Contributor

mimowo commented Nov 19, 2024

Regarding the Deployments and StatefulSet via pod-based integration.

The fundamental difference between Deployments, StatefulSet and JobCRDs is that there is no suspend field ( and it is unlikely to be ever added, or it will take a year at very least to get into beta).

So, using scheduling Gates for suspending seems like a good fit, and this is already provided by the pod-based integration, so we are reusing it. I'm open to alternatives, but I think they need to be presented more holistically, and in detail, because they seem vague at the moment to me at least.

@dgrove-oss
Copy link
Contributor Author

dgrove-oss commented Nov 20, 2024

Initial draft of the KEP in #3595

@yaroslava-serdiuk
Copy link
Contributor

It seems local queue defaulting and manageJobsWithoutQueueNames deprecation solve should solve your issue, right? The admins will create default local queue and it will be used for jobs without queue name.

@dgrove-oss
Copy link
Contributor Author

dgrove-oss commented Nov 26, 2024

It seems local queue defaulting and manageJobsWithoutQueueNames deprecation solve should solve your issue, right? The admins will create default local queue and it will be used for jobs without queue name.

No. I'm revising the KEP to clarify the point. Whether it is manageJobsWithoutQueueNames or the local queue defaulting, there still needs to be a way to scope either mechanism to a subset of the namespaces. Without the namespace scoping for at least Pod, Deployment, StatefulSet and Job it is not practical to enable any default management at the level of the Kueue manager for jobs without queue name.

@mimowo
Copy link
Contributor

mimowo commented Dec 5, 2024

As mentioned in #3712 (comment) I would like to yet:

  • respect the new selector in the webhooks for Deployment and StatefulSet
  • add documentation for the new feature - possibly migrate the use of podOptions.namespaceSelector, but could be left for follow up

@dgrove-oss
Copy link
Contributor Author

dgrove-oss commented Dec 5, 2024

Documented the new feature in #3748.

I'll do the migration of the documentation for podOptions.namespaceSelector in a separate PR to make it easier to review.

@mimowo
Copy link
Contributor

mimowo commented Dec 6, 2024

Thanks, I think we still need to follow up with the implementation, to make the transition from podOptions.namespaceSelector possible:

  • for deployment and statefulset webhook respect mangedJobsNamespaceSelector - to avoid adding the PodGroup levels and annotations to pods which would later not be considered by Kueue
  • for pod_webhook respect also mangedJobsNamespaceSelector - effectively intersection of both pod and workload

With the changes above user can completely skip setting podOptions.namespaceSelector and we can deprecate it

@dgrove-oss
Copy link
Contributor Author

Agreed on the webhooks... I didn't quite get it done yesterday. Hoping to get there today.

@dgrove-oss
Copy link
Contributor Author

I believe #3828 is the last thing to be done. Once that merges we should be able to close this issue as completed.

@mimowo
Copy link
Contributor

mimowo commented Dec 13, 2024

/close
as #3828 and #3817 are merged. I'm ok to handle the podOptions deprecation as a separate issue.

@mimowo mimowo closed this as completed Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

4 participants