Allow mutating queue name in StatefulSet Webhook. #3520

Open
wants to merge 1 commit into
base: main
from

Conversation

mbobrovskyi
Contributor

@mbobrovskyi mbobrovskyi commented Nov 13, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Allow mutating queue name in StatefulSet Webhook.

Which issue(s) this PR fixes:

Fixes #3279

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Allow mutating queue name in StatefulSet Webhook.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Nov 13, 2024
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 13, 2024

netlify bot commented Nov 13, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 1489600
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/67781642aa0cb90008fc7e28

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 13, 2024
@mimowo
Contributor

mimowo commented Nov 13, 2024

/hold
I want to understand the end-to-end flow from the user's perspective first.
In particular, how can a user start such a StatefulSet? Will adding the label make it start?

IIRC for Jobs we start such a Job (but please double-check and confirm).

I synced with @mbobrovskyi that this is to align the behavior for Deployment, but another option is to simply reject such Deployments if they are not supported anyway.

I think it deserves e2e test.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2024
@mbobrovskyi mbobrovskyi marked this pull request as draft November 13, 2024 09:53
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 13, 2024
@dgrove-oss
Contributor

I'd also like to understand how it interacts with the namespaceSelector on the pod integration. In the Pod webhook Default method, if the namespaceSelector doesn't match then we never get to the code that consults manageJobsWithoutQueueName.

We don't support namespaceSelectors to modify manageJobsWithoutQueueName for any other integration (discussed at length in #2119).

What is the intended semantics for a StatefulSet or Deployment that is deployed in a namespace that doesn't match the namespaceSelector for the Pod integration when manageJobsWithoutQueueName is true?
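
The ordering described above can be sketched as follows. This is a minimal illustration, not Kueue's actual webhook code: `podWebhookConfig`, `NamespaceMatches`, and `shouldManagePod` are hypothetical names, and the namespace selector is reduced to a plain predicate on the namespace name.

```go
package main

// podWebhookConfig is a hypothetical, simplified stand-in for the settings
// discussed above; it is not Kueue's real configuration type.
type podWebhookConfig struct {
	// ManageJobsWithoutQueueName mirrors the boolean of the same name.
	ManageJobsWithoutQueueName bool
	// NamespaceMatches stands in for evaluating the Pod integration's
	// namespaceSelector (assumption: reduced to a predicate on the name).
	NamespaceMatches func(namespace string) bool
}

// shouldManagePod reports whether this sketch would manage a pod.
// The namespace check comes first, so a non-matching namespace returns
// before ManageJobsWithoutQueueName is ever consulted.
func shouldManagePod(cfg podWebhookConfig, namespace, queueName string) bool {
	if !cfg.NamespaceMatches(namespace) {
		return false
	}
	return queueName != "" || cfg.ManageJobsWithoutQueueName
}
```

With a predicate that excludes `kube-system` and `kueue-system`, a pod in `kube-system` is left alone in this sketch even when `ManageJobsWithoutQueueName` is true, which is the case the question above is about.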

@mimowo
Contributor

mimowo commented Nov 14, 2024

In the Pod webhook Default method, if the namespaceSelector doesn't match then we never get to the code that consults manageJobsWithoutQueueName.

Yes, this is WAI. The intention was to have a mechanism to exclude pods (like static pods or DaemonSet pods) in kube-system and kueue-system. We made the mechanism more generic (to exclude arbitrary namespaces).

We don't support namespaceSelectors to modify manageJobsWithoutQueueName for any other integration (discussed at length in #2119).

Right, we don't do it for any of the other integrations. However, I think Deployments and StatefulSets need to be exceptions. First, Deployments are used in kube-system and kueue-system, so we had better not touch them. Second, the support is based on the PodGroup integration, so we inherit the namespaceSelector lookup from the pod integration.

What is the intended semantics for a StatefulSet or Deployment that is deployed in a namespace that doesn't match the namespaceSelector for the Pod integration when manageJobsWithoutQueueName is true?

IIUC this means basically "for Deployments and StatefulSets in the kube-system or kueue-system". I think we should not manage them - no workload should be created. Since Deployments and StatefulSets are based on PodGroup integration this should happen "for free".

Let me know if this matches your expectations and understanding.

cc @mwielgus

@dgrove-oss
Contributor

It honestly feels a bit like our implementation is leaking through to the API. In particular, treating StatefulSets one way and Jobs another wrt manageJobsWithoutQueueName.

I think it could be less surprising / easier to explain if the boolean manageJobsWithoutQueueName was replaced with a namespaceSelector across all integrations. I know this was discussed before, but maybe it is worth revisiting now that (a) we see what we need for Deployment and StatefulSet and (b) we are thinking about what a v1 API would look like and what perhaps should be improved between now and then.

@mimowo
Contributor

mimowo commented Nov 14, 2024

It honestly feels a bit like our implementation is leaking through to the API. In particular, treating StatefulSets one way and Jobs another wrt manageJobsWithoutQueueName.

Yeah, I see the point: it is not clear why StatefulSet or Deployment pods are controlled by podOptions.namespaceSelector, while for other Jobs the selector is not respected.

I think it could be less surprising / easier to explain if the boolean manageJobsWithoutQueueName was replaced with a namespaceSelector across all integrations.

You mean "replaced"? Or something like "restricted" - so that we only manage workloads matching the namespaceSelector?

I know this was discussed before, but maybe it is worth revisiting now that (a) we see what we need for Deployment and StatefulSet and (b) we are thinking about what a v1 API would look like and what perhaps should be improved between now and then.

I would be in favor of that. The original intention of podOptions.namespaceSelector was to exclude pods in "kube-system" and "kueue-system". Back then we didn't foresee the need to exclude Jobs or other supported CRDs from being managed. However, now that we support Deployments, it also makes sense to exclude "kube-system" and "kueue-system" for them. Luckily we get this for free by using the Pod integration, but as you say it means leaking implementation details.

Let me also cc @mwielgus and @tenzen-y for their opinions, but +1 from me to decouple namespaceSelector from podOptions.

The remaining question from me: do we support both places, or do we validate that only one is set? We could consider supporting both places for v1beta1 and deprecating the one in podOptions, but it would be good to have a KEP for that. Are you interested in driving this?

@dgrove-oss
Contributor

You mean "replaced"? Or something like "restricted" - so that we only manage workloads matching the namespaceSelector?

restricted is a better word :).

Yes, I'd propose that we do a uniform filtering by namespaceSelector for all integrations when manageJobsWithoutQueueName is true. I'll give people some time to comment, but if there is interest in exploring this I'd be happy to kick off a KEP and drive it.
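
As a rough sketch of what such uniform filtering could look like, under stated assumptions: the types `workload` and `policy` and the method `manages` are hypothetical and not part of Kueue. The sketch also assumes that an explicit queue name always opts a workload in, a point the discussion above leaves open and a KEP would have to settle.

```go
package main

// workload is a hypothetical, minimal description of an object seen by any
// integration's webhook (Job, Deployment, StatefulSet, ...).
type workload struct {
	Kind      string
	Namespace string
	QueueName string
}

// policy is a hypothetical holder for the two settings being discussed.
type policy struct {
	ManageJobsWithoutQueueName bool
	// ManagedNamespace stands in for a namespaceSelector shared by all
	// integrations (assumption: reduced to a predicate on the name).
	ManagedNamespace func(namespace string) bool
}

// manages applies the same rule to every Kind: Kind is deliberately not
// inspected, so no integration is treated differently from another.
func (p policy) manages(w workload) bool {
	if w.QueueName != "" {
		// Assumption: an explicit queue name always opts the workload in.
		return true
	}
	// Without a queue name, the boolean enables management and the
	// selector restricts it to matching namespaces.
	return p.ManageJobsWithoutQueueName && p.ManagedNamespace(w.Namespace)
}
```

Under this sketch a Job and a StatefulSet without a queue name get the same answer in the same namespace, which removes the asymmetry raised earlier in the thread.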

@mbobrovskyi mbobrovskyi changed the title Allow manageJobsWithoutQueueName on StatefulSet. Allow mutating queue name in StatefulSet Webhook. Nov 18, 2024
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch from f9aa14c to 752d4dc Compare November 18, 2024 05:08
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Nov 18, 2024
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch 3 times, most recently from 43a4d26 to 3d997ec Compare November 27, 2024 11:49
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch from 3d997ec to 534b8aa Compare November 27, 2024 11:56
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 27, 2024
@mbobrovskyi
Contributor Author

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Nov 27, 2024
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch from a88469d to c0cd292 Compare December 6, 2024 13:31
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 10, 2024
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch from c0cd292 to 6bd2e16 Compare December 30, 2024 10:11
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 30, 2024
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch 2 times, most recently from b643b05 to 73132a5 Compare December 30, 2024 15:34
@mbobrovskyi mbobrovskyi marked this pull request as ready for review December 30, 2024 15:34
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 30, 2024
@mbobrovskyi mbobrovskyi marked this pull request as draft December 30, 2024 15:51
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 30, 2024
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch 2 times, most recently from 151aa53 to f146909 Compare January 3, 2025 15:02
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 3, 2025
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch from f146909 to c457cb2 Compare January 3, 2025 16:28
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 3, 2025
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch from c457cb2 to 27b57c7 Compare January 3, 2025 16:44
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch from 27b57c7 to 1489600 Compare January 3, 2025 16:54
@mbobrovskyi mbobrovskyi marked this pull request as ready for review January 3, 2025 16:54
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 3, 2025
@k8s-ci-robot k8s-ci-robot requested a review from PBundyra January 3, 2025 16:54
@mbobrovskyi
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 6, 2025
@mbobrovskyi
Contributor Author

/cc @mimowo

@k8s-ci-robot k8s-ci-robot requested a review from mimowo January 7, 2025 07:47
Contributor

@mimowo mimowo left a comment

This PR seems to be changing much more than needed. Let me hold until the scope and purpose of the PR are clarified.
/hold

@@ -39,7 +39,7 @@ const (
func init() {
	utilruntime.Must(jobframework.RegisterIntegration(FrameworkName, jobframework.IntegrationCallbacks{
		SetupIndexes: SetupIndexes,
		NewReconciler: NewReconciler,
Contributor

Why is this changed? Please don't if not necessary.

Contributor Author

Sometimes a Pod is updated but the StatefulSet isn't aware of it, so the reconcile process does not work as expected (e.g., the finalizer isn't removed). That's why it is better to reconcile the Pod instead of the StatefulSet.

Contributor

I don't understand this case - can you elaborate? For example in the core k8s Job we also have finalizers on Job pods, but the reconciler is at the Job level, reference. I would like to first document such problematic scenarios in some form of tests and change the implementation in a dedicated PR (if needed).

ss.Spec.Template.Labels[constants.QueueLabel] = queueName
ss.Spec.Template.Labels[pod.GroupNameLabel] = GetWorkloadName(ss.Name)
groupName, err := GetWorkloadName(obj.(*appsv1.StatefulSet))
Contributor

This change does not seem related to changing the queue name. Please revert it if it is not necessary.
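
For context, the label propagation that the quoted lines perform can be sketched in isolation. This is an illustrative sketch only: `propagateQueueName` is a hypothetical helper, plain maps stand in for the StatefulSet's pod template labels, and the two label keys are assumptions about the values behind `constants.QueueLabel` and `pod.GroupNameLabel`.

```go
package main

const (
	// Assumed values of constants.QueueLabel and pod.GroupNameLabel.
	queueLabel     = "kueue.x-k8s.io/queue-name"
	groupNameLabel = "kueue.x-k8s.io/pod-group-name"
)

// propagateQueueName copies the queue name and a group name onto the pod
// template labels so that pods created from the template carry both.
// It returns the labels unchanged when no queue name is set.
func propagateQueueName(templateLabels map[string]string, queueName, groupName string) map[string]string {
	if queueName == "" {
		return templateLabels
	}
	if templateLabels == nil {
		// A template without labels has a nil map; writing to it would panic.
		templateLabels = map[string]string{}
	}
	templateLabels[queueLabel] = queueName
	templateLabels[groupNameLabel] = groupName
	return templateLabels
}
```

How the group name is derived (from the StatefulSet name or from the whole object, as in the two variants quoted above) is exactly the change under review, so the sketch takes it as a parameter rather than choosing one.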

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 7, 2025
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MVP support for arbitrary resizing a StatefulSet (investigate if feasible)
4 participants