
OCPBUGS-44786: add support for the LLC alignment cpumanager policy option #2136

Merged (3 commits) on Dec 11, 2024


ffromani

Add support for LLC alignment by pulling in upstream PR 126750 ahead of time.
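As a rough illustration of what "LLC alignment" means for CPU allocation, the Go sketch below tries to satisfy a request from the free CPUs of a single last-level cache, using best fit. It spreads the request across caches only when no single LLC is large enough. The `llcGroup` type and `allocateLLCAligned` function are hypothetical and simplified. This is not the algorithm from upstream PR 126750, which is integrated with the rest of the kubelet CPU manager's allocation logic.

```go
package main

import (
	"errors"
	"sort"
)

// llcGroup is a hypothetical view of the free CPUs sharing one
// last-level (L3/uncore) cache.
type llcGroup struct {
	ID   int
	Free []int // free CPU IDs in this LLC, assumed sorted ascending
}

// allocateLLCAligned returns n CPU IDs, preferring a single LLC.
//   - If one or more LLCs can hold the whole request, it picks the smallest
//     such LLC (best fit), leaving larger caches free for larger requests.
//   - Otherwise it takes CPUs from the largest LLCs first, which minimizes
//     the number of caches the request is spread across.
func allocateLLCAligned(groups []llcGroup, n int) ([]int, error) {
	total := 0
	for _, g := range groups {
		total += len(g.Free)
	}
	if n <= 0 || n > total {
		return nil, errors.New("request cannot be satisfied")
	}

	// Best fit: the smallest LLC that can hold the whole request.
	best := -1
	for i, g := range groups {
		if len(g.Free) >= n && (best == -1 || len(g.Free) < len(groups[best].Free)) {
			best = i
		}
	}
	if best >= 0 {
		return append([]int(nil), groups[best].Free[:n]...), nil
	}

	// Fallback: no single LLC fits; fill from the largest LLCs first.
	ordered := append([]llcGroup(nil), groups...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return len(ordered[i].Free) > len(ordered[j].Free)
	})
	out := make([]int, 0, n)
	for _, g := range ordered {
		for _, cpu := range g.Free {
			if len(out) == n {
				break
			}
			out = append(out, cpu)
		}
	}
	sort.Ints(out)
	return out, nil
}
```

Consider two LLCs with 8 and 4 free CPUs. A request for 4 CPUs lands entirely in the 4-CPU cache. A request for 10 CPUs takes all 8 CPUs of the larger cache plus 2 from the smaller one.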

@openshift-ci-robot openshift-ci-robot added backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Nov 20, 2024
@openshift-ci-robot

@ffromani: This pull request references Jira Issue OCPBUGS-44786, which is invalid:

  • expected the bug to target the "4.18.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

add support for the LLC alignment by pulling ahead of time u/s PR 126750.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Nov 20, 2024
@openshift-ci-robot

@ffromani: the contents of this pull request could not be automatically validated.

The following commits are valid:

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@ffromani
Author

/retest-required

@kannon92

From what I remember, we don't really have a lot of e2e tests on this feature in upstream. How are we going to verify that this works on previous releases?

@ffromani
Author

ffromani commented Nov 25, 2024

From what I remember, we don't really have a lot of e2e tests on this feature in upstream. How are we going to verify that this works on previous releases?

We will have e2e tests in the telco test suites, which will run on split-LLC hardware (high-end AMD CPUs).
If we want the e2e tests in openshift/origin, which I'd be happy to add, we will need to figure out how to get machines with high-end AMD CPUs into OpenShift CI, because AFAIK there are none.
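On "split-LLC hardware", the CPUs of one socket do not all share a single L3 cache. Many AMD EPYC parts, for example, have one L3 per core complex. On Linux this is visible in sysfs, where each CPU's cache entries expose a `shared_cpu_list` in the kernel's cpulist format. The Go sketch below parses that format and counts distinct L3 sharing groups.

It assumes the caller has already read the L3 entry for every CPU. That entry is typically `/sys/devices/system/cpu/cpuN/cache/index3/shared_cpu_list`, but confirm the index through the sibling `level` file. The function names are hypothetical. A count greater than the number of sockets suggests split LLC.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList parses the Linux "cpulist" format used by sysfs files such as
// .../cache/index3/shared_cpu_list, e.g. "0-7,16-23".
func parseCPUList(s string) ([]int, error) {
	var cpus []int
	s = strings.TrimSpace(s) // sysfs values end with a newline
	if s == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("bad cpulist %q: %w", s, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("bad cpulist %q: %w", s, err)
			}
		}
		for c := start; c <= end; c++ {
			cpus = append(cpus, c)
		}
	}
	return cpus, nil
}

// countLLCDomains counts distinct L3 sharing groups, given one
// shared_cpu_list string per CPU (hypothetical helper).
func countLLCDomains(sharedLists []string) (int, error) {
	seen := map[string]bool{}
	for _, s := range sharedLists {
		cpus, err := parseCPUList(s)
		if err != nil {
			return 0, err
		}
		seen[fmt.Sprint(cpus)] = true // normalized key, e.g. "[0 1 2 3]"
	}
	return len(seen), nil
}
```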


@ffromani
Author

/retest-required

@ffromani
Author

ffromani commented Dec 3, 2024

/jira refresh

@openshift-ci-robot

@ffromani: This pull request references Jira Issue OCPBUGS-44786, which is invalid:

  • expected the bug to target the "4.19.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh


@ffromani
Author

ffromani commented Dec 3, 2024

/jira refresh

@openshift-ci-robot

@ffromani: This pull request references Jira Issue OCPBUGS-44786, which is invalid:

  • expected the bug to target the "4.19.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh


@ffromani
Author

ffromani commented Dec 3, 2024

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 3, 2024
@openshift-ci-robot

@ffromani: This pull request references Jira Issue OCPBUGS-44786, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (schoudha@redhat.com), skipping review request.

In response to this:

/jira refresh


Comment on lines 38 to 42
// TestOnlySetEnabled allows changing the state of management partition enablement
// This method MUST NOT be used outside of test code
func TestOnlySetEnabled(enabled bool) {
	llcAlignmentEnabled = enabled
}
Member

what're the plans for using this?

Author

No real plans yet; this was an over-eager copy-paste from the managed code. I will remove it and re-add it later if needed.
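If the setter is re-added later, one common Go pattern is to have it return a closure that restores the previous value. A test can then write `defer testOnlySetEnabled(true)()` and cannot leak the toggle into other tests. This is sketched here as an assumption, not what the PR does. `testOnlySetEnabled` is a hypothetical name, and `llcAlignmentEnabled` is the package variable from the snippet under review.

```go
package main

// llcAlignmentEnabled mirrors the package-level toggle from the reviewed
// snippet; the surrounding package is not shown in the review.
var llcAlignmentEnabled bool

// testOnlySetEnabled (hypothetical) sets the toggle and returns a function
// that restores the previous value. MUST NOT be used outside test code.
func testOnlySetEnabled(enabled bool) (restore func()) {
	prev := llcAlignmentEnabled
	llcAlignmentEnabled = enabled
	return func() { llcAlignmentEnabled = prev }
}
```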


ffromani added a commit to ffromani/openshift-api that referenced this pull request Dec 4, 2024
add feature gate to enable selected users to consume
cpumanager policy options of alpha maturity.
Needs to be merged alongside openshift/kubernetes#2136
which enables per-option granularity

for more details: openshift/enhancements#1724

Signed-off-by: Francesco Romani <fromani@redhat.com>

@ffromani
Author

ffromani commented Dec 9, 2024

/retest

@ffromani
Author

ffromani commented Dec 9, 2024

@haircommander @mrunalp could you PTAL again? The PR now implements the approach I suggested and we agreed upon in openshift/enhancements#1724.

@ffromani
Author

ffromani commented Dec 9, 2024

/retest-required

2 similar comments
@ffromani
Author

/retest-required

@ffromani
Author

/retest-required

@ffromani
Author

/test okd-scos-e2e-aws-ovn

{  openshift cluster install failed with cluster bootstrap}

@haircommander
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 10, 2024
@rphillips

/test okd-scos-e2e-aws-ovn

// Must override the base feature gate check. Relevant only for alpha options (disabled by default).
// Beta options are enabled by default, and we definitely want to keep the possibility to
// disable them explicitly.
if alphaOptions.Has(option) && checkPolicyOptionHasEnablementFile(option) {
Member

wouldn't this put the cluster in TPNU (TechPreviewNoUpgrade)?

Author

@ffromani ffromani Dec 11, 2024

This is the internal check. alphaOptions is the internal set of known alpha-level cpumanager policy options, and this specific guard is meant to bypass the upstream alpha feature gate (FG) check for options that have an enablement file. It was implemented this way, rather than by extending the existing guard below, to make it as easy as possible to remove later once the carry is no longer needed. It is meant to be equivalent to:

// CheckPolicyOptionAvailable verifies if the given option can be used depending on the Feature Gate Settings.
// returns nil on success, or an error describing the failure on error.
func CheckPolicyOptionAvailable(option string) error {
	if !alphaOptions.Has(option) && !betaOptions.Has(option) && !stableOptions.Has(option) {
		return fmt.Errorf("unknown CPU Manager Policy option: %q", option)
	}

	if alphaOptions.Has(option) {
		if checkPolicyOptionHasEnablementFile(option) {
			return nil
		}
		if !utilfeature.DefaultFeatureGate.Enabled(kubefeatures.CPUManagerPolicyAlphaOptions) {
			return fmt.Errorf("CPU Manager Policy Alpha-level Options not enabled, but option %q provided", option)
		}
	}

	if betaOptions.Has(option) && !utilfeature.DefaultFeatureGate.Enabled(kubefeatures.CPUManagerPolicyBetaOptions) {
		return fmt.Errorf("CPU Manager Policy Beta-level Options not enabled, but option %q provided", option)
	}

	return nil
}

(Note that an alpha option is available if either the enablement file exists or the FG is enabled.)

@ffromani
Author

/test e2e-agnostic-ovn-cmd

{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:406]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-samples-operator\" produces more watch requests than expected: watchrequestcount=177, upperbound=118, ratio=1.5",
    ]
to be empty
Ginkgo exit error 1: exit with code 1}

@ffromani
Author

[sig-cli][Feature:LegacyCommandTests][Disruptive][Serial] test-cmd: test/cmd/images.sh [apigroup:image.openshift.io] (23s)
{  fail [github.com/openshift/origin/test/extended/cmd/cmd.go:111]: Expected
    <[]error | len:1, cap:1>: [
        <*errors.errorString | 0xc0012fb510>{
            s: "error waiting for the pod 'test-cmd' to complete:  imagestreamtag tag-c --from-image=quay.io/openshifttest/hello-openshift:openshift' expecting failure and text 'must be of the form <stream_name>:<tag>'\nRunning test/cmd/images.sh:108: executing 'oc create imagestreamtag tag-c:1 -A foo' expecting failure and text 'annotations must be of the form key=value, but is \"foo\"'...\nSUCCESS after 1.000s: test/cmd/images.sh:108: executing 'oc create imagestreamtag tag-c:1 -A foo' expecting failure and text 'annotations must be of the form key=value, but is \"foo\"'\nRunning test/cmd/images.sh:109: executing 'oc create imagestreamtag tag-c:2 --from=mysql --from-image=quay.io/openshifttest/hello-openshift:openshift' expecting failure and text '\\--from and --from-image may not be used together'...\nSUCCESS after 0.000s: test/cmd/images.sh:109: executing 'oc create imagestreamtag tag-c:2 --from=mysql --from-image=quay.io/openshifttest/hello-openshift:openshift' expecting failure and text '\\--from and --from-image may not be used together'\nRunning test/cmd/images.sh:111: executing 'oc get istag/tag:1 -o jsonpath={.image.dockerImageReference}' expecting success and text 'wildfly-centos7.*@sha256:'...\nSUCCESS after 0.000s: test/cmd/images.sh:111: executing 'oc get istag/tag:1 -o jsonpath={.image.dockerImageReference}' expecting success and text 'wildfly-centos7.*@sha256:'\nRunning test/cmd/images.sh:113: executing 'oc get istag/tag-b:1 -o jsonpath={.image.metadata.name}' expecting success and text 'sha256:a2812e358a6495ef37ead2e015b4f680c6d022a99375e095a73de319cd5ad53c'...\nFAILURE after 0.000s: test/cmd/images.sh:113: executing 'oc get istag/tag-b:1 -o jsonpath={.image.metadata.name}' expecting success and text 'sha256:a2812e358a6495ef37ead2e015b4f680c6d022a99375e095a73de319cd5ad53c': the command returned the wrong error code; the output content test failed\nThere was no output from the command.\nStandard error from the command:\nError from server (NotFound): 
imagestreamtags.image.openshift.io \"tag-b:1\" not found\n[ERROR] hack/lib/cmd.sh:30: `return \"${return_code}\"` exited with status 1.\n",
        },
    ]
to have length 0
Ginkgo exit error 1: exit with code 1}

@ffromani
Author

/test okd-scos-e2e-aws-ovn


openshift-ci bot commented Dec 11, 2024

@ffromani: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                     Commit   Details  Required  Rerun command
ci/prow/e2e-agnostic-ovn-cmd  6fded69  link     false     /test e2e-agnostic-ovn-cmd
ci/prow/okd-scos-e2e-aws-ovn  6fded69  link     false     /test okd-scos-e2e-aws-ovn


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@mrunalp mrunalp added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. labels Dec 11, 2024

openshift-ci bot commented Dec 11, 2024

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: ffromani, haircommander

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 3c62f73 into openshift:master Dec 11, 2024
19 of 21 checks passed
@openshift-ci-robot

@ffromani: Jira Issue OCPBUGS-44786: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-44786 has been moved to the MODIFIED state.

In response to this:

add support for the LLC alignment by pulling ahead of time u/s PR 126750.


@ffromani ffromani deleted the ocp-split-l3-cache branch December 11, 2024 15:39
@ffromani
Author

/cherry-pick release-4.18

@openshift-cherrypick-robot

@ffromani: new pull request created: #2162

In response to this:

/cherry-pick release-4.18


@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-hyperkube
This PR has been included in build openshift-enterprise-hyperkube-container-v4.19.0-202412111909.p0.g3c62f73.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: kube-proxy
This PR has been included in build kube-proxy-container-v4.19.0-202412111909.p0.g3c62f73.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-pod
This PR has been included in build openshift-enterprise-pod-container-v4.19.0-202412111909.p0.g3c62f73.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-kube-apiserver-artifacts
This PR has been included in build ose-installer-kube-apiserver-artifacts-container-v4.19.0-202412111909.p0.g3c62f73.assembly.stream.el9.
All builds following this will include this PR.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
10 participants