OCPBUGS-36753,OCPBUGS-36754,OCPBUGS-36755OCPBUGS-36756: [release-4.14] Critical Bugs #961
The generic plugin was applying config changes only if the desired spec of interfaces was different from the last applied spec. This logic is different from the one in OnNodeStateChange where the real status of the interfaces is used to detect changes. By removing the LastState parameter (and related code), the generic plugin will also use the real status of interfaces to decide whether to apply changes or not. The SyncNodeState function has this logic.
Users can modify the settings of VFs that have been configured by the sriov operator. This PR starts the reconciliation loop when such changes are detected in the generic plugin.
Signed-off-by: Marcelo Guerrero <marguerr@redhat.com>
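The idea behind the two changes above can be sketched as follows. This is a minimal illustration, not the operator's code: the `Interface` struct and the `needsApply` helper are hypothetical, and the real node state types carry many more fields. The point is only that the decision compares the desired spec with the observed status rather than with a cached copy of the last applied spec.

```go
package main

import "fmt"

// Interface is a hypothetical, flattened view of one PF.
// It is an assumption made for this sketch only.
type Interface struct {
	PciAddress string
	NumVfs     int
	Mtu        int
}

// needsApply reports whether the observed status of any desired interface
// differs from its desired spec. It never consults a previously applied spec.
func needsApply(desired, observed []Interface) bool {
	current := make(map[string]Interface, len(observed))
	for _, o := range observed {
		current[o.PciAddress] = o
	}
	for _, d := range desired {
		o, found := current[d.PciAddress]
		if !found || o != d {
			return true
		}
	}
	return false
}

func main() {
	spec := []Interface{{PciAddress: "0000:3b:00.0", NumVfs: 4, Mtu: 1500}}
	// The spec is unchanged since the last apply, but the VFs were removed by hand.
	status := []Interface{{PciAddress: "0000:3b:00.0", NumVfs: 0, Mtu: 1500}}
	fmt.Println(needsApply(spec, status)) // prints "true"
	fmt.Println(needsApply(spec, spec))   // prints "false"
}
```

A comparison against the last applied spec would report no change in the first case, which is the drift that goes unnoticed without this PR.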
Logic to check missing kernel arguments is placed in a method to be used by both OnNodeStateChange and CheckStatusChanges.
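A shared helper of this kind might look like the sketch below. The function name `missingKernelArgs`, its signature, and the sample arguments are assumptions for illustration; the actual method in the generic plugin may read and compare the arguments differently.

```go
package main

import (
	"fmt"
	"strings"
)

// missingKernelArgs returns the entries of required that do not appear as
// whole fields in cmdline (for example, the contents of /proc/cmdline).
// Hypothetical helper, not the operator's method.
func missingKernelArgs(cmdline string, required []string) []string {
	present := make(map[string]struct{})
	for _, field := range strings.Fields(cmdline) {
		present[field] = struct{}{}
	}
	var missing []string
	for _, arg := range required {
		if _, ok := present[arg]; !ok {
			missing = append(missing, arg)
		}
	}
	return missing
}

func main() {
	cmdline := "BOOT_IMAGE=/vmlinuz root=/dev/sda1 intel_iommu=on"
	required := []string{"intel_iommu=on", "iommu=pt"} // illustrative values
	fmt.Println(missingKernelArgs(cmdline, required))  // prints "[iommu=pt]"
}
```

Keeping the check in one function lets both callers agree on what counts as missing.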
Webhook resources (`ValidatingWebhookConfiguration` and `MutatingWebhookConfiguration`) in OpenShift are configured with `service.beta.openshift.io/inject-cabundle`, so that a third component fills the `ClientConfig.CABundle` field of the webhook. When reconciling webhooks, do not override the field. This avoids flakiness, as there might be a time window in which the API server is not configured with a valid client certificate:

```
Error from server (InternalError): error when creating "policies": Internal error occurred: failed calling webhook "operator-webhook.sriovnetwork.openshift.io": failed to call webhook: Post "https://operator-webhook-service.openshift-sriov-network-operator.svc:443/mutating-custom-resource?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority
```

The same behavior also happens when using CertManager.

Refs:
- https://docs.openshift.com/container-platform/4.15/security/certificates/service-serving-certificate.html
- https://issues.redhat.com/browse/OCPBUGS-32139
- https://cert-manager.io/docs/concepts/ca-injector/

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
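One way to keep an injected bundle during reconciliation is to copy it from the live object into the desired object before updating. The sketch below uses hypothetical stand-in structs (`Webhook`, `WebhookClientConfig`) that only imitate the `ClientConfig.CABundle` field named above, and the `preserveCABundle` helper is an assumption, not the function used in this PR.

```go
package main

import "fmt"

// WebhookClientConfig and Webhook are simplified stand-ins used only in
// this sketch; they are not the Kubernetes API types.
type WebhookClientConfig struct {
	CABundle []byte
}

type Webhook struct {
	Name         string
	ClientConfig WebhookClientConfig
}

// preserveCABundle copies any non-empty CABundle found on the existing
// webhooks into the desired webhooks with the same name, so that a later
// update does not wipe a bundle injected by another component.
func preserveCABundle(desired, existing []Webhook) {
	bundles := make(map[string][]byte, len(existing))
	for _, w := range existing {
		bundles[w.Name] = w.ClientConfig.CABundle
	}
	for i := range desired {
		if ca, ok := bundles[desired[i].Name]; ok && len(ca) > 0 {
			desired[i].ClientConfig.CABundle = ca
		}
	}
}

func main() {
	name := "operator-webhook.sriovnetwork.openshift.io"
	existing := []Webhook{{Name: name, ClientConfig: WebhookClientConfig{CABundle: []byte("injected-ca")}}}
	desired := []Webhook{{Name: name}} // rendered from a manifest, bundle left empty
	preserveCABundle(desired, existing)
	fmt.Println(string(desired[0].ClientConfig.CABundle)) // prints "injected-ca"
}
```

With this approach the reconciler never writes an empty bundle over a populated one, which removes the window in which the API server cannot verify the webhook certificate.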
We need to be consistent with the policy order.
Signed-off-by: Sebastian Sch <sebassch@gmail.com>
When the MTU set in the SRIOV Network Node Policy is lower than the actual MTU of the PF, it triggers the reconcile loop for the Node state indefinitely, preventing the configuration from completing.
Signed-off-by: amaslennikov <amaslennikov@nvidia.com>
If a Virtual Function is configured with a DPDK driver (e.g. `vfio-pci`) and is not referenced by any SriovNetworkNodePolicy, the `NeedToUpdateSriov` function must not trigger a reconfiguration. This may happen if a PF is configured by multiple policies (via PF partitioning) and the user deletes one of them. In these cases, the VF is not reconfigured [1] and a drain loop is started.

The same logic applies to VDPA devices.

refs:
[1] https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/5f3c4e903f789aa177fe54686efd6c18576b7ab1/pkg/host/internal/sriov/sriov.go#L457

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
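The rule described above can be illustrated with a small sketch. Everything here is a simplification under stated assumptions: `VF`, `VfGroup`, `dpdkDrivers`, and `vfNeedsUpdate` are hypothetical names, and the real `NeedToUpdateSriov` compares more than the driver. The sketch only shows that a VF covered by no policy group never forces a reconfiguration, whatever driver it is bound to.

```go
package main

import "fmt"

// VF and VfGroup are hypothetical, reduced types for this sketch.
type VF struct {
	ID     int
	Driver string
}

type VfGroup struct {
	First, Last int    // inclusive VF index range taken from a policy
	DeviceType  string // "netdevice" or a DPDK driver name such as "vfio-pci"
}

// dpdkDrivers is an illustrative set of userspace drivers.
var dpdkDrivers = map[string]bool{"vfio-pci": true, "igb_uio": true, "uio_pci_generic": true}

// vfNeedsUpdate reports whether a VF must be reconfigured. A VF that falls
// in no group is left alone, even if it is still bound to a DPDK driver.
func vfNeedsUpdate(vf VF, groups []VfGroup) bool {
	for _, g := range groups {
		if vf.ID < g.First || vf.ID > g.Last {
			continue
		}
		if dpdkDrivers[g.DeviceType] {
			return vf.Driver != g.DeviceType
		}
		// The group wants a kernel netdevice: update only if a DPDK driver is bound.
		return dpdkDrivers[vf.Driver]
	}
	return false
}

func main() {
	groups := []VfGroup{{First: 0, Last: 1, DeviceType: "netdevice"}}
	// VF 2 belonged to a policy that was deleted and is still bound to vfio-pci.
	fmt.Println(vfNeedsUpdate(VF{ID: 2, Driver: "vfio-pci"}, groups)) // prints "false"
	fmt.Println(vfNeedsUpdate(VF{ID: 0, Driver: "vfio-pci"}, groups)) // prints "true"
}
```

Returning `true` for the orphaned VF would request a change that the configuration step then skips, which is the loop the commit message describes.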
A race is possible in the VFIsReady function. A VF netdevice can have the default device name eth0, and by the time we call the netlink syscall to get the device information, eth0 can be a different device. This causes duplicate MAC allocation on the VF admin MAC address.
Signed-off-by: Sebastian Sch <sebassch@gmail.com>
Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
/jira cherrypick OCPBUGS-36734 |
@zeeke: Jira Issue OCPBUGS-36734 has been cloned as Jira Issue OCPBUGS-36753. Will retitle bug to link to clone. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira cherrypick OCPBUGS-36733 |
@zeeke: Jira Issue OCPBUGS-36733 has been cloned as Jira Issue OCPBUGS-36754. Will retitle bug to link to clone. In response to this:
/jira cherrypick OCPBUGS-36731 |
@zeeke: Jira Issue OCPBUGS-36731 has been cloned as Jira Issue OCPBUGS-36755. Will retitle bug to link to clone. In response to this:
/jira cherrypick OCPBUGS-36730 |
@zeeke: Jira Issue OCPBUGS-36730 has been cloned as Jira Issue OCPBUGS-36756. Will retitle bug to link to clone. In response to this:
/jira refresh |
@zeeke: This pull request references Jira Issue OCPBUGS-36753, which is valid. 7 validation(s) were run on this bug. Requesting review from QA contact:
This pull request references Jira Issue OCPBUGS-36754, which is invalid:
This pull request references Jira Issue OCPBUGS-36755, which is invalid:
This pull request references Jira Issue OCPBUGS-36756, which is invalid:
In response to this:
/jira refresh |
@zeeke: This pull request references Jira Issue OCPBUGS-36753, which is valid. 7 validation(s) were run on this bug. Requesting review from QA contact:
This pull request references Jira Issue OCPBUGS-36754, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug. Requesting review from QA contact:
This pull request references Jira Issue OCPBUGS-36755, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug. Requesting review from QA contact:
This pull request references Jira Issue OCPBUGS-36756, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug. Requesting review from QA contact:
In response to this:
@SchSeba, @evgenLevin, @wizhaoredhat, @ajaggapa Please take a look and add labels if it looks good.
/lgtm |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: SchSeba, wizhaoredhat, zeeke The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/label cherry-pick-approved |
Merged commit 9e7bc3a into openshift:release-4.14
[ART PR BUILD NOTIFIER] This PR has been included in build sriov-network-webhook-container-v4.14.0-202407100810.p0.g9e7bc3a.assembly.stream.el8 for distgit sriov-network-webhook. |
4.14 backport of:
Webhook.ClientConfig.CABundle
k8snetworkplumbingwg/sriov-network-operator#711
cc @SchSeba, @mlguerrero12
ref: