-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-37955,SDN-4930,OCPBUGS-42616,SDN-5031,OCPBUGS-38753: [DownstreamMerge] 10-8-24 #2314
Conversation
Signed-off-by: Numan Siddique <numans@ovn.org>
This is needed in order to get net-attach-def client v1.7.3 [0] that can consume the UDN interfaces. [0] k8snetworkplumbingwg/multus-cni#1336 Signed-off-by: Ram Lavi <ralavi@redhat.com>
Before this commit, this was the network status we reported for a UDN pod: ``` kubectl get pods pod -ojsonpath="{@.metadata.annotations.k8s\.v1\.cni\.cncf\.io\/network-status}" | jq [ { "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.244.1.9", "fd00:10:244:2::9" ], "mac": "0a:58:0a:f4:01:09", "default": true, "dns": {} } ] ``` With it, we now report: ``` [ { "name": "ovn-kubernetes", "interface": "ovn-udn1", "ips": [ "10.128.0.3" ], "mac": "0a:58:0a:80:00:03", "default": true, "dns": {} }, { "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.244.1.6", "fd00:10:244:2::6" ], "mac": "0a:58:0a:f4:01:06", "default": false, "dns": {} } ] ``` This way, applications complying to the k8snetworkplumbingwg de-facto standard can be aware of the UDN interface information. We report the primary UDN first so the network-attachment-definition-client can identify which of the 2 interfaces to report as primary [0]. [0] k8snetworkplumbingwg/network-attachment-definition-client#71 Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com> Co-authored-by: Ram Lavi <ralavi@redhat.com>
This allows us to add a separate context in the future, to test pods with primary UDN annotations set. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com> Co-authored-by: Ram Lavi <ralavi@redhat.com>
Whenever we invoke `cmdAddWithGetCNIResultFunc` it will return a result with 2 interfaces: the first for the host side veth, and the second for the pod side veth (featuring the sandbox attribute). Check the source code at [0] to understand how it happens. Now that we have changed the production code to *also* reply with the result of adding the UDN interface, we are in fact returning 4 interfaces on the result of `cmdAddWithGetCNIResultFunc`: index: interface ---------------- 0: host side primary UDN 1: pod side primary UDN 2: host side cluster default network 3: pod side cluster default network [0] - https://github.com/ovn-org/ovn-kubernetes/blob/a1e88dcd0b79b6c5eb033daea178708993f8664e/go-controller/pkg/cni/helper_linux.go#L650 Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com> Co-authored-by: Ram Lavi <ralavi@redhat.com>
After PR [0] was merged we now need to specify which IP families are available in the cluster whenever the `GetActiveNetworkForNamespace` function is used [1], otherwise we'll see errors like: ``` W0913 10:40:54.797673 126506 util.go:418] Skipping nad 'foo-ns/tenantred' as active network after failing parsing it with invalid subnet cnfiguration: error while parsing subnets: network tenantred is attempting to use ipv4 subnets but the cluster does not support ipv4 ``` [0] - ovn-kubernetes/ovn-kubernetes#4625 [1] - https://github.com/ovn-org/ovn-kubernetes/blob/4cf0a2e9b317a71aa15faa76ced99c88f04868c3/go-controller/pkg/util/util.go#L418 Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com> Co-authored-by: Ram Lavi <ralavi@redhat.com>
Adding a new e2e test to check the network-status in the of udn primary pods. Signed-off-by: Ram Lavi <ralavi@redhat.com>
In Layer2 networks there is no join switch, only the cluster switch. Ensure that the ports are named appropriately, this is important for the logical router policies created for local node access. Signed-off-by: Patryk Diak <pdiak@redhat.com> (cherry picked from commit 967923ec75b5ec50d80e158a33441ece2eb82fdb)
For some reason we were not adding the reroutes to mpX interface required in LGW in L2 topology on the GR. This commit fixes that. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Co-Authored-by: Patryk Diak <pdiak@redhat.com> Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
after this PR: unable to get logical router GR_l2.network_ovn-worker2: object not found before this PR: object not found Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
syncNodeManagementPort should be called after the gwManager because we want to add the routes to the GR which is only created when the gwManager is run Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
See https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/ovn/default_network_controller.go#L920 In UDN L3/L2 controllers we were missing checking for mgmtPortFailed and accounting for that as part of nodeupdates. So retry for any errors from syncManagementPort was not happening. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
fedora: Update Dockerfile to OVN release ovn-24.09.0-33.
UDN: Design routes and policies on L2's GR correctly.
Fixes: ovn-kubernetes/ovn-kubernetes#4733 Signed-off-by: Patryk Diak <pdiak@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Leverages the NAD Controller to also track primary networks. This should be faster than listing all NADs in a namespace and iterating through them to find the primary network for a namespace. Refactors network manager and nad controller. Changes the network cache to only be stored in network manager, rather than having a duplicate cache in NAD controller. NAD Controller is modified to only hold references to network names for NADs, and then queries network manager to get the networks. NAD Controller and Network Manager are now thread safe. Tests are also modified: - NAD Controller now implemented as an interface so that unit tests can implement a fake NAD controller for getting active network - Fully functioning OVN controller unit tests still leverage a fully functioning NAD controller. NCM may be faked out in these cases. - Fake entities moved to testing/nad package Co-authored-by: Jaime Caamaño Ruiz <jcaamano@redhat.com> Signed-off-by: Tim Rozet <trozet@redhat.com>
Previously the function would check the informer cache which provides (in practical terms) real time access to if there is a primary network for this namespace as a NAD. This did not cover the case where a UDN could be generating the NAD, but the NAD has not been created yet. This commit addresses it by checking for a UDN (only when a NAD is not found for the namespace). This should help in UDN cases (the case we really care about) to ensure that the function returns an accurate result. Especially during times of delay where perhaps a pod handler and UDN handler are firing almost simultaneously. Right now the function will return an error indicating there is a primary network that hasn't been processed yet, letting the caller know to retry. We have seen in the past with time sensitive handlers that they may give up after 5 or so retries before the dependent controller has finished. In those cases, we usually add a wait in the handler itself and have around a 200-300ms timeout while we wait for the dependent handler (in this case the NAD handler) to finish. This commit may be expanded on in the future to accomodate that where needed. Signed-off-by: Tim Rozet <trozet@redhat.com>
On GRs we have an SNAT from the join port IP address to the external IP address for ICMP needs frag sent by OVN to the external network. In L2 UDNs, even though there is no join network, the internal port has two IPs, one for the pod subnet and another for the join subnet. The ICMP source address will be the first one in lexicographical order which might be the pod subnet one. The proper SNAT for the GR pod subnet address gets created in the gateway initialization. But: 1 when DisableSNATMultipleGWs=false, it is redundant because there is already a SNAT for the whole pod subnet 2 when DisableSNATMultipleGWs=true we end up with no SNAT because cleanupStalePodSNATs cleans it up becaause it never expected that SNAT This commit fixes openshift#2 Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Patryk Diak <pdiak@redhat.com>
Signed-off-by: Patryk Diak <pdiak@redhat.com>
/cherry-pick release-4.17 |
@tssurya: once the present PR merges, I will cherry-pick it on top of release-4.17 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
UGH wrong PR please ignore bot |
/retest |
@kyrtapz , I didn't notice anything, but was there any work done that helped make it green? @tssurya , LMK if you want me to re-test anything to make sure. |
waiting on openshift/origin#29243 |
/retest |
@trozet: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: maiqueb, trozet The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test e2e-metal-ipi-ovn-dualstack-local-gateway-techpreview |
/payload-job periodic-ci-openshift-microshift-release-4.18-periodics-e2e-aws-ovn-ocp-conformance-serial |
@tssurya: trigger 4 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/18a09ee0-9767-11ef-90a6-a9339fdf3790-0 |
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-ovn-kubernetes-2314-periodics-e2e-aws-ovn-ocp-conformance-serial/1851913062232100864 and https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-ovn-kubernetes-2314-periodics-e2e-aws-ovn-ocp-conformance/1851913062798331904 are not even installing gonna give up on that ... @neisw @dgoodwin : maybe TRT knows something around why these nightly payloads are not running properly due to :
|
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-ovn-kubernetes-2314-nightly-4.18-e2e-aws-ovn-serial/1851913063821742080 is not related to this PR:
|
/hold cancel |
fdcbde9
into
openshift:master
@trozet: Jira Issue OCPBUGS-42616: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-42616 has been moved to the MODIFIED state. Jira Issue OCPBUGS-38753: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-38753 has been moved to the MODIFIED state. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@tssurya: Failed to get PR patch from GitHub. This PR will need to be manually cherrypicked. Error messagestatus code 406 not one of [200], body: {"message":"Sorry, the diff exceeded the maximum number of files (300). Consider using 'List pull requests files' API or locally cloning the repository instead.","errors":[{"resource":"PullRequest","field":"diff","code":"too_large"}],"documentation_url":"https://docs.github.com/rest/pulls/pulls#list-pull-requests-files","status":"406"}In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[ART PR BUILD NOTIFIER] Distgit: ovn-kubernetes-base |
[ART PR BUILD NOTIFIER] Distgit: ovn-kubernetes-microshift |
[ART PR BUILD NOTIFIER] Distgit: ose-ovn-kubernetes |
/retitle OCPBUGS-37955,SDN-4930,OCPBUGS-42616,SDN-5031,OCPBUGS-38753: [DownstreamMerge] 10-8-24 |
@trozet: Jira Issue OCPBUGS-37955: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-37955 has been moved to the MODIFIED state. Jira Issue OCPBUGS-42616 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state. Jira Issue OCPBUGS-38753 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
No description provided.