SDN-4930: Downstream Merge [01-08-2025] #2412

jluhrsen · 2025-01-09T00:49:22Z

📑 Description

Fixes #

Additional Information for reviewers

✅ Checks

My code requires changes to the documentation
if so, I have updated the documentation as required
My code requires tests
if so, I have added and/or updated the tests as required
All the tests have passed in the CI

How to verify it

Handle host-network pods as default network. Don't return per-pod errors on startup. Remove nadController from UDNHostIsolationManager as we don't use it anymore to find pod's UDN based on NADs that exist in the namespace. Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

udn host isolation: fix initialSync.

…face Signed-off-by: Martin Kennelly <mkennell@redhat.com>

This code isn't being used anymore. We don't expect users to upgrade directly from code which contained the legacy LRPs, therefore it's safe to remove. Signed-off-by: Martin Kennelly <mkennell@redhat.com>

Signed-off-by: Martin Kennelly <mkennell@redhat.com>

L2 UDN: EgressIP hosted by primary interface (`breth0`)

If EncapIP is configured, it means it is different from the node's primary address. Do not update EncapIP when node's primary address changes. Signed-off-by: Yun Zhou <yunz@nvidia.com>

Assign network ID from network manager running in cluster manager. The network ID is included in NetInfo and annotated on the NAD along with the network name. Network managers running in zone & node controllers will read the network ID from the annotation to set it on NetInfo. On startup, network manager running in cluster manager will read the network IDs annotated on the nodes to cover for the upgrade scenario.

Network IDs will still be annotated on the nodes because this PR does not transition all the code to use the network ID from the NetInfo instead of the node annotation. That will have to be done progressively.

This has several benefits, among them:
- NetworkID is available sooner overall since we don't have to wait for all the nodes to be annotated
- No need to unmarshal the node annotation to get the network IDs, they are available in NetInfo
- No need to unmarshal the NAD to get the network name, it can be accessed directly from the annotation

If a network is replaced with a different one with the same name, the network ID is reused. The respective network controller will not start while the previous one is being stopped and cleaned up, so it shouldn't be a problem.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
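A minimal sketch of the allocation behavior described above: an ID is handed out per network name, reused when the same name shows up again, and can be seeded from IDs that were already annotated (the upgrade scenario). The `idAllocator` type, its methods, and the choice to start numbering at 1 are assumptions for illustration, not the ovn-kubernetes implementation.

```go
package main

import "fmt"

// idAllocator is a hypothetical stand-in for the network ID allocation done
// by the network manager in cluster manager; it is not the ovn-kubernetes type.
type idAllocator struct {
	byName map[string]int // network name -> assigned ID
	used   map[int]bool   // IDs currently reserved
	next   int            // lowest candidate for the next new ID
}

func newIDAllocator() *idAllocator {
	return &idAllocator{byName: map[string]int{}, used: map[int]bool{}, next: 1}
}

// seed records an ID that was already assigned, for example one read from an
// existing annotation on startup.
func (a *idAllocator) seed(name string, id int) {
	a.byName[name] = id
	a.used[id] = true
}

// allocate returns the ID for a network name, reusing the existing ID when
// the name is already known and picking the lowest free ID otherwise.
func (a *idAllocator) allocate(name string) int {
	if id, ok := a.byName[name]; ok {
		return id
	}
	for a.used[a.next] {
		a.next++
	}
	id := a.next
	a.seed(name, id)
	return id
}

func main() {
	a := newIDAllocator()
	a.seed("blue", 3) // pretend ID 3 was found annotated on startup
	fmt.Println(a.allocate("red"), a.allocate("blue"), a.allocate("red")) // 1 3 1
}
```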

Instead of considering managed VRFs those that follow the mp<id>-udn-vrf naming template, use the table number: those VRFs associated to a table within our reserved block of table numbers are managed by us. The block right now is anything higher than RoutingTableIDStart (1000). This allows managing VRFs with any name, which is desirable if the name is going to be exposed through BGP. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
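The table-based ownership check can be sketched as follows. The constant value comes from the commit message; the function name `isManagedVRFTable` is hypothetical, and "higher than" is read literally as a strict comparison, which is an assumption.

```go
package main

import "fmt"

// routingTableIDStart mirrors the value quoted in the commit message (1000).
const routingTableIDStart = 1000

// isManagedVRFTable reports whether a VRF counts as managed, looking only at
// the routing table it is associated with and never at its device name.
// The strict comparison is an assumption based on the wording "higher than".
func isManagedVRFTable(table uint32) bool {
	return table > routingTableIDStart
}

func main() {
	vrfs := []struct {
		name  string
		table uint32
	}{
		{"mp5-udn-vrf", 1012}, // old-style template name, reserved table
		{"tenant-blue", 1044}, // friendly name, reserved table
		{"admin-vrf", 10},     // created by an admin, outside the block
	}
	for _, v := range vrfs {
		fmt.Printf("%s (table %d): managed=%t\n", v.name, v.table, isManagedVRFTable(v.table))
	}
}
```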

Anticipating that these VRF names are going to be exposed through BGP, we should use friendlier names for our VRFs. The most natural name to use is the network name. Thus giving a cluster UDN a name below 15 characters that matches an already existing VRF not managed by ovn-k will fail. This is considered an admin problem and not an ovn-k problem for now. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Was causing deadlocks in unit tests Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

…heir subcontrollers Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Assuming that there are three types of controllers (network agnostic, network aware and network specific), we were already notifying network specific controllers of network changes. But network aware controllers, controllers for which we have a single instance capable of managing multiple networks, had no code path to be informed of network changes. This commit adds a code path for that and makes the RouteAdvertisements controller aware of network changes. Changed ClusterManager to be the controller manager for cluster manager instead of secondaryNetworkClusterManager. It just makes more sense that way since ClusterManager is the top level manager. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
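One way to picture the new code path for network aware controllers is a manager that fans each network change out to every registered single-instance controller. All names below (`networkAwareController`, `ReconcileNetwork`, `controllerManager`, `recorder`) are hypothetical and only illustrate the idea; they are not the interfaces used in the repository.

```go
package main

import "fmt"

// networkAwareController is a hypothetical interface for a single-instance
// controller that manages multiple networks and wants to hear about changes.
type networkAwareController interface {
	ReconcileNetwork(name string) error
}

// controllerManager is a hypothetical top-level manager that forwards network
// changes to every registered network aware controller.
type controllerManager struct {
	aware []networkAwareController
}

func (m *controllerManager) register(c networkAwareController) {
	m.aware = append(m.aware, c)
}

// notify tells each registered controller that the named network changed and
// stops at the first error.
func (m *controllerManager) notify(name string) error {
	for _, c := range m.aware {
		if err := c.ReconcileNetwork(name); err != nil {
			return fmt.Errorf("reconciling network %q: %w", name, err)
		}
	}
	return nil
}

// recorder stands in for a controller such as RouteAdvertisements and simply
// remembers which networks it was asked to reconcile.
type recorder struct{ seen []string }

func (r *recorder) ReconcileNetwork(name string) error {
	r.seen = append(r.seen, name)
	return nil
}

func main() {
	m := &controllerManager{}
	r := &recorder{}
	m.register(r)
	for _, n := range []string{"blue", "red"} {
		if err := m.notify(n); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(r.seen) // [blue red]
}
```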

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

…twork exist test Signed-off-by: Or Mergi <ormergi@redhat.com>

CUDN cleanup is inconsistent: we see some flaky tests due to CUDN "already exist" errors, implying objects are not actually deleted. Wait for the CUDN object to be gone when deleted. Signed-off-by: Or Mergi <ormergi@redhat.com>
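The "wait until the object is gone" pattern can be sketched with a plain polling loop. This uses only the standard library; the `waitGone` helper and the fake `exists` callback are assumptions for illustration, whereas the actual e2e tests would query the API server and likely use the test framework's own polling helpers.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitGone polls exists until it reports false or the timeout elapses.
// exists is a caller-supplied check, e.g. a GET that maps NotFound to false.
func waitGone(exists func() (bool, error), interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		found, err := exists()
		if err != nil {
			return err
		}
		if !found {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for object to be deleted")
		}
		time.Sleep(interval)
	}
}

func main() {
	remaining := 3 // the fake object disappears on the third poll
	exists := func() (bool, error) {
		remaining--
		return remaining > 0, nil
	}
	err := waitGone(exists, 10*time.Millisecond, time.Second)
	fmt.Println("deleted:", err == nil)
}
```

Only after `waitGone` returns without error would a test go on to create an object with the same name.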

A CUDN is a cluster-scoped object; when tests run in parallel, random names avoid conflicts with other tests. Use a random metadata.name for CUDN objects. The "isolates overlapping CIDRs" tests create objects based on the 'red' and 'blue' variables, including CUDN objects. Change the tests' CUDN creation to use random names and update the given 'networkAttachmentConfigParams' with the newly generated name. Update the 'red' & 'blue' variables with the generated name, carried by 'networkAttachmentConfigParams' (netConfig.name). The pod2Egress tests assert on the CUDN object name given by 'userDefinedNetworkName'. In practice the tests' netConfigParam.name is userDefinedNetworkName. Change the assertion to check the given netConfigParam. Signed-off-by: Or Mergi <ormergi@redhat.com>
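As an illustration of the random naming idea, the sketch below appends a random hex suffix to a base name such as 'red' or 'blue'. The `randomName` helper is hypothetical and uses only the standard library; the tests themselves may rely on a different random-string utility.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomName returns prefix followed by "-" and 8 random hex characters, so
// that cluster-scoped objects created by parallel tests do not collide.
func randomName(prefix string) (string, error) {
	buf := make([]byte, 4)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return prefix + "-" + hex.EncodeToString(buf), nil
}

func main() {
	for _, base := range []string{"red", "blue"} {
		name, err := randomName(base)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		// The generated name is what later assertions should compare against.
		fmt.Println(name)
	}
}
```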

Signed-off-by: nithyar <nithyar@nvidia.com>

e2e, CUDN: Improve stability

Reconcile RouteAdvertisements in cluster manager

Add missing enum validation for RouteAdvertisements

The NetPol test checks assigned pod IP only against IPv4 subnet which would fail on IPv6 only cluster. This commit fixes it by checking on all valid CIDRs. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
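The fix amounts to testing the pod IP against every configured CIDR rather than only the IPv4 one. A minimal sketch with the standard `net` package follows; the function name `ipInAnyCIDR` and the sample subnets are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// ipInAnyCIDR reports whether ip falls within at least one of the CIDRs,
// regardless of whether they are IPv4 or IPv6.
func ipInAnyCIDR(ip string, cidrs []string) (bool, error) {
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return false, fmt.Errorf("invalid IP %q", ip)
	}
	for _, c := range cidrs {
		_, subnet, err := net.ParseCIDR(c)
		if err != nil {
			return false, err
		}
		if subnet.Contains(parsed) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Hypothetical dual-stack subnets; an IPv6-only cluster would list only
	// the second one, which is why checking just the IPv4 subnet fails there.
	cidrs := []string{"10.128.0.0/14", "fd01::/48"}
	ok, err := ipInAnyCIDR("fd01:0:0:1::5", cidrs)
	fmt.Println(ok, err) // true <nil>
}
```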

The variable ginkgo_focus is misspelled as gingko_focus. As the latter var is not used anywhere else in this repo and is used to concatenate the var ginkgo_focus in the next line to ginkgoargs it seems to be a typo. Fixes: #4942 Signed-off-by: Felix Schumacher <felix.schumacher@internetallee.de>

This commit adds a new controller to import BGP learnt routes into OVN. The controller runs in ovnkube-controller so it only supports IC architecture where ovnkube-controller has kernel access on each node. Networks should register to this controller to have routes imported for them. Routes are imported into the network's gateway router. Multipath routes are supported. The controller subscribes for netlink route events. When a route is updated, the corresponding network is queued to be sync'ed. A network is also sync'ed when registered to the controller. Synchronizations are delayed by a small amount of time to prevent a series of consecutive route updates from synchronizing the same network twice. Synchronizations apply the difference between current and desired state. The controller subscribes to netlink link events to learn the routing table associated to a network vrf. The network is inferred from the vrf device name. When learning the routing table, the corresponding network is queued to be sync'ed. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
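The "apply the difference between current and desired state" step can be sketched as a set difference over routes. The `route` type and `diffRoutes` function are simplified, hypothetical stand-ins; the controller itself works with netlink routes and OVN gateway router entries. Keying on both prefix and next hop means each path of a multipath route is treated as its own entry.

```go
package main

import "fmt"

// route is a simplified, hypothetical route key: one entry per next hop, so a
// multipath route appears as several entries sharing the same prefix.
type route struct {
	prefix  string
	nexthop string
}

// diffRoutes returns the routes to add (desired but not current) and the
// routes to remove (current but not desired).
func diffRoutes(current, desired []route) (add, remove []route) {
	cur := make(map[route]bool, len(current))
	for _, r := range current {
		cur[r] = true
	}
	des := make(map[route]bool, len(desired))
	for _, r := range desired {
		des[r] = true
		if !cur[r] {
			add = append(add, r)
		}
	}
	for _, r := range current {
		if !des[r] {
			remove = append(remove, r)
		}
	}
	return add, remove
}

func main() {
	current := []route{{"10.0.0.0/24", "192.0.2.1"}, {"10.0.1.0/24", "192.0.2.1"}}
	desired := []route{{"10.0.0.0/24", "192.0.2.1"}, {"10.0.0.0/24", "192.0.2.2"}}
	add, remove := diffRoutes(current, desired)
	fmt.Println("add:", add)
	fmt.Println("remove:", remove)
}
```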

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>

Fix subnet check for assigned pod IPs

go-controller: fix typo in test script

Import learnt BGP routes into OVN

openshift-ci-robot · 2025-01-09T00:49:28Z

@jluhrsen: This pull request references SDN-4930 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.19." or "openshift-4.19.", but it targets "openshift-4.18" instead.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2025-01-09T00:50:42Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jluhrsen
Once this PR has been reviewed and has the lgtm label, please assign trozet for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jluhrsen · 2025-01-09T00:54:58Z

/test e2e-metal-ipi-ovn-ipv6-techpreview
/test e2e-aws-ovn-hypershift-conformance-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-metal-ipi-ovn-dualstack-techpreview
/test e2e-vsphere-ovn-techpreview
/test e2e-aws-ovn-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal-ipi-ovn-techpreview
/test openshift-e2e-gcp-ovn-techpreview-upgrade
/payload 4.19 ci blocking
/payload 4.19 nightly blocking

openshift-ci · 2025-01-09T00:55:04Z

@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.19

periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.19-e2e-gcp-ovn-upgrade
periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/59d94350-ce24-11ef-9fe7-2791c9a6bd2e-0

trigger 14 job(s) of type blocking for the nightly release of OCP 4.19

periodic-ci-openshift-release-master-ci-4.19-e2e-aws-upgrade-ovn-single-node
periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade
periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-serial
periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview
periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview-serial
periodic-ci-openshift-release-master-nightly-4.19-fips-payload-scan
periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm
periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-ipv6
periodic-ci-openshift-microshift-release-4.19-periodics-e2e-aws-ovn-ocp-conformance
periodic-ci-openshift-microshift-release-4.19-periodics-e2e-aws-ovn-ocp-conformance-serial
periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/59d94350-ce24-11ef-9fe7-2791c9a6bd2e-1

jluhrsen · 2025-01-09T03:12:40Z

/test e2e-azure-ovn-techpreview

jluhrsen · 2025-01-09T16:35:59Z

/test e2e-aws-ovn-serial
/test e2e-azure-ovn-upgrade
/test e2e-metal-ipi-ovn-dualstack-techpreview

jluhrsen · 2025-01-09T16:37:52Z

/payload-aggregate periodic-ci-openshift-microshift-release-4.19-periodics-e2e-aws-ovn-ocp-conformance 10
/payload-aggregate periodic-ci-openshift-microshift-release-4.19-periodics-e2e-aws-ovn-ocp-conformance-serial 10
/payload-aggregate periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn 10

openshift-ci · 2025-01-09T16:37:57Z

@jluhrsen: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-microshift-release-4.19-periodics-e2e-aws-ovn-ocp-conformance
periodic-ci-openshift-microshift-release-4.19-periodics-e2e-aws-ovn-ocp-conformance-serial
periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/12975570-cea8-11ef-820f-aade8edf1c59-0

openshift-ci · 2025-01-09T22:29:32Z

@jluhrsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/security | `65a3e28` | link | false | `/test security` |
| ci/prow/e2e-openstack-ovn | `65a3e28` | link | false | `/test e2e-openstack-ovn` |
| ci/prow/e2e-azure-ovn-upgrade | `65a3e28` | link | true | `/test e2e-azure-ovn-upgrade` |
| ci/prow/e2e-metal-ipi-ovn-ipv6-techpreview | `65a3e28` | link | false | `/test e2e-metal-ipi-ovn-ipv6-techpreview` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

jluhrsen · 2025-01-13T19:45:48Z

/test e2e-azure-ovn-upgrade
/test 2e-openstack-ovn
/test 4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade
/test 4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade
/test 4.19-upgrade-from-stable-4.18-images

jluhrsen · 2025-01-13T19:47:42Z

/payload-job periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn 1

openshift-ci · 2025-01-13T19:47:52Z

@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/417d8390-d1e7-11ef-8bd8-742fa6af9ebf-0

jluhrsen · 2025-01-13T19:49:16Z

/payload-job periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade 10

openshift-ci · 2025-01-13T19:49:18Z

@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/78df37c0-d1e7-11ef-945e-d8e140fc0948-0

jluhrsen · 2025-01-13T19:50:27Z

/payload 4.19 ci blocking

openshift-ci · 2025-01-13T19:50:30Z

@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.19

periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.19-e2e-gcp-ovn-upgrade
periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a3a513d0-d1e7-11ef-97fe-20124f3e1665-0

npinaeva and others added 30 commits December 18, 2024 20:23

Add missing enum validation for RouteAdvertisements

06c26bc

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Merge pull request #4930 from npinaeva/udn-isolation-hostnet

d456afd

udn host isolation: fix initialSync.

Add UDN Layer2 support for an Egress IP assigned to the primary inter…

6c4e021

…face Signed-off-by: Martin Kennelly <mkennell@redhat.com>

EIP/ESVC: remove code to remove legacy no reroutes for nodes

1aedb9a

This code isn't being used anymore. We don't expect users to upgrade directly from code which contained the legacy LRPs, therefore it's safe to remove. Signed-off-by: Martin Kennelly <mkennell@redhat.com>

EIP E2Es: create const for httpd image name and add image tag

33f957d

Signed-off-by: Martin Kennelly <mkennell@redhat.com>

Merge pull request #4833 from martinkennelly/eip-l2

d8d42f1

L2 UDN: EgressIP hosted by primary interface (`breth0`)

Do not update EncapIP if it is configured

6c366bb

If EncapIP is configured, it means it is different from the node's primary address. Do not update EncapIP when node's primary address changes. Signed-off-by: Yun Zhou <yunz@nvidia.com>

Add FRRConfiguration to factory

90136b5

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Reconcile RouteAdvertisements from cluster manager

3d1ec7a

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Fix receiver names of nad controller

4c0eb71

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Fix deadlock when comparing mutableNetInfo with self

7c03f3f

Was causing deadlocks in unit tests Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Fix network controllers constructors not using the same NetInfo for t…

a13297e

…heir subcontrollers Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Fix egress IP tests

81449dd

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

do not assign an ID to network still used by a controller being stopped

e2b9f0a

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

test,networksegmentation: Wait for CUDN status populate in primary ne…

f401ff5

…twork exist test Signed-off-by: Or Mergi <ormergi@redhat.com>

test,networksegmentation: Wait for CUDN object be gone

30f1e6c

CUDN cleanup is inconsistent: we see some flaky tests due to CUDN "already exist" errors, implying objects are not actually deleted. Wait for the CUDN object to be gone when deleted. Signed-off-by: Or Mergi <ormergi@redhat.com>

Add nftables binaries to ubuntu arm image

4e344ec

Signed-off-by: nithyar <nithyar@nvidia.com>

Fix issues in DPU host initialization

97d0504

Signed-off-by: nithyar <nithyar@nvidia.com>

Merge pull request #4842 from ormergi/e2e-cudn-fix

e4b585c

e2e, CUDN: Improve stability

Merge pull request #4691 from jcaamano/cm-routeadvertisements

ff34493

Reconcile RouteAdvertisements in cluster manager

Merge pull request #4934 from jcaamano/advertisements-enum-validation

326f9db

Add missing enum validation for RouteAdvertisements

Fix subnet check for assigned pod IPs

48a7d60

The NetPol test checks assigned pod IP only against IPv4 subnet which would fail on IPv6 only cluster. This commit fixes it by checking on all valid CIDRs. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>

jcaamano and others added 6 commits January 8, 2025 11:49

RouteImport: change re-subscription timer to ticker

465b128

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

Add ipv6 network-segmentation test lane

b35ff47

Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>

Fix e2e test UDN CIDRs for IPv6 only cluster

064f0d2

Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>

Merge pull request #4943 from pperiyasamy/fix-netpol-e2e-ipv6

93be65f

Fix subnet check for assigned pod IPs

Merge pull request #4945 from FSchumacher/fix-typo-in-test-go-sh

79380a1

go-controller: fix typo in test script

Merge pull request #4835 from jcaamano/import-routes

4153d10

Import learnt BGP routes into OVN

openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) Jan 9, 2025

openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jan 9, 2025

openshift-ci bot requested review from JacobTanenbaum and trozet January 9, 2025 00:50

Merge remote-tracking branch 'ovn-org/master' into d/s-merge-01-08-2025

65a3e28

jluhrsen force-pushed the d/s-merge-01-08-2025 branch from 1759248 to 65a3e28 Compare January 9, 2025 00:51

openshift-merge-robot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jan 9, 2025
