SDN-5297,SDN-5508: DownStream Merge Sync from 4.18 [01-07-2025] #2410

Open
wants to merge 474 commits into
base: release-4.17
from

Conversation

jluhrsen
Contributor

@jluhrsen jluhrsen commented Jan 7, 2025

📑 Description

Fixes #

Additional Information for reviewers

✅ Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

ricky-rav and others added 30 commits October 22, 2024 19:57
and fix typo in error message for gateway udn and NNC manager

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Split initGateway into two parts: initGatewayPreStart and initGatewayMainStart.

A later commit ("Split gateway start()") will change the execution flow so as to run initGatewayPreStart /before/ the Secondary Node Network Controller (SNNC) is started; initGatewayMainStart will run with the rest of Default Node Network Controller (DNNC) initialization.

The motivation behind this is that SNNC needs to reference openflow manager created in DNNC and SNNC is started before the rest of DNNC functionality is started.

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Split Default Node Network Controller (DNNC) start into PreStart() and Start().

This changes the execution flow in order to run PreStart, which calls initGatewayPreStart, /before/ the Secondary Node Network Controller (SNNC) is started; Start(), which calls initGatewayMainStart, is run /after/ SNNC has been started and completes all DNNC initialization.

The motivation behind this is that SNNC needs to reference openflow manager created in DNNC and SNNC is started before the rest of DNNC functionality is started.

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
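
The startup ordering described in the two commit messages above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual ovn-kubernetes code: the types `defaultNodeController`, `secondaryNodeController`, and `openflowManager`, and the `startAll` helper, are hypothetical stand-ins. Only the order (PreStart, then SNNC, then Start) comes from the commit text.

```go
package main

import (
	"errors"
	"fmt"
)

// openflowManager is a hypothetical stand-in for the object that DNNC
// creates and that SNNC needs to reference.
type openflowManager struct{}

// defaultNodeController is a simplified, hypothetical model of the DNNC.
type defaultNodeController struct {
	ofm *openflowManager
	log []string // records the order in which steps ran
}

// PreStart creates what other controllers depend on
// (the role the commit gives to initGatewayPreStart).
func (d *defaultNodeController) PreStart() {
	d.ofm = &openflowManager{}
	d.log = append(d.log, "dnnc-prestart")
}

// Start completes the remaining DNNC initialization
// (the role the commit gives to initGatewayMainStart).
func (d *defaultNodeController) Start() error {
	if d.ofm == nil {
		return errors.New("start called before PreStart")
	}
	d.log = append(d.log, "dnnc-start")
	return nil
}

// secondaryNodeController models an SNNC that needs the openflow manager.
type secondaryNodeController struct {
	dnnc *defaultNodeController
}

// Start fails if the openflow manager has not been created yet.
func (s *secondaryNodeController) Start() error {
	if s.dnnc.ofm == nil {
		return errors.New("openflow manager not created yet")
	}
	s.dnnc.log = append(s.dnnc.log, "snnc-start")
	return nil
}

// startAll runs the sequence described above: PreStart, SNNC, then Start.
func startAll(d *defaultNodeController, s *secondaryNodeController) error {
	d.PreStart()
	if err := s.Start(); err != nil {
		return err
	}
	return d.Start()
}

func main() {
	d := &defaultNodeController{}
	s := &secondaryNodeController{dnnc: d}
	if err := startAll(d, s); err != nil {
		fmt.Println("startup failed:", err)
		return
	}
	fmt.Println(d.log)
}
```

In this model, starting the secondary controller before `PreStart` returns an error, which mirrors the failure mode the split is meant to avoid.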
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Add an E2E test that addresses https://issues.redhat.com/browse/OCPBUGS-41499

Once a UDN has been configured, any restart of OVNK caused ovnkube-controller to go into CLBO, due to SNNC referencing openflow manager before it actually gets created by DNNC.

Verify in this E2E test that removing and recreating an OVNK pod won't cause such issue.

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Hybrid Overlay only supports IPv4 (or dual stack).

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
The "add" code for egress nodes eventually adds the default route on ovn_cluster_router to the gw router (CreateDefaultRouteToExternal), while the update path does not.

If the "add" event on an egress node ever fails, it will be retried. Any update event on the node that arrives /before/ the retried add completes will be processed first. In this case, the egress node will be added through the update path and the default route won't be added.

Let's add the default route also in the update path.

Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
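
A rough sketch of the race described in the commit message above, assuming a simplified model: the `router` and `controller` types and the `ensureDefaultRoute` helper are hypothetical and only illustrate why both the add path and the update path should install the route. The real code goes through CreateDefaultRouteToExternal, which is not reproduced here.

```go
package main

import "fmt"

// router is a hypothetical stand-in for the state of ovn_cluster_router.
type router struct {
	defaultRoutes map[string]bool // node name -> default route present
}

// ensureDefaultRoute is idempotent, so calling it from both paths is safe.
func (r *router) ensureDefaultRoute(node string) {
	r.defaultRoutes[node] = true
}

// controller is a hypothetical, simplified egress node handler.
type controller struct {
	r           *router
	egressNodes map[string]bool
}

// addEgressNode is the "add" path: register the node and add the route.
func (c *controller) addEgressNode(node string) {
	c.egressNodes[node] = true
	c.r.ensureDefaultRoute(node)
}

// updateEgressNode may be the first handler to see a node when an earlier
// add failed and its retry has not run yet, so it adds the route as well.
func (c *controller) updateEgressNode(node string) {
	c.egressNodes[node] = true
	c.r.ensureDefaultRoute(node)
}

func main() {
	c := &controller{
		r:           &router{defaultRoutes: map[string]bool{}},
		egressNodes: map[string]bool{},
	}
	// Simulate an update arriving before the retried add.
	c.updateEgressNode("node-a")
	fmt.Println("route present after update:", c.r.defaultRoutes["node-a"])
	// The retried add is then a no-op with respect to the route.
	c.addEgressNode("node-a")
	fmt.Println("routes tracked:", len(c.r.defaultRoutes))
}
```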
OCPBUGS-41499: split gateway creation
As it is unused in any network controller except the node default
network controller.

The future plan is to strip PreStart from the node default network
controller as well and move that somewhere else.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Otherwise github terminates the job on timeout, doesn't give ginkgo
chance to print status so that we know why the timeout happened.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Remove PreStart from NetworkController interface
Signed-off-by: Peng Liu <pliu@redhat.com>
Set shard-conformance timeout lower than github timeout
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Suppress remote node annotation missing
This commit temporarily changes the subnets to be
lower alphabetically than the default joinSubnet ranges
to fix the force-snat=router-ip bug we have when using
multiple networks. This will be reverted in a followup
PR and is a temporary fix to keep things moving for this
PR from CI perspective.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This helps with having less code duplication and cleaner
utils that can be reused.

Convert InvalidNetworkID to InvalidID

Reduces code duplication and enables using invalidID
value for more things than just networkID.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This annotation will store the tunnelIDs allocated by
cluster-manager for each node in each network.
This will be used in ovnkube-controller to create
remote LRPs in L2 UDNs.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Let's move the idAllocator in podAllocator to be a
tunnelIDAllocator in networkclustermanager and then
use that from nodeallocator and podallocator to
assign tunnelIDs to nodes and pods that belong to a
specific network.

At start up let's also make sure to populate the caches
with the correct tunnelIDs.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
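
The idea in the commit message above (one allocator per network, shared by the node and pod allocators, with caches repopulated at startup) might look roughly like this. It is a sketch under assumptions: the `tunnelIDAllocator` type here, its method names, the key format, and the ID range are all hypothetical and are not taken from the ovn-kubernetes source.

```go
package main

import (
	"fmt"
	"sync"
)

// tunnelIDAllocator is a hypothetical per-network allocator that both a
// node allocator and a pod allocator could share.
type tunnelIDAllocator struct {
	mu    sync.Mutex
	max   int
	byKey map[string]int // owner key -> tunnel ID
	used  map[int]bool   // tunnel ID -> in use
}

func newTunnelIDAllocator(max int) *tunnelIDAllocator {
	return &tunnelIDAllocator{max: max, byKey: map[string]int{}, used: map[int]bool{}}
}

// Reserve records an already-assigned ID, for example one read back from an
// annotation at startup, so that it is not handed out again.
func (a *tunnelIDAllocator) Reserve(key string, id int) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if id < 1 || id > a.max {
		return fmt.Errorf("id %d out of range", id)
	}
	if existing, ok := a.byKey[key]; ok {
		if existing == id {
			return nil
		}
		return fmt.Errorf("%s already holds id %d", key, existing)
	}
	if a.used[id] {
		return fmt.Errorf("id %d already in use", id)
	}
	a.byKey[key], a.used[id] = id, true
	return nil
}

// Allocate returns the ID already held by key, or the lowest free ID.
func (a *tunnelIDAllocator) Allocate(key string) (int, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if id, ok := a.byKey[key]; ok {
		return id, nil
	}
	for id := 1; id <= a.max; id++ {
		if !a.used[id] {
			a.byKey[key], a.used[id] = id, true
			return id, nil
		}
	}
	return 0, fmt.Errorf("no free tunnel IDs")
}

func main() {
	a := newTunnelIDAllocator(3)
	// Startup: repopulate the cache from a previously stored assignment.
	_ = a.Reserve("node/a", 1)
	id, _ := a.Allocate("pod/x")
	fmt.Println("pod/x got", id)
}
```

Sharing one allocator means a node and a pod in the same network cannot be given the same ID, which is the property the commit message relies on.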
Leverage the new annotation created by CM
on the ovnkube-controller side and add the
remote LRPs. Also update the local LSP on
local zone side to add requested-tnl-key
across the transit switch.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Enable IP forwarding on the UDN management interface
for IPv4 enabled networks. IPv6 forwarding is
globally enabled.
Disable IP forwarding in the SGW UDN lane to catch
any potential issues caused by the restricted forwarding.

Signed-off-by: Patryk Diak <pdiak@redhat.com>
Having a different parameter to use as the network_name option of the
localnet logical switch port allows the admin to create multiple
physical network attachments without having to reconfigure the physical
OVN bridge mappings.

This improves the admin's UX (less operations) and solution scalability
(since a single mapping can be re-used) and thus the size of the
ovn-bridge-mappings string can be kept low.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Adds an e2e test that asserts the same bridge mapping can be shared
between multiple networks.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Add hybrid overlay pod IPs to the namespace address_set
poroh and others added 15 commits December 11, 2024 17:04
Signed-off-by: Dmitry Porokh <dporokh@nvidia.com>
Today ovnkube-controller relies on ovnkube-node to annotate the mac
address so that it can properly program the OVN mgmt port. In practice
with UDN, this means cluster manager creates the node and allocates some
subnet info to the node, both the node and the ovnkube-controller side
get updates, but ovnkube-controller fails to program the first time
because it is waiting for the node side to configure the mac address.

This patch changes the behavior of ovn-kubernetes to calculate the MAC
address for the management port from the mgmt port IP address of the
first subnet for the network. For backwards compatibility, it will first
attempt to read the node annotation, and if no mgmt MAC exists, it
will derive it from the mgmt IP of the subnet.

Signed-off-by: Tim Rozet <trozet@redhat.com>
With the change to no longer depend on mgmt port mac address annotation
on the node, the egress firewall test would unexpectedly now correctly
sync the management port during the test. This would cause
addAllowACLFromNode to execute and add an ACL for network policy to the
node switch. That in turn would cause this egress firewall test to fail
because it would see there are more than 0 ACLs on the switch.

This commit fixes it by properly scoping the lookup in the egress
firewall test for ACLs only applicable to egress firewall.

Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
L3 UDN: EgressIP hosted by primary interface (`breth0`)
This change may add 5ms delay on each informer start (on controller
start). It shouldn't be an issue because this delay is negligible
compared to the 100ms cache status polling interval.

Signed-off-by: Dmitry Porokh <dporokh@nvidia.com>
Instead of failing ignore EndpointSlices without a service
label set. These custom, user created slices should be ignored.

Signed-off-by: Patryk Diak <pdiak@redhat.com>
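
A small sketch of the filtering behavior described in the commit message above. The `endpointSlice` struct is a simplified, hypothetical stand-in for the Kubernetes type; the label key `kubernetes.io/service-name` is the standard one that links an EndpointSlice to its Service. The function names are made up for illustration.

```go
package main

import "fmt"

// serviceNameLabel is the standard label tying an EndpointSlice to a Service.
const serviceNameLabel = "kubernetes.io/service-name"

// endpointSlice is a simplified stand-in that keeps only what the sketch needs.
type endpointSlice struct {
	Name   string
	Labels map[string]string
}

// serviceNameFor returns the owning service name, or false when the slice
// carries no service label (for example a custom, user-created slice).
func serviceNameFor(es endpointSlice) (string, bool) {
	name, ok := es.Labels[serviceNameLabel]
	if !ok || name == "" {
		return "", false
	}
	return name, true
}

// slicesWithService skips unlabeled slices instead of returning an error.
func slicesWithService(slices []endpointSlice) []endpointSlice {
	var out []endpointSlice
	for _, es := range slices {
		if _, ok := serviceNameFor(es); !ok {
			continue // ignore rather than fail
		}
		out = append(out, es)
	}
	return out
}

func main() {
	slices := []endpointSlice{
		{Name: "svc-a-abc", Labels: map[string]string{serviceNameLabel: "svc-a"}},
		{Name: "custom", Labels: map[string]string{"app": "demo"}},
	}
	for _, es := range slicesWithService(slices) {
		fmt.Println(es.Name)
	}
}
```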
Calculates mgmt port MAC rather than storing
Improves pod deletion with user defined networks
UDN Gateway: Ignore EndpointSlices without a service label
this is a d/s only fix. it's needed u/s because the feature needs
a kernel fix in the u/s test runners.

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
OCPBUGS-41156, SDN-4930, OCPBUGS-44794, OCPBUGS-43354, OCPBUGS-43519, OCPBUGS-32754: Downstream Merge [12-04-2024]
@openshift-ci-robot
Contributor

openshift-ci-robot commented Jan 7, 2025

@jluhrsen: This pull request references SDN-5297 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.17." or "openshift-4.17.", but it targets "openshift-4.18" instead.

This pull request references SDN-5508 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.17.z" version, but no target version was set.

In response to this:

📑 Description

Fixes #

Additional Information for reviewers

✅ Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 7, 2025
@openshift-ci openshift-ci bot requested review from abhat and jcaamano January 7, 2025 22:48
Contributor

openshift-ci bot commented Jan 7, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jluhrsen
Once this PR has been reviewed and has the lgtm label, please assign abhat for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jluhrsen
Contributor Author

jluhrsen commented Jan 7, 2025

/test e2e-metal-ipi-ovn-ipv6-techpreview
/test e2e-aws-ovn-hypershift-conformance-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-metal-ipi-ovn-dualstack-techpreview
/test e2e-vsphere-ovn-techpreview
/test e2e-aws-ovn-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal-ipi-ovn-techpreview
/test openshift-e2e-gcp-ovn-techpreview-upgrade
/payload 4.17 ci blocking
/payload 4.17 nightly blocking

Contributor

openshift-ci bot commented Jan 7, 2025

@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.17

  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.17-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/98cb6d30-cd49-11ef-8585-44558cf7296a-0

trigger 9 job(s) of type blocking for the nightly release of OCP 4.17

  • periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-nightly-4.17-fips-payload-scan
  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.17-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.17-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.17-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/98cb6d30-cd49-11ef-8585-44558cf7296a-1

openshift-merge-bot bot and others added 2 commits January 8, 2025 23:45
…2-19-2024

SDN-5297,OCPBUGS-46527: DownStream Merge Sync from 4.19 [12-19-2024]
@jluhrsen
Contributor Author

jluhrsen commented Jan 9, 2025

/test e2e-metal-ipi-ovn-ipv6-techpreview
/test e2e-aws-ovn-hypershift-conformance-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-metal-ipi-ovn-dualstack-techpreview
/test e2e-vsphere-ovn-techpreview
/test e2e-aws-ovn-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal-ipi-ovn-techpreview
/test openshift-e2e-gcp-ovn-techpreview-upgrade
/payload 4.17 ci blocking
/payload 4.17 nightly blocking

Contributor

openshift-ci bot commented Jan 9, 2025

@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.17

  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.17-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b8a761a0-ce1f-11ef-8f0d-6709587af129-0

trigger 9 job(s) of type blocking for the nightly release of OCP 4.17

  • periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-nightly-4.17-fips-payload-scan
  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.17-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.17-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.17-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b8a761a0-ce1f-11ef-8f0d-6709587af129-1

@jluhrsen
Contributor Author

jluhrsen commented Jan 9, 2025

/test 4.17-upgrade-from-stable-4.16-images

@jluhrsen
Contributor Author

jluhrsen commented Jan 9, 2025

/test 4.17-upgrade-from-stable-4.16-images
/test e2e-metal-ipi-ovn-dualstack-local-gateway
/test e2e-metal-ipi-ovn-dualstack-techpreview
/test e2e-metal-ipi-ovn-ipv6
/test e2e-metal-ipi-ovn-ipv6-techpreview
/test e2e-metal-ipi-ovn-techpreview

@jluhrsen
Contributor Author

jluhrsen commented Jan 9, 2025

/test 4.17-upgrade-from-stable-4.16-images

Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.