Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM] ANP PRE MERGE TESTING on UDNs #2408

Open
wants to merge 57 commits into
base: master
Choose a base branch
from

Conversation

tssurya
Copy link
Contributor

@tssurya tssurya commented Jan 6, 2025

πŸ“‘ Description

Fixes #

Additional Information for reviewers

βœ… Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

jcaamano and others added 30 commits December 13, 2024 17:17
Hoping to bring improvement to the naming and interfacing of the nad
controller package to reduce confusion. No functional changes but,
obviously, broad code impact. On to the details...

Rename package network-attach-def-controller to networkmanager. The
controller struct itself has been unexported and instead two interfaces
are provided: Controller to start and stop, and NetworkManager to
provide (for now) read-only information about networks. A default
implementation is provided that assumes the default network is the only
ever existing network, to be used when multi-network capabilities are
disabled or for testing. Other unexported structs have also been renamed
accordingly as well as other types that followed the former naming scheme.

Rename package network-controller-manager to controllemanager,
understood as a package that manages other controllers that can be of any
nature: network specific, network aware or network unaware. The
interfaces that controller managers need to implement to interface with
network manager have also been renamed accordingly as well as other
types that followed the former naming scheme.

For reasons beyond my comprehension, it befell to me coming accross, and
fixing, a number of unrelated issues, like flaking unit tests, missing
nftables stubbing to be able to run the tests on the testing container,
etc... that given the broad impact of this commit, I decided to fix
here instead of bothering with new commits. While I was at it, fixed all
go vet warnings on impacted files because it bothers me to have my IDE
painted yellow. We should add that to our linter. Moving on...

Signed-off-by: Jaime CaamaΓ±o Ruiz <jcaamano@redhat.com>
Consolidate all getters in NetInfo becoming the single interface to
obtain network information intended to be used broadly as read-only.

Consolidate all setters in new MutableNetInfo interface. A
MutableNetInfo can only be built from a copy of NetInfo. Add setters for
route advertisements related information. This is intended to be used by
a network manager to aggregate consolidated information about the
network from multiple sources that can change over time.

Add a ReconcilableNetInfo interface acting as a receiver by which
changes done in a MutableNetInfo can be incorporated into an existing
NetInfo. A ReconcilableNetInfo can only be built from a copy of NetInfo.
It is  intended to be used by top level network controllers to reconcile
changes to network information when instructed by a network manager.

New free methods IsNetworkCompatible, DoesNetworkNeedReconciliation, and
Reconcile facilitate the coordination of the reconciliation.

The overall idea is to have a tighter control of NetInfo changes and
making sure that controllers become aware of those changes in a way that
allows the controllers to reconcile them.

Signed-off-by: Jaime CaamaΓ±o Ruiz <jcaamano@redhat.com>
Add the ability for network controllers to reconcile some network
information changes. Currently just changes of the VRFs the network is
leaking/advertising to. Support for reconciling NAD changes is not
included in this commit.

Currently reconciles:
- for zone network controllers to configure or not the pod IP to node IP
  SNAT on the GR for a nodes local to the zone
- for node network controllers to configure or not br-ex flows to
  redirect pod IP ingress traffic to the OVN network (except ingress to
  the management ip address). Only done for the default network, UDNs
  will be handled on a later commit.

This should be enough to provide direct ingress capabilities for the
default network in SGW mode.

Note that non-primary network controllers don't reconcile anything as
route advertising is not supported on them. Also cluster manager network
controllers don't reconcile much as they don't have the need.

Signed-off-by: Jaime CaamaΓ±o Ruiz <jcaamano@redhat.com>
Network manager will, upon ensuring a network, fetch relevant
information from RouteAdvertisements (such as VRFs being advertised on
and selected nodes to advertise EIPs) applying to the network and set
it on the corresponding NetInfo used to create the network controller,
or triggering a reconciliation if it was already running.

This relies on an annotation that will be set by cluster manager on NADs
pointing to the RouteAdvertisements that apply to the network (future
commit)

This will also be done for the default network. This commit assumes
cluster manager will create a dummy NAD for the default network in ovn-k
namespace were the annotaiton will be set. The NAD controller will
process the NAD and ensure the network on the network controller, just
like with any other network. The network controller gains access to the
default network controller, however, it treats that controller
differently: it's not started or stopped, obviously, just reconciled.

Signed-off-by: Jaime CaamaΓ±o Ruiz <jcaamano@redhat.com>
Fix kind export logs to address the following warning message:

  $ kind export logs --name ${KIND_CLUSTER_NAME} --loglevel=debug /tmp/kind/logs
  WARNING: --loglevel is deprecated, please switch to -v and -q!
  please adjust the command so it is equivalent and works on latest kind release

Signed-off-by: Flavio Fernandes <ffernandes@nvidia.com>
This commit integrates ocp-traffic-flow-tests into the existing
GitHub E2E workflows. The control tests can now be executed as
a standalone lane.

Currently, the traffic flow tests use a specific commit from the
repository https://github.com/wizhaoredhat/ocp-traffic-flow-tests.git.
This is a temporary solution until the repository becomes part
of the ovn-kubernetes GitHub organization.

Next steps to look into, after this commit / pr:
- External and secondary network tests
- Add bandwidth thresholds that would be reasonable for Github
- Move repo to ovn-kubernetes org

Fixes: ovn-kubernetes/ovn-kubernetes#4756
Signed-off-by: Flavio Fernandes <ffernandes@nvidia.com>
Some time echoserver in VM could take more than 20s to start up, so increase
timeout to 60 seconds to avoid flakiness.

Signed-off-by: Lei Huang <leih@nvidia.com>
Direct ingress for default network in SGW mode with route advertisements
'ubuntu-latest' now references ubuntu 24.04.
Previously, it referenced 22.04.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Statically set GH runner image to ubuntu 22.04 for tests
Signed-off-by: Jaime CaamaΓ±o Ruiz <jcaamano@redhat.com>
This reverts commit bae7288.

Signed-off-by: Jaime CaamaΓ±o Ruiz <jcaamano@redhat.com>
Looks like this path was missed and causes egress IP with UDNs on local
gateway mode to not function correctly. Reply traffic was being sent to
default cluster network patch port rather than the correct UDN patch
port.

Signed-off-by: Tim Rozet <trozet@redhat.com>
Previous to this change in dualstack, we are not
attempting to match the mgmt port IP family with
the EgressIP family.

Single stack clusters performed as expected.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Follow up work is required to run IPv6 CDN tests
in dualstack env because currently only IPv4 tests
are executed for dual stack clusters.
Upstream CI executes IPv6 tests for single stack IPv6
clusters only.

Also, skip test 'replies to egress IP packets that require fragmentation'
because it is broken. Tracked by downstream bug OCPBUGS-46476.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Fix for unexistent package msbuild in latest ubuntu images
Add EgressIP E2Es to Net Seg lanes
Latest metallb repo support dualstack at dev-env let's pin to it.

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Add testing for:
- External clients to LB, NodePort
- Podify clients to LB

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
…-e2e

udn, e2e: Add external client to nodeport and loadblanacer services tests
Signed-off-by: Jaime CaamaΓ±o Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime CaamaΓ±o Ruiz <jcaamano@redhat.com>
ovs-doca requires mtu_request to be set on ports metadata for fragmented packets

Signed-off-by: Alin Gabriel Serdean <aserdean@nvidia.com>
PR #4907 used ginkgo v1 to utilize DescribeTable
but we already moved away from that as seen in
PR #4749

Also, executed 'go mod tidy && go mod vendor'
as go mod file was stale.

Follow up work is required to detect this scenario
in CI.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Since it will be part of overall network reconciliation when its
advertising configuration changes.

Signed-off-by: Jaime CaamaΓ±o Ruiz <jcaamano@redhat.com>
Used on secondary nic flows, not needed if pod network is advertised as
traffic is supposed to reach the node hosting the pod directly.

However, we add an SNAT rule for local IPT traffic flows. Otherwise the
node masquerade address is used as it is set as the source address in
the services route. Note that in this case the linux source selection
algorithm just looks at the default routing table and does not look at
non default, at least when the ip rule is based on packet marks set
through conntrack.

Signed-off-by: Jaime CaamaΓ±o Ruiz <jcaamano@redhat.com>
kyrtapz and others added 14 commits December 18, 2024 16:44
A service is added to the cache in AddEndpointSlice
before the rules. If addServiceRules fails the newly
added cache entry should be removed to allow for a proper
retry.

Signed-off-by: Patryk Diak <pdiak@redhat.com>
Because on subsequent runs, another host net test pod may
attempt to bind to the same port.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Handle host-network pods as default network.
Don't return per-pod errors on startup.
Remove nadController from UDNHostIsolationManager as we don't use it
anymore to find pod's UDN based on NADs that exist in the namespace.

Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
SDN-5139: Direct ingress for default network in LGW mode with route advertisements
Ensure only ginkgo v2 for DescribeTable support
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Remove service from cache on add failure
…face

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
This code isnt being used anymore. We dont expect users
to upgrade directly from code which contained the legacy LRPs,
therefore its safe to remove.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
L2 UDN: EgressIP hosted by primary interface (`breth0`)
If EncapIP is configured, it means it is different from the node's
primary address. Do not update EncapIP when node's primary address
changes.

Signed-off-by: Yun Zhou <yunz@nvidia.com>
@openshift-ci openshift-ci bot requested review from abhat and JacobTanenbaum January 6, 2025 18:04
Copy link
Contributor

openshift-ci bot commented Jan 6, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 6, 2025
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
@tssurya tssurya force-pushed the anp-pre-merge-testing-DNM branch from fd0505b to 54874c4 Compare January 6, 2025 19:13
Copy link
Contributor

openshift-ci bot commented Jan 7, 2025

@tssurya: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-vsphere-ovn-techpreview 54874c4 link false /test e2e-vsphere-ovn-techpreview
ci/prow/e2e-metal-ipi-ovn-ipv6-techpreview 54874c4 link false /test e2e-metal-ipi-ovn-ipv6-techpreview
ci/prow/e2e-aws-ovn-single-node-techpreview 54874c4 link false /test e2e-aws-ovn-single-node-techpreview
ci/prow/e2e-metal-ipi-ovn-techpreview 54874c4 link false /test e2e-metal-ipi-ovn-techpreview
ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway-techpreview 54874c4 link false /test e2e-metal-ipi-ovn-dualstack-local-gateway-techpreview
ci/prow/okd-scos-e2e-aws-ovn 54874c4 link false /test okd-scos-e2e-aws-ovn
ci/prow/security 54874c4 link false /test security
ci/prow/e2e-azure-ovn-techpreview 54874c4 link false /test e2e-azure-ovn-techpreview
ci/prow/e2e-azure-ovn 54874c4 link false /test e2e-azure-ovn
ci/prow/4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-rt-upgrade 54874c4 link true /test 4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-rt-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 9, 2025
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.