Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ip full ip/v3/p9 #1

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Ip full ip/v3/p9 #1

wants to merge 9 commits into from

Conversation

numansiddique
Copy link
Owner

No description provided.

@@ -742,6 +862,10 @@ consider_port_binding_for_vif(const struct sbrec_port_binding *pb,
child = local_binding_create(pb->logical_port, lbinding->iface,
pb, binding_type);
local_binding_add_child(lbinding, child);
if (b_ctx_out->tracked_dp_bindings) {
tracked_binding_datapath_lport_add(
b_ctx_out->tracked_dp_bindings, pb, false);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not calling "local_datapath_update_lport_count()" here to increase the port count in local datapath?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

@@ -1154,9 +1182,13 @@ static bool
runtime_data_ovs_interface_handler(struct engine_node *node, void *data)
{
struct ed_type_runtime_data *rt_data = data;
struct ed_type_runtime_tracked_data *tracked_data = node->tracked_data;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call
en_runtime_clear_tracked_data(node->tracked_data);

here as well

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need. engine will take care of clearing the tracked data.

struct binding_ctx_in b_ctx_in;
struct binding_ctx_out b_ctx_out;
init_binding_ctx(node, rt_data, &b_ctx_in, &b_ctx_out);
tracked_data->tracked = true;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call
en_runtime_clear_tracked_data(node->tracked_data);

here as well

@@ -1189,6 +1221,10 @@ runtime_data_sb_port_binding_handler(struct engine_node *node, void *data)
return false;
}

struct ed_type_runtime_tracked_data *tracked_data = node->tracked_data;
tracked_data->tracked = true;
b_ctx_out.tracked_dp_bindings = &tracked_data->tracked_dp_bindings;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why u missed
b_ctx_out.local_lports_changed = &tracked_data->local_lports_changed;

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because it is meant to track ovs interfaces with iface-id set.
This is required to update the condition monitoring.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, in the below sequence ..
1)ovn-nbctl ls-add sw0
2)ovs-vsctl add-port br-int p1 --
set Interface p1 external_ids:iface-id=sw0-port1
3)ovn-nbctl lsp-add sw0 sw0-port1

We step 3 will add the local binding and claim the port.
Step 2 will run runtime_data_ovs_interface_handler, but local binding is not created as port doesn't exist in SBDB
Step 3 will result in runtime_data_sb_port_binding_handler, which adds local binding. So in this case, we need to update b_ctx_out.local_lports_changed as this flag is used in flow_output_runtime_data_handler

Am I missing anything here?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local_binding is created in step 2.

…and binding_ctx_out.

No functional changes are introduced in this patch.

Signed-off-by: Numan Siddique <numans@ovn.org>
… state.

This patch adds a new data structure - 'struct local_binding' which represents
a probable local port binding. A new object of this structure is creared for
each interface present in the integration bridge (br-int) with the
external_ids:iface-id set. This struct has 2 main fields
 - 'struct ovsrec_interface *' and 'struct sbrec_port_binding *'. These fields
are updated during port claim and release.

A shash of the local bindings is maintained with the 'iface-id' as the hash key
in the runtime data of the incremental processing engine.

This patch helps in the upcoming patch to add incremental processing support
for OVS interface and SB port binding changes.

Signed-off-by: Numan Siddique <numans@ovn.org>
numansiddique and others added 7 commits April 15, 2020 14:00
…data.

This patch handles SB port binding changes and OVS interface changes
incrementally in the runtime_data stage which otherwise would have
resulted in calls to binding_run().

Prior to this patch, port binding changes were handled differently.
The changes were only evaluated to see if they need full recompute
or they can be ignored.

With this patch, the runtime data is updated with the changes (both
SB port binding and OVS interface) and ports are claimed/released in
the handlers itself, avoiding the need to trigger recompute of runtime
data stage.

Signed-off-by: Numan Siddique <numans@ovn.org>
This patch adds partial support of incremental processing of datapath binding.
If a datapath is deleted, then a full recompute is triggered if that
datapath is present in the 'local_datapaths' hmap of runtime data.

Signed-off-by: Numan Siddique <numans@ovn.org>
… actions have changed.

If ofctrl_check_and_add_flow(F') is called where flow F' has match-actions (M, A2)
and if there already exists a flow F with match-actions (M, A1) in the desired flow
table, then this function  doesn't update the existing flow F with new actions
actions A2.

This patch is required for the upcoming patch in this series which
adds incremental processing for OVS interface in the flow output stage.
Since we will be not be clearing the flow output data if these changes
are handled incrementally, some of the existing flows will be updated
with new actions. One such example would be flows in physical
table OFTABLE_LOCAL_OUTPUT (table 33). And this patch is required to
update such flows. Otherwise we will have incomplete actions installed.

Signed-off-by: Numan Siddique <numans@ovn.org>
…put stage.

This patch handles ct zone changes and OVS interface changes incrementally
in the flow output stage.

Any changes to ct zone can be handled by running physical_run() instead of running
flow_output_run(). And any changes to OVS interfaces can be either handled
incrementally (for OVS interfaces representing VIFs) or just running
physical_run() (for tunnel and other types of interfaces).

To better handle this, a new engine node 'physical_flow_changes' is added which
handles changes to ct zone and OVS interfaces.

Signed-off-by: Numan Siddique <numans@ovn.org>
With this patch, an engine node which is an input to another engine node
can provide the tracked data. The parent of this engine node can handle
this tracked data incrementally.

At the end of the engine_run(), the tracked data of the nodes are
cleared.

Signed-off-by: Numan Siddique <numans@ovn.org>
In order to handle runtime data changes incrementally, the flow outut
runtime data handle should know the changed runtime data.
Runtime data now tracks the changed data for any OVS interface
and SB port binding changes. The tracked data contains a hmap
of tracked datapaths (which changed during runtime data processing.

The flow outout runtime_data handler in this patch doesn't do much
with the tracked data. It returns false if there is tracked data available
so that flow_output run is called. If no tracked data is available
then there is no need for flow computation and the handler returns true.

Next patch in the series processes the tracked data incrementally.

Co-Authored-by: Venkata Anil <anilvenkata@redhat.com>
Signed-off-by: Venkata Anil <anilvenkata@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
This patch processes the logical flows of tracked datapaths
and tracked logical ports. To handle the tracked logical port
changes, reference of logical flows to port bindings is maintained.

Co-Authored-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Venkata Anil <anilvenkata@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
numansiddique pushed a commit that referenced this pull request Oct 22, 2020
The 'nexthop' that passed to ic_route_hash() is not fully initialized in
get_nexthop_from_lport_addresses(). 'nexthop' has type of 'struct v46_ip' which
contains a union to share space for ipv4 and ipv6 address.  If only ipv4
initialized where is a plenty of uninitialized space that goes to
hash_bytes(nexthop, sizeof *nexthop, basis).

Impact: there are two places where this function is called.

1. In add_to_routes_ad(), the nexthop is initialized in parse_route() before
   calling get_nexthop_from_lport_addresses(), luckily.

2. In add_network_to_routes_ad(), we are unlucky.  When a directly connected
network of a router is found to be advertised, if the route already existed in
the global IC-SB, it may not be found due to the hash difference, and results
in the existing route being deleted and the same one recreated, unnecessarily.

This patch fixes the problem by initializing the struct to zero before setting
the fields.

From Ilya's report:
> Report from MemorySanitizer:
>
> ==3074629==WARNING: MemorySanitizer: use-of-uninitialized-value
>     #0 0x67177e in mhash_add__ ovs/./lib/hash.h:66:9
>     #1 0x671668 in mhash_add ovs/./lib/hash.h:78:12
>     #2 0x6701e9 in hash_bytes ovs/lib/hash.c:38:16
>     ovn-org#3 0x524b4a in add_network_to_routes_ad ic/ovn-ic.c:1095:5
>     ovn-org#4 0x51eea3 in route_run ic/ovn-ic.c:1424:21
>     ovn-org#5 0x51887b in main ic/ovn-ic.c:1674:17
>     ovn-org#6 0x7fd4ce7871a2 in __libc_start_main
>     ovn-org#7 0x49c90d in _start (ic/ovn-ic+0x49c90d)
>
>   Uninitialized value was created by an allocation of 'nexthop' in the
>   stack frame of function 'add_network_to_routes_ad'
>     #0 0x5245f0 in add_network_to_routes_ad ic/ovn-ic.c:1069

Reported-by: Ilya Maximets <i.maximets@ovn.org>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-October/376160.html
Fixes: 57b347c ("ovn-ic: Route advertisement.")
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Han Zhou <hzhou@ovn.org>
numansiddique pushed a commit that referenced this pull request Nov 24, 2020
'child_port_list' is an array of pointers that should be freed too.

  Direct leak of 30 byte(s) in 6 object(s) allocated from:
    #0 0x501fff in malloc (/tests/ovstest+0x501fff)
    #1 0x6227e6 in xmalloc /lib/util.c:138:15
    #2 0x6228b8 in xmemdup0 /lib/util.c:168:15
    ovn-org#3 0x8183d6 in parse_fwd_group_action /lib/actions.c:3374:30
    ovn-org#4 0x814b6e in parse_action /lib/actions.c:3610:9
    ovn-org#5 0x8139ef in parse_actions /lib/actions.c:3637:14
    ovn-org#6 0x8136a3 in ovnacts_parse /lib/actions.c:3672:9
    ovn-org#7 0x813c80 in ovnacts_parse_string /lib/actions.c:3699:5
    ovn-org#8 0x53a979 in test_parse_actions /tests/test-ovn.c:1372:21
    ovn-org#9 0x54e7a8 in ovs_cmdl_run_command__ /lib/command-line.c:247:17
    ovn-org#10 0x537c75 in test_ovn_main /tests/test-ovn.c:1630:5
    ovn-org#11 0x54e7a8 in ovs_cmdl_run_command__ /lib/command-line.c:247:17
    ovn-org#12 0x537359 in main /tests/ovstest.c:133:9
    ovn-org#13 0x7f06978f21a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

CC: Manoj Sharma <manoj.sharma@nutanix.com>
Fixes: edb2400 ("Forwarding group to load balance l2 traffic with liveness detection")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
numansiddique pushed a commit that referenced this pull request Nov 24, 2020
'dsts' should be freed in case of any error.

  Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x502378 in realloc (/tests/ovstest+0x502378)
    #1 0x622826 in xrealloc /lib/util.c:149:9
    #2 0x8194f4 in parse_select_action /lib/actions.c:1185:20
    ovn-org#3 0x814f49 in parse_set_action /lib/actions.c:3499:13
    ovn-org#4 0x814341 in parse_action /lib/actions.c:3554:9
    ovn-org#5 0x8139ef in parse_actions /lib/actions.c:3643:14
    ovn-org#6 0x8136a3 in ovnacts_parse /lib/actions.c:3678:9
    ovn-org#7 0x813c80 in ovnacts_parse_string /lib/actions.c:3705:5
    ovn-org#8 0x53a4e8 in test_parse_actions /tests/test-ovn.c:1321:17
    ovn-org#9 0x54e7a8 in ovs_cmdl_run_command__ /lib/command-line.c:247:17
    ovn-org#10 0x537c75 in test_ovn_main /tests/test-ovn.c:1630:5
    ovn-org#11 0x54e7a8 in ovs_cmdl_run_command__ /lib/command-line.c:247:17
    ovn-org#12 0x537359 in main /tests/ovstest.c:133:9
    ovn-org#13 0x7f9ce05ba1a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

CC: Han Zhou <hzhou@ovn.org>
Fixes: 85b3544 ("ovn-controller: A new action "select".")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
numansiddique pushed a commit that referenced this pull request Nov 24, 2020
'parse_ofp_meter_mod_str' allocates space for meter.bands that
should be freed.

  Direct leak of 448 byte(s) in 7 object(s) allocated from:
    #0 0x52100f in malloc (/controller/ovn-controller+0x52100f)
    #1 0x7523a6 in xmalloc /lib/util.c:138:15
    #2 0x6fd079 in ofpbuf_init /lib/ofpbuf.c:123:26
    ovn-org#3 0x6cba27 in parse_ofp_meter_mod_str /lib/ofp-meter.c:779:5
    ovn-org#4 0x5705b8 in add_meter_string /controller/ofctrl.c:1674:19
    ovn-org#5 0x56f736 in ofctrl_put /controller/ofctrl.c:2105:13
    ovn-org#6 0x59aebb in main /controller/ovn-controller.c:2627:25
    ovn-org#7 0x7f07873251a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

CC: Guoshuai Li <ligs@dtdream.com>
Fixes: c25094b ("ovn: OVN Support QoS meter")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
numansiddique pushed a commit that referenced this pull request Nov 24, 2020
'smap_clear()' doesn't free allocated memory, but 'smap_clone()'
re-initializes hash map clearing internal pointers and leaking
this memory.  'smap_destroy()' should be used instead.

Also, all the records and array of datapaths should be freed on
destruction of a cache entry.

  Direct leak of 16 byte(s) in 2 object(s) allocated from:
    #0 0x5211c7 in calloc (/controller/ovn-controller+0x5211c7)
    #1 0x752364 in xcalloc /lib/util.c:121:31
    #2 0x576e76 in sync_dns_cache /controller/pinctrl.c:2517:25
    ovn-org#3 0x5758fb in pinctrl_run /controller/pinctrl.c:3158:5
    ovn-org#4 0x59b06c in main /controller/ovn-controller.c:2642:25
    ovn-org#5 0x7fb570fc11a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

  Indirect leak of 26 byte(s) in 2 object(s) allocated from:
    #0 0x52100f in malloc (/controller/ovn-controller+0x52100f)
    #1 0x7523d6 in xmalloc /lib/util.c:138:15
    #2 0x7524a8 in xmemdup0 /lib/util.c:168:15
    ovn-org#3 0x73d8fc in smap_clone /lib/smap.c:314:45
    ovn-org#4 0x576e2f in sync_dns_cache /controller/pinctrl.c:2513:13
    ovn-org#5 0x5758fb in pinctrl_run /controller/pinctrl.c:3158:5
    ovn-org#6 0x59b06c in main /controller/ovn-controller.c:2642:25
    ovn-org#7 0x7fb570fc11a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

Fixes: 6b72068 ("ovn-controller: Add a new thread in pinctrl module to handle packet-ins.")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
numansiddique pushed a commit that referenced this pull request Nov 24, 2020
shash contains pointers to the data that should be freed.

  Direct leak of 32 byte(s) in 2 object(s) allocated from:
    #0 0x52100f in malloc (/controller/ovn-controller+0x52100f)
    #1 0x752436 in xmalloc /lib/util.c:138:15
    #2 0x5a2f0b in add_pending_ct_zone_entry /controller/ovn-controller.c:548:45
    ovn-org#3 0x5a2d1d in update_ct_zones /controller/ovn-controller.c:668:9
    ovn-org#4 0x59d8c6 in en_ct_zones_run /controller/ovn-controller.c:1495:5
    ovn-org#5 0x5dade4 in engine_run /lib/inc-proc-eng.c:377:9
    ovn-org#6 0x59adf4 in main /controller/ovn-controller.c
    ovn-org#7 0x7f0799ef41a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

CC: xu rong <xu.rong@zte.com.cn>
Fixes: 252e164 ("ovn-controller: pending_ct_zones should be destroyed")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
@numansiddique numansiddique force-pushed the master branch 6 times, most recently from 38a5351 to 5be9e1e Compare November 24, 2020 13:20
numansiddique pushed a commit that referenced this pull request Nov 24, 2020
When a port binding of type "l3gateway" is claimed its remote peer
port_binding is also stored in local_datapath.peer_ports[].remote.

If the remote peer port_binding is deleted first (i.e., before the local
"l3gateway" one) then we need to remove the complete
local_datapath.peer_ports[] entry in order to avoid ending up using
dangling pointers to already freed port bindings.

Also, properly reset local_datapath->has_local_l3gateway in
remove_pb_from_local_datapath().

Ilya reported this issue found by AddressSanitizer during his testing:

==1816017==ERROR: AddressSanitizer: heap-use-after-free on address 0x6140000cb170 at pc 0x0000005ab574 bp 0x7fff68925a30 sp 0x7fff68925a28
READ of size 8 at 0x6140000cb170 thread T0
    #0 0x5ab573 in put_replace_chassis_mac_flows git/ovn/controller/physical.c:550:9
    #1 0x5a65eb in consider_port_binding git/ovn/controller/physical.c:1168:13
    #2 0x5a8764 in physical_run git/ovn/controller/physical.c:1607:9
    ovn-org#3 0x5a0064 in flow_output_physical_flow_changes_handler git/ovn/controller/ovn-controller.c:2127:9
    ovn-org#4 0x5db423 in engine_compute git/ovn/lib/inc-proc-eng.c:306:18
    ovn-org#5 0x5dae1f in engine_run_node git/ovn/lib/inc-proc-eng.c:352:14
    ovn-org#6 0x5dac74 in engine_run git/ovn/lib/inc-proc-eng.c:377:9
    ovn-org#7 0x59ad64 in main git/ovn/controller/ovn-controller.c
    ovn-org#8 0x7f39fa6491a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
    ovn-org#9 0x480b2d in _start (git/ovn/controller/ovn-controller+0x480b2d)

0x6140000cb170 is located 304 bytes inside of 408-byte region [0x6140000cb040,0x6140000cb1d8)
freed by thread T0 here:
    #0 0x520d07 in free (git/ovn/controller/ovn-controller+0x520d07)
    #1 0x712de7 in ovsdb_idl_db_track_clear git/ovs/lib/ovsdb-idl.c:1984:21
    #2 0x59b5cd in main git/ovn/controller/ovn-controller.c:2762:9
    ovn-org#3 0x7f39fa6491a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)

Reported-by: Ilya Maximets <i.maximets@ovn.org>
Fixes: 354bdba ("ovn-controller: I-P for SB port binding and OVS interface in runtime_data.")
Tested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
@numansiddique numansiddique force-pushed the master branch 4 times, most recently from 176890d to 031af9a Compare November 24, 2020 15:53
@numansiddique numansiddique force-pushed the main branch 5 times, most recently from 397915b to 22bacab Compare December 1, 2023 23:55
@numansiddique numansiddique force-pushed the main branch 3 times, most recently from 1a799cf to 9a0f307 Compare January 10, 2024 16:39
numansiddique pushed a commit that referenced this pull request Apr 30, 2024
In some scenarios, op->list is not attached anywhere, which makes
attempts to detach it trigger ubsan failure.

ovn/ovs/include/openvswitch/list.h:252:17: runtime error: member access
within null pointer of type 'struct ovs_list'

 #0 0x?? in ovs_list_remove ovn/ovs/include/openvswitch/list.h:252:17
 #1 0x?? in ovn_port_allocate_key ovn/northd/northd.c:4021:13
 #2 0x?? in ls_port_init ovn/northd/northd.c:4321:10
 ovn-org#3 0x?? in ls_port_create ovn/northd/northd.c:4342:10
 ovn-org#4 0x?? in ls_handle_lsp_changes ovn/northd/northd.c:4511:18
 ovn-org#5 0x?? in northd_handle_ls_changes ovn/northd/northd.c:4655:14
 ovn-org#6 0x?? in northd_nb_logical_switch_handler ovn/northd/en-northd.c:150:

This patch makes northd use op->list only as a temporary means for
build_ports logic to track ports that are persisted in both, nb, or sb
only. Now build_ports will always detach ops once done.

Now that op->list is never left attached to a list, we can remove
ovs_list_remove calls for it elsewhere, including where op was never
attached in the first place.

Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
Acked-by: Mark Michelson <mmichels@redhat.com>
numansiddique pushed a commit that referenced this pull request May 9, 2024
Issue:
Upon updating the network_name option of localnet port from one physical
bridge to another, ovn-controller fails to cleanup the peer localnet
patch port from the old bridge and ends up creating a duplicate peer
localnet patch port which fails in the following ovsdb transaction:

```
"Transaction causes multiple rows in \"Interface\" table to have
identical values
(patch-localnet_a7d47490-8a90-4c4d-8266-2915ad07c185-to-br-int)
```

Current workflow:
1. Keep a set of all existing localnet ports on br-int. Let us call this
   set: existing_ports.
2. For each localnet port in SB:
   2.1 Create a patch port on br-int. (if it already exists on br-int,
       do not create but remove the entry from exisitng_ports set).
   2.2 Create a peer patch port on br-phys-x. (if it already exists on the
       bridge, do not create but remove the entry from exisitng_ports set).
3. Finally, cleanup all the ports and its peers from exisiting_ports.

With the above workflow, upon network_name change of localnet LSP, since
ports on br-int does not change, only peer port needs to be move from
br-phys-x to br-phys-y, the localnet port is removed from
exisiting_ports in #2.1 and its peer never becomes eligible for cleanup.

Fix:
Include both patch port on br-int and its peer port in the
exisiting_ports set as part of #1.
This make sures that the peer port is only removed from existing_ports
only if it is already present on the correct bridge. (#2.1/#2.2)
Otherwise, during the cleanup in ovn-org#3 it will be considered.

Fixes: b600316 ("Don't delete patch ports that don't belong to br-int")
Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
numansiddique pushed a commit that referenced this pull request May 9, 2024
Issue:
Upon updating the network_name option of localnet port from one physical
bridge to another, ovn-controller fails to cleanup the peer localnet
patch port from the old bridge and ends up creating a duplicate peer
localnet patch port which fails in the following ovsdb transaction:

```
"Transaction causes multiple rows in \"Interface\" table to have
identical values
(patch-localnet_a7d47490-8a90-4c4d-8266-2915ad07c185-to-br-int)
```

Current workflow:
1. Keep a set of all existing localnet ports on br-int. Let us call this
   set: existing_ports.
2. For each localnet port in SB:
   2.1 Create a patch port on br-int. (if it already exists on br-int,
       do not create but remove the entry from exisitng_ports set).
   2.2 Create a peer patch port on br-phys-x. (if it already exists on the
       bridge, do not create but remove the entry from exisitng_ports set).
3. Finally, cleanup all the ports and its peers from exisiting_ports.

With the above workflow, upon network_name change of localnet LSP, since
ports on br-int does not change, only peer port needs to be move from
br-phys-x to br-phys-y, the localnet port is removed from
exisiting_ports in #2.1 and its peer never becomes eligible for cleanup.

Fix:
Include both patch port on br-int and its peer port in the
exisiting_ports set as part of #1.
This make sures that the peer port is only removed from existing_ports
only if it is already present on the correct bridge. (#2.1/#2.2)
Otherwise, during the cleanup in ovn-org#3 it will be considered.

Fixes: b600316 ("Don't delete patch ports that don't belong to br-int")
Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
(cherry picked from commit 2609cd9)
numansiddique pushed a commit that referenced this pull request May 9, 2024
Issue:
Upon updating the network_name option of localnet port from one physical
bridge to another, ovn-controller fails to cleanup the peer localnet
patch port from the old bridge and ends up creating a duplicate peer
localnet patch port which fails in the following ovsdb transaction:

```
"Transaction causes multiple rows in \"Interface\" table to have
identical values
(patch-localnet_a7d47490-8a90-4c4d-8266-2915ad07c185-to-br-int)
```

Current workflow:
1. Keep a set of all existing localnet ports on br-int. Let us call this
   set: existing_ports.
2. For each localnet port in SB:
   2.1 Create a patch port on br-int. (if it already exists on br-int,
       do not create but remove the entry from exisitng_ports set).
   2.2 Create a peer patch port on br-phys-x. (if it already exists on the
       bridge, do not create but remove the entry from exisitng_ports set).
3. Finally, cleanup all the ports and its peers from exisiting_ports.

With the above workflow, upon network_name change of localnet LSP, since
ports on br-int does not change, only peer port needs to be move from
br-phys-x to br-phys-y, the localnet port is removed from
exisiting_ports in #2.1 and its peer never becomes eligible for cleanup.

Fix:
Include both patch port on br-int and its peer port in the
exisiting_ports set as part of #1.
This make sures that the peer port is only removed from existing_ports
only if it is already present on the correct bridge. (#2.1/#2.2)
Otherwise, during the cleanup in ovn-org#3 it will be considered.

Fixes: b600316 ("Don't delete patch ports that don't belong to br-int")
Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
(cherry picked from commit 2609cd9)
numansiddique pushed a commit that referenced this pull request May 9, 2024
Issue:
Upon updating the network_name option of localnet port from one physical
bridge to another, ovn-controller fails to cleanup the peer localnet
patch port from the old bridge and ends up creating a duplicate peer
localnet patch port which fails in the following ovsdb transaction:

```
"Transaction causes multiple rows in \"Interface\" table to have
identical values
(patch-localnet_a7d47490-8a90-4c4d-8266-2915ad07c185-to-br-int)
```

Current workflow:
1. Keep a set of all existing localnet ports on br-int. Let us call this
   set: existing_ports.
2. For each localnet port in SB:
   2.1 Create a patch port on br-int. (if it already exists on br-int,
       do not create but remove the entry from exisitng_ports set).
   2.2 Create a peer patch port on br-phys-x. (if it already exists on the
       bridge, do not create but remove the entry from exisitng_ports set).
3. Finally, cleanup all the ports and its peers from exisiting_ports.

With the above workflow, upon network_name change of localnet LSP, since
ports on br-int does not change, only peer port needs to be move from
br-phys-x to br-phys-y, the localnet port is removed from
exisiting_ports in #2.1 and its peer never becomes eligible for cleanup.

Fix:
Include both patch port on br-int and its peer port in the
exisiting_ports set as part of #1.
This make sures that the peer port is only removed from existing_ports
only if it is already present on the correct bridge. (#2.1/#2.2)
Otherwise, during the cleanup in ovn-org#3 it will be considered.

Fixes: b600316 ("Don't delete patch ports that don't belong to br-int")
Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
(cherry picked from commit 2609cd9)
numansiddique pushed a commit that referenced this pull request May 9, 2024
Issue:
Upon updating the network_name option of localnet port from one physical
bridge to another, ovn-controller fails to cleanup the peer localnet
patch port from the old bridge and ends up creating a duplicate peer
localnet patch port which fails in the following ovsdb transaction:

```
"Transaction causes multiple rows in \"Interface\" table to have
identical values
(patch-localnet_a7d47490-8a90-4c4d-8266-2915ad07c185-to-br-int)
```

Current workflow:
1. Keep a set of all existing localnet ports on br-int. Let us call this
   set: existing_ports.
2. For each localnet port in SB:
   2.1 Create a patch port on br-int. (if it already exists on br-int,
       do not create but remove the entry from exisitng_ports set).
   2.2 Create a peer patch port on br-phys-x. (if it already exists on the
       bridge, do not create but remove the entry from exisitng_ports set).
3. Finally, cleanup all the ports and its peers from exisiting_ports.

With the above workflow, upon network_name change of localnet LSP, since
ports on br-int does not change, only peer port needs to be move from
br-phys-x to br-phys-y, the localnet port is removed from
exisiting_ports in #2.1 and its peer never becomes eligible for cleanup.

Fix:
Include both patch port on br-int and its peer port in the
exisiting_ports set as part of #1.
This make sures that the peer port is only removed from existing_ports
only if it is already present on the correct bridge. (#2.1/#2.2)
Otherwise, during the cleanup in ovn-org#3 it will be considered.

Fixes: b600316 ("Don't delete patch ports that don't belong to br-int")
Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
(cherry picked from commit 2609cd9)
numansiddique pushed a commit that referenced this pull request May 9, 2024
Issue:
Upon updating the network_name option of localnet port from one physical
bridge to another, ovn-controller fails to cleanup the peer localnet
patch port from the old bridge and ends up creating a duplicate peer
localnet patch port which fails in the following ovsdb transaction:

```
"Transaction causes multiple rows in \"Interface\" table to have
identical values
(patch-localnet_a7d47490-8a90-4c4d-8266-2915ad07c185-to-br-int)
```

Current workflow:
1. Keep a set of all existing localnet ports on br-int. Let us call this
   set: existing_ports.
2. For each localnet port in SB:
   2.1 Create a patch port on br-int. (if it already exists on br-int,
       do not create but remove the entry from exisitng_ports set).
   2.2 Create a peer patch port on br-phys-x. (if it already exists on the
       bridge, do not create but remove the entry from exisitng_ports set).
3. Finally, cleanup all the ports and its peers from exisiting_ports.

With the above workflow, upon network_name change of localnet LSP, since
ports on br-int does not change, only peer port needs to be move from
br-phys-x to br-phys-y, the localnet port is removed from
exisiting_ports in #2.1 and its peer never becomes eligible for cleanup.

Fix:
Include both patch port on br-int and its peer port in the
exisiting_ports set as part of #1.
This make sures that the peer port is only removed from existing_ports
only if it is already present on the correct bridge. (#2.1/#2.2)
Otherwise, during the cleanup in ovn-org#3 it will be considered.

Fixes: b600316 ("Don't delete patch ports that don't belong to br-int")
Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
(cherry picked from commit 2609cd9)
numansiddique pushed a commit that referenced this pull request May 15, 2024
Issue:
Upon updating the network_name option of localnet port from one physical
bridge to another, ovn-controller fails to cleanup the peer localnet
patch port from the old bridge and ends up creating a duplicate peer
localnet patch port which fails in the following ovsdb transaction:

```
"Transaction causes multiple rows in \"Interface\" table to have
identical values
(patch-localnet_a7d47490-8a90-4c4d-8266-2915ad07c185-to-br-int)
```

Current workflow:
1. Keep a set of all existing localnet ports on br-int. Let us call this
   set: existing_ports.
2. For each localnet port in SB:
   2.1 Create a patch port on br-int. (if it already exists on br-int,
       do not create but remove the entry from exisitng_ports set).
   2.2 Create a peer patch port on br-phys-x. (if it already exists on the
       bridge, do not create but remove the entry from exisitng_ports set).
3. Finally, cleanup all the ports and its peers from exisiting_ports.

With the above workflow, upon network_name change of localnet LSP, since
ports on br-int does not change, only peer port needs to be move from
br-phys-x to br-phys-y, the localnet port is removed from
exisiting_ports in #2.1 and its peer never becomes eligible for cleanup.

Fix:
Include both patch port on br-int and its peer port in the
exisiting_ports set as part of #1.
This make sures that the peer port is only removed from existing_ports
only if it is already present on the correct bridge. (#2.1/#2.2)
Otherwise, during the cleanup in ovn-org#3 it will be considered.

Fixes: b600316 ("Don't delete patch ports that don't belong to br-int")
Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
(cherry picked from commit 2609cd9)
numansiddique pushed a commit that referenced this pull request May 15, 2024
Issue:
Upon updating the network_name option of localnet port from one physical
bridge to another, ovn-controller fails to cleanup the peer localnet
patch port from the old bridge and ends up creating a duplicate peer
localnet patch port which fails in the following ovsdb transaction:

```
"Transaction causes multiple rows in \"Interface\" table to have
identical values
(patch-localnet_a7d47490-8a90-4c4d-8266-2915ad07c185-to-br-int)
```

Current workflow:
1. Keep a set of all existing localnet ports on br-int. Let us call this
   set: existing_ports.
2. For each localnet port in SB:
   2.1 Create a patch port on br-int. (if it already exists on br-int,
       do not create but remove the entry from exisitng_ports set).
   2.2 Create a peer patch port on br-phys-x. (if it already exists on the
       bridge, do not create but remove the entry from exisitng_ports set).
3. Finally, cleanup all the ports and its peers from exisiting_ports.

With the above workflow, upon network_name change of localnet LSP, since
ports on br-int does not change, only peer port needs to be move from
br-phys-x to br-phys-y, the localnet port is removed from
exisiting_ports in #2.1 and its peer never becomes eligible for cleanup.

Fix:
Include both patch port on br-int and its peer port in the
exisiting_ports set as part of #1.
This make sures that the peer port is only removed from existing_ports
only if it is already present on the correct bridge. (#2.1/#2.2)
Otherwise, during the cleanup in ovn-org#3 it will be considered.

Fixes: b600316 ("Don't delete patch ports that don't belong to br-int")
Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
(cherry picked from commit 2609cd9)
@numansiddique numansiddique force-pushed the main branch 5 times, most recently from d95180c to 0d7e867 Compare June 6, 2024 21:12
@numansiddique numansiddique force-pushed the main branch 3 times, most recently from e4222db to cf55c6c Compare July 8, 2024 21:21
numansiddique added a commit that referenced this pull request Sep 19, 2024
Commit 8d13579 creates a chassisredirect port for a logical switch
port (of type 'router') in certain scenarios and 'op->nbsp' can be NULL.
The following crash is reported by Sanitizer.

==84927==ERROR: AddressSanitizer: SEGV on unknown address
==84927==The signal is caused by a READ memory access.
==84927==Hint: address points to the zero page.
    #0 0x57ab3854f04a in hmap_first_with_hash ovn/ovs/./include/openvswitch/hmap.h:360:38
    #1 0x57ab3854fedf in smap_find__ ovn/ovs/lib/smap.c:421:5
    #2 0x57ab3854f7b8 in smap_get_node ovn/ovs/lib/smap.c:217:12
    ovn-org#3 0x57ab3854f74f in smap_get_def ovn/ovs/lib/smap.c:208:30
    ovn-org#4 0x57ab3854f726 in smap_get ovn/ovs/lib/smap.c:200:12
    ovn-org#5 0x57ab3854f862 in smap_get_int ovn/ovs/lib/smap.c:240:25
    ovn-org#6 0x57ab383222eb in ovn_port_assign_requested_tnl_id ovn/northd/northd.c:4372:27
    ovn-org#7 0x57ab383072fa in build_ports ovn/northd/northd.c:4454:9
    ovn-org#8 0x57ab38301457 in ovnnb_db_run ovn/northd/northd.c:18023:5
    ovn-org#9 0x57ab3841d99e in en_northd_run ovn/northd/en-northd.c:137:5
    ovn-org#10 0x57ab384599b2 in engine_recompute ovn/lib/inc-proc-eng.c:411:5
    ovn-org#11 0x57ab38459d6e in engine_run_node ovn/lib/inc-proc-eng.c:473:9
    ovn-org#12 0x57ab38459ec3 in engine_run ovn/lib/inc-proc-eng.c:524:9
    ovn-org#13 0x57ab38430c5d in inc_proc_northd_run ovn/northd/inc-proc-northd.c:420:5
    ovn-org#14 0x57ab3841bb2f in main ovn/northd/ovn-northd.c:970:32

This patch fixes this issue.  It also corrects some typo introduced by
the commit - changes the reference from "chassisresident" to
"chassisredirect".

Fixes: 8d13579 ("Add support for centralize routing for distributed gw ports.")
Reported-by: Felix Huettner <felix.huettner@mail.schwarz>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-August/416264.html
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
numansiddique added a commit that referenced this pull request Sep 20, 2024
Commit 8d13579 creates a chassisredirect port for a logical switch
port (of type 'router') in certain scenarios and 'op->nbsp' can be NULL.
The following crash is reported by Sanitizer.

==84927==ERROR: AddressSanitizer: SEGV on unknown address
==84927==The signal is caused by a READ memory access.
==84927==Hint: address points to the zero page.
    #0 0x57ab3854f04a in hmap_first_with_hash ovn/ovs/./include/openvswitch/hmap.h:360:38
    #1 0x57ab3854fedf in smap_find__ ovn/ovs/lib/smap.c:421:5
    #2 0x57ab3854f7b8 in smap_get_node ovn/ovs/lib/smap.c:217:12
    ovn-org#3 0x57ab3854f74f in smap_get_def ovn/ovs/lib/smap.c:208:30
    ovn-org#4 0x57ab3854f726 in smap_get ovn/ovs/lib/smap.c:200:12
    ovn-org#5 0x57ab3854f862 in smap_get_int ovn/ovs/lib/smap.c:240:25
    ovn-org#6 0x57ab383222eb in ovn_port_assign_requested_tnl_id ovn/northd/northd.c:4372:27
    ovn-org#7 0x57ab383072fa in build_ports ovn/northd/northd.c:4454:9
    ovn-org#8 0x57ab38301457 in ovnnb_db_run ovn/northd/northd.c:18023:5
    ovn-org#9 0x57ab3841d99e in en_northd_run ovn/northd/en-northd.c:137:5
    ovn-org#10 0x57ab384599b2 in engine_recompute ovn/lib/inc-proc-eng.c:411:5
    ovn-org#11 0x57ab38459d6e in engine_run_node ovn/lib/inc-proc-eng.c:473:9
    ovn-org#12 0x57ab38459ec3 in engine_run ovn/lib/inc-proc-eng.c:524:9
    ovn-org#13 0x57ab38430c5d in inc_proc_northd_run ovn/northd/inc-proc-northd.c:420:5
    ovn-org#14 0x57ab3841bb2f in main ovn/northd/ovn-northd.c:970:32

This patch fixes this issue.  It also corrects some typo introduced by
the commit - changes the reference from "chassisresident" to
"chassisredirect".

Fixes: 8d13579 ("Add support for centralize routing for distributed gw ports.")
Reported-by: Felix Huettner <felix.huettner@mail.schwarz>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-August/416264.html
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
(cherry picked from commit 9fbda4a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants