Hi,
While creating OVN interconnect logical flows (transit switch, LRPs and LSPs) for a Logical Router in an AZ, the ovn-controller on every chassis spikes to 100% CPU.
The spike lasts about a minute and a half and returns to normal once the logical flows are installed.
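For context, the change that triggers this is essentially the standard OVN IC step of attaching a router to a transit switch. A minimal sketch (ts1/lr0 and the port names are placeholders; the addresses match the LRP shown in the logs below):

$ ovn-ic-nbctl ts-add ts1
$ ovn-nbctl lrp-add lr0 lrp-lr0-ts1 16:00:01:75:f8:87 169.254.10.1/24 fe80:10::1/64
$ ovn-nbctl lsp-add ts1 lsp-ts1-lr0 \
    -- lsp-set-type lsp-ts1-lr0 router \
    -- lsp-set-addresses lsp-ts1-lr0 router \
    -- lsp-set-options lsp-ts1-lr0 router-port=lrp-lr0-ts1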
Since this behavior only manifests while creating OVN interconnect logical flows, we enabled debug logs; they show ovn-controller receiving the following change from the SB:
2024-12-26T17:38:15.023Z|10562646|inc_proc_eng|DBG|lib/inc-proc-eng.c:379: node: SB_encap, old_state Unchanged, new_state Stale
2024-12-26T17:38:15.023Z|10562647|inc_proc_eng|DBG|lib/inc-proc-eng.c:379: node: mff_ovn_geneve, old_state Unchanged, new_state Stale
2024-12-26T17:38:15.023Z|10562648|inc_proc_eng|DBG|lib/inc-proc-eng.c:379: node: physical_flow_output, old_state Unchanged, new_state Stale
2024-12-26T17:38:15.023Z|10562649|inc_proc_eng|DBG|lib/inc-proc-eng.c:379: node: flow_output, old_state Unchanged, new_state Stale
2024-12-26T17:38:15.032Z|10562650|jsonrpc|DBG|ssl:1XX.XX.XX.X16:6642: received notification, method="update3", params=[["monid","OVN_Southbound"],"9c54ec46-83a4-4deb-877d-3089a99ae482",{"SB_Global":{"019c4225-e637-4881-b382-36fdfcb9e0c9":{"modify":{"nb_cfg":34448}}},"Port_Binding":{"4230d38a-75cd-4f1c-9fde-62e753958656":{"insert":{"up":false,"tunnel_key":7,"mac":"16:00:01:f4:03:cb 169.254.30.3/24 fe80:30::3/64","datapath":["uuid","e84addb8-efdd-48bc-8454-8dd76a4a5a19"],"logical_port":"bd7bae47-2482-41cc-9486-e66f9d7b25bf-lrp-793b8","type":"patch"}},"f4d71615-88b8-4274-87ab-448cf8c6a6ee":{"insert":{"up":false,"tunnel_key":6,"mac":"16:00:01:75:f8:87 169.254.10.1/24 fe80:10::1/64","datapath":["uuid","e84addb8-efdd-48bc-8454-8dd76a4a5a19"],"logical_port":"bd7bae47-2482-41cc-9486-e66f9d7b25bf-lrp-ba453","type":"patch"}},"3eac57dd-386c-43b0-bf5b-8225941b52cb":{"insert":{"up":false,"tunnel_key":5,"mac":"16:00:01:9a:30:fa 169.254.40.4/24 fe80:40::4/64","datapath":["uuid","e84addb8-efdd-48bc-8454-8dd76a4a5a19"],"logical_port":"bd7bae47-2482-41cc-9486-e66f9d7b25bf-lrp-36ac0","type":"patch"}},"7295d5c6-ff8d-4f4f-b1a5-eb1ca87fcb7d":{"insert":{"up":false,"tunnel_key":8,"mac":"16:00:01:de:df:b5 169.254.20.2/24 fe80:20::2/64","datapath":["uuid","e84addb8-efdd-48bc-8454-8dd76a4a5a19"],"logical_port":"bd7bae47-2482-41cc-9486-e66f9d7b25bf-lrp-c21c6","type":"patch"}}},"Logical_Flow":{"224852f2-9a46-4569-b51d-5d340ed6f4b4":{"modify":{"logical_dp_group":["uuid","bd825723-9a0e-451c-b5ad-b7d6f9cc75e4"]}},"05f5d515-bccf-47c1-9d55-8a4dfce88822":{"insert":{"pipeline":"ingress","match":"inport == \"bd7bae47-2482-41cc-9486-e66f9d7b25bf-lrp-36ac0\" && ip6.dst == fe80::/64","priority":194,"logical_datapath":["uuid","e84addb8-efdd-48bc-8454-8dd76a4a5a19"],"table_id":13,"actions":"ip.ttl--; reg8[0..15] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::1400:1ff:fe9a:30fa; eth.src = 16:00:01:9a:30:fa; outport = \"bd7bae47-2482-41cc-9486-e66f9d7b25bf-lrp-36ac0\"; flags.loopback = 1; next;"}},"c67ec1ee-bfc1-47e0-b35c-95fcc2adc13b":{"modify":{"logical_dp_group":
The addresses in the above change (16:00:01:75:f8:87 169.254.10.1/24 fe80:10::1/64) are the ones configured on the LRP connected to the transit switch.
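(For reference, the LRP and its SB Port_Binding can be cross-checked along these lines; the router/port names are placeholders:)

$ ovn-nbctl lrp-list lr0
$ ovn-nbctl --columns=name,mac,networks list Logical_Router_Port lrp-lr0-ts1
$ ovn-sbctl --columns=logical_port,mac,tunnel_key,type list Port_Binding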
After a while, another incremental processing run is started:
2024-12-26T17:38:18.035Z|11190795|coverage|INFO|Skipping details of duplicate event coverage for hash=db09e857
2024-12-26T17:38:18.035Z|11190796|poll_loop|DBG|wakeup due to [POLLIN] on fd 3 (1XX.XX.XX.X99:47382<->1XX.XX.XX.X16:6642) at lib/stream-ssl.c:842 (100% CPU usage)
2024-12-26T17:38:18.035Z|11190797|inc_proc_eng|DBG|Initializing new run
2024-12-26T17:38:18.035Z|11190798|inc_proc_eng|DBG|lib/inc-proc-eng.c:379: node: SB_sb_global, old_state Updated, new_state Stale
2024-12-26T17:38:18.035Z|11190799|inc_proc_eng|DBG|lib/inc-proc-eng.c:379: node: northd_options, old_state Unchanged, new_state Stale
2024-12-26T17:38:18.035Z|11190800|inc_proc_eng|DBG|lib/inc-proc-eng.c:379: node: SB_dhcp_options, old_state Unchanged, new_state Stale
2024-12-26T17:38:18.035Z|11190801|inc_proc_eng|DBG|lib/inc-proc-eng.c:379: node: SB_dhcpv6_options, old_state Unchanged, new_state Stale
2024-12-26T17:38:18.035Z|11190802|inc_proc_eng|DBG|lib/inc-proc-eng.c:379: node: dhcp_options, old_state Unchanged, new_state Stale
2024-12-26T17:38:18.035Z|11190803|inc_proc_eng|DBG|lib/inc-proc-eng.c:379: node: SB_address_set, old_state Unchanged, new_state Stale
On other chassis we observed the recompute time increasing to 18 seconds:
2024-12-26T17:38:46.463Z|40096697|poll_loop|INFO|Dropped 72 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2024-12-26T17:38:46.463Z|40096698|poll_loop|INFO|wakeup due to [POLLIN] on fd 26 (1XX.XX.XX.6:56350<->1XX.XX.XX.X16:6642) at lib/stream-ssl.c:842 (99% CPU usage)
2024-12-26T17:38:52.442Z|40096699|poll_loop|INFO|Dropped 72 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2024-12-26T17:38:52.442Z|40096700|poll_loop|INFO|wakeup due to [POLLIN] on fd 26 (1XX.XX.XX.6:56350<->1XX.XX.XX.X16:6642) at lib/stream-ssl.c:842 (102% CPU usage)
2024-12-26T17:38:58.409Z|40096701|poll_loop|INFO|Dropped 72 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2024-12-26T17:38:58.409Z|40096702|poll_loop|INFO|wakeup due to [POLLIN] on fd 26 (1XX.XX.XX.6:56350<->1XX.XX.XX.X16:6642) at lib/stream-ssl.c:842 (101% CPU usage)
2024-12-26T17:39:04.477Z|40096703|poll_loop|INFO|Dropped 72 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2024-12-26T17:39:04.477Z|40096704|poll_loop|INFO|wakeup due to [POLLIN] on fd 26 (1XX.XX.XX.6:56350<->1XX.XX.XX.X16:6642) at lib/stream-ssl.c:842 (98% CPU usage)
2024-12-26T17:39:27.896Z|40096705|inc_proc_eng|INFO|node: logical_flow_output, recompute (forced) took 18932ms
2024-12-26T17:39:28.612Z|40096706|inc_proc_eng|INFO|node: physical_flow_output, recompute (forced) took 670ms
2024-12-26T17:39:30.614Z|40096707|timeval|WARN|Unreasonably long 22134ms poll interval (19780ms user, 2315ms system)
2024-12-26T17:39:30.614Z|40096708|timeval|WARN|faults: 2634558 minor, 0 major
2024-12-26T17:39:30.614Z|40096709|timeval|WARN|disk: 0 reads, 8 writes
2024-12-26T17:39:30.614Z|40096710|timeval|WARN|context switches: 0 voluntary, 2688 involuntary
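To narrow down which engine nodes are being recomputed, the incremental engine and coverage counters can be dumped on an affected chassis (assuming these appctl commands are available in this build):

$ ovn-appctl -t ovn-controller inc-engine/show-stats
$ ovn-appctl -t ovn-controller coverage/show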
This behavior only happens with OVN interconnect logical flows. Do you know of any known issue regarding it?
# Versions in use
$ ovn-nbctl -V
ovn-nbctl 23.03.1
Open vSwitch Library 3.1.3
DB Schema 7.0.0
$ ovs-vsctl -V
ovs-vsctl (Open vSwitch) 3.1.3
DB Schema 8.3.1
Relevant ovn-controller settings (external_ids on each chassis):
external_ids: ovn-monitor-all="true", ovn-ofctrl-wait-before-clear="10000", ovn-openflow-probe-interval="0", ovn-remote-probe-interval="180000"
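(For completeness, these are set per chassis roughly as follows; a sketch, not the exact commands used:)

$ ovs-vsctl set open_vswitch . \
    external_ids:ovn-monitor-all=true \
    external_ids:ovn-ofctrl-wait-before-clear=10000 \
    external_ids:ovn-openflow-probe-interval=0 \
    external_ids:ovn-remote-probe-interval=180000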
Regards,
Tiago Pires