extend_table|ERR|table meter-table: out of table ids #259

Open
frct1 opened this issue Sep 16, 2024 · 7 comments

frct1 commented Sep 16, 2024

Hello,
We run fairly large hypervisors (hundreds of small, short-lived VMs) based on OpenStack, and we started to see that DHCP offers are not delivered to the tap interfaces at all (even though the logs show that a DHCPOFFER has been sent). Whenever ovn-controller starts, we always see this error log line, which is probably related:

2024-09-16T20:28:49.013Z|00526|extend_table|ERR|table meter-table: out of table ids.

The really weird thing is that our ovn-controller version is up to date, and the issue should have been gone since one of the 2023.* fall releases mentioned here, but it is not.

OpenStack is deployed using kolla-ansible (master branch), with ovn-controller at version 24.03.2 (info).
Versions:

ovn-controller 24.03.2
Open vSwitch Library 3.3.0
OpenFlow versions 0x6:0x6
SB DB Schema 20.33.0
ovs-vsctl (Open vSwitch) 3.3.1
DB Schema 8.5.0

What could be the reason for this?

frct1 commented Sep 20, 2024

CC'ing some folks who might have related experience with table sizes.
@igsilya @dceara
Folks in the OpenStack community sent a link to this patchwork series, which shows that the group and meter tables are not limited to 16-bit ids.

igsilya commented Sep 20, 2024

How many meters do you have configured in OVS? You may also run ovs-ofctl -OOpenFlow15 meter-features br-int to see how many meters your datapath supports. For the kernel datapath, the value is dynamic and depends on how much RAM the system has and some other factors, IIRC, but it's capped at 200K. For userspace datapath it is limited to 256K.
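
For reference, a minimal shell sketch of those checks, assuming the integration bridge is named br-int as in a typical OVN setup:

# ovs-ofctl -O OpenFlow15 meter-features br-int
# ovs-ofctl -O OpenFlow15 dump-meters br-int | grep -c 'meter='

The first command prints the limits the datapath advertises (max_meter and friends); the second counts the meters currently installed on the bridge.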

frct1 commented Sep 20, 2024

> How many meters do you have configured in OVS? You may also run ovs-ofctl -OOpenFlow15 meter-features br-int to see how many meters your datapath supports. For the kernel datapath, the value is dynamic and depends on how much RAM the system has and some other factors, IIRC, but it's capped at 200K. For userspace datapath it is limited to 256K.

Old hypervisor, where the QoS and DHCP issue is present:

# ovs-ofctl -OOpenFlow15 meter-features br-int
OFPST_METER_FEATURES reply (OF1.5) (xid=0x2):
max_meter:0 max_bands:0 max_color:0
band_types: 0
capabilities: 

# ovs-ofctl -O OpenFlow15 dump-meters br-int | grep "meter" | wc -l
0

Freshly provisioned hypervisor with OVN (no QoS or DHCP issue observed):

# ovs-ofctl -OOpenFlow15 meter-features br-int
OFPST_METER_FEATURES reply (OF1.5) (xid=0x2):
max_meter:200000 max_bands:1 max_color:0
band_types: drop
capabilities: kbps pktps burst stats

# ovs-ofctl -O OpenFlow15 dump-meters br-int | grep "meter" | wc -l
1544

1544 is very close to the total number of ports created in OpenStack (774) times 2, i.e. 1548, because QoS is configured for both ingress and egress.

Versions are the same.

igsilya commented Sep 20, 2024

OK. So, your issue is max_meter:0. It means your datapath (kernel?) doesn't support meters, or for some reason ovs-vswitchd thinks that the datapath doesn't support meters. What is your kernel version? Also, what does ovs-appctl dpif/show-dp-features br-int show? Are there any errors/warnings related to meters in the ovs-vswitchd.log ?
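
For the log check, something along these lines should surface the relevant messages, assuming the default log location (the path differs between distributions and containerized deployments such as kolla-ansible):

# grep -i meter /var/log/openvswitch/ovs-vswitchd.log | tail -n 20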

frct1 commented Sep 20, 2024

Kernel 5.15 is used across all hypervisors: 5.15.0-107-generic and 5.15.0-122-generic.

> what does ovs-appctl dpif/show-dp-features br-int show

Freshly provisioned:

Masked set action: Yes
Tunnel push pop: No
Ufid: Yes
Truncate action: Yes
Clone action: Yes
Sample nesting: 10
Conntrack eventmask: Yes
Conntrack clear: Yes
Max dp_hash algorithm: 0
Check pkt length action: Yes
Conntrack timeout policy: Yes
Explicit Drop action: No
Optimized Balance TCP mode: No
Conntrack all-zero IP SNAT: Yes
MPLS Label add: Yes
Max VLAN headers: 2
Max MPLS depth: 3
Recirc: Yes
CT state: Yes
CT zone: Yes
CT mark: Yes
CT label: Yes
CT state NAT: Yes
CT orig tuple: Yes
CT orig tuple for IPv6: Yes
IPv6 ND Extension: No

Where the issue is observed:

Masked set action: Yes
Tunnel push pop: No
Ufid: Yes
Truncate action: Yes
Clone action: Yes
Sample nesting: 10
Conntrack eventmask: Yes
Conntrack clear: Yes
Max dp_hash algorithm: 0
Check pkt length action: Yes
Conntrack timeout policy: Yes
Explicit Drop action: No
Optimized Balance TCP mode: No
Conntrack all-zero IP SNAT: Yes
MPLS Label add: Yes
Max VLAN headers: 2
Max MPLS depth: 3
Recirc: Yes
CT state: Yes
CT zone: Yes
CT mark: Yes
CT label: Yes
CT state NAT: Yes
CT orig tuple: Yes
CT orig tuple for IPv6: Yes
IPv6 ND Extension: No

> Are there any errors/warnings related to meters in the ovs-vswitchd.log ?

Yep, I did some grepping, and there are.

First hypervisor with broken metering feature:

2024-09-13T14:11:21.894Z|379262|coverage|INFO|dpif_meter_set             0.0/sec     0.000/sec        0.0000/sec   total: 9658
2024-09-13T14:11:21.894Z|379263|coverage|INFO|dpif_meter_del             0.0/sec     0.000/sec        0.0000/sec   total: 8160
2024-09-13T14:19:31.684Z|00032|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-09-13T14:19:31.684Z|00033|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-09-13T14:19:31.684Z|00034|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2024-09-13T14:19:31.684Z|00035|dpif_netlink|INFO|The kernel module has a broken meter implementation.
2024-09-13T14:44:42.548Z|00032|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-09-13T14:44:42.548Z|00033|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-09-13T14:44:42.548Z|00034|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2024-09-13T14:44:42.548Z|00035|dpif_netlink|INFO|The kernel module has a broken meter implementation.

Second hypervisor with broken metering:

2024-09-16T20:28:46.386Z|00032|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-09-16T20:28:46.386Z|00033|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2024-09-16T20:28:46.386Z|00034|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2024-09-16T20:28:46.386Z|00035|dpif_netlink|INFO|The kernel module has a broken meter implementation.

September 13 is the first day the metering issue appeared; metering probably broke for some reason on that date.

@legitYosal

Hi, I have the same error, on kernel 5.15.0-86-generic.
OVN and OVS are 24.03.4 and 3.3.2.
The errors are like what frct1 showed:

2025-01-05T15:29:43.014Z|00034|dpif_netlink|INFO|The kernel module has a broken meter implementation.

@igsilya
We have hypervisors with max_meter:0 and hypervisors with max_meter:200000; they all have the same kernel version and the same hardware.
This is causing QoS to not take effect. Is there any way to work around it? I cannot figure out why it is 200K on one hypervisor and not on another!

frct1 commented Jan 5, 2025

Hi @legitYosal,
I managed to work around it by redeploying OVN + OVS (which probably involved reloading the openvswitch kernel module).

We also switched the hypervisors to the 6.8 kernel, and everything is fine with about 2.5k QoS rules (around 1250 ports with ingress + egress QoS). But I'm actually not sure about the 200k limitation, since all QoS rules are installed on every hypervisor (even for ports running on other hypervisors) instead of having node-specific (aka chassis-specific) QoS rules only on the hypervisors hosting the related ports.
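
For anyone hitting the same thing, a rough sketch of that workaround; the service names below are the Ubuntu/Debian ones, and under kolla-ansible these components run in containers, so adapt accordingly:

# systemctl stop ovn-controller openvswitch-switch
# modprobe -r openvswitch
# modprobe openvswitch
# systemctl start openvswitch-switch ovn-controller
# ovs-ofctl -O OpenFlow15 meter-features br-int

The module can only be unloaded once no datapaths are in use, and the last command should report max_meter greater than 0 afterwards; if it still shows 0, the kernel module itself is likely at fault (hence the switch to the 6.8 kernel mentioned above).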
