provisioned nodes can't reach rancher-vcluster on VIP #8

Open
qrkourier opened this issue Nov 26, 2023 · 1 comment

rancher-vcluster shares Harvester's VIP, and the Harvester guest VMs that Rancher provisions as RKE2 nodes are unable to reach the Rancher Server to apply the initial plan. On the guest node, the configured Rancher Server URL resolves in DNS to the VIP, and the VIP answers pings. The guests are also able to reach the Harvester UI with a cURL GET to the Harvester node IP.

The HTTP request with Host: rancher.hella header, from the guest node to the Rancher Server URL, results in a SYN that is never acknowledged. The SYN flows from the guest node's veth to the Harvester node's mgmt-br, where it is mangled to an array of destination IPs that have routes to the Harvester node's calico and flannel interfaces.

A cURL GET on the Harvester node where the guest node is running is able to fetch the Harvester UI web server without a Host header, and the Rancher Server UI with the Host: rancher.hella header, proving the destination is correct. The expected response is the HTTP 302 redirect to the dashboard location.

rancher@nuc2:~> curl -ks https://rancher.hella|sha256sum
3509bf97089da3314f168d5811fd5a5015bc185c50e24f4855dab26bf7df8f8b  -

rancher@nuc2:~> curl -ks https://10.52.1.36 -H 'host: rancher.hella'|sha256sum 
3509bf97089da3314f168d5811fd5a5015bc185c50e24f4855dab26bf7df8f8b  -
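
The comparison above can be repeated from other vantage points (for example, from the guest VM) with a small helper. This is only a sketch: the function names are hypothetical, and the 5-second timeout is an assumption chosen so that an unanswered SYN fails quickly. A request that fails prints no digest, so two timeouts are never reported as a match.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Harvester or Rancher tooling.

# Print only the hex SHA-256 digest of stdin.
body_hash() {
  sha256sum | cut -d' ' -f1
}

# fetch_hash URL [HOST]
# Fetch URL (ignoring certificate errors, as in the transcript) with an
# optional Host header and print the digest of the body.
# Prints nothing and returns non-zero if curl fails or times out.
fetch_hash() {
  local url=$1 host=${2:-} tmp rc
  local args=(-ks --max-time 5)
  if [ -n "$host" ]; then
    args+=(-H "host: $host")
  fi
  tmp=$(mktemp) || return 1
  if curl "${args[@]}" -o "$tmp" "$url"; then
    body_hash < "$tmp"
    rc=0
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}

# same_body DIGEST1 DIGEST2
# Succeed only when both digests are non-empty and equal.
same_body() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example (not run automatically):
#   same_body "$(fetch_hash https://rancher.hella)" \
#             "$(fetch_hash https://10.52.1.36 rancher.hella)" && echo match
```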

It's interesting that the guest VM can reach the web server running on some, but not all, of the three Harvester node IPs, and the set of unreachable nodes changes depending on where the VIP is currently bound and on which Harvester node the RKE2 node guest is scheduled.

If the VIP is bound by node3, then the guest running on node2 is able to reach node3's primary interface, and GET / gets an HTTP 302 to /dashboard/. The same request to the node1 and node2 IPs times out. A request to the VIP, with or without the Host header, also times out.

When the VIP is bound on node2, the same node as the guest, the previously successful GET / to node3 times out, and the request to node1 begins to succeed!

I believe it's commonplace for VIPs like Harvester's to be assigned to the mgmt-br interface with subnet mask /32, despite the primary interface address having /24. Noting this in case Harvester's VIP should actually be using /24.
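
To make the /32 versus /24 distinction concrete, here is a small sketch of the prefix arithmetic. The addresses are made up (documentation range), not taken from this cluster, and the function names are hypothetical. It only shows which destinations each prefix length covers. With /32 the address covers nothing but itself, so any on-link reachability for the rest of the subnet has to come from the primary address's /24. Whether that matters for this issue is not established.

```shell
#!/usr/bin/env bash
# Illustration only: the addresses below are made up (documentation range),
# not taken from the cluster in this issue.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_prefix ADDR NETWORK PREFIXLEN
# Succeeds when ADDR falls inside NETWORK/PREFIXLEN.
in_prefix() {
  local addr net len=$3 mask
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}

vip=192.0.2.10      # hypothetical VIP
peer=192.0.2.57     # hypothetical neighbour in the same /24

for len in 32 24; do
  if in_prefix "$peer" "$vip" "$len"; then
    echo "/$len: $peer is covered by $vip/$len"
  else
    echo "/$len: $peer is not covered by $vip/$len"
  fi
done
```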

qrkourier commented Nov 26, 2023

More context, in case it's relevant.

The metal Harvester nodes have an untagged mgmt network with PVID 30 on the switch. The VM network tags the guest's packets for VLAN 40. There's no issue forwarding between the two VLANs, as determined by ICMP echo replies between the Harvester node and guest VM. Metal Harvester nodes and guest VMs lease IPs via DHCP running on the router, which has pools for each subnet.

TL;DR The SYN initiated by the Rancher-provisioned guest VM never reaches the router's VLAN interface on its way to the Rancher Server provided by rancher-vcluster. It's being dropped somewhere in the Harvester networking.
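
One way to narrow down where the SYN disappears is to capture connection attempts on each interface along the path and note the last hop that still sees them. The sketch below assumes tcpdump is available on the Harvester node. The function name and the `$VIP` variable are hypothetical, and the filter matches initial SYNs (SYN set, ACK clear) only.

```shell
#!/usr/bin/env bash
# syn_only_filter DST_IP DST_PORT
# Print a pcap filter matching TCP segments with SYN set and ACK clear
# (new connection attempts) addressed to DST_IP:DST_PORT.
syn_only_filter() {
  printf 'tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn and dst host %s and dst port %s' "$1" "$2"
}

# Example (needs root; VIP is a placeholder for the Harvester VIP address).
# Run once per interface along the path, e.g. mgmt-br:
#   tcpdump -ni mgmt-br "$(syn_only_filter "$VIP" 443)"
```

If the SYN shows up on mgmt-br but on no later interface, the drop is happening on the Harvester node rather than on the switch or router.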

Slack thread about the same
