Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CreateNetworkStatuses should assign IP addresses to a default interface if one isn't specified. #74

Open
Brian-McM opened this issue Nov 22, 2024 · 11 comments

Comments

@Brian-McM
Copy link

Multus 4.1 started using CreateNetworkStatuses instead of CreateNetworkStatus in this PR. This broke the current default behaviour where an IP is assigned to a default address if it doesn't specify an interface index.

Upgrading multus to 4.1 breaks when using calico (and possibly other CNIs) since it doesn't specify the interface index. Calico should probably be updated to specific the interface index just to be explicit, but I think it is reasonable to have a default, especially since:

  1. This used to default the IPs to an interface
  2. It would be reasonable to default to an eth0 interface, just like containerd does here

In conjunction with this, I think it would also be reasonable to fail (and probably should fail) if it can't assign the IP addresses to an interface instead of silently dropping the IPs.

@maiqueb
Copy link
Contributor

maiqueb commented Dec 9, 2024

Isn't this fixed by #72 ?

Could you confirm the multus version you're using ? This is supposed to have been released in multus-cni v4.1.2 whose HEAD is https://github.com/k8snetworkplumbingwg/multus-cni/pull/1341/files#diff-230ad5a9f4cef187812c56167fc65a4ba379efa016fa878d61d80bff891f3876

@Brian-McM
Copy link
Author

@maiqueb thanks for the reply. That wouldn't fix the issue, because the number of interfaces that's returned isn't 1, it's 2 (one interface for each side of the veth pair).

Could this be updated find the eth0 interface (if it exists) and assign the IPs to that like containerd does?

@maiqueb
Copy link
Contributor

maiqueb commented Dec 11, 2024

@maiqueb thanks for the reply. That wouldn't fix the issue, because the number of interfaces that's returned isn't 1, it's 2 (one interface for each side of the veth pair).

In #72 I show an example of a calico's CNI response. It only has a single interface.
Maybe it changes in some other version ?

Could you paste the CNI response triggering this issue ? Without it, not much we can do.

Could this be updated find the eth0 interface (if it exists) and assign the IPs to that like containerd does?

Maybe. But keep in mind containerd is not the only runtime we need to care about.

@Brian-McM
Copy link
Author

In #72 I show an example of a calico's CNI response. It only has a single interface.
Maybe it changes in some other version ?

@maiqueb here's the CNI response:

{
    "cniVersion": "0.3.1",
    "interfaces": [
        {
            "name": "calif887e436e8b"
        },
        {
            "name": "eth0",
            "mac": "62:4e:a4:58:49:59",
            "sandbox": "/var/run/netns/cni-c2840e70-31e3-ad9f-92fa-61c264c9bf50"
        }
    ],
    "ips": [
        {
            "version": "4",
            "address": "192.168.42.136/32"
        }
    ],
    "dns": {}
}

I don't think anything has really changed with Calico (I actually work there), but it's possible.

My thinking was that regardless of this in multus 4.1 the result has changed if the interface indices for the IPs aren't specified if multiple interfaces exist, when previously it worked.

Maybe. But keep in mind containerd is not the only runtime we need to care about.

Just to be clear, my suggestion about following what containerd does wasn't to support containerd, it's just to use the default behaviour that containerd uses as it seems like a reasonable default. I wouldn't be surprised if CRIO or other runtimes would use this default.

Another reasonable default (which would align with what you had before) would be that any IP that doesn't have an interface index assigned is assigned to the last interface provided. I think looking for the eth0 interface provides a nice reasonable default instead of assuming that the last interface is the one that the IP should be assigned to.

I also hadn't mentioned that the issue here is for the enterprise multi nics feature which relies on the annotations multus assigns to the pods to assign the IPs to the calico managed interfaces.

We should update Calico to provide interface indices for the IPs, which I intend to tackle for new releases, but any old release that upgrades it's multus version that's relying on this feature breaks because of this change in functionality.

@maiqueb
Copy link
Contributor

maiqueb commented Dec 11, 2024

In #72 I show an example of a calico's CNI response. It only has a single interface.
Maybe it changes in some other version ?

@maiqueb here's the CNI response:

{
    "cniVersion": "0.3.1",
    "interfaces": [
        {
            "name": "calif887e436e8b"
        },
        {
            "name": "eth0",
            "mac": "62:4e:a4:58:49:59",
            "sandbox": "/var/run/netns/cni-c2840e70-31e3-ad9f-92fa-61c264c9bf50"
        }
    ],
    "ips": [
        {
            "version": "4",
            "address": "192.168.42.136/32"
        }
    ],
    "dns": {}
}

I don't think anything has really changed with Calico (I actually work there), but it's possible.

Check out projectcalico/calico#9294 . I think calico did change something :)

I think we can close that issue now that calico seems to reply with the sandbox information.

My thinking was that regardless of this in multus 4.1 the result has changed if the interface indices for the IPs aren't specified if multiple interfaces exist, when previously it worked.

Maybe. But keep in mind containerd is not the only runtime we need to care about.

Just to be clear, my suggestion about following what containerd does wasn't to support containerd, it's just to use the default behaviour that containerd uses as it seems like a reasonable default. I wouldn't be surprised if CRIO or other runtimes would use this default.

Another reasonable default (which would align with what you had before) would be that any IP that doesn't have an interface index assigned is assigned to the last interface provided. I think looking for the eth0 interface provides a nice reasonable default instead of assuming that the last interface is the one that the IP should be assigned to.

I agree that we may need to use a default. I just am not sure defaulting to eth0 being the "go-to" interface is right.
Is it written anywhere that eth0 will even be always available in the pods ? Take a look at the iface reported in #72 ... there's no eth0 reported there ...

I also hadn't mentioned that the issue here is for the enterprise multi nics feature which relies on the annotations multus assigns to the pods to assign the IPs to the calico managed interfaces.

We should update Calico to provide interface indices for the IPs, which I intend to tackle for new releases, but any old release that upgrades it's multus version that's relying on this feature breaks because of this change in functionality.

For sure.

Given this bug, maybe we could filter out the interfaces without sandbox iif we have more than one interface. And if after doing that we have a single interface, invoke CreateNetworkStatus.

Not sure how clean this is ... allow me a couple of days to think this through !

@Brian-McM
Copy link
Author

Check out projectcalico/calico#9294 . I think calico did change something :)

I'm not sure I see what you're pointing to that changed, there's a referenced ticket that's open to add the sandbox info, but nothing I can see that would change the number of interfaces returned. I'm a little curious now so I can try digging through (like I said I should update the cni plugin to do the correct thing and add the interface indices to the IPs, so I can take a look as part of that work).

I agree that we may need to use a default. I just am not sure defaulting to eth0 being the "go-to" interface is right.
Is it written anywhere that eth0 will even be always available in the pods ? Take a look at the iface reported in #72 ... there's no eth0 reported there ...

That's a fair point, I was thinking that this would restore the previous behaviour for the issue I was facing but not in general.

Given this bug, maybe we could filter out the interfaces without sandbox iif we have more than one interface. And if after doing that we have a single interface, invoke CreateNetworkStatus.

Not sure how clean this is ... allow me a couple of days to think this through !

I was thinking more along the lines of updating the IP assignment logic, like assign all the IPs without interfaces to the last interface could be a good way to restore the previous default behaviour. Even if there is sandbox info, it's still possible the IPs don't have an interface explicitly defined which previously worked. So something like:

lastInfIdx := <get last inf from list>
for _, ipConfig := range result.IPs {
		originalIndex := lastInfIdx
		if ipConfig.Interface != nil {
			originalIndex := *ipConfig.Interface
		}
		if newIndex, ok := indexMap[originalIndex]; ok {
			ns := networkStatuses[newIndex]
			ns.IPs = append(ns.IPs, ipConfig.Address.IP.String())
		}
	}

WDYT?

@maiqueb
Copy link
Contributor

maiqueb commented Dec 12, 2024

Check out projectcalico/calico#9294 . I think calico did change something :)

I'm not sure I see what you're pointing to that changed, there's a referenced ticket that's open to add the sandbox info, but nothing I can see that would change the number of interfaces returned. I'm a little curious now so I can try digging through (like I said I should update the cni plugin to do the correct thing and add the interface indices to the IPs, so I can take a look as part of that work).

Well, I was focusing on the fact that one of the interfaces has the sandbox defined. Before, the only returned interface didn't.

I agree that we may need to use a default. I just am not sure defaulting to eth0 being the "go-to" interface is right.
Is it written anywhere that eth0 will even be always available in the pods ? Take a look at the iface reported in #72 ... there's no eth0 reported there ...

That's a fair point, I was thinking that this would restore the previous behaviour for the issue I was facing but not in general.

Given this bug, maybe we could filter out the interfaces without sandbox iif we have more than one interface. And if after doing that we have a single interface, invoke CreateNetworkStatus.
Not sure how clean this is ... allow me a couple of days to think this through !

I was thinking more along the lines of updating the IP assignment logic, like assign all the IPs without interfaces to the last interface could be a good way to restore the previous default behaviour. Even if there is sandbox info, it's still possible the IPs don't have an interface explicitly defined which previously worked. So something like:

lastInfIdx := <get last inf from list>
for _, ipConfig := range result.IPs {
		originalIndex := lastInfIdx
		if ipConfig.Interface != nil {
			originalIndex := *ipConfig.Interface
		}
		if newIndex, ok := indexMap[originalIndex]; ok {
			ns := networkStatuses[newIndex]
			ns.IPs = append(ns.IPs, ipConfig.Address.IP.String())
		}
	}

WDYT?

This might be OK - but I think we need to initialize originalIndex to the last sandbox interface. Right ? What's the point of putting these IPs on an host interface ?

@Brian-McM
Copy link
Author

Brian-McM commented Dec 13, 2024

Well, I was focusing on the fact that one of the interfaces has the sandbox defined. Before, the only returned interface didn't.

Ahh gotcha, I incorrectly assumed the change you were referring to was the number of interfaces returned. I think my previous statement of "not thinking anything has changed" was maybe a little too broad, I don't think the logic around returning multiple interfaces has changed.

Thinking a little more on what you said about seeing a single interface, it might be the difference is when the CNI plugin is used for policy enforcement with the rest of the Calico product as opposed to just a generic CNI plugin that's used alone.

Not that any of this really matters right now outside of curiosity (the regression will be fixed in multus and going forward Calico will correctly provide the indices for the interfaces).

This might be OK - but I think we need to initialize originalIndex to the last sandbox interface. Right ? What's the point of putting these IPs on an host interface ?

Right, that would make sense as the "most correct" thing to do, however, it's still possible that the sandbox info isn't returned (that's what that ticket you pointed me to was about, wasn't it?). A little more context though on the output I gave you, that was a custom build I made off the master version of the enterprise side of the product, so there's certainly some differences (apologies for not mentioning that sooner, that's probably important context). I needed to add some more logging on the CNI plugin side while I was debugging the issue.

So maybe lastInfIdx could try to find the last sandbox interface, and if there isn't any sandbox interface, it then defaults to the last interface. I think this would make the logic more robust, more "correct" in general, while restoring the previous behaviour where multus worked when the sandbox info and the interface indices were missing.

Would you agree with that, or can you see something I'm missing?

@Brian-McM
Copy link
Author

@maiqueb just wanted to check in, was there any progress on a fix for this?

@maiqueb
Copy link
Contributor

maiqueb commented Jan 13, 2025

Well, I was focusing on the fact that one of the interfaces has the sandbox defined. Before, the only returned interface didn't.

Ahh gotcha, I incorrectly assumed the change you were referring to was the number of interfaces returned. I think my previous statement of "not thinking anything has changed" was maybe a little too broad, I don't think the logic around returning multiple interfaces has changed.

Thinking a little more on what you said about seeing a single interface, it might be the difference is when the CNI plugin is used for policy enforcement with the rest of the Calico product as opposed to just a generic CNI plugin that's used alone.

Not that any of this really matters right now outside of curiosity (the regression will be fixed in multus and going forward Calico will correctly provide the indices for the interfaces).

This might be OK - but I think we need to initialize originalIndex to the last sandbox interface. Right ? What's the point of putting these IPs on an host interface ?

Right, that would make sense as the "most correct" thing to do, however, it's still possible that the sandbox info isn't returned (that's what that ticket you pointed me to was about, wasn't it?). A little more context though on the output I gave you, that was a custom build I made off the master version of the enterprise side of the product, so there's certainly some differences (apologies for not mentioning that sooner, that's probably important context). I needed to add some more logging on the CNI plugin side while I was debugging the issue.

So maybe lastInfIdx could try to find the last sandbox interface, and if there isn't any sandbox interface, it then defaults to the last interface. I think this would make the logic more robust, more "correct" in general, while restoring the previous behaviour where multus worked when the sandbox info and the interface indices were missing.

This makes sense to me.

Would you agree with that, or can you see something I'm missing?

I agree with the proposal above.

@maiqueb just wanted to check in, was there any progress on a fix for this?

Right now we're focused on some late release work, we won't have time to sort this out.

Maybe we could spare enough reviews in case you're willing to push a PR ? ... @Brian-McM

@dougbtv do you have buffer for reviews to this ? I'll assist.

@Brian-McM
Copy link
Author

Thanks for the update @maiqueb, completely understand how this might not be the biggest priority. I'll try to carve out some time to push up a PR, I don't think it should take too much time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants