Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Openstack cloud tests #179

Merged
merged 19 commits into from
Oct 27, 2023
Merged

Openstack cloud tests #179

merged 19 commits into from
Oct 27, 2023

Conversation

mkalcok
Copy link
Contributor

@mkalcok mkalcok commented Sep 22, 2023

Please review commit by commit. This branch is based on #175 so the first 7 commits by @fnordahl can be skipped in review.

This change builds on separation of CMS implementations in #175 and it adds Openstack CMS. More info is in commit messages but at current stage it can simulate provisioning of:

  • Project router
  • Project's Internal network
  • External network with gateway routing

@mkalcok mkalcok force-pushed the openstack-cloud-setup branch 2 times, most recently from 67a4af8 to e7cf77f Compare September 24, 2023 08:04
ovn-tester/ovn_utils.py Outdated Show resolved Hide resolved
Copy link
Member

@fnordahl fnordahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your work on this, it looks like a great start! I have left a couple of comments in-line and I'll continue to look later today

ovn-tester/ovn_utils.py Show resolved Hide resolved
ovn-tester/cms/openstack/openstack.py Show resolved Hide resolved
ovn-tester/cms/openstack/openstack.py Outdated Show resolved Hide resolved
Copy link

@rjarry rjarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

thanks for working on this. I had gone another route (trying to mock some parts of neutron to reuse the actual ML2 driver code) which has proven unfruitful.

I don't see anything about the metadata agent here nor any southbound access to simulate what the ML2 driver does. Is there any plan to do so?

ovn-tester/cms/openstack/openstack.py Outdated Show resolved Hide resolved
ovn-tester/cms/openstack/openstack.py Outdated Show resolved Hide resolved
@mkalcok
Copy link
Contributor Author

mkalcok commented Sep 25, 2023

Thanks for the reviews @fnordahl @rjarry.

I don't see anything about the metadata agent here nor any southbound access to simulate what the ML2 driver does. Is there any plan to do so?

That's a good point. I focused on the NB because I assumed that that would be the primary database the Neutron would communicate with. I'll take a look at the SB as well. Do you have any pointers to relevant code/docs about this?

@mkalcok mkalcok force-pushed the openstack-cloud-setup branch from e7cf77f to 7d9fea0 Compare September 25, 2023 13:51
@rjarry
Copy link

rjarry commented Sep 25, 2023

There are a two parts to consider:

  1. the neutron server access to the southbound db: https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/scheduler/l3_ovn_scheduler.py

  2. the metadata agent that runs on all compute nodes: https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/agent/ovn/metadata/agent.py

I am far from being an expert in this code but I know it has significant impact on scale because it multiplies the number of southbound db clients by two (1 ovn-controller + 1 neutron metadata agent per compute node).

For performance reasons, openstack deployments usually recommend enabling ovn-controller monitor_all=true to reduce cpu pressure on the southbound ovsdb-server. The same logic is also applied for neutron metadata agent. To properly do scale testing, it would be good for ovn-heater to simulate these connections and subscriptions.

@dceara
Copy link
Collaborator

dceara commented Sep 25, 2023

There are a two parts to consider:

1. the neutron server access to the southbound db: https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/scheduler/l3_ovn_scheduler.py

2. the metadata agent that runs on all compute nodes: https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/agent/ovn/metadata/agent.py

I am far from being an expert in this code but I know it has significant impact on scale because it multiplies the number of southbound db clients by two (1 ovn-controller + 1 neutron metadata agent per compute node).

For performance reasons, openstack deployments usually recommend enabling ovn-controller monitor_all=true to reduce cpu pressure on the southbound ovsdb-server. The same logic is also applied for neutron metadata agent. To properly do scale testing, it would be good for ovn-heater to simulate these connections and subscriptions.

It seems to me we're only interested in how these extra read-only clients increase load on the Southbound. If that's the case, we can simulate DB clients by running ovsdb-server relay instances. Today we already start real SB relays if configured (e.g., cluster.n_relays > 0). We can programmatically start more of these (without connecting ovn-controllers to them) and achieve the same type of load on the SB as with neutron agents. IIRC @felixhuettner had also suggested this during one of the first ovn-heater community meetings.

@felixhuettner
Copy link

felixhuettner commented Sep 25, 2023

It seems to me we're only interested in how these extra read-only clients increase load on the Southbound. If that's the case, we can simulate DB clients by running ovsdb-server relay instances. Today we already start real SB relays if configured (e.g., cluster.n_relays > 0). We can programmatically start more of these (without connecting ovn-controllers to them) and achieve the same type of load on the SB as with neutron agents. IIRC @felixhuettner had also suggested this during one of the first ovn-heater community meetings.

I think that should give us a fairly accurate measurement of the neutron-ovn-metadata-agent. it currently does not set monitor conditions, so we get a full db dump anyway.
The only thing we are loosing is the performance difference between a relay and the neutron agent when receiving updates from the ovsdb; but i don't think that should matter too much

@rjarry
Copy link

rjarry commented Sep 25, 2023

@felixhuettner
Copy link

@dceara: I think the neutron metadata agent is not read-only, it also writes to the southbound db:

https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/agent/ovn/metadata/agent.py#L193-L196

https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/agent/ovn/metadata/agent.py#L292-L293

Yep,
there is also some regular heartbeat, but most people configure it

It seems to me we're only interested in how these extra read-only clients increase load on the Southbound. If that's the case, we can simulate DB clients by running ovsdb-server relay instances. Today we already start real SB relays if configured (e.g., cluster.n_relays > 0). We can programmatically start more of these (without connecting ovn-controllers to them) and achieve the same type of load on the SB as with neutron agents. IIRC @felixhuettner had also suggested this during one of the first ovn-heater community meetings.

I think that should give us a fairly accurate measurement of the neutron-ovn-metadata-agent. it currently does not set monitor conditions, so we get a full db dump anyway. The only thing we are loosing is the performance difference between a relay and the neutron agent when receiving updates from the ovsdb; but i don't think that should matter too much

Allthough it seems like there is now some kind of monitor conditions going on here, but only for chassis https://opendev.org/openstack/neutron/src/tag/21.1.2/neutron/agent/ovn/metadata/ovsdb.py#L50-L53

@mkalcok mkalcok force-pushed the openstack-cloud-setup branch from 7d9fea0 to 2ef1fae Compare September 26, 2023 15:51
:return: None
"""

ext_net = self._create_project_net(f"ext_net_{project.uuid}", 1500)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my experience openstack environments have generally a low count of widely used external networks (sometimes just one). So i would here not generate a new external network, but rather randomly choose from a list of existing ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are two very good points thank you. I'll decouple ext_net creation from project creation.


ext_net = self._create_project_net(f"ext_net_{project.uuid}", 1500)
ext_net_port = self._add_metadata_port(ext_net, project.uuid)
provider_port = self._add_provider_network_port(ext_net)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as i know there will only ever be one provider network port per provider network. Since _add_provider_network_port sets this statically to physnet1 it should only ever be called once

@mkalcok
Copy link
Contributor Author

mkalcok commented Sep 27, 2023

@felixhuettner I decoupled external network creation from project creation and now the BaseOpenstack test creates only one ext_net and connects all projects to it.

I'm creating these suggested changes as separate commits for easier review. Please let me know if that's OK or if you'd like me to squash them in the end to 53339f9

@felixhuettner
Copy link

@felixhuettner I decoupled external network creation from project creation and now the BaseOpenstack test creates only one ext_net and connects all projects to it.

I'm creating these suggested changes as separate commits for easier review. Please let me know if that's OK or if you'd like me to squash them in the end to 53339f9

Thanks. Looks good

@mkalcok
Copy link
Contributor Author

mkalcok commented Sep 30, 2023

I'd say that this change is now almost ready for full review. Two things that are missing and I'd like your opinion on are:

  • Floating IPs and external connectivity: I'd be happy to do this in separate PR to not blow up scope of this one even more.
  • ovn-metadata-agents: Is the consensus that we can simulate metadata agents with ovn-relay nodes to create "read" load on SB DB? Since as far as I can tell the only write operations of metadata agents are keepalive messages and updates to chassis external ID's on registration?

@fnordahl
Copy link
Member

fnordahl commented Oct 2, 2023

I'd say that this change is now almost ready for full review. Two things that are missing and I'd like your opinion on are:

* Floating IPs and external connectivity: I'd be happy to do this in separate PR to not blow up scope of this one even more.

* ovn-metadata-agents: Is the consensus that we can simulate metadata agents with ovn-relay nodes to create "read" load on SB DB? Since as far as I can tell the only write operations of metadata agents are keepalive messages and updates to chassis external ID's on registration?

From my perspective I think it would be valuable to finalize this PR as-is now and mark it as ready for review once you feel comfortable with it.

The items discussed above can be worked on independently, and the work you have done so far would be a dependency.

@felixhuettner
Copy link

@felixhuettner I decoupled external network creation from project creation and now the BaseOpenstack test creates only one ext_net and connects all projects to it.

I'm creating these suggested changes as separate commits for easier review. Please let me know if that's OK or if you'd like me to squash them in the end to 53339f9

I'm not sure what has happend, but we seem to create multiple external networks again (even though they now seem to have no external connectivity)?

@mkalcok
Copy link
Contributor Author

mkalcok commented Oct 2, 2023

🤦 thank you for noticing @felixhuettner . I lost a commit that decoupled project/ext net creation when I was rebasing/force pushing local copy on a different machine

@mkalcok mkalcok marked this pull request as ready for review October 2, 2023 13:05
@mkalcok
Copy link
Contributor Author

mkalcok commented Oct 2, 2023

Lost change was recovered (thanks @fnordahl) and I think this PR is ready for review.

fnordahl and others added 16 commits October 25, 2023 12:59
Commit 1380bba moved instantiation of CMS Cluster class back to the
generic ovn_tester.py.

The `create_cluster` function is generic and actually takes almost
identical arguments as the `Cluster` class.  Let's make it part of
the `Cluster` constructor instead!

This change also makes it apparent that a separate CMS plugin class
is not necessary, so let's roll this into the CMS specifc Cluster
class.

Co-Authored-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Added more methods into OvnNbctl class that map to ovsdbapp commands:
  * Create 'DHCP_Options'
  * Set 'options' for 'DHCP_Options'
  * Enable 'Logical_Switch_Port'
  * Set IPv4 address on 'Logical_Switch_Port'

Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
Neutron utilizes 'external_ids' a lot. This change adds ability to set them
when creating objects like routers/ports/switches. Additionaly some objects
also now accept fields like 'extra_config' and 'enabled'.

All these additions are optional and should not alter previous behavior of
these methods if they are not supplied.

misc: couple type hints for easier work with the code.
Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
This implementation of Openstack CMS is capable of getting OVN to the
state before first guest VMs are added. It provisions following:
  * Project Router
  * Project's internal network
  * (Optionally) External network and routing from Project's internal network
    via the external network.

Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
Very basic Openstack test scenario that brings up three compute nodes and
provisions one Project with internal network and external connectivity.

Co-Authored-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
Test config option 'n_projects' allows specifying how many projects will
be created during the test.

Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
Attempt to evenly distribute load when assigning Gateway
Chassis to project routers.

Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
New method adds ability to perform full-mesh pings between list of ports.

Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
VMs can now be provisioned and assigned to projects.

Co-Authored-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
Based on the feedback, Openstack clouds usually contain only one (or few)
external networks. This change removes automatic per-project external network
creation and allows sharing external network between multiple projects.

Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
`n_gws_per_project` renamed to `n_chassis_per_gw_lrp` to better capture
effect of the option.

Openstack test scenario was also renamed to be more in line with naming
convention of other tests.

Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Neutron no longer applies the `mcast_flood_reports=true` option
to LSPs [0].

0: https://review.opendev.org/c/openstack/neutron/+/888127
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
While Neutron currently adds this, it has been proposed for
removal as it is not needed by OVN [0].

0: https://review.opendev.org/c/openstack/neutron/+/897489
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
@fnordahl fnordahl force-pushed the openstack-cloud-setup branch from dc2ad04 to 48bd651 Compare October 25, 2023 11:17
@fnordahl
Copy link
Member

@dceara thanks a lot for the help with the rebase. I put your proposals for adaption into the correct commits and attributed accordingly.

I redid the parts of the commits that mainly move code around to ensure no changes were lost.

I also arrived at a slightly different solution to calling the prepare_test method, hope you agree.

Looking forward to your review!

@fnordahl
Copy link
Member

I just noticed that the rebase is one commit shorter than the previous iteration, and the reason is that I have unintentionally squashed "CMS Openstack: Add ability to simulate VMs" and "OS Test: Spawn VMs and run ping in base scenario".

However, the result does look fine to me as both commits kind of addressed the same thing in iterations, so perhaps we should just keep it this way? Thoughts @mkalcok ?

@mkalcok
Copy link
Contributor Author

mkalcok commented Oct 25, 2023

yeah, I'm fine with them being squashed. I separated them for easier review as one added logic to the CMS Openstack implementation and the other used that implementation in a test scenario, but they are closely related.

@@ -5,7 +5,7 @@
import time
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for adding the type hints!

)

protocol = "ssl" if self.cluster_cfg.enable_ssl else "tcp"
# XXX: To facilitate network nodes we'd have to make bigger change
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had some discussions internally about potentially removing the ovn-fake-multinode dependency and start OVN components directly from ovn-heater. Whenever (if) we do that it might be a good moment for adding support for network nodes too.

Copy link
Collaborator

@dceara dceara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome work! Thanks a lot @mkalcok @fnordahl @booxter @rjarry @felixhuettner for all the code and reviews!

I agree with @booxter let's get this in and we can address leftovers/known issues/bugs with follow up PRs.

There is one thing I'd like to fix first before merging this but I'll wait for a reply from @mkalcok or @fnordahl on #179 (review)

Thanks,
Dumitru

Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
@mkalcok mkalcok force-pushed the openstack-cloud-setup branch from 48bd651 to 799324e Compare October 27, 2023 07:50
@dceara dceara merged commit 3ed6f0d into ovn-org:main Oct 27, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants