Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

45-50% performance improvement in MPPI controller using Eigen library for computation. #4621

Open
wants to merge 57 commits into
base: main
Choose a base branch
from

Conversation

Ayush1285
Copy link
Contributor

@Ayush1285 Ayush1285 commented Aug 14, 2024

Basic Info

Info Please fill out this column
Ticket(s) this addresses #4237 #3351
Primary OS tested on Ubuntu
Robotic platform tested on NA, Optimizer Benchmark
Does this PR contain AI generated software? No

Description of contribution in a few bullet points

  • Replaced xtensor with Eigen library for computation.
  • Performance increased by 45-50% with footprint and critics enabled.
  • Currently WIP

Description of documentation updates required from your changes


Future work that may be required in bullet points

For Maintainers:

  • Check that any new parameters added are updated in docs.nav2.org
  • Check that any significant change is added to the migration guide
  • Check that any new features OR changes to existing behaviors are reflected in the tuning guide
  • Check that any new functions have Doxygen added
  • Check that any new features have test coverage
  • Check that any new plugins is added to the plugins page
  • If BT Node, Additionally: add to BT's XML index of nodes for groot, BT package's readme table, and BT library lists

Ayush1285 added 7 commits June 6, 2024 23:47
Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
… files

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Copy link
Contributor

mergify bot commented Aug 14, 2024

This pull request is in conflict. Could you fix it @Ayush1285?

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Copy link
Contributor

mergify bot commented Aug 16, 2024

@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Copy link
Contributor

mergify bot commented Aug 19, 2024

@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Copy link
Contributor

mergify bot commented Aug 19, 2024

@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

Copy link
Member

@SteveMacenski SteveMacenski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a pretty fast review, so far from complete on each detail, but a good starting point!

nav2_mppi_controller/CMakeLists.txt Outdated Show resolved Hide resolved
nav2_mppi_controller/CMakeLists.txt Outdated Show resolved Hide resolved
nav2_mppi_controller/CMakeLists.txt Outdated Show resolved Hide resolved
nav2_mppi_controller/CMakeLists.txt Show resolved Hide resolved
nav2_mppi_controller/CMakeLists.txt Outdated Show resolved Hide resolved
nav2_mppi_controller/src/critics/constraint_critic.cpp Outdated Show resolved Hide resolved
nav2_mppi_controller/src/optimizer.cpp Outdated Show resolved Hide resolved
…ator

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Copy link
Contributor

mergify bot commented Aug 20, 2024

@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Copy link
Contributor

mergify bot commented Aug 23, 2024

@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

@SteveMacenski
Copy link
Member

Let me know here when you want me to take a look again! I'm quite excited for this work - even if for no reason than to move to Eigen + 10% performance boost, since Eigen's release and support is much more known than xtensor's

@Ayush1285
Copy link
Contributor Author

Let me know here when you want me to take a look again! I'm quite excited for this work - even if for no reason than to move to Eigen + 10% performance boost, since Eigen's release and support is much more known than xtensor's

Sure, I'm trying a few optimizations and will push changes once I'm done.

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Copy link
Contributor

mergify bot commented Sep 2, 2024

@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
@Ayush1285
Copy link
Contributor Author

Ayush1285 commented Sep 3, 2024

Let me know here when you want me to take a look again! I'm quite excited for this work - even if for no reason than to move to Eigen + 10% performance boost, since Eigen's release and support is much more known than xtensor's

I've completed the migration to Eigen. But we need to make sure that functionality-wise everything is correct or not. I'll run tests and ensure that all of them are passing. Meanwhile, you can take a look at the latest changes.

Copy link
Contributor

mergify bot commented Sep 3, 2024

@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

Copy link
Member

@SteveMacenski SteveMacenski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first look through (didn't analyze in detail the math in the critics, but higher level programming items first), but generally looks good to me with some details to answer!

nav2_mppi_controller/CMakeLists.txt Show resolved Hide resolved
nav2_mppi_controller/benchmark/optimizer_benchmark.cpp Outdated Show resolved Hide resolved
nav2_mppi_controller/src/optimizer.cpp Outdated Show resolved Hide resolved
nav2_mppi_controller/src/optimizer.cpp Outdated Show resolved Hide resolved
nav2_mppi_controller/src/optimizer.cpp Outdated Show resolved Hide resolved
nav2_mppi_controller/src/optimizer.cpp Show resolved Hide resolved
nav2_mppi_controller/src/optimizer.cpp Show resolved Hide resolved
Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Copy link
Contributor

mergify bot commented Sep 9, 2024

@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

Copy link
Contributor

mergify bot commented Sep 10, 2024

This pull request is in conflict. Could you fix it @Ayush1285?

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
@Ayush1285
Copy link
Contributor Author

I do want to get this in before EoY though and I've seen nothing that would stop that from happening.

Great, thanks!!.

Yes please :-) Can't merge and harder for folks to test without it

Done.

@SteveMacenski
Copy link
Member

I have 1 report from an ARM user that the path align critic is having issues, but no other critics / issues in the general solution. I took a look at path align and don't see anything weird, but it might be worth giving that another look if you had 15-20 minutes to re-scrutinize it or any unique APIs we use there that may be having ARM issues. I don't see anything but the strides.

To be noted, for the default bringup, the PathAlign critic and Cost critic are the only that use strides.

The only thing I noted was that int strided_traj_cols = data.trajectories.x.cols() / trajectory_point_step_ + 1; --> I don't know how Eigen strides handle out of bounds or if it checks it, if we +1 and that causes attempted access after the block.

The other use in the cost critic makes sure this isn't possible with int strided_traj_cols = floor(data.trajectories.x.cols() / trajectory_point_step_);

Even at a low weight, enabling it caused erratic motion. When disabled, the performance was acceptable. Unfortunately, we had an issue with that robot today and I couldn't continue testing so we don't have any performance metrics yet. I will be on leave until January 6th, after which I’ll continue working on this. Keep you posted.

Ayush1285 and others added 2 commits December 24, 2024 20:30
@Ayush1285
Copy link
Contributor Author

I have 1 report from an ARM user that the path align critic is having issues, but no other critics / issues in the general solution. I took a look at path align and don't see anything weird, but it might be worth giving that another look if you had 15-20 minutes to re-scrutinize it or any unique APIs we use there that may be having ARM issues. I don't see anything but the strides.

To be noted, for the default bringup, the PathAlign critic and Cost critic are the only that use strides.

The only thing I noted was that int strided_traj_cols = data.trajectories.x.cols() / trajectory_point_step_ + 1; --> I don't know how Eigen strides handle out of bounds or if it checks it, if we +1 and that causes attempted access after the block.

The other use in the cost critic makes sure this isn't possible with int strided_traj_cols = floor(data.trajectories.x.cols() / trajectory_point_step_);

Even at a low weight, enabling it caused erratic motion. When disabled, the performance was acceptable. Unfortunately, we had an issue with that robot today and I couldn't continue testing so we don't have any performance metrics yet. I will be on leave until January 6th, after which I’ll continue working on this. Keep you posted.

You're correct. Problem was in the calculation of strided_traj_cols. Since, first column in the trajectory will always be selected, the correct way should be: strided_traj_cols = floor((data.trajectories.x.cols() - 1) / trajectory_point_step_) + 1;.
Subtracting 1 for the first column and then adding it in the last.
For trajectories with even no. of columns, it was picking up one extra column with garbage values. It should be fixed now.

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
@SteveMacenski
Copy link
Member

Ah, glad we caught that! I just sent that update to the ARM user that was testing this -- lets see what they say (we're all getting back on the horse from the Holidays, so I imagine it'll be next week at this point)

@rickarmstrong
Copy link

@Ayush1285 FYI I took a crack at building the eigen_mppi branch (commit aab0b55497b6fe53fa4683a32d41379feffbd79b (HEAD -> eigen_mppi, origin/eigen_mppi). Strangely, building nav2_amcl failed with

Starting >>> nav2_amcl
--- stderr: nav2_amcl                                   
In file included from /root/tmp/nav2_ayushi_ws/src/navigation2/nav2_amcl/src/amcl_node.cpp:23:
/root/tmp/nav2_ayushi_ws/src/navigation2/nav2_amcl/include/nav2_amcl/amcl_node.hpp:32:10: fatal error: message_filters/subscriber.hpp: No such file or directory
   32 | #include "message_filters/subscriber.hpp"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This was in a Docker container created from image osrf/ros jazzy-desktop-full 8ea079870e70 4 weeks ago 4.42GB. Probably just something goofy in my setup, but figured I'd let you know anyhow.

@Ayush1285
Copy link
Contributor Author

@Ayush1285 FYI I took a crack at building the eigen_mppi branch (commit aab0b55497b6fe53fa4683a32d41379feffbd79b (HEAD -> eigen_mppi, origin/eigen_mppi). Strangely, building nav2_amcl failed with

Starting >>> nav2_amcl
--- stderr: nav2_amcl                                   
In file included from /root/tmp/nav2_ayushi_ws/src/navigation2/nav2_amcl/src/amcl_node.cpp:23:
/root/tmp/nav2_ayushi_ws/src/navigation2/nav2_amcl/include/nav2_amcl/amcl_node.hpp:32:10: fatal error: message_filters/subscriber.hpp: No such file or directory
   32 | #include "message_filters/subscriber.hpp"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This was in a Docker container created from image osrf/ros jazzy-desktop-full 8ea079870e70 4 weeks ago 4.42GB. Probably just something goofy in my setup, but figured I'd let you know anyhow.

I think it is due to because you're trying to build rolling branch of nav2 on Jazzy machine. More precisely, there are lot of C style headers, which were deprecated after launch of Jazzy. For example error that you are facing is due this PR: ros2/message_filters#135
And I'm not sure if this change is back ported to Jazzy.

@mikeferguson
Copy link
Contributor


And I'm not sure if this change is back ported to Jazzy.

The tf2 headers are back ported to humble and jazzy - but I'm not sure they've sync'd yet to public debs (and even if the have, would have been in the last week or so and so you need to make sure to update your debs)

@SteveMacenski
Copy link
Member

From the ARM testing folks at Kiwibot:

We tested the fix you mentioned on simulation and on a robot today and it seems to be working fine (including the PathAlignCritic), except for some oscillations that we think might require some tuning to fix. I will be working on that. Keep you posted.

It looks like that fix did the job. I'm helping them with some more tests and metrics gathering but its looking good :-)

@jdgalviss
Copy link

Thanks for this PR! We (Kiwibot) tested it on one of our robots running ROS 2 Iron on Ubuntu 22.04 (Jetson AGX, ARM64). The compilation issues we previously encountered with xtensor are now resolved.

Using the default bringup parameters, we observed the following performance metrics:

17.1 ms average controller execution time
55–65% CPU usage on a single core during navigation
25% CPU usage on a single core when idle
0.5% overall memory usage
As @SteveMacenski mentioned, the controller’s performance has been good, and the issue with the PathAlignCritic is now fixed.

@SteveMacenski
Copy link
Member

SteveMacenski commented Jan 14, 2025

I've heard enough good news at this point (only 2 weeks late from EOY)!

@Ayush1285 please rebase / pull in main. If CI looks good, I'll merge! Thanks so much for your tireless work and effort here, this is a big improvement in performance and long term maintainability. Even better, this enables this to work well and effectively on Jetson board which were not previously possible due to their relatively weak CPUs. The metrics that @jdgalviss reports from Kiwibot are a major improvement and make it possible to use MPPI even on those boards for production without using insane amount of resources or only able to run at sub-20hz.

I can't be happier with how this came out and I look forward to any other contributions you'd be interested in helping out with in Nav2! 😄

@Ayush1285
Copy link
Contributor Author

Ayush1285 commented Jan 14, 2025

I've heard enough good news at this point (only 2 weeks late from EOY)!

@Ayush1285 please rebase / pull in main. If CI looks good, I'll merge! Thanks so much for your tireless work and effort here, this is a big improvement in performance and long term maintainability. Even better, this enables this to work well and effectively on Jetson board which were not previously possible due to their relatively weak CPUs. The metrics that @jdgalviss reports from Kiwibot are a major improvement and make it possible to use MPPI even on those boards for production without using insane amount of resources or only able to run at sub-20hz.

I can't be happier with how this came out and I look forward to any other contributions you'd be interested in helping out with in Nav2! 😄

I've merged it with the latest main. There are two unrelated test failures. One in collision monitor test:
"[ERROR] [1736830701.557335092] [collision_monitor]: Error while getting parameters: parameter 'observation_sources' is not initialized. Another one in the docking server test:
[opennav_docking-1] [INFO] [1736830869.904053683] [docking_server]: Creating bond (docking_server) to lifecycle manager. [INFO] [lifecycle_manager-2]: sending signal 'SIGINT' to process[lifecycle_manager-2]. It seems flaky failure to me.

I can't be happier with how this came out and I look forward to any other contributions you'd be interested in helping out with in Nav2! 😄

Sure, I'm open to more contributions.

Copy link
Contributor

mergify bot commented Jan 14, 2025

This pull request is in conflict. Could you fix it @Ayush1285?

@SteveMacenski
Copy link
Member

SteveMacenski commented Jan 14, 2025

Ugh, I'm sorry to do this to you, but it looks like there are merge conflicts from #4812 which changed the use of path to the actual goal pose to fix a bug in MPPI using the transformed path's end goal as the global goal and shortcutting some critics.

Otherwise, I tested and this is good to go 😄

Sure, I'm open to more contributions.

This was a pretty big undertaking - what's your feeling about working on MPPI some more or switching gears into another part of the codebase / new algorithms?

On the MPPI front, there are a few open tickets that would be really nice to hvae someone look at:

On other fronts:

  • I have some thoughts on some ideas for other controllers based on sampling of spirals and other feasible primitives that I haven't really had the time to flush out but could be really nice potential algorithms for users to have access to. For example: the graceful controller computes a smooth spiral from the robot pose to a goal pose from a set of hyper parameters. We could instead sample those hyper parameter space with the same fixed start and goal poses to find the 'best' spiral to get us from A->B minimizing some cost function of curvature, cost, length, etc. We could even have multiple time-steps of sampling as well to create more sophisticated maneuvers
  • There have been regular interests in creating AI-based costmap layers for object detection and semantic segmentation, but none of these efforts have culmanated in a ready-to-merge PR with a tested capability. This would be a high value and ROSCon worthy talk subject
  • [collision_monitor] Add temporal axis to min_points behavior #4364 - the collision monitor currently uses single-measurements to decide if the robot could be soon in collision, its been requested that it use a buffer of data so that a single erroneous measurement doesn't trigger the system. This is a pretty straight-forward sounding task that would probably involve some clever architectural changes and modifications if you're interested in a jigsaw puzzle

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Copy link
Contributor

mergify bot commented Jan 15, 2025

@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
Copy link
Contributor

mergify bot commented Jan 15, 2025

@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
@Ayush1285
Copy link
Contributor Author

Ugh, I'm sorry to do this to you, but it looks like there are merge conflicts from #4812 which changed the use of path to the actual goal pose to fix a bug in MPPI using the transformed path's end goal as the global goal and shortcutting some critics.

I've fixed the merge conflicts. Let's wait for the CI to run. The optimizer_benchmark was missing a change; it was not updated with the latest evalControl() API changes.

This was a pretty big undertaking - what's your feeling about working on MPPI some more or switching gears into another part of the codebase / new algorithms?

I would like to contribute on other fronts. Playing with new control algorithms and improvising the costmap layer seems exciting ideas to me. Currently, I'm a little short on bandwidth but we can discuss/explore potential ideas for now.

Copy link
Contributor

mergify bot commented Jan 15, 2025

@Ayush1285, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

Signed-off-by: Ayush1285 <ayush.singhftw@gmail.com>
@SteveMacenski
Copy link
Member

I would like to contribute on other fronts. Playing with new control algorithms and improvising the costmap layer seems exciting ideas to me. Currently, I'm a little short on bandwidth but we can discuss/explore potential ideas for now.

Understood - want to chat a bit on the Nav2 Slack? By no means feel limited to those thoughts, the world's our oyster 😆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants