Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: update to CUDA 12.3 #3956

Merged
merged 6 commits into from
Nov 13, 2023
Merged

build: update to CUDA 12.3 #3956

merged 6 commits into from
Nov 13, 2023

Conversation

esteve
Copy link
Contributor

@esteve esteve commented Nov 2, 2023

Description

This PR updates the ansible scripts CUDA to 12.3.

Fixes #3943

Related links

Tests performed

Notes for reviewers

Interface changes

Effects on system behavior

Pre-review checklist for the PR author

The PR author must check the checkboxes below when creating the PR.

In-review checklist for the PR reviewers

The PR reviewers must check the checkboxes below before approval.

  • The PR follows the pull request guidelines.
  • The PR has been properly tested.
  • The PR has been reviewed by the code owners.

Post-review checklist for the PR author

The PR author must check the checkboxes below before merging.

  • There are no open discussions or they are tracked via tickets.
  • The PR is ready for merge.

After all checkboxes are checked, anyone who has write access can merge the PR.

Signed-off-by: Esteve Fernandez <esteve.fernandez@tier4.jp>
@esteve esteve requested a review from xmfcx November 2, 2023 14:20
@esteve esteve mentioned this pull request Nov 2, 2023
6 tasks
@esteve esteve marked this pull request as draft November 2, 2023 14:25
@esteve
Copy link
Contributor Author

esteve commented Nov 2, 2023

Back to draft, I hadn't realized the TensorRT packages are in a separate Ansible script

…e script

Signed-off-by: Esteve Fernandez <esteve.fernandez@tier4.jp>
@esteve esteve marked this pull request as ready for review November 2, 2023 14:29
@esteve
Copy link
Contributor Author

esteve commented Nov 2, 2023

@xmfcx this is ready for review now.

@esteve esteve enabled auto-merge (squash) November 2, 2023 15:26
ansible/roles/cuda/README.md Outdated Show resolved Hide resolved
@xmfcx
Copy link
Contributor

xmfcx commented Nov 2, 2023

I will try to test it tomorrow, thank you for your efforts!

Signed-off-by: Esteve Fernandez <esteve.fernandez@tier4.jp>
@esteve
Copy link
Contributor Author

esteve commented Nov 2, 2023

@xmfcx thanks for the prompt review. I've incorporated your feedback and pushed a new commit. I'm runnning this locally, though I won't be able to fully test it, just make sure that it can be built.

@xmfcx
Copy link
Contributor

xmfcx commented Nov 2, 2023

It seems there is:

# Override amd64's settings

We might need to override the TensorRT version.

I've checked that sbsa repository contains:

But it doesn't contain:

  • tensorrt-dev_8.6.1.6-1+cuda12.0

Instead, it contains:

  • tensorrt-dev_8.6.2.2-1+cuda12.0

So, we might need to add:

  • tensorrt_version=8.6.2.2-1+cuda12.0
    to the arm64.env file.

Also we need to check where in the code it does the overriding.

@esteve
Copy link
Contributor Author

esteve commented Nov 2, 2023

@xmfcx do we need the packages from the sbsa repository? 8.6.1.6-1+cuda12.0 is in the x86_64 repository.

@esteve
Copy link
Contributor Author

esteve commented Nov 2, 2023

@xmfcx do we need the packages from the sbsa repository? 8.6.1.6-1+cuda12.0 is in the x86_64 repository.

Just to reply my own question, I think we do need the packages from sbsa for ARM-based non-Jetson platforms.

@xmfcx
Copy link
Contributor

xmfcx commented Nov 2, 2023

Signed-off-by: Esteve Fernandez <esteve.fernandez@tier4.jp>
@esteve
Copy link
Contributor Author

esteve commented Nov 3, 2023

@xmfcx I'm building this branch on an ARM virtual machine, so far it seems to be using the correct version of TensorRT for ARM (8.6.2.2-1+cuda12.0)

@esteve
Copy link
Contributor Author

esteve commented Nov 3, 2023

Spoke too soon, the cuda-nvprof-12-3 package doesn't seem to be available for ARM, only cuda-nvprof-11-7

Signed-off-by: Esteve Fernandez <esteve.fernandez@tier4.jp>
Signed-off-by: Esteve Fernandez <esteve.fernandez@tier4.jp>
@esteve
Copy link
Contributor Author

esteve commented Nov 6, 2023

@xmfcx I've built Autoware in an ARM VM and CUDA gets detected and the perception packages are built, I assume the result is correct.

@xmfcx
Copy link
Contributor

xmfcx commented Nov 13, 2023

Thanks for your efforts @esteve on this PR.
The changes look alright. To make sure nothing is wrong with the CI, I've started:

and will merge when they are done.

@esteve
Copy link
Contributor Author

esteve commented Nov 13, 2023

@xmfcx it seems that both jobs passed! 🥳

Copy link
Contributor

@xmfcx xmfcx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎊🪩🎊🪩🎊

@esteve esteve merged commit a08fc46 into main Nov 13, 2023
18 checks passed
@kaspermeck-arm
Copy link

Is it possible to downgrade to 12.2 to match https://developer.nvidia.com/embedded/jetpack?

@xmfcx
Copy link
Contributor

xmfcx commented Dec 11, 2023

@kaspermeck-arm we could, this PR didn't make any changes in the code.
Only changed the dependencies.

For Jetson devices, I assume they won't install CUDA separately since it would come with the JetPack.

And even looking at the changes here, as long as it is CUDA 12, they seem to be playing well with each other.
So keeping it same shouldn't be a problem.

cuda_version=12.3
cudnn_version=8.9.5.29-1+cuda12.2
tensorrt_version=8.6.1.6-1+cuda12.0

Do you think we should still change the version?

@kaspermeck-arm
Copy link

@xmfcx
That's good to know, so technically setting cuda_version to 12.2 wouldn't interfere with any CUDA applications in Autoware.

We wouldn't install the driver on the Jetson platforms, but we would deploy CUDA capable containers on it. The driver is backwards compatible, i.e., you can run CUDA 12.1 on the 12.2 driver. Doing it the other way around can lead to complications.

Yes, I do think we should downgrade to 12.2 but only if we don't explicitly are using any features from 12.3. This should simplify deployment on the Jetson platforms.

@oguzkaganozt @ambroise-arm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade to CUDA 12
3 participants