Why doesn't my Kubernetes node recognize the GPU after successfully installing my drivers and Containerd? #788
Comments
First, the value of default_runtime_name in the containerd config should be nvidia. After setting the value, follow the documentation to enable GPU support in Kubernetes. It takes just one command:
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.15.0/deployments/static/nvidia-device-plugin.yml
Remember to restart containerd and kubelet afterwards.
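As a quick sanity check for the advice above, here is a minimal sketch that reports whether a containerd config file sets nvidia as the default runtime. The helper name check_default_runtime is hypothetical (it is not part of containerd or any NVIDIA tooling), and the default path is the /etc/containerd/config.toml mentioned in the issue description.

```shell
#!/bin/sh
# Sketch only: check_default_runtime is a hypothetical helper, not an official tool.
# It prints "ok" if the given containerd config contains an uncommented
# default_runtime_name = "nvidia" line, and "not-default" otherwise.
check_default_runtime() {
    config="${1:-/etc/containerd/config.toml}"
    if grep -Eq '^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"nvidia"' "$config"; then
        echo "ok"
    else
        echo "not-default"
        return 1
    fi
}

# Example (assumes a systemd-managed host; adjust to your setup):
#   check_default_runtime /etc/containerd/config.toml \
#     && sudo systemctl restart containerd kubelet
```

The check only looks at the text of the file, so it cannot tell whether containerd has actually reloaded the config. That is why the restart of containerd and kubelet still matters.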
Same error with a Tesla card running a Rancher deployment:
EDIT: After reviewing more documentation and fixing some issues in my config file, I get a different error:
Has this problem been solved?
Can someone share any findings on this issue? I spent the entire last weekend trying to get this working, but I can't seem to make it work.
I patched the DaemonSet:
kubectl -n nvidia-device-plugin patch ds nvdp-nvidia-device-plugin \
  --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args", "value": ["--device-discovery-strategy=tegra"]}]'
This is equivalent to manually specifying the detection strategy to
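For anyone who wants to reuse the patch with a different strategy value, the JSON payload can be generated instead of typed by hand. This is a rough sketch: build_discovery_patch is a hypothetical helper name, and the namespace and DaemonSet name in the usage comments are copied from the command above and may differ in your installation.

```shell
#!/bin/sh
# Sketch only: build_discovery_patch is a hypothetical helper.
# It prints a JSON Patch document that sets the first container's args
# to a single --device-discovery-strategy flag with the given value.
build_discovery_patch() {
    strategy="$1"
    printf '[{"op": "add", "path": "/spec/template/spec/containers/0/args", "value": ["--device-discovery-strategy=%s"]}]' "$strategy"
}

# Usage (needs cluster access; names are taken from the comment above):
#   kubectl -n nvidia-device-plugin patch ds nvdp-nvidia-device-plugin \
#     --type='json' -p="$(build_discovery_patch tegra)"
# Inspect the resulting args afterwards:
#   kubectl -n nvidia-device-plugin get ds nvdp-nvidia-device-plugin \
#     -o jsonpath='{.spec.template.spec.containers[0].args}'
```

Note that the "add" operation on the args path replaces the whole args list if one already exists, so any flags that were set before would have to be included in the value.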
Thanks @MasonXon! It worked.
I had to set default_runtime_name to nvidia, as @ZYWNB666 recommended. After manually editing
Adding
This worked for me, but GPUs show up on all my nodes, even the ones without GPUs.
Maybe
1. Quick Debug Information
2. Issue or feature description
Why doesn't my Kubernetes node recognize the GPU after successfully installing my drivers and Containerd?
This is the content of /etc/containerd/config.toml.
This is the content of nvidia-smi
but