torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable #2652
Comments
Volta is not supported since 0.14.
Thanks for your reply. Is there another way to run Qwen2.5 with TensorRT-LLM on a V100 GPU?
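As a quick sanity check for the architecture question, here is a minimal sketch (assuming PyTorch is available in the container) that prints the compute capability PyTorch sees; Volta cards such as the V100 report SM 7.0, which TensorRT-LLM >= 0.14 no longer supports:

```python
# Print the compute capability of GPU 0. TensorRT-LLM >= 0.14 dropped Volta
# (SM 7.0), so a V100 will show up here as below the supported range.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0: SM {major}.{minor}")  # V100 -> SM 7.0, RTX 4090 -> SM 8.9
```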
@nv-guomingz Hi, I hit the same issue with this command on an RTX 4090:

```bash
python3 /tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py --model_dir /root/7b \
    --output_dir /root/converted/7b/f16/1gpu-int4_gptq \
    --dtype auto \
    --use_weight_only \
    --tp_size 1 \
    --weight_only_precision int4_gptq
```

```text
Traceback (most recent call last):
  File "/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 335, in <module>
    main()
  File "/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 327, in main
    convert_and_save_hf(args)
  File "/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 283, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 290, in execute
    f(args, rank)
  File "/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 275, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(model_dir,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/qwen/model.py", line 438, in from_hugging_face
    loader.generate_tllm_weights(model, arg_dict)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 391, in generate_tllm_weights
    self.load(tllm_key,
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 305, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/quantization/layers.py", line 1067, in postprocess
    return postprocess_weight_only_groupwise(tllm_key, weights, torch_dtype,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/quantization/functional.py", line 937, in postprocess_weight_only_groupwise
    torch.cat(weights[i::len(weights) // 3], dim=1)
TypeError: expected Tensor as element 0 in argument 0, but got NoneType
Exception ignored in: <function PretrainedModel.__del__ at 0x7f24e388fd80>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/modeling_utils.py", line 607, in __del__
    self.release()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/modeling_utils.py", line 604, in release
    release_gc()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_utils.py", line 533, in release_gc
    torch.cuda.ipc_collect()
  File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 968, in ipc_collect
    _lazy_init()
  File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 338, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable
CUDA call was originally invoked at:
  File "/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 7, in <module>
    from transformers import AutoConfig
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
```
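For what it's worth, the root failure here is the `TypeError` in `postprocess_weight_only_groupwise`: the weight loader handed `None` where it expected one of the groupwise-quantization tensors, and the later `DeferredCudaCallError` is just fallout from the interrupted teardown in `__del__`. As a first diagnostic, here is a minimal sketch (assuming the checkpoint ships as safetensors shards; the path is the one from the command above) to list which GPTQ tensors each layer actually has:

```python
# Minimal sketch: list the groupwise-quantization tensors per shard so you can
# spot a layer that is missing one of them. Assumes the checkpoint is stored
# as safetensors shards under MODEL_DIR (path taken from the command above).
import glob
from safetensors import safe_open

MODEL_DIR = "/root/7b"

for shard in sorted(glob.glob(f"{MODEL_DIR}/*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        for key in f.keys():
            # GPTQ layers usually carry qweight/qzeros/scales (and often g_idx);
            # if one of the three is absent, the loader can pass None downstream
            # and torch.cat fails exactly as in the traceback.
            if key.endswith((".qweight", ".qzeros", ".scales")):
                print(shard, key)
```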
System Info
- Libraries
  - TensorRT-LLM branch: tag 0.16.0
  - CUDA version: 12.1
  - Container used: nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
  - NVIDIA driver version: 530.30.02
Who can help?
@Tracin @kaiyux @byshiue
Information

Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
1. Start from the container nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3.
2. Download Qwen2.5-32B-Instruct-GPTQ-Int4.
3. Run convert_checkpoint.py with:

```bash
python3 convert_checkpoint.py --model_dir /root/models/Qwen2.5-32B-Instruct-GPTQ-Int4/ --output_dir /root/checkpoint/qwen2.5 --dtype float16 --use_weight_only --weight_only_precision int4_gptq --per_group --tp_size 2
```
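Before converting, it can also help to confirm that the checkpoint's quantization settings match the converter flags (`--weight_only_precision int4_gptq --per_group`). A minimal sketch reading the Hugging Face config, with the path taken from the command above:

```python
# Read the quantization settings from the HF config to confirm the checkpoint
# really is group-wise GPTQ int4 before running convert_checkpoint.py.
import json

with open("/root/models/Qwen2.5-32B-Instruct-GPTQ-Int4/config.json") as f:
    cfg = json.load(f)

qcfg = cfg.get("quantization_config", {})
print(qcfg.get("quant_method"), qcfg.get("bits"), qcfg.get("group_size"))
# GPTQ int4 checkpoints typically report: gptq 4 128
```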
Expected behavior
Conversion succeeds.
Actual behavior
Conversion fails with the DeferredCudaCallError in the issue title (see the traceback above).
Additional notes
None.