gemma 2 convert_checkpoint uses more GPU RAM than needed #2647
Labels: bug (Something isn't working), Investigating, LLM API/Workflow, triaged (Issue has been triaged by maintainers)
System Info
A100
Who can help?
@kaiyux
@byshiue
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
The model is loaded once, so a single copy occupies GPU memory regardless of world size.
Actual behavior
The model is loaded world-size times, once per rank, multiplying GPU memory usage by the world size.
Additional notes
The problem is in lines 254 to 267 of https://github.com/NVIDIA/TensorRT-LLM/blob/v0.16.0/examples/gemma/convert_checkpoint.py: the checkpoint is loaded once per rank instead of once overall.
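A minimal sketch of the pattern being reported, with hypothetical stand-in functions (`load_model`, `convert_rank` are illustrations, not the actual names in convert_checkpoint.py). The assumption is that the script loads the checkpoint inside a per-rank loop, so memory scales with world size; hoisting the load out of the loop keeps a single copy:

```python
load_count = 0  # counts how many full-model loads happen

def load_model():
    """Stand-in for the expensive full-checkpoint load (hypothetical)."""
    global load_count
    load_count += 1
    return {"weights": "..."}  # placeholder for the loaded state dict

def convert_rank(model, rank, world_size):
    """Stand-in for per-rank weight conversion/sharding (hypothetical)."""
    return f"rank {rank}/{world_size} shard"

world_size = 4

# Reported pattern: one load per rank -> world_size copies of the model.
load_count = 0
for rank in range(world_size):
    model = load_model()  # reloads the full model on every iteration
    convert_rank(model, rank, world_size)
assert load_count == world_size

# Expected pattern: load once, reuse the same object for every rank.
load_count = 0
model = load_model()  # single load, single copy in memory
for rank in range(world_size):
    convert_rank(model, rank, world_size)
assert load_count == 1
```

With the load hoisted out of the loop, peak memory is one model copy plus per-rank conversion buffers, instead of world-size full copies.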