Replies: 2 comments
-
Already exists.
-
There is no way to run out of RAM: it is fairly cheap, and you can easily put something like 256 GB into your PC if you use registered ECC RAM. The problem is that RAM bandwidth is not enough on an x86 CPU, since the bandwidth of a single memory stick, even with DDR5, is around 30 GB/s. The 7 GB/s of an NVMe drive is almost nothing useful unless you are fine with 0.1 tokens/s, even on a 70B model. You can split the model over several GPUs, but even there you pay some penalty for transferring data between those GPUs.
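To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every weight has to be read once per generated token, so throughput is capped at bandwidth divided by model size. The function name and the 1-byte-per-parameter (8-bit) figure are my own assumptions for illustration, not measurements.

```python
def max_tokens_per_second(bandwidth_gb_s, params_billion, bytes_per_param):
    """Upper bound on tokens/s if all weights are streamed once per token."""
    model_gb = params_billion * bytes_per_param  # billions of params * bytes each = GB
    return bandwidth_gb_s / model_gb

# Assumed: 70B model stored at 1 byte per parameter, so about 70 GB of weights.
nvme_bound = max_tokens_per_second(7, 70, 1)   # NVMe at 7 GB/s
ram_bound = max_tokens_per_second(30, 70, 1)   # one DDR5 stick at about 30 GB/s

print(f"NVMe: {nvme_bound:.2f} tokens/s, RAM: {ram_bound:.2f} tokens/s")
```

With those assumptions the NVMe case lands at about 0.1 tokens/s, and a single DDR5 stick is only around four times better. Real throughput would be lower still once compute and transfer overhead are included.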
-
We are able to split a model in half, run the first part on one PC, then transfer the activations and run the rest of the model on another PC. If we only have one PC, we could offload the unused part into system RAM. However, let's imagine that our model is so large that this approach is not feasible, or we simply don't have enough space in system RAM. Instead, we could read the current model part from storage. This would slow the process down, but at least the model would be able to run regardless. In the future, models will become much larger, requiring more system RAM or VRAM. If we consider NVMe storage, switching model parts could become quite fast. With PCIe Gen 4, we can read at 7 GB/s, meaning we can transfer roughly 21 GB of data into VRAM in just 3 seconds. With PCIe Gen 5, we will achieve double that speed. Additionally, NVMe drives are much easier to upgrade than a GPU or its VRAM. And if the CPU becomes a bottleneck, we might be able to use DirectStorage technology to load the model into VRAM even faster, bypassing the CPU. What do you think about that?
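As a quick sanity check on the load times, here is a small sketch. It assumes the drive sustains its rated sequential read speed and that nothing else in the path is slower; `load_seconds` is just a hypothetical helper name.

```python
def load_seconds(part_gb, read_gb_s):
    """Time to read one model part from storage at a sustained read speed."""
    return part_gb / read_gb_s

# Assumed: a 24 GB part (one full 24 GB GPU) and rated sequential read speeds.
gen4_seconds = load_seconds(24, 7)    # PCIe Gen 4 NVMe, 7 GB/s
gen5_seconds = load_seconds(24, 14)   # PCIe Gen 5 NVMe, double the speed

print(f"Gen 4: {gen4_seconds:.1f} s, Gen 5: {gen5_seconds:.1f} s")
```

So filling 24 GB of VRAM takes a bit over 3 seconds on Gen 4 and under 2 seconds on Gen 5, and that cost is paid every time a part is swapped in.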
TL;DR:
Split the model into many parts. Only load one part into VRAM at a time. Use NVMe drives and DirectStorage to make switching between parts faster. Just save the activations and similar needed data for the next part. Do this for all parts. This would make it possible to run giant models on low-VRAM GPUs.
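To make the idea concrete, here is a toy sketch in plain Python. It is not a real inference engine: each "part" is a tiny weight matrix stored in its own file, reading the file stands in for the NVMe to VRAM copy, and the helper names (`save_parts`, `run_streamed`) are made up for illustration. Only one part is held in memory at a time, and only the activation is carried forward.

```python
import os
import pickle
import tempfile

def matvec(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def save_parts(parts, directory):
    """Write each model part to its own file and return the paths in order."""
    paths = []
    for i, weights in enumerate(parts):
        path = os.path.join(directory, f"part_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(weights, f)
        paths.append(path)
    return paths

def run_streamed(paths, activation):
    """Run the parts in order, keeping only one part loaded at a time."""
    for path in paths:
        with open(path, "rb") as f:
            weights = pickle.load(f)  # stands in for the storage -> VRAM load
        activation = matvec(weights, activation)  # output feeds the next part
        del weights  # release this part before the next one is loaded
    return activation

with tempfile.TemporaryDirectory() as tmp:
    parts = [
        [[1, 0], [0, 2]],  # part 0: doubles the second component
        [[0, 1], [1, 0]],  # part 1: swaps the two components
    ]
    paths = save_parts(parts, tmp)
    streamed_result = run_streamed(paths, [3, 4])

print(streamed_result)
```

Note that for text generation the whole loop has to run again for every new token, so every part gets reloaded per token unless several tokens or requests are batched through a part while it is loaded.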