0.43.2: significant QLoRA mem savings due to bug fix, CUDA 12.5 support #1291

Titus-von-Koeller announced in Announcements

Titus-von-Koeller
Jul 23, 2024
Maintainer

0.43.2

This release is quite significant, as the QLoRA bug fix has big implications for higher seqlen and batch sizes.

For each sequence (i.e. batch size increase of one) we expect memory savings of:

405B: 39GB for seqlen 1024, and 4888GB for seqlen 128k
70B: 20.1GB for seqlen 1024, and 2516GB for seqlen 128k

Activations are unnecessary for frozen parameters, yet the now-fixed bug caused memory for them to be allocated anyway.
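As a rough sanity check on the figures above, here is a minimal Python sketch that derives a per-token saving from the 128k data points and scales it linearly with seqlen and batch size. It rests on two assumptions that the release notes do not state: that "128k" means 128,000 tokens, and that the savings scale linearly (the reported numbers are consistent with both). The names `REPORTED_GB`, `gb_per_token`, and `estimated_savings_gb` are hypothetical helpers for illustration and are not part of bitsandbytes.

```python
# Per-sequence savings quoted in the release notes, in GB.
# ASSUMPTION: "128k" is read as 128,000 tokens.
REPORTED_GB = {
    "405B": {1024: 39.0, 128_000: 4888.0},
    "70B": {1024: 20.1, 128_000: 2516.0},
}


def gb_per_token(model: str, seqlen: int = 128_000) -> float:
    """Savings per token, derived from one reported data point."""
    return REPORTED_GB[model][seqlen] / seqlen


def estimated_savings_gb(model: str, seqlen: int, batch_size: int = 1) -> float:
    """Linear estimate (ASSUMPTION): savings grow with seqlen and batch size."""
    return gb_per_token(model) * seqlen * batch_size


if __name__ == "__main__":
    # Compare the linear estimate at seqlen 1024 with the reported value.
    for model, points in REPORTED_GB.items():
        estimate = estimated_savings_gb(model, 1024)
        print(f"{model}: estimated {estimate:.1f} GB, reported {points[1024]} GB")

    # Hypothetical workload: 70B model, seqlen 4096, batch size 8.
    print(f"70B, seqlen 4096, batch 8: {estimated_savings_gb('70B', 4096, 8):.0f} GB")
```

The estimate is only as good as the linearity assumption; real savings will depend on the model architecture and training setup.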

Improvements:

docs: FSDP+QLoRA and CPU install guide ([docs] Clarify FSDP-QLoRA #1211, cpu install guide #1227, thanks @stevhliu)
Add CUDA 12.5 and update 12.4 builds (Add CUDA 12.5 and update 12.4 builds #1284)

Bug Fixes:

4bit getstate and 8bit deepcopy (FIX_ Prevent __getstate__ from mutating Params4bit #1230, FIX Make Int8Params deepcopy-able #1231, thanks @BenjaminBossan)
missing optimizers in str2optimizer32bit (Add "lamb" to str2optimizer32bit #1222, thanks @EtienneDosSantos)
CUDA 12.5 build issue (Fix CUDA 12.5 build issue #1273, thanks @HennerM)
fix for min_8bit_size functionality in Optimizer base classes (Edenzzzz's fix for min_8bit_size functionality in Optimizer base classes #1286, thanks @Edenzzzz)
QLoRA mem bug (chore: delete useless buffered activation #1270, thanks @Ther-nullptr)
tests for cpu only platforms (Fixed tests for cpu only platforms #1259, thanks @galqiwi)
restoration of quant_storage for CPU offloading (Fixes for quant_storage and CPU offloading #1279)
optim update error with non-contiguous grads/params (deepspeed) (Fixed optim update error with non-contiguous grads/params #1187)

This discussion was created from the release 0.43.2: significant QLoRA mem savings due to bug fix, CUDA 12.5 support.
