
Add the i32 dtype #2432

Closed · wants to merge 10 commits

Conversation

EricLBuehler (Member)

This PR adds DType::I32. Besides being a useful and more memory-efficient alternative to our I64 dtype, it is also commonly used in GPTQ and AWQ. If we were to implement loading those formats from safetensors (I have the code already for GPTQ, please let me know if that would be of interest!), they would be upcast to I64, and take up twice the memory.
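
For readers skimming the thread, here is a minimal sketch of the shape of the proposed change. It is hypothetical and abbreviated: candle's real DType enum in candle-core carries more trait impls and per-backend kernel plumbing than shown here, and the actual diff spans 10 commits.

```rust
// Hypothetical, abbreviated sketch of the proposed change; candle's actual
// DType enum (candle-core) has additional impls and backend support.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DType {
    U8,
    U32,
    I32, // the new variant this PR proposes
    I64,
    BF16,
    F16,
    F32,
    F64,
}

impl DType {
    /// Size in bytes of one element of this dtype.
    pub fn size_in_bytes(&self) -> usize {
        match self {
            DType::U8 => 1,
            DType::BF16 | DType::F16 => 2,
            DType::U32 | DType::I32 | DType::F32 => 4,
            DType::I64 | DType::F64 => 8,
        }
    }
}
```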

LaurentMazare (Collaborator)

The idea is to keep the number of supported dtypes small so the complexity stays low. We added i64 later than the others because we didn't have any signed integer type to start with, but for now I don't think we want more.

EricLBuehler (Member, Author)

Sounds good! I'll close this.

EricLBuehler deleted the dtype_i32 branch on August 18, 2024 at 18:33.
Qubitium commented on Dec 22, 2024

@LaurentMazare int32 is required for efficient awq/gptq loading and inference. Why load as int64 when int32 is optimal? (The sketch below illustrates the doubled memory footprint.) Even if little code beyond gptq/awq would use it, is that a valid reason to exclude it?
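
To make the cost concrete, a back-of-the-envelope sketch with hypothetical layer dimensions. GPTQ's qweight tensor packs eight 4-bit weights per i32, so a 4096x4096 layer stores a [512, 4096] i32 tensor; upcasting to i64 doubles the bytes.

```rust
// Back-of-the-envelope sketch of the upcast cost; dimensions are hypothetical.
fn main() {
    let (rows, cols) = (4096 / 8, 4096); // [in_features / 8, out_features]
    let elems = rows * cols;
    let as_i32 = elems * std::mem::size_of::<i32>();
    let as_i64 = elems * std::mem::size_of::<i64>(); // forced upcast
    println!("i32 storage: {:.1} MiB", as_i32 as f64 / (1024.0 * 1024.0));
    println!("i64 storage: {:.1} MiB (2x)", as_i64 as f64 / (1024.0 * 1024.0));
}
```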

@EricLBuehler I am the maintainer for GPTQModel and would be very interested in helping as much as I can with gptq integration into candle.

> (I have the code already for GPTQ, please let me know if that would be of interest!)

I would be very interested in helping get this GPTQ code of yours merged in a separate PR if possible. I have no background in Rust, but I can contribute testing and validation.
