
Add the i32 dtype #2432

Closed · wants to merge 10 commits

Conversation

EricLBuehler (Member)

This PR adds DType::I32. Besides being a useful and more memory-efficient alternative to our I64 dtype, it is also commonly used in GPTQ and AWQ. If we were to implement loading those formats from safetensors (I have the code already for GPTQ, please let me know if that would be of interest!), they would be upcast to I64, and take up twice the memory.
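
For readers skimming the thread, here is a minimal sketch of the shape of the proposed change. It is hypothetical and abbreviated: candle's real DType enum in candle-core carries more trait impls and per-backend kernel plumbing than shown here, and the actual diff spans 10 commits.

```rust
// Hypothetical, abbreviated sketch of the proposed change; candle's actual
// DType enum (candle-core) has additional impls and backend support.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DType {
    U8,
    U32,
    I32, // the new variant this PR proposes
    I64,
    BF16,
    F16,
    F32,
    F64,
}

impl DType {
    /// Size in bytes of one element of this dtype.
    pub fn size_in_bytes(&self) -> usize {
        match self {
            DType::U8 => 1,
            DType::BF16 | DType::F16 => 2,
            DType::U32 | DType::I32 | DType::F32 => 4,
            DType::I64 | DType::F64 => 8,
        }
    }
}
```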

LaurentMazare (Collaborator)

The idea is to keep the number of supported dtypes small so the complexity stays low. We added i64 later than the others because we didn't have any signed integer type to start with, but for now I don't think we want more.

EricLBuehler (Member, Author)

Sounds good! I'll close this.

EricLBuehler deleted the dtype_i32 branch on August 18, 2024 at 18:33.
Qubitium commented on Dec 22, 2024

@LaurentMazare int32 is required for efficient awq/gptq loading and inference. Why load as int64 when int32 is optimal? (The sketch below illustrates the doubled memory footprint.) Even if little code beyond gptq/awq would use it, is that a valid reason to exclude it?
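
To make the cost concrete, a back-of-the-envelope sketch with hypothetical layer dimensions. GPTQ's qweight tensor packs eight 4-bit weights per i32, so a 4096x4096 layer stores a [512, 4096] i32 tensor; upcasting to i64 doubles the bytes.

```rust
// Back-of-the-envelope sketch of the upcast cost; dimensions are hypothetical.
fn main() {
    let (rows, cols) = (4096 / 8, 4096); // [in_features / 8, out_features]
    let elems = rows * cols;
    let as_i32 = elems * std::mem::size_of::<i32>();
    let as_i64 = elems * std::mem::size_of::<i64>(); // forced upcast
    println!("i32 storage: {:.1} MiB", as_i32 as f64 / (1024.0 * 1024.0));
    println!("i64 storage: {:.1} MiB (2x)", as_i64 as f64 / (1024.0 * 1024.0));
}
```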

@EricLBuehler I am the maintainer for GPTQModel and would be very interested in helping as much as I can with gptq integration into candle.

> (I have the code already for GPTQ, please let me know if that would be of interest!)

I would be very interested in helping get this GPTQ code of yours merged in a separate PR if possible. I have no background in Rust, but I can contribute testing and validation.
