
llama.cpp error: 'llama_model_loader: failed to load model' on Snapdragon X Elite with Q4_0_4_8 models #306

Open
JakoDel opened this issue Jan 15, 2025 · 1 comment

Comments


JakoDel commented Jan 15, 2025

This happens with everything set to default: ctx 4096, 0 layers offloaded to the GPU, 9 threads, flash attention disabled, etc.

LM Studio 0.3.6

🥲 Failed to load the model

Failed to load model

llama.cpp error: 'llama_model_loader: failed to load model from C:\Users\user\.lmstudio\models\bartowski\QwQ-32B-Preview-GGUF\QwQ-32B-Preview-Q4_0_4_8.gguf'

Edit: this seems to be an issue with all Q4_0_4_8 models, as it also happens with Llama 3.2 3B. The 3B Q8 runs painfully slowly, but it does work.

@JakoDel JakoDel changed the title llama.cpp error: 'llama_model_loader: failed to load model' on Snapdragon X Elite with QwQ Q4_0_4_8 llama.cpp error: 'llama_model_loader: failed to load model' on Snapdragon X Elite with Q4_0_4_8 models Jan 15, 2025

MovGP0 commented Jan 15, 2025

I have a Snapdragon X Elite myself, running llama.cpp v1.8.0.

The Q4_0 and Q4_K_M quantizations seem to work fine, while Q4_0_4_8 does not.
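To check whether this is the LM Studio wrapper or the upstream loader, a minimal repro against the llama.cpp C API might look like the sketch below (untested, and only a sketch: depending on the llama.cpp version the entry point may be `llama_model_load_from_file` instead of `llama_load_model_from_file`):

```c
// Minimal sketch: try to load the failing GGUF directly through llama.cpp.
// The model path is the one from the error message above.
#include <stdio.h>
#include "llama.h"

int main(void) {
    llama_backend_init();

    struct llama_model_params params = llama_model_default_params();
    params.n_gpu_layers = 0; // CPU only, matching the settings in the report

    struct llama_model * model = llama_load_model_from_file(
        "C:\\Users\\user\\.lmstudio\\models\\bartowski\\QwQ-32B-Preview-GGUF\\QwQ-32B-Preview-Q4_0_4_8.gguf",
        params);

    if (model == NULL) {
        // A NULL return is what surfaces as "llama_model_loader: failed to load model"
        fprintf(stderr, "failed to load model\n");
        llama_backend_free();
        return 1;
    }

    printf("model loaded OK\n");
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

If this returns NULL for the Q4_0_4_8 file but loads the Q4_0 and Q4_K_M files fine, the problem is in the upstream loader rather than in LM Studio.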

Note

Most models from Hugging Face do not work. Models under "Staff Picks" that are not marked as "Likely too large for this machine" seem to work fine.
