Is text-gen-webui able to load the new meta-llama_Llama-3.2-11B-Vision? & Cannot load multimodal ext #6412
Replies: 3 comments 7 replies
-
Hey? Does anybody read this?
-
Even if you could load it, wouldn't oobabooga also need to add support for importing images for it to do anything? As I understand it, the Llama 3.2 "vision" models do "image to text", basically the opposite of Stable Diffusion. So you'd drag a photo into the (hypothetical) web UI, and then you could ask the text engine questions about it.
-
You never gave your command line. Which multimodal pipeline are you trying to use? The wiki page lists command lines invoking the different pipelines, such as: Did you just retarget the model file like so? If so, did you first try the original command line from the wiki as a baseline, to see whether that worked?
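For reference, the wiki's multimodal examples follow this general shape. This is a hypothetical sketch, not a verified invocation for the Llama 3.2 model; the model directory and pipeline name below are placeholder values taken from the older LLaVA examples:

```shell
# Hypothetical example modeled on the multimodal extension wiki.
# The model directory and --multimodal-pipeline value are placeholders;
# substitute whatever your install and the wiki actually list.
python server.py \
  --model wojtab_llava-7b-v0-4bit-128g \
  --multimodal-pipeline llava-7b \
  --extensions multimodal
```

The point of trying the wiki's exact command first is to separate "the multimodal extension is broken in my install" from "this particular model isn't supported by any pipeline".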
-
I have the model (downloaded manually), but I cannot load it, because Transformers doesn't recognize its architecture, called "mllama". I understand the 'm' in "mllama" means multimodal, so I'd probably need the multimodal extension, but the multimodal extension won't load either, failing with these errors:
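For what it's worth, "architecture not recognized" errors like this are usually a version problem: the "mllama" architecture only exists in sufficiently new Transformers releases, so an older pinned install fails before any extension even runs. A minimal sketch of that check, assuming (based on the Llama 3.2 release timeframe) that v4.45.0 is the cutoff:

```python
# Sketch: decide whether an installed transformers version should recognize
# the "mllama" architecture. The cutoff (4, 45, 0) is an assumption based on
# the Llama 3.2 release timeframe, not a verified changelog entry.

def parse_version(v: str) -> tuple:
    """Turn a version string like '4.44.2' into (4, 44, 2) for comparison."""
    return tuple(int(part) for part in v.split(".")[:3])

MLLAMA_MIN = (4, 45, 0)  # assumed first release with mllama support

def supports_mllama(installed: str) -> bool:
    """True if the given transformers version should recognize 'mllama'."""
    return parse_version(installed) >= MLLAMA_MIN
```

In practice you'd compare `transformers.__version__` against the cutoff, and upgrade the package bundled with text-generation-webui if it's older.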