Intel Arc thread #3761
This thread is dedicated to discussing the setup of the webui on Intel Arc GPUs.
You are welcome to ask questions as well as share your experiences, tips, and insights to make the process easier for all Intel Arc users.
OK, so some notes:
|
Sorry, this is going to be a somewhat long post. I've been looking into this a bit, and unlike other areas of user-facing ML, the LLM community has far more limited options for getting anything Intel working easily at full speed. In the image-generation space, for example, it's much easier to slot in Intel's Extension for PyTorch (IPEX) because every project uses PyTorch directly one way or another, and the extension is designed to drop into an existing PyTorch codebase. In stark contrast, backends in the LLM space don't use PyTorch directly: there's a lot of lower-level C/C++ and custom libraries, driven by performance considerations and the amount of RAM these models need, which until recently was all but unavailable to the average consumer. So there is no "easy" option to slot something in, because there's nothing like PyTorch in the picture.
That wouldn't be a problem if there were a lower-level solution, but the real issue is that Intel is not taking the same path as AMD when it comes to CUDA compatibility. They have pursued a different strategy as a hardware company for the last couple of years: they consolidated all their software and unified it under oneAPI, with the intention that you write something once and deploy it everywhere in their ecosystem. That spans everything from higher-level pieces like Intel's Extensions for PyTorch and TensorFlow, through middleware libraries like oneMKL and oneDNN, down to Intel's compilers and runtime. As a result, Intel provides nothing like HIP (there is a community project called chipStar trying that approach, but it still seems too early; when I tried it, it wasn't ready to tackle complex projects).
What Intel intends is for people to port their software directly from CUDA to SYCL, a Khronos standard that is basically OpenCL expressed in modern C++, and they provide an automatic tool to port over CUDA code (an illustrative invocation is sketched below). The idea is that the converted output can then, with fairly little effort, be adapted to use their SYCL extensions via DPC++ and their SYCL-based libraries, and from there target everything Intel makes, from CPUs and GPUs to FPGAs and custom AI accelerators. SYCL then compiles down to Level Zero, the API that actually runs on Intel devices, or, as Codeplay announced last year, to AMD ROCm and NVIDIA CUDA, with OpenCL as the fallback everyone supports.
Because of all this, I'd say it would take serious effort to get Intel GPUs working at full speed for anything at the moment. That's not to say it's impossible, but it would take either a new software project to build a backend or a large patch to existing backends. I do see where Intel is coming from, and if their vision actually works, things wouldn't be so difficult to deal with, given a genuine "write once, run anywhere" approach.
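For illustration only (not part of the original post): the automatic converter Intel ships with the oneAPI Base Toolkit is the DPC++ Compatibility Tool, invoked as dpct. The paths and file names below are hypothetical.
```bash
# Hypothetical example: migrate a small CUDA source tree to SYCL with Intel's
# DPC++ Compatibility Tool. Directory and file names are made up.
source /opt/intel/oneapi/setvars.sh
dpct --in-root=./cuda-src --out-root=./sycl-out ./cuda-src/kernels.cu
```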
But as things stand, it's not proven enough for people to make that effort, and it is largely incompatible with the CUDA and ROCm efforts even though the APIs roughly do the same thing. OpenCL will get Intel GPU users roughly halfway there, but it will never be as optimized as CUDA/ROCm, and even if CLBlast could optimize its existing OpenCL code for Intel GPUs tomorrow, the extra effort needed to claw back that last portion of performance is a pretty dim prospect in my opinion. I have no idea what can be done about that in any planned fashion, but that seems to be the situation at the moment. |
It appears that HF Transformers might support XPU now (huggingface/transformers#25714), which would mean that even if nothing else works, this might. (No quants because there's no bitsandbytes support, but that also seems to be in the works: bitsandbytes-foundation/bitsandbytes#747.) |
I have added an "Intel Arc" option to the one-click installer that installs the appropriate Pytorch version: 0306b61 The question is if it works when you try to generate text. A good small model for testing is GALACTICA 125M loaded through the transformers loader. |
Keep in mind that native Windows does not work yet because Intel botched their release process, and I suspect most people wanting to try this would be on it. So it's Linux and WSL 2 only for now. Earlier versions of the Windows PyTorch pip package also lack Ahead-of-Time compilation, which makes running the first pass of anything painful. See intel/intel-extension-for-pytorch#398 and intel/intel-extension-for-pytorch#399 for more information. |
Intel always manages to botch something when it comes to Arc, so I'm not surprised. Will test this out once I get my WSL2 install working again. |
This doesn't work; it checks if CUDA is available and then falls back to the CPU rather than trying the extension. |
For now, there are unofficial Windows pip packages available here, from one of the WebUI contributors, that address both issues I mentioned above for getting IPEX working optimally on native Windows. Install at your own risk; they are not from Intel and not official. |
Intel Extension for PyTorch supports only one specific PyTorch version. If we change to it in the one-click installer file, it downloads correctly, but the code then installs the standard requirements file, which overwrites the existing supported version, and the system is unable to use the Intel GPU. Can anyone provide a workaround for this problem? We need to check which PyTorch versions are compatible with the intel-extension-for-pytorch module and download only those versions. |
Changed one_click.py so that it downloads and installs the (hopefully) correct pytorch and torch packages, and created a requirements.txt, which may or may not be correct, for Intel Arc since there was none; I also added the calls for them in one_click.py. It downloads and installs the packages, but I am stuck at "Installing extensions requirements". As soon as this part starts, it seems to switch back to CPU (!?), installs the nvidia packages, and uninstalls the Intel torch versions. Update: it looks like the requirements files in the various extensions subfolders request the nvidia packages as dependencies of their required packages. |
New all-in-one Pytorch for Windows packages are available here which is preferable to the other packages I linked earlier as they had dependencies which couldn't easily be satisfied without a requirements.txt detailing them. There does seem to be a bug in the newest Windows drivers as seen in intel/intel-extension-for-pytorch#442, you have to revert to something older than version 4885. Version 4676 here is recommended as that was what was used to build the pip packages. |
Wouldn't it be easiest to make an option to compile llama.cpp with CLBlast? |
Hello :) transformers is giving me an error. After another clean install, I now have this error:
|
https://github.com/intel-analytics/BigDL/tree/main/python/llm
Can this be used with the webui and Intel Arc GPUs? |
Seems that Intel has broken the pytorch extension for xpu repo, and it's pointing to an HTTP site instead of HTTPS. But I'm also seeing other errors related to the PyTorch version: |
Hello, does it currently work with Intel Arc (on Arch Linux) without much of a problem? I can run Vladmir's automatic1111 on this computer, so I think this could also run, but I am not sure. PS: I ran the installer and it exited with the following error:
|
As of right now (2023-11-27) Intel's instructions to install their pytorch extension do not work. In order to get the three necessary wheel files (torch 2.0.1a0, torchvision 0.15.2a0, intel_extension_for_pytorch 2.0.110+xpu) I had to download them as files from the URL provided, then install them with pip. This is not enough to get ARC support working. The answer still seems to be "it should work, in theory, but nobody's actually done it yet". |
Isn't that a stupid move from Intel? I mean, Intel should have done their best to make their GPUs work with the latest AI stuff and help developers achieve it, instead of focusing on games. These days, people constantly talk about AI, not about triple-A 3D games. This kind of constant frustration with AI apps makes me think about switching to NVIDIA (if they fix the damn Wayland problem). Anyway, please let us know when it works again. |
The packages are there at https://developer.intel.com/ipex-whl-stable-xpu which you can browse, pip just isn't picking them up for whatever reason now with the URL. You need to manually install the packages or directly link the packages that are needed for install. For my Linux install, I had to do the following:
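The exact commands from that comment aren't reproduced here. As a hedged sketch only: download the three wheels named earlier in the thread from the index page in a browser, then install them directly with pip. The wheel file names below are placeholders.
```bash
# Hedged sketch; wheel file names are placeholders and depend on your OS and Python version.
pip install ./torch-2.0.1a0*.whl \
            ./torchvision-0.15.2a0*.whl \
            ./intel_extension_for_pytorch-2.0.110+xpu*.whl
```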
The package versions needed for install will vary depending on what OS platform and Python version is being used on your machine. |
It says that the environment is externally managed and try |
As of right now, there are 3 possible ways to get this to work with ARC GPUs:
|
As I posted in #3761 (comment), Windows does work with Intel Extension for PyTorch, but you need to install a third-party package since Intel does not provide one at this time. The latest Windows drivers now work too. Intel has stated in the issue tracker on GitHub that they will provide Windows packaging soon. IPEX is also due for an update soon. |
I was under the impression there were still driver issues, but if it works now that's great. |
I'm not sure if this is the right place to post this. I receive the below error after installing OobaBooga using the default Arc install option on Windows. The install seemed to go well, but running it results in the below DLL load error. Other threads that mentioned this loading error suggested it might be a PATH issue. I tried adding a few paths to the OS environment but couldn't resolve it. Any suggestions? It's an Arc A770 on Windows 10, Intel® Graphics Driver 31.0.101.5081/31.0.101.5122 (WHQL Certified). I also tried rolling back to driver 4676 and doing a clean install with the same results. Some of the paths I added were those listed here. I'm also not seeing any of the DLLs listed at that link in those directories. Instead, I have intel-ext-pt-gpu.dll and intel-ext-pt-python.dll in "%PYTHON_ENV_DIR%\lib\site-packages\intel_extension_for_pytorch\bin" and no DLLs in "%PYTHON_ENV_DIR%\lib\site-packages\torch\lib". backend_with_compiler.dll is there.
|
I updated the code and ran it again (did not do anything else). This time, it passed the previous crash "No matching distribution found for torch==2.0.1a0", but after downloading a lot of stuff, it crashed with the following. If I run the script again, I get the same output as below.
|
@HubKing Run "source /opt/intel/oneapi/setvars.sh" and try again. If you don't have it, make sure to install the oneAPI Basekit. |
#5191 would fix most of the env issues for IPEX. |
It builds now, but on starting I get the attached errors. The Nvidia version runs just fine (same version, rebuilt; both builds tested from the C drive; logs are from the D-drive build but show the same errors). Running Windows Server 2019. |
Same here. Adding the 'share' flag does remove the localhost error message, but when I try to connect through localhost or even a gradio link, it loads a blank screen. Basically the links work, but there's nothing on them. |
It now builds and the interface loads from the main branch version. Not sure how to run models on the card though; AWQ and GPTQ don't work at all and error out, and GGUF just runs on the CPU. |
I'm running an Intel Arc A770 as a non-display GPU on Ubuntu 23.10. (Intel i7-13700k handles the display.) Selecting the Intel GPU option during oobabooga's first run did not load models to the GPU. In case anyone else experiences this problem, here's what worked for me. This assumes the following in Ubuntu:
Intel suggests several different ways to initialize oneAPI. Per their directions, I added the following line to
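The file and exact line aren't quoted above; judging from the note in run_arch.sh further down this thread about .bashrc, it was presumably sourcing the oneAPI environment script (treat this as an assumption):
```bash
# Assumed from context, not shown in the original comment:
# initialize the oneAPI environment for every new shell, e.g. from ~/.bashrc
source /opt/intel/oneapi/setvars.sh
```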
This eliminates the error. The Intel extension for PyTorch was correctly installed along with all of the other dependencies. No issues there, but it still wasn't loading anything onto the GPU. To fix this, I needed to recompile llama-cpp-python. I'm leaving the below for now because it did eliminate some errors. However, it's a mirage: it's not actually using the GPU.
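The rebuild steps themselves aren't shown above; a hedged sketch of the OneMKL-based rebuild being described (which, as noted, still doesn't actually offload to the GPU) might look like:
```bash
# Hedged sketch of the OneMKL rebuild of llama-cpp-python described above.
source /opt/intel/oneapi/setvars.sh
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp" FORCE_CMAKE=1 \
  pip install --no-cache-dir --force-reinstall llama-cpp-python
```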
For the cmake arguments, I used llama.cpp's Intel OneMKL arguments. And now loading llama2-7b (gguf) with n-gpu-layers set to its maximum value results in:
|
Did you use intel-gpu-top to verify that it is actually using the GPU? |
I'm getting some really odd intel-gpu-top results. It blips when the model loads and then does nothing, leading me to think this is another mirage. In comparison, in llama.cpp, Blitter hits 80% with 30 layers on the same model. But that's compiled with clblast and needs platform and device environment variables. |
I found the same thing. Using -DLLAMA_BLAS_VENDOR=Intel10_64lp doesn't actually offload the processing to the Intel GPU. I compiled with clblast, and that actually was using my Arc GPU, but the LLM was spitting out gibberish. Still some bug hunting needed. |
So after spending a few hours experimenting with llama.cpp and llama-cpp-python, I got them both running on the GPU last night. I got oobabooga running on the Intel Arc GPU a few minutes ago. This is using llama-2-7b-chat.Q5_K_M.gguf with llama.cpp and 30 n-gpu-layers. No gibberish, and it corrected the grammar error in the prompt. :) I'm not sure how user-friendly we'll be able to make running this, nor have I stress tested this beyond a few pithy prompts. For reference, I'm using Ubuntu 23.10 (mantic). To compile with clblast, I needed libclblast-dev >= 1.6.1-1 and the most recent stable Intel drivers. I'm happy to dig into the dependencies more, if needed. (The below assumes you've run ./start_linux.sh for the first time.)
Step 1
Open 2 terminals. In the first, run
In the second, run
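The two commands aren't shown above; based on the rest of this step, the point is to compare clinfo output outside and inside the webui's conda environment, roughly:
```bash
# Hedged sketch: list OpenCL platforms from a plain Ubuntu shell...
clinfo -l
# ...and again from inside the webui's conda environment.
./cmd_linux.sh
clinfo -l
```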
Here's the output from my system. As you can see, conda doesn't know a GPU exists. Ubuntu output:
Inside conda:
Note: installing ocl-icd-system in conda (the semi-official fix) did not work.
Step 2
Conda needs your system's OpenCL vendor .icd files. On Ubuntu, these are in the system OpenCL vendors directory. In a terminal, cd into the conda environment and run the commands described next (a hedged sketch follows below).
This deletes conda's OpenCL vendors directory, recreates it, and then creates symlinks to Ubuntu's icd files.
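The commands themselves aren't reproduced above; a hedged sketch matching that description, assuming Ubuntu's stock /etc/OpenCL/vendors path and an active conda environment:
```bash
# Hedged sketch: replace conda's OpenCL vendors directory with symlinks to the system ICD files.
rm -rf "$CONDA_PREFIX/etc/OpenCL/vendors"
mkdir -p "$CONDA_PREFIX/etc/OpenCL/vendors"
ln -s /etc/OpenCL/vendors/*.icd "$CONDA_PREFIX/etc/OpenCL/vendors/"
```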
Recheck conda's clinfo.
My output is now:
The platform numbers are different from what they are in Ubuntu, which changes llama.cpp's GGML_OPENCL_PLATFORM environment variable. (For now, just paste the output somewhere. You'll need it in a minute.)
Step 3
Recompile llama-cpp-python in the ./cmd_linux.sh terminal.
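The exact command isn't shown above; judging from the CLBlast flags used in install_arch.sh later in this thread, it was presumably along these lines:
```bash
# Hedged sketch: rebuild llama-cpp-python against CLBlast from inside ./cmd_linux.sh.
CMAKE_ARGS="-DLLAMA_CLBLAST=ON" FORCE_CMAKE=1 \
  pip install --no-cache-dir --force-reinstall llama-cpp-python
```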
Step 4
In a terminal (not ./cmd_linux.sh), cd into the text-generation-webui directory if you're not still there. Go to conda's clinfo -l output and note the platform number for your graphics card and the card name beside its device. You don't need the full name, just the letters and number. I'm using this bit:
Edit your platform number and device name. Then run the exports in the terminal.
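The exports aren't shown above; they correspond to the variables set in run_arch.sh below. For example, with the author's values:
```bash
# Example values from run_arch.sh; substitute the platform number and device
# name from your own clinfo -l output.
export GGML_OPENCL_PLATFORM=2
export GGML_OPENCL_DEVICE=A770
```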
It worked. Admittedly, it's not as snappy as running llama2-7b in BigDL on the same GPU, but it's a massive speed improvement over the cpu. On my system, this only works if I use the exports to tell it what to use. I don't know if you'll need to do that on a system that only has one display option. (I'm using the cpu for display.) Oobabooga was a fresh download. |
Draft Guide for Running Oobabooga on Intel Arc
More eyes and testers are needed before considering submission to the main repository.
Installation Notes
Although editing conda's OpenCL vendor files is a viable option, swapping to a standard python3 install and using a venv resulted in improved performance in tokens/s by approximately 71% across all tested models. It also eliminates possible issues with older conda libraries and the bleeding-edge ones needed for Intel Arc. For now, skipping conda and its CDTs appears to be the most reliable option.
Working Model Loaders
* llama.cpp
* transformers
The latest Intel extension for transformers added INT4 inference support for Arc. Hugging Face transformers committed XPU support for the trainer in September '23. If any of the other model loaders use transformers, they may run with little effort. (They may also require a fairly major fork. In which case, adding a BigDL <https://github.com/intel-analytics/BigDL> model loader is probably a better use of energy. That's just my opinion. My BigDL experiments are still in Jupyter notebooks, but it's been a good experience on both the Intel GPU and the CPU.)
Note: Loaders are hardcoded in modules/loaders.py. Without refactoring this to be more modular like extensions or [shudder] monkeypatching, we just need to remember which ones work with our individual system. Making it more modular and customizable for different combinations of CPUs and GPUs is a much broader discussion than getting this working on the Intel Arc. It would also need a lot of buy-in and commitment from the community.
Models Tested
* transformers
  * llama2-7b-chat-hf
  * mistralai_Mistral-7B-Instruct-v0.2
* llama.cpp
  * llama-2-7b-chat.Q5_K_M.gguf
  * mistral-7b-instruct-v0.2.Q5_K_M.gguf
What Isn't Tested
* Most models
* Training
* Parameters
* Extensions
* Regular use beyond "does it load and run a few simple prompts"
Note: coqui_tts, silero_tts, whisper_stt, superbooga, and superboogav2 are all breaking installs. It may be possible to install their requirements without any dependencies and then pick up the additional dependencies during debugging. TTS, in particular, upgrades torch to the wrong version for the Intel extension.
Install Notes
* Latest Intel Arc drivers installed. See the Intel client GPU installation docs <https://dgpu-docs.intel.com/driver/client/overview.html>.
* Intel oneAPI Base Toolkit installed <https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html>.
* Install opencl-headers ocl-icd libclblast-dev python3 python3-pip python3-venv libgl1 libglib2.0-0 libgomp1 libjemalloc-dev (Note: libclblast-dev >= 1.6).
* Your username is part of the render group.
* You have hangcheck disabled in grub.
The last two items are just standard things I do with a fresh install or new graphics card. They may no longer be necessary. If you've already installed these, check for updates. Intel kicked off 2024 with a lot of updates.
Test Machine Details
* Ubuntu 23.10
* 6.5.0.14.16 generic Linux
* i7-13700k CPU (runs the display)
* Intel Arc A770 (non-display)
Bash Scripts
Below are 2 bash scripts: install_arch.sh and run_arch.sh. They need to be saved or symlinked to the text-generation-webui directory.
Getting Started
1. Download or clone a fresh copy of Oobabooga.
2. Save the below scripts into text-generation-webui. These should be in the same folder as one_click.py, cmd_linux.sh, etc.
3. Make them executable, then run the installer:
cd text-generation-webui
./install_arch.sh
4. Check clinfo for your hardware information.
clinfo -l
5. In run_arch.sh, find GGML_OPENCL_PLATFORM and change it to your platform number. Then change GGML_OPENCL_DEVICE to your device name. Save the file.
6. Start the server with run_arch.sh. This uses any flags you've saved in CMD_FLAGS.txt. You can also use flags like --listen --api with the script.
./run_arch.sh
Both scripts were uploaded to github <https://github.com/kcyarn/oobabooga_intel_arc>. This is just a starting point. Changes welcome. Once it's right in bash, we can decide whether to integrate it with oobabooga's start_linux.sh, requirements files, and one_click.py.
install_arch.sh
#!/bin/bash
# Check if the virtual environment already exists
if [[ ! -d "venv" ]]; then
    # Create the virtual environment
    python -m venv venv
fi
# Activate the virtual environment
source venv/bin/activate
# Intel extension for transformers recently added Arc support.
# See https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/docs/notebooks/build_chatbot_on_xpu.ipynb for additional notes on the dependencies.
# Working model loaders:
# - llama.cpp
# - transformers
pip install intel-extension-for-transformers
# Install the xpu build of Intel pytorch, not the cpu one.
pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
# Installing these from requirements_cpu_only.txt causes dependency conflicts with Intel pytorch.
# Install a few of the dependencies for the packages below.
pip install coloredlogs datasets sentencepiece
pip install --no-deps peft==0.7.* optimum==1.16.* optimum-intel accelerate==0.25.*
# Skip llama-cpp-python and everything installed above without its deps.
grep -v -e peft -e optimum -e accelerate -e llama-cpp-python requirements_cpu_only.txt > temp_requirements.txt
pip install -r temp_requirements.txt
# Install the cpuinfo dependency normally installed by one_click
pip install py-cpuinfo==9.0.0
# Use the correct cmake args for llama-cpp
export CMAKE_ARGS="-DLLAMA_CLBLAST=ON"
export FORCE_CMAKE=1
pip install --no-cache-dir llama-cpp-python
# Install extension requirements, excluding extensions that break the Intel torch install.
# Exclude coqui_tts because it causes torch dependency issues with Intel GPUs.
# whisper_stt and silero_tts both force pytorch updates as a dependency-of-a-dependency situation. May be possible to use without dependency installation.
cd extensions
extensions=() # Create an empty array to store folder names
exclude_extensions=(coqui_tts silero_tts whisper_stt superbooga superboogav2)
for folder in */; do
    extensions+=("$folder")
done
echo "${extensions[*]}"
install_extensions=()
for ext in "${extensions[@]}"; do
    should_exclude=false
    for exclude_ext in "${exclude_extensions[@]}"; do
        if [[ "$ext" == *"$exclude_ext"* ]]; then
            should_exclude=true
            break
        fi
    done
    if [ "$should_exclude" = false ]; then
        install_extensions+=("$ext")
    fi
done
# Print the install_extensions
# echo "${install_extensions[@]}"
for extension in "${install_extensions[@]}"; do
    cd "$extension"
    echo -e "\n\n$extension\n\n"
    # Install dependencies from requirements.txt
    if [ -e "requirements.txt" ]; then
        echo "Installing requirements in $extension"
        pip install -r requirements.txt
    else
        echo "No requirements.txt found in $extension"
    fi
    cd ..
done
# Leave the extensions directory.
cd ..
# Delete the temp_requirements.txt file.
rm temp_requirements.txt
run_arch.sh
#!/bin/bash
# Uncomment if oneapi is not sourced in your .bashrc
# source /opt/intel/oneapi/setvars.sh
# Activate the virtual environment built with install_arch.sh. (Not conda!)
source venv/bin/activate
# Change these values to match your card in clinfo -l
# Needed by llama.cpp
export GGML_OPENCL_PLATFORM=2
export GGML_OPENCL_DEVICE=A770
# Use sudo intel_gpu_top to view your card.
# Capture command-line arguments
flags_from_cmdline="$@"
# Read flags from CMD_FLAGS.txt
flags_from_file=$(grep -v '^#' CMD_FLAGS.txt | grep -v '^$')
# Combine flags from both sources
all_flags="$flags_from_file $flags_from_cmdline"
# Run the Python script with the combined flags
python server.py $all_flags
|
@kcyarn Great work on getting XPU/OpenCL more integrated with text-generation-webui! |
Tried this in WSL running Ubuntu 22.04, here are some notes:
1. libclblast-dev >= 1.6 - this package is only available via the default repos in 1.6+ on Ubuntu 23.10 (might be available on other flavors, I don't know).
2. I was able to grab the 1.6 .deb from the repos, plus a libclblast1 package listed as a dependency, and install them.
3. After following your instructions on a new Ubuntu install, "python -m venv venv" wouldn't work; I had to change it to "python3 -m venv venv".
4. Despite no errors other than what I've outlined here, I still get 0 platforms from clinfo.
5. Again, despite no errors other than the above, I get "OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory" when I run run_arch.sh.
Sorry if this isn't helpful, I've never run WSL before so I'm not sure what the limitations are. |
It sounds like either the GPU isn't passing through to WSL2 or there's a missing dependency.
Which version of Ubuntu are you using on WSL2? I'm using the most recent release, not the LTS, because the newer kernels work better with this card. You may want to try upgrading the release.
Have you tried this Intel guide to get the card running in WSL2?
https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html
It'll be a few days before I can run any WSL2 tests.
|
I'm sorry, work required me to make a short, no-notice trip out of town, and I can't experiment remotely because it might shut down my system. I'll be back in town in a day or two and able to start working on it again. Regarding the WSL version, I was using WSL 1 just because the WSL instructions for oobabooga said to use WSL 1 for Windows 10 and WSL 2 for Windows 11. |
I've ditched my old WSL and restarted with Ubuntu 23.10 using WSL2, however:
clinfo -l
Platform #0: Intel(R) OpenCL Graphics
-- Device #0: Intel(R) Graphics [0x56a0]
clpeak indicates 512 compute units etc., but Oobabooga fails to find my device. EDIT: I activated the venv and from within it was able to run clinfo -l with the same results as above, and clpeak also sees the GPU with 512 compute units. I honestly don't understand, because intel_gpu_top also says there's no GPU installed. |
Please double check that you've got all the drivers installed in WSL2. See https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2024-0/configure-wsl-2-for-gpu-workflows.html
Then run clinfo and change the script to use your numbers.
I rarely use Windows 11, but I do have it installed. Windows 10 is in a virtual machine. If I have time, I'll see if I can get it running.
|
Just a thought. Run clinfo without -l and check the entire output for the graphics card. That's probably easier than double-checking the entire install.
|
clinfo definitely sees my GPU and has it correctly at 16GB vram.
That is the current setup in run_arch.sh, but intel_gpu_top is still not finding my GPU.
I've seen some comments online that the OpenGL renderer string shouldn't be llvm if I'm using gpu but I haven't figured out how to change that yet. |
That's good news.
Have you edited run_arch.sh to use your OpenCL values?
Instead of
```
export GGML_OPENCL_PLATFORM=2
export GGML_OPENCL_DEVICE=A770
```
It needs to be something like
```
export GGML_OPENCL_PLATFORM=0
export GGML_OPENCL_DEVICE=Intel(R)
```
|
Edited above, sorry.
|
I now have oobabooga llama.cpp (gguf only) working in WSL2 Ubuntu 22.04. This uses the older backend clblast. The newer ones are really nice, but I went with what I was familiar with. I've added everything to the wsl_scripts folder at oobabooga_intel_arc. Given the complexity on the wsl side, Docker might be the best direction for this one. Here's a screenshot showing it using the gpu with WSL2 on Windows 11. You may need the insiders version on Windows 10. |
Amazing. I'll run this tonight. If there's anything you want me to test, please let me know.
|
All of this installed new packages:
This was new:
And this installed new packages:
I also added all of those lines, which were all missing, to the bottom of my .bashrc. With those changes I was able to see new information for my CPU and integrated graphics when running clinfo, I could see my GPU when running vainfo, and my renderer string is now my A770 when I run glxinfo | grep OpenGL. I still can't see my GPU when I run intel_gpu_top, though. All of this resulted in a .gguf loading into my GPU(!) without rebuilding clblast, running on the text-gen UI I tried setting up days ago. It still utilized about 80% of my CPU and only ~20% of the GPU when running inference, though. About to rebuild clblast and see how it goes. No change after rebuilding clblast and llama.cpp. I might have messed this part up, though; I got lost in the comments on that block. I'll keep trying. |
Is there a native windows solution? |
Not that I'm aware of. Theoretically, it's possible to install native Windows Python and the Intel drivers, and then use the Linux (no-Anaconda) shell scripts as a guide to install and run using pip. It depends on whether the Intel drivers support Windows for the necessary libraries and whether there are wheels. If you want to give it a go, I'd start with llama.cpp. If you can get it running natively on the Windows side, move on to llama-cpp-python. Once you have that running (I used a Jupyter notebook when I was troubleshooting this), then you have the foundation for oobabooga. The WSL2 solutions work, but they're really slow. I suspect WSL needs a major kernel update. It flies in Ubuntu, which is my daily driver. |
I tried compiling llama-cpp-python for SYCL and replacing the webui's llama-cpp-python with it, but it didn't work.
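For reference, a hedged sketch of what such a SYCL build attempt might look like; the flag names are assumptions based on llama.cpp's SYCL backend documentation around that time, not taken from the comment above:
```bash
# Hedged sketch of a SYCL build of llama-cpp-python (flag names are assumptions).
source /opt/intel/oneapi/setvars.sh
CMAKE_ARGS="-DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" FORCE_CMAKE=1 \
  pip install --no-cache-dir --force-reinstall llama-cpp-python
```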
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment. |