This is an open-source project that lets you compare two LLMs head to head on a given prompt. This repository covers the backend, which integrates the LLM APIs that are consumed by the front-end.
- Python 3.10+
- Modal serverless GPU's
- Poetry for dependency management
- llama.cpp
- HuggingFace Hub
The majority of this project runs on Modal's serverless services, meaning all of the building and dependency installation is handled by Modal.
To run this project you will need:
- Python 3.10+
- The Modal pip package
- Poetry for dependency management
Clone the project:
git clone https://github.com/Supahands/llm-comparison-backend.git
There are two components to this project: the ollama API server and the litellm proxy server, which is what our frontend connects to in order to retrieve and use the different models.
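Since litellm speaks the OpenAI-compatible chat completions format, a frontend request to the proxy can be sketched roughly as below. The endpoint path and model name here are illustrative assumptions, not taken from the repository:

```python
import json

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload for the litellm proxy."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# A frontend comparing two models would POST this body twice, once per
# model, to the proxy's /v1/chat/completions endpoint.
payload = build_chat_request("ollama/llama3", "Why is the sky blue?")
body = json.dumps(payload)
```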
I have added both applications to a single deploy file, which can be run to spin up both apps at the same time using:
modal deploy --env dev deploy
Production Deploy:
modal deploy --env dev deploy
Local Testing:
modal serve --env dev deploy
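A minimal sketch of what such a combined deploy file could look like, using Modal's App.include to pull two separately defined apps under one deployment. The module and app names below are assumptions, not the actual file contents:

```python
import modal

# Hypothetical imports: each service defines its own modal.App
from ollama_service import app as ollama_app
from litellm_service import app as litellm_app

# One parent app that includes both, so a single
# `modal deploy deploy.py` spins up both services together.
app = modal.App("llm-comparison-backend")
app.include(ollama_app)
app.include(litellm_app)
```

This is a deploy-file configuration sketch and requires the modal package to run.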
- Create Modal secret:
modal secret create my-huggingface-secret HUGGING_FACE_HUB_TOKEN="your_token"
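Modal injects secret values into the function's environment as environment variables, so inside a Modal function that attaches this secret the token can be read as sketched below (the helper name is illustrative):

```python
import os

def get_hf_token() -> str:
    # Inside a Modal function with my-huggingface-secret attached,
    # HUGGING_FACE_HUB_TOKEN is exposed as an environment variable.
    return os.environ["HUGGING_FACE_HUB_TOKEN"]
```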
- Run converter:
modal run --detach hugging_face_to_guff.py \
--modelowner tencent \
--modelname Tencent-Hunyuan-Large \
--quanttypes q8_0 \
--username Supa-AI \
--ollama-upload \
--hf-upload \
--clean-run
- The --detach flag allows the program to keep running even if your terminal disconnects from the Modal servers.
- modelowner is the owner of the repo you are pulling the model from.
- modelname is the exact name of the model from that owner that you want to convert.
- quanttypes is the quantization size; the default is q8_0, which is the largest this supports.
- username determines which account the converter uploads to and creates a repo for.
- ollama-upload is a boolean flag for whether it should also upload the newly created quantized models to Ollama under your username.
  - Important note! Before uploading, make sure that a volume called ollama has been created. Once it exists, you must run ollama serve on your own machine to retrieve the public and private SSH keys to add to Ollama; more details can be found here.
- hf-upload is another boolean flag for whether it should upload these models to your Hugging Face repo.
- clean-run is a boolean flag for whether it should clean up all existing model files in your ollama volume before running; this can fix issues where Ollama won't let you re-run because the model already exists in the volume from a previous run.
- Uses Modal volumes (model-storage)
- Persists between runs and reuses existing models on subsequent runs (partial downloads are resumed as well)
- Supports large models (>10GB)
- Parallel downloads (8 connections) thanks to bodaay's HuggingFaceModelDownloader
- Progress tracking with ETA
- Two-step conversion:
- FP16 format
- Quantization (Q4_K_M, Q5_K_M etc)
Currently, we do not support Anthropic models (Claude) on the official site due to API costs. We are actively seeking sponsors to help integrate these models. If you have suggestions for implementing Anthropic models or would like to contribute, please open an issue!
We welcome any creative solutions or partnerships that could help bring Anthropic model support to this comparison platform.
- Uses llama.cpp for GGUF conversion
- Two-step process:
- Convert to FP16 format
- Quantize to desired format (Q4_K_M, Q5_K_M etc)
- Supports importance matrix for optimized quantization
- Can split large models into manageable shards
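As a rough sketch, the two steps map onto llama.cpp's standard tooling roughly as follows; the paths and file names here are illustrative, not what the Modal job actually uses:

```shell
# Step 1: convert the downloaded HF model to an FP16 GGUF file
python convert_hf_to_gguf.py ./hf-model --outtype f16 --outfile model-f16.gguf

# Step 2: quantize the FP16 file down to the requested format
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```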
A special thank you to everyone who has contributed to this project.
- Noah Rijkaard
- EvanZJ
To contribute, clone the repository, create your own branch, follow the existing commit patterns, and open a pull request:
git clone https://github.com/Supahands/llm-comparison-backend
git checkout -b feature/NAME
- Follow commit patterns
- Open a pull request explaining the problem solved or the feature added; if there are visual changes, attach a screenshot, and wait for the review!