LLM Comparison backend 💻

Technologies • Getting Started • Deployment • GGUF Converter • API Endpoints • Collaborators • Contribute

This is an open-source project that lets you compare two LLMs head-to-head on a given prompt. This repository covers the backend, which integrates the LLM APIs used by the front-end.

📱 Visit this Project

💻 Technologies

  • Python 3.10+
  • Modal serverless GPUs
  • Poetry for dependency management
  • llama.cpp
  • HuggingFace Hub

🚀 Getting started

Most of this project runs on Modal services, which handle all of the building and dependency installation.

Prerequisites

You will need:

  • Python 3.10+
  • A Modal account and the Modal CLI
  • Poetry (optional, for managing dependencies locally)
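If you have not used Modal before, the CLI can be installed from PyPI and linked to your account; a minimal sketch of the usual onboarding:

pip install modal
modal setup  # opens a browser window to authenticate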

Cloning

Clone the repository:

git clone https://github.com/Supahands/llm-comparison-backend.git
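If you want to work with the code locally (Modal installs everything remotely when deploying), Poetry can set up a local environment, assuming the repository ships a pyproject.toml:

cd llm-comparison-backend
poetry install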

Starting

There are two components to this project: the Ollama API server and the LiteLLM server, which is what our frontend connects to in order to retrieve the different models.

I have combined both applications into a single deploy file, so both apps can be spun up at the same time using:

modal deploy --env dev deploy
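For reference, Modal allows one file to register several apps together. A minimal sketch of what such a deploy file could look like (the module and app names here are assumptions, not taken from the repository):

# deploy.py -- hypothetical sketch of combining the two Modal apps
import modal

from ollama_server import app as ollama_app    # assumed module name
from litellm_server import app as litellm_app  # assumed module name

app = modal.App("llm-comparison-backend")
app.include(ollama_app)   # registers the Ollama API server's functions
app.include(litellm_app)  # registers the LiteLLM proxy's functions

With this layout, a single modal deploy picks up both servers in one pass.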

Manual Deployment

Production Deploy:

modal deploy --env dev deploy

Local Testing:

modal serve --env dev deploy

🔄 GGUF Converter

Setup

  1. Create a Modal secret (consumed by the converter as sketched after the flag notes below):
modal secret create my-huggingface-secret HUGGING_FACE_HUB_TOKEN="your_token"
  2. Run the converter:
modal run --detach hugging_face_to_guff.py \
  --modelowner tencent \
  --modelname Tencent-Hunyuan-Large \
  --quanttypes q8_0 \
  --username Supa-AI \
  --ollama-upload \
  --hf-upload \
  --clean-run
  • The --detach flag lets the program keep running even if your terminal disconnects from Modal's servers
  • modelowner is the owner of the repository you want to pull the model from
  • modelname is the exact name of the model from that owner that you want to convert
  • quanttypes is the quantization size; the default is q8_0, which is the largest this supports
  • username determines which account the converter uploads to and creates a repo under
  • ollama-upload is a boolean flag for whether it should also upload the newly created quantized models to Ollama under your username.
    • Important note! Before uploading, make sure that a volume called ollama has been created; once it exists, you must run ollama serve on your own machine to retrieve the public and private SSH keys to add to Ollama. More details can be found here
  • hf-upload is another boolean flag for whether it should upload these models to your Hugging Face repo
  • clean-run is a boolean flag for whether it should delete all existing model files in your ollama volume before running; this can fix issues where Ollama refuses to re-run because the model already exists in your volume from a previous run.
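The ollama volume mentioned above can be created ahead of time with modal volume create ollama. For context, here is a minimal sketch of how the secret from step 1 is typically consumed inside a Modal function; the function name and parameters are illustrative assumptions, not the converter's actual code:

import os
import modal

app = modal.App("gguf-converter")  # hypothetical app name

@app.function(secrets=[modal.Secret.from_name("my-huggingface-secret")])
def convert(modelowner: str, modelname: str):
    # Modal exposes each key of the secret as an environment variable.
    token = os.environ["HUGGING_FACE_HUB_TOKEN"]
    ...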

Technical Details

Storage

  • Uses Modal volumes (model-storage)
  • Persists between runs, reusing already-downloaded models and resuming partial downloads on subsequent runs
  • Supports large models (>10GB)
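As a rough sketch, attaching such a volume looks like the following; the model-storage name comes from this README, while the mount path and function are illustrative assumptions:

import modal

volume = modal.Volume.from_name("model-storage", create_if_missing=True)
app = modal.App("model-downloader")  # hypothetical app name

@app.function(volumes={"/models": volume})
def download(repo_id: str):
    # Files written under /models persist between runs, which is what
    # lets interrupted downloads resume instead of starting over.
    ...
    volume.commit()  # flush writes so later runs can see them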

Features

  • Parallel downloads (8 connections) thanks to bodaay's Hugging Face model downloader
  • Progress tracking with ETA
  • Two-step conversion:
    1. FP16 format
    2. Quantization (Q4_K_M, Q5_K_M, etc.)

⚠️ Disclaimer

Currently, we do not support Anthropic models (Claude) on the official site due to API costs. We are actively seeking sponsors to help integrate these models. If you have suggestions for implementing Anthropic models or would like to contribute, please open an issue!

We welcome any creative solutions or partnerships that could help bring Anthropic model support to this comparison platform.

Conversion Process

  • Uses llama.cpp for GGUF conversion
  • Two-step process:
    1. Convert to FP16 format
    2. Quantize to the desired format (Q4_K_M, Q5_K_M, etc.)
  • Supports importance matrix for optimized quantization
  • Can split large models into manageable shards
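As an illustration, the two steps map onto llama.cpp's standard tools roughly like this (file and directory names are placeholders, not the converter's actual invocation):

python convert_hf_to_gguf.py ./downloaded-model --outtype f16 --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M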

🤝 Collaborators

A special thank you to everyone who has contributed to this project.

Noah Rijkaard
EvanZJ

📫 Contribute

To contribute, create a branch following the patterns below and open a pull request:

  1. git clone https://github.com/Supahands/llm-comparison-backend
  2. git checkout -b feature/NAME
  3. Follow commit patterns
  4. Open a pull request explaining the problem solved or the feature added; if there are visual changes, attach screenshots, then wait for the review!

Documentation that might help

📝 How to create a Pull Request

💾 Commit pattern
