LLM Comparison backend 💻

Technologies • Getting Started • Deployment • GGUF Converter • API Endpoints • Collaborators • Contribute

This is an open-source project that lets you compare two LLMs head-to-head on a given prompt. This repository covers the backend, which integrates the LLM APIs used by the front-end.

📱 Visit this Project

💻 Technologies

  • Python 3.10+
  • Modal serverless GPUs
  • Poetry for dependency management
  • llama.cpp
  • HuggingFace Hub

🚀 Getting started

Most of this project runs on Modal services, which handle all of the building and dependency installation.

Prerequisites

You will need:

  • Python 3.10+
  • A Modal account and the Modal CLI
  • Poetry (optional, for managing dependencies locally)
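If you have not used Modal before, the CLI can be installed from PyPI and linked to your account; a minimal sketch of the usual onboarding:

pip install modal
modal setup  # opens a browser window to authenticate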

Cloning

Clone the repository:

git clone https://github.com/Supahands/llm-comparison-backend.git
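If you want to work with the code locally (Modal installs everything remotely when deploying), Poetry can set up a local environment, assuming the repository ships a pyproject.toml:

cd llm-comparison-backend
poetry install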

Starting

There are two components to this project: the Ollama API server and the LiteLLM server, which is what our frontend connects to in order to retrieve the different models.

I have combined both applications into a single deploy file, so both apps can be spun up at the same time using:

modal deploy --env dev deploy
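For reference, Modal allows one file to register several apps together. A minimal sketch of what such a deploy file could look like (the module and app names here are assumptions, not taken from the repository):

# deploy.py -- hypothetical sketch of combining the two Modal apps
import modal

from ollama_server import app as ollama_app    # assumed module name
from litellm_server import app as litellm_app  # assumed module name

app = modal.App("llm-comparison-backend")
app.include(ollama_app)   # registers the Ollama API server's functions
app.include(litellm_app)  # registers the LiteLLM proxy's functions

With this layout, a single modal deploy picks up both servers in one pass.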

Manual Deployment

Production Deploy:

modal deploy --env dev deploy

Local Testing:

modal serve --env dev deploy

🔄 GGUF Converter

Setup

  1. Create a Modal secret (consumed by the converter as sketched after the flag notes below):
modal secret create my-huggingface-secret HUGGING_FACE_HUB_TOKEN="your_token"
  2. Run the converter:
modal run --detach hugging_face_to_guff.py \
  --modelowner tencent \
  --modelname Tencent-Hunyuan-Large \
  --quanttypes q8_0 \
  --username Supa-AI \
  --ollama-upload \
  --hf-upload \
  --clean-run
  • The --detach flag lets the program keep running even if your terminal disconnects from Modal's servers
  • modelowner is the owner of the repository you want to pull the model from
  • modelname is the exact name of the model from that owner that you want to convert
  • quanttypes is the quantization size; the default is q8_0, which is the largest this supports
  • username determines which account the converter uploads to and creates a repo under
  • ollama-upload is a boolean flag for whether it should also upload the newly created quantized models to Ollama under your username.
    • Important note! Before uploading, make sure that a volume called ollama has been created; once it exists, you must run ollama serve on your own machine to retrieve the public and private SSH keys to add to Ollama. More details can be found here
  • hf-upload is another boolean flag for whether it should upload these models to your Hugging Face repo
  • clean-run is a boolean flag for whether it should delete all existing model files in your ollama volume before running; this can fix issues where Ollama refuses to re-run because the model already exists in your volume from a previous run.
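The ollama volume mentioned above can be created ahead of time with modal volume create ollama. For context, here is a minimal sketch of how the secret from step 1 is typically consumed inside a Modal function; the function name and parameters are illustrative assumptions, not the converter's actual code:

import os
import modal

app = modal.App("gguf-converter")  # hypothetical app name

@app.function(secrets=[modal.Secret.from_name("my-huggingface-secret")])
def convert(modelowner: str, modelname: str):
    # Modal exposes each key of the secret as an environment variable.
    token = os.environ["HUGGING_FACE_HUB_TOKEN"]
    ...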

Technical Details

Storage

  • Uses Modal volumes (model-storage)
  • Persists between runs, reusing already-downloaded models and resuming partial downloads on subsequent runs
  • Supports large models (>10GB)
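As a rough sketch, attaching such a volume looks like the following; the model-storage name comes from this README, while the mount path and function are illustrative assumptions:

import modal

volume = modal.Volume.from_name("model-storage", create_if_missing=True)
app = modal.App("model-downloader")  # hypothetical app name

@app.function(volumes={"/models": volume})
def download(repo_id: str):
    # Files written under /models persist between runs, which is what
    # lets interrupted downloads resume instead of starting over.
    ...
    volume.commit()  # flush writes so later runs can see them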

Features

  • Parallel downloads (8 connections) thanks to bodaay's Hugging Face model downloader
  • Progress tracking with ETA
  • Two-step conversion:
    1. FP16 format
    2. Quantization (Q4_K_M, Q5_K_M, etc.)

⚠️ Disclaimer

Currently, we do not support Anthropic models (Claude) on the official site due to API costs. We are actively seeking sponsors to help integrate these models. If you have suggestions for implementing Anthropic models or would like to contribute, please open an issue!

We welcome any creative solutions or partnerships that could help bring Anthropic model support to this comparison platform.

Conversion Process

  • Uses llama.cpp for GGUF conversion
  • Two-step process:
    1. Convert to FP16 format
    2. Quantize to the desired format (Q4_K_M, Q5_K_M, etc.)
  • Supports importance matrix for optimized quantization
  • Can split large models into manageable shards
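As an illustration, the two steps map onto llama.cpp's standard tools roughly like this (file and directory names are placeholders, not the converter's actual invocation):

python convert_hf_to_gguf.py ./downloaded-model --outtype f16 --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M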

🤝 Collaborators

A special thank you to everyone who has contributed to this project.

Noah Rijkaard
EvanZJ

📫 Contribute

To contribute, create a branch following the patterns below and open a pull request:

  1. git clone https://github.com/Supahands/llm-comparison-backend
  2. git checkout -b feature/NAME
  3. Follow commit patterns
  4. Open a pull request explaining the problem solved or the feature added; if there are visual changes, attach screenshots, then wait for the review!

Documentation that might help

📝 How to create a Pull Request

💾 Commit pattern
