
This repository contains the local web application proof of concept (PoC) for the CodeMatch system. It lets users submit a code snippet and determine whether it is a clone or an original, using the final LLM-based similarity detection system.


codematch-llm/system


CodeMatch - System

The system repository hosts the final application of CodeMatch. In this repository, we outline the complete workflow, from retrieving code from the web to detecting code clones for a given code snippet. The system is divided into three core services: Backend, Frontend, and Vector Database.
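As a rough illustration of the clone-detection step, the sketch below ranks indexed projects by cosine similarity to a query vector. It assumes snippets have already been converted to embedding vectors; the project names, the toy vectors, and the 0.8 threshold are hypothetical, and in the real system this search would be handled by the vector database rather than by hand-written code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def find_similar(query_vec, indexed, threshold=0.8):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in indexed.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings standing in for real model output.
indexed = {
    "project-a": [1.0, 0.0, 0.0],
    "project-b": [0.9, 0.1, 0.0],
    "project-c": [0.0, 1.0, 0.0],
}
matches = find_similar([1.0, 0.0, 0.0], indexed)
```

With these toy vectors, `project-a` and `project-b` clear the threshold and `project-c` (orthogonal to the query) does not.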

🖥️ User Interface Overview

  • Main Page: Enter a code snippet to find existing GitHub projects with similar code.

    [Screenshot: Inputted Code]
  • Similar GitHub Projects Page: Displays all GitHub projects containing code similar to the input.

    [Screenshot: Search Results]

⚙️ General Components

The system consists of two main components essential for its operation:

System Workflow

This includes the structure of the backend and frontend, along with their integration with the database.

[Diagram: Workflow]

Populate Database

This step retrieves code projects from GitHub and loads them into the vector database.

[Diagram: Workflow]

(This step is performed by populate_database.py.)
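The shape of the data written during population can be sketched as follows. This is not the contents of populate_database.py: `toy_embedding` is a hash-based placeholder standing in for a real embedding model, and `build_points` and its field layout are hypothetical. The only grounded detail is that a Qdrant point carries an id, a vector, and a payload.

```python
import hashlib
import uuid


def toy_embedding(code, dim=8):
    """Placeholder embedding: hashes the text into a fixed-size vector.

    NOT a real model; it only gives each snippet a deterministic vector.
    """
    digest = hashlib.sha256(code.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def build_points(files):
    """Turn (repo, path, code) tuples into point dicts with id, vector, payload."""
    points = []
    for repo, path, code in files:
        points.append({
            # Deterministic id so re-running the step overwrites instead of duplicating.
            "id": str(uuid.uuid5(uuid.NAMESPACE_URL, f"{repo}/{path}")),
            "vector": toy_embedding(code),
            "payload": {"repo": repo, "path": path},
        })
    return points


# Hypothetical input: one file from one repository.
points = build_points([("octocat/hello", "main.py", "print('hi')")])
```

Deriving the id from the repository and path means population can be repeated safely, which is one reasonable design choice for a step that may be re-run.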

📦🛠️ Installation and Run

📥 Prerequisites:

  1. Python: 3.9+

  2. Docker Desktop:
    Download Docker Desktop

  3. Qdrant:
    Download and Run Qdrant – Follow the Download and Run section for installation.

  4. Node.js and npm:
    a. Install from Node.js – Keep the option checked to install necessary tools.
    b. Verify installation:

    npm -v
  5. Clone the Repository:

    git clone https://github.com/codematch-llm/system.git
  6. Acquire Access to The-Stack-V2 Dataset:
    a. Get access to The-Stack-V2 dataset
    b. Create a Hugging Face personal access token
    c. Add the token to the .env file in the root directory:

    HUGGING_FACE_TOKEN=<paste your token here>
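To check that the token is picked up before running anything else, a small standalone reader like the one below can help. It is a sketch only: the helper names `load_env_file` and `get_hf_token` are hypothetical, and the backend may well load the .env file differently (for example, through a dotenv library).

```python
import os


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    token = os.environ.get("HUGGING_FACE_TOKEN") or load_env_file(path).get("HUGGING_FACE_TOKEN")
    if not token or token.startswith("<"):
        # Catches an unset variable and an unedited "<paste your token here>".
        raise RuntimeError("HUGGING_FACE_TOKEN is missing or still a placeholder")
    return token
```

Calling `get_hf_token()` from the project root raises a `RuntimeError` if the token is absent or the placeholder was never replaced.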
    

🐳 Running with Docker

  1. Ensure you are in the project root directory (system).

  2. Start all services:

    docker-compose up
  3. Populate the Database:

    cd backend
    $env:PYTHONPATH = (Get-Location).Path  # Ensure the path ends with 'backend' (check with `Write-Output $env:PYTHONPATH`)
    python populate_database.py
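Populating the database only works if Qdrant is reachable, so it can be worth checking first. The sketch below assumes Qdrant is exposed on its default REST port (6333) on localhost; the port mapping in this project's docker-compose file may differ, and `qdrant_is_up` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def qdrant_is_up(base_url="http://localhost:6333", timeout=2.0):
    """Return True if Qdrant's REST API answers GET /collections with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


if __name__ == "__main__":
    print("Qdrant reachable:", qdrant_is_up())
```

If this prints `False` after `docker-compose up`, check that the containers have finished starting and that the port matches the compose configuration.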

💻 Running Locally

Installation

  1. Backend:

    cd backend
    pip install -r requirements.txt
  2. Frontend:

    cd frontend
    npm install

Run

  1. Ensure you are in the project root directory (ends with system).

  2. Terminal 1: Start the frontend:

    cd frontend
    npm run serve
  3. Terminal 2: Start the backend:

    $env:PYTHONPATH = (Get-Location).Path  # PowerShell; run this from the project root, so the path ends with 'system' (check with `Write-Output $env:PYTHONPATH`)
    cd backend
    uvicorn backend.main:app --reload
  4. Terminal 3: Start the Qdrant database:
    If Qdrant is installed separately, start it according to its documentation. Otherwise, if this project includes the qdrant_server.py script, run:

    cd backend
    python qdrant_server.py
  5. Populate the Vector Database:

    cd backend
    python populate_database.py
