Feat/llamafile: adding llamafile as engine & ModelFactory mechanism rewrite suggestion & haystack parsing/write enhancements #10

Merged
merged 38 commits on Oct 8, 2024

Changes from 25 commits
Commits (38)
7af7b1b
llamafile & modelfactory removal suggestion & haystack parsing
sariola Oct 6, 2024
ca85a43
revert flow_judge.py and haystack.py & make eval_data_types closer to…
sariola Oct 7, 2024
761f3d6
revert haystack notebook & first edits for vllm and hf; combine vllm …
sariola Oct 7, 2024
f4f7124
model configs into engine files; model types and base into common
sariola Oct 7, 2024
fad0f23
changed imports from modelfactory to direct init
sariola Oct 7, 2024
b34a64b
harmonized the configs and provided structure where they are extended…
sariola Oct 7, 2024
4a3ad23
updated notebooks to load the models using the new init
sariola Oct 7, 2024
50c173d
added import checks for the extras and reverted eval_data_types parsing
sariola Oct 7, 2024
afa3660
Create python-package.yml
sariola Oct 7, 2024
3276739
Merge pull request #12 from flowaicom/python-package-workflow
sariola Oct 7, 2024
fe6aa2f
ruff format & isort run
sariola Oct 7, 2024
64c411c
ruff format & isort run & test [dev,vllm,hf,llamafile]
sariola Oct 7, 2024
0aa829a
updated readme & added tests readme with icicle viz
sariola Oct 7, 2024
a0421c2
updated codecov action
sariola Oct 7, 2024
cbf8075
added test results upload to codecov
sariola Oct 7, 2024
7641e03
upgrade actions setup-python to v5
sariola Oct 7, 2024
985ff41
update youtube badge
sariola Oct 7, 2024
ed6ace7
test codecov badge
sariola Oct 7, 2024
edf91ef
python versions badge
sariola Oct 7, 2024
2c630ec
rm py versions badge
sariola Oct 7, 2024
1d028b5
clean up a misplaced title
sariola Oct 7, 2024
84b7e61
realign
sariola Oct 7, 2024
60036bc
chore: update readme
bergr7 Oct 8, 2024
3435564
init fix for extras
sariola Oct 8, 2024
992ca46
chore: executed notebooks + minor update
bergr7 Oct 8, 2024
a54bbee
standardized genparams & non-supported model warning & llamafile quan…
sariola Oct 8, 2024
6a6b772
add torch to llamafile extra as dep
sariola Oct 8, 2024
bad2065
add gpu check into ci flow
sariola Oct 8, 2024
3dd4bf3
fixed tests README
sariola Oct 8, 2024
f0598d3
fixed redundant vllm import error
sariola Oct 8, 2024
1b56a93
fixed model to model_id in vllm and llamafile
sariola Oct 8, 2024
2d7f16b
fixed llamafile args quoting
sariola Oct 8, 2024
5b6719d
fixed metadata file writing to json from jsonl
sariola Oct 8, 2024
5968caa
fixed default model name for Llamafile & tests graph to starburst
sariola Oct 8, 2024
91937b8
small fix in the readme from old usage of Flow-Judge-v0.1_HF to Hf()
sariola Oct 8, 2024
11514bd
testing out llamafile server cleanup from abrupt situations
sariola Oct 8, 2024
c309fc9
fixed gen params passing for vllm and hf init
sariola Oct 8, 2024
ebc9ef6
change vllm default param dtype back to bfloat16 from auto
sariola Oct 8, 2024
48 changes: 48 additions & 0 deletions .github/workflows/python-package.yml
@@ -0,0 +1,48 @@
name: Python package

on:
  push:
    branches: [ "feat/llamafile" ]
  pull_request:
    branches: [ "feat/llamafile" ]

jobs:
  build:
    runs-on: self-hosted
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install .[dev,vllm,hf,llamafile]
      - name: Lint with ruff
        run: |
          ruff check . || true
      - name: Format with black
        run: |
          black --check --diff . || true
      - name: Sort imports with isort
        run: |
          isort --check-only --diff . || true
      - name: Test with pytest and generate coverage
        run: |
          pytest --cov=./ --junitxml=junit.xml
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          fail_ci_if_error: true
      - name: Upload test results to Codecov
        if: ${{ !cancelled() }}
        uses: codecov/test-results-action@v1
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
5 changes: 5 additions & 0 deletions .gitignore
@@ -45,3 +45,8 @@ output/

# data
data/

.cache

flake.nix
flake.lock
109 changes: 89 additions & 20 deletions README.md
@@ -1,4 +1,5 @@
# `flow-judge`

<p align="center">
<img src="img/flow_judge_banner.png" alt="Flow Judge Banner">
</p>
@@ -16,6 +17,27 @@
<code>flow-judge</code> is a lightweight library for evaluating LLM applications with <code>Flow-Judge-v0.1</code>.
</p>

<p align="center">
<a href="https://github.com/flowaicom/flow-judge/stargazers/" target="_blank">
<img src="https://img.shields.io/github/stars/flowaicom/flow-judge?style=social&label=Star&maxAge=2592000" alt="GitHub stars">
</a>
<a href="https://github.com/flowaicom/flow-judge/releases" target="_blank">
<img src="https://img.shields.io/github/v/release/flowaicom/flow-judge?color=white" alt="Release">
</a>
<a href="https://www.youtube.com/@flowaicom" target="_blank">
<img alt="YouTube Channel Views" src="https://img.shields.io/youtube/channel/views/UCo2qL1nIQRHiPc0TF9xbqwg?style=social">
</a>
<a href="https://github.com/flowaicom/flow-judge/actions/workflows/python-package.yml" target="_blank">
<img src="https://github.com/flowaicom/flow-judge/actions/workflows/python-package.yml/badge.svg" alt="Build">
</a>
<a href="https://codecov.io/gh/flowaicom/flow-judge" target="_blank">
<img src="https://codecov.io/gh/flowaicom/flow-judge/branch/feat%2Fllamafile/graph/badge.svg?token=AEGC7W3DGE" alt="Code coverage">
</a>
<a href="https://github.com/flowaicom/flow-judge/blob/main/LICENSE" target="_blank">
<img src="https://img.shields.io/static/v1?label=license&message=Apache%202.0&color=white" alt="License">
</a>
</p>

## Model
`Flow-Judge-v0.1` is an open, small yet powerful language model evaluator trained on a synthetic dataset containing LLM system evaluation data by Flow AI.

@@ -41,22 +63,31 @@ pip install 'flash_attn>=2.6.3' --no-build-isolation
```

Extras available:
- `dev` for development dependencies
- `hf` for Hugging Face Transformers support
- `vllm` for vLLM support
- `dev` to install development dependencies
- `hf` to install Hugging Face Transformers dependencies
- `vllm` to install vLLM dependencies
- `llamafile` to install Llamafile dependencies

## Quick Start

Here's a simple example to get you started:

```python
from flow_judge.models.model_factory import ModelFactory
from flow_judge.flow_judge import EvalInput, FlowJudge
from flow_judge import Vllm, Llamafile, Hf, EvalInput, FlowJudge
from flow_judge.metrics import RESPONSE_FAITHFULNESS_5POINT
from IPython.display import Markdown, display

# Create a model using ModelFactory
model = ModelFactory.create_model("Flow-Judge-v0.1-AWQ")
# If you are running on an Ampere GPU or newer, create a model using vLLM
model = Vllm()

# If other applications are taking up VRAM, set gpu_memory_utilization to a lower value to reduce memory use.
# model = Vllm(gpu_memory_utilization=0.70)

# If you are not running on an Ampere GPU or newer, create a model using Hugging Face Transformers without flash attention
# model = Hf(flash_attn=False)

# Or create a model using Llamafile if you are not running an NVIDIA GPU (e.g., on Apple Silicon macOS)
# model = Llamafile()

# Initialize the judge
faithfulness_judge = FlowJudge(
@@ -88,14 +119,57 @@ display(Markdown(f"__Feedback:__\n{result.feedback}\n\n__Score:__\n{result.score

## Usage

### Supported Model Types
### Inference Options

The library supports multiple inference backends to accommodate different hardware configurations and performance needs:

1. **vLLM**:
- Best for NVIDIA GPUs with Ampere architecture or newer (e.g., RTX 3000 series, A100, H100)
- Offers the highest performance and throughput
- Requires CUDA-compatible GPU

```python
from flow_judge import Vllm

model = Vllm()
```

2. **Hugging Face Transformers**:
- Compatible with a wide range of hardware, including older NVIDIA GPUs
- Supports CPU inference (slower but universally compatible)
- Slower than vLLM, but compatible with a broader range of hardware

If you are running on an Ampere GPU or newer:
```python
from flow_judge import Hf

model = Hf()
```

If you are not running on an Ampere GPU or newer, disable flash attention:
```python
from flow_judge import Hf

model = Hf(flash_attn=False)
```

3. **Llamafile**:
- Ideal for non-NVIDIA hardware, including Apple Silicon
- Provides good performance on CPUs
- Self-contained and easy to deploy

```python
from flow_judge import Llamafile

model = Llamafile()
```

Choose the inference backend that best matches your hardware and performance requirements. The library provides a unified interface for all these options, making it easy to switch between them as needed.
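As an illustration of this unified interface, here is a minimal sketch (not part of the diff; it only reuses the `Vllm`, `Hf`, `Llamafile`, and `FlowJudge` classes shown above, and the backend-selection flag is a hypothetical convenience) showing how the same judge can be built on top of any backend:

```python
from flow_judge import Vllm, Hf, Llamafile, FlowJudge
from flow_judge.metrics import RESPONSE_FAITHFULNESS_5POINT

# Hypothetical switch for illustration: "vllm", "hf", or "llamafile"
USE_BACKEND = "hf"

if USE_BACKEND == "vllm":
    model = Vllm()                  # Ampere or newer NVIDIA GPU
elif USE_BACKEND == "hf":
    model = Hf(flash_attn=False)    # older GPUs or CPU inference
else:
    model = Llamafile()             # non-NVIDIA hardware, e.g. Apple Silicon

# The judge is constructed the same way regardless of the backend.
judge = FlowJudge(metric=RESPONSE_FAITHFULNESS_5POINT, model=model)
```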

- Hugging Face Transformers (`hf_transformers`)
- vLLM (`vllm`)

### Evaluation Metrics

Flow-Judge-v0.1 was trained to handle any custom metric that can be expressed as a combination of evaluation criteria and rubric, and required inputs and outputs.
`Flow-Judge-v0.1` was trained to handle any custom metric that can be expressed as a combination of evaluation criteria and rubric, and required inputs and outputs.

#### Pre-defined Metrics

@@ -114,21 +188,20 @@ For efficient processing of multiple inputs, you can use the `batch_evaluate` method:
```python
# Read the sample data
import json
from flow_judge.models.model_factory import ModelFactory
from flow_judge.flow_judge import EvalInput, FlowJudge
from flow_judge import Vllm, EvalInput, FlowJudge
from flow_judge.metrics import RESPONSE_FAITHFULNESS_5POINT
from IPython.display import Markdown, display

# Create a model using ModelFactory
model = ModelFactory.create_model("Flow-Judge-v0.1-AWQ")
# Initialize the model
model = Vllm()

# Initialize the judge
faithfulness_judge = FlowJudge(
metric=RESPONSE_FAITHFULNESS_5POINT,
model=model
)

# Load data
# Load some sample data
with open("sample_data/csr_assistant.json", "r") as f:
data = json.load(f)

@@ -157,13 +230,9 @@ for i, result in enumerate(results):

## Advanced Usage

### Model configurations
> [!WARNING]
> There is a reported issue with Phi-3 models that produces gibberish outputs for contexts longer than 4096 tokens, including input and output. This issue has recently been fixed in the transformers library, so for longer contexts we currently recommend the `Flow-Judge-v0.1_HF` model configuration. For more details, refer to [#33129](https://github.com/huggingface/transformers/pull/33129) and [#6135](https://github.com/vllm-project/vllm/issues/6135)
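In practice, the workaround above amounts to using the Hugging Face backend for long inputs. A minimal sketch, assuming the `Hf` and `FlowJudge` classes from the Quick Start (the variable names are illustrative only):

```python
from flow_judge import Hf, FlowJudge
from flow_judge.metrics import RESPONSE_FAITHFULNESS_5POINT

# For prompts approaching or exceeding ~4096 tokens, prefer the HF backend.
model = Hf()  # add flash_attn=False on pre-Ampere GPUs
long_context_judge = FlowJudge(metric=RESPONSE_FAITHFULNESS_5POINT, model=model)
```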

We currently support vLLM engine (recommended) and Hugging Face Transformers.

We are working on adding API-based usage as well as better options for CPU.

### Custom Metrics
