
Feat/llamafile: adding llamafile as engine & ModelFactory mechanism rewrite suggestion & haystack parsing/write enhancements #10

Merged · 38 commits · Oct 8, 2024

Conversation

@sariola (Collaborator) commented Oct 6, 2024

Added:

  • Llamafile as engine

    • Spawns a server with a robust checking implementation
    • Llamafile downloading
    • Uses a with-context to spawn and clean up the server process on demand during usage
    • Handles nested contexts, as well as subsequent context and no-context invocations
    • Intelligently checks whether a server is already spawned
    • Generate features
      • Uses OpenAI as the client library
      • Sync has been tested with the Haystack integration and it passes parsing 20/20, for both generate and batch_generate.
      • Caveat: for batch_generate we need to make changes to batch_evaluate at the higher abstraction level, since this approach needs the server and has to pass it within the context, whereas the previous implementation of batch_evaluate just called generate repeatedly. We check whether the instance is a Llamafile and use the functionality of the class (see the sketch after this list).
      • TODO: Test async generate and async batch_generate. Both are untested and may not make sense for most of the systems this llamafile approach targets.
  • ModelConfig can be used directly to instantiate an engine

    • TODO: Show how this can be used, in a notebook example
    • TODO: Show how ModelConfig templates can be imported and altered conveniently in a notebook
    • This change implies removing ModelFactory and putting that very slight logic in the __init__ of the engine classes instead. This is my suggestion. It also helps users with importing and auto-completing the engines, removing the need to refer to them by string.
    • If our devs like this approach, then TODO: update the __init__s of the vLLM and HF engines to benefit from and standardize on the same pattern.
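To make the batch_generate caveat concrete, here is a minimal sketch of how batch_evaluate could branch on the engine type; the function shape and names are assumptions for illustration, not the actual flow-judge code:

```python
from flow_judge.models.llamafile import Llamafile  # assumed import path


def batch_evaluate(model, prompts: list[str]) -> list[str]:
    """Sketch: keep one llamafile server alive for the whole batch."""
    if isinstance(model, Llamafile):
        # Enter the server context once instead of letting each call
        # re-check / re-spawn the server.
        with model:
            return model.batch_generate(prompts)
    # Previous behaviour for other engines: call generate() repeatedly.
    return [model.generate(p) for p in prompts]
```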

Standard usage: the with-context spawns the server and automatically cleans it up.
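As a rough stand-in for the screenshot, a minimal sketch of what this could look like; the import path and exact signatures are assumptions inferred from this PR, not the verified API:

```python
from flow_judge.models.llamafile import Llamafile  # assumed import path

model = Llamafile()

# Entering the context downloads the llamafile if needed, spawns the server,
# and tears it down again when the block exits.
with model:
    result = model.generate("Evaluate the following response ...")
    results = model.batch_generate(["prompt one ...", "prompt two ..."])
```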

This also works without the context, because both generate and batch_generate ensure a server exists and select the already running one. Additionally, the Llamafile class exposes useful methods for checking and controlling the server if you want to use it in a custom way.
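Again only as a sketch, with the server-control helper names being hypothetical placeholders rather than the real method names:

```python
from flow_judge.models.llamafile import Llamafile  # assumed import path

model = Llamafile()

# generate()/batch_generate() check whether a server is already running and
# start one if needed, so no explicit `with` block is required.
result = model.generate("Evaluate the following response ...")

# Hypothetical helpers for controlling the server lifecycle in a custom way.
if not model.is_server_running():
    model.start_server()
try:
    results = model.batch_generate(["prompt one ...", "prompt two ..."])
finally:
    model.stop_server()
```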

Creating a custom-configured engine looks like this:
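A sketch of direct ModelConfig instantiation; the field names and import paths below are illustrative assumptions, not the actual schema in the repo:

```python
from flow_judge.models.common import ModelConfig    # assumed import path
from flow_judge.models.llamafile import Llamafile   # assumed import path

# Placeholder fields for illustration only.
config = ModelConfig(
    model_id="flowaicom/Flow-Judge-v0.1",  # hypothetical model id
    generation_params={"temperature": 0.1, "max_new_tokens": 1024},
)

# With ModelFactory removed, the config is passed straight to the engine's
# __init__ instead of the engine being looked up by a string name.
model = Llamafile(config=config)
```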

  • Haystack parsing was throwing errors
    • Fixed a parsing case where the score appeared inside the feedback tags and/or the closing feedback tag was missing (a sketch follows this list)
    • Fixed the code refusing to produce results when parsing errors occurred, even when boolean False had been passed to stop producing results on parsing errors
    • Added a try block to the result writer
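A sketch of the more tolerant parsing described above; the tag names and regexes are illustrative rather than the exact ones used in the repo:

```python
import re

FEEDBACK_RE = re.compile(r"<feedback>(.*?)(?:</feedback>|$)", re.DOTALL)
SCORE_RE = re.compile(r"<score>\s*(\d+)\s*</score>")


def parse_output(text: str) -> tuple[str, int | None]:
    """Tolerate a score inside the feedback tags or a missing closing tag."""
    feedback_match = FEEDBACK_RE.search(text)
    feedback = feedback_match.group(1).strip() if feedback_match else ""

    # Search the whole output so the score is found whether it was emitted
    # inside or outside the feedback block.
    score_match = SCORE_RE.search(text)
    score = int(score_match.group(1)) if score_match else None

    # Drop a stray <score> block that leaked into the feedback body.
    feedback = SCORE_RE.sub("", feedback).strip()
    return feedback, score
```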

This PR also implies using the OpenAI client for the Baseten sync and async functionality, since Baseten exposes the same OpenAI-compatible API as Llamafile; it incurs no custom work to implement, other than having the async batching receive results via the proxy.
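For illustration, the same OpenAI client call works against any OpenAI-compatible endpoint just by swapping base_url, which is why sharing the client between llamafile and Baseten is nearly free; the URL, port, key, and model name below are assumptions:

```python
from openai import OpenAI

# Point the client at a local llamafile server (or a Baseten OpenAI-compatible
# endpoint) instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="flow-judge",  # hypothetical model name
    messages=[{"role": "user", "content": "Evaluate the following response ..."}],
)
print(response.choices[0].message.content)
```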

Additional TODOs beyond those already mentioned:

  • Create a few unit tests
  • Test on MacOS (8 GB and 16 GB RAM)
  • Potential improvements to the init values based on VRAM & RAM available
  • Test on CPU only (will be very slow)

@sariola sariola requested a review from bergr7 October 6, 2024 08:24
@sariola sariola marked this pull request as draft October 7, 2024 06:24
@bergr7 (Contributor) commented Oct 7, 2024

REMINDER: We must update all notebooks and the README to accommodate the ModelFactory replacement.

cc: @sariola

@bergr7 (Contributor) commented Oct 7, 2024

Another thing that should be fixed with the ModelFactory replacement is that currently, the model factory forces you to have all the extras installed. No bueno

@sariola (Collaborator, Author) commented Oct 7, 2024

Reverted haystack.py and flow-judge.py: the notebook still works normally, as it should ✔️

Kept the parsing addition where the score can be found inside or outside the feedback tags.

Base automatically changed from feat/haystack-integration to main October 7, 2024 11:19
@bergr7 (Contributor) commented Oct 7, 2024

REMINDER - Update model cards on HF

codecov bot commented Oct 7, 2024

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

@sariola (Collaborator, Author) commented Oct 7, 2024

Okay, it seems the shortcut imports from __init__.py are forcing installation of all the extras. Will need to fix this in the morning.

For now just power through it with `python -m pip install -e .[dev,vllm,hf,llamafile]`


Additional

Added a self-hosted [GitHub Actions](https://github.com/flowaicom/flow-judge/actions) test runner on tsukuyomi (Hetzner), which we can make GPU-enabled for E2E tests. It currently runs [Python 3.10/3.11/3.12](https://github.com/flowaicom/flow-judge/blob/feat/llamafile/.github/workflows/python-package.yml), calculates code coverage, and creates the badges for the README. Pretty neat.

There are quite a few tests to create to close the coverage gap, but I think we could storm through that collaboratively. Also registered on TestPyPI, and we can use my PyPI account that we used for autoeval to publish to the index.

@sariola (Collaborator, Author) commented Oct 7, 2024

Note to self:

  • A MacBook with 8 GB RAM and VS Code open needs -ctk q4_0 -ctv q4_0 -fa on line 213 in llamafile.py
  • Toggling nkvo does not currently work because the args are not passed correctly, but generation seems to work even without it, using 7.2 GB of the 8 GB RAM when we use the q4 KV cache and flash attention on Metal. Evaluation scores parse correctly! 🎉

@bergr7 (Contributor) commented Oct 8, 2024

> Okay, it seems the shortcut imports from __init__.py are forcing installation of all the extras. Will need to fix this in the morning. […]

Ahh, I see. Sorry I didn't pay enough attention to those imports.

@bergr7 (Contributor) commented Oct 8, 2024

Starting the review already

@bergr7 (Contributor) commented Oct 8, 2024

> Added a self-hosted GitHub Actions test runner on tsukuyomi (Hetzner) […]

This is great for including model tests as well, not just code tests.

@sariola (Collaborator, Author) commented Oct 8, 2024

Great!!

The async path and the LlamaIndex tutorial are completely untested; I'll get into those right away.

@bergr7 (Contributor) left a review:

Overall looks very solid!

I added minor comments here and there.

I executed the Haystack notebook successfully, but the regex is producing many parsing errors. I haven't pushed the executed notebook.

Review threads:
  • README.md
  • flow_judge/models/common.py (outdated)
  • flow_judge/models/huggingface.py (outdated)
  • flow_judge/models/huggingface.py (outdated)
  • flow_judge/models/huggingface.py
  • flow_judge/utils/result_writer.py (outdated)
  • tests/README.md (outdated)
  • tests/README.md
  • tests/README.md
  • examples/1_quickstart.ipynb (outdated)
@sariola sariola marked this pull request as ready for review October 8, 2024 11:01
@bergr7 (Contributor) left a review:

Approved ✅

Only a minor comment on the metadata file extension

Review thread: flow_judge/utils/result_writer.py (outdated)
@sariola (Collaborator, Author) commented Oct 8, 2024

> Approved ✅
>
> Only a minor comment on the metadata file extension

Thanks, I'll double-check that. I'll also look into the llamafile server lingering, make it dependent on the existence of the kernel process, and create an exit trap for it to handle edge cases.

Edit: Checked it out. Also, I think that with 11514bd we now have more robust server cleanup: the server spawns as part of a child process group that stays tied to the existence of the object, and the whole group is wiped out on cleanup.
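As a rough illustration of the process-group idea (the command, port, and cleanup hook are placeholders, not what 11514bd actually implements):

```python
import atexit
import os
import signal
import subprocess

# Launch the llamafile server in its own session/process group so it and any
# children it spawns can be torn down together.
proc = subprocess.Popen(
    ["./flow-judge.llamafile", "--server", "--port", "8080"],  # placeholder command
    start_new_session=True,
)


def _cleanup() -> None:
    try:
        # Wipe out the whole group, not just the immediate child.
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    except ProcessLookupError:
        pass  # server already exited


atexit.register(_cleanup)
```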

@sariola sariola merged commit b20ca8d into main Oct 8, 2024
7 checks passed