Feat/llamafile: adding llamafile as engine & ModelFactory mechanism rewrite suggestion & haystack parsing/write enhancements #10
Conversation
REMINDER: We must update all notebooks and the README to accommodate the changes. cc: @sariola
Another thing that should be fixed with the ModelFactory replacement: currently, the model factory forces you to have all the extras installed. No bueno
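A minimal sketch of one way the replacement could avoid this, resolving engine classes lazily so an extra is only imported when that engine is requested. Module paths and class names below are assumptions for illustration, not the actual flow-judge API:

```python
import importlib

# Hypothetical engine registry; paths/classes are assumptions.
_ENGINES = {
    "vllm": ("flow_judge.models.vllm", "Vllm"),
    "hf": ("flow_judge.models.huggingface", "Hf"),
    "llamafile": ("flow_judge.models.llamafile", "Llamafile"),
}

def create_model(engine: str, **kwargs):
    """Resolve and instantiate an engine, importing it only on first use."""
    if engine not in _ENGINES:
        raise ValueError(f"Unknown engine: {engine!r}")
    module_path, class_name = _ENGINES[engine]
    try:
        module = importlib.import_module(module_path)
    except ImportError as err:
        raise ImportError(
            f"The {engine!r} engine needs an optional extra; "
            f"try: pip install flow-judge[{engine}]"
        ) from err
    return getattr(module, class_name)(**kwargs)
```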
Reverted haystack.py and flow-judge.py: the notebook still works normally, as it should ✔️ Kept the parsing addition where the score can be found inside or outside the feedback tags.
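A rough sketch of the parsing behaviour described above: find the `<score>` tag whether the model emitted it inside or outside the `<feedback>` block. The tag names follow the usual flow-judge output format; the exact regexes in the repo may differ:

```python
import re

FEEDBACK_RE = re.compile(r"<feedback>\s*(.*?)\s*</feedback>", re.DOTALL)
SCORE_RE = re.compile(r"<score>\s*(\d+)\s*</score>", re.DOTALL)

def parse_eval_output(text: str) -> tuple[str, int]:
    feedback_match = FEEDBACK_RE.search(text)
    feedback = feedback_match.group(1) if feedback_match else ""
    # Search the full response, so the score is found regardless of
    # whether it sits inside or outside the feedback tags.
    score_match = SCORE_RE.search(text)
    if score_match is None:
        raise ValueError("No <score> tag found in model output")
    # If the score tag was nested inside the feedback, strip it out.
    feedback = SCORE_RE.sub("", feedback).strip()
    return feedback, int(score_match.group(1))
```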
… classes, and can take sensible args
REMINDER - Update model cards on HF
Welcome to Codecov 🎉 Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. ℹ️ You can also turn on project coverage checks and project coverage reporting on pull request comments. Thanks for integrating Codecov - we've got you covered ☂️
Okay, seems like the shortcut imports from __init__.py are forcing the install of all extras. Will need to fix this in the morning; for now just power through it (one possible fix is sketched below). Additional: Added a self-hosted GitHub Actions test runner on tsukuyomi (Hetzner), which we can have GPU-enabled for E2E tests. It currently runs Python 3.10/3.11/3.12, calculates code coverage, and creates the badges for the README. Pretty neat. Quite a few tests to create to close the coverage gap, but we could storm through that collaboratively, I think. Also registered on TestPyPI, and we can use my PyPI account that we used for autoeval to publish to the index.
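One possible fix for the __init__.py shortcut imports, sketched with PEP 562's module-level `__getattr__`, which keeps the convenient shortcuts without pulling in every extra at package import time. Names and paths are assumptions:

```python
# flow_judge/__init__.py (hypothetical sketch)
import importlib

_LAZY = {
    "Vllm": "flow_judge.models.vllm",
    "Hf": "flow_judge.models.huggingface",
    "Llamafile": "flow_judge.models.llamafile",
}

def __getattr__(name: str):
    if name in _LAZY:
        # Imported (and its extra required) only when first accessed.
        return getattr(importlib.import_module(_LAZY[name]), name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```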
Starting the review already |
It's great to include model tests as well, not just code tests.
Great!! This is completely untested on the async & llamaindex tutorial; I'll get into that right away as well.
Overall looks very solid!
I added minor comments here and there.
I executed Haystack with success but the regex is resulting in many parsing errors. I haven't pushed the executed notebook.
Approved ✅
Only a minor comment on the metadata file extension.
Thanks, I'll double-check that. I'll also look at the lingering llamafile server: make it dependent on the existence of the kernel process and create a trap on exit to take care of edge cases. Edit: Checked it out, and I think with 11514bd we now have more robust server cleanup, as the server spawns as part of a child process group that stays tied to the existence of the object, and that group is wiped out.
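A minimal sketch of the cleanup idea on POSIX: run the llamafile server in its own process group, tie it to the owning object, and wipe the whole group on exit. The command, class name, and timeouts are illustrative assumptions:

```python
import atexit
import os
import signal
import subprocess

class LlamafileServer:
    def __init__(self, command: list[str]):
        # start_new_session=True makes the server the leader of a fresh
        # process group, so any children it forks can be signalled together.
        self.proc = subprocess.Popen(command, start_new_session=True)
        atexit.register(self.stop)  # safety net for interpreter exit

    def stop(self):
        if self.proc.poll() is None:
            os.killpg(os.getpgid(self.proc.pid), signal.SIGTERM)
            try:
                self.proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                os.killpg(os.getpgid(self.proc.pid), signal.SIGKILL)

    def __del__(self):
        # Cleanup stays tied to the lifetime of the object.
        self.stop()
```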
Added:

- Llamafile as engine
- `with`-context to spawn and clean up the server process on-demand during usage of `generate` and `batch_generate`. For `batch_generate` we need to make changes in `batch_evaluate` at the higher abstraction level, since this approach needs the server and has to pass it within the context, whereas the previous implementation of `batch_evaluate` just called `generate` repeatedly. We check whether the instance is a `Llamafile` and use the class's functionality.
- `ModelConfig` can be used directly to instantiate an engine
- `ModelConfig` templates can be imported and altered conveniently in a notebook

Standard usage: the `with`-context spawns the server and automatically cleans it up. This also works without the context, because both `generate` and `batch_generate` ensure a server exists and select the already running one. Additionally, the `Llamafile` class exposes useful methods for checking and controlling the server if you want to use it in a custom way.

Creating a custom configured engine looks like this:
This PR also implies the use of the OpenAI client for the Baseten sync and async functionality, since Baseten shares the same OpenAI-compatible API as Llamafile; this incurs no custom work to implement, other than the async batching receiving results via the proxy.
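A sketch of that shared-client idea: because llamafile (and the Baseten proxy) expose an OpenAI-compatible API, the stock OpenAI client works against either endpoint by switching `base_url`. The URL, key handling, and model name below are illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llamafile's local server
    api_key="sk-no-key-required",         # llamafile ignores the key
)

response = client.chat.completions.create(
    model="flow-judge",  # placeholder model name
    messages=[{"role": "user", "content": "Evaluate this response..."}],
)
print(response.choices[0].message.content)
```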
Additional TODOs, beyond those already mentioned: