Feat: (experimental) More user-centric approach to managing evaluation rubrics within the flow-judge library #17
OUTDATED
More user-centric approach to managing evaluation rubrics within the flow-judge library
(Draft PR: description last edited at 12:32 Finnish time, Thursday 10 October)
Description
This PR adds a new feature for requesting rubrics, together with a set of example rubrics. The changes aim to improve the usability of the flow-judge library by providing a more structured process for creating and loading evaluation rubrics.
Key Changes: (code snippets to follow)
New Rubric Request Functionality:
CLI Integration:
Example Rubrics:
Documentation:
New rubrics/README.md explaining the rubric structure and request process
Testing:
Project Configuration:
Implementation Details:
The rubrics.py file introduces key functions such as load_rubric_from_yaml, load_rubric_templates, create_metric_from_template, and request_rubric. These functions facilitate the creation and management of rubrics within the flow-judge ecosystem.
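As a rough illustration of how a helper like create_metric_from_template could turn a template into a concrete metric, here is a minimal sketch. The signature, the RubricTemplate and Metric dataclasses, their field names, and the $placeholder convention are all assumptions made for illustration, not the code in this PR. YAML loading is left out so the sketch has no third-party dependencies.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Dict, List


@dataclass
class RubricTemplate:
    """Hypothetical in-memory form of a rubric template (assumed, not the PR's class)."""
    name: str
    criteria: str                      # may contain $placeholders
    rubric: Dict[int, str]             # score -> description, may contain $placeholders
    required_inputs: List[str] = field(default_factory=list)
    required_output: str = "response"


@dataclass
class Metric:
    """Hypothetical concrete metric produced from a template."""
    name: str
    criteria: str
    rubric: Dict[int, str]
    required_inputs: List[str]
    required_output: str


def create_metric_from_template(template: RubricTemplate, **variables: str) -> Metric:
    """Fill every $placeholder in the template and return a concrete metric.

    Raises KeyError if a placeholder has no matching keyword argument.
    """
    criteria = Template(template.criteria).substitute(**variables)
    rubric = {
        score: Template(text).substitute(**variables)
        for score, text in template.rubric.items()
    }
    return Metric(
        name=template.name,
        criteria=criteria,
        rubric=rubric,
        required_inputs=list(template.required_inputs),
        required_output=template.required_output,
    )
```

With this shape, a template whose criteria reads "Evaluate the clarity of the $doc_type." and a call such as create_metric_from_template(template, doc_type="article") would yield a metric with the placeholder filled in. Failing loudly on a missing placeholder is a deliberate choice here, so that an incomplete rubric is never passed to the judge.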
The CLI integration in cli.py provides a user-friendly interface for creating rubric requests directly from the command line, enhancing the library's accessibility.
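To make the idea of a command-line rubric request concrete, the following is a minimal sketch using argparse from the standard library. The program name, the flag names (--title, --description, --similar-to), and the shape of the request dictionary are hypothetical and are not taken from cli.py in this PR.

```python
import argparse
import json
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical rubric-request command."""
    parser = argparse.ArgumentParser(
        prog="rubric-request",  # assumed name, not the PR's actual entry point
        description="Create a request for a new evaluation rubric.",
    )
    parser.add_argument("--title", required=True, help="Short name for the rubric")
    parser.add_argument("--description", required=True, help="What the rubric should evaluate")
    parser.add_argument("--similar-to", default=None, help="Name of an existing rubric to base it on")
    return parser


def main(argv: Optional[List[str]] = None) -> dict:
    """Parse arguments, print the request as JSON, and return it."""
    args = build_parser().parse_args(argv)
    request = {
        "title": args.title,
        "description": args.description,
        "similar_to": args.similar_to,  # argparse maps --similar-to to similar_to
    }
    print(json.dumps(request, indent=2))
    return request
```

Calling main(["--title", "Tone", "--description", "Rate the tone of a reply"]) builds and prints the request without touching sys.argv, which also keeps the function easy to test.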
Example rubrics in YAML format serve as templates and references for users, covering various evaluation aspects like clarity, completeness, objectivity, and source attribution for articles, as well as sub-query coverage for query decomposition.
Impact:
This feature set enhances the flow-judge library by:
These changes make the library more flexible and user-friendly, especially for those looking to create custom evaluation criteria for their language model applications.
This is still very much a work in progress and in flux.
Potential next steps:
Finalize and polish the CLI and Jupyter Notebook interfaces
Expand the test coverage for the new functionality
Update the main documentation to include information about the new rubric request feature
Consider creating more example rubrics to cover a wider range of use cases