Feat: (experimental) More user-centric approach to managing evaluation rubrics within the flow-judge library #17

Draft
wants to merge 2 commits into
base: main
from

sariola
Collaborator

@sariola sariola commented Oct 10, 2024

OUTDATED

More user-centric approach to managing evaluation rubrics within the flow-judge library

(Draft PR: edited description last time 12:32 Finnish time / Thursday 10th Oct)

Description
This PR adds a feature for requesting new rubrics, along with a first set of example rubrics. The changes aim to improve the usability of the flow-judge library by providing a more structured process for creating and loading evaluation rubrics.

Key Changes: (code snippets to follow)

New Rubric Request Functionality:

  • Implemented in flow_judge/utils/rubrics.py
  • Allows users to create new rubric requests easily
    • Supports both CLI and Jupyter Notebook interfaces

CLI Integration:

  • New file: flow_judge/utils/cli.py
  • Adds a command-line interface for creating rubric requests, which is possible now that flow-judge is distributed as a package

Example Rubrics:

  • Added first YAML files in the rubrics/ directory
  • Includes rubrics for article evaluation and query decomposition

Documentation:

  • New rubrics/README.md explaining the rubric structure and request process

Testing:

  • New test file: tests/test_rubric_functionality.py

Project Configuration:

  • Updated pyproject.toml to include new dependencies and CLI entry point

Implementation Details:

The rubrics.py file introduces key functions such as load_rubric_from_yaml, load_rubric_templates, create_metric_from_template, and request_rubric. These functions facilitate the creation and management of rubrics within the flow-judge ecosystem.
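To make the YAML loading step concrete, here is a minimal sketch of how a rubric file might be parsed and validated. It is not the code from this PR: the `RubricSketch` class, the field names (`name`, `criteria`, `scoring`), and both function names are assumptions chosen for illustration, and the actual `load_rubric_from_yaml` in rubrics.py may use a different schema and signature. The only third-party call is `yaml.safe_load` from PyYAML.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass
class RubricSketch:
    """Hypothetical in-memory form of a rubric; field names are assumptions."""

    name: str
    criteria: str
    scoring: dict[int, str] = field(default_factory=dict)


def rubric_from_mapping(data: Mapping[str, Any]) -> RubricSketch:
    """Validate an already-parsed mapping and convert it to a RubricSketch."""
    missing = [key for key in ("name", "criteria") if key not in data]
    if missing:
        raise ValueError(f"rubric is missing required fields: {missing}")
    # YAML keys may arrive as strings or ints; normalize scores to int.
    scoring = {
        int(score): str(text) for score, text in data.get("scoring", {}).items()
    }
    return RubricSketch(
        name=str(data["name"]),
        criteria=str(data["criteria"]),
        scoring=scoring,
    )


def load_rubric_sketch(path: str) -> RubricSketch:
    """Read a YAML file and build a RubricSketch (illustrative stand-in only)."""
    import yaml  # PyYAML, imported lazily so the validator works without it

    with open(path, encoding="utf-8") as handle:
        data = yaml.safe_load(handle)
    return rubric_from_mapping(data)
```

Splitting validation from file reading keeps the schema check testable without touching the filesystem, which may also be useful for tests/test_rubric_functionality.py.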

The CLI integration in cli.py provides a user-friendly interface for creating rubric requests directly from the command line, enhancing the library's accessibility.
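As a rough illustration of what a rubric-request command could look like, the sketch below builds a request from command-line arguments using the standard-library `argparse` module. Everything specific here is hypothetical: the program name `rubric-request`, the `--title`, `--description`, and `--tag` flags, and the shape of the returned dictionary are assumptions for this example, not the interface implemented in cli.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical rubric-request command."""
    parser = argparse.ArgumentParser(
        prog="rubric-request",  # hypothetical command name
        description="Create a rubric request (illustrative sketch).",
    )
    parser.add_argument("--title", required=True, help="Short name for the rubric")
    parser.add_argument(
        "--description", required=True, help="What the rubric should evaluate"
    )
    parser.add_argument(
        "--tag",
        dest="tags",
        action="append",
        default=None,
        help="Optional tag; may be repeated",
    )
    return parser


def build_request(argv: list[str]) -> dict:
    """Parse argv and return the request as a plain dictionary."""
    args = build_parser().parse_args(argv)
    return {
        "title": args.title,
        "description": args.description,
        "tags": args.tags or [],
    }
```

Calling `build_request(["--title", "Tone", "--description", "Check tone"])` returns a dictionary that the same code path could also accept from a notebook, which is one way to share logic between the CLI and Jupyter interfaces.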

Example rubrics in YAML format serve as templates and references for users, covering various evaluation aspects like clarity, completeness, objectivity, and source attribution for articles, as well as sub-query coverage for query decomposition.

Impact:

This feature set enhances the flow-judge library by:

  • Simplifying the process of creating custom evaluation rubrics
  • Providing a standardized format for rubrics
  • Offering example rubrics as starting points or references
  • Enabling both programmatic and interactive ways to request new rubrics

These changes make the library more flexible and user-friendly, especially for those looking to create custom evaluation criteria for their language model applications.

This is still very much a work in progress and in flux.

Potential next steps:

  • Finalize and polish the CLI and Jupyter Notebook interfaces
  • Expand the test coverage for the new functionality
  • Update the main documentation to include information about the new rubric request feature
  • Consider creating more example rubrics to cover a wider range of use cases

codecov bot commented Oct 10, 2024

❌ 7 Tests Failed:

Tests completed   Failed   Passed   Skipped
7                 7        0        0
View the top 3 failed tests by shortest run time
 tests.integrations.test_llama_index_e2e
Stack Traces | 0s run time
No failure message available
 tests.models.test_llamafile_unit
Stack Traces | 0s run time
No failure message available
 tests.test_utils
Stack Traces | 0s run time
No failure message available

To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard
