Feat: (experimental) More user-centric approach to managing evaluation rubrics within the flow-judge library #17

sariola · 2024-10-10T09:38:39Z

OUTDATED

More user-centric approach to managing evaluation rubrics within the flow-judge library

(Draft PR: description last edited at 12:32 Finnish time, Thursday 10 Oct)

Description
Adds a feature for requesting new rubrics, along with a set of example rubrics. The changes aim to improve the usability of the flow-judge library by providing a more structured process for creating and loading evaluation rubrics.

Key Changes: (code snippets to follow)

New Rubric Request Functionality:

- Implemented in flow_judge/utils/rubrics.py
- Allows users to create new rubric requests easily
- Supports both CLI and Jupyter Notebook interfaces
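As a rough illustration of the programmatic side, here is a minimal sketch of a rubric request helper that could be called from a script or a notebook cell. The RubricRequest dataclass, its field names (required_inputs, required_output, example_criteria) and this request_rubric signature are assumptions made for illustration, not the actual API in flow_judge/utils/rubrics.py; the sketch only validates the fields and serializes the request to JSON.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class RubricRequest:
    """Hypothetical container for a rubric request; field names are assumptions."""

    name: str
    description: str
    required_inputs: list[str]
    required_output: str
    example_criteria: list[str] = field(default_factory=list)


def request_rubric(
    name: str,
    description: str,
    required_inputs: list[str],
    required_output: str = "response",
    example_criteria: list[str] | None = None,
) -> str:
    """Validate the request fields and return the request as a JSON string.

    Hypothetical: the real request_rubric may store or submit requests differently.
    """
    if not name.strip():
        raise ValueError("rubric name must not be empty")
    if not required_inputs:
        raise ValueError("at least one required input is needed")
    request = RubricRequest(
        name=name.strip(),
        description=description.strip(),
        required_inputs=list(required_inputs),
        required_output=required_output,
        example_criteria=list(example_criteria or []),
    )
    return json.dumps(asdict(request), indent=2)
```

Returning a plain JSON string keeps the sketch independent of where requests end up (a file, an issue tracker, or a review queue), which is the part of the design still in flux.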

CLI Integration:

- New file: flow_judge/utils/cli.py
- Adds a command-line interface for creating rubric requests; this takes advantage of flow-judge now being distributed as a package
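A minimal sketch of how such a command could be wired up with the standard library's argparse module follows. The program name flow-judge-rubric, the --description, --input and --output options, and the main(argv) function are hypothetical and not taken from cli.py; a console-script entry point in pyproject.toml would typically point at a function like main.

```python
from __future__ import annotations

import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser; command and option names are hypothetical."""
    parser = argparse.ArgumentParser(
        prog="flow-judge-rubric",  # assumed name, not the real entry point
        description="Create a new rubric request.",
    )
    parser.add_argument("name", help="short identifier for the rubric")
    parser.add_argument("--description", required=True, help="what the rubric should evaluate")
    parser.add_argument(
        "--input",
        dest="required_inputs",
        action="append",  # repeat the flag to collect several inputs into a list
        required=True,
        help="required input field; repeat the flag for several inputs",
    )
    parser.add_argument(
        "--output",
        dest="required_output",
        default="response",
        help="output field being judged (default: response)",
    )
    return parser


def main(argv: list[str] | None = None) -> dict[str, object]:
    """Parse arguments, print the request as JSON and return it."""
    args = build_parser().parse_args(argv)
    request = {
        "name": args.name,
        "description": args.description,
        "required_inputs": args.required_inputs,
        "required_output": args.required_output,
    }
    print(json.dumps(request, indent=2))
    return request
```

Accepting an optional argv list lets the same function serve both the installed command (argv=None reads sys.argv) and tests that pass arguments directly.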

Example Rubrics:

- Adds the first YAML rubric files in the rubrics/ directory
- Includes rubrics for article evaluation and query decomposition

Documentation:
- New rubrics/README.md explaining the rubric structure and request process

Testing:

- New test file: tests/test_rubric_functionality.py

Project Configuration:

- Updated pyproject.toml to include the new dependencies and a CLI entry point

Implementation Details:

The rubrics.py file introduces key functions such as load_rubric_from_yaml, load_rubric_templates, create_metric_from_template, and request_rubric. These functions facilitate the creation and management of rubrics within the flow-judge ecosystem.
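To make the division of labour between these functions concrete, here is a minimal sketch under stated assumptions: the template keys (name, criteria, rubric, required_inputs, required_output) and the local Metric and RubricItem dataclasses are illustrative stand-ins, not imports from flow_judge, and the function bodies are not the PR's implementation. load_rubric_from_yaml relies on PyYAML's safe_load, imported lazily so the rest of the sketch runs without it.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable


@dataclass
class RubricItem:
    """Local stand-in for one scored rubric level (assumed shape)."""

    score: int
    description: str


@dataclass
class Metric:
    """Local stand-in for an evaluation metric built from a template (assumed shape)."""

    name: str
    criteria: str
    rubric: list[RubricItem]
    required_inputs: list[str]
    required_output: str


def load_rubric_from_yaml(path: Path) -> dict[str, Any]:
    """Parse one rubric file; requires PyYAML, imported only when called."""
    import yaml

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)


def load_rubric_templates(
    directory: Path,
    loader: Callable[[Path], dict[str, Any]] = load_rubric_from_yaml,
    pattern: str = "*.yaml",
) -> dict[str, dict[str, Any]]:
    """Load every matching rubric file in directory, keyed by file stem."""
    return {p.stem: loader(p) for p in sorted(directory.glob(pattern))}


def create_metric_from_template(template: dict[str, Any]) -> Metric:
    """Turn a loaded template into a Metric, ordering rubric items by score."""
    items = sorted(
        (RubricItem(int(i["score"]), str(i["description"])) for i in template["rubric"]),
        key=lambda item: item.score,
    )
    return Metric(
        name=template["name"],
        criteria=template["criteria"],
        rubric=items,
        required_inputs=list(template["required_inputs"]),
        required_output=template["required_output"],
    )
```

Passing the loader as a parameter keeps load_rubric_templates testable without YAML files on disk.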

The CLI integration in cli.py provides a user-friendly interface for creating rubric requests directly from the command line, enhancing the library's accessibility.

Example rubrics in YAML format serve as templates and references for users, covering various evaluation aspects like clarity, completeness, objectivity, and source attribution for articles, as well as sub-query coverage for query decomposition.
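One benefit of a standardized format is that rubric files can be checked automatically before they are used. The sketch below assumes the same hypothetical keys as above and adds its own rules (unique, contiguous integer scores); neither the key set nor these rules are confirmed by the rubric files in this PR.

```python
from __future__ import annotations

from typing import Any

# Hypothetical required keys; the actual YAML schema in rubrics/ may differ.
REQUIRED_KEYS = ("name", "criteria", "rubric", "required_inputs", "required_output")


def validate_rubric(template: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a rubric template (empty if it looks valid)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in template]
    items = template.get("rubric") or []
    if not items:
        problems.append("rubric must contain at least one scored item")
    scores = []
    for index, item in enumerate(items):
        if "score" not in item or "description" not in item:
            problems.append(f"rubric item {index} needs 'score' and 'description'")
            continue
        scores.append(item["score"])
    # Assumed rule: each score appears once and the scale has no gaps (e.g. 1..5).
    if len(scores) != len(set(scores)):
        problems.append("rubric scores must be unique")
    elif scores and sorted(scores) != list(range(min(scores), max(scores) + 1)):
        problems.append("rubric scores should form a contiguous range")
    return problems
```

Returning every problem at once, rather than raising on the first, makes the output more useful when reviewing a newly requested rubric.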

Impact:

This feature set enhances the flow-judge library by:

- Simplifying the process of creating custom evaluation rubrics
- Providing a standardized format for rubrics
- Offering example rubrics as starting points or references
- Enabling both programmatic and interactive ways to request new rubrics

These changes make the library more flexible and user-friendly, especially for those looking to create custom evaluation criteria for their language model applications.
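For the interactive route, a notebook-friendly prompt could collect the same fields step by step. The prompt_for_rubric_request helper, its prompt texts and the returned field names are hypothetical; taking the question function as a parameter (defaulting to the built-in input(), which also works in a Jupyter cell) is a design choice that keeps it testable without a terminal.

```python
from __future__ import annotations

from typing import Callable


def prompt_for_rubric_request(ask: Callable[[str], str] = input) -> dict[str, object]:
    """Collect the fields of a rubric request interactively (hypothetical helper)."""
    name = ask("Rubric name: ").strip()
    while not name:  # keep asking until a non-empty name is given
        name = ask("Rubric name (required): ").strip()
    description = ask("What should the rubric evaluate? ").strip()
    raw_inputs = ask("Required inputs (comma-separated): ")
    # Split on commas and drop empty entries such as trailing commas.
    required_inputs = [part.strip() for part in raw_inputs.split(",") if part.strip()]
    required_output = ask("Output field being judged [response]: ").strip() or "response"
    return {
        "name": name,
        "description": description,
        "required_inputs": required_inputs,
        "required_output": required_output,
    }
```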

This is still very much a work in progress and in flux.

Potential next steps:

- Finalize and polish the CLI and Jupyter Notebook interfaces
- Expand the test coverage for the new functionality
- Update the main documentation to include information about the new rubric request feature
- Consider creating more example rubrics to cover a wider range of use cases


codecov · 2024-10-10T09:41:43Z

❌ 7 Tests Failed:

Tests completed	Failed	Passed	Skipped
7	7	0	0

View the top 3 failed tests by shortest run time

tests.integrations.test_llama_index_e2e

Stack Traces | 0s run time

No failure message available

tests.models.test_llamafile_unit

Stack Traces | 0s run time

No failure message available

tests.test_utils

Stack Traces | 0s run time

No failure message available

To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard


feat: (experimental) request rubric and use rubrics from already existing

768fe28

feat: create dir with _data at metrics/_data/{metrics,prompts} | colocate formatting

7531014
