
[Dev UI] Evaluators - default eval for deep equal #1565

Open
DenisVCode opened this issue Dec 20, 2024 · 0 comments

Overview

Genkit provides a number of built-in evaluators. I think a deep-equal evaluator should be one of them, for comparing the outputs of structured responses against a reference.

Related Discord thread: https://discord.com/channels/1255578482214305893/1315783915964993730/1319736165800218734

I will probably create a custom evaluator because I don't want to wait for this; once I'm done, I'd be happy to send a PR if that is welcome.

User goal(s)

This is useful for checking that upgrading a model or changing a prompt did not break previously known-good generations, for cases like text extraction where the output should always be identical.

Requirements

Acceptance Criteria

    1. Deep-equal comparison of the reference and the structured output (a sketch of this check is below)
    2. Returns pass/fail

Nice to have:

  • Diff display
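
To make the acceptance criteria concrete, here is a minimal sketch of the pass/fail check I have in mind. The `deepEqualEval` name and return shape are placeholders of my own, not part of Genkit's API, and wiring it into Genkit's custom-evaluator registration is omitted since that surface may vary between versions.

```ts
import { isDeepStrictEqual } from 'node:util';

// Hypothetical helper (name and return shape are mine, not Genkit's):
// compares a structured model output against the reference from the eval dataset.
export function deepEqualEval(
  output: unknown,
  reference: unknown,
): { passed: boolean; details?: string } {
  const passed = isDeepStrictEqual(output, reference);
  return {
    passed,
    // Optional detail string, covering the "Diff display" nice-to-have.
    details: passed
      ? undefined
      : `Expected: ${JSON.stringify(reference, null, 2)}\nActual: ${JSON.stringify(output, null, 2)}`,
  };
}
```

`isDeepStrictEqual` from Node's `util` module already handles nested objects, arrays, and strict type comparison, so the evaluator itself stays trivial; the Dev UI would only need to surface the pass/fail result and, ideally, the diff.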