Genkit provides a number of built-in evaluators. I think a deep-equal evaluator should be one of them, for comparing structured outputs against a reference.
I will probably write a custom evaluator myself rather than wait for this, so once it is done I would be happy to send a PR if that is welcome.
User goal(s)
This is useful for using evaluation to check that upgrading a model or changing a prompt did not break previously known-good generations, in cases like text extraction where the output should always be the same.
Requirements
Acceptance Criteria
Deep-equality check between the reference and the structured output
Returns pass/fail
Nice to have:
Diff display
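To make the criteria above concrete, here is a minimal sketch of the comparison logic in TypeScript. It is not Genkit's evaluator API: `Json`, `DeepEqualResult`, `collectDiffs`, and `deepEqualEvaluate` are hypothetical names, and wiring the function into a custom evaluator is left out. It assumes both values are JSON-like, treats object key order as irrelevant and array order as significant, and returns pass/fail plus a list of differing paths that could back a diff display.

```typescript
// Hypothetical sketch; none of these names come from Genkit.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface DeepEqualResult {
  pass: boolean;   // true only when no differences were found
  diffs: string[]; // one entry per differing path, e.g. "$.tags[1]: ..."
}

function isPlainObject(v: Json): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function hasKey(obj: { [key: string]: Json }, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Walks both values in parallel and records every path where they disagree.
function collectDiffs(output: Json, reference: Json, path: string, diffs: string[]): void {
  if (Array.isArray(output) && Array.isArray(reference)) {
    if (output.length !== reference.length) {
      diffs.push(`${path}: array length ${output.length} != ${reference.length}`);
    }
    const shared = Math.min(output.length, reference.length);
    for (let i = 0; i < shared; i++) {
      collectDiffs(output[i], reference[i], `${path}[${i}]`, diffs);
    }
    return;
  }
  if (isPlainObject(output) && isPlainObject(reference)) {
    // Union of keys from both sides, sorted so the report is deterministic.
    const keys = Object.keys(reference);
    for (const key of Object.keys(output)) {
      if (keys.indexOf(key) === -1) keys.push(key);
    }
    keys.sort();
    for (const key of keys) {
      const childPath = `${path}.${key}`;
      if (!hasKey(output, key)) diffs.push(`${childPath}: missing in output`);
      else if (!hasKey(reference, key)) diffs.push(`${childPath}: unexpected in output`);
      else collectDiffs(output[key], reference[key], childPath, diffs);
    }
    return;
  }
  // Primitives, null, or mismatched container types.
  if (output !== reference) {
    diffs.push(`${path}: ${JSON.stringify(output)} != ${JSON.stringify(reference)}`);
  }
}

function deepEqualEvaluate(output: Json, reference: Json): DeepEqualResult {
  const diffs: string[] = [];
  collectDiffs(output, reference, "$", diffs);
  return { pass: diffs.length === 0, diffs };
}
```

For example, comparing `{ name: "Ada", tags: ["x", "y"] }` against the reference `{ name: "Ada", tags: ["x", "z"], age: 36 }` fails with two entries: `$.age: missing in output` and `$.tags[1]: "y" != "z"`.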
Related Discord thread: https://discord.com/channels/1255578482214305893/1315783915964993730/1319736165800218734