[Dev UI] Evaluators - default eval for deep equal #1565

DenisVCode · 2024-12-20T19:15:33Z

Overview

Genkit provides a number of built-in evaluators. I think a deep-equal evaluator should be one of them, for comparing the outputs of structured responses.

Related Discord thread: https://discord.com/channels/1255578482214305893/1315783915964993730/1319736165800218734

I will probably create a custom evaluator because I don't want to wait for this, so when I am done I would be happy to send a PR if that is welcome.

User goal(s)

This is useful for using evaluation to check that upgrading a model or changing a prompt did not break previously known-good generations, in cases like text extraction, where the output should always be the same.

Requirements

Acceptance Criteria

1. Deep equal of reference and structured output
2. Returns pass/fail
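
To make the two criteria concrete, here is a minimal sketch of the comparison logic in plain TypeScript. It deliberately does not use any Genkit API: the names `JsonValue`, `deepEqual`, and `evaluateDeepEqual` are hypothetical and only illustrate what "deep equal of reference and structured output, returning pass/fail" could mean. It assumes both values are JSON-like (no functions, dates, or cycles) and that object key order should not matter.

```typescript
// Hypothetical helper types and names; not part of Genkit.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Recursively compares two JSON-like values, ignoring object key order.
function deepEqual(a: JsonValue, b: JsonValue): boolean {
  if (a === b) return true; // identical primitives or same reference
  if (typeof a !== "object" || typeof b !== "object") return false;
  if (a === null || b === null) return false;

  if (Array.isArray(a) || Array.isArray(b)) {
    // An array only equals another array of the same length and items.
    if (!Array.isArray(a) || !Array.isArray(b)) return false;
    if (a.length !== b.length) return false;
    for (let i = 0; i < a.length; i++) {
      if (!deepEqual(a[i], b[i])) return false;
    }
    return true;
  }

  // Plain objects: same key set, and equal values under every key.
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) return false;
  for (const key of aKeys) {
    if (!Object.prototype.hasOwnProperty.call(b, key)) return false;
    if (!deepEqual(a[key], b[key])) return false;
  }
  return true;
}

// Wraps the comparison in a pass/fail result (criterion 2).
function evaluateDeepEqual(
  output: JsonValue,
  reference: JsonValue
): { pass: boolean } {
  return { pass: deepEqual(output, reference) };
}
```

For example, `evaluateDeepEqual({ a: 1, b: [2, 3] }, { b: [2, 3], a: 1 })` would report a pass, while changing any nested value would report a fail. How this would be wired into an actual Genkit evaluator is left open here.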

Nice to have:

Diff display
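
As a rough idea of what the data behind a diff display could look like, the sketch below collects the paths at which the reference and the output disagree. This is only an assumption about one possible shape: `Json`, `Difference`, and `collectDiffs` are hypothetical names, nothing here is a Genkit API, and the `$.key[index]` path notation is just an illustrative choice.

```typescript
// Hypothetical types and names; not part of Genkit.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface Difference {
  path: string; // e.g. "$.tags[1]"
  expected: Json | undefined; // undefined means "missing in reference"
  actual: Json | undefined; // undefined means "missing in output"
}

function isPlainObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Walks both values in parallel and records every leaf-level mismatch.
function collectDiffs(
  expected: Json | undefined,
  actual: Json | undefined,
  path: string = "$"
): Difference[] {
  if (Array.isArray(expected) && Array.isArray(actual)) {
    const diffs: Difference[] = [];
    const length = Math.max(expected.length, actual.length);
    for (let i = 0; i < length; i++) {
      diffs.push(...collectDiffs(expected[i], actual[i], `${path}[${i}]`));
    }
    return diffs;
  }
  if (isPlainObject(expected) && isPlainObject(actual)) {
    const diffs: Difference[] = [];
    // Union of keys so that missing and extra properties are both reported.
    const keys = Array.from(
      new Set(Object.keys(expected).concat(Object.keys(actual)))
    );
    for (const key of keys) {
      diffs.push(...collectDiffs(expected[key], actual[key], `${path}.${key}`));
    }
    return diffs;
  }
  // Primitives, null, or mismatched container types: compare directly.
  return expected === actual ? [] : [{ path, expected, actual }];
}
```

An empty result means the two values are deeply equal, so the same traversal could also produce the pass/fail verdict. A UI could render each `Difference` as an "expected vs. actual" row.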

DenisVCode added the devui label Dec 20, 2024

github-project-automation bot added this to Genkit Backlog Dec 20, 2024
