Many common LLM tasks can only be evaluated with judge metrics, where a second model scores the output of the model under test. Unitxt supports such metrics, but lm-eval-harness does not, because they require a second hosted model. We should implement support for them in this tool. Details to be hashed out.
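To make the discussion concrete, here is a rough sketch of what a judge metric could look like. It is not a proposal for the final design and does not use the Unitxt or lm-eval-harness APIs: the prompt template, the 1 to 10 scale, and the names `judge_metric`, `parse_rating`, and `fake_judge` are all assumptions made for illustration. The second hosted model is abstracted as a plain callable that takes a prompt and returns text, so any client could be plugged in.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt template; not taken from Unitxt or lm-eval-harness.
JUDGE_PROMPT = (
    "Rate the response to the question on a scale of 1 to 10.\n"
    "Question: {question}\n"
    "Response: {response}\n"
    "Rating:"
)


def parse_rating(judge_output: str) -> Optional[int]:
    """Return the first standalone integer in 1..10 from the judge's reply, else None."""
    match = re.search(r"\b(10|[1-9])\b", judge_output)
    return int(match.group(1)) if match else None


def judge_metric(
    question: str, response: str, judge: Callable[[str], str]
) -> Optional[float]:
    """Score one response in [0, 1] using a second model passed in as `judge`."""
    prompt = JUDGE_PROMPT.format(question=question, response=response)
    rating = parse_rating(judge(prompt))
    # Map the 1..10 rating onto [0, 1]; None means the reply could not be parsed.
    return None if rating is None else (rating - 1) / 9


def fake_judge(prompt: str) -> str:
    """Stand-in for the hosted judge model, used only to exercise the sketch."""
    return "Rating: 8"


score = judge_metric("What is 2 + 2?", "4", fake_judge)
```

Keeping the judge behind a callable means the metric itself does not need to know where the second model is hosted, which is the part that still needs to be worked out.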