v0.2.0
Major changes since 0.1.0:
- added blimp (#237)
- added qasper (#264)
- added asdiv (#244)
- added truthfulqa (#219)
- added gsm (#260)
- implemented description dict and deprecated provide_description (#226)
- new --check_integrity flag to run integrity unit tests at eval time (#290)
- positional arguments to evaluate and simple_evaluate are now deprecated
- _CITATION attribute on task modules (#292)
- lots of bug fixes and task fixes (always remember to report task versions for comparability!)