miniF2F-Isabelle accuracy #30

wangzhihao-coder · 2024-08-12T05:20:03Z

I used the Docker image from the PISA repository and the prediction file from the output.zip in your repository (path/outputs/DeepSeekMath-Base/miniF2F-Isabelle-test/results/cot/predictions.json). However, my accuracy is about 10%, compared with the reported 24.6%. What could be causing this difference?

wyt2000 · 2024-09-02T08:24:13Z

Like @wangzhihao-coder, I also tried to reproduce the results, but without Docker. While following the PISA tutorial, I ran into a package version mismatch in SBT. After fixing it, the PISA server started successfully. However, my evaluation results (miniF2F-Isabelle-test: 21.72, miniF2F-Isabelle-valid: 22.13) were also lower than those reported in the paper. Can anyone help?
