I used the Docker image from the PISA repository and the prediction file from output.zip in your repository (path/outputs/DeepSeekMath-Base/miniF2F-Isabelle-test/results/cot/predictions.json). However, my accuracy is only about 10%, compared with the reported 24.6%. What could be causing this difference?
Like @wangzhihao-coder, I also tried to reproduce the results, but without using Docker. While following the PISA tutorial, I ran into a package version mismatch in SBT. After fixing it, I started the PISA server successfully. However, my evaluation results (miniF2F-Isabelle-test: 21.72, miniF2F-Isabelle-valid: 22.13) were also worse than those reported in the paper. Can anyone help?