I have forked the repository and created custom tasks in the `lm_eval/tasks` directory. All of them are MMLU-like tasks, and all are in the Kazakh language.
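Each task is a YAML config in the harness's `multiple_choice` format, roughly like the sketch below (the dataset path, column names, and prompt template here are illustrative placeholders, not my exact files):

```yaml
# Minimal MMLU-style multiple-choice task (illustrative placeholders)
task: kk_example_mc
dataset_path: my-org/kk-example-mc   # placeholder HF dataset
output_type: multiple_choice
test_split: test
doc_to_text: "{{question}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nЖауап:"
doc_to_choice: ["A", "B", "C", "D"]
doc_to_target: answer   # index of the correct choice
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
```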
I am running evaluations using the following command:
```bash
lm_eval --model hf \
  --model_args pretrained=meta-llama/Meta-Llama-3-8B-Instruct \
  --batch_size 2 \
  --num_fewshot 0 \
  --apply_chat_template \
  --tasks mmlu_translated_kk,kazakh_and_literature_unt_mc,kk_biology_unt_mc,kk_constitution_mc,kk_dastur_mc,kk_english_unt_mc,kk_geography_unt_mc,kk_history_of_kazakhstan_unt_mc,kk_human_society_rights_unt_mc,kk_world_history_unt_mc \
  --output output \
  --device cuda:0
```
The problem is that for the Llama 3 8B models I get an average accuracy of around 0.3–0.4, which is in the expected range; I have verified this by running each model manually through the generation step and checking its answers.
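The manual check looks roughly like this (simplified; the question text is a placeholder, and I compare the generated letter with the gold answer by hand):

```python
# Simplified sketch of my manual check (question text is a placeholder)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# One MMLU-style Kazakh question with choices labelled A-D
question = "...\nA. ...\nB. ...\nC. ...\nD. ...\nЖауап:"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=8, do_sample=False)
answer = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(answer)  # compare the generated letter with the gold answer
```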
However, for larger models such as Llama 3.3 70B Instruct, Llama 3.1 70B, gemma-2-27b-it, and others, I keep getting an accuracy of ~0.1 across all tasks.
What am I doing wrong?