Merge pull request #1060 from EleutherAI/fix-mmlu
[Refactor] Fix fewshot cot mmlu descriptions
haileyschoelkopf authored Dec 4, 2023
2 parents 7afae7b + 57e017f commit c9bbec6
Showing 59 changed files with 61 additions and 61 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -150,23 +150,23 @@ It is on our roadmap to create task variants designed to enable models which do

### Other Frameworks

A number of other libraries contain scripts for calling the eval harness through their library. These include [GPT-NeoX](https://github.com/EleutherAI/gpt-neox/blob/main/eval_tasks/eval_adapter.py), [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/blob/main/examples/MoE/readme_evalharness.md), and [mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/blob/master/eval_harness.py).
A number of other libraries contain scripts for calling the eval harness through their library. These include [GPT-NeoX](https://github.com/EleutherAI/gpt-neox/blob/main/eval_tasks/eval_adapter.py), [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/blob/main/examples/MoE/readme_evalharness.md), and [mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/blob/master/eval_harness.py).

### Additional Features

If you have a Metal compatible Mac, you can run the eval harness using the MPS back-end by replacing `--device cuda:0` with `--device mps` (requires PyTorch version 2.1 or higher).

> [!Note]
> You can inspect what the LM inputs look like by running the following command:
>
>
> ```bash
> python write_out.py \
> --tasks all_tasks \
> --num_fewshot 5 \
> --num_examples 10 \
> --output_base_path /path/to/output/folder
> ```
>
>
> This will write out one text file for each task.
To verify the data integrity of the tasks you're performing in addition to running the tasks themselves, you can use the `--check_integrity` flag:
2 changes: 1 addition & 1 deletion lm_eval/tasks/mmlu/flan_cot_fewshot/_cot_prompts.json

Large diffs are not rendered by default.
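Apart from `README.md` and `_cot_prompts.json`, each of the remaining 57 changed files edits exactly one line: the last line of its `description` gains an escaped `\n\n` before the closing quote. The sketch below illustrates why, under the assumption that the harness places the description immediately before the test question without inserting a separator of its own. `build_prompt` is a hypothetical stand-in, and the strings are shortened placeholders.

```python
def build_prompt(description: str, question: str) -> str:
    """Hypothetical stand-in: prepend the few-shot description to the question verbatim."""
    return description + question


# Shortened placeholder for the tail of a CoT description and for a test question.
old_description = "... Hence Z_3[x]/(x^2 + c) is a field if and only if c = 1. The answer is (B)."
new_description = old_description + "\n\n"  # what this commit appends
question = "Q: <test question>\nA: Let's think step by step."

print(build_prompt(old_description, question))  # last exemplar runs straight into "Q:"
print("---")
print(build_prompt(new_description, question))  # a blank line now separates them
```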

@@ -35,7 +35,7 @@
\ then x^2 + c = x^2 + 1 = 0 + 1 for x = 0, 1 + 1 = 2 for x = 1 and 1 + 1 = 2 for\
\ x = 2, hence x^2 + 1 does not have any roots. For c = 2 the polynomial x^2 + 2\
\ has two roots at x = 1 and x = 2. Hence Z_3[x]/(x^2 + c) is a field if and only\
\ if c = 1. The answer is (B)."
\ if c = 1. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_abstract_algebra"
2 changes: 1 addition & 1 deletion lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml
@@ -51,7 +51,7 @@
\ of the hyoid bone; therefore, the embryological origin of the hyoid bone are the\
\ second and the third pharyngeal arches—this information is covered in the last\
\ option (D). Therefore, we conclude that (D) must be the correct answer. The answer\
\ is (D)."
\ is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_anatomy"
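The keys visible at the end of each hunk (`group`, `include`, `task`) follow a layered pattern: per-subject values live in the subject file, and shared settings come from `_mmlu_flan_cot_fewshot_template_yaml`. Below is a hedged sketch of that layering, assuming keys in the subject file override keys from the included template. `resolve_config` and the template contents are hypothetical placeholders, since the template itself is not part of this diff.

```python
def resolve_config(task_cfg: dict, templates: dict) -> dict:
    """Hypothetical sketch: layer a subject file over the template it includes.

    Keys from the subject file win over keys from the included template,
    and the 'include' key itself is dropped from the result.
    """
    merged: dict = {}
    if "include" in task_cfg:
        merged.update(templates[task_cfg["include"]])
    merged.update({k: v for k, v in task_cfg.items() if k != "include"})
    return merged


# Placeholder template; the real template's keys and values are not shown in this diff.
templates = {"_mmlu_flan_cot_fewshot_template_yaml": {"description": "", "shared_setting": "..."}}
anatomy = {
    "description": "... The answer is (D).\n\n",  # shortened
    "group": "mmlu_flan_cot_fewshot_stem",
    "include": "_mmlu_flan_cot_fewshot_template_yaml",
    "task": "mmlu_flan_cot_fewshot_anatomy",
}
config = resolve_config(anatomy, templates)
print(config["task"], repr(config["description"][-4:]))
```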
2 changes: 1 addition & 1 deletion lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml
@@ -49,7 +49,7 @@
\ red. Options (C) and (D) are not specific enough about why the color of the surface\
\ would be red, while (A) is correct because it explains that the surface is red\
\ due to the rusted materials on the surface and the red color comes from the rust.\
\ So the correct option is (A). The answer is (A)."
\ So the correct option is (A). The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_astronomy"
@@ -50,7 +50,7 @@
\ that best uses the possible options above is “Beyond the business case for engaging\
\ the CSR there are a number of moral arguments relating to: negative *externalities*,\
\ the *power* that corporations possess and the *mutual independence* of business\
\ and society. The answer is (D)."
\ and society. The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_business_ethics"
@@ -29,7 +29,7 @@
\ (D) oxidative phosphorylation.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on clinical knowledge for help. The energy for muscular contraction is\
\ provided by ATP (adenosine triphosphate), which is the powerhouse of the cell.\
\ The answer is (A)."
\ The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_clinical_knowledge"
@@ -55,7 +55,7 @@
\ resemblance of structures that have different origins, which is not the case for\
\ the human and bird forearms, which rules out (D). Humans and birds do belong to\
\ the same clade - a group of organisms composed of a common ancestor. The answer\
\ is (C)."
\ is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_biology"
@@ -32,7 +32,7 @@
\ hyperfine interaction with the 13C (nuclear spin $I = \nrac{1}{2}$) which will\
\ split the spectrum into 2 lines. This will be further split into 4 lines by the\
\ interaction with three equivalent 1H nuclei. The total number of lines is therefore\
\ $2 \\cdot 4 = 8$. The answer is (E)."
\ $2 \\cdot 4 = 8$. The answer is (E).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_chemistry"
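The line count in the college chemistry exemplar follows the first-order hyperfine rule: n equivalent nuclei of spin I split each line into 2nI + 1 components, and independent splittings multiply. A quick check with exact fractions (`hyperfine_lines` is an illustrative helper, not harness code):

```python
from fractions import Fraction


def hyperfine_lines(n_nuclei: int, spin: Fraction) -> int:
    """Number of lines produced by n equivalent nuclei of spin I: 2*n*I + 1."""
    count = 2 * n_nuclei * spin + 1
    if count.denominator != 1:
        raise ValueError("2nI + 1 must be an integer")
    return int(count)


carbon13 = hyperfine_lines(1, Fraction(1, 2))  # one 13C nucleus, I = 1/2
protons = hyperfine_lines(3, Fraction(1, 2))   # three equivalent 1H nuclei, I = 1/2
total = carbon13 * protons
print(carbon13, protons, total)  # 2 4 8
```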
@@ -73,7 +73,7 @@
Thus we can see that on average a single processor will lock the bus for:\nlock_ns_per_miss\
\ * misses_per_instruction * instructions_per_ns =\n(1000 nanoseconds / cache miss)\
\ * (1 cache miss / 50 instructions) * (50 instructions / 27000 nanoseconds) = 1000\
\ * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer is (B)."
\ * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_computer_science"
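The unit-cancelling product in the college computer science exemplar can be checked with exact rational arithmetic; the variable names mirror those in the description.

```python
from fractions import Fraction

lock_ns_per_miss = Fraction(1000)          # bus locked 1000 ns per cache miss
misses_per_instruction = Fraction(1, 50)   # 1 cache miss per 50 instructions
instructions_per_ns = Fraction(50, 27000)  # 50 instructions per 27000 ns

# Fraction of time a single processor keeps the bus locked.
bus_fraction = lock_ns_per_miss * misses_per_instruction * instructions_per_ns
print(bus_fraction)  # 1/27
```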
@@ -44,7 +44,7 @@
\ $t \\in \\mathbb{R}, \\ln ((s(t)-2))=-[t / 25]+C$. Let $K:=e^{C}$. Then, for all\
\ $t \\in \\mathbb{R}$, we have $(s(t))-2=K e^{-t / 25}$, and so $s(t)=2+K e^{-t\
\ / 25}$. Then $3=s(0)=2+K e^{0}=2+K$, so $K=1$. Then $s(100)=2+K e^{-100 / 25}=2+1\
\ \\cdot e^{-4}=2+e^{-4}$. The answer is (D)."
\ \\cdot e^{-4}=2+e^{-4}$. The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_mathematics"
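The college mathematics exemplar integrates to `ln(s(t) - 2) = -t/25 + C`, which corresponds to the ODE `s'(t) = -(s(t) - 2)/25` with `s(0) = 3`. As a sanity check, a hand-written fourth-order Runge-Kutta integration (a generic numerical method chosen here for illustration, not something the exemplar uses) should land on the closed form `s(100) = 2 + e^{-4}`.

```python
import math


def ds_dt(t: float, s: float) -> float:
    """Right-hand side implied by ln(s - 2) = -t/25 + C."""
    return -(s - 2.0) / 25.0


def rk4(f, t0: float, y0: float, t1: float, steps: int) -> float:
    """Classic fourth-order Runge-Kutta integration of y' = f(t, y) from t0 to t1."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


numeric = rk4(ds_dt, 0.0, 3.0, 100.0, steps=1000)  # s(0) = 3, integrate to t = 100
closed_form = 2 + math.exp(-4)
print(numeric, closed_form)
```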
@@ -46,7 +46,7 @@
\ monocarbylic acid transporters.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medicine for help. Glucose (also known as the blood sugar) is the\
\ main sugar found in the human body. It is transported into the muscle cell via\
\ diffusion through protein transporters called GLUT4. The answer is (A)."
\ diffusion through protein transporters called GLUT4. The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_medicine"
@@ -38,7 +38,7 @@
\ go into the gases internal energy or work done against an external force. However,\
\ if the volume of the gas container is constant, no work will be done (since work\
\ is pressure times change in volume). So, at constant volume, all of the heat goes\
\ into the internal energy. The answer is (B)."
\ into the internal energy. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_physics"
@@ -30,7 +30,7 @@
\ resulted from improper input validation (due to a missing bounds check) in the\
\ implementation of the TLS heartbeat extension. The vulnerability was classified\
\ as a buffer over-read, a situation where more data can be read than should be\
\ allowed. The answer is (C)."
\ allowed. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_computer_security"
@@ -27,7 +27,7 @@
\ speed in the direction of the wind is greater than it would be in the absence\
\ of wind, and its direction orthogonal to the wind is the same as it would be in\
\ the absence of the wind. The total speed, which is these two components added\
\ in quadrature, is thus greater than the speed in still air. The answer is (B)."
\ in quadrature, is thus greater than the speed in still air. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_conceptual_physics"
2 changes: 1 addition & 1 deletion lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_econometrics.yaml
@@ -57,7 +57,7 @@
\ die away (B) Persist indefinitely (C) Grow exponentially (D) Never occur\nA: Let's\
\ think step by step. We refer to Wikipedia articles on econometrics for help. This\
\ is a formal logic problem about stationally process. For a stationary autoregressive\
\ process, shocks will eventually die away. The answer is (A)."
\ process, shocks will eventually die away. The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_econometrics"
@@ -28,7 +28,7 @@
\ is 100. Find the total resistance\n(A) 200Ω (B) 100Ω (C) 50Ω (D) 10Ω\nA: Let's\
\ think step by step. In lap winding, effectively two resistors are connected in\
\ parallel, so the actual resistance of each pair is 1 Ohm. Since we have 50 pairs,\
\ we get a total resistance of 50 Ohms. The answer is (C)."
\ we get a total resistance of 50 Ohms. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_electrical_engineering"
@@ -35,7 +35,7 @@
\nQ: Which expression is equivalent to 5 x 9?\n(A) (5 x 4) x (6 x 5)\n(B) (5 x 5)\
\ + (5 x 4)\n(C) (5 x 5) + (5 x 9)\n(D) (5 x 9) x (6 x 9)\nA: Let's think step by\
\ step. We know that 9 = (5 + 4), so 5 x 9 = 5 x (5 + 4) = (5 x 5) + (5 x 4). The\
\ answer is (B)."
\ answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_elementary_mathematics"
2 changes: 1 addition & 1 deletion lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml
@@ -47,7 +47,7 @@
\ (∀x)(Px ⊃ ~Dx) → For all x, x is on Mars implies that x do not drive on Mars.\n\
Option (D): ~Dp: → p do not drive on Mars.\nOf all these options, Option (C) appears\
\ to be the best and most meaningful interpretation of the argument “No people drive\
\ on Mars.” The answer is (C)."
\ on Mars.” The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_formal_logic"
2 changes: 1 addition & 1 deletion lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml
@@ -28,7 +28,7 @@
\ of their nation or the world.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on global facts for help. As of 2019, most people tend to be optimistic\
\ about their own future but pessimistic about the future of their nation or the\
\ world. The answer is (B)."
\ world. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_global_facts"
@@ -48,7 +48,7 @@
\ proceed with cell division. Cues like these act by changing the activity of core\
\ cell cycle regulators inside the cell. The most common regulators are cyclins\
\ and cyclin-dependent kinases. Fibroblast cells do not play any role in cell division.\
\ The answer is (D)."
\ The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_biology"
@@ -44,7 +44,7 @@
\ (aq) + H_{2}O \nightarrow H_{3}O^{+} + CH3COO^{-}$. The conjugate base is therefore\
\ the acetate ion. The added strong acid, Nitric acid, will react with the conjugate\
\ base. Therefore the maximum amount of acid that can be added will be equal to\
\ the amount of acetate ion, or 2 moles. The answer is (C)."
\ the amount of acetate ion, or 2 moles. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_chemistry"
@@ -65,7 +65,7 @@
\ Choice C is incorrect because it incorrectly increments the variable count until\
\ its value is greater than 100, regardless of the elements in the list. Choice\
\ D is incorrect because its step 3 does not increment the value of position, so\
\ it will repeat forever. The answer is (B)."
\ it will repeat forever. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_computer_science"
@@ -163,7 +163,7 @@
\ think step by step. We refer to Wikipedia articles on european history for help.\
\ Baron Montesquieu was a 18th centrury French philsopher who wrote extensively\
\ against the monoplization of power and advocated for a system of checks and balances\
\ in government to prevent the rise of despotism. The answer is (B)."
\ in government to prevent the rise of despotism. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_european_history"
@@ -31,7 +31,7 @@
\ the crude death rate. (C) doubling time from the crude birth rate. (D) fertility\
\ rate from the crude death rate.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on geography for help. The difference between number of births and deaths\
\ gives the population increase at any given time. The answer is (A)."
\ gives the population increase at any given time. The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_geography"
@@ -37,7 +37,7 @@
\ have greatly increased presidential powers\nA: Let's think step by step. We refer\
\ to Wikipedia articles on government and politics for help. The US Constitution\
\ is not very specific about the powers of the president, leading to uncertainty\
\ over its limits. The answer is (A)."
\ over its limits. The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_government_and_politics"
@@ -31,7 +31,7 @@
\ dozens of American cities.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on macroeconomics for help. The economic transactions related to the\
\ performance of the American pop-singer in Paris happens entirely outside the U.S.\
\ and hence is not included in the GDP numbers. The answer is (C)."
\ and hence is not included in the GDP numbers. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_macroeconomics"
@@ -31,7 +31,7 @@
\ (Assume that all three lights blink simultaneously at the very beginning of the\
\ dance.)\n(A) 3 (B) 15 (C) 6 (D) 5\nA: Let's think step by step. The least common\
\ multiple of 2, 3 and 5 is 30, so during a 7 minute dance, all the three lights\
\ will come on at the same time $2*7+1=15$ times. The answer is (B)."
\ will come on at the same time $2*7+1=15$ times. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_mathematics"
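The high school mathematics exemplar's count of `2*7+1 = 15` implies blink periods of 2, 3 and 5 seconds, so the lights coincide every 30 seconds. Brute-force counting over the 420 seconds of the dance confirms it, assuming (as the question states) a simultaneous blink at t = 0 and counting a coincidence exactly at the 7-minute mark.

```python
import math

periods = (2, 3, 5)   # blink periods in seconds
duration = 7 * 60     # 7-minute dance, in seconds

common = math.lcm(*periods)  # math.lcm needs Python 3.9+
simultaneous = [t for t in range(duration + 1) if all(t % p == 0 for p in periods)]

print(common, len(simultaneous))  # 30 15
```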
@@ -34,7 +34,7 @@
\ effect.\nA: Let's think step by step. We refer to Wikipedia articles on microeconomics\
\ for help. An increase in the construction of new houses means an increase demand\
\ of in-house painting, thus increases the demand for housepainters. The answer\
\ is (C)."
\ is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_microeconomics"
@@ -33,7 +33,7 @@
A: Let's think step by step. At the closed end of the pipe, the particles cannot\
\ have any net displacement because the pipe closure stops them. So the particle\
\ displacement is at a node. This closure also causes the pressure to be maximal,\
\ i.e. an antinode. The answer is (B)."
\ i.e. an antinode. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_physics"
@@ -42,7 +42,7 @@
\ internal locus of control.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on psychology for help. People with an external locus of control believes\
\ fate and luck play an important role in their lives, while people with an internal\
\ locus of control believes they control their lives. The answer is (D)."
\ locus of control believes they control their lives. The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_psychology"
@@ -52,7 +52,7 @@
\ can occur even when the two variables are not perfectly correlated. Statement\
\ B is false because uncorrelated variables regression lines can have slope zero.\
\ Statement C is false because correlation is symmetric in the two random variables.\
\ The answer is (D)."
\ The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_statistics"
@@ -128,7 +128,7 @@
\ on us history for help. Anti-Federalists do not believe centralized government\
\ power, and suspect Washington's military response to Whiskey Rebellion. Bacon's\
\ Rebellion and Pontiac's Rebellion happen before the Revolution and they can be\
\ ruled out. The answer is (C)."
\ ruled out. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_us_history"
@@ -76,7 +76,7 @@
\ most likely belong?\n(A) Hinduism (B) Buddhism (C) Shintoism (D) Zoroastrianism\n\
A: Let's think step by step. We refer to Wikipedia articles on world history for\
\ help. Brahman refers to the ultimate reality of all things in the Hindu religion.\
\ In contrast, Buddhism does not have a concept of supreme God. The answer is (A)."
\ In contrast, Buddhism does not have a concept of supreme God. The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_world_history"
2 changes: 1 addition & 1 deletion lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_human_aging.yaml
@@ -22,7 +22,7 @@
\ likely destination?\n(A) Texas (B) California (C) Hawaii (D) Vermont\nA: Let's\
\ think step by step. We refer to Wikipedia articles on human aging for help. Texas\
\ does not have state tax, and has low cost of living compared with the other three\
\ options. The answer is (A)."
\ options. The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_human_aging"
@@ -28,7 +28,7 @@
\ during the third trimester (D) all through the pregnancy\nA: Let's think step\
\ by step. We refer to Wikipedia articles on human sexuality for help. Morning sickness\
\ usually begins by nine weeks after conception, corresponding to the first trimester.\
\ The answer is (A)."
\ The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_human_sexuality"
@@ -48,7 +48,7 @@
\ affairs of States (D) Article 2(4) encompasses force directed only against a State's\
\ territorial integrity\nA: Let's think step by step. We refer to Wikipedia articles\
\ on international law for help. Article 2(4) of the UN Charter prohibits states\
\ from using armed forces in their international relations. The answer is (A)."
\ from using armed forces in their international relations. The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_international_law"
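Because every YAML change in this commit is the same trailing `\n\n`, a regression check is straightforward to sketch. The helper below is hypothetical, not part of the harness. It scans the raw text of each `mmlu_*.yaml` in a task directory and reports files whose double-quoted `description` does not end in the escaped `\n\n`. It matches raw text with a regex instead of parsing YAML, so it assumes the layout seen in these hunks: the description scalar is immediately followed by a line starting with `"group"`.

```python
import re
from pathlib import Path

# Raw-text pattern: the description scalar ends at the quote just before the "group" key.
DESCRIPTION_RE = re.compile(r'"description":\s*"(.*?)"\n"group"', re.DOTALL)


def description_has_blank_line(yaml_text: str) -> bool:
    """True if the raw description scalar ends with the escape sequence \\n\\n."""
    match = DESCRIPTION_RE.search(yaml_text)
    return bool(match) and match.group(1).endswith("\\n\\n")


def find_missing(task_dir: str) -> list[str]:
    """Names of mmlu_*.yaml files in task_dir whose description lacks the trailing \\n\\n."""
    return [
        path.name
        for path in sorted(Path(task_dir).glob("mmlu_*.yaml"))
        if not description_has_blank_line(path.read_text(encoding="utf-8"))
    ]
```

Run from the repository root, `find_missing("lm_eval/tasks/mmlu/flan_cot_fewshot")` should return an empty list once this commit is applied, provided the files keep the layout assumed above.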