Skip to content

Latest commit

 

History

History
49 lines (36 loc) · 3.17 KB

File metadata and controls

49 lines (36 loc) · 3.17 KB

Answerability score prediction with ChatGPT

We consider two settings to predict the answerability of a question: given a passage (analogous to the passage-level setup) and given a set of passages as input (analogous to the ranking-level setup). We prompt the model to verify whether the question is answerable in the provided passage(s) and return 0 or 1 accordingly.

Passage level

In the passage-level setup, the passage-level predictions returned by ChatGPT are aggregated using fixed thresholds (0.33 or 0.66) to obtain a ranking-level prediction. To evaluate the answerability predictions generated by ChatGPT on passage level, run the following command:

python -m answerability_prediction.chatgpt.answerability_prediction --zero_shot --openai_organization {OpenAI_organization_ID} --openai_api_key {OpenAI_API_key} --passage_level

Note: OpenAI_organization_ID and OpenAI_API_key can be obtained by logging in to OpenAI Platform.

In the passage-level answerability prediction, the data is generated only in zero-shot setting.

Zero-shot prompt:

{
    "role": "system",
    "content": "You are an assistant verifying whether the question is answerable in the provided passage. Return 1 if the answer or partial answer to the question is provided in the passage and 0 otherwise. Return only a number without explanation.",
}

Ranking level

In the ranking-level setup, we experiment with both a zero-shot setting, where neither examples nor context is given to the model, and a two-shot setting, where two examples (one positive and one negative) containing a question followed by two sentences extracted from the passage are provided.

Two-shot prompt:

{
    "role": "system",
    "content": "You are an assistant verifying whether the answer to the question is included in the provided text. Return 0 if the answer is not given in the text or 1 if the text containts answer to the question. Return only a number without explanaition.",
},
{
    "role": "user",
    "content": "Question: Why does waste compaction slow the biodegradation of organic waste? Text: Introduction. It is illegal to burn household or garden waste at home or in your garden. Burning waste is not only a nuisance to neighbours, it can release many harmful chemicals into the air you breathe.",
},
{"role": "assistant", "content": "0"},
{
    "role": "user",
    "content": "Question: I remember Glasgow hosting COP26 last year, but unfortunately I was out of the loop. What was the conference about? Text: The 2021 United Nations Climate Change Conference, also known as COP26, is the 26th United Nations Climate Change conference. This conference will be the most important intergovernmental meeting on the climate crisis since the Paris agreement was passed in 2015. ",
},
{"role": "assistant", "content": "1"},

To evaluate the generated ChatGPT answerability predictions on ranking level, run the following command:

python -m answerability_prediction.chatgpt.answerability_prediction --zero_shot/--two_shot --openai_organization {OpenAI_organization_ID} --openai_api_key {OpenAI_API_key}

To regenerate the results with ChatGPT add --regenerate_chatgpt_results option to the command.