We consider two settings to predict the answerability of a question: given a passage (analogous to the passage-level setup) and given a set of passages as input (analogous to the ranking-level setup). We prompt the model to verify whether the question is answerable in the provided passage(s) and return 0 or 1 accordingly.
In the passage-level setup, the passage-level predictions returned by ChatGPT are aggregated using fixed thresholds (0.33 or 0.66) to obtain a ranking-level prediction. To evaluate the answerability predictions generated by ChatGPT on passage level, run the following command:
python -m answerability_prediction.chatgpt.answerability_prediction --zero_shot --openai_organization {OpenAI_organization_ID} --openai_api_key {OpenAI_API_key} --passage_level
Note: OpenAI_organization_ID and OpenAI_API_key can be obtained by logging in to OpenAI Platform.
In the passage-level answerability prediction, the data is generated only in zero-shot setting.
Zero-shot prompt:
{
"role": "system",
"content": "You are an assistant verifying whether the question is answerable in the provided passage. Return 1 if the answer or partial answer to the question is provided in the passage and 0 otherwise. Return only a number without explanation.",
}
In the ranking-level setup, we experiment with both a zero-shot setting, where neither examples nor context is given to the model, and a two-shot setting, where two examples (one positive and one negative) containing a question followed by two sentences extracted from the passage are provided.
Two-shot prompt:
{
"role": "system",
"content": "You are an assistant verifying whether the answer to the question is included in the provided text. Return 0 if the answer is not given in the text or 1 if the text containts answer to the question. Return only a number without explanaition.",
},
{
"role": "user",
"content": "Question: Why does waste compaction slow the biodegradation of organic waste? Text: Introduction. It is illegal to burn household or garden waste at home or in your garden. Burning waste is not only a nuisance to neighbours, it can release many harmful chemicals into the air you breathe.",
},
{"role": "assistant", "content": "0"},
{
"role": "user",
"content": "Question: I remember Glasgow hosting COP26 last year, but unfortunately I was out of the loop. What was the conference about? Text: The 2021 United Nations Climate Change Conference, also known as COP26, is the 26th United Nations Climate Change conference. This conference will be the most important intergovernmental meeting on the climate crisis since the Paris agreement was passed in 2015. ",
},
{"role": "assistant", "content": "1"},
To evaluate the generated ChatGPT answerability predictions on ranking level, run the following command:
python -m answerability_prediction.chatgpt.answerability_prediction --zero_shot/--two_shot --openai_organization {OpenAI_organization_ID} --openai_api_key {OpenAI_API_key}
To regenerate the results with ChatGPT add --regenerate_chatgpt_results
option to the command.