Datasets:
- `training.csv`: based on the MED and HELP datasets, converted to fit our schema (see `sarn/convert/med.py` and `sarn/convert/help.py`) and then concatenated (see the conversion sketch after this list).
- `evaluation.csv`: based on the SuperGLUE diagnostics dataset filtered by the Logic categories "Quantification" and "Monotonicity", and on the FraCaS problem set filtered by the category "1 GENERALIZED QUANTIFIERS", converted to fit our schema (see `sarn/convert/superglue.py` and `sarn/convert/fracas.py`) and then concatenated (see the filtering sketch after this list).
- `training-adj.csv`: based on the MED dataset; adjectives that occur in premise and hypothesis are replaced with their WordNet opposites once on both sides, replacing only one type of adjective at a time and only on one side at a time, thus generating multiple output variants of the same input pair (see `sarn/convert/med_adjectives.py` and `sarn/adjectives.py`, and the replacement sketch after this list). As we were not able to automatically generate labels using MonaLog or ccg2lambda (both always returned "unknown", no matter which sequence pair we provided), we had to label the entire dataset by hand and therefore reduced it to 1200 rows. We also took this opportunity to correct some entries by hand where it made sense, e.g., to get an entailment from an opposite adjective instead of a contradiction, and we added the six examples from FraCaS 5.3 "Opposites" (see `sarn/convert/fracas_adjectives.py`).
- `evaluation-adj.csv`: based on our evaluation dataset `evaluation.csv`; adjectives are replaced with their WordNet opposites and the results are labelled by hand, i.e., the same procedure as for `training-adj.csv` (see `sarn/convert/evaluation_adjectives.py` and `sarn/adjectives.py`), although we did not have to reduce the dataset size because it only contained 144 rows.
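
The conversion step behind `training.csv` amounts to renaming the source columns into a shared schema and concatenating the results. Below is a minimal sketch, assuming the sources are TSV files with `sentence1`/`sentence2`/`gold_label` columns and that our schema is `premise`/`hypothesis`/`label`; the file names and column mappings here are illustrative, not the actual ones used by `sarn/convert/med.py` and `sarn/convert/help.py`.

```python
# Hypothetical sketch of the convert-and-concatenate step; the real logic
# lives in sarn/convert/med.py and sarn/convert/help.py. Column names, file
# names, and the target schema are assumptions.
import pandas as pd

COL_MAP = {"sentence1": "premise", "sentence2": "hypothesis", "gold_label": "label"}

def convert(path: str) -> pd.DataFrame:
    """Read a source TSV and rename its columns to our schema."""
    df = pd.read_csv(path, sep="\t")
    return df.rename(columns=COL_MAP)[["premise", "hypothesis", "label"]]

# Convert each source dataset, then concatenate into a single training file.
med = convert("MED.tsv")
help_ = convert("help.tsv")
pd.concat([med, help_], ignore_index=True).to_csv("training.csv", index=False)
```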
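For `evaluation.csv`, the SuperGLUE side of the filtering can be sketched as a category match on the diagnostics annotations. The column name `Logic`, the input file name, and the semicolon-separated category format are assumptions; the actual filtering lives in `sarn/convert/superglue.py`.

```python
# Hypothetical filtering sketch; column name and category format are assumed.
import pandas as pd

diag = pd.read_csv("diagnostic-full.tsv", sep="\t")
wanted = {"Quantification", "Monotonicity"}

# Keep rows whose Logic annotation mentions at least one wanted category.
mask = diag["Logic"].fillna("").apply(
    lambda cats: bool(wanted & {c.strip() for c in cats.split(";")})
)
filtered = diag[mask]
```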
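The adjective replacement behind `training-adj.csv` and `evaluation-adj.csv` rests on WordNet antonym lookup. Here is a minimal sketch, assuming an NLTK-based lookup restricted to adjective senses and naive whitespace tokenization; the real implementation is `sarn/adjectives.py` and may differ in how it identifies and substitutes adjectives.

```python
# Minimal sketch of the adjective-replacement idea; not the repo's actual code.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def adjective_antonyms(word: str) -> set[str]:
    """Collect WordNet antonyms of a word, restricted to adjective senses."""
    antonyms = set()
    for synset in wn.synsets(word, pos=wn.ADJ):
        for lemma in synset.lemmas():
            antonyms.update(a.name() for a in lemma.antonyms())
    return antonyms

def variants(sentence: str, adjective: str) -> list[str]:
    """One output variant per antonym, replacing a single adjective at a time."""
    tokens = sentence.split()
    return [
        " ".join(opp if t == adjective else t for t in tokens)
        for opp in adjective_antonyms(adjective)
    ]

print(variants("a small dog chased the cat", "small"))
# e.g. ['a big dog chased the cat', 'a large dog chased the cat']
```

Note that an antonym substitution does not mechanically flip the label, which is why the resulting pairs had to be labelled by hand.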