AIC 24-25 Short-term Project
Readme for VQA-RAD, a dataset of visual questions and answers in radiology
1.0 GENERAL INFORMATION
Title of dataset: VQA-RAD
Contact: Dina Demner-Fushman (ddemner@mail.nih.gov)
Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA
2.0 DATA AND FILE OVERVIEW
Link: Lau, Jason. Open Science Framework, https://osf.io/89kps/?view_only=521f76b347b146ccbe85ee24396849c8 (2018)
| File | Description |
|---|---|
| VQA_RAD Dataset.json | VQA-RAD full dataset of questions and answers referencing images. The same dataset is provided in JSON, XML, and Excel formats. Additional metadata includes categories and labels; see 3.0 DATA SPECIFIC INFORMATION. |
| VQA_RAD Dataset.xml | Same dataset in XML format. |
| VQA_RAD Dataset.xlsx | Same dataset in Excel format. |
| VQA_RAD Image folder | Folder of 315 radiological images referenced by the questions and answers. Images vary in size; all are .jpeg. |
3.0 DATA SPECIFIC INFORMATION
VQA_RAD Dataset 2018_06_011 for JSON, XML, and Excel formats
| Number of variables | 14 |
|---|---|
| Number of rows | 2248 |
| Variable | Description | Section |
|---|---|---|
| Image_name | Name of the corresponding image file in the "VQA_RAD Image folder" | 4.1 |
| Image_case_url | Link to the MedPix® case, which includes the original image, caption, and other contextual information | 4.1 |
| Image_organ | Organ system shown in the image, e.g., Head, Chest, Abdomen | 4.1 |
| question | Visual question about the image | 4.2 |
| Qid | Unique identifier for each free-form or paraphrased question | |
| Phrase_type | Whether the question is an original free-form question or a rephrasing of another question: freeform = original question; para = rephrased from another question; test_freeform = original question used for test data; test_paraphrase = rephrasing of a test_freeform question | 4.2 |
| Question_type | Type of question: MODALITY, PLANE, ORGAN (organ system), ABN (abnormality), PRES (object/condition presence), POS (positional reasoning), COLOR, SIZE, ATTRIB (attribute other), COUNT (counting), OTHER | 4.3 |
| Answer | Answer to the question | 4.2 |
| Answer_type | Type of answer, e.g., closed-ended, open-ended | 4.4 |
| Evaluation | Whether the question-answer pair was clinically evaluated by a 2nd clinician: evaluated = two clinical annotators reviewed the image and QA pair; not evaluated = one clinical annotator | 4.5 |
| Question_relation | Relationship between linked question-answer pairs: strict agreement, loose agreement, inversion, conversion, subsumption, not similar | 4.5 |
| Qid_linked_id | Unique identifier for every pair of free-form and paraphrased questions; can be used to link the original and its rephrasing | |
| Question_rephrase | Rephrasing of 'question' (can be freeform or para), linked through qid_linked_id | 4.2 |
| Question_frame | Rephrasing of 'question' following a templated structure | 4.2 |
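As a quick sanity check, the JSON file can be loaded and compared against the counts above. This is a minimal sketch, assuming the file holds a flat list of records keyed by the variable names in this table; the lowercase key casing is an assumption to verify against the actual file.

```python
import json
from collections import Counter

# Minimal sketch: assumes a flat list of records whose keys match the
# variable names above (casing may differ; inspect records[0].keys() first).
with open("VQA_RAD Dataset.json", encoding="utf-8") as f:
    records = json.load(f)

print(len(records))               # expected: 2248 rows
print(sorted(records[0].keys()))  # expected: the 14 variables listed above

# Example: how many question-answer pairs reference each organ system
print(Counter(r["image_organ"] for r in records))
```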
VQA_RAD Image folder

| Number of images | 315 |
|---|---|
| Format | .jpeg |
4.0 METHODOLOGICAL INFORMATION
4.1 IMAGE SELECTION
We sampled images from teaching cases in MedPix, https://medpix.nlm.nih.gov/, an open-access database of radiology images and teaching cases. Our sampling criteria were as follows: (1) only one image per teaching case, so that all images represent unique patients; (2) all images are sharp enough to identify individual structures; (3) images are free of radiology markings, such as arrows or circles; (4) images have captions that correspond to the image and are detailed enough to describe at least one structure. Captions include the plane, modality, and image findings, which were generated and reviewed by expert radiologists. In total, we selected 104 head axial CTs or MRIs, 107 chest x-rays, and 104 abdominal axial CTs. The balanced distribution across head, chest, and abdomen should help determine whether visual questions differ for each organ system and whether algorithms perform differently on different regions.
4.2 QUESTION AND ANSWER GENERATION
Questions and answers were generated by 15 volunteer clinical trainees using a web interface developed for collecting the questions and answers. All participants had completed the core rotations of medical school, which typically occur during the 3rd year and expose students to the major fields of medicine, such as surgery, internal medicine, and neurology. This ensures that all participants had basic clinical radiology reading skills and had been exposed to a variety of settings where radiology was vital to patient management. Our participants trained in different regions of the U.S. and have interests in different specialties, including radiology, orthopedics, and family medicine.
Participants generated questions and answers in a two-part evaluation (shown in Figure 1) from December 2017 to April 2018. Each participant reviewed at least 40 randomized images. For the first 20 images, participants provided "free-form" questions and answers without any restrictions. We instructed participants to phrase their free-form questions about the images in a natural way, as if asking a colleague or another physician. The image alone had to be sufficient to answer the question, and there should be only a single correct answer. We asked that answers to the visual questions be based on the participants' level of knowledge. Since many of the participants were still in medical training, we provided captions with some image findings, plane, and modality information as additional ground-truth reassurance.
For the next 20 images, participants were randomly paired and given another participant's images and questions. They were asked to generate "paraphrased" and "framed" questions based on the given "free-form" questions with the corresponding image and caption. We asked the participants to paraphrase each question in a natural way and to generate an answer that agreed with both the original and the paraphrased questions.
Participants generated "framed" questions by finding the closest question structure from a list of templates and filling in the blank spaces so that the answer to the original question was retained.
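To make the template-filling step concrete, here is a minimal sketch; the templates shown are hypothetical stand-ins, since the actual template list used in the study is not reproduced in this README.

```python
# Hypothetical templates for illustration only: the study's actual
# template list is not included in this README.
TEMPLATES = [
    "Is the {organ} normal?",
    "Where is the {finding} located?",
]

def frame_question(template: str, **blanks: str) -> str:
    """Fill a template's blank spaces to produce a 'framed' question."""
    return template.format(**blanks)

print(frame_question(TEMPLATES[0], organ="liver"))
# -> Is the liver normal?
```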
4.3 QUESTION TYPES
| Question Type | Description |
|---|---|
| Modality | How an image is taken: CT, x-ray, T2-weighted MRI, etc. |
| Plane | Orientation of an image slicing through the body: axial, sagittal, coronal |
| Organ System | Categorization that connects anatomical structures with pathophysiology, diagnosis, and treatment: pulmonary, cardiac, musculoskeletal systems |
| Abnormality | Normalcy of an image or object, e.g., "Is there something wrong with the image?", "What is abnormal about the lung?", "Does the liver look normal?" |
| Object/Condition Presence | Presence of an object or condition. Objects could be normal structures like organs or body parts, or abnormal objects such as masses or lesions. Clinicians may also refer to the presence of conditions in an image or patient: fractures, midline shift, infarction |
| Positional Reasoning | Position or location of an object or organ, including which side of the patient, relative to the image borders, or relative to other objects in the image |
| Color | Signal intensity, including enhancement or opaqueness |
| Size | Measurement of the size of an object, e.g., enlargement, atrophy |
| Attribute Other | Other types of description questions |
| Counting | Quantity of objects, e.g., number of lesions |
| Other | Catch-all categorization for questions that do not fall into the previous categories |
We identify three categories, modality, plane, and organ system, that contribute baseline knowledge for every radiological image. Modality questions, which refer to how an image is taken (CT with contrast, x-ray, T2-weighted MRI, etc.), help give context for identifying white and black structures; active bleeding, for example, can appear as a white mass on a CT scan but only a dull grey on MRI. Plane questions, which refer to the orientation of the image slicing through the body, help in understanding anatomical structures. Organ system is a subjective categorization that depends on the clinical context; however, it is an important concept frequently taught to all clinicians to connect pieces of anatomy with pathophysiology, diagnoses, and treatment. An image of a chest can contain multiple organ systems, such as the pulmonary, cardiac, gastrointestinal, or orthopedic systems. These boundaries help decision making: for example, a mass near the heart could represent a pneumonia that does not directly affect the heart itself despite the proximity. We consider these three categorizations baseline question types because most clinicians, regardless of experience, have some understanding of them, while the general public may not.
Abnormality questions ask about the normalcy of an image or object, for example, "Is there something wrong with the image?", "What is abnormal about the lung?", or "Does the liver look normal?"
Presence questions focus on the presence of an object or condition. Objects could be normal structures like organs or body parts, but could also be abnormal objects such as masses or lesions. Clinicians may also refer to the presence of conditions in an image or patient; however, conditions are more difficult to delineate. For example, 'fracture' is a condition that can happen to a bone, and one might say "a fractured femur", "a fracture in the femur", or "the femur has a fracture"; all of these linguistic variations still refer to the presence of a fracture in the image. Other examples of common conditions include 'midline shift', 'pneumothorax', and 'infarction'.
Positional reasoning questions focus on the position or location of an object or organ, including which side of the patient. Easily labeled questions are "Where is the mass?" or "Is the lesion located on the left?". Some difficulty in type assignment arose between positional and presence questions: the question "Is the mass in the left lung?" asks both about the presence of a mass and about its position on the left. In this dataset, we chose to label such questions as presence questions rather than positional, though ideally these questions would carry both labels.
Attribute questions include color, size, and other attributes: questions that focus on the description of an object rather than its position or presence. Since most radiological images are grey, color questions refer to signal intensity, including enhancement or opaqueness. Size questions require a measurement of an object's size to answer; they include words like enlargement, atrophy, and dilation. These categories are important to separate because unique tools may be needed to answer them, such as a reference of normal size for size questions or normalization of signal intensities for color questions. Since there may be other common attributes besides size and color, the Attribute Other question type is used for other types of description questions.
Counting questions are fairly straightforward to label: any question focusing on a quantity of objects. Care is taken to ensure that answers are based on a single image only. Some radiological captions may describe a series of images showing multiple slices of the body, but we limit this dataset to what can be answered with one image.
The Other question type is a catch-all categorization for questions that do not fall into the previous categories. Examples include questions that require epidemiological knowledge or next-step treatments.
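As an illustration of how the question-type and answer-type labels can be used together, here is a minimal sketch that cross-tabulates them; the lowercase key names are assumptions to verify against the JSON file.

```python
import json
from collections import Counter

# Sketch: cross-tabulate question types against answer types, assuming
# keys follow the section 3.0 variable names (verify casing in the file).
with open("VQA_RAD Dataset.json", encoding="utf-8") as f:
    records = json.load(f)

pairs = Counter((r["question_type"], r["answer_type"]) for r in records)
for (q_type, a_type), n in pairs.most_common():
    print(f"{q_type:12} {a_type:12} {n}")
```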
4.4 ANSWER TYPES
| Answer Type | Description |
|---|---|
| Closed-ended | Yes/no and other limited choices, e.g., "Is the mass on the left or right?" |
| Open-ended | No limited question structure; could have multiple correct answers |
Answer types are labeled after completion of the evaluations. Closed-ended and open-ended are the only categories we used. Closed-ended answers include yes/no and other limited choices; for example, "Is the mass on the left or right?" has a closed-ended structure. Open-ended answers do not have a limited question structure and could have multiple correct answers.
4.5 QUESTION ANSWER VALIDATION
After completion of the evaluations, we used several methods to validate question-answer pairs and question types. During the paraphrasing part of the evaluation, participants answered another person's questions. The answers could show strict or loose agreement. We defined strict agreement as when the question and answer format and topic were the same. In loose agreement, the topic of the questions is the same or similar even though the answers may differ. Three subcategories of loose agreement are defined: inversion, conversion, and subsumption.
Examples of each are as follows:
- Inversion: Q1 "Are there abnormal findings in the lower lung fields?" is a negation of Q2 "Are the lower lung fields normal?"
- Conversion: Q1 "How would you describe the abnormalities?" is open-ended, while Q2 "Are the lesions ring-enhancing?" is closed-ended
- Subsumption: Q1 "Is the heart seen in the image?" subsumes Q2 "Is the heart seen on the left?"
Questions are considered 'evaluated' when they have been reviewed by two annotators. Disagreements in answers were resolved by research-team consensus and expert radiologist review. Questions are labeled 'not evaluated' if they were not reviewed by a second participant or if the paraphrased question was not similar enough to be used for validation. Both evaluated and not-evaluated questions are used in the test and training sets.
We also validated the question types assigned by the participants. Final categorization was determined through consensus within the research team to resolve disagreements.
5.0 TRAINING AND TEST SET
To demonstrate a use case of VQA-RAD, we created a training and test set by randomly sampling the free-form questions and then matching the corresponding paraphrased questions. The resultant test set is composed of 300 randomly chosen free-form questions and 151 corresponding paraphrased questions. We used the remainder of the free-form and paraphrased questions as the training set.
Other training and test sets can be created using the free-form, paraphrased, and framed questions. Since these question sets can share a single answer, we recommend isolating a phrase type (phrase_type), randomly selecting a proportion of those questions, and finding the matched questions using the qid_linked_id and question_frame variables. This limits the bias that may occur if, for example, a free-form question lands in the training set while its paired paraphrased question lands in the test set.
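A minimal sketch of this procedure in Python, assuming the JSON keys follow the section 3.0 variable names in lowercase and that phrase_type values match the table above; verify both against the file before use. Sampling 300 free-form questions mirrors the published test set size.

```python
import json
import random
from collections import defaultdict

# Sketch of the recommended split: isolate one phrase type, sample it, then
# pull in linked questions via qid_linked_id so a question and its
# paraphrase never land on opposite sides of the split. (Framed rephrasings
# are a field on each record, question_frame, so they travel with it.)
with open("VQA_RAD Dataset.json", encoding="utf-8") as f:
    records = json.load(f)

by_link = defaultdict(list)
for r in records:
    by_link[r["qid_linked_id"]].append(r)

freeform = [r for r in records if r["phrase_type"] == "freeform"]
random.seed(0)                       # fixed seed for a reproducible split
sampled = random.sample(freeform, k=300)

test_links = {r["qid_linked_id"] for r in sampled}
test = [r for link in test_links for r in by_link[link]]
train = [r for r in records if r["qid_linked_id"] not in test_links]

print(len(test), len(train))
```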
Figure 2. Closed- vs. open-ended questions and breakdown of question types (free-form questions only). Certain question types are more likely to be open-ended: positional, counting, and other.
Figure 3. Question type per image organ type (free-form questions only). Most HEAD questions concern color/signal intensity. Most CHEST questions concern size. There are fewer positional questions about the ABDOMEN than about the other image organs.
DISTINCT WORD DISTRIBUTION AND FREQUENCY (free-form questions only)
The tables below, one column pair per question type and answer type, show the total number of questions, the median number of words per question, the total number of words, and the number of distinct words. Distinct words were determined by tokenizing the sentences and lowercasing all words. Also shown are the 10 most common words in each category, with the percent frequency of questions in which each word appears. Stop words were removed. Some words appear in the top 10 of only one question type (e.g., MRI, WEIGHTED, and IV appear only in Modality questions).
| | MODALITY (closed) | MODALITY (open) | PLANE (closed) | PLANE (open) | ORGAN (closed) | ORGAN (open) |
|---|---|---|---|---|---|---|
| #questions | 79 | 67 | 55 | 47 | 15 | 36 |
| median question length (#words) | 6 | 6 | 6 | 6 | 7 | 7.5 |
| #words total | 487 | 439 | 342 | 296 | 103 | 265 |
| #distinct words | 94 | 44 | 55 | 40 | 38 | 66 |
| 1 | CONTRAST (48.1%) | IMAGE (68.7%) | IMAGE (40.0%) | PLANE (76.6%) | IMAGE (60.0%) | ORGAN (66.7%) |
| 2 | IMAGE (36.7%) | MODALITY (52.2%) | PLANE (40.0%) | IMAGE (63.8%) | SYSTEM (46.7%) | SYSTEM (50.0%) |
| 3 | CT (31.6%) | TYPE (50.7%) | AXIAL (36.4%) | TAKEN (46.8%) | BRAIN (33.3%) | IMAGE (47.2%) |
| 4 | MRI (19.0%) | IMAGING (28.4%) | PA (23.6%) | WHICH (34.0%) | CHEST (20.0%) | IMAGED (33.3%) |
| 5 | WEIGHTED (11.4%) | TAKEN (16.4%) | FILM (16.4%) | ABOVE (12.8%) | DISPLAY (13.3%) | PART (11.1%) |
| 6 | IV (11.4%) | MRI (13.4%) | TAKEN (12.7%) | ACQUIRED (6.4%) | PATHOLOGY (13.3%) | ABOVE (8.3%) |
| 7 | GIVEN (10.1%) | CONTRAST (7.5%) | AP (12.7%) | BODY (4.3%) | PULMONARY (13.3%) | BODY (8.3%) |
| 8 | PATIENT (7.6%) | KIND (7.5%) | CORONAL (10.9%) | CUT (4.3%) | STUDY (13.3%) | EVALUATED (8.3%) |
| 9 | SCAN (7.6%) | ABOVE (7.5%) | BRAIN (9.1%) | FILM (4.3%) | ABDOMEN (6.7%) | PRIMARILY (8.3%) |
| 10 | TAKEN (7.6%) | ACQUIRE (6.0%) | SAGGITAL (7.3%) | WHERE (4.3%) | CARDIOVASCULAR (6.7%) | SHOWN (8.3%) |
| | ABNORMALITY (closed) | ABNORMALITY (open) | PRESENCE (closed) | PRESENCE (open) | POSITION (closed) | POSITION (open) |
|---|---|---|---|---|---|---|
| #questions | 78 | 32 | 379 | 104 | 19 | 154 |
| median question length (#words) | 5 | 6 | 6 | 8 | 9 | 7 |
| #words total | 441 | 204 | 2428 | 899 | 185 | 1032 |
| #distinct words | 112 | 69 | 395 | 246 | 69 | 202 |
| 1 | NORMAL (52.6%) | PATHOLOGY (34.4%) | PRESENT (15.0%) | IMAGE (26.9%) | LEFT (31.6%) | WHERE (32.5%) |
| 2 | IMAGE (29.5%) | IMAGE (28.1%) | IMAGE (10.6%) | LEFT (10.6%) | LOCATED (31.6%) | WHICH (27.9%) |
| 3 | ABNORMAL (23.1%) | ABNORMAL (9.4%) | EVIDENCE (9.5%) | RIGHT (9.6%) | LUNG (26.3%) | LOCATED (20.8%) |
| 4 | LIVER (20.5%) | ABNORMALITY (9.4%) | AIR (6.3%) | ORGAN (7.7%) | OPACITIES (21.1%) | LESION (16.9%) |
| 5 | ABNORMALITIES (9.0%) | INVOLVED (9.4%) | MASS (5.8%) | MASS (5.8%) | RIGHT (21.1%) | MASS (13.6%) |
| 6 | FINDINGS (7.7%) | LUNG (9.4%) | FRACTURE (5.3%) | SIDE (5.8%) | SIDE (21.1%) | SIDE (11.7%) |
| 7 | AIR (6.4%) | ABNORMALITIES (6.3%) | LEFT (5.3%) | ANTERIOR (4.8%) | CONTRAST (15.8%) | IMAGE (9.7%) |
| 8 | LUNGS (6.4%) | HAPPENING (6.3%) | PNEUMOTHORAX (4.7%) | BRAIN (4.8%) | LESION (15.8%) | ABNORMALITY (9.1%) |
| 9 | ABNORMALITY (5.1%) | LESION (6.3%) | RIGHT (4.7%) | BRIGHT (4.8%) | AORTA (10.5%) | BRAIN (9.1%) |
| 10 | BRAIN (5.1%) | PANCREAS (6.3%) | BOWEL (4.2%) | HYPERDENSITIES (4.8%) | BOWELS (10.5%) | LOBE (4.5%) |
| | COLOR (closed) | COLOR (open) | SIZE (closed) | SIZE (open) | ATTRIBUTE OTHER (closed) | ATTRIBUTE OTHER (open) |
|---|---|---|---|---|---|---|
| #questions | 25 | 7 | 91 | 10 | 29 | 17 |
| median question length (#words) | 6 | 7 | 5 | 7 | 6 | 6 |
| #words total | 171 | 56 | 502 | 67 | 177 | 108 |
| #distinct words | 69 | 26 | 48 | 22 | 71 | 41 |
| 1 | LESION (24.0%) | INTENSITY (42.9%) | ENLARGED (28.6%) | MASS (70.0%) | MASS (34.5%) | DESCRIBE (58.8%) |
| 2 | MASS (20.0%) | ABNORMALITY (28.6%) | HEART (25.3%) | LESION (40.0%) | LESION (20.7%) | LESION (29.4%) |
| 3 | ENHANCING (16.0%) | DENSITY (28.6%) | NORMAL (11.0%) | SIZE (40.0%) | CYSTIC (13.8%) | ABNORMAL (11.8%) |
| 4 | HYPER (16.0%) | DESCRIBE (28.6%) | SIZE (9.9%) | LARGE (30.0%) | ENHANCING (10.3%) | IMAGE (11.8%) |
| 5 | MORE (16.0%) | LESION (28.6%) | DILATED (8.8%) | BIG (20.0%) | HOMOGENEOUS (10.3%) | MASS (11.8%) |
| 6 | THAN (16.0%) | SIGNAL (28.6%) | CARDIAC (7.7%) | CM (10.0%) | RING (10.3%) | ABNORMALITIES (5.9%) |
| 7 | ABNORMALITY (12.0%) | AREA (14.3%) | AORTA (6.6%) | DENSITY (10.0%) | CIRCUMSCRIBED (6.9%) | ADJECTIVE (5.9%) |
| 8 | ATTENUATED (12.0%) | BLACK (14.3%) | CARDIOMEGALY (6.6%) | DESCRIBE (10.0%) | CONTOUR (6.9%) | APPENDIX (5.9%) |
| 9 | CONTRAST (12.0%) | CENTRAL (14.3%) | ENLARGEMENT (6.6%) | LOCATED (10.0%) | FLATTENED (6.9%) | ARTERY (5.9%) |
| 10 | DENSE (12.0%) | COLOR (14.3%) | SILHOUETTE (6.6%) | QUADRANT (10.0%) | HEMIDIAPHRAGMS (6.9%) | BORDERS (5.9%) |
| | COUNT (closed) | COUNT (open) | OTHER (closed) | OTHER (open) |
|---|---|---|---|---|
| #questions | 6 | 9 | 34 | 51 |
| median question length (#words) | 10.5 | 7 | 8 | 9 |
| #words total | 57 | 63 | 274 | 485 |
| #distinct words | 41 | 23 | 130 | 205 |
| 1 | JUST (33.3%) | MANY (100.0%) | PATIENT (29.4%) | IMAGE (19.6%) |
| 2 | MORE (33.3%) | IMAGE (55.6%) | IMAGE (23.5%) | PATHOLOGY (11.8%) |
| 3 | MULTIPLE (33.3%) | MASSES (33.3%) | INJURY (11.8%) | PATIENT (11.8%) |
| 4 | ONE (33.3%) | KIDNEYS (22.2%) | DIAGNOSIS (8.8%) | LEFT (9.8%) |
| 5 | THAN (33.3%) | LESIONS (22.2%) | HEART (8.8%) | SUGGEST (9.8%) |
| 6 | 1 (16.7%) | ENHANCING (11.1%) | LYING (8.8%) | WHY (9.8%) |
| 7 | 2 (16.7%) | FOUND (11.1%) | MASS (8.8%) | LIKELY (7.8%) |
| 8 | 5 (16.7%) | GALLSTONES (11.1%) | PROCESS (8.8%) | CXR (5.9%) |
| 9 | 8 (16.7%) | IDENTIFIED (11.1%) | SUPINE (8.8%) | MASS (5.9%) |
| 10 | >1 (16.7%) | INSTANCES (11.1%) | SUSPECT (8.8%) | MOST (5.9%) |
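For readers who want to recompute statistics like those above, here is a minimal sketch; the key names, the phrase_type filter, and the small stop-word list are assumptions (this README does not specify which stop-word list was used), so exact figures may differ slightly.

```python
import json
import re
from collections import Counter, defaultdict

# Illustrative stop-word list; a stand-in for the unspecified original.
STOP = {"a", "an", "and", "any", "are", "do", "does", "in", "is", "of",
        "on", "or", "the", "there", "this", "to", "what"}

def tokens(text):
    # Tokenize and lowercase; keeps numerals and tokens like ">1".
    return re.findall(r"[\w>]+", text.lower())

with open("VQA_RAD Dataset.json", encoding="utf-8") as f:
    records = json.load(f)

groups = defaultdict(list)
for r in records:
    if "freeform" in r["phrase_type"]:      # free-form questions only
        groups[(r["question_type"], r["answer_type"])].append(r["question"])

for key, qs in sorted(groups.items()):
    toks = [t for q in qs for t in tokens(q)]
    # Percent frequency = share of the category's questions containing the word.
    doc_freq = Counter(t for q in qs for t in set(tokens(q)) if t not in STOP)
    top10 = [(w.upper(), round(100 * n / len(qs), 1))
             for w, n in doc_freq.most_common(10)]
    print(key, len(qs), len(toks), len(set(toks)), top10)
```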