iDiagnosis Flask Web App

Table Of Contents

Preview

Purpose

The purpose of this research is to build a classifier that can correctly distinguish between Pneumonia and Covid-19. Why lung diseases?

  • Misdiagnosis of pneumonia accounts for 100,000 deaths per year. A wrong pneumonia diagnosis can be life-threatening because the untreated condition grows more severe, especially when the patient has a more serious infection such as COVID-19.

  • Pneumonia accounts for 1 out of 6 childhood deaths, making it the leading cause of death in children under 5 years old.

  • In the United States, the pneumonia death rate is 10 per 100,000 individuals, which is typical of most developed countries. In Africa, the rate is 100 per 100,000 individuals, which is typical of most developing countries.

Data Augmentation

The data was imbalanced, so I use ImageDataGenerator to create additional training images. This lets the network see more variation within the dataset without reducing how representative the data for each category is during training. I don't augment the test dataset, because I don't want to tamper with the data I validate against. My parameters are:

  • shear_range=0.2
  • rotation_range=20
  • width_shift_range=0.2
  • height_shift_range=0.2
  • horizontal_flip=True
  • vertical_flip=False
  • zoom_range=0.2
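
The settings above could be wired into Keras roughly as follows. This is a minimal sketch, assuming a TensorFlow 2.x install that still ships `ImageDataGenerator`; the function name `build_generators` is hypothetical and not taken from the project code.

```python
# Augmentation settings copied from the parameter list above.
AUGMENTATION_KWARGS = {
    "shear_range": 0.2,
    "rotation_range": 20,
    "width_shift_range": 0.2,
    "height_shift_range": 0.2,
    "horizontal_flip": True,
    "vertical_flip": False,
    "zoom_range": 0.2,
}


def build_generators():
    """Return (train_generator, test_generator); hypothetical helper."""
    # Imported lazily so the settings can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Only the training images are augmented.
    train_generator = ImageDataGenerator(**AUGMENTATION_KWARGS)
    # The test generator applies no augmentation, so validation data is untouched.
    test_generator = ImageDataGenerator()
    return train_generator, test_generator
```

Keeping the settings in one dictionary makes it easy to confirm that the test generator never receives them.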

The network used is VGG19 because it is known for high accuracy on image classification problems, so I expected it to work well for this one. I imported the VGG19 model, set the appropriate weights for the type of images in the dataset, and set the include_top parameter to False. This drops the fully connected classification layers at the top of the network, which I don't need because they classify a thousand different categories while my problem has only two. Leaving them out also lets me provide my own input image size instead of the default one, as I did.
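
A minimal sketch of loading VGG19 this way is shown here. The `"imagenet"` weights, the three colour channels, and freezing the pretrained layers are assumptions, since the README does not state them; `build_base_model` is a hypothetical name.

```python
INPUT_SHAPE = (64, 64, 3)  # 64 x 64 images; 3 channels is an assumption


def build_base_model(weights="imagenet"):
    """Return VGG19 without its fully connected top layers."""
    # Imported lazily so this module loads without TensorFlow installed.
    from tensorflow.keras.applications import VGG19

    # include_top=False removes the 1000-class head and allows a custom input shape.
    base = VGG19(weights=weights, include_top=False, input_shape=INPUT_SHAPE)
    # Assumption: the pretrained convolutional layers are frozen for transfer learning.
    base.trainable = False
    return base
```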

After that, I fed the images to the network using flow. My parameters are a batch size of 32 (32 images are used for training at a given instance) and an image size of 64 x 64.
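
As a rough illustration of the batching step, the sketch below wraps image arrays with the generator's `flow` method. It assumes the images are already loaded as an array of 64 x 64 images; `make_batches` and `steps_per_epoch` are hypothetical helpers.

```python
import math

BATCH_SIZE = 32
IMAGE_SIZE = (64, 64)


def make_batches(generator, images, labels):
    """Wrap image and label arrays in a batch iterator via generator.flow."""
    # flow does not resize, so the images must already have the target size.
    if tuple(images.shape[1:3]) != IMAGE_SIZE:
        raise ValueError(f"expected {IMAGE_SIZE} images, got {images.shape[1:3]}")
    return generator.flow(images, labels, batch_size=BATCH_SIZE)


def steps_per_epoch(num_images):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / BATCH_SIZE)
```

For example, 100 images at a batch size of 32 give four batches per epoch, the last one holding only four images.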

Callback Parameters:

ModelCheckpoint

  • monitor = val_loss
  • mode = min
  • save_best_only = True
  • verbose = 1

EarlyStopping

  • monitor = val_loss
  • mode = min
  • save_best_only = True
  • verbose= 1

ReduceLROnPlateau

  • monitor = val_loss
  • patience = 30
  • verbose = 1
  • factor = 0.8
  • min_lr = 0.0001
  • mode = auto
  • min_delta = 0.0001
  • cooldown = 5
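
The callback settings above could be assembled as sketched below. The checkpoint path is a hypothetical placeholder. Keras's `EarlyStopping` does not accept a `save_best_only` argument, so the sketch leaves it out for that callback. The helper `lr_after_reductions` only illustrates how `factor` and `min_lr` interact; the starting learning rate of 0.001 used in the note is an assumption.

```python
CHECKPOINT_KWARGS = {
    "monitor": "val_loss", "mode": "min", "save_best_only": True, "verbose": 1,
}
# EarlyStopping has no save_best_only parameter, so it is omitted here.
EARLY_STOPPING_KWARGS = {"monitor": "val_loss", "mode": "min", "verbose": 1}
REDUCE_LR_KWARGS = {
    "monitor": "val_loss", "patience": 30, "verbose": 1, "factor": 0.8,
    "min_lr": 0.0001, "mode": "auto", "min_delta": 0.0001, "cooldown": 5,
}


def build_callbacks(checkpoint_path="best_model.keras"):
    """Return the three callbacks; checkpoint_path is a hypothetical default."""
    from tensorflow.keras.callbacks import (
        EarlyStopping, ModelCheckpoint, ReduceLROnPlateau,
    )
    return [
        ModelCheckpoint(checkpoint_path, **CHECKPOINT_KWARGS),
        EarlyStopping(**EARLY_STOPPING_KWARGS),
        ReduceLROnPlateau(**REDUCE_LR_KWARGS),
    ]


def lr_after_reductions(initial_lr, reductions):
    """Learning rate after ReduceLROnPlateau has fired `reductions` times."""
    lr = initial_lr
    for _ in range(reductions):
        # Each reduction multiplies by factor but never goes below min_lr.
        lr = max(lr * REDUCE_LR_KWARGS["factor"], REDUCE_LR_KWARGS["min_lr"])
    return lr
```

Starting from an assumed learning rate of 0.001, the 0.0001 floor is reached on the eleventh reduction.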

I then pass the test dataset through flow with the same parameters I used for the training dataset and call fit for 100 epochs.
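
The training call might look like the sketch below. The `train` wrapper is hypothetical; it assumes the model is already compiled and that the test batches serve as validation data, which the callbacks need in order to monitor val_loss.

```python
EPOCHS = 100


def train(model, train_batches, test_batches, callbacks):
    """Fit the model, validating against the unaugmented test batches."""
    return model.fit(
        train_batches,
        validation_data=test_batches,  # required for the val_loss monitors
        epochs=EPOCHS,
        callbacks=callbacks,
    )
```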

The accuracy is 99%. This is the share of predictions that are actually correct.

The recall is 99%. This is the share of truly positive cases that the model correctly diagnoses as positive. It is the best metric in this case, as we would rather give a wrong positive diagnosis than a wrong negative one.

The model loss is 0.05. This is how heavily the model is penalized for incorrect predictions.

The AUC score is 0.100. This is the probability that the model scores a randomly chosen positive X-ray higher than a randomly chosen negative one.
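
To make the difference between these metrics concrete, here is a small sketch that computes them from confusion-matrix counts. The counts in the example are made up for illustration and are not results from this project.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, fn):
    """Share of truly positive cases that were predicted positive."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of positive predictions that were truly positive."""
    return tp / (tp + fp)


# Made-up counts: 95 true positives, 90 true negatives,
# 10 false positives, 5 false negatives.
print(accuracy(95, 90, 10, 5))  # 0.925
print(recall(95, 5))            # 0.95
```

Recall penalizes missed cases (false negatives), while precision penalizes false alarms, which is why recall is the priority when a missed diagnosis is the costlier mistake.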

The accuracy is 94%, the share of predictions that are actually correct.

The recall is 95%, the share of truly positive cases that the model correctly diagnoses as positive. Recall is again the metric that matters most, as we would rather give a wrong positive diagnosis than a wrong negative one.

The model loss is 0.17, which reflects how heavily the model is penalized for incorrect predictions.

The AUC score is 0.90, the probability that the model scores a randomly chosen positive X-ray higher than a randomly chosen negative one.
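
AUC can be computed directly from its pairwise definition, as in the sketch below. The labels and scores are made up for illustration; for real evaluation a library routine would be the usual choice.

```python
def auc_score(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up example: one positive (0.4) is ranked below one negative (0.6).
print(auc_score([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```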

The pneumonia model has a recall of 100% for pneumonia, the COVID-19 model has a recall of 93% for COVID-19, and the pneumonia vs. COVID-19 multi-classification model has a recall of 100% for COVID-19. The scores could be improved by trying different parameters, but they are good enough as they are, so doctors and radiologists are more than welcome to integrate these models into their medical applications to help diagnose lung diseases correctly, after thorough verification.

The model loss is 0.02, which reflects how heavily the model is penalized for incorrect predictions.

The AUC score is 0.93, the probability that the model scores a randomly chosen positive X-ray higher than a randomly chosen negative one.
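
The README does not name the loss function. Assuming binary cross-entropy, which is a common choice for two-class problems, the sketch below shows how confident wrong predictions are penalized far more than confident correct ones. The inputs are made up for illustration.

```python
import math


def binary_cross_entropy(labels, probabilities, eps=1e-7):
    """Mean binary cross-entropy; the loss function is an assumption here."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        # Clip to avoid log(0) for predictions of exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


good = binary_cross_entropy([1, 0], [0.9, 0.1])  # confident and correct
bad = binary_cross_entropy([1, 0], [0.1, 0.9])   # confident and wrong
print(good < bad)  # True
```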

Recommendation

  • Use the VGG19 model, since it is 26% better at correctly diagnosing a COVID-19 case in the binary classification model and 15% better in the multi-classification model.

  • When using the VGG19 model, add a dropout layer before the final dense layer that drops half of the outputs of the preceding 512-node dense layer, in order to reduce overfitting.
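
The dropout recommendation could be sketched as the classification head below. The `Flatten` layer, the ReLU activation, and the output activations are assumptions; `add_classifier_head` is a hypothetical helper that expects a Keras base model such as VGG19 loaded with include_top=False.

```python
DENSE_UNITS = 512
DROPOUT_RATE = 0.5  # drop half of the dense layer's outputs


def add_classifier_head(base_model, num_outputs=1):
    """Attach Dense(512) -> Dropout(0.5) -> output layer to a base model."""
    from tensorflow.keras import layers, models

    x = layers.Flatten()(base_model.output)
    x = layers.Dense(DENSE_UNITS, activation="relu")(x)
    # Dropout sits right before the final dense layer to reduce overfitting.
    x = layers.Dropout(DROPOUT_RATE)(x)
    # Assumption: sigmoid for one output unit, softmax for several classes.
    activation = "sigmoid" if num_outputs == 1 else "softmax"
    outputs = layers.Dense(num_outputs, activation=activation)(x)
    return models.Model(inputs=base_model.input, outputs=outputs)
```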

Web App Features

  • User account sign up, sign in, password reset, all through asynchronous email confirmation.
  • Form generation.
  • Error handling.
  • HTML macros and layout file.
  • "Functional" file structure.
  • Python 3.7 compliant.
  • Asynchronous AJAX calls.
  • Administration panel.
  • Logging.
  • Stripe subscriptions. (WIP)
  • RESTful API for payments.
  • Simple RESTful API to communicate with your app.

Website Backend

Website Frontend

Web App Structure

Everything is contained in the app/ folder.

  • The models can be found in the views/models folder.
  • There you have the classic static/ and templates/ folders. The templates/ folder contains macros, error views and a common layout.
  • I added a views/ folder to separate the user and the website logic, which could be extended to the admin views.
  • The same goes for the forms/ folder; as the project grows it will be useful to split the WTForms code into separate files.
  • The models.py script contains the SQLAlchemy code; for the time being it only contains the logic for a users table.
  • The toolbox/ folder is a personal choice; in it I keep all the other code the application will need.
  • Management commands should be included in manage.py. Enter python manage.py -? to get a list of existing commands.
  • I added a Makefile for setup tasks.

Setup

  • Install the requirements and set up the development environment.

    make install && make dev

  • Create the database.

    python manage.py initdb

  • Run the application.

    python manage.py runserver

  • Navigate to localhost:5000.

Configuration

The goal is to keep most of the application's configuration in a single file called config.py. I added a config_dev.py and a config_prod.py, which both inherit from config_common.py. The trick is to symlink either of these to config.py, which is done by running make dev or make prod.
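
The Makefile contents are not shown here, so the following is only an approximation of what the symlink step amounts to, written in Python. The `select_config` helper and the `app_dir` parameter are hypothetical.

```python
import os


def select_config(app_dir, environment):
    """Point config.py at config_dev.py or config_prod.py via a symlink."""
    if environment not in ("dev", "prod"):
        raise ValueError("environment must be 'dev' or 'prod'")
    link = os.path.join(app_dir, "config.py")
    # Remove a previous link so switching environments is repeatable.
    if os.path.lexists(link):
        os.remove(link)
    # A relative target keeps the link valid if the folder is moved.
    os.symlink(f"config_{environment}.py", link)
    return link
```

Because the application only ever imports config.py, switching between development and production requires no code change.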

I have included a working mail setup that reads my email address and password from environment variables, where I have stored them securely. To send emails from your own instance, you need to set the corresponding environment variables for your own mail account.
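
Reading the credentials from the environment could look like the sketch below. The variable names `MAIL_USERNAME` and `MAIL_PASSWORD` are assumptions, so check the config files for the names the application actually expects.

```python
import os


def load_mail_settings(environ=os.environ):
    """Read mail credentials from the environment (variable names are assumed)."""
    username = environ.get("MAIL_USERNAME")
    password = environ.get("MAIL_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set MAIL_USERNAME and MAIL_PASSWORD before sending email")
    return {"MAIL_USERNAME": username, "MAIL_PASSWORD": password}
```

Failing early with a clear message is easier to debug than a mail error raised later during sign-up.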

Read this for information on the possible configuration options.

Future Work

  • Other Lung Diseases: Create a classifier to differentiate pneumonia x-rays from other lung infections like Tuberculosis, etc.

  • Target Detection: Create a classifier to detect in which section of the lungs the infection is located.

  • Model Improvement: Collect more data and tune more layers of the transfer learning model to improve its performance.

License

The MIT License (MIT). Please see the license file for more information.