Reproducible Generative Learning

How to quantify reproducibility of graph neural networks while using generative learning?

Please contact mohammedaminegh@gmail.com for inquiries. Thanks.

Introduction

This work was accepted at the PRIME workshop at MICCAI 2021.

Investigating and Quantifying the Reproducibility of Graph Neural Networks in Predictive Medicine

Mohammed Amine Gharsallaoui, Furkan Tornaci and Islem Rekik

BASIRA Lab, Faculty of Computer and Informatics, Istanbul Technical University, Istanbul, Turkey

Abstract: Graph neural networks (GNNs) have gained unprecedented attention in many domains, including dysconnectivity disorder diagnosis, thanks to their high performance in tackling graph classification tasks. Despite the large stream of GNNs developed recently, prior efforts invariably focus on boosting classification accuracy while ignoring model reproducibility and interpretability, which are vital in pinning down disorder-specific biomarkers. Although less investigated, the discriminativeness of the original input features (biomarkers), which is reflected by the weights a GNN learns for them, gives informative insights about their reliability. Intuitively, the reliability of a given biomarker is emphasized if it belongs to the sets of top discriminative regions of interest (ROIs) identified by different models. Therefore, we define the first axis in our work as reproducibility across models, which evaluates the commonalities between sets of top discriminative biomarkers for a pool of GNNs. This task mainly answers this question: how likely can two models be congruent in terms of their respective sets of top discriminative biomarkers? The second axis of research in our work is to investigate reproducibility in generated connectomic datasets. This is addressed by answering this question: how likely would the set of top discriminative biomarkers found by a trained model for a ground-truth dataset be consistent with the set found for a dataset predicted by generative learning? In this paper, we propose a reproducibility assessment framework, a method for quantifying the commonalities in the GNN-specific learnt feature maps across models, which can complement explanatory approaches of GNNs and provide new ways to assess predictive medicine via biomarker reliability.
We evaluated our framework using four multiview connectomic datasets of healthy and neurologically disordered subjects with five GNN architectures and two different learning mindsets: (a) conventional training on all samples (resourceful) and (b) few-shot training on random samples (frugal).

Code

This code was implemented using Python 3.8 (Anaconda) on Windows 10.

Installation

Anaconda Installation

Go to https://www.anaconda.com/products/individual
Download the version for your system (we used Python 3.8 on 64-bit Windows 10)
Install the platform
Create a conda environment by typing: conda create -n env_reproducibility pip python=3.8

Dependency Installation

Copy and paste the following commands to install all packages (CPU version):

$ conda activate env_reproducibility
$ conda install pytorch==1.4.0 torchvision==0.5.0 cpuonly -c pytorch
$ pip install scikit-learn
$ pip install matplotlib
$ pip install torch-scatter==latest+cpu -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-sparse==latest+cpu -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-cluster==latest+cpu -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-spline-conv==latest+cpu -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-geometric
$ pip install annoy
$ pip install fbpca

These instructions are for a CPU installation. If you want to use a GPU (optional), please see the installation page of PyTorch-Geometric (https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html) for a description of how to install the GPU version. The code checks the versions of the dependencies and the availability of a GPU. If everything is configured correctly, it will use the GPU automatically.
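As a quick sanity check after installation, the following minimal sketch reports which device PyTorch would use. It is not the repository's own check: the helper name pick_device is hypothetical, and the function falls back to "cpu" when PyTorch cannot be imported.

```python
def pick_device():
    """Return "cuda" if PyTorch sees a GPU, otherwise "cpu".

    Hypothetical helper for illustration; the repository performs its own check.
    """
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Selected device:", device)
```

If this prints "cpu" on a machine with a GPU, the installed PyTorch build is most likely the CPU-only one from the commands above.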

Data format

To use our framework, the input dataset should be a list of numpy arrays. Each numpy array is of size (n_r, n_r, n_v), where n_r and n_v are the number of regions and the number of views, respectively. We provide the Python file handle_data/simulate_data.py to simulate data; you should specify the number of subjects, the number of views and the number of regions. After simulating the dataset, you can predict the data via generative learning using topogan/main_topogan.py. The input data must be vectorized before running the generative learning; to vectorize it, use handle_data/vectorize.py. The output of the generative learning is also in vectorized format; to restore the matrix format of the generated data, use handle_data/collect_generated.py. After predicting the dataset using generative learning, place both the real and the generated data in reproducibility/data.
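The data format described above can be illustrated with a small sketch. This is not the code in handle_data; it assumes symmetric matrices with a zero diagonal and an upper-triangle vectorization, and the function names simulate_subject, vectorize_view and restore_view are hypothetical. The scripts in the repository may use a different layout.

```python
import numpy as np


def simulate_subject(n_r, n_v, rng):
    """Return one random (n_r, n_r, n_v) array of symmetric, zero-diagonal views."""
    views = []
    for _ in range(n_v):
        a = rng.random((n_r, n_r))
        sym = (a + a.T) / 2.0        # make the matrix symmetric
        np.fill_diagonal(sym, 0.0)   # no self-connections (an assumption)
        views.append(sym)
    return np.stack(views, axis=-1)


def vectorize_view(matrix):
    """Flatten the strict upper triangle of a square matrix into a vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def restore_view(vector, n_r):
    """Rebuild a symmetric, zero-diagonal matrix from its upper-triangle vector."""
    out = np.zeros((n_r, n_r))
    rows, cols = np.triu_indices(n_r, k=1)
    out[rows, cols] = vector
    return out + out.T


rng = np.random.default_rng(0)
# 80 subjects, 35 regions, 2 views: the sizes used in the example result.
dataset = [simulate_subject(35, 2, rng) for _ in range(80)]
vec = vectorize_view(dataset[0][:, :, 0])
restored = restore_view(vec, 35)
print(len(dataset), dataset[0].shape, vec.shape)
```

With 35 regions, each vectorized view holds 35 * 34 / 2 = 595 values, and restoring it recovers the original matrix under the symmetry assumption.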

Run reproducibility framework

After obtaining the real and generated datasets, you can run the GNN models with the file reproducibility/demo.py. Open a terminal in the reproducibility directory and type:

$ conda activate env_reproducibility && python demo.py

GNN models

The GNN models included are:

Model	Paper
DiffPool	https://arxiv.org/abs/1806.08804
SAGPool	http://proceedings.mlr.press/v97/lee19c.html
GAT	https://arxiv.org/abs/1710.10903
g-U-Nets	http://proceedings.mlr.press/v97/gao19a.html
GCN	https://arxiv.org/abs/1609.02907

Main components of our Code

Component	Content
handle_data	Includes files required to simulate, vectorize and reshape the data.
reproducibility	Contains the GNN codes and the reproducibility framework implementation.
topogan	Contains the code files of the generative learning technique.

Example Result

The figure shows an example output for a population of 80 subjects, where each subject has 2 views (each represented by a 35 by 35 matrix). We computed the reproducibility scores of 5 GNN models using two training settings (cross-validation and few-shot). For each view, we display the scores obtained with the real and the generated datasets.
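The reproducibility scores compare the sets of top discriminative biomarkers found by different models. As a rough illustration of that idea, and not the exact metric used in the paper or in reproducibility/demo.py, the sketch below measures the fraction of shared regions among the top-k regions of two models. The function names and the toy weight values are hypothetical.

```python
def top_k_indices(weights, k):
    """Return the indices of the k largest weights by absolute value."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return set(order[:k])


def top_k_overlap(weights_a, weights_b, k):
    """Fraction of the top-k regions shared by two models (between 0 and 1)."""
    shared = top_k_indices(weights_a, k) & top_k_indices(weights_b, k)
    return len(shared) / k


# Toy per-region weights for two hypothetical models over 5 regions.
model_a = [0.9, 0.1, 0.8, 0.05, 0.7]
model_b = [0.85, 0.6, 0.02, 0.03, 0.75]
score = top_k_overlap(model_a, model_b, k=3)
print(round(score, 3))
```

Here the top-3 regions are {0, 2, 4} for the first model and {0, 1, 4} for the second, so two of the three regions are shared. A score of 1 would mean both models agree on the same top-k set.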

Relevant References

Alaa Bessadok, Mohamed Ali Mahjoub, Islem Rekik. Brain multigraph prediction using topology-aware adversarial graph neural network. In Medical Image Analysis 72 (2021).

Nicolas George, Islem Mhiri, Islem Rekik. Identifying the best data-driven feature selection method for boosting reproducibility in classification tasks. In Pattern Recognition 101 (2020).

YouTube video of our paper

https://youtu.be/-R1PrnX80FE

Please cite the following paper when using our framework

@inproceedings{gharsallaoui2021,
  title={Investigating and Quantifying the Reproducibility of Graph Neural Networks in Predictive Medicine},
  author={Gharsallaoui, Mohammed Amine and Tornaci, Furkan and Rekik, Islem},
  booktitle={International Workshop on PRedictive Intelligence In MEdicine},
  pages={104--116},
  year={2021},
  organization={Springer}
}
