How to quantify reproducibility of graph neural networks while using generative learning?
Please contact mohammedaminegh@gmail.com for inquiries. Thanks.
This work is accepted at the PRIME workshop in MICCAI 2021.
Investigating and Quantifying the Reproducibility of Graph Neural Networks in Predictive Medicine
Mohammed Amine Gharsallaoui, Furkan Tornaci and Islem Rekik
BASIRA Lab, Faculty of Computer and Informatics, Istanbul Technical University, Istanbul, Turkey
Abstract: Graph neural networks (GNNs) have gained an unprecedented attention in many domains including dysconnectivity disorder diagnosis thanks to their high performance in tackling graph classification tasks. Despite the large stream of GNNs developed recently, prior efforts invariably focus on boosting the classification accuracy while ignoring the model reproducibility and interpretability, which are vital in pinning down disorder-specific biomarkers. Although less investigated, the discriminativeness of the original input features -biomarkers, which is reflected by their learnt weights using a GNN gives informative insights about their reliability. Intuitively, the reliability of a given biomarker is emphasized if it belongs to the sets of top discriminative regions of interest (ROIs) using different models. Therefore, we define the first axis in our work as \emph{reproducibility across models}, which evaluates the commonalities between sets of top discriminative biomarkers for a pool of GNNs. This task mainly answers this question: \emph{How likely can two models be congruent in terms of their respective sets of top discriminative biomarkers?} The second axis of research in our work is to investigate \emph{reproducibility in generated connectomic datasets}. This is addressed by answering this question: \emph{how likely would the set of top discriminative biomarkers by a trained model for a ground-truth dataset be consistent with a predicted dataset by generative learning?} In this paper, we propose a reproducibility assessment framework, a method for quantifying the commonalities in the GNN-specific learnt feature maps across models, which can complement explanatory approaches of GNNs and provide new ways to assess predictive medicine via biomarkers reliability. We evaluated our framework using four multiview connectomic datasets of healthy neurologically disordered subjects with five GNN architectures and two different learning mindsets: (a) conventional training on all samples (resourceful) and (b) a few-shot training on random samples (frugal).
This code was implemented using Python 3.8 (Anaconda) on Windows 10.
- Go to https://www.anaconda.com/products/individual
- Download version for your system (We used Python 3.8 on 64bit Windows 10 )
- Install the platform
- Create a conda environment by typing:
conda create –n env_reproducibility pip python=3.8
Copy and paste following commands to install all packages (CPU version)
$ conda activate env_reproducibility
$ conda install pytorch==1.4.0 torchvision==0.5.0 cpuonly -c pytorch
$ pip install scikit-learn
$ pip install matplotlib
$ pip install torch-scatter==latest+cpu -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-sparse==latest+cpu -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-cluster==latest+cpu -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-spline-conv==latest+cpu -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-geometric
$ pip install annoy
$ pip install fbpca
These instructions are for CPU installation. If you want GPU installation, please visit (optional) PyTorch-Geometric’s web page (https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html) for description on installing GPU version. Code will check the version of dependencies and availability of GPU. If everything is configured correctly, it will utilize GPU automatically.
In case you want to use our framework, the input dataset should be a list of numpy arrays. Each numpy array is of size (n_r, n_r, n_v), where n_r and n_v are the number of regions and views, respectively. We provided within this code, the python file handle_data/simulate_data.py
to simulate data. You should specify the number of subjects, the number of views and the number of regions. After simulating the dataset, you can predict the data via generative learning using this file topogan/main_topogan.py
. The input data should be vectorized before using the generative learning. To vectorize the data, you can use handle_data/vectorize.py
. The output of the generative learning is in a vectorized format. To restore the matrix format of the generated data, you can use this file handle_data/collect_generated.py
. After predicting the dataset using generative learning, put both repositories (real and generated) in this path reproducibility/data
.
After obtaining the real and generated datasets, you can run the GNN models by running this file reproducibility/demo.py
. You can open up a terminal at the reproducibility
directory and type in
$ conda activate env_reproducibility & python demo.py
The GNN models included are:
Model | Paper |
---|---|
DiffPool | https://arxiv.org/abs/1806.08804 |
SAGPool | http://proceedings.mlr.press/v97/lee19c.html |
GAT | https://arxiv.org/abs/1710.10903 |
g-U-Nets | http://proceedings.mlr.press/v97/gao19a.html |
GCN | https://arxiv.org/abs/1609.02907 |
Component | Content |
---|---|
handle_data | Includes files required to simulate, vectorize and reshape the data. |
reproducibility | Contains the GNN codes and the reproducibility framework implementation. |
topogan | Contains the code files of the generative learning technique. |
The figure demonstrates an example of output for a population of 80 subjects where each subject has 2 views (each represented by 35 by 35 matrix). We computed the reproducibility scores of 5 GNN models using two training settings (cross-validation and few-shot). For each view, we display the scores using real and generated datasets.
Alaa Bessadok, Mohamed Ali Mahjoub, Islem Rekik. Brain multigraph prediction using topology-aware adversarial graph neural network. In Medical Image Analysis 72 (2021).
Nicolas George, Islem Mhiri, Islem Rekik. Identifying the best data-driven feature selection method for boosting reproducibility in classification tasks. In Pattern Recognition 101 (2020).
@inproceedings{gharsallaoui2021,
title={Investigating and Quantifying the Reproducibility of Graph Neural Networks in Predictive Medicine},
author={Gharsallaoui, Mohammed Amine and Tornaci, Furkan and Rekik, Islem},
booktitle={International Workshop on PRedictive Intelligence In MEdicine},
pages={104--116},
year={2021},
organization={Springer}
}