This is the code repository for the paper "On the limits of graph neural networks for the early diagnosis of Alzheimer’s Disease". The repository follows the methodology and results presented in that work.
The results obtained for the manuscript are organized in the following notebooks:
- 1_ADNI_GNNs_networks - for Section 3.1. Comparing results using different input networks
- 2_ADNI_GNNs_vs_nonGNNs - for Section 3.2. Benchmarking GNNs performance vs. canonical machine learning models
- 3_ADNI_GNNs_random_networks - for Section 3.3. Using randomized networks as input
- 4_LOAD_GNNs - for Section 3.4. Using another dataset as input
These notebooks use information from several scripts, organized in the following subdirectories:
- data_preprocessing - scripts for obtaining AD-related genes and genetic data from the different cohorts employed.
- networks - scripts for obtaining biological networks from different sources and building random networks.
- create_datasets - scripts for building different datasets for supervised classification models.
- create_nx_datasets.py - for building graph-datasets for Graph Neural Networks (GNNs)
- ml_models - scripts with helper functions for the non-GNN models.
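
As a rough illustration of what the networks and create_nx_datasets.py steps involve, the sketch below randomizes a toy network with NetworkX while preserving node degrees, then attaches per-gene values and a label to build one graph sample. It is not the code from this repository: the function names, the GENE_* identifiers, the attribute names node_feature and graph_label, and the choice of degree-preserving edge swaps are all assumptions made for the example.

```python
import networkx as nx


def randomize_network(graph, swaps_per_edge=10, seed=0):
    """Return a randomized copy of an undirected graph with the same degrees.

    Hypothetical helper: uses repeated double edge swaps, which is one
    common randomization strategy and not necessarily the one used here.
    """
    randomized = graph.copy()
    n_swaps = swaps_per_edge * randomized.number_of_edges()
    nx.double_edge_swap(
        randomized, nswap=n_swaps, max_tries=n_swaps * 100, seed=seed
    )
    return randomized


def build_sample_graph(network, gene_values, label):
    """Copy the network and attach one sample's gene values and class label.

    Hypothetical helper: genes missing from gene_values get a value of 0.0.
    """
    sample = network.copy()
    features = {gene: float(gene_values.get(gene, 0.0)) for gene in sample.nodes}
    nx.set_node_attributes(sample, features, name="node_feature")
    sample.graph["graph_label"] = label
    return sample


# Toy network: an 8-node ring with made-up gene identifiers.
toy_network = nx.relabel_nodes(nx.cycle_graph(8), lambda i: f"GENE_{i}")

randomized = randomize_network(toy_network, seed=42)

# One made-up sample: two genes carry a variant, the rest default to 0.0.
sample = build_sample_graph(randomized, {"GENE_0": 1, "GENE_5": 1}, label=1)
```

Degree-preserving swaps keep each gene's number of interactions fixed, so a randomized network differs from the original only in which genes are connected.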
Other subdirectories present in this repository:
- data - data files used in this work.
- results - CSV files with the results presented in this work.
- figures
Please note that several files, such as the raw genetic data, the graph and table datasets built from it, and the metadata files describing the cohorts, are not available in this repository for privacy reasons.
The code in this work was built using:
- disgenet2r for obtaining GDAs from DisGeNET.
- biomaRt for obtaining genomic coordinates of the genes of interest.
- VCFtools and Ensembl's Variant Effect Predictor (VEP) for extracting and annotating missense variants.
- NetworkX for manipulating networks and building graph datasets.
- GraphGym for evaluating and testing GNN models on graph datasets.
- Scikit-Learn for building non-GNN models.
- SciPy for statistical analyses.
We provide an Anaconda environment including all the dependencies.
More information about the project can be found at this AIMe report.
Please direct any questions to: Laura Hernández-Lorenzo - GitHub - email