
docs: add Dockerfile and .yml for building conda env #528

Merged Dec 21, 2023 (13 commits)
34 changes: 34 additions & 0 deletions Dockerfile
@@ -0,0 +1,34 @@
# Pull base image
FROM --platform=linux/x86_64 continuumio/miniconda3:23.10.0-1

# Add files
ADD ./tutorials /home/deeprank2/tutorials
ADD ./env/environment.yml /home
ADD ./env/requirements.txt /home

# Install
RUN \
apt update -y && \
apt install unzip -y && \
## GCC
apt install -y gcc && \
## DSSP
wget https://github.com/PDB-REDO/dssp/releases/download/v4.4.0/mkdssp-4.4.0-linux-x64 && \
mv mkdssp-4.4.0-linux-x64 /usr/local/bin/mkdssp && \
chmod a+x /usr/local/bin/mkdssp && \
## Conda and pip deps
conda env create -f /home/environment.yml && \
## Get the data for running the tutorials
wget https://zenodo.org/records/8349335/files/data_raw.zip && \
unzip data_raw.zip -d data_raw && \
mv data_raw /home/deeprank2/tutorials

# Activate the environment
RUN echo "source activate deeprank2" > ~/.bashrc
ENV PATH /opt/conda/envs/deeprank2/bin:$PATH

# Define working directory
WORKDIR /home/deeprank2

# Define default command
CMD ["bash"]
Collaborator (review comment)

Suggested change:
- CMD ["bash"]
+ CMD ["/opt/conda/envs/deeprank2/bin/jupyter", "notebook"]

Collaborator (author) @gcroci2, Nov 28, 2023

This wasn't working (no jupyter server was started)
I used the following instead:
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--NotebookApp.token=''","--NotebookApp.password=''", "--allow-root"]
Without disabling the password, the browser was asking for the token (and the token from the terminal wasn't working).
Anyway, I'd avoid using this if it's unsafe and there are no safe ways to disable the token request. The "bash" version seems easier to me from a user perspective in that case. @dsmits
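
For context, a minimal sketch of how the image could be used if the Jupyter `CMD` above were kept, assuming the server listens on Jupyter's default port 8888 (the port mapping here is illustrative, not part of this PR):

```bash
# Build the image and start a container, publishing the assumed default notebook port
docker build -t deeprank2 .
docker run -it -p 8888:8888 deeprank2
# then open http://localhost:8888 on the host (pasting the token printed by Jupyter, if one is required)
```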

Collaborator (author)

I will merge the PR soon, but will leave this conversation unresolved so you can comment when you come back :) @dsmits

93 changes: 66 additions & 27 deletions README.md
@@ -37,42 +37,81 @@ DeepRank2 extensive documentation can be found [here](https://deeprank2.rtfd.io/
- [Overview](#overview)
- [Table of contents](#table-of-contents)
- [Installation](#installation)
- [Dependencies](#dependencies)
- [Deeprank2 Package](#deeprank2-package)
- [Test installation](#test-installation)
- [Dockerfile](#dockerfile)
- [Non-pythonic dependencies](#non-pythonic-dependencies)
- [Pythonic dependencies](#pythonic-dependencies)
- [Test installation](#test-installation)
- [Contributing](#contributing)
- [Data generation](#data-generation)
- [Datasets](#datasets)
- [GraphDataset](#graphdataset)
- [GridDataset](#griddataset)
- [Training](#training)
- [Data generation](#data-generation)
- [Datasets](#datasets)
- [GraphDataset](#graphdataset)
- [GridDataset](#griddataset)
- [Training](#training)
- [Computational performances](#computational-performances)
- [Package development](#package-development)

## Installation

The package officially supports ubuntu-latest OS only, whose functioning is widely tested through the continuous integration workflows.
Note that the package officially supports the ubuntu-latest OS only, which is extensively tested through the continuous integration workflows.

### Dependencies
### Dockerfile

Before installing deeprank2 you need to install some dependencies. We advise to use a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) with Python >= 3.10 installed. The following dependency installation instructions are updated as of 14/09/2023, but in case of issues during installation always refer to the official documentation which is linked below:
To try out the package without worrying about your OS and without having to install all the required dependencies, we created a `Dockerfile` that takes care of everything in a suitable container. After cloning the repository and installing [Docker](https://docs.docker.com/engine/install/), run the following commands from the root of the repository.

Build the Docker image:
```bash
docker build -t deeprank2 .
```

Start an interactive container from the image:

```bash
docker run -it --expose 3000 -p 3000:3000 deeprank2
```

Run the tutorials' notebooks from within the running container:
```bash
cd tutorials
jupyter notebook --ip 0.0.0.0 --no-browser --allow-root --port 3000
```
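
With the `-p 3000:3000` mapping used in the `docker run` command above, the notebook server should then be reachable from the host. A sketch (the exact URL, including any access token, is printed by Jupyter in the container's terminal):

```bash
# On the host machine, open the published port in a browser, e.g. on Linux:
xdg-open http://localhost:3000
```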

Now you can run the tutorials' notebooks. More details about their content can be found [here](https://github.com/DeepRank/deeprank2/blob/main/tutorials/TUTORIAL.md). Note that in the Docker container only the raw PDB files needed as a starting point for the tutorials are downloaded. You can obtain the processed HDF5 files by running the `data_generation_xxx.ipynb` notebooks. Because Docker containers are limited in memory resources, we limit the number of data points processed in the tutorials. Please install the package locally to fully leverage its capabilities.

### Non-pythonic dependencies

Instructions are updated as of 14/09/2023.

Before installing deeprank2, you need to install some dependencies (a combined check of these tools is sketched after the list):

* [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`.
* [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
* [PyTorch](https://pytorch.org/get-started/locally/)
* We support torch's CPU library as well as CUDA.
* [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
* [DSSP 4](https://swift.cmbi.umcn.nl/gv/dssp/)
* Check if `dssp` is installed: `dssp --version`. If this gives an error or shows a version lower than 4:
* on ubuntu 22.04 or newer: `sudo apt-get install dssp`. If the package cannot be located, first run `sudo apt-get update`.
* on older versions of ubuntu or on mac or lacking sudo privileges: install from [here](https://github.com/pdb-redo/dssp), following the instructions listed. Alternatively, follow [this](https://github.com/PDB-REDO/libcifpp/issues/49) thread.
* [GCC](https://gcc.gnu.org/install/)
* Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.
* Only for MacOS users with an M1 chip: install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).
* Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.
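
A combined check of these tools, as referenced above, might look like the following sketch; note that `mkdssp` is the binary name installed by the `Dockerfile` in this PR, while the apt package typically provides it as `dssp`:

```bash
# Verify that the non-pythonic dependencies are available
gcc --version
dssp --version || mkdssp --version   # DSSP 4 may be installed under either name
conda list msms                      # MSMS installed through the bioconda channel
```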

### Pythonic dependencies

### Deeprank2 Package
Instructions are updated as of 14/09/2023.

Then, you can use the YML file we provide for creating a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) containing the latest stable release of the package and all the other necessary conda and pip dependencies (CPU only, Python 3.10):

```bash
# Create the environment
conda env create -f env/environment.yml
# Activate the environment
conda activate deeprank2
```
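
Optionally, a quick sanity check that the environment resolved correctly (a sketch; the import assumes the `deeprank2==2.1.1` pin from `env/requirements.txt` was pulled in by the YML's pip section):

```bash
python -c "import deeprank2"   # should exit without errors
pip show deeprank2             # prints the installed version
```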

Alternatively, if you are a MacOS user, if the .YML file installation is not successful, or if you want to use CUDA or Python 3.11, you can install each dependency separately, and then install the latest stable release of the package using the PyPI package manager. Also in this case, we advise using a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). In case of issues during installation, always refer to the official documentation linked below (a consolidated sketch of these steps follows the list):

* [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`.
* [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
* [PyTorch](https://pytorch.org/get-started/locally/)
* [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
* Only for MacOS users with an M1 chip: install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).
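
As an illustration of the consolidated sketch mentioned above, a CPU-only setup on Linux might look like the following, with versions taken from this PR's `env/environment.yml` and `env/requirements.txt`; the exact PyTorch/PyG commands depend on your platform, so prefer the official instructions linked in the list:

```bash
conda create -n deeprank2 python=3.10 -y
conda activate deeprank2
conda install -c bioconda msms -y
# PyTorch (CPU build) and PyTorch Geometric with its optional extensions
pip install torch==2.1.1 --index-url https://download.pytorch.org/whl/cpu
pip install torch_geometric==2.4.0
pip install torch_scatter==2.1.2 torch_sparse==0.6.18 torch_cluster==1.6.3 torch_spline_conv==1.2.2 \
    --find-links https://data.pyg.org/whl/torch-2.1.0+cpu.html
```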

Once the dependencies are installed, you can install the latest stable release of deeprank2 using the PyPi package manager:
Finally do:

```bash
pip install deeprank2
@@ -88,9 +127,9 @@ pip install -e .'[test]'

The `test` extra is optional, and can be used to install test-related dependencies useful during the development.

### Test installation
#### Test installation

If you have installed the package from a cloned repository (second option above), you can check that all components were installed correctly, using pytest.
If you have installed the package from a cloned repository (the latter option above), you can check that all components were installed correctly, using pytest.
The quick test should be sufficient to ensure that the software works, while the full test (a few minutes) will cover a much broader range of settings to ensure everything is correct.

Run `pytest tests/test_integration.py` for the quick test or just `pytest` for the full test (expect a few minutes to run).
Expand All @@ -103,7 +142,7 @@ The following section serves as a first guide to start using the package, using
as an example. For an enhanced learning experience, we provide in-depth [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials) for generating PPI data, generating SRV data, and for the training pipeline.
For more details, see the [extended documentation](https://deeprank2.rtfd.io/).

### Data generation
## Data generation

For each protein-protein complex (or protein structure containing an SRV), a query can be created and added to the `QueryCollection` object, to be processed later on. Different types of queries exist:
- In a `ProteinProteinInterfaceResidueQuery` and `SingleResidueVariantResidueQuery`, each node represents one amino acid residue.
@@ -186,11 +225,11 @@ hdf5_paths = queries.process(
grid_map_method = MapMethod.GAUSSIAN)
```

### Datasets
## Datasets

Data can be split into sets implementing custom splits according to the specific application. Assuming that the training, validation and testing IDs have been chosen (keys of the HDF5 file/s), the `DeeprankDataset` objects can be defined.

#### GraphDataset
### GraphDataset

For training GNNs the user can create a `GraphDataset` instance:

@@ -226,7 +265,7 @@ dataset_test = GraphDataset(
)
```

#### GridDataset
### GridDataset

For training CNNs the user can create a `GridDataset` instance:

@@ -260,7 +299,7 @@ dataset_test = GridDataset(
)
```

### Training
## Training

Let's define a `Trainer` instance, using for example the already existing `GINet`. Because `GINet` is a GNN, it requires a dataset instance of type `GraphDataset`.

75 changes: 58 additions & 17 deletions docs/installation.md
@@ -1,28 +1,65 @@
# Installation

The package officially supports ubuntu-latest OS only, whose functioning is widely tested through the continuous integration workflows.
Note that the package officially supports the ubuntu-latest OS only, which is extensively tested through the continuous integration workflows.

## Dependencies
## Dockerfile

Before installing deeprank2 you need to install some dependencies. We advise to use a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) with Python >= 3.10 installed. The following dependency installation instructions are updated as of 14/09/2023, but in case of issues during installation always refer to the official documentation which is linked below:
To try out the package without worrying about your OS and without having to install all the required dependencies, we created a `Dockerfile` that takes care of everything in a suitable container. After cloning the repository and installing [Docker](https://docs.docker.com/engine/install/), run the following commands from the root of the repository.

Build the Docker image:
```bash
docker build -t deeprank2 .
```

Start an interactive container from the image:

```bash
docker run -it --expose 3000 -p 3000:3000 deeprank2
```

Run the tutorials' notebooks from within the running container:
```bash
cd tutorials
jupyter notebook --ip 0.0.0.0 --no-browser --allow-root --port 3000
```

Now you can run the tutorials' notebooks. More details about their content can be found [here](https://github.com/DeepRank/deeprank2/blob/main/tutorials/TUTORIAL.md). Note that in the Docker container only the raw PDB files needed as a starting point for the tutorials are downloaded. You can obtain the processed HDF5 files by running the `data_generation_xxx.ipynb` notebooks. Because Docker containers are limited in memory resources, we limit the number of data points processed in the tutorials. Please install the package locally to fully leverage its capabilities.

## Non-pythonic dependencies

Instructions are updated as of 14/09/2023.

Before installing deeprank2 you need to install some dependencies:

* [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`.
* [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
* [PyTorch](https://pytorch.org/get-started/locally/)
* We support torch's CPU library as well as CUDA.
* [PyG](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
* [DSSP 4](https://swift.cmbi.umcn.nl/gv/dssp/)
* Check if `dssp` is installed: `dssp --version`. If this gives an error or shows a version lower than 4:
* on ubuntu 22.04 or newer: `sudo apt-get install dssp`. If the package cannot be located, first run `sudo apt-get update`.
* on older versions of ubuntu or on mac or lacking sudo privileges: install from [here](https://github.com/pdb-redo/dssp), following the instructions listed.
* Check if `dssp` is installed: `dssp --version`. If this gives an error or shows a version lower than 4:
* on ubuntu 22.04 or newer: `sudo apt-get install dssp`. If the package cannot be located, first run `sudo apt-get update`.
* on older versions of ubuntu or on mac or lacking sudo privileges: install from [here](https://github.com/pdb-redo/dssp), following the instructions listed. Alternatively, follow [this](https://github.com/PDB-REDO/libcifpp/issues/49) thread.
* [GCC](https://gcc.gnu.org/install/)
* Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.
* Only for MacOS users with an M1 chip: install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).
* Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.

## Pythonic dependencies

## Deeprank2 Package
Instructions are updated as of 14/09/2023.

Once the dependencies are installed, you can install the latest stable release of deeprank2 using the PyPi package manager:
Then, you can use the YML file we provide for creating a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) containing the latest stable release of the package and all the other necessary conda and pip dependencies (CPU only, Python 3.10):

```bash
# Create the environment
conda env create -f env/environment.yml
# Activate the environment
conda activate deeprank2
```

Alternatively, if you are a MacOS user, if the .YML file installation is not successful, or if you want to use CUDA or Python 3.11, you can install each dependency separately, and then install the latest stable release of the package using the PyPI package manager. Also in this case, we advise using a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). In case of issues during installation, always refer to the official documentation linked below:

* [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`.
* [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
* [PyTorch](https://pytorch.org/get-started/locally/)
* [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
* Only for MacOS users with an M1 chip: install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).

Finally do:

```bash
pip install deeprank2
@@ -38,13 +75,17 @@ pip install -e .'[test]'

The `test` extra is optional, and can be used to install test-related dependencies useful during the development.

## Test installation
### Test installation

If you have installed the package from a cloned repository (second option above), you can check that all components were installed correctly, using pytest.
If you have installed the package from a cloned repository (the latter option above), you can check that all components were installed correctly, using pytest.
The quick test should be sufficient to ensure that the software works, while the full test (a few minutes) will cover a much broader range of settings to ensure everything is correct.

Run `pytest tests/test_integration.py` for the quick test or just `pytest` for the full test (expect a few minutes to run).

## Contributing

If you would like to contribute to the package in any way, please see [our guidelines](CONTRIBUTING.rst).

The following section serves as a first guide to start using the package, using protein-protein Interface (PPI) queries
as an example. For an enhanced learning experience, we provide in-depth [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials) for generating PPI data, generating SRV data, and for the training pipeline.
For more details, see the [extended documentation](https://deeprank2.rtfd.io/).
19 changes: 19 additions & 0 deletions env/environment.yml
@@ -0,0 +1,19 @@
name: deeprank2
channels:
- pytorch
- pyg
- bioconda
- defaults
dependencies:
- pip==23.3.*
- python==3.10.*
- msms==2.6.1
- pytorch==2.1.1
- pytorch-mutex==1.0.*
- torchvision==0.16.1
- torchaudio==2.1.1
- cpuonly==2.0.*
- pyg==2.4.0
- notebook==7.0.6
- pip:
  - --requirement requirements.txt
6 changes: 6 additions & 0 deletions env/requirements.txt
@@ -0,0 +1,6 @@
--find-links https://data.pyg.org/whl/torch-2.1.0+cpu.html
torch_scatter==2.1.2
torch_sparse==0.6.18
torch_cluster==1.6.3
torch_spline_conv==1.2.2
deeprank2==2.1.1
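
For reference, outside of the conda YML (which pulls this file in through its `pip:` section), the same file can be consumed directly; the `--find-links` line points pip at prebuilt CPU wheels for the PyG extensions, so this sketch assumes a torch 2.1.x CPU setup:

```bash
pip install -r env/requirements.txt
```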
13 changes: 9 additions & 4 deletions tutorials/data_generation_ppi.ipynb
@@ -102,7 +102,9 @@
"data_path = os.path.join(\"data_raw\", \"ppi\")\n",
"processed_data_path = os.path.join(\"data_processed\", \"ppi\")\n",
"os.makedirs(os.path.join(processed_data_path, \"residue\"))\n",
"os.makedirs(os.path.join(processed_data_path, \"atomic\"))"
"os.makedirs(os.path.join(processed_data_path, \"atomic\"))\n",
"# Flag limit_data as True if you are running on a machine with limited memory (e.g., Docker container)\n",
"limit_data = True"
]
},
{
@@ -139,7 +141,10 @@
"\tbas = csv_data_indexed.measurement_value.values.tolist()\n",
"\treturn pdb_files, bas\n",
"\n",
"pdb_files, bas = get_pdb_files_and_target_data(data_path)"
"pdb_files, bas = get_pdb_files_and_target_data(data_path)\n",
"\n",
"if limit_data:\n",
"\tpdb_files = pdb_files[:10]"
]
},
{
@@ -204,7 +209,7 @@
"\tif count % 20 == 0:\n",
"\t\tprint(f'{count} queries added to the collection.')\n",
"\n",
"print(f'Queries ready to be processed.\\n')"
"print('Queries ready to be processed.\\n')"
]
},
{
@@ -437,7 +442,7 @@
"\tif count % 20 == 0:\n",
"\t\tprint(f'{count} queries added to the collection.')\n",
"\n",
"print(f'Queries ready to be processed.\\n')"
"print('Queries ready to be processed.\\n')"
]
},
{
9 changes: 7 additions & 2 deletions tutorials/data_generation_srv.ipynb
@@ -116,7 +116,9 @@
"data_path = os.path.join(\"data_raw\", \"srv\")\n",
"processed_data_path = os.path.join(\"data_processed\", \"srv\")\n",
"os.makedirs(os.path.join(processed_data_path, \"residue\"))\n",
"os.makedirs(os.path.join(processed_data_path, \"atomic\"))"
"os.makedirs(os.path.join(processed_data_path, \"atomic\"))\n",
"# Flag limit_data as True if you are running on a machine with limited memory (e.g., Docker container)\n",
"limit_data = True"
]
},
{
@@ -158,7 +160,10 @@
"\tpdb_files = [data_path + \"/pdb/\" + pdb_name for pdb_name in pdb_names]\n",
"\treturn pdb_files, res_numbers, res_wildtypes, res_variants, targets\n",
"\n",
"pdb_files, res_numbers, res_wildtypes, res_variants, targets = get_pdb_files_and_target_data(data_path)"
"pdb_files, res_numbers, res_wildtypes, res_variants, targets = get_pdb_files_and_target_data(data_path)\n",
"\n",
"if limit_data:\n",
"\tpdb_files = pdb_files[:10]"
]
},
{
Expand Down
Loading