Merge pull request #528 from DeepRank/527_add_dockerfile_gcroci2

docs: add `Dockerfile` and `.yml` for building conda env
DeepRank · Dec 21, 2023 · 5cde0bb
2 parents 5426600 + 484d580
Showing 10 changed files with 240 additions and 73 deletions.
diff --git a/.github/actions/install-python-and-package/action.yml b/.github/actions/install-python-and-package/action.yml
@@ -1,6 +1,6 @@
-name: "Install Python and deeprank2"
+name: "Install Python and DeepRank2"
 
-description: "Installs Python, updates pip and installs deeprank2 together with its dependencies."
+description: "Installs Python, updates pip and installs DeepRank2 together with its dependencies."
 
 inputs:
 

diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,36 @@
+# Pull base image
+FROM --platform=linux/x86_64 condaforge/miniforge3:23.3.1-1
+
+# Add files
+ADD ./tutorials /home/deeprank2/tutorials 
+ADD ./env/environment.yml /home/deeprank2
+ADD ./env/requirements.txt /home/deeprank2
+
+# Install
+RUN \
+  apt update -y && \
+  apt install unzip -y && \
+  ## GCC
+  apt install -y gcc && \
+  ## DSSP
+  wget https://github.com/PDB-REDO/dssp/releases/download/v4.4.0/mkdssp-4.4.0-linux-x64 && \
+  mv mkdssp-4.4.0-linux-x64 /usr/local/bin/mkdssp && \
+  chmod a+x /usr/local/bin/mkdssp && \
+  ## Conda and pip deps
+  mamba env create -f /home/deeprank2/environment.yml && \
+  ## Get the data for running the tutorials
+  if [ -d "/home/deeprank2/tutorials/data_raw" ]; then rm -Rf /home/deeprank2/tutorials/data_raw; fi && \
+  if [ -d "/home/deeprank2/tutorials/data_processed" ]; then rm -Rf /home/deeprank2/tutorials/data_processed; fi && \
+  wget https://zenodo.org/records/8349335/files/data_raw.zip && \
+  unzip data_raw.zip -d data_raw && \
+  mv data_raw /home/deeprank2/tutorials
+
+# Activate the environment
+RUN echo "source activate deeprank2" > ~/.bashrc
+ENV PATH=/opt/conda/envs/deeprank2/bin:$PATH
+
+# Define working directory
+WORKDIR /home/deeprank2
+
+# Define default command
+CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--NotebookApp.token=''","--NotebookApp.password=''", "--allow-root"]
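Note (not part of the commit): the data step in the `RUN` instruction above removes any stale `data_raw` and `data_processed` folders, downloads `data_raw.zip` from Zenodo, and unpacks it into `tutorials/data_raw`. For readers who want the same tutorial data without Docker, the following is a minimal Python sketch of that step. The function names `reset_tutorial_data` and `download_tutorial_data` are hypothetical helpers introduced here for illustration; only the URL and folder names come from the Dockerfile.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# URL copied from the Dockerfile above.
DATA_URL = "https://zenodo.org/records/8349335/files/data_raw.zip"


def reset_tutorial_data(zip_path, tutorials_dir):
    """Drop stale data folders, then unpack the archive into <tutorials_dir>/data_raw.

    Hypothetical helper: mirrors the Dockerfile's data step, not an official API.
    """
    tutorials_dir = Path(tutorials_dir)
    # Same cleanup as the two `rm -Rf` guards in the Dockerfile.
    for name in ("data_raw", "data_processed"):
        shutil.rmtree(tutorials_dir / name, ignore_errors=True)
    target = tutorials_dir / "data_raw"
    target.mkdir(parents=True, exist_ok=True)
    # Equivalent of `unzip data_raw.zip -d data_raw`.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def download_tutorial_data(tutorials_dir, url=DATA_URL):
    """Fetch the archive next to the tutorials and unpack it (needs network access)."""
    zip_path = Path(tutorials_dir) / "data_raw.zip"
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, zip_path)
    return reset_tutorial_data(zip_path, tutorials_dir)
```

Calling `download_tutorial_data("tutorials")` from the repository root would, under these assumptions, leave the raw files in `tutorials/data_raw`, matching the layout the image builds.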
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-# Deeprank2
+# DeepRank2
 
 | Badges | |
 |:----:|----|
@@ -33,52 +33,101 @@ DeepRank2 extensive documentation can be found [here](https://deeprank2.rtfd.io/
 
 ## Table of contents
 
-- [Deeprank2](#deeprank2)
+- [DeepRank2](#deeprank2)
   - [Overview](#overview)
   - [Table of contents](#table-of-contents)
-  - [Installation](#installation)
-    - [Dependencies](#dependencies)
-    - [Deeprank2 Package](#deeprank2-package)
-    - [Test installation](#test-installation)
-    - [Contributing](#contributing)
-    - [Data generation](#data-generation)
-    - [Datasets](#datasets)
-      - [GraphDataset](#graphdataset)
-      - [GridDataset](#griddataset)
-    - [Training](#training)
+  - [Installations](#installations)
+    - [Containerized Installation](#containerized-installation)
+    - [Local/remote installation](#localremote-installation)
+      - [Non-pythonic dependencies](#non-pythonic-dependencies)
+      - [Pythonic dependencies](#pythonic-dependencies)
+      - [Install DeepRank2](#install-deeprank2)
+      - [Test installation](#test-installation)
+  - [Contributing](#contributing)
+  - [Data generation](#data-generation)
+  - [Datasets](#datasets)
+    - [GraphDataset](#graphdataset)
+    - [GridDataset](#griddataset)
+  - [Training](#training)
   - [Computational performances](#computational-performances)
   - [Package development](#package-development)
 
-## Installation
+## Installations
 
-The package officially supports ubuntu-latest OS only, whose functioning is widely tested through the continuous integration workflows. 
+Note that the package officially supports only the ubuntu-latest OS, which is tested extensively through the continuous integration workflows.
 
-### Dependencies
+You can either install DeepRank2 in a [dockerized container](#containerized-installation), which will allow you to run our [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials), or you can [install the package locally](#localremote-installation).
 
-Before installing deeprank2 you need to install some dependencies. We advise to use a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) with Python >= 3.10 installed. The following dependency installation instructions are updated as of 14/09/2023, but in case of issues during installation always refer to the official documentation which is linked below:
+### Containerized Installation 
+
+To try out the package without worrying about your OS or installing all the required dependencies, we provide a `Dockerfile` that sets up everything in a suitable container. After cloning the repository and installing [Docker](https://docs.docker.com/engine/install/), run the following commands from the root of the repository (you may need sudo permission).
+
+Build the Docker image:
+
+```bash
+docker build -t deeprank2 .
+```
+
+Run the Docker container:
+
+```bash
+docker run -p 8888:8888 deeprank2
+```
+
+The `-p 8888:8888` flag maps port 8888 inside the container, where the Jupyter notebook server started by the image listens, to port 8888 on your host machine. Open a browser and go to `http://localhost:8888` to access the notebook server and run the tutorial notebooks.
+
+More details about the tutorials' content can be found [here](https://github.com/DeepRank/deeprank2/blob/main/tutorials/TUTORIAL.md). Note that only the raw PDB files, which are the starting point for the tutorials, are downloaded into the Docker container. You can obtain the processed HDF5 files by running the `data_generation_xxx.ipynb` notebooks. Because Docker containers are limited in memory resources, we limit the number of data points processed in the tutorials. Please install the package locally to fully leverage its capabilities.
+
+After running the tutorials, you may want to remove the (quite large) Docker image from your machine. In this case, remember to [stop the container](https://docs.docker.com/engine/reference/commandline/stop/) and then [remove the image](https://docs.docker.com/engine/reference/commandline/image_rm/). More general information about Docker can be found on the [official website docs](https://docs.docker.com/get-started/). 
+
+### Local/remote installation
+
+#### Non-pythonic dependencies
+
+Instructions are up to date as of 27 Nov 2023.
+
+Before installing DeepRank2 you need to install some dependencies:
 
-*  [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`.
-    * [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
-*  [PyTorch](https://pytorch.org/get-started/locally/)
-    * We support torch's CPU library as well as CUDA.
-*  [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
 *  [DSSP 4](https://swift.cmbi.umcn.nl/gv/dssp/)
     * Check if `dssp` is installed: `dssp --version`. If this gives an error or shows a version lower than 4:
       * on ubuntu 22.04 or newer: `sudo apt-get install dssp`. If the package cannot be located, first run `sudo apt-get update`.
       * on older versions of ubuntu, on mac, or when lacking sudo privileges: install from [here](https://github.com/pdb-redo/dssp), following the instructions listed. Alternatively, follow [this](https://github.com/PDB-REDO/libcifpp/issues/49) thread. 
 *  [GCC](https://gcc.gnu.org/install/)
-    * Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.  
+    * Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`. 
+
+#### Pythonic dependencies
+
+Instructions are up to date as of 27 Nov 2023.
+
+Then, you can use the YML file we provide for creating a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) containing the latest stable release of the package and all the other necessary conda and pip dependencies (CPU only, Python 3.10):
+
+```bash
+# Ensure you are in your base environment
+conda activate
+# Create the environment
+conda env create -f env/environment.yml
+# Activate the environment
+conda activate deeprank2
+```
+
+Alternatively, if you are a MacOS user, if the YML file installation is not successful, or if you want to use CUDA or Python 3.11, you can install each dependency separately, and then the latest stable release of the package using the PyPI package manager. In this case too, we advise using a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). In case of issues during installation, please refer to the official documentation for each package (linked below), as our instructions may be out of date:
+
+*  [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`.
+    * [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
+*  [PyTorch](https://pytorch.org/get-started/locally/)
+*  [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) `conda install pyg -c pyg`
+    * Also install all [optional additions to PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html#installation-from-wheels), namely: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
 *  For MacOS with M1 chip users only: install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).
 
-### Deeprank2 Package
+#### Install DeepRank2
 
-Once the dependencies are installed, you can install the latest stable release of deeprank2 using the PyPi package manager:
+Finally, install the latest stable release of DeepRank2 from PyPI:
 
 ```bash
 pip install deeprank2
 ```
 
-Alternatively, get all the new developments by cloning the repo and installing the editable version of the package with:
+Alternatively, get the latest updates by cloning the repo and installing the editable version of the package with:
 
 ```bash
 git clone https://github.com/DeepRank/deeprank2
@@ -88,22 +137,21 @@ pip install -e .'[test]'
 
 The `test` extra is optional, and can be used to install test-related dependencies useful during the development.
 
-### Test installation
+#### Test installation
 
-If you have installed the package from a cloned repository (second option above), you can check that all components were installed correctly, using pytest.
+If you have installed the package from a cloned repository (the latter option above), you can check that all components were installed correctly, using pytest (run `pip install pytest` if you did not install it above).
 The quick test should be sufficient to ensure that the software works, while the full test (a few minutes) will cover a much broader range of settings to ensure everything is correct.
 
 Run `pytest tests/test_integration.py` for the quick test or just `pytest` for the full test (expect a few minutes to run).
 
-### Contributing
+## Contributing
 
 If you would like to contribute to the package in any way, please see [our guidelines](CONTRIBUTING.rst).
 
-The following section serves as a first guide to start using the package, using protein-protein Interface (PPI) queries
-as example. For an enhanced learning experience, we provide in-depth [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials) for generating PPI data, generating SVR data, and for the training pipeline.
+The following section serves as a first guide to start using the package, using protein-protein interface (PPI) queries as an example. For an enhanced learning experience, we provide in-depth [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials) for generating PPI data, generating SVR data, and for the training pipeline.
 For more details, see the [extended documentation](https://deeprank2.rtfd.io/).
 
-### Data generation
+## Data generation
 
 For each protein-protein complex (or protein structure containing an SVR), a query can be created and added to the `QueryCollection` object, to be processed later on. Different types of queries exist:
 - In a `ProteinProteinInterfaceResidueQuery` and `SingleResidueVariantResidueQuery`, each node represents one amino acid residue.
@@ -186,11 +234,11 @@ hdf5_paths = queries.process(
     grid_map_method = MapMethod.GAUSSIAN)
 ```
 
-### Datasets
+## Datasets
 
 Data can be split in sets implementing custom splits according to the specific application. Assuming that the training, validation and testing ids have been chosen (keys of the HDF5 file/s), then the `DeeprankDataset` objects can be defined.
 
-#### GraphDataset
+### GraphDataset
 
 For training GNNs the user can create a `GraphDataset` instance:
 
@@ -226,7 +274,7 @@ dataset_test = GraphDataset(
 )
 ```
 
-#### GridDataset
+### GridDataset
 
 For training CNNs the user can create a `GridDataset` instance:
 
@@ -260,7 +308,7 @@ dataset_test = GridDataset(
 )
 ```
 
-### Training
+## Training
 
 Let's define a `Trainer` instance, using, for example, the already existing `GINet`. Because `GINet` is a GNN, it requires a dataset instance of type `GraphDataset`.
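
Note (not part of the commit): the README changes in this diff have users start the container with `docker run -p 8888:8888 deeprank2` and then open `http://localhost:8888`. A quick way to confirm that the port mapping works before opening a browser is to attempt a TCP connection from the host. The sketch below uses only the Python standard library; `is_port_open` is a hypothetical helper name, and a successful connection only shows that something is listening on the port, not that it is the Jupyter server.

```python
import socket


def is_port_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on localhost:8888; try http://localhost:8888")
    else:
        print("Nothing reachable on localhost:8888; check `docker run -p 8888:8888 deeprank2`")
```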
 

diff --git a/docs/getstarted.md b/docs/getstarted.md
@@ -137,7 +137,7 @@ As representative example, the following is the HDF5 structure generated by the
         └── binary
 ```
 
-This entry represents the interface between the two proteins contained in the `.pdb` file, at the residue level. `edge_features` and `node_features` are specific for the graph-like representation of the PPI, while `grid_points` and `mapped_features` refer to the grid mapped from the graph. Each data point generated by deeprank2 has the above structure, apart from the features and the target that are specified by the user.
+This entry represents the interface between the two proteins contained in the `.pdb` file, at the residue level. `edge_features` and `node_features` are specific for the graph-like representation of the PPI, while `grid_points` and `mapped_features` refer to the grid mapped from the graph. Each data point generated by DeepRank2 has the above structure, apart from the features and the target that are specified by the user.
 
 It is always good practice to first explore the data, and then decide how to split them into training, test and validation sets. For this purpose, users can either use [HDFView](https://www.hdfgroup.org/downloads/hdfview/), a visual tool written in Java for browsing and editing HDF5 files, or Python packages such as [h5py](https://docs.h5py.org/en/stable/). A few examples for the latter:
 
@@ -366,7 +366,7 @@ trainer.test()
 
 ### Results export and visualization
 
-The user can specify a deeprank2 exporter or a custom one in `output_exporters` parameter of the Trainer class, together with the path where to save the results. Exporters are used for storing predictions information collected later on during training and testing. Example:
+The user can specify a DeepRank2 exporter or a custom one in the `output_exporters` parameter of the `Trainer` class, together with the path where the results should be saved. Exporters store the prediction information collected during training and testing. Example:
 
 ```python
 from deeprank2.trainer import Trainer