
docs: add Dockerfile and .yml for building conda env #528

Merged Dec 21, 2023 (13 commits)
34 changes: 34 additions & 0 deletions Dockerfile
@@ -0,0 +1,34 @@
# Pull base image
FROM --platform=linux/x86_64 continuumio/miniconda3:23.10.0-1

# Add files
ADD ./tutorials /home/deeprank2/tutorials
ADD ./env/environment.yml /home
ADD ./env/requirements.txt /home

# Install
RUN \
apt update -y && \
apt install unzip -y && \
## GCC
apt install -y gcc && \
## DSSP
wget https://github.com/PDB-REDO/dssp/releases/download/v4.4.0/mkdssp-4.4.0-linux-x64 && \
mv mkdssp-4.4.0-linux-x64 /usr/local/bin/mkdssp && \
chmod a+x /usr/local/bin/mkdssp && \
## Conda and pip deps
conda env create -f /home/environment.yml && \
## Get the data for running the tutorials
wget https://zenodo.org/records/8349335/files/data_raw.zip && \
unzip data_raw.zip -d data_raw && \
mv data_raw /home/deeprank2/tutorials

# Activate the environment
RUN echo "source activate deeprank2" > ~/.bashrc
ENV PATH /opt/conda/envs/deeprank2/bin:$PATH

# Define working directory
WORKDIR /home/deeprank2

# Define default command
CMD ["bash"]
Collaborator (review comment)

Suggested change:
- CMD ["bash"]
+ CMD ["/opt/conda/envs/deeprank2/bin/jupyter", "notebook"]

Collaborator (author) @gcroci2, Nov 28, 2023

This wasn't working (no jupyter server was started)
I used the following instead:
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--NotebookApp.token=''","--NotebookApp.password=''", "--allow-root"]
Without disabling the password, the browser was asking for the token (and the token from the terminal wasn't working).
Anyway, I'd avoid using this if it's unsafe and there are no safe ways to disable the token request. The "bash" version seems easier to me from a user perspective in that case. @dsmits
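
For context, a minimal sketch of how the image could be used if the Jupyter `CMD` above were kept, assuming the server listens on Jupyter's default port 8888 (the port mapping here is illustrative, not part of this PR):

```bash
# Build the image and start a container, publishing the assumed default notebook port
docker build -t deeprank2 .
docker run -it -p 8888:8888 deeprank2
# then open http://localhost:8888 on the host (pasting the token printed by Jupyter, if one is required)
```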

Collaborator (author)

I will merge the PR soon, but will leave this conversation unresolved so you can comment when you come back :) @dsmits

93 changes: 66 additions & 27 deletions README.md
@@ -37,42 +37,81 @@ DeepRank2 extensive documentation can be found [here](https://deeprank2.rtfd.io/
- [Overview](#overview)
- [Table of contents](#table-of-contents)
- [Installation](#installation)
- [Dependencies](#dependencies)
- [Deeprank2 Package](#deeprank2-package)
- [Test installation](#test-installation)
- [Dockerfile](#dockerfile)
- [Non-pythonic dependencies](#non-pythonic-dependencies)
- [Pythonic dependencies](#pythonic-dependencies)
- [Test installation](#test-installation)
- [Contributing](#contributing)
- [Data generation](#data-generation)
- [Datasets](#datasets)
- [GraphDataset](#graphdataset)
- [GridDataset](#griddataset)
- [Training](#training)
- [Data generation](#data-generation)
- [Datasets](#datasets)
- [GraphDataset](#graphdataset)
- [GridDataset](#griddataset)
- [Training](#training)
- [Computational performances](#computational-performances)
- [Package development](#package-development)

## Installation

The package officially supports ubuntu-latest OS only, whose functioning is widely tested through the continuous integration workflows.
Note that the package officially supports the ubuntu-latest OS only, which is extensively tested through the continuous integration workflows.

### Dependencies
### Dockerfile

Before installing deeprank2 you need to install some dependencies. We advise to use a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) with Python >= 3.10 installed. The following dependency installation instructions are updated as of 14/09/2023, but in case of issues during installation always refer to the official documentation which is linked below:
To try out the package without worrying about your OS and without having to install all the required dependencies, we created a `Dockerfile` that takes care of everything in a suitable container. After cloning the repository and installing [Docker](https://docs.docker.com/engine/install/), run the following commands from the root of the repository.

Build the Docker image:
```bash
docker build -t deeprank2 .
```

Start an interactive container from the image:

```bash
docker run -it --expose 3000 -p 3000:3000 deeprank2
```

Run the tutorials' notebooks from within the running container:
```bash
cd tutorials
jupyter notebook --ip 0.0.0.0 --no-browser --allow-root --port 3000
```
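
With the `-p 3000:3000` mapping used in the `docker run` command above, the notebook server should then be reachable from the host. A sketch (the exact URL, including any access token, is printed by Jupyter in the container's terminal):

```bash
# On the host machine, open the published port in a browser, e.g. on Linux:
xdg-open http://localhost:3000
```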

Now you can run the tutorials' notebooks. More details about their content can be found [here](https://github.com/DeepRank/deeprank2/blob/main/tutorials/TUTORIAL.md). Note that in the Docker container only the raw PDB files needed as a starting point for the tutorials are downloaded. You can obtain the processed HDF5 files by running the `data_generation_xxx.ipynb` notebooks. Because Docker containers are limited in memory resources, we limit the number of data points processed in the tutorials. Please install the package locally to fully leverage its capabilities.

### Non-pythonic dependencies

Instructions are updated as of 14/09/2023.

Before installing deeprank2, you need to install some dependencies (a combined check of these tools is sketched after the list):

* [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`.
* [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
* [PyTorch](https://pytorch.org/get-started/locally/)
* We support torch's CPU library as well as CUDA.
* [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
* [DSSP 4](https://swift.cmbi.umcn.nl/gv/dssp/)
* Check if `dssp` is installed: `dssp --version`. If this gives an error or shows a version lower than 4:
* on ubuntu 22.04 or newer: `sudo apt-get install dssp`. If the package cannot be located, first run `sudo apt-get update`.
* on older versions of ubuntu or on mac or lacking sudo privileges: install from [here](https://github.com/pdb-redo/dssp), following the instructions listed. Alternatively, follow [this](https://github.com/PDB-REDO/libcifpp/issues/49) thread.
* [GCC](https://gcc.gnu.org/install/)
* Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.
* Only for MacOS users with an M1 chip: install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).
* Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.
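
A combined check of these tools, as referenced above, might look like the following sketch; note that `mkdssp` is the binary name installed by the `Dockerfile` in this PR, while the apt package typically provides it as `dssp`:

```bash
# Verify that the non-pythonic dependencies are available
gcc --version
dssp --version || mkdssp --version   # DSSP 4 may be installed under either name
conda list msms                      # MSMS installed through the bioconda channel
```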

### Pythonic dependencies

### Deeprank2 Package
Instructions are updated as of 14/09/2023.

Then, you can use the YML file we provide for creating a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) containing the latest stable release of the package and all the other necessary conda and pip dependencies (CPU only, Python 3.10):

```bash
# Create the environment
conda env create -f env/environment.yml
# Activate the environment
conda activate deeprank2
```
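
Optionally, a quick sanity check that the environment resolved correctly (a sketch; the import assumes the `deeprank2==2.1.1` pin from `env/requirements.txt` was pulled in by the YML's pip section):

```bash
python -c "import deeprank2"   # should exit without errors
pip show deeprank2             # prints the installed version
```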

Alternatively, if you are a MacOS user, if the .YML file installation is not successful, or if you want to use CUDA or Python 3.11, you can install each dependency separately, and then install the latest stable release of the package using the PyPI package manager. Also in this case, we advise using a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). In case of issues during installation, always refer to the official documentation linked below (a consolidated sketch of these steps follows the list):

* [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`.
* [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
* [PyTorch](https://pytorch.org/get-started/locally/)
* [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
* Only for MacOS users with an M1 chip: install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).
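
As an illustration of the consolidated sketch mentioned above, a CPU-only setup on Linux might look like the following, with versions taken from this PR's `env/environment.yml` and `env/requirements.txt`; the exact PyTorch/PyG commands depend on your platform, so prefer the official instructions linked in the list:

```bash
conda create -n deeprank2 python=3.10 -y
conda activate deeprank2
conda install -c bioconda msms -y
# PyTorch (CPU build) and PyTorch Geometric with its optional extensions
pip install torch==2.1.1 --index-url https://download.pytorch.org/whl/cpu
pip install torch_geometric==2.4.0
pip install torch_scatter==2.1.2 torch_sparse==0.6.18 torch_cluster==1.6.3 torch_spline_conv==1.2.2 \
    --find-links https://data.pyg.org/whl/torch-2.1.0+cpu.html
```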

Once the dependencies are installed, you can install the latest stable release of deeprank2 using the PyPi package manager:
Finally do:

```bash
pip install deeprank2
@@ -88,9 +127,9 @@ pip install -e .'[test]'

The `test` extra is optional, and can be used to install test-related dependencies useful during the development.

### Test installation
#### Test installation

If you have installed the package from a cloned repository (second option above), you can check that all components were installed correctly, using pytest.
If you have installed the package from a cloned repository (the latter option above), you can check that all components were installed correctly, using pytest.
The quick test should be sufficient to ensure that the software works, while the full test (a few minutes) will cover a much broader range of settings to ensure everything is correct.

Run `pytest tests/test_integration.py` for the quick test or just `pytest` for the full test (expect a few minutes to run).
Expand All @@ -103,7 +142,7 @@ The following section serves as a first guide to start using the package, using
as an example. For an enhanced learning experience, we provide in-depth [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials) for generating PPI data, generating SRV data, and for the training pipeline.
For more details, see the [extended documentation](https://deeprank2.rtfd.io/).

### Data generation
## Data generation

For each protein-protein complex (or protein structure containing an SRV), a query can be created and added to the `QueryCollection` object, to be processed later on. Different types of queries exist:
- In a `ProteinProteinInterfaceResidueQuery` and `SingleResidueVariantResidueQuery`, each node represents one amino acid residue.
@@ -186,11 +225,11 @@ hdf5_paths = queries.process(
grid_map_method = MapMethod.GAUSSIAN)
```

### Datasets
## Datasets

Data can be split into sets implementing custom splits according to the specific application. Assuming that the training, validation and testing IDs have been chosen (keys of the HDF5 file/s), the `DeeprankDataset` objects can be defined.

#### GraphDataset
### GraphDataset

For training GNNs the user can create a `GraphDataset` instance:

@@ -226,7 +265,7 @@ dataset_test = GraphDataset(
)
```

#### GridDataset
### GridDataset

For training CNNs the user can create a `GridDataset` instance:

@@ -260,7 +299,7 @@ dataset_test = GridDataset(
)
```

### Training
## Training

Let's define a `Trainer` instance, using for example the already existing `GINet`. Because `GINet` is a GNN, it requires a dataset instance of type `GraphDataset`.

75 changes: 58 additions & 17 deletions docs/installation.md
@@ -1,28 +1,65 @@
# Installation

The package officially supports ubuntu-latest OS only, whose functioning is widely tested through the continuous integration workflows.
Note that the package officially supports the ubuntu-latest OS only, which is extensively tested through the continuous integration workflows.

## Dependencies
## Dockerfile

Before installing deeprank2 you need to install some dependencies. We advise to use a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) with Python >= 3.10 installed. The following dependency installation instructions are updated as of 14/09/2023, but in case of issues during installation always refer to the official documentation which is linked below:
To try out the package without worrying about your OS and without having to install all the required dependencies, we created a `Dockerfile` that takes care of everything in a suitable container. After cloning the repository and installing [Docker](https://docs.docker.com/engine/install/), run the following commands from the root of the repository.

Build the Docker image:
```bash
docker build -t deeprank2 .
```

Start an interactive container from the image:

```bash
docker run -it --expose 3000 -p 3000:3000 deeprank2
```

Run the tutorials' notebooks from within the running container:
```bash
cd tutorials
jupyter notebook --ip 0.0.0.0 --no-browser --allow-root --port 3000
```

Now you can run the tutorials' notebooks. More details about their content can be found [here](https://github.com/DeepRank/deeprank2/blob/main/tutorials/TUTORIAL.md). Note that in the Docker container only the raw PDB files needed as a starting point for the tutorials are downloaded. You can obtain the processed HDF5 files by running the `data_generation_xxx.ipynb` notebooks. Because Docker containers are limited in memory resources, we limit the number of data points processed in the tutorials. Please install the package locally to fully leverage its capabilities.

## Non-pythonic dependencies

Instructions are updated as of 14/09/2023.

Before installing deeprank2 you need to install some dependencies:

* [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`.
* [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
* [PyTorch](https://pytorch.org/get-started/locally/)
* We support torch's CPU library as well as CUDA.
* [PyG](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
* [DSSP 4](https://swift.cmbi.umcn.nl/gv/dssp/)
* Check if `dssp` is installed: `dssp --version`. If this gives an error or shows a version lower than 4:
* on ubuntu 22.04 or newer: `sudo apt-get install dssp`. If the package cannot be located, first run `sudo apt-get update`.
* on older versions of ubuntu or on mac or lacking sudo privileges: install from [here](https://github.com/pdb-redo/dssp), following the instructions listed.
* Check if `dssp` is installed: `dssp --version`. If this gives an error or shows a version lower than 4:
* on ubuntu 22.04 or newer: `sudo apt-get install dssp`. If the package cannot be located, first run `sudo apt-get update`.
* on older versions of ubuntu or on mac or lacking sudo privileges: install from [here](https://github.com/pdb-redo/dssp), following the instructions listed. Alternatively, follow [this](https://github.com/PDB-REDO/libcifpp/issues/49) thread.
* [GCC](https://gcc.gnu.org/install/)
* Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.
* Only for MacOS users with an M1 chip: install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).
* Check if gcc is installed: `gcc --version`. If this gives an error, run `sudo apt-get install gcc`.

## Pythonic dependencies

## Deeprank2 Package
Instructions are updated as of 14/09/2023.

Once the dependencies are installed, you can install the latest stable release of deeprank2 using the PyPi package manager:
Then, you can use the YML file we provide for creating a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) containing the latest stable release of the package and all the other necessary conda and pip dependencies (CPU only, Python 3.10):

```bash
# Create the environment
conda env create -f env/environment.yml
# Activate the environment
conda activate deeprank2
```

Alternatively, if you are a MacOS user, if the .YML file installation is not successful, or if you want to use CUDA or Python 3.11, you can install each dependency separately, and then install the latest stable release of the package using the PyPI package manager. Also in this case, we advise using a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). In case of issues during installation, always refer to the official documentation linked below:

* [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`.
* [Here](https://ssbio.readthedocs.io/en/latest/instructions/msms.html) for MacOS with M1 chip users.
* [PyTorch](https://pytorch.org/get-started/locally/)
* [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
* Only for MacOS users with an M1 chip: install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).

Finally do:

```bash
pip install deeprank2
@@ -38,13 +75,17 @@ pip install -e .'[test]'

The `test` extra is optional, and can be used to install test-related dependencies useful during the development.

## Test installation
### Test installation

If you have installed the package from a cloned repository (second option above), you can check that all components were installed correctly, using pytest.
If you have installed the package from a cloned repository (the latter option above), you can check that all components were installed correctly, using pytest.
The quick test should be sufficient to ensure that the software works, while the full test (a few minutes) will cover a much broader range of settings to ensure everything is correct.

Run `pytest tests/test_integration.py` for the quick test or just `pytest` for the full test (expect a few minutes to run).

## Contributing

If you would like to contribute to the package in any way, please see [our guidelines](CONTRIBUTING.rst).

The following section serves as a first guide to start using the package, using protein-protein Interface (PPI) queries
as an example. For an enhanced learning experience, we provide in-depth [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials) for generating PPI data, generating SRV data, and for the training pipeline.
For more details, see the [extended documentation](https://deeprank2.rtfd.io/).
19 changes: 19 additions & 0 deletions env/environment.yml
@@ -0,0 +1,19 @@
name: deeprank2
channels:
- pytorch
- pyg
- bioconda
- defaults
dependencies:
- pip==23.3.*
- python==3.10.*
- msms==2.6.1
- pytorch==2.1.1
- pytorch-mutex==1.0.*
- torchvision==0.16.1
- torchaudio==2.1.1
- cpuonly==2.0.*
- pyg==2.4.0
- notebook==7.0.6
- pip:
  - --requirement requirements.txt
6 changes: 6 additions & 0 deletions env/requirements.txt
@@ -0,0 +1,6 @@
--find-links https://data.pyg.org/whl/torch-2.1.0+cpu.html
torch_scatter==2.1.2
torch_sparse==0.6.18
torch_cluster==1.6.3
torch_spline_conv==1.2.2
deeprank2==2.1.1
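
For reference, outside of the conda YML (which pulls this file in through its `pip:` section), the same file can be consumed directly; the `--find-links` line points pip at prebuilt CPU wheels for the PyG extensions, so this sketch assumes a torch 2.1.x CPU setup:

```bash
pip install -r env/requirements.txt
```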
13 changes: 9 additions & 4 deletions tutorials/data_generation_ppi.ipynb
@@ -102,7 +102,9 @@
"data_path = os.path.join(\"data_raw\", \"ppi\")\n",
"processed_data_path = os.path.join(\"data_processed\", \"ppi\")\n",
"os.makedirs(os.path.join(processed_data_path, \"residue\"))\n",
"os.makedirs(os.path.join(processed_data_path, \"atomic\"))"
"os.makedirs(os.path.join(processed_data_path, \"atomic\"))\n",
"# Flag limit_data as True if you are running on a machine with limited memory (e.g., Docker container)\n",
"limit_data = True"
]
},
{
@@ -139,7 +141,10 @@
"\tbas = csv_data_indexed.measurement_value.values.tolist()\n",
"\treturn pdb_files, bas\n",
"\n",
"pdb_files, bas = get_pdb_files_and_target_data(data_path)"
"pdb_files, bas = get_pdb_files_and_target_data(data_path)\n",
"\n",
"if limit_data:\n",
"\tpdb_files = pdb_files[:10]"
]
},
{
@@ -204,7 +209,7 @@
"\tif count % 20 == 0:\n",
"\t\tprint(f'{count} queries added to the collection.')\n",
"\n",
"print(f'Queries ready to be processed.\\n')"
"print('Queries ready to be processed.\\n')"
]
},
{
@@ -437,7 +442,7 @@
"\tif count % 20 == 0:\n",
"\t\tprint(f'{count} queries added to the collection.')\n",
"\n",
"print(f'Queries ready to be processed.\\n')"
"print('Queries ready to be processed.\\n')"
]
},
{
9 changes: 7 additions & 2 deletions tutorials/data_generation_srv.ipynb
@@ -116,7 +116,9 @@
"data_path = os.path.join(\"data_raw\", \"srv\")\n",
"processed_data_path = os.path.join(\"data_processed\", \"srv\")\n",
"os.makedirs(os.path.join(processed_data_path, \"residue\"))\n",
"os.makedirs(os.path.join(processed_data_path, \"atomic\"))"
"os.makedirs(os.path.join(processed_data_path, \"atomic\"))\n",
"# Flag limit_data as True if you are running on a machine with limited memory (e.g., Docker container)\n",
"limit_data = True"
]
},
{
@@ -158,7 +160,10 @@
"\tpdb_files = [data_path + \"/pdb/\" + pdb_name for pdb_name in pdb_names]\n",
"\treturn pdb_files, res_numbers, res_wildtypes, res_variants, targets\n",
"\n",
"pdb_files, res_numbers, res_wildtypes, res_variants, targets = get_pdb_files_and_target_data(data_path)"
"pdb_files, res_numbers, res_wildtypes, res_variants, targets = get_pdb_files_and_target_data(data_path)\n",
"\n",
"if limit_data:\n",
"\tpdb_files = pdb_files[:10]"
]
},
{
Expand Down
Loading