suggestions to README
DaniBodor committed Jan 24, 2024
1 parent eb6acf8 commit 486a8a1
Showing 2 changed files with 100 additions and 94 deletions.
README.md: 103 changes (54 additions and 49 deletions)
`@@ -54,52 +54,54 @@ DeepRank2 extensive documentation can be found [here](https://deeprank2.rtfd.io/`
- [Computational performances](#computational-performances)
- [Package development](#package-development)

## Installation

There are two ways to install DeepRank2:

1. In a [dockerized container](#containerized-installation). This allows you to use DeepRank2, including all the notebooks, within the container (a protected virtual space), without worrying about your operating system or installation of dependencies.
   - We recommend this installation for inexperienced users and for learning to use or testing our software, e.g. using the provided [tutorials](tutorials/TUTORIAL.md). However, computational resources may be limited in this installation, so we do not recommend using it for large datasets or on high-performance computing facilities.
2. [Local installation](#localremote-installation) on your system. This allows you to use the full potential of DeepRank2, but requires a few additional steps during installation.
   - We recommend this installation for more experienced users, for larger projects, and for (potential) [contributors](#contributing) to the codebase.

### Containerized Installation

To try out the package without worrying about your OS and without having to install all the required dependencies, we created a `Dockerfile` that takes care of everything in a suitable container.

For this, you first need to install [Docker](https://docs.docker.com/engine/install/) on your system. Then run the following commands. Some steps may require sudo permission, in which case you can precede the commands below with `sudo`:

```bash
# Clone the DeepRank2 repository and enter its root directory
git clone https://github.com/DeepRank/deeprank2
cd deeprank2

# Build and run the Docker image
docker build -t deeprank2 .
docker run -p 8888:8888 deeprank2
```

The `-p 8888:8888` option maps port 8888 inside the container to port 8888 on your host machine. Then open a browser and go to `http://localhost:8888` to access the application running inside the Docker container. From there you can use DeepRank2, e.g. to run the tutorial notebooks.
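If the page does not load, you can check from the host whether anything is listening on the mapped port. Below is a minimal standard-library Python sketch; `port_is_open` is a hypothetical helper of ours (not part of DeepRank2 or Docker), and it only tests that a TCP connection succeeds, not that the service behind the port is the notebook server.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

While the container started above is running, `port_is_open()` should return `True`; `False` suggests the container is not running or the port mapping differs.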

More details about the tutorials' contents can be found [here](https://github.com/DeepRank/deeprank2/blob/main/tutorials/TUTORIAL.md). Note that the Docker container only downloads the raw PDB files, which are needed as a starting point for the tutorials. You can obtain the processed HDF5 files by running the `data_generation_xxx.ipynb` notebooks. Because Docker containers are limited in memory resources, we limit the number of data points processed in the tutorials. Please [install the package locally](#localremote-installation) to fully leverage its capabilities.

If, after running the tutorials, you want to remove the (quite large) Docker image from your machine, you must first [stop the container](https://docs.docker.com/engine/reference/commandline/stop/) and then [remove the image](https://docs.docker.com/engine/reference/commandline/image_rm/). More general information about Docker can be found in the [official documentation](https://docs.docker.com/get-started/).

### Local/remote installation

Local installation is formally supported only on the latest stable release of Ubuntu, for which widespread automated testing through continuous integration workflows has been set up. However, the package is likely to run smoothly on other operating systems as well.

Before installing DeepRank2, please ensure you have [GCC](https://gcc.gnu.org/install/) installed: if running `gcc --version` gives an error, run `sudo apt-get install gcc`.
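The same `gcc --version` check can be scripted, e.g. as part of your own setup script. A minimal standard-library Python sketch; `gcc_version` is a hypothetical helper name, not part of DeepRank2:

```python
import shutil
import subprocess
from typing import Optional


def gcc_version() -> Optional[str]:
    """Return the first line of `gcc --version`, or None if gcc is missing or fails."""
    gcc = shutil.which("gcc")  # full path to gcc if it is on PATH, else None
    if gcc is None:
        return None
    result = subprocess.run([gcc, "--version"], capture_output=True, text=True, check=False)
    if result.returncode != 0 or not result.stdout:
        return None
    return result.stdout.splitlines()[0]
```

A `None` result means GCC should be installed as described above before continuing.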

#### Using the provided YML file

You can use the provided YML file for creating a [conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) containing the latest stable release of DeepRank2 and all its dependencies.
This will install the CPU-only version of DeepRank2 on Python 3.10.
Note that this will not work on macOS; use the [manual installation](#manual-installation) instead.

```bash
# Clone the DeepRank2 repository and enter its root directory
git clone https://github.com/DeepRank/deeprank2
cd deeprank2

# Ensure you are in your base environment
conda activate
# Create the environment
conda env create -f env/environment.yml
conda activate deeprank2
```
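To confirm that you are working inside the environment created above, a small Python sketch can compare the active conda environment and the interpreter version with the values stated in this section (environment name `deeprank2`, Python 3.10). It relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation; `describe_env` and `matches_yml_env` are hypothetical helper names.

```python
import os
import sys


def describe_env() -> dict:
    """Collect the active conda environment name and the running Python version."""
    return {
        # `conda activate` sets CONDA_DEFAULT_ENV to the active environment's name
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "python": tuple(sys.version_info[:2]),
    }


def matches_yml_env(info: dict, env_name: str = "deeprank2", python: tuple = (3, 10)) -> bool:
    """Check that the environment looks like the one created from env/environment.yml."""
    return info["conda_env"] == env_name and tuple(info["python"]) == tuple(python)
```

Running `matches_yml_env(describe_env())` inside the activated environment should return `True`.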

See the instructions below to [test](#testing-your-deeprank2-installation) that the installation was successful.

#### Manual installation

If you want to use GPUs, choose a specific Python version, are a macOS user, or if the YML installation was not successful, you can install the package manually. We advise doing this inside a [conda virtual environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).
If you have any issues during installation of dependencies, please refer to the official documentation for each package (linked below), as our instructions may be out of date (last tested on 19 Jan 2024):

- [DSSP 4](https://anaconda.org/sbl/dssp): `conda install -c sbl dssp`
- [MSMS](https://anaconda.org/bioconda/msms): `conda install -c bioconda msms`
  - For macOS users with an M1 chip, see [these instructions](https://ssbio.readthedocs.io/en/latest/instructions/msms.html).
- [PyTorch](https://pytorch.org/get-started/locally/): `conda install pytorch torchvision torchaudio cpuonly -c pytorch`
  - PyTorch regularly publishes updates, and not all of the newest versions work stably with DeepRank2. Currently, the package is tested using [PyTorch 2.1.1](https://pytorch.org/get-started/previous-versions/#v211).
  - We support torch's CPU library as well as CUDA. The command above installs the CPU-only build; the linked PyTorch page provides the commands for CUDA builds.
- [PyG](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) and its optional dependencies: `torch_scatter`, `torch_sparse`, `torch_cluster`, `torch_spline_conv`.
  - The exact command to install PyG depends on the version of PyTorch you are using. Please refer to PyG's installation instructions (we recommend the pip installation, as it also shows the command for the optional dependencies).
- For macOS users with an M1 chip: install [the conda version of PyTables](https://www.pytables.org/usersguide/installation.html).
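To illustrate why the PyG command depends on your PyTorch version: at the time of writing, PyG's pip instructions install the optional dependencies from a wheel index whose URL encodes the PyTorch minor version and the CPU/CUDA variant. The sketch below assumes that URL layout (`https://data.pyg.org/whl/torch-<major>.<minor>.0+<variant>.html`); the helper names are hypothetical, and PyG's installation page remains the authoritative source.

```python
from typing import List, Optional

PYG_WHEEL_ROOT = "https://data.pyg.org/whl"  # assumed URL layout, check PyG's docs
PYG_EXTRAS = ["torch_scatter", "torch_sparse", "torch_cluster", "torch_spline_conv"]


def pyg_wheel_index(torch_version: str, cuda: Optional[str] = None) -> str:
    """Build the PyG wheel index URL for a PyTorch version and optional CUDA version."""
    base = torch_version.split("+")[0]  # drop a local tag such as "+cu121"
    major, minor = base.split(".")[:2]
    variant = "cpu" if cuda is None else "cu" + cuda.replace(".", "")
    # wheels are assumed to be published per minor release, e.g. torch 2.1.1 -> "torch-2.1.0"
    return f"{PYG_WHEEL_ROOT}/torch-{major}.{minor}.0+{variant}.html"


def pyg_pip_command(torch_version: str, cuda: Optional[str] = None) -> List[str]:
    """Return a pip command (as an argument list) for PyG's optional dependencies."""
    return ["pip", "install", *PYG_EXTRAS, "-f", pyg_wheel_index(torch_version, cuda)]
```

For the tested PyTorch 2.1.1 on CPU, `" ".join(pyg_pip_command("2.1.1"))` gives `pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cpu.html`.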

Finally, install DeepRank2 itself: `pip install deeprank2`.
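After installation, you can check from Python which versions ended up in your environment, for instance to compare PyTorch against the tested 2.1.1 mentioned above. A minimal sketch using the standard library's `importlib.metadata`; the helper names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches_tested(installed: Optional[str], tested: str) -> bool:
    """Compare versions while ignoring local tags such as '+cpu' or '+cu121'."""
    return installed is not None and installed.split("+")[0] == tested
```

For example, `installed_version("deeprank2")` reports the installed release, and `matches_tested(installed_version("torch"), "2.1.1")` tells you whether PyTorch matches the tested version.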

Alternatively, get the latest updates by cloning the repo and installing the editable version of the package with:

```bash
git clone https://github.com/DeepRank/deeprank2
cd deeprank2
pip install -e .'[test]'
```

The `test` extra is optional and installs test-related dependencies that are useful during development.

#### Testing your DeepRank2 installation

You can check that all components were installed correctly using pytest. We especially recommend doing this if you installed DeepRank2 and its dependencies manually (the latter option above).

The quick test should be sufficient to ensure that the software works, while the full test (a few minutes) covers a much broader range of settings to ensure everything is correct.

First run `pip install pytest` if you did not install it above. Then, from the root of the cloned repository, run `pytest tests/test_integration.py` for the quick test or just `pytest` for the full test (expect a few minutes to run).
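If you prefer to launch the tests from a script, a minimal sketch is to call pytest through the active interpreter with `python -m pytest`, so the tests run in the same environment as DeepRank2. The helper names are hypothetical; the test path is the one given above.

```python
import subprocess
import sys
from typing import List


def pytest_command(quick: bool = True) -> List[str]:
    """Build the pytest command: the quick integration test, or the full suite."""
    cmd = [sys.executable, "-m", "pytest"]  # use the interpreter of the active environment
    if quick:
        cmd.append("tests/test_integration.py")
    return cmd


def run_tests(repo_root: str, quick: bool = True) -> int:
    """Run the chosen tests from the repository root and return pytest's exit code."""
    return subprocess.run(pytest_command(quick), cwd=repo_root, check=False).returncode
```

Calling `run_tests(".")` from the repository root runs the quick test; an exit code of 0 means all selected tests passed.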

## Contributing

If you would like to contribute to the package in any way, please see [our guidelines](CONTRIBUTING.rst).

## Using DeepRank2

The following section serves as a first guide to start using the package, using protein-protein interface (PPI) queries as an example. For an enhanced learning experience, we provide in-depth [tutorial notebooks](https://github.com/DeepRank/deeprank2/tree/main/tutorials) for generating PPI data, generating SVR data, and for the training pipeline.
For more details, see the [extended documentation](https://deeprank2.rtfd.io/).

### Data generation

For each protein-protein complex (or protein structure containing a missense variant), a `Query` can be created and added to the `QueryCollection` object, to be processed later on. Two subtypes of `Query` exist: `ProteinProteinInterfaceQuery` and `SingleResidueVariantQuery`.

`@@ -234,11 +239,11 @@ hdf5_paths = queries.process(`

```python
hdf5_paths = queries.process(
    # ... (other arguments not shown in this diff)
    grid_map_method = MapMethod.GAUSSIAN)
```

### Datasets

Data can be split into sets by implementing custom splits according to the specific application. Assuming that the training, validation, and testing IDs (keys of the HDF5 file(s)) have been chosen, the `DeeprankDataset` objects can then be defined.
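How the IDs are chosen is left to the user. One simple baseline, sketched below under our own assumptions, is a reproducible random split of the entry IDs. `split_ids` and `hdf5_keys` are hypothetical helpers (not part of DeepRank2), the 80/10/10 fractions are arbitrary defaults, and `hdf5_keys` requires `h5py`.

```python
import random
from typing import Iterable, List, Sequence, Tuple


def split_ids(
    ids: Iterable[str],
    fractions: Sequence[float] = (0.8, 0.1, 0.1),
    seed: int = 42,
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle IDs reproducibly and split them into train/validation/test lists."""
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must be three values summing to 1")
    shuffled = sorted(ids)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_val = int(fractions[1] * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def hdf5_keys(paths: Iterable[str]) -> List[str]:
    """Collect the top-level keys (entry IDs) of one or more HDF5 files (requires h5py)."""
    import h5py  # imported lazily so split_ids works without h5py installed

    keys: List[str] = []
    for path in paths:
        with h5py.File(path, "r") as f:
            keys.extend(f.keys())
    return keys
```

For example, `train_ids, val_ids, test_ids = split_ids(hdf5_keys(["path/to/file.hdf5"]))`, where the path is a placeholder for your own HDF5 output.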

#### GraphDataset

For training GNNs, the user can create a `GraphDataset` instance:

`@@ -272,7 +277,7 @@ dataset_test = GraphDataset(`

```python
dataset_test = GraphDataset(
    # ... (arguments not shown in this diff)
)
```

#### GridDataset

For training CNNs, the user can create a `GridDataset` instance:

`@@ -304,7 +309,7 @@ dataset_test = GridDataset(`

```python
dataset_test = GridDataset(
    # ... (arguments not shown in this diff)
)
```

### Training

Let's define a `Trainer` instance, using for example the already existing `GINet`. Because `GINet` is a GNN, it requires a dataset instance of type `GraphDataset`.

`@@ -358,7 +363,7 @@ trainer.test()`

```python
# ... (preceding lines not shown in this diff)
trainer.test()
```

#### Run a pre-trained model on new data

If you want to analyze new PDB files using a pre-trained model, the first step is to process and save them into HDF5 files [as we have done above](#data-generation).
