From a49ff6e417321b0b0b08be7a24ca27ac9863590e Mon Sep 17 00:00:00 2001
From: gcroci2
Date: Tue, 5 Mar 2024 11:17:31 +0100
Subject: [PATCH] add explanation about the queries to the readme

---
 README.md | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 7fddb53f..310e93fe 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,12 @@
 
 DeepRank2 is an open-source deep learning (DL) framework for data mining of protein-protein interfaces (PPIs) or single-residue variants (SRVs). This package is an improved and unified version of three previously developed packages: [DeepRank](https://github.com/DeepRank/deeprank), [DeepRank-GNN](https://github.com/DeepRank/Deeprank-GNN), and [DeepRank-Mut](https://github.com/DeepRank/DeepRank-Mut).
 
-DeepRank2 allows for transformation of (pdb formatted) molecular data into 3D representations (either grids or graphs) containing structural and physico-chemical information, which can be used for training neural networks. DeepRank2 also offers a pre-implemented training pipeline, using either [CNNs](https://en.wikipedia.org/wiki/Convolutional_neural_network) (for grids) or [GNNs](https://en.wikipedia.org/wiki/Graph_neural_network) (for graphs), as well as output exporters for evaluating performances.
+As input, DeepRank2 takes [PDB-formatted](https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/tutorials/pdbintro.html) atomic structures and maps them to graphs, where nodes can represent either residues or atoms, as chosen by the user, and edges represent the interactions between them. The user can configure two types of 3D structures as input for the featurization phase:
+
+- PPIs, for mining interaction patterns within protein-protein complexes, implemented by the `ProteinProteinInterfaceQuery` class;
+- SRVs, for mining mutation phenotypes within protein structures, implemented by the `SingleResidueVariantQuery` class.
+
+The physico-chemical and geometrical features are then computed and assigned to each node and edge. The user can choose which features to generate from several pre-existing options defined in the package, or define custom feature modules, as explained in the documentation. The graphs can then be mapped to 3D-grids as well. The generated data can be used for training neural networks. DeepRank2 also offers a pre-implemented training pipeline, using either [CNNs](https://en.wikipedia.org/wiki/Convolutional_neural_network) (for 3D-grids) or [GNNs](https://en.wikipedia.org/wiki/Graph_neural_network) (for graphs), as well as output exporters for evaluating performance.
 
 Main features:
 
@@ -28,7 +33,7 @@ Main features:
   - binary class, CAPRI categories, DockQ, RMSD, and FNAT
   - Detailed docking scores documentation is available [here](https://deeprank2.readthedocs.io/en/latest/docking.html)
 - Flexible definition of both new features and targets
-- Features generation for both graphs and grids
+- Features generation for both graphs and 3D-grids
 - Efficient data storage in HDF5 format
 - Support for both classification and regression (based on [PyTorch](https://pytorch.org/) and [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/))
 
@@ -172,7 +177,7 @@ A `Query` takes as inputs:
 - `chain_ids`, the chain ID or IDs (generally single capital letter(s)).
   - `SingleResidueVariantQuery` takes a single ID, which represents the chain containing the variant residue.
   - `ProteinProteinInterfaceQuery` takes a pair of ids, which represent the chains between which the interface exists.
-  - Note that in either case this does not limit the structure to residues from this/these chain/s. The structure contained in the `.pdb` can thus have any number of chains, and residues from these chains will be included in the graphs and grids produced by DeepRank2 (if they are within the `influence_radius`).
+  - Note that in either case this does not limit the structure to residues from this/these chain/s. The structure contained in the `.pdb` can thus have any number of chains, and residues from these chains will be included in the graphs and 3D-grids produced by DeepRank2 (if they are within the `influence_radius`).
 - Optionally, the correspondent position-specific scoring matrices (PSSMs), in the form of `.pssm` files.
 
 ```python
@@ -222,7 +227,7 @@ queries.add(ProteinProteinInterfaceQuery(
 
 The user is free to implement a custom query class. Each implementation requires the `build` method to be present.
 
-The queries can then be processed into graphs only or both graphs and 3D grids, depending on which kind of network will be used later for training.
+The queries can then be processed into graphs only or both graphs and 3D-grids, depending on which kind of network will be used later for training.
 
 ```python
 from deeprank2.features import components, conservation, contact, exposure, irc, surfacearea
@@ -235,7 +240,7 @@ hdf5_paths = queries.process(
     "/",
     feature_modules = feature_modules)
 
-# Save data into 3D-graphs and 3D-grids
+# Save data into graphs and 3D-grids
 hdf5_paths = queries.process(
     "/",
     feature_modules = feature_modules,