There are many ways to contribute to singlecellopenproblems, the most common being contributing datasets, methods, metrics, or new tasks. Improving the documentation is no less important than improving the library itself. If you find a typo in the documentation, or have made improvements, do not hesitate to submit a GitHub pull request.
But there are many other ways to help. In particular, answering queries on the issue tracker, investigating bugs, and reviewing other developers' pull requests are very valuable contributions that decrease the burden on the project maintainers.
Another way to contribute is to report issues you're facing, and to give a "thumbs up" on issues that others have reported that are relevant to you. It also helps us if you spread the word: reference the project from your blog and articles, link to it from your website, or simply star it on GitHub to say "I use it".
To submit new features to Open Problems for Single Cell Analysis, follow the steps below:
- Search through the GitHub Issues tracker to make sure there isn't already someone working on the feature you'd like to add. If someone is working on it, post in that issue that you'd like to help, or reach out to one of the contributors working on the issue directly.
- If there isn't an existing issue tracking this feature, create one! There are several templates to choose from depending on the type of feature you'd like to add.
- Fork https://github.com/singlecellopenproblems/SingleCellOpenProblems into your account. If you're new to `git`, you might find the Fork a repo documentation helpful.
- Create repository secrets (not environment secrets) at https://github.com/USERNAME/SingleCellOpenProblems/settings/secrets
  - `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are included in your AWS login details. If you do not have these, please contact us at singlecellopenproblems@protonmail.com.
  - `TOWER_ACCESS_KEY` (optional): log in with GitHub to https://tower.nf and create a token at https://tower.nf/tokens.
  - When you are done with this step, your page should look like this:
- Enable workflows at https://github.com/USERNAME/SingleCellOpenProblems/actions
- Set up your git repository to fetch branches from `base` at `singlecellopenproblems/SingleCellOpenProblems`:

  ```shell
  git clone git@github.com:<username>/SingleCellOpenProblems.git
  cd SingleCellOpenProblems
  git remote add base git@github.com:singlecellopenproblems/SingleCellOpenProblems.git
  git fetch --all
  git branch --set-upstream-to base/master
  git pull
  ```

  Note: If you haven't set up SSH keys with your GitHub account, you may see this error message when adding `base` to your repository:

  ```
  git@github.com: Permission denied (publickey).
  fatal: Could not read from remote repository.
  Please make sure you have the correct access rights
  ```

  To generate an SSH key and add it to your GitHub account, follow this tutorial from GitHub.
- Create a new branch for your task (no underscores or spaces allowed). It is best to coordinate with other people working on the same feature as you so that there aren't clashes in images uploaded to our ECR. Here we're creating a branch called `method-method-name-task-name`, but if you were creating a new metric you might use `metric-metric-name-task-name`. In practice you should use the name of your method or metric, like `method-meld-differential-abundance` or `metric-mse-label-projection`.

  Note: This pushes the branch to your fork, not to `base`. You will create a PR to merge your branch to `base` only after all tests are passing.

  Warning: Do not edit the `master` branch on your fork! This will not work as expected, and will never pass tests.

  ```shell
  # IMPORTANT: choose a new branch name, e.g.
  git checkout -b method-method-name-task-name # or metric-new-metric-name, etc
  git push -u origin method-method-name-task-name
  ```
- Sometimes, changes might be made to the SingleCellOpenProblems `base` repository that you want to incorporate into your fork. To sync your fork from `base`, use the following code, adapted from the Syncing a fork tutorial from GitHub:

  ```shell
  # Fetch the branches and their respective commits from the upstream repository
  git fetch base
  # Check out your fork's local default branch
  git checkout master
  # Merge the changes from the upstream default branch
  git merge base/master
  # Push the changes to your fork
  git push -u origin
  ```

  You can now create a Pull Request from the default branch on your fork, `master`, into your working branch, e.g. `method-method-name-task-name`.
- Wait for all tests to pass on your new branch before pushing changes, as this will allow GitHub Actions to cache the workflow setup, which speeds up testing.
Each task consists of datasets, methods, and metrics.
Datasets should take a single boolean `test` argument (default `False`) and return an AnnData object. If `test` is `True`, then the function should load the full dataset, but only return a small version of the same data (preferably <200 cells and <500 genes) for faster downstream analysis.

```
function dataset(bool test=False) -> AnnData adata
```
Methods should take an AnnData object and store their output in `adata.obs` according to the specification of the task.

```
function method(AnnData adata) -> AnnData adata
```
Metrics should take an AnnData object and return a `float`.

```
function metric(AnnData adata) -> float
```
Task-specific APIs are described in the README for each task.
Metrics and methods can also be written in R. AnnData Python objects are converted to and from `SingleCellExperiment` R objects using `anndata2ri`. R methods should be written in a `.R` file which assumes the existence of a `SingleCellExperiment` object called `sce`, and should return the same object. A simple method implemented in R could be written as follows:
```r
### tasks/<task_name>/methods/pca.R
# Dependencies
library(SingleCellExperiment)
library(stats)
# Method body
n_pca <- 10
counts <- t(assay(sce, "X"))
reducedDim(sce, "pca") <- prcomp(counts)$x[, 1:n_pca]
# Return
sce
```
```python
### tasks/<task_name>/methods/pca.py
from ....tools.conversion import r_function
from ....tools.decorators import method
from ....tools.utils import check_version

_pca = r_function("pca.R")


@method(
    method_name="PCA",
    paper_name="On lines and planes of closest fit to systems of points in space",
    paper_url="https://www.tandfonline.com/doi/abs/10.1080/14786440109462720",
    paper_year=1901,
    code_url="https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/prcomp",
    code_version=check_version("rpy2"),
    image="openproblems-r-base",
)
def pca(adata):
    return _pca(adata)
```
See the `anndata2ri` docs for API details. For a more detailed example of how to use this functionality, see our implementation of fastMNN batch correction (mnn.R, mnn.py).
If you are unable to write your method using our base dependencies, you may add to our existing Docker images, or create your own. The image you wish to use (if you are not using the base image) should be specified in the `image` keyword argument of the method/metric decorator. See the Docker images README for details.
Datasets are loaded under `openproblems/data`. Each data loading function should download the appropriate dataset from a stable location (e.g. from Figshare) and be decorated with `openproblems.data.utils.loader` in order to cache the result.
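The real decorator is `openproblems.data.utils.loader`, whose implementation may differ in detail; the following self-contained sketch only illustrates the caching behavior such a decorator provides, with a hypothetical `load_my_dataset` stand-in for a real download function.

```python
# Simplified, hypothetical illustration of a caching loader decorator.
# The real one is openproblems.data.utils.loader.
import functools

_cache = {}


def loader(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = (func.__name__, args, tuple(sorted(kwargs.items())))
        if key not in _cache:
            # The wrapped function (e.g. a download) runs only on the first call
            _cache[key] = func(*args, **kwargs)
        return _cache[key]
    return wrapper


@loader
def load_my_dataset(test=False):
    # A real loader would download from a stable source (e.g. Figshare) here
    return {"n_cells": 100 if test else 5000}
```

With this in place, repeated calls with the same arguments return the cached object instead of re-downloading.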
To add a dataset, method, or metric to a task, simply create a new `.py` file corresponding to your proposed new functionality and import the main function in the corresponding `__init__.py`. E.g., to add an "F2" metric to the label projection task, we would create `openproblems/tasks/label_projection/metrics/f2.py` and add the line

```python
from .f2 import f2
```

to `openproblems/tasks/label_projection/metrics/__init__.py`.
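As a sketch of what such a file might contain, here is a from-scratch binary F-beta (beta=2) computation. This is illustrative only: the actual `f2` metric in openproblems may compute the score differently, and in the real file the function would take an AnnData object and be wrapped with the metric decorator.

```python
# Hypothetical sketch of an F2 (F-beta, beta=2) score. In a real metrics
# file, the function would accept an AnnData object and be decorated with
# the metric decorator from openproblems.tools.decorators.
import numpy as np


def fbeta(y_true, y_pred, beta=2.0):
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)   # true positives
    fp = np.sum(~y_true & y_pred)  # false positives
    fn = np.sum(y_true & ~y_pred)  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta=2, recall is weighted more heavily than precision, which is the point of an F2 score over a plain F1.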
Datasets in particular should be loaded using a `loader` function from `openproblems.data`, with only task-specific annotations added in the task-specific data file.
Methods and metrics should be decorated with the appropriate function in `openproblems.tools.decorators` to include metadata required for the evaluation and presentation of results.
Note that data is not normalized in the data loader; normalization should be performed as part of each method. For ease of use, we provide a collection of common normalization functions in `openproblems.tools.normalize`.
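As an illustration of one common choice a method might apply, here is a from-scratch log-CPM (counts per million) transform on a dense counts matrix. This is only a sketch of the idea; the ready-made helpers in `openproblems.tools.normalize` should be preferred, and their exact names may differ from this hypothetical `log_cpm`.

```python
# Illustrative log-CPM normalization for a dense counts matrix.
# Prefer the ready-made helpers in openproblems.tools.normalize.
import numpy as np


def log_cpm(counts):
    counts = np.asarray(counts, dtype=float)
    per_cell = counts.sum(axis=1, keepdims=True)  # total counts per cell
    cpm = counts / per_cell * 1e6                 # scale to counts per million
    return np.log1p(cpm)                          # log(1 + x) for stability
```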
The task directory structure is as follows:

```
openproblems/
  - tasks/
    - task_name/
      - README.md
      - __init__.py
      - api.py
      - datasets/
        - __init__.py
        - dataset1.py
        - ...
      - methods/
        - __init__.py
        - method1.py
        - ...
      - metrics/
        - __init__.py
        - metric1.py
        - ...
```
`task_name/__init__.py` can be copied from an existing task. `api.py` should implement the following functions:

```
check_dataset(AnnData adata) -> bool # checks that a dataset fits the task-specific schema
check_method(AnnData adata) -> bool # checks that the output from a method fits the task-specific schema
sample_dataset() -> AnnData adata # generates a simple dataset that fits the expected API
sample_method(AnnData adata) -> AnnData adata # applies a simple modification that fits the method API
```
For adding datasets, methods and metrics, see above.
Datasets, methods and metrics run inside Docker containers. We provide a few to start with, but if you have specific requirements, you may need to modify an existing image or add a new one. See the Docker README for details.
`singlecellopenproblems` is maintained at close to 100% code coverage. For datasets, methods, and metrics, tests are generated automatically. For additions outside this core functionality, contributors are encouraged to write tests for their code, but if you do not know how to do so, please do not feel discouraged from contributing code! Others can always help you test your contribution.
Code is tested by GitHub Actions when you push your changes. However, if you wish to test locally, you can do so with the following commands:

```shell
cd SingleCellOpenProblems
pip install --editable .[test,r]
pytest -v
```

The test suite also requires Python>=3.7, R>=4.0, and Docker to be installed.
The benchmarking suite is tested by GitHub Actions when you push your changes. However, if you wish to test locally, you can do so with the following commands:

```shell
cd workflow
snakemake -j 4 docker
cd ..
nextflow run -resume -profile test,docker singlecellopenproblems/nf-openproblems
```

The benchmarking suite also requires Python>=3.7, snakemake, nextflow, and Docker to be installed.
Code style is dictated by `black` and `flake8` with `hacking`. Code is automatically reformatted by `pre-commit` when you push to GitHub.
We abide by the principles of openness, respect, and consideration of others of the Python Software Foundation: https://www.python.org/psf/codeofconduct/ and by the Contributor Covenant. See our Code of Conduct for details.
This `CONTRIBUTING.md` was adapted from scikit-learn.