Institut Curie - annotationMaker
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with conda environments and Docker/Singularity containers, making installation easier and results highly reproducible.
The goal of this pipeline is to generate annotation and index files in a standardized way for use by production analysis pipelines.
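- Processes the genome FASTA file and generates `.dict` files
- Generates a chromosome size file
- Calculates the effective genome size as the number of non-'N' bases in the genome
- Generates indexes for the tools selectable with `--indexes`: bwa, bwamem2, dragmap, star, bowtie2, hisat2, cellranger, kallisto, salmon
- Processes the GTF/GFF annotation file for downstream analysis tools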
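The chromosome size file and the effective genome size described above can be approximated with plain `awk`. This is a minimal sketch, not the pipeline's own implementation: the function names `chrom_sizes` and `effective_genome_size` are hypothetical, and it assumes that a lowercase `n` counts as an `N` gap while soft-masked lowercase bases (`acgt`) count as real bases.

```shell
# Hypothetical helpers (not part of annotationMaker): plain awk over a FASTA file.

# Chromosome sizes: one "name<TAB>length" line per FASTA record.
# The record name is the first word of the header, without the leading ">".
chrom_sizes() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len
      name = substr($1, 2); len = 0; next
    }
    { sub(/\r$/, ""); len += length($0) }   # tolerate Windows line endings
    END { if (name != "") print name "\t" len }
  ' "$1"
}

# Effective genome size: number of bases that are not N.
# Assumption: lowercase "n" is treated as N; soft-masked "acgt" still counts.
effective_genome_size() {
  awk '
    /^>/ { next }
    {
      sub(/\r$/, "")
      seq = $0
      total += gsub(/[^Nn]/, "", seq)   # gsub returns how many non-N characters it removed
    }
    END { print total + 0 }
  ' "$1"
}
```

For example, `chrom_sizes genome.fa > chrom_sizes.txt` and `effective_genome_size genome.fa`, where the file names are placeholders.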
N E X T F L O W ~ version 21.10.6
Launching `` [gloomy_khorana] - revision: 20613d4cd4
The typical command for running the pipeline is as follows:
nextflow run --genome STRING -profile PROFILES
--genome STRING Name of the reference genome.
-profile STRING [conda, cluster, docker, multiconda, path, multipath, singularity] Configuration profile to use. Can use multiple (comma separated).
--fasta PATH Path to genome fasta file
--genomeAnnotationPath PATH Path to genome annotations folder
--gff PATH Path to GFF annotation file
--gtf PATH Path to GTF annotation file
--build STRING Name of the genome build
--indexes STRING [all, bwa, bwamem2, dragmap, star, bowtie2, hisat2, cellranger, kallisto, salmon, none] Genome indexes to generate
--cellRangerPath PATH CellRanger path
--outDir PATH The output directory where the results will be saved
--skipGtfProcessing Skip GTF processing steps
--starVersion STRING [2.6.1b, 2.7.6a, 2.7.8a, 2.7.10a] Version of the STAR aligner to use
Available Profiles
-profile test Run the test dataset
-profile conda Build a new conda environment before running the pipeline. Use `--condaCacheDir` to define the conda cache path
-profile multiconda Build a new conda environment per process before running the pipeline. Use `--condaCacheDir` to define the conda cache path
-profile path Use the installation path defined for all tools. Use `--globalPath` to define the installation path
-profile multipath Use the installation paths defined for each tool. Use `--globalPath` to define the installation path
-profile docker Use the Docker images for each process
-profile singularity Use the Singularity images for each process. Use `--singularityPath` to define the installation path
-profile cluster Run the workflow on the cluster, instead of locally
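`--indexes` (and similarly `--starVersion`) accepts values from a fixed list, and several indexes can be requested at once as a comma-separated list, as in the `mm39` example below. As a hedged illustration, the hypothetical helper `validate_choices` checks such a value against the documented choices before a job is submitted. It is not part of the pipeline, which may perform its own validation.

```shell
# Hypothetical helper (not part of annotationMaker): check that every
# comma-separated item of $1 is one of the allowed choices given after it.
validate_choices() {
  local value=$1 item status=0 allowed
  shift
  # Join the allowed choices as ",a,b,c," so each item is matched exactly.
  allowed=",$(IFS=','; printf '%s' "$*"),"
  local IFS=','   # split the requested value on commas
  for item in $value; do
    case "$allowed" in
      *",$item,"*) ;;
      *) echo "invalid choice: '$item'" >&2; status=1 ;;
    esac
  done
  return "$status"
}
```

For example, `validate_choices "star,kallisto,salmon" all bwa bwamem2 dragmap star bowtie2 hisat2 cellranger kallisto salmon none` succeeds, while a misspelled index name makes it return a non-zero status.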
The pipeline can be run on any infrastructure, as shown in the following examples.
See conf/test.conf to set up your test dataset.
nextflow run -profile test,multiconda
echo "nextflow run --genome 'hg38' \
-profile multiconda,cluster --condaCacheDir MY_CONDA_CACHE \
--outDir MY_OUTPUT_DIR -w MY_OUTPUT_DIR/work" | qsub -N makeAnnot
echo "nextflow run --genome 'mm39' \
--starVersion 2.7.8a --indexes star,kallisto,salmon \
--fasta GENOME_FASTA --skipGtfProcessing \
-profile multiconda,cluster --condaCacheDir MY_CONDA_CACHE \
--outDir MY_OUTPUT_DIR -w MY_OUTPUT_DIR/work -resume" | qsub -N mm39
By default (without any profile), Nextflow executes the pipeline locally, expecting all tools to be available from your PATH.
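Without a profile, a missing tool is usually only noticed when the process that needs it fails. A quick pre-flight check can be sketched as follows. The helper name `check_tools` is hypothetical, and the executable names in the usage example are assumptions about what a given run needs, not a list taken from the pipeline.

```shell
# Hypothetical pre-flight check (not part of annotationMaker):
# prints every executable that cannot be found on PATH and
# returns a non-zero status if at least one is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing from PATH: $tool" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, `check_tools samtools bwa STAR bowtie2-build hisat2-build kallisto salmon` before a local run. If anything is reported as missing, a conda or container profile avoids the local installation altogether.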
In addition, we set up a few profiles that let you (i) use conda environments or containers instead of a local installation, and (ii) run the pipeline on a cluster instead of locally. Each profile is described in the help message (see above).
Here are a few examples of how to set the profile option.
## Run the pipeline locally, using the paths defined in the configuration for each tool (see conf.tool-path.config)
-profile path --globalPath 'PATH_TO_BINARY'
## Run the pipeline on the cluster, using the Singularity containers
-profile cluster,singularity
## Run the pipeline on the cluster, building a new conda environment
-profile cluster,conda
This pipeline has been written by the bioinformatics core facility of the Institut Curie.
If you use this pipeline for your project, please cite it using the following DOI: 10.5281/zenodo.7515673.
Do not hesitate to use the Zenodo DOI corresponding to the version you used!
For any question, bug or suggestion, please use the issues system or contact the bioinformatics core facility.