Institut Curie - annotationMaker
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with conda / singularity containers making installation easier and results highly reproducible.
The goal of this pipeline is to generate annotation and index files in a standardized way for production analysis pipelines.
- Processes the `.fasta` file and generates `.dict` and `.fai` files
- Generates chromosome size file
- Calculates the effective genome size as the number of non-'N' bases in the genome.
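- Generates indexes for the tools selected with `--indexes` (bwa, bwamem2, dragmap, star, bowtie2, hisat2, cellranger, kallisto, salmon)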
- Processes GTF/GFF annotation file for downstream analysis tools
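To make the chromosome size and effective genome size definitions concrete, here is a minimal Python sketch. It is not the pipeline's own implementation, which may rely on other tools; the function name `genome_sizes` and the toy sequences are hypothetical and only illustrate the calculation: per-sequence lengths, and the total count of bases that are not 'N'.

```python
import io
from typing import Dict, TextIO, Tuple


def genome_sizes(fasta: TextIO) -> Tuple[Dict[str, int], int]:
    """Return (per-sequence lengths, effective genome size) for a FASTA stream.

    Hypothetical helper for illustration; not part of annotationMaker.
    """
    chrom_sizes: Dict[str, int] = {}
    effective = 0
    name = None
    for line in fasta:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token of the header.
            name = line[1:].split()[0]
            chrom_sizes[name] = 0
        elif name is not None:
            chrom_sizes[name] += len(line)
            # The effective size counts every base except N (upper or lower case).
            effective += len(line) - line.upper().count("N")
    return chrom_sizes, effective


# Toy genome: chr1 has 14 bases (6 of them N), chr2 has 6 bases (4 of them N).
example = io.StringIO(">chr1 test\nACGTNNNN\nacgtnn\n>chr2\nNNNN\nGG\n")
sizes, effective = genome_sizes(example)
print(sizes)      # {'chr1': 14, 'chr2': 6}
print(effective)  # 10
```

For a real genome, the same function can be called on an open file handle, for example `genome_sizes(open("genome.fa"))`, where `genome.fa` is a placeholder path.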
N E X T F L O W ~ version 21.10.6
Launching `main.nf` [gloomy_khorana] - revision: 20613d4cd4
------------------------------------------------------------------------
_ _ _ __ __ _
| | | | (_) | \/ | | |
__ _ _ __ _ __ ___ | |_ __ _| |_ _ ___ _ __ | . . | __ _| | _____ _ __
/ _` | '_ \| '_ \ / _ \| __/ _` | __| |/ _ \| '_ \| |\/| |/ _` | |/ / _ \ '__|
| (_| | | | | | | | (_) | || (_| | |_| | (_) | | | | | | | (_| | < __/ |
\__,_|_| |_|_| |_|\___/ \__\__,_|\__|_|\___/|_| |_\_| |_/\__,_|_|\_\___|_|
v2.0.0
------------------------------------------------------------------------
Usage:
The typical command for running the pipeline is as follows:
nextflow run main.nf --profile STRING --genome STRING -profile PROFILES
MANDATORY ARGUMENTS:
--genome STRING Name of the reference genome.
--profile STRING [conda, cluster, docker, multiconda, conda, path, multipath, singularity] Configuration profile to use. Can use multiple (comma separated).
REFERENCES:
--fasta PATH Path to genome fasta file
--genomeAnnotationPath PATH Path to genome annotations folder
--gff PATH Path to GFF annotation file
--gtf PATH Path to GTF annotation file
INDEXES:
--build STRING Name of the genome build
--indexes STRING [all, bwa, bwamem2, dragmap, star, bowtie2, hisat2, cellranger, kallisto, salmon, none] Genome indexes to generate
OTHER OPTIONS:
--cellRangerPath PATH CellRanger path
--outDir PATH The output directory where the results will be saved
--skipGtfProcessing Skip GTF processing steps
--starVersion STRING [2.6.1b, 2.7.6a, 2.7.8a, 2.7.10a] Version of the STAR aligned to use
=======================================================
Available Profiles
-profile test Run the test dataset
-profile conda Build a new conda environment before running the pipeline. Use `--condaCacheDir` to define the conda cache path
-profile multiconda Build a new conda environment per process before running the pipeline. Use `--condaCacheDir` to define the conda cache path
-profile path Use the installation path defined for all tools. Use `--globalPath` to define the insallation path
-profile multipath Use the installation paths defined for each tool. Use `--globalPath` to define the insallation path
-profile docker Use the Docker images for each process
-profile singularity Use the Singularity images for each process. Use `--singularityPath` to define the insallation path
-profile cluster Run the workflow on the cluster, instead of locally
The pipeline can be run on any infrastructure from a list of input files or from a sample plan, as follows.
See conf/test.conf to set up your test dataset.
nextflow run main.nf -profile test,multiconda
echo "nextflow run main.nf --genome 'hg38' \
-profile multiconda,cluster --condaCacheDir MY_CONDA_CACHE \
--outDir MY_OUTPUT_DIR -w MY_OUTPUT_DIR/work" | qsub -N makeAnnot
echo "nextflow run main.nf --genome 'mm39' \
--starVersion 2.7.8a --indexes star,kallisto,salmon \
--fasta GENOME_FASTA --skipGtfProcessing \
-profile multiconda,cluster --condaCacheDir MY_CONDA_CACHE \
--outDir MY_OUTPUT_DIR -w MY_OUTPUT_DIR/work -resume" | qsub -N hg19
By default (without any profile), Nextflow will execute the pipeline locally, expecting all tools to be available from your PATH variable.
In addition, we set up a few profiles that allow you (i) to use containers instead of a local installation, and (ii) to run the pipeline on a cluster instead of on a local machine. The description of each profile is available in the help message (see above).
Here are a few examples of how to set the profile option.
## Run the pipeline locally, using the paths defined in the configuration for each tool (see conf.tool-path.config)
-profile path --globalPath 'PATH_TO_BINARY'
## Run the pipeline on the cluster, using the Singularity containers
-profile cluster,singularity
## Run the pipeline on the cluster, building a new conda environment
-profile cluster,conda
Sample ID | Sample Name | Path R1 .fastq file | [Path R2 .fastq file]
- Installation
- Reference genomes
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
This pipeline has been written by the bioinformatics core facility of the Institut Curie.
If you use this pipeline for your project, please cite it using the following DOI: 10.5281/zenodo.7515673.
Do not hesitate to use the Zenodo DOI corresponding to the version you used!
For any question, bug or suggestion, please use the issues system or contact the bioinformatics core facility.