Annotation Maker

Institut Curie - annotationMaker

Introduction

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with conda / singularity containers making installation easier and results highly reproducible.

The goal of this pipeline is to generate annotation and indexes files in a standardized way for production analysis pipelines.

Pipline summary

Processes .fasta file and generates .dict and .fai files
Generates chromosome size file
Calculates the effective genome size as the sum of non 'N' base on the genome.
Generates indexes for :

Processes GTF/GFF annotation file for downstream analysis tools

Quick help

N E X T F L O W  ~  version 21.10.6
Launching `main.nf` [gloomy_khorana] - revision: 20613d4cd4
------------------------------------------------------------------------
                           _        _   _             __  __       _  
                          | |      | | (_)           |  \/  |     | |
    __ _ _ __  _ __   ___ | |_ __ _| |_ _  ___  _ __ | .  . | __ _| | _____ _ __ 
   / _` | '_ \| '_ \ / _ \| __/ _` | __| |/ _ \| '_ \| |\/| |/ _` | |/ / _ \ '__|
  | (_| | | | | | | | (_) | || (_| | |_| | (_) | | | | |  | | (_| |   <  __/ |   
   \__,_|_| |_|_| |_|\___/ \__\__,_|\__|_|\___/|_| |_\_|  |_/\__,_|_|\_\___|_| 

  v2.0.0
------------------------------------------------------------------------
Usage:
The typical command for running the pipeline is as follows:

nextflow run main.nf --profile STRING --genome STRING -profile PROFILES

MANDATORY ARGUMENTS:

--genome  STRING   Name of the reference genome.
--profile STRING [conda, cluster, docker, multiconda, conda, path, multipath, singularity]  Configuration profile to use. Can use multiple (comma separated).

REFERENCES:
--fasta                PATH   Path to genome fasta file
--genomeAnnotationPath PATH   Path to genome annotations folder
--gff                  PATH   Path to GFF annotation file
--gtf                  PATH   Path to GTF annotation file

INDEXES:

--build   STRING   Name of the genome build
--indexes STRING [all, bwa, bwamem2, dragmap, star, bowtie2, hisat2, cellranger, kallisto, salmon, none]  Genome indexes to generate

OTHER OPTIONS:

--cellRangerPath    PATH   CellRanger path
--outDir            PATH   The output directory where the results will be saved
--skipGtfProcessing        Skip GTF processing steps
--starVersion       STRING [2.6.1b, 2.7.6a, 2.7.8a, 2.7.10a]   Version of the STAR aligned to use
			  
=======================================================
Available Profiles
  -profile test                    Run the test dataset
  -profile conda                   Build a new conda environment before running the pipeline. Use `--condaCacheDir` to define the conda cache path
  -profile multiconda              Build a new conda environment per process before running the pipeline. Use `--condaCacheDir` to define the conda cache path
  -profile path                    Use the installation path defined for all tools. Use `--globalPath` to define the insallation path
  -profile multipath               Use the installation paths defined for each tool. Use `--globalPath` to define the insallation path
  -profile docker                  Use the Docker images for each process
  -profile singularity             Use the Singularity images for each process. Use `--singularityPath` to define the insallation path
  -profile cluster                 Run the workflow on the cluster, instead of locally

Quick run

The pipeline can be run on any infrastructure from a list of input files or from a sample plan as follow

Run the pipeline on a test dataset

See the conf/test.conf to set your test dataset.

nextflow run main.nf -profile test,multiconda

Generate all annotations for a given species from an existing conda cache

echo "nextflow run main.nf --genome 'hg38' \
      -profile multiconda,cluster --condaCacheDir MY_CONDA_CACHE \
      --outDir MY_OUTPUT_DIR -w MY_OUTPUT_DIR/work" | qsub -N makeAnnot"

Generate a few indexes from already downloaded genome fasta file

echo "nextflow run main.nf --genome 'mm39' \
      --starVersion 2.7.8a --indexes star,kallisto,salmon \
      --fasta GENOME_FASTA --skipGtfProcessing \
      -profile multiconda,cluster --condaCacheDir MY_CONDA_CACHE \
      --outDir MY_OUTPUT_DIR -w MY_OUTPUT_DIR/work -resume" | qsub -N hg19

Defining the '-profile'

By default (whithout any profile), Nextflow will excute the pipeline locally, expecting that all tools are available from your PATH variable.

In addition, we set up a few profiles that should allow you i/ to use containers instead of local installation, ii/ to run the pipeline on a cluster instead of on a local architecture. The description of each profile is available on the help message (see above).

Here are a few examples of how to set the profile option.

## Run the pipeline locally, using the paths defined in the configuration for each tool (see conf.tool-path.config)
-profile path --globalPath 'PATH_TO_BINARY'

## Run the pipeline on the cluster, using the Singularity containers
-profile cluster,singularity

## Run the pipeline on the cluster, building a new conda environment
-profile cluster,conda

Sample ID | Sample Name | Path R1 .fastq file | [Path R2 .fastq file]

Full Documentation

Credits

This pipeline has been written by the bioinformatics core facility of the Institut Curie.

Citation

If you use this pipeline for your project, please cite it using the following doi: 10.5281/zenodo.7515673.
Do not hesitate to use the Zenodo doi corresponding to the version you used !

Contacts

For any question, bug or suggestion, please use the issues system or contact the bioinformatics core facility.

Name		Name	Last commit message	Last commit date
Latest commit History 113 Commits
assets		assets
bin		bin
conf		conf
docs		docs
geniac @ 0d0b70e		geniac @ 0d0b70e
lib		lib
nf-modules		nf-modules
recipes		recipes
test/data		test/data
.gitlab-ci.yml		.gitlab-ci.yml
.gitmodules		.gitmodules
CHANGELOG		CHANGELOG
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
main.nf		main.nf
nextflow.config		nextflow.config
parameters.settings.json		parameters.settings.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Annotation Maker

Introduction

Pipline summary

Quick help

Quick run

Run the pipeline on a test dataset

Generate all annotations for a given species from an existing conda cache

Generate a few indexes from already downloaded genome fasta file

Defining the '-profile'

Full Documentation

Credits

Citation

Contacts

About

Releases 1

Packages

Contributors 4

Languages

License

bioinfo-pf-curie/annotationMaker

Folders and files

Latest commit

History

Repository files navigation

Annotation Maker

Introduction

Pipline summary

Quick help

Quick run

Run the pipeline on a test dataset

Generate all annotations for a given species from an existing conda cache

Generate a few indexes from already downloaded genome fasta file

Defining the '-profile'

Full Documentation

Credits

Citation

Contacts

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 4

Languages

Packages