The pipeline is built with Nextflow, a workflow manager that runs tasks portably across multiple compute infrastructures. It supports the conda package manager as well as Singularity and Docker containers, which makes installation easier and results highly reproducible.
nextflow run main.nf --help
N E X T F L O W ~ version 19.10.0
Launching `main.nf` [stupefied_darwin] - revision: aa905ab621
=======================================================
Usage:
Mandatory arguments:
--reads [file] Path to input data (must be surrounded with quotes)
--samplePlan [file] Path to sample plan file if '--reads' is not specified
--genome [str] Name of the reference genome. See the `--genomeAnnotationPath` to defined the annotation path
-profile [str] Configuration profile to use (multiple profiles can be specified with comma separated values)
Inputs:
--design [file] Path to design file for extended analysis
--singleEnd [bool] Specifies that the input is single-end reads
Skip options: All are false by default
--skipSoftVersion [bool] Do not report software version
--skipMultiQC [bool] Skip MultiQC
Other options:
--metadata [dir] Add metadata file for multiQC report
--outDir [dir] The output directory where the results will be saved
-w/--work-dir [dir] The temporary directory where intermediate data will be saved
-name [str] Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic
=======================================================
Available profiles
-profile test Run the test dataset
-profile conda Build a new conda environment before running the pipeline. Use `--condaCacheDir` to define the conda cache path
-profile multiconda Build a new conda environment per process before running the pipeline. Use `--condaCacheDir` to define the conda cache path
-profile path Use the installation path defined for all tools. Use `--globalPath` to define the installation path
-profile multipath Use the installation paths defined for each tool. Use `--globalPath` to define the installation path
-profile docker Use the Docker images for each process
-profile singularity Use the Singularity images for each process. Use `--singularityPath` to define the path of the singularity containers
-profile cluster Run the workflow on the cluster, instead of locally
The pipeline can be run on any infrastructure, either from a list of input files or from a sample plan, as follows.
To run the pipeline on the test dataset (see the file conf/test.config to set your test dataset):
nextflow run main.nf -profile test,conda
To run the pipeline from a sample plan:
nextflow run main.nf --samplePlan mySamplePlan.csv --design myDesign.csv --genome 'hg19' --genomeAnnotationPath /my/annotation/path --outDir /my/output/dir
By default (without any profile), Nextflow executes the pipeline locally and expects all tools to be available from your PATH environment variable.
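Because a run without a profile relies entirely on your PATH, it can help to confirm that the required tools are reachable before launching. The sketch below is a minimal illustration built on Python's standard `shutil.which`; the function name `missing_tools` and the script name `check_tools.py` are hypothetical, and the list of tools to check is up to you, since it is not specified here.

```python
import shutil
import sys


def missing_tools(tools):
    """Return the names in `tools` that cannot be resolved through PATH.

    shutil.which searches the directories listed in the PATH environment
    variable; if a name already contains a directory, it checks that path
    directly instead.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Hypothetical usage: pass the tool names to check on the command line,
    # e.g. `python check_tools.py multiqc`.
    missing = missing_tools(sys.argv[1:])
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All requested tools were found in PATH.")
```

For example, `missing_tools(["multiqc"])` returns an empty list when MultiQC is installed and visible from PATH.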
In addition, several Nextflow profiles are available that allow:
- the use of conda or containers instead of a local installation,
- the submission of the pipeline to a cluster instead of running it on the local machine.
The description of each profile is available in the help message (see above).
Here are a few examples of how to set the profile options:
Run the pipeline locally, using a global environment where all tools are installed (built by conda, for instance):
-profile path --globalPath /my/path/to/bioinformatics/tools
Run the pipeline on the cluster, using the Singularity containers:
-profile cluster,singularity --singularityPath /my/path/to/singularity/containers
Run the pipeline on the cluster, building a new conda environment:
-profile cluster,conda --condaCacheDir /my/path/to/condaCacheDir
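In these examples, Nextflow's own options take a single dash (`-profile`), while pipeline parameters take two (`--singularityPath`, `--condaCacheDir`), and several profiles are combined with commas. As a minimal sketch, assuming you launch the pipeline from a Python script, the hypothetical helper `build_command` below (not part of the pipeline) assembles such a command as an argument list that could be passed to `subprocess.run`. It also assumes that a boolean parameter such as `--singleEnd` is given without a value to switch it on.

```python
import shlex


def build_command(profiles, params, script="main.nf"):
    """Assemble a `nextflow run` command as a list of arguments.

    profiles: profile names, joined with commas for the -profile option.
    params: pipeline parameters; each key becomes a --key option.
    """
    cmd = ["nextflow", "run", script]
    if profiles:
        # Nextflow options use a single dash; profiles are comma separated.
        cmd += ["-profile", ",".join(profiles)]
    for name, value in params.items():
        if value is False or value is None:
            continue  # leave the pipeline default untouched
        # Pipeline parameters use a double dash.
        cmd.append("--" + name)
        if value is not True:
            # Boolean flags (e.g. --singleEnd) are passed without a value.
            cmd.append(str(value))
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        ["cluster", "singularity"],
        {"singularityPath": "/my/path/to/singularity/containers"},
    )
    # Printed in a shell-safe form; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Run as a script, this prints `nextflow run main.nf -profile cluster,singularity --singularityPath /my/path/to/singularity/containers`. Keeping the command as a list avoids shell-quoting problems when paths contain spaces.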
For details about the different profiles available, see Profiles.
A sample plan is a CSV file (comma separated) that lists all the samples with their biological IDs. The sample plan is expected to contain the following fields (with no header):
SAMPLE_ID,SAMPLE_NAME,path/to/R1/fastq/file,path/to/R2/fastq/file (for paired-end only)
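To make the expected layout concrete, here is a minimal sketch of a reader for such a header-less sample plan, using Python's standard `csv` module. The function name `read_sample_plan`, the dictionary keys and the FASTQ paths in the demo are hypothetical. It also assumes, because `--singleEnd` applies to the whole run, that a single sample plan should not mix single-end and paired-end rows; that check is an assumption, not a documented rule.

```python
import csv
import io


def read_sample_plan(handle):
    """Parse a header-less sample plan and return one dict per sample.

    Each row is SAMPLE_ID,SAMPLE_NAME,R1[,R2]: three fields for
    single-end data, four for paired-end data.
    """
    samples = []
    for lineno, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # ignore blank lines
        fields = [field.strip() for field in row]
        if len(fields) not in (3, 4):
            raise ValueError(f"line {lineno}: expected 3 or 4 fields, got {len(fields)}")
        samples.append({
            "id": fields[0],
            "name": fields[1],
            "r1": fields[2],
            "r2": fields[3] if len(fields) == 4 else None,
        })
    # Assumption: all rows share the same layout (single-end or paired-end).
    if len({sample["r2"] is None for sample in samples}) > 1:
        raise ValueError("sample plan mixes single-end and paired-end rows")
    return samples


if __name__ == "__main__":
    # Hypothetical two-sample paired-end plan, read from memory for the demo.
    demo = io.StringIO(
        "S1,Sample1,/data/S1_R1.fastq.gz,/data/S1_R2.fastq.gz\n"
        "S2,Sample2,/data/S2_R1.fastq.gz,/data/S2_R2.fastq.gz\n"
    )
    for sample in read_sample_plan(demo):
        print(sample["id"], sample["r1"], sample["r2"])
```

Checking the file this way before a run surfaces a missing or extra column immediately, instead of partway through the workflow.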
A design file is a csv file that provides additional details on the samples and how they should be processed. Here is a simple example:
SAMPLEID,CONTROLID,GROUP
A949C08,A949C02,1
...
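Since the design file refers to samples by their IDs, a quick consistency check is to verify that every SAMPLEID and CONTROLID also appears in the sample plan. The sketch below is an illustration under stated assumptions: the function name `check_design` is hypothetical, the design file is read with its `SAMPLEID,CONTROLID,GROUP` header, and an empty CONTROLID is assumed to mean "no control" and is accepted. The GROUP column is not checked.

```python
import csv
import io


def check_design(design_handle, sample_ids):
    """Report design rows whose SAMPLEID or CONTROLID is not a known sample.

    design_handle: file-like object with a SAMPLEID,CONTROLID,GROUP header.
    sample_ids: set of SAMPLE_ID values taken from the sample plan.
    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []
    reader = csv.DictReader(design_handle)
    for row in reader:
        sample = (row.get("SAMPLEID") or "").strip()
        control = (row.get("CONTROLID") or "").strip()
        if sample not in sample_ids:
            problems.append(f"line {reader.line_num}: SAMPLEID {sample!r} is not in the sample plan")
        # Assumption: an empty CONTROLID means the sample has no control.
        if control and control not in sample_ids:
            problems.append(f"line {reader.line_num}: CONTROLID {control!r} is not in the sample plan")
    return problems


if __name__ == "__main__":
    design = io.StringIO("SAMPLEID,CONTROLID,GROUP\nA949C08,A949C02,1\n")
    print(check_design(design, {"A949C08", "A949C02"}))  # prints []
```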
The pipeline does not provide any genomic annotations but expects them to be already available on your system. The path to the genomic annotations can be set with the --genomeAnnotationPath option as follows:
nextflow run main.nf --samplePlan mySamplePlan.csv --design myDesign.csv --genome 'hg19' --genomeAnnotationPath /my/annotation/path --outDir /my/output/dir
For more details, see Reference genomes.
- Installation
- Reference genomes
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
This pipeline has been written by
For any question, bug or suggestion, please use the issue system or contact the bioinformatics core facility.