These scripts are designed to work on the San Diego and San Francisco clusters.
The tools are installed in /illumina/scratch/Jigsaw/tools/jigsaw.
If you need to make any changes, a pull request is appreciated so I can keep track of things. If you clone your own copy, the pipeline will use the scripts in your source tree, so you can feel safe making related changes to several scripts.
Most of the scripts require two parameters:
-r : run folder
-o : output folder
Example:
jigsaw/assemble_flowcell -r /illumina/scratch/Jigsaw/NMP/NMP_Seq_Runs/MyFlowcell \
    -o /illumina/scratch/Jigsaw/Assemblies/MyAssemblies
Other parameters are usually just for internal helper scripts. One exception is -s, which sets the scratch root directory. By default, it is /scratch for SGE jobs and output/_scratch for interactive jobs.
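The default described above can be pictured with a small sketch. This is an illustration only, not the pipeline's actual code: the function name default_scratch_root is hypothetical, and using the JOB_ID variable (which SGE sets inside a running job) to tell SGE jobs from interactive ones is an assumption about how the detection might be done.

```shell
# Hypothetical helper, not part of the jigsaw scripts.
# Prints the scratch root that would be used when -s is not given.
default_scratch_root() {
    local output_dir
    output_dir=$1
    # Assumption: a non-empty JOB_ID means we are running inside an SGE job.
    if [ -n "${JOB_ID:-}" ]; then
        echo "/scratch"
    else
        echo "$output_dir/_scratch"
    fi
}
```

For an interactive run with -o /path/to/MyAssemblies, this sketch would print /path/to/MyAssemblies/_scratch.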
The output folder must not be a subdirectory of the run folder. Nesting them makes it difficult to sync to local scratch space, and it's a bad habit to mix inputs and outputs anyway.
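If you want to check this rule before submitting a job, something like the following could be used. It is a minimal sketch, not something shipped with the tools: the function name output_inside_run_folder is hypothetical, and it assumes GNU coreutils realpath (the -m option normalizes paths that do not exist yet).

```shell
# Hypothetical helper, not part of the jigsaw scripts.
# Returns 0 (true) when the output folder is the run folder itself or lies inside it.
output_inside_run_folder() {
    local run_dir out_dir
    # realpath -m resolves "..", symlinks, and trailing slashes without
    # requiring the path to exist (the output folder may not exist yet).
    run_dir=$(realpath -m -- "$1")
    out_dir=$(realpath -m -- "$2")
    # Append a slash so that /data/run2 is not mistaken for a child of /data/run.
    case "$out_dir/" in
        "$run_dir"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage, with placeholder variables for the two folders:

```shell
if output_inside_run_folder "$RUN_FOLDER" "$OUTPUT_FOLDER"; then
    echo "Output folder must not be inside the run folder" >&2
fi
```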
This script takes the primary parameters and submits an SGE job to assemble the flowcell. The output from the job is captured in the output directory you specify, and you will be emailed when the job completes.
This is the script submitted to SGE. If you want to run the whole pipeline interactively, and you have a whole node, you can run this.
Uses Isis/BWA to align the samples to the references provided in the sample sheet. Also creates the FASTQ files with adapters trimmed and reads reverse-complemented. The results are placed in output/Alignment.
Uses SPAdes 3.3.1 to assemble the FASTQ files. The results are placed in output/spades/SampleID.
Uses various tools to create metrics. The following folders are placed in output:
- picard/SampleID
- quast/SampleID
- visualization/SampleID
EC.report.html, BC.report.html, and RS.report.html will appear in the directory above the output directory. These will display per-organism metrics and link out to other reports.
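To confirm that a finished run produced the per-sample folders listed above, a quick check along these lines may help. It is only a sketch: the function name missing_outputs is hypothetical, and it looks solely for the directories named in this document, not for their contents or for the HTML reports.

```shell
# Hypothetical helper, not part of the jigsaw scripts.
# Prints each expected output directory that is missing for one sample.
missing_outputs() {
    local output_dir sample_id sub
    output_dir=$1
    sample_id=$2
    for sub in Alignment \
               "spades/$sample_id" \
               "picard/$sample_id" \
               "quast/$sample_id" \
               "visualization/$sample_id"; do
        # Report the relative path when the directory does not exist.
        [ -d "$output_dir/$sub" ] || echo "$sub"
    done
}
```

An empty result means every expected directory exists for that sample.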
The visualization directory will contain a version of the assembly scaffolds aligned back to the reference genome. It will also contain a BED file called gaps.bed that can be used in IGV to quickly locate regions of the reference not covered by any contigs in the assembly.
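Besides loading gaps.bed in IGV, you can get a quick per-sequence summary from the command line. The sketch below assumes gaps.bed follows the standard BED convention (tab- or space-separated chrom, start, end, with 0-based half-open coordinates, so each gap is end minus start bases long); the function name summarize_gaps is hypothetical.

```shell
# Hypothetical helper, not part of the jigsaw scripts.
# Prints one line per reference sequence: name, number of gaps, total uncovered bases.
summarize_gaps() {
    awk 'BEGIN { OFS = "\t" }
         # Skip optional BED header lines and anything without three columns.
         !/^(track|browser|#)/ && NF >= 3 {
             count[$1]++
             bases[$1] += $3 - $2   # half-open interval: length is end - start
         }
         END { for (c in count) print c, count[c], bases[c] }' "$1" | sort
}
```

For example, summarize_gaps output/visualization/SampleID/gaps.bed, where the path is a placeholder for your own sample.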
The reference genomes we use are under /illumina/scratch/Jigsaw/genomes. Only B. cereus is the same as what is in iGenomes. If you use IGV to inspect the alignments and assemblies, you must use the reference genomes from this location.