Merge pull request #22 from Sage-Bionetworks-Workflows/addparams
Add new main workflow for miRNA reads, add additional alignment parameters to all workflows
wpoehlm authored Sep 22, 2020
2 parents 55f70d5 + fb523d7 commit c3461f6
Showing 19 changed files with 864 additions and 9 deletions.
19 changes: 18 additions & 1 deletion README.md
@@ -21,6 +21,7 @@ Three main workflows are present in the root of this repository:
* [bam_paired.cwl](bam_paired.cwl): This workflow processes input BAM files from paired-end sequencing reads
* [fastq_paired.cwl](fastq_paired.cwl): This workflow processes paired-end fastq files
* [fastq_single.cwl](fastq_single.cwl): This workflow processes single-end fastq files
* [mirna_single.cwl](mirna_single.cwl): This workflow processes single-end fastq files from miRNA libraries

Subworkflows that the main workflows utilize are present in the [subworkflows](subworkflows) folder.

@@ -69,7 +70,7 @@ Each workflow requires the following inputs:

* `cwl_wf_url`: A URL that points to a commit or tagged version of this github repository at the time of job submission. "https://github.com/Sage-Bionetworks-Workflows/dockstore-workflow-rnaseq/tree/5832931a9569d9d8fba26a36146a682870d6f5f7", for example. Guidance on generating a permanent github link can be found [here](https://help.github.com/en/github/managing-files-in-a-repository/getting-permanent-links-to-files#press-y-to-permalink-to-a-file-in-a-specific-commit).
* `cwl_args_url`: A raw github URL that points to the input parameters file for the job that you are running. "https://raw.githubusercontent.com/Sage-Bionetworks-Workflows/dockstore-workflow-rnaseq/5832931a9569d9d8fba26a36146a682870d6f5f7/jobs/test-paired-bam/job.json", for example. To find the raw URL for a file on github, navigate to the file and follow the instructions for generating a [permanent url](https://help.github.com/en/github/managing-files-in-a-repository/getting-permanent-links-to-files#press-y-to-permalink-to-a-file-in-a-specific-commit). You can then click on the `raw` button to open the raw URL in your browser.
* `index_synapseid`: A [Synapse](https://www.synapse.org/) ID for the folder that contains a STAR-indexed reference genome. An example can be found in `syn22152278`
* `index_synapseid`: A [Synapse](https://www.synapse.org/) ID for the folder that contains a STAR-indexed reference genome. An example can be found in `syn22152278`. Two gtf files must be present in this folder to run the `mirna_single.cwl` workflow: a main gtf file with the filename extension ".annotation.gtf" and a gtf file that contains only miRNA annotations with the filename extension ".subset.gtf". An example of a miRNA-compatible reference genome folder can be found in `syn22342700`.
* `nthreads`: An integer value that represents the number of compute threads that the STAR aligner should use.
* `synapse_parentid`: A [Synapse](https://www.synapse.org/) ID for the folder that output tables will be uploaded to.
* `synapse_config`: A [Synapse](https://www.synapse.org/) configuration file that will be used to authenticate data downloads and uploads during workflow execution
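
The two-gtf requirement for `mirna_single.cwl` can be checked before a job is submitted. The following is a minimal sketch, assuming you already have a list of the filenames in the reference folder; the function name and the sample listing are hypothetical and are not part of this repository.

```python
from typing import Iterable, List

# Filename suffixes that the README requires for the miRNA workflow.
REQUIRED_GTF_SUFFIXES = (".annotation.gtf", ".subset.gtf")


def missing_gtf_suffixes(filenames: Iterable[str]) -> List[str]:
    """Return the required suffixes that no file in `filenames` ends with."""
    names = list(filenames)
    return [
        suffix
        for suffix in REQUIRED_GTF_SUFFIXES
        if not any(name.endswith(suffix) for name in names)
    ]


# Hypothetical folder listing; real names depend on the reference build.
listing = ["genome.annotation.gtf", "Genome", "SA"]
print(missing_gtf_suffixes(listing))
```

An empty result means both files are present; any returned suffix names a file that still needs to be added to the folder.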
@@ -81,6 +82,8 @@ The fastq_paired.cwl workflow also requires the following input:

An example input json file that contains values for these required inputs can be found [here](https://raw.githubusercontent.com/Sage-Bionetworks-Workflows/dockstore-workflow-rnaseq/7d64748a3a6d7cc8cfd9f30fc43c1b9bc79b3b3f/jobs/test-paired-bam/job.json)

An example input json file with parameters for the mirna_single.cwl workflow can be found [here](https://github.com/Sage-Bionetworks-Workflows/dockstore-workflow-rnaseq/blob/master/jobs/test-single-mirna/job.json)
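
A job file can be checked for the required inputs before submission. This is a minimal sketch, assuming the required keys are the ones visible in this excerpt (the full README may list more); the helper name and the trimmed job text are illustrative only.

```python
import json

# Keys taken from the required-inputs list in this README excerpt.
# Assumption: this tuple may not be exhaustive.
REQUIRED_KEYS = (
    "cwl_wf_url",
    "cwl_args_url",
    "index_synapseid",
    "nthreads",
    "synapse_parentid",
    "synapse_config",
)


def missing_required_keys(job: dict) -> list:
    """Return the required keys that are absent from a parsed job file."""
    return [key for key in REQUIRED_KEYS if key not in job]


# Deliberately incomplete job file; values mirror the miRNA test job.
job_text = """
{
  "cwl_wf_url": "https://github.com/Sage-Bionetworks-Workflows/dockstore-workflow-rnaseq",
  "index_synapseid": "syn22342700",
  "nthreads": 1,
  "synapse_parentid": "syn22152380"
}
"""
job = json.loads(job_text)
print(missing_required_keys(job))
```

Here the check reports `cwl_args_url` and `synapse_config`, the two keys left out of the sample.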

### Optional Job inputs

You can optionally supply an input parameter that specifies the strandedness parameter of the library that will be used by Picard Tools. To do so, add the `strand_specificity` argument to your job.json file. The three valid string options for this parameter are:
@@ -101,6 +104,20 @@ If this argument is not provided, the default value of `2` will be used. This is

An example input json file that contains the required inputs and these optional inputs can be found [here](https://raw.githubusercontent.com/Sage-Bionetworks-Workflows/dockstore-workflow-rnaseq/7d64748a3a6d7cc8cfd9f30fc43c1b9bc79b3b3f/jobs/test-paired-fastq/job.json)

In addition, you may optionally specify the following parameters for the STAR alignment. Customizing these arguments is highly recommended for the mirna_single.cwl workflow:

* `alignEndsType` : A string specifying the type of read ends alignment
* `outFilterMismatchNmax` : Integer specifying the maximum number of mismatches per pair
* `outFilterMultimapScoreRange` : Integer specifying the score range for multi-mapping alignments
* `outFilterMultimapNmax` : Integer specifying the maximum number of multiple alignments for a read
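* `outFilterScoreMinOverLread` : Integer specifying the minimum score for an alignment to be reported, normalized to read length. STAR itself accepts a fractional value for this option, but the workflows declare the input as `int?`, so only integers can be supplied
* `outFilterMatchNminOverLread` : Integer specifying the minimum number of matched bases for an alignment to be reported, normalized to read length. STAR itself accepts a fractional value for this option, but the workflows declare the input as `int?`, so only integers can be supplied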
* `outFilterMatchNmin` : Integer specifying the minimum number of matched bases for an alignment to be reported
* `alignSJDBoverhangMin` : Integer specifying the minimum block size for annotated spliced alignments
* `alignIntronMax` : Integer specifying the maximum intron size

For further details about these parameters, please refer to the [STAR manual](https://chagall.med.cornell.edu/RNASEQcourse/STARmanual.pdf).
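
The optional alignment parameters listed above can be type-checked and rendered as STAR command-line flags. This is a sketch only: it assumes each job key is passed to STAR as the flag of the same name, which matches STAR's option names but is not confirmed here as how the subworkflows build the command. The expected types follow the `string?` and `int?` declarations in the workflow CWL files.

```python
from typing import Any, Dict, List

# Expected Python type for each optional input, mirroring the CWL types.
STAR_OPTIONAL_TYPES = {
    "alignEndsType": str,
    "outFilterMismatchNmax": int,
    "outFilterMultimapScoreRange": int,
    "outFilterMultimapNmax": int,
    "outFilterScoreMinOverLread": int,
    "outFilterMatchNminOverLread": int,
    "outFilterMatchNmin": int,
    "alignSJDBoverhangMin": int,
    "alignIntronMax": int,
}


def star_args(job: Dict[str, Any]) -> List[str]:
    """Render the optional parameters present in `job` as STAR flags."""
    args: List[str] = []
    for name, expected in STAR_OPTIONAL_TYPES.items():
        value = job.get(name)
        if value is None:
            # Optional input (`?` in CWL): omitted or null, so skip it.
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{name} must be {expected.__name__}, got {value!r}")
        args += [f"--{name}", str(value)]
    return args


# Values taken from the miRNA test job in this repository.
print(star_args({"alignEndsType": "EndToEnd", "outFilterMatchNmin": 16}))
```

A value such as `0.66` for `outFilterScoreMinOverLread` raises `TypeError` in this sketch, which mirrors the `int?` declaration rather than STAR's own behavior.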

## Resource Requirements

Resource requirements are specified using the CWL `ResourceRequirement` class. Each subworkflow contains specific requests for RAM, disk space, and number of threads. These values are set for average-sized RNA Sequencing input files for alignment against the human reference genome. If the default values are not sufficient, please modify the `ResourceRequirement` values in the subworkflow CWL files.
37 changes: 37 additions & 0 deletions bam_paired.cwl
@@ -33,6 +33,25 @@ inputs:
type: string?
- id: column_number
type: int?
- id: alignEndsType
type: string?
- id: outFilterMismatchNmax
type: int?
- id: outFilterMultimapScoreRange
type: int?
- id: outFilterMultimapNmax
type: int?
- id: outFilterScoreMinOverLread
type: int?
- id: outFilterMatchNminOverLread
type: int?
- id: outFilterMatchNmin
type: int?
- id: alignSJDBoverhangMin
type: int?
- id: alignIntronMax
type: int?

outputs:
- id: clean_counts
outputSource:
@@ -83,6 +102,24 @@ steps:
source: synapse_config
- id: synapseid
source: synapseid
- id: alignEndsType
source: alignEndsType
- id: outFilterMismatchNmax
source: outFilterMismatchNmax
- id: outFilterMultimapScoreRange
source: outFilterMultimapScoreRange
- id: outFilterMultimapNmax
source: outFilterMultimapNmax
- id: outFilterScoreMinOverLread
source: outFilterScoreMinOverLread
- id: outFilterMatchNminOverLread
source: outFilterMatchNminOverLread
- id: outFilterMatchNmin
source: outFilterMatchNmin
- id: alignSJDBoverhangMin
source: alignSJDBoverhangMin
- id: alignIntronMax
source: alignIntronMax
out:
- id: splice_junctions
- id: reads_per_gene
37 changes: 37 additions & 0 deletions fastq_paired.cwl
@@ -29,6 +29,25 @@ inputs:
type: string?
- id: column_number
type: int?
- id: alignEndsType
type: string?
- id: outFilterMismatchNmax
type: int?
- id: outFilterMultimapScoreRange
type: int?
- id: outFilterMultimapNmax
type: int?
- id: outFilterScoreMinOverLread
type: int?
- id: outFilterMatchNminOverLread
type: int?
- id: outFilterMatchNmin
type: int?
- id: alignSJDBoverhangMin
type: int?
- id: alignIntronMax
type: int?

outputs:
- id: clean_counts
outputSource:
@@ -81,6 +100,24 @@ steps:
source: synapseid
- id: synapseid_2
source: synapseid_2
- id: alignEndsType
source: alignEndsType
- id: outFilterMismatchNmax
source: outFilterMismatchNmax
- id: outFilterMultimapScoreRange
source: outFilterMultimapScoreRange
- id: outFilterMultimapNmax
source: outFilterMultimapNmax
- id: outFilterScoreMinOverLread
source: outFilterScoreMinOverLread
- id: outFilterMatchNminOverLread
source: outFilterMatchNminOverLread
- id: outFilterMatchNmin
source: outFilterMatchNmin
- id: alignSJDBoverhangMin
source: alignSJDBoverhangMin
- id: alignIntronMax
source: alignIntronMax
out:
- id: splice_junctions
- id: reads_per_gene
36 changes: 36 additions & 0 deletions fastq_single.cwl
@@ -33,6 +33,24 @@ inputs:
type: string?
- id: column_number
type: int?
- id: alignEndsType
type: string?
- id: outFilterMismatchNmax
type: int?
- id: outFilterMultimapScoreRange
type: int?
- id: outFilterMultimapNmax
type: int?
- id: outFilterScoreMinOverLread
type: int?
- id: outFilterMatchNminOverLread
type: int?
- id: outFilterMatchNmin
type: int?
- id: alignSJDBoverhangMin
type: int?
- id: alignIntronMax
type: int?
outputs:
- id: clean_counts
outputSource:
@@ -83,6 +101,24 @@ steps:
source: synapse_config
- id: synapseid
source: synapseid
- id: alignEndsType
source: alignEndsType
- id: outFilterMismatchNmax
source: outFilterMismatchNmax
- id: outFilterMultimapScoreRange
source: outFilterMultimapScoreRange
- id: outFilterMultimapNmax
source: outFilterMultimapNmax
- id: outFilterScoreMinOverLread
source: outFilterScoreMinOverLread
- id: outFilterMatchNminOverLread
source: outFilterMatchNminOverLread
- id: outFilterMatchNmin
source: outFilterMatchNmin
- id: alignSJDBoverhangMin
source: alignSJDBoverhangMin
- id: alignIntronMax
source: alignIntronMax
out:
- id: splice_junctions
- id: reads_per_gene
4 changes: 3 additions & 1 deletion jobs/default/options.json
@@ -1,11 +1,13 @@
{
"zone": "us-east-1a",
"cluster_name": "rna-seq-reprocessing-scicomp-toil-cluster-v001",
"tmpdir": "/var/lib/toil",
"run_name": "def",
"run_name": "tst",
"log_level": "INFO",
"retry_count": "3",
"target_time": "1",
"default_disk": "450G",
"max_nodes": "5",
"node_types": "m5.4xlarge",
"node_storage": "500",
"preemptable_compensation": "0.5",
32 changes: 32 additions & 0 deletions jobs/test-UW-mirna/job.json
@@ -0,0 +1,32 @@
{
"cwl_wf_url": "https://github.com/Sage-Bionetworks-Workflows/dockstore-workflow-rnaseq",
"cwl_args_url": "https://raw.githubusercontent.com/Sage-Bionetworks-Workflows/dockstore-workflow-rnaseq/master/jobs/test-UW-mirna/job.json",
"alignEndsType": "EndToEnd",
"outFilterMismatchNmax": 1,
"outFilterMultimapScoreRange": 0,
"outFilterMultimapNmax": 10,
"outFilterScoreMinOverLread": 0,
"outFilterMatchNminOverLread": 0,
"outFilterMatchNmin": 16,
"alignSJDBoverhangMin": 1000,
"alignIntronMax": 1,
"index_synapseid": "syn22337116",
"nthreads": 1,
"synapse_parentid": "syn22352005",
"synapse_config": {
"class": "File",
"path": "/etc/synapse/.synapseConfig"
},
"synapseid": [
"syn22334734",
"syn22334729",
"syn22334741",
"syn22334744",
"syn22334745",
"syn22334706",
"syn22334712",
"syn22334731",
"syn22334728",
"syn22334714"
]
}
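
The `synapseid` list in a job file like the one above can be screened for malformed or repeated entries before submission. This is a rough sketch, assuming Synapse IDs take the form `syn` followed by digits, a pattern inferred from the examples in this commit rather than from Synapse documentation; the function name is hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed ID shape, inferred from the IDs that appear in these job files.
SYNAPSE_ID = re.compile(r"syn\d+")


def check_synapse_ids(ids: List[str]) -> Tuple[List[str], List[str]]:
    """Return (malformed IDs, IDs that appear more than once)."""
    malformed = [i for i in ids if not SYNAPSE_ID.fullmatch(i)]
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    return malformed, duplicates


# The first two IDs come from the job file; the others are deliberate errors.
ids = ["syn22334734", "syn22334729", "syn22334734", "22334741"]
print(check_synapse_ids(ids))
```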
17 changes: 17 additions & 0 deletions jobs/test-UW-mirna/options.json
@@ -0,0 +1,17 @@
{
"zone": "us-east-1a",
"cluster_name": "rna-seq-reprocessing-scicomp-toil-cluster-v001",
"tmpdir": "/var/lib/toil",
"run_name": "uwm",
"log_level": "DEBUG",
"retry_count": "3",
"target_time": "1",
"max_nodes": "5",
"default_disk": "450G",
"node_types": "m5.4xlarge",
"node_storage": "500",
"preemptable_compensation": "0.5",
"rescue_frequency": "9000",
"cwl": "mirna_single.cwl"
}

4 changes: 3 additions & 1 deletion jobs/test-paired-bam/options.json
@@ -1,11 +1,13 @@
{
"zone": "us-east-1a",
"cluster_name": "rna-seq-reprocessing-scicomp-toil-cluster-v001",
"tmpdir": "/var/lib/toil",
"run_name": "def",
"run_name": "tst",
"log_level": "INFO",
"retry_count": "3",
"target_time": "1",
"default_disk": "450G",
"max_nodes": "5",
"node_types": "m5.4xlarge",
"node_storage": "500",
"preemptable_compensation": "0.5",
4 changes: 3 additions & 1 deletion jobs/test-paired-fastq/options.json
@@ -1,11 +1,13 @@
{
"zone": "us-east-1a",
"cluster_name": "rna-seq-reprocessing-scicomp-toil-cluster-v001",
"tmpdir": "/var/lib/toil",
"run_name": "def",
"run_name": "tst",
"log_level": "INFO",
"retry_count": "3",
"target_time": "1",
"default_disk": "450G",
"max_nodes": "5",
"node_types": "m5.4xlarge",
"node_storage": "500",
"preemptable_compensation": "0.5",
4 changes: 3 additions & 1 deletion jobs/test-single-fastq/options.json
@@ -1,11 +1,13 @@
{
"zone": "us-east-1a",
"cluster_name": "rna-seq-reprocessing-scicomp-toil-cluster-v001",
"tmpdir": "/var/lib/toil",
"run_name": "def",
"run_name": "tst",
"log_level": "INFO",
"retry_count": "3",
"target_time": "1",
"default_disk": "450G",
"max_nodes": "5",
"node_types": "m5.4xlarge",
"node_storage": "500",
"preemptable_compensation": "0.5",
23 changes: 23 additions & 0 deletions jobs/test-single-mirna/job.json
@@ -0,0 +1,23 @@
{
"cwl_wf_url": "https://github.com/Sage-Bionetworks-Workflows/dockstore-workflow-rnaseq",
"cwl_args_url": "https://raw.githubusercontent.com/Sage-Bionetworks-Workflows/dockstore-workflow-rnaseq/master/jobs/test-single-mirna/job.json",
"alignEndsType": "EndToEnd",
"outFilterMismatchNmax": 1,
"outFilterMultimapScoreRange": 0,
"outFilterMultimapNmax": 10,
"outFilterScoreMinOverLread": 0,
"outFilterMatchNminOverLread": 0,
"outFilterMatchNmin": 16,
"alignSJDBoverhangMin": 1000,
"alignIntronMax": 1,
"index_synapseid": "syn22342700",
"nthreads": 1,
"synapse_parentid": "syn22152380",
"synapse_config": {
"class": "File",
"path": "/tmp/.synapseConfig"
},
"synapseid": [
"syn22351902"
]
}
15 changes: 15 additions & 0 deletions jobs/test-single-mirna/options.json
@@ -0,0 +1,15 @@
{
"zone": "us-east-1a",
"cluster_name": "rna-seq-reprocessing-scicomp-toil-cluster-v001",
"tmpdir": "/var/lib/toil",
"run_name": "tst",
"log_level": "INFO",
"retry_count": "3",
"target_time": "1",
"default_disk": "450G",
"max_nodes": "5",
"node_types": "m5.4xlarge",
"node_storage": "500",
"preemptable_compensation": "0.5",
"rescue_frequency": "9000"
}