Remove code duplication in main workflow #3

Merged on Oct 27, 2023 (19 commits)
6 changes: 3 additions & 3 deletions docs/configuration_pipeline.md
@@ -162,7 +162,7 @@ There are mainly two cases in which the user might want to alter the internal MI

### Modification of the motif mapping file for the locus-based mode of maize

By default, the maize MINI-AC locus-based mode (for both genome versions) runs on the "medium" non-coding genomic space, which corresponds, for each locus in the genome, to the 5kb upstream of the translation start site, the 1kb downstream of the translation end site, and the introns. However, we generated two additional motif mapping files for the locus-based mode of maize, that cover "large" (15kb upstream of the translation start site, the 2.5kb downstream of the translation end site, and the introns), and "small" (1kb upstream of the translation start site, the 1kb downstream of the translation end site, and the introns) non-coding genomic spaces. For Arabidopsis only the "medium" non-coding genomic space motif mapping file was generated because it already covers 73.5% of the whole non-coding genomic space (see publication). To use these files, first they need to be downloaded, and then, the corresponding parameters of the motif mapping file (```MotMapsFile_lb```) and the non-coding genomic space coordinates file (```Promoter_file```) should be modified either on the command line or in the configuration file.
By default, the maize MINI-AC locus-based mode (for both genome versions) runs on the "medium" non-coding genomic space, which corresponds, for each locus in the genome, to the 5kb upstream of the translation start site, the 1kb downstream of the translation end site, and the introns. However, we generated two additional motif mapping files for the locus-based mode of maize, that cover "large" (15kb upstream of the translation start site, the 2.5kb downstream of the translation end site, and the introns), and "small" (1kb upstream of the translation start site, the 1kb downstream of the translation end site, and the introns) non-coding genomic spaces. For Arabidopsis only the "medium" non-coding genomic space motif mapping file was generated because it already covers 73.5% of the whole non-coding genomic space (see publication). To use these files, first they need to be downloaded, and then, the corresponding parameters of the motif mapping file (```MotMapsFile```) and the non-coding genomic space coordinates file (```Promoter_file```) should be modified either on the command line or in the configuration file.
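
As a minimal sketch of the parameter override described above, the shell function here prints the two command-line flags for a given data prefix and window suffix. The function name `locus_based_overrides` and its arguments are hypothetical and not part of MINI-AC. The file-name pattern is an assumption inferred from the "small" (`1kbup_1kbdown`) and "medium" (`5kbup_1kbdown`) file names that appear in this diff, so it should be checked against the files actually downloaded.

```shell
#!/bin/sh
# Hypothetical helper (not part of MINI-AC): print the two locus-based
# override flags for a data prefix and a non-coding space window suffix.
# Assumption: downloaded files follow the naming pattern seen for the
# "small" and "medium" definitions; verify before relying on it.
locus_based_overrides() {
    prefix="$1"   # data subfolder and file prefix, e.g. zma_v4
    window="$2"   # window suffix, e.g. 1kbup_1kbdown ("small")
    dir="data/${prefix}"
    printf '%s %s %s %s\n' \
        "--MotMapsFile" "${dir}/${prefix}_locus_based_motif_mappings_${window}.bed" \
        "--Promoter_file" "${dir}/${prefix}_promoter_${window}_sorted.bed"
}
```

For instance, `nextflow -C mini_ac.config run mini_ac.nf --mode locus_based --species maize_v4 $(locus_based_overrides zma_v4 1kbup_1kbdown)` should expand to the same command as the "small" example in this documentation file.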

To download the maize "large" motif mapping file and coordinates of the "large" non-coding genomic space:

@@ -192,14 +192,14 @@ wget https://zenodo.org/record/8386283/files/zma_v5_promoter_1kbup_1kbdown_sorte
Then (using the "small" definition as example), change the parameters on the command line:

```
nextflow -C mini_ac.config run mini_ac.nf --mode locus_based --species maize_v4 --MotMapsFile_lb data/zma_v4/zma_v4_locus_based_motif_mappings_1kbup_1kbdown.bed --Promoter_file data/zma_v4/zma_v4_promoter_1kbup_1kbdown_sorted.bed
nextflow -C mini_ac.config run mini_ac.nf --mode locus_based --species maize_v4 --MotMapsFile data/zma_v4/zma_v4_locus_based_motif_mappings_1kbup_1kbdown.bed --Promoter_file data/zma_v4/zma_v4_promoter_1kbup_1kbdown_sorted.bed
```
or add them to the configuration file, along with the other parameters:

```nextflow
params {
/// [Other parameters...]
MotMapsFile_lb = "$projectDir/data/zma_v4/zma_v4_locus_based_motif_mappings_1kbup_1kbdown.bed"
MotMapsFile = "$projectDir/data/zma_v4/zma_v4_locus_based_motif_mappings_1kbup_1kbdown.bed"
Promoter_file = "$projectDir/data/zma_v4/zma_v4_promoter_1kbup_1kbdown_sorted.bed"
/// [Other parameters...]
}
142 changes: 33 additions & 109 deletions mini_ac.nf
@@ -10,129 +10,53 @@ workflow MINIAC {
params.Shuffle_seed = -1
params.Csv_output = false

if (params.mode == "genome_wide" && params.species == "maize_v4") {

params.MotMapsFile_gw = "$projectDir/data/zma_v4/zma_v4_genome_wide_motif_mappings.bed"
params.Non_cod_genome = "$projectDir/data/zma_v4/zma_v4_noncod_merged.bed"
params.Faix_file = "$projectDir/data/zma_v4/zma_v4.fasta.fai"
params.Motif_tf_file = "$projectDir/data/zma_v4/zma_v4_motif_TF_file.txt"
params.Genes_coords = "$projectDir/data/zma_v4/zma_v4_genes_coords_sorted.bed"
params.Feature_file = "$projectDir/data/zma_v4/zma_v4_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/zma_v4/zma_v4_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/zma_v4/maize_v4_gene_metadata_file.txt"
params.P_val = 0.1

genome_wide_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.Second_gene_annot, params.Second_gene_dist, params.MotMapsFile_gw,
params.Non_cod_genome, params.Faix_file, params.Motif_tf_file, params.Genes_coords, params.Feature_file,
params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)
// define species id used for data subfolder and data file prefix
def species
switch(params.species) {
case "arabidopsis":
species = "ath"
break
case "maize_v4":
species = "zma_v4"
break
case "maize_v5":
species = "zma_v5"
break
default:
exit 1, "MINI-AC can only be run for the species 'arabidopsis', 'maize_v4' and 'maize_v5'. Instead it got '${params.species}'."
}

else if (params.mode == "genome_wide" && params.species == "maize_v5") {
// set input data parameters shared between genome-wide and locus-based modes
params.Faix_file = "$projectDir/data/${species}/${species}.fasta.fai"
params.Motif_tf_file = "$projectDir/data/${species}/${species}_motif_TF_file.txt"
params.Feature_file = "$projectDir/data/${species}/${species}_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/${species}/${species}_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/${species}/${species}_gene_metadata_file.txt"
hdbeukel marked this conversation as resolved.

params.MotMapsFile_gw = "$projectDir/data/zma_v5/zma_v5_genome_wide_motif_mappings.bed"
params.Non_cod_genome = "$projectDir/data/zma_v5/zma_v5_noncod_merged.bed"
params.Faix_file = "$projectDir/data/zma_v5/zma_v5.fasta.fai"
params.Motif_tf_file = "$projectDir/data/zma_v5/zma_v5_motif_TF_file.txt"
params.Genes_coords = "$projectDir/data/zma_v5/zma_v5_genes_coords_sorted.bed"
params.Feature_file = "$projectDir/data/zma_v5/zma_v5_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/zma_v5/zma_v5_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/zma_v5/maize_v5_gene_metadata_file.txt"
params.P_val = 0.1
if (params.mode == "genome_wide") {

genome_wide_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.Second_gene_annot, params.Second_gene_dist, params.MotMapsFile_gw,
params.Non_cod_genome, params.Faix_file, params.Motif_tf_file, params.Genes_coords, params.Feature_file,
params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)
}

else if (params.mode == "genome_wide" && params.species == "arabidopsis") {
params.MotMapsFile = "$projectDir/data/${species}/${species}_genome_wide_motif_mappings.bed"
params.Non_cod_genome = "$projectDir/data/${species}/${species}_noncod_merged.bed"
params.Genes_coords = "$projectDir/data/${species}/${species}_genes_coords_sorted.bed"

params.MotMapsFile_gw = "$projectDir/data/ath/ath_genome_wide_motif_mappings.bed"
params.Non_cod_genome = "$projectDir/data/ath/ath_noncod_merged.bed"
params.Faix_file = "$projectDir/data/ath/ath.fasta.fai"
params.Motif_tf_file = "$projectDir/data/ath/ath_motif_TF_file.txt"
params.Genes_coords = "$projectDir/data/ath/ath_genes_coords_sorted.bed"
params.Feature_file = "$projectDir/data/ath/ath_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/ath/ath_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/ath/arabidopsis_gene_metadata_file.txt"
params.P_val = 0.1

genome_wide_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.Second_gene_annot, params.Second_gene_dist, params.MotMapsFile_gw,
params.Non_cod_genome, params.Faix_file, params.Motif_tf_file, params.Genes_coords, params.Feature_file,
params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)

}

else if (params.mode == "locus_based" && params.species == "maize_v4") {

params.MotMapsFile_lb = "$projectDir/data/zma_v4/zma_v4_locus_based_motif_mappings_5kbup_1kbdown.bed"
params.Promoter_file = "$projectDir/data/zma_v4/zma_v4_promoter_5kbup_1kbdown_sorted.bed"
params.Faix_file = "$projectDir/data/zma_v4/zma_v4.fasta.fai"
params.Motif_tf_file = "$projectDir/data/zma_v4/zma_v4_motif_TF_file.txt"
params.Feature_file = "$projectDir/data/zma_v4/zma_v4_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/zma_v4/zma_v4_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/zma_v4/maize_v4_gene_metadata_file.txt"
params.P_val = 0.01

locus_based_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.MotMapsFile_lb, params.Promoter_file, params.Faix_file, params.Motif_tf_file,
params.Feature_file, params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)

}
genome_wide_miniac(params)

} else if (params.mode == "locus_based") {

else if (params.mode == "locus_based" && params.species == "maize_v5") {
params.MotMapsFile = "$projectDir/data/${species}/${species}_locus_based_motif_mappings_5kbup_1kbdown.bed"
params.Promoter_file = "$projectDir/data/${species}/${species}_promoter_5kbup_1kbdown_sorted.bed"

params.MotMapsFile_lb = "$projectDir/data/zma_v5/zma_v5_locus_based_motif_mappings_5kbup_1kbdown.bed"
params.Promoter_file = "$projectDir/data/zma_v5/zma_v5_promoter_5kbup_1kbdown_sorted.bed"
params.Faix_file = "$projectDir/data/zma_v5/zma_v5.fasta.fai"
params.Motif_tf_file = "$projectDir/data/zma_v5/zma_v5_motif_TF_file.txt"
params.Feature_file = "$projectDir/data/zma_v5/zma_v5_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/zma_v5/zma_v5_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/zma_v5/maize_v5_gene_metadata_file.txt"
params.P_val = 0.01

locus_based_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.MotMapsFile_lb, params.Promoter_file, params.Faix_file, params.Motif_tf_file,
params.Feature_file, params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)

locus_based_miniac(params)

} else {
exit 1, "MINI-AC can only be run using the modes 'genome_wide' or 'locus_based'. Instead it got '${params.mode}'."
}

else if (params.mode == "locus_based" && params.species == "arabidopsis") {

params.MotMapsFile_lb = "$projectDir/data/ath/ath_locus_based_motif_mappings_5kbup_1kbdown.bed"
params.Promoter_file = "$projectDir/data/ath/ath_promoter_5kbup_1kbdown_sorted.bed"
params.Faix_file = "$projectDir/data/ath/ath.fasta.fai"
params.Motif_tf_file = "$projectDir/data/ath/ath_motif_TF_file.txt"
params.Feature_file = "$projectDir/data/ath/ath_go_gene_file.txt"
params.TF_fam_file = "$projectDir/data/ath/ath_TF_family_file.txt"
params.Genes_metadata = "$projectDir/data/ath/arabidopsis_gene_metadata_file.txt"
params.P_val = 0.01

locus_based_miniac(params.OutDir, params.ACR_dir, params.Filter_set_genes, params.Set_genes_dir,
params.One_filtering_set, params.DE_genes, params.DE_genes_dir, params.One_DE_set, params.P_val,
params.Bps_intersect, params.MotMapsFile_lb, params.Promoter_file, params.Faix_file, params.Motif_tf_file,
params.Feature_file, params.OBO_file, params.TF_fam_file, params.Genes_metadata, params.Shuffle_count, params.Shuffle_seed,
params.Csv_output)
}

else {
exit 1, "MINI-AC can only be run using the modes 'genome_wide' and 'locus_based', and with the species 'arabidopsis', 'maize_v4' and 'maize_v5'. Instead it got '${params.species}' and '${params.mode}' "
}
}


workflow {
MINIAC()
}
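
The refactored workflow derives every default data path from a single species prefix. As a rough sketch of that mapping, the shell functions here mirror the `switch` on `params.species` and list the default files shared by both modes, which may help when checking that a data folder matches the new `${species}_...` naming (for example `zma_v4_gene_metadata_file.txt`). The function names `species_prefix` and `shared_default_files` are hypothetical and not part of MINI-AC, and the paths are assumed to be relative to the project directory.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers (not part of MINI-AC).

# Map the --species value to the data subfolder / file prefix,
# mirroring the switch statement in the refactored workflow.
species_prefix() {
    case "$1" in
        arabidopsis) echo "ath" ;;
        maize_v4)    echo "zma_v4" ;;
        maize_v5)    echo "zma_v5" ;;
        *)
            echo "unsupported species '$1'" >&2
            return 1
            ;;
    esac
}

# Print the default data files shared by the genome-wide and
# locus-based modes, one path per line, relative to the project directory.
shared_default_files() {
    prefix="$(species_prefix "$1")" || return 1
    dir="data/${prefix}"
    for suffix in .fasta.fai _motif_TF_file.txt _go_gene_file.txt \
                  _TF_family_file.txt _gene_metadata_file.txt; do
        echo "${dir}/${prefix}${suffix}"
    done
}
```

Run from the project directory, `shared_default_files maize_v4 | while read -r f; do [ -e "$f" ] || echo "missing: $f"; done` would report any default file that has not been renamed or downloaded.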
16 changes: 8 additions & 8 deletions tests/mini_ac.nf.test
@@ -22,14 +22,14 @@ nextflow_workflow {
Shuffle_seed = 42

//// Hard code data paths
MotMapsFile_gw = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_genome_wide_motif_mappings_chr1.bed"
MotMapsFile = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_genome_wide_motif_mappings_chr1.bed"
Non_cod_genome = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_noncod_merged_chr1.bed"
Faix_file = "${baseDir}/data/zma_v4/zma_v4.fasta.fai"
Motif_tf_file = "${baseDir}/data/zma_v4/zma_v4_motif_TF_file.txt"
Genes_coords = "${baseDir}/data/zma_v4/zma_v4_genes_coords_sorted.bed"
Feature_file = "${baseDir}/data/zma_v4/zma_v4_go_gene_file.txt"
TF_fam_file = "${baseDir}/data/zma_v4/zma_v4_TF_family_file.txt"
Genes_metadata = "${baseDir}/data/zma_v4/maize_v4_gene_metadata_file.txt"
Genes_metadata = "${baseDir}/data/zma_v4/zma_v4_gene_metadata_file.txt"
OBO_file = "${baseDir}/data/ontologies/go.obo"

//// Output folder
@@ -91,13 +91,13 @@ nextflow_workflow {
Shuffle_seed = 42

//// Hard code data paths
MotMapsFile_lb = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_locus_based_motif_mappings_5kbup_1kbdown_chr1.bed"
MotMapsFile = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_locus_based_motif_mappings_5kbup_1kbdown_chr1.bed"
Promoter_file = "${baseDir}/tests/data/zma_v4_chr1/zma_v4_promoter_5kbup_1kbdown_sorted_chr1.bed"
Faix_file = "${baseDir}/data/zma_v4/zma_v4.fasta.fai"
Motif_tf_file = "${baseDir}/data/zma_v4/zma_v4_motif_TF_file.txt"
Feature_file = "${baseDir}/data/zma_v4/zma_v4_go_gene_file.txt"
TF_fam_file = "${baseDir}/data/zma_v4/zma_v4_TF_family_file.txt"
Genes_metadata = "${baseDir}/data/zma_v4/maize_v4_gene_metadata_file.txt"
Genes_metadata = "${baseDir}/data/zma_v4/zma_v4_gene_metadata_file.txt"
OBO_file = "${baseDir}/data/ontologies/go.obo"

//// Output folder
@@ -153,14 +153,14 @@ nextflow_workflow {
Shuffle_seed = 42

//// Hard code data paths
MotMapsFile_gw = "${baseDir}/data/ath/ath_genome_wide_motif_mappings.bed"
MotMapsFile = "${baseDir}/data/ath/ath_genome_wide_motif_mappings.bed"
Non_cod_genome = "${baseDir}/data/ath/ath_noncod_merged.bed"
Faix_file = "${baseDir}/data/ath/ath.fasta.fai"
Motif_tf_file = "${baseDir}/data/ath/ath_motif_TF_file.txt"
Genes_coords = "${baseDir}/data/ath/ath_genes_coords_sorted.bed"
Feature_file = "${baseDir}/data/ath/ath_go_gene_file.txt"
TF_fam_file = "${baseDir}/data/ath/ath_TF_family_file.txt"
Genes_metadata = "${baseDir}/data/ath/arabidopsis_gene_metadata_file.txt"
Genes_metadata = "${baseDir}/data/ath/ath_gene_metadata_file.txt"
OBO_file = "${baseDir}/data/ontologies/go.obo"

//// Output folder
@@ -220,13 +220,13 @@ nextflow_workflow {
Shuffle_seed = 42

//// Hard code data paths
MotMapsFile_lb = "${baseDir}/data/ath/ath_locus_based_motif_mappings_5kbup_1kbdown.bed"
MotMapsFile = "${baseDir}/data/ath/ath_locus_based_motif_mappings_5kbup_1kbdown.bed"
Promoter_file = "${baseDir}/data/ath/ath_promoter_5kbup_1kbdown_sorted.bed"
Faix_file = "${baseDir}/data/ath/ath.fasta.fai"
Motif_tf_file = "${baseDir}/data/ath/ath_motif_TF_file.txt"
Feature_file = "${baseDir}/data/ath/ath_go_gene_file.txt"
TF_fam_file = "${baseDir}/data/ath/ath_TF_family_file.txt"
Genes_metadata = "${baseDir}/data/ath/arabidopsis_gene_metadata_file.txt"
Genes_metadata = "${baseDir}/data/ath/ath_gene_metadata_file.txt"
OBO_file = "${baseDir}/data/ontologies/go.obo"

//// Output folder