Releases: PombertLab/SYNY
SYNY-v1.2a
Changes
- Now generates VCF files from minimap2 genome alignments (min. alignment length = 1000 bp) automatically. VCF file creation can be turned off with the new --no_vcf flag in run_syny.pl (and/or in get_paf.pl). Note that these files can become quite large depending on the size of the genomes being compared.
- nucleotide_biases.pl now calculates GC and AT skews. Corresponding data files are located in the PLOTS/CIRCOS_DATA/ subdirectory.
- GC/AT skews are now plotted automatically with Circos. If desired, these subplots can be turned off independently with the --no_skews option, or together with all nucleotide bias subplots (with --no_ntbiases).
- Added a simple Fasta + GFF3 to GBFF converter (gff3_to_gbff.pl) in the Utils/ subdirectory. This tool was tested on NCBI GFF3 files and expects the GFF3 file(s) to include gene/mRNA/exon/CDS entries in the type column and the ID and Parent tags in the attributes column. It also expects the corresponding Fasta and GFF3 files to share the same prefixes (e.g. genome_1.fasta / genome_1.gff). The GBFF files thus created were designed to work with SYNY but do not adhere exactly to the GBFF format and may not work for other purposes.
- list_maker.pl / run_syny.pl: GenBank flat file format extensions (gbk, gb, gbf) are now recognized/accepted
- check_mp_colors.py: removed obsolete references to pylab
- Added orient_fastas_to_reference.py to the Utils/ subdirectory. This script reorients contigs in FASTA file(s) based on BLASTN homology searches against a reference. This can be useful when working with newly assembled genomes.
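For readers unfamiliar with the GC and AT skews mentioned above, here is a minimal Python sketch using the usual definitions, (G - C) / (G + C) and (A - T) / (A + T), computed over sliding windows. It is not the code from nucleotide_biases.pl; the function names, window size, and step size are assumptions chosen for illustration.

```python
from collections import Counter


def skews(sequence):
    """Return (gc_skew, at_skew) for a single stretch of sequence."""
    counts = Counter(sequence.upper())
    g, c, a, t = (counts[base] for base in "GCAT")
    # Guard against windows lacking G/C or A/T to avoid dividing by zero.
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    return gc_skew, at_skew


def sliding_skews(sequence, window=1000, step=500):
    """Yield (start, gc_skew, at_skew) for each window (sizes are assumed)."""
    last_start = max(len(sequence) - window + 1, 1)
    for start in range(0, last_start, step):
        gc_skew, at_skew = skews(sequence[start:start + window])
        yield start, gc_skew, at_skew
```

Calling `list(sliding_skews(seq, window=4, step=4))` on a short string returns one tuple per window, which could then be written out as a tab-delimited track.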
SYNY-v1.2
Bugfix release
- Fixed concatenation issue with isoforms in list_maker.pl
- Fixed subrange issues in list_maker.pl
- Adjusted linearmap alpha value and edge color for readability in linear_maps.py
- Slightly reduced memory usage with matplotlib
SYNY-v1.1b
Bugfix release
- Fixed extra length issues with barplots, dotplots and linemaps. Code was missing a line.strip(). Issue created visual artefacts on barplots (longer frames).
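As an illustration of this class of bug, the sketch below shows how omitting `line.strip()` can inflate a computed length: each unstripped line contributes its trailing newline. This is a hypothetical reconstruction, not the actual SYNY plotting code; the function name and the FASTA-style input are assumptions.

```python
def sequence_length(lines, strip=True):
    """Sum the lengths of sequence lines, skipping '>' header lines."""
    total = 0
    for line in lines:
        if line.startswith(">"):
            continue
        # Without strip(), the trailing newline is counted as one extra character.
        total += len(line.strip()) if strip else len(line)
    return total


# Hypothetical input: one header and two 4-bp sequence lines.
fasta_lines = [">contig_1\n", "ACGT\n", "ACGT\n"]
correct = sequence_length(fasta_lines)                 # newlines removed
inflated = sequence_length(fasta_lines, strip=False)   # one extra per line
```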
Code cleanup:
- Added --version option for all scripts.
- Minor code cleanup / standardisation across scripts
SYNY-v1.1a
Changes
Additions
- Added the --include option to select contigs by name from text file(s); one name per line
- Added the --ranges option to select contig subranges from text file(s); name start end
- Added the --bpmode option to generate pairwise (pair) and/or concatenated (cat) barplots. Possible values are pair (default), cat, and all (for both).
- Added the --bclusters option to color clusters by alternating colors in the barplots. The colors are not related within or between contigs, they are just used to highlight collinear chunks.
- Created check_versions.pl to summarize script versions; this information can now be displayed with run_syny.pl --version.
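The two text file layouts described for --include (one name per line) and --ranges (name start end) can be read with a few lines of Python. The sketch below is illustrative only: it assumes whitespace-separated fields, the function names are hypothetical, and the contig names are made up. SYNY's own parser may behave differently.

```python
def read_include(lines):
    """Return contig names, one per non-empty line."""
    return [line.strip() for line in lines if line.strip()]


def read_ranges(lines):
    """Return (name, start, end) tuples from 'name start end' lines."""
    ranges = []
    for line in lines:
        fields = line.split()  # assumed: whitespace-separated columns
        if not fields:
            continue
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        ranges.append((name, start, end))
    return ranges


# Hypothetical file contents.
include_names = read_include(["contig_1\n", "\n", "contig_7\n"])
subranges = read_ranges(["contig_1 1 50000\n", "contig_7\t1000\t2000\n"])
```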
Bugfixes
- list_maker.pl now grabs GeneID tags if locus tags are absent from GBFF annotation files.
- Fixed .txt file extension + added a file size check to paf_metrics.py. Now skips plotting if file is empty.
- Fixed division-by-zero issue in nucleotide_biases.pl.
- Added a check to detect if annotations parsed are blank. run_syny.pl no longer crashes if annotations are blank when running gene cluster inferences. If blank, it now skips this section automatically.
- Fixed perl env shebangs causing issues with conda
- Fixed wrong exit codes with readmes
Readme / logs
- Added section about memory usage with genome alignments
- Added mashmap barplot examples in the Encephalitozoon section
- Added changes.md summarizing changes between versions
- Improved syny.log file.
SYNY-v1.1
Additions:
- SYNY now generates linear maps (aka linemaps) from PAF files with linear_maps.py.
- Added support for MashMap3 genome alignments. Mashmap can be selected instead of minimap with --aligner mashmap. It runs in a smaller memory footprint than minimap (if using its default percentage identity of 85%). It does not produce exact alignments, however.
- Added the option to exclude contigs by name matching regular expression(s): e.g. --exclude '^AUX' '^CPGT'.
- Added an alternate SYNY installation method that does not require sudo privileges by leveraging conda packages.
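To show what excluding contigs by regular expression means in practice, here is a small Python sketch that applies the two patterns from the example above to a list of names. The function name and the contig names are hypothetical, and SYNY's Perl implementation may match names differently.

```python
import re


def exclude_contigs(names, patterns):
    """Drop every name that matches at least one regular expression."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        name for name in names
        if not any(regex.search(name) for regex in compiled)
    ]


# Hypothetical contig names; only those starting with AUX or CPGT are dropped.
contigs = ["AUX01", "CPGT0001", "CP001", "chr1"]
kept = exclude_contigs(contigs, ["^AUX", "^CPGT"])
```

Because the patterns are anchored with `^`, a name such as CP001 is kept: it starts with CP but not with CPGT.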
Fixes:
- Fixed the "The number of annotation files (2) does not equal the number of protein files (1)" error => rewrote the corresponding segment and removed the obsolete subroutine.
- Fixed the unreliable $diamond_check in get_homology.pl (i.e. replaced which by command -v).
- Changed Perl dependency Roman => Text::Roman in nucleotide_biases.pl.
SYNY-v1.0b
Changes:
- run_syny.pl options can now be set from a configuration file (requires Getopt::ArgvFile); e.g. run_syny.pl @commands.conf
- Added the Getopt::ArgvFile dependency to setup_syny.pl => sudo cpanm Getopt::ArgvFile
- Added a minimum contig size option + set defaults to all contigs, i.e. (--minsize 1)
- Added a matplotlib color palette check before computations so that plots won't crash if the color palette entered does not exist
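The @file convention used with run_syny.pl comes from the Perl module Getopt::ArgvFile. Python's argparse offers a comparable mechanism through `fromfile_prefix_chars`, sketched below purely as an analogy. This is not SYNY code, the helper names are hypothetical, and argparse expects one argument per line, so the file layout may not match what Getopt::ArgvFile accepts.

```python
import argparse
import os
import tempfile


def build_parser():
    """Parser that expands '@file' arguments into the options listed in the file."""
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    # --minsize mirrors the option named above; its default of 1 keeps all contigs.
    parser.add_argument("--minsize", type=int, default=1)
    return parser


def parse_from_file(text):
    """Write text to a temporary config file and parse it via '@path'."""
    with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as handle:
        handle.write(text)
        path = handle.name
    try:
        return build_parser().parse_args(["@" + path])
    finally:
        os.remove(path)


# One argument per line, as argparse requires by default.
options = parse_from_file("--minsize\n1000\n")
```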
SYNY-v1.0a
Changes:
- Added --hfsize, --hmin, --hmax and --hauto options to heatmaps
- Added more options to the Circos --labels command line switch. Possible values are now: mixed, roman, arabic and names
- Added --pthreads option to set the limit of plotting instances to run in parallel (in case each plot eats up too much RAM); defaults to the value set by --threads if omitted.
- Added SVG output to paf_metrics.py
- Set fonts as editable in SVG output files
- Removed unnecessary border frames from barplots
- Fixed ambiguous heatmap titles
- Added an example script (Arabidopsis.sh) in Examples/ to download two Arabidopsis genomes (~ 100-150 Mbp each) for testing purposes
SYNY-v1.0
Stable release
Changes:
Misc:
- Fixed output directory bug in run_syny.pl when using a deep tree
- Fixed abs_path() issue in setup_syny.pl that caused incomplete paths in the output configuration file
- Created check_mp_colors.py to list/plot color palettes available on the system (Fedora 40/Ubuntu 22.04 matplotlib palettes are not the same - 170 vs. 166) + added color palette plot (Images/python_color_palettes.png)
- Fixed out of bounds barplot legends
- Added font size options --bfsize/--dfsize for barplots/dotplots
Circos:
- Contigs from the reference genome are now visually distinct and are labelled by roman numerals. Other contigs are labelled by arabic numerals.
- Added --orientation option (possible values: normal, inverted, both) + removed the now obsolete --no_invert/--no_normal options
- Added --no_cticks option to disable ticks in Circos plots.
- Added --no_ntbiases option to disable nucleotide bias subplots.
- Changed the default Circos plot mode to pairwise (--circos pair); concatenated plots can take a while to compute and are not always useful.
- Circos figures are now plotted in --orientation normal by default instead of both normal/inverted => less wasteful.
- Renamed the .genotype files generated by SYNY as .karyotype to match the nomenclature used by Circos
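The roman/arabic labelling scheme described above for reference and non-reference contigs can be sketched as follows. This is an illustrative Python version with hypothetical function names; SYNY itself relies on Perl modules for the conversion, and its label format may differ.

```python
def to_roman(number):
    """Convert an integer from 1 to 3999 into a roman numeral."""
    if not 0 < number < 4000:
        raise ValueError("roman numerals are defined here for 1..3999")
    symbols = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in symbols:
        # Take as many copies of this symbol as fit, then carry the remainder.
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def label_contigs(n_contigs, is_reference):
    """Label contigs 1..n with roman numerals (reference) or arabic numerals."""
    return [
        to_roman(index) if is_reference else str(index)
        for index in range(1, n_contigs + 1)
    ]
```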
SYNY-v0.9e
Cleanup release:
- Fixed a bug that crashed nucleotide_biases.pl when the reference entered was not found. Now uses the first sequence alphabetically if the ref entered is not found.
- Created fasta_to_gbff.pl to convert FASTA sequences to GBFF files (without annotations); useful to compare newly assembled genomes using pairwise alignments
- Added Alignments, Clusters, Plots, and Utils subdirs to the git repository and moved scripts/data accordingly
- Added shell scripts to download the example annotation data from NCBI
- Improved/cleaned up README
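The reference fallback described in the first item above (use the first sequence alphabetically when the requested reference is missing) amounts to a short lookup. The Python sketch below is an assumption about the logic, not a port of nucleotide_biases.pl; the function and genome names are hypothetical.

```python
def pick_reference(requested, sequence_names):
    """Return the requested reference if present, else the first name alphabetically."""
    if not sequence_names:
        raise ValueError("no sequences available")
    if requested in sequence_names:
        return requested
    # Fallback: plain string sort, first entry wins.
    return sorted(sequence_names)[0]


# Hypothetical genome names.
genomes = ["genome_B", "genome_A", "genome_C"]
found = pick_reference("genome_C", genomes)
fallback = pick_reference("missing_ref", genomes)
```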
SYNY-v0.9d
Cleanup release:
- Sanitized output directory:
  - Regrouped subdirs by analysis (ALIGNMENTS/, CLUSTERS/) and moved content accordingly
  - Created PLOTS/ subdir and moved all plots therein
  - Renamed the CIRCOS data folder as CIRCOS_DATA/ for greater clarity
  - Created SEQUENCES/ subdir to store genome and protein fasta files
- Restructured/cleaned up run_syny.pl
- Improved the output log (syny.log)