Supplementary Table 3. Impact metrics of popular bioinformatics tools and resources. Only software that is developed on GitHub, has over 50 stars, and has been published in a peer-reviewed journal was selected.
Name | Description | GitHub | Stars | Watchers | Forks | DOI | Journal | Year | Altmetrics | ImpactFactor | CiteScore | Citations |
---|---|---|---|---|---|---|---|---|---|---|---|---|
samtools | Tools written in C using htslib for manipulating next-generation sequencing data | https://github.com/samtools/samtools | 679 | 110 | 366 | 10.1093/bioinformatics/btp352 | Bioinformatics | 2009 | 72.53 | 5.481 | 7.84 | 12191 |
bwa | Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment) | https://github.com/lh3/bwa | 613 | 118 | 321 | 10.1093/bioinformatics/btp324 | Bioinformatics | 2009 | 43.358 | 5.481 | 7.84 | 11185 |
STAR | RNA-seq aligner | https://github.com/alexdobin/STAR | 581 | 89 | 201 | 10.1093/bioinformatics/bts635 | Bioinformatics | 2012 | 95.74 | 5.481 | 7.84 | 3491 |
ranger | A Fast Implementation of Random Forests | https://github.com/imbs-hl/ranger | 359 | 42 | 94 | 10.18637/jss.v077.i01 | Journal of Statistical Software | 2016 | 47.35 | 22.737 | 16.32 | 47 |
trinityrnaseq | Trinity RNA-Seq de novo transcriptome assembly | https://github.com/trinityrnaseq/trinityrnaseq | 319 | 61 | 193 | 10.1038/nbt.1883 | Nature Biotechnology | 2011 | 40.096 | 35.724 | 12.94 | 5436 |
seurat | R toolkit for single cell genomics | https://github.com/satijalab/seurat | 308 | 55 | 202 | 10.1038/nbt.4096 | Nature Biotechnology | 2018 | 318.54 | 35.724 | 12.94 | 56 |
MACS | MACS – Model-based Analysis of ChIP-Seq | https://github.com/taoliu/MACS | 270 | 52 | 168 | 10.1186/gb-2008-9-9-r137 | Genome Biology | 2007 | 15 | 13.214 | 12.66 | 3219 |
canu | A single molecule sequence assembler for genomes large and small | https://github.com/marbl/canu | 253 | 52 | 75 | 10.1101/gr.215087.116 | Genome Research | 2017 | 89.85 | 10.101 | 11.65 | 305 |
gemini | a lightweight db framework for exploring genetic variation | https://github.com/arq5x/gemini | 235 | 46 | 107 | 10.1371/journal.pcbi.1003153 | PLoS Computational Biology | 2013 | 38.984 | 3.955 | 4.49 | 120 |
bowtie2 | A fast and sensitive gapped read aligner | https://github.com/BenLangmead/bowtie2 | 200 | 30 | 70 | 10.1038/nmeth.1923 | Nature Methods | 2012 | 77.088 | 26.919 | 13.07 | 8386 |
vcftools | A set of tools written in Perl and C for working with VCF files such as those generated by the 1000 Genomes Project | https://github.com/vcftools/vcftools | 198 | 27 | 84 | 10.1093/bioinformatics/btr330 | Bioinformatics | 2011 | 30.08 | 5.481 | 7.84 | 1864 |
sga | de novo sequence assembler using string graphs | https://github.com/jts/sga | 184 | 34 | 74 | 10.1101/gr.126953.111 | Genome Research | 2011 | 36.756 | 10.101 | 11.65 | 312 |
velvet | Short read de novo assembler using de Bruijn graphs | https://github.com/dzerbino/velvet | 182 | 24 | 75 | 10.1371/journal.pone.0008407 | PLoS ONE | 2009 | 6.5 | 2.766 | 3.01 | 119 |
hisat2 | Graph-based alignment (Hierarchical Graph FM index) | https://github.com/infphilo/hisat2 | 182 | 40 | 56 | 10.1038/nmeth.3317 | Nature Methods | 2015 | 53.416 | 26.919 | 13.07 | 898 |
bcftools | This is the official development repository for BCFtools. To compile, the develop branch of htslib is needed: git clone --branch=develop git://github.com/samtools/htslib.git htslib | https://github.com/samtools/bcftools | 180 | 52 | 115 | 10.1093/bioinformatics/btw044 | Bioinformatics | 2016 | 8.25 | 5.481 | 7.84 | 28 |
cufflinks | | https://github.com/cole-trapnell-lab/cufflinks | 174 | 41 | 94 | 10.1038/nbt.1621 | Nature Biotechnology | 2010 | 44.434 | 35.724 | 12.94 | 5203 |
vcfanno | annotate a VCF with other VCFs/BEDs/tabixed files | https://github.com/brentp/vcfanno | 170 | 21 | 29 | 10.1186/s13059-016-0973-5 | Genome Biology | 2016 | 10.7 | 13.214 | 12.66 | 12 |
giggle | Interval data structure | https://github.com/ryanlayer/giggle | 159 | 20 | 19 | 10.1038/nmeth.4556 | Nature Methods | 2018 | 102.35 | 26.919 | 13.07 | 2 |
Basset | Convolutional neural network analysis for predicting DNA sequence activity | https://github.com/davek44/Basset | 156 | 22 | 63 | 10.1101/gr.200535.115 | Genome Research | 2016 | 50.43 | 10.101 | 11.65 | 95 |
lumpy-sv | lumpy: a general probabilistic framework for structural variant discovery | https://github.com/arq5x/lumpy-sv | 155 | 27 | 74 | 10.1186/gb-2014-15-6-r84 | Genome Biology | 2013 | 42.4 | 13.214 | 12.66 | 217 |
abyss | Assemble large genomes using short reads | https://github.com/bcgsc/abyss | 150 | 24 | 69 | 10.1093/bioinformatics/btp367 | Bioinformatics | 2009 | 3 | 5.481 | 7.84 | 242 |
ldsc | LD Score Regression (LDSC) | https://github.com/bulik/ldsc | 147 | 23 | 81 | 10.1038/ng.3404 | Nature Genetics | 2015 | 52.31 | 27.125 | 21.12 | 243 |
mothur | Welcome to the mothur project, initiated by Dr. Patrick Schloss and his software development team in the Department of Microbiology & Immunology at The University of Michigan. This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community | https://github.com/mothur/mothur | 145 | 33 | 70 | 10.1128/aem.01541-09 | Applied & Environmental Microbiology | 2009 | 23.25 | 3.633 | 3.99 | 7535 |
delly | DELLY2: Structural variant discovery by integrated paired-end and split-read analysis | https://github.com/dellytools/delly | 136 | 35 | 62 | 10.1093/bioinformatics/bts378 | Bioinformatics | 2012 | 23.5 | 5.481 | 7.84 | 381 |
qiime2 | Official repository for the QIIME 2 framework | https://github.com/qiime2/qiime2 | 122 | 35 | 82 | 10.1038/nmeth.f.303 | Nature Methods | 2010 | 44.208 | 26.919 | 13.07 | 9982 |
mummer | Mummer alignment tool | https://github.com/mummer4/mummer | 121 | 22 | 34 | 10.1371/journal.pcbi.1005944 | PLoS Computational Biology | 2018 | 105.7 | 3.955 | 4.49 | 16 |
monocle-release | | https://github.com/cole-trapnell-lab/monocle-release | 112 | 33 | 54 | 10.1038/nbt.2859 | Nature Biotechnology | 2014 | 78.296 | 35.724 | 12.94 | 501 |
HiC-Pro | HiC-Pro: An optimized and flexible pipeline for Hi-C data processing | https://github.com/nservant/HiC-Pro | 101 | 23 | 71 | 10.1186/s13059-015-0831-x | Genome Biology | 2015 | 10.8 | 13.214 | 12.66 | 88 |
clinvar | This repo provides tools to convert ClinVar data into a tab-delimited flat file and also provides that resulting tab-delimited flat file | https://github.com/macarthur-lab/clinvar | 98 | 42 | 43 | 10.1093/nar/gkx1153 | Nucleic Acids Research | 2017 | 13.53 | 11.561 | 10.84 | 39 |
ballgown | Bioconductor package ballgown (devel version): isoform-level differential expression analysis in R | https://github.com/alyssafrazee/ballgown | 95 | 23 | 49 | 10.1038/nbt.3172 | Nature Biotechnology | 2015 | 47.508 | 35.724 | 12.94 | 67 |
DanQ | A hybrid convolutional and recurrent neural network for predicting the function of DNA sequences | https://github.com/uci-cbcl/DanQ | 94 | 20 | 43 | 10.1093/nar/gkw226 | Nucleic Acids Research | 2016 | 3.25 | 11.561 | 10.84 | 52 |
stringtie | Transcript assembly and quantification for RNA-Seq | https://github.com/gpertea/stringtie | 90 | 19 | 24 | 10.1038/nprot.2016.095 | Nature Protocols | 2016 | 81.946 | 12.423 | 10.98 | 208 |
scLVM | scLVM is a modelling framework for single-cell RNA-seq data that can be used to dissect the observed heterogeneity into different sources thereby allowing for the correction of confounding sources of variation | https://github.com/PMBio/scLVM | 83 | 23 | 38 | 10.1038/nbt.3102 | Nature Biotechnology | 2015 | 172.016 | 35.724 | 12.94 | 326 |
CNVnator | a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads | https://github.com/abyzovlab/CNVnator | 76 | 12 | 33 | 10.1101/gr.114876.110 | Genome Research | 2011 | 27.836 | 10.101 | 11.65 | 429 |
SIMLR | Implementations in both Matlab and R of the SIMLR method. The manuscript of the method is available at https://www.nature.com/articles/nmeth.4207 | https://github.com/BatzoglouLabSU/SIMLR | 65 | 18 | 36 | 10.1038/nmeth.4207 | Nature Methods | 2017 | 47.25 | 26.919 | 13.07 | 42 |
SnpEff | | https://github.com/pcingola/SnpEff | 64 | 22 | 38 | 10.4161/fly.19695 | Fly | 2014 | 9.5 | 1.218 | 1.27 | 1785 |
Artemis | Artemis is a free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence and its six-frame translation | https://github.com/sanger-pathogens/Artemis | 63 | 13 | 33 | 10.1093/bioinformatics/btr703 | Bioinformatics | 2011 | 26.334 | 5.481 | 7.84 | 271 |
MAST | Tools and methods for analysis of single cell assay data in R | https://github.com/RGLab/MAST | 62 | 9 | 28 | 10.1186/s13059-015-0844-5 | Genome Biology | 2015 | 48.806 | 13.214 | 12.66 | 126 |
ZIFA | Zero-inflated dimensionality reduction algorithm for single-cell data | https://github.com/epierson9/ZIFA | 61 | 8 | 23 | 10.1186/s13059-015-0805-z | Genome Biology | 2015 | 41.676 | 13.214 | 12.66 | 95 |
svtyper | Bayesian genotyper for structural variants | https://github.com/hall-lab/svtyper | 61 | 11 | 26 | 10.1038/nmeth.3505 | Nature Methods | 2015 | 152.438 | 26.919 | 13.07 | 58 |
deconstructSigs | deconstructSigs | https://github.com/raerose01/deconstructSigs | 60 | 10 | 19 | 10.1186/s13059-016-0893-4 | Genome Biology | 2016 | 22.65 | 13.214 | 12.66 | 120 |
sciclone | An R package for inferring the subclonal architecture of tumors | https://github.com/genome/sciclone | 59 | 48 | 35 | 10.1371/journal.pcbi.1003665 | PLoS Computational Biology | 2014 | 31.912 | 3.955 | 4.49 | 103 |
htseq | HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments | https://github.com/simon-anders/htseq | 59 | 9 | 37 | 10.1093/bioinformatics/btu638 | Bioinformatics | 2014 | 55.346 | 5.481 | 7.84 | 3084 |
circlator | A tool to circularize genome assemblies | https://github.com/sanger-pathogens/circlator | 58 | 17 | 20 | 10.1186/s13059-015-0849-0 | Genome Biology | 2015 | 50.358 | 13.214 | 12.66 | 116 |
clonevol | Inferring and visualizing clonal evolution in multi-sample cancer sequencing | https://github.com/hdng/clonevol | 56 | 8 | 25 | 10.1093/annonc/mdx517 | Annals of Oncology | 2017 | 15.65 | 13.926 | 8.97 | 7 |
methylKit | R package for DNA methylation analysis | https://github.com/al2na/methylKit | 55 | 15 | 64 | 10.1186/gb-2012-13-10-r87 | Genome Biology | 2011 | 23.35 | 13.214 | 12.66 | 283 |
PhenoGraph | Subpopulation detection in high-dimensional single-cell data | https://github.com/jacoblevine/PhenoGraph | 53 | 7 | 25 | 10.1016/j.cell.2015.05.047 | Cell | 2015 | 32.588 | 31.398 | 21.99 | 217 |
TADbit | TADbit is a complete Python library to deal with all steps to analyze, model and explore 3C-based data. With TADbit the user can map FASTQ files to obtain raw interaction binned matrices (Hi-C like matrices), normalize and correct interaction matrices, identify and compare the so-called Topologically Associating Domains (TADs), build 3D models from the interaction matrices, and finally extract structural properties from the models. TADbit is complemented by TADkit for visualizing 3D models | https://github.com/3DGenomes/TADbit | 51 | 15 | 48 | 10.1371/journal.pcbi.1005665 | PLoS Computational Biology | 2017 | 18.4 | 3.955 | 4.49 | 22 |
weblogo | WebLogo 3: Sequence Logos redrawn | https://github.com/WebLogo/weblogo | 51 | 10 | 20 | 10.1101/gr.849004 | Genome Research | 2004 | 9.5 | 10.101 | 11.65 | 4467 |
targetfinder | | https://github.com/shwhalen/targetfinder | 50 | 10 | 15 | 10.1038/ng.3539 | Nature Genetics | 2016 | 194.74 | 27.125 | 21.12 | 79 |