TBSeqPipe is a flexible and user-friendly pipeline, built as a Snakemake workflow, for analyzing WGS data of Mycobacterium tuberculosis complex isolates. Taking Illumina WGS data as input, the workflow performs basic analysis tasks as well as downstream high-level analysis steps. TBSeqPipe generates a final summary report that integrates and presents the results from all analysis modules.
Conda serves as the package manager and is available here. If you already have conda, make sure the bioconda and conda-forge channels are added:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Detailed instructions can be found here. Quick installation:
- Install mamba first (mamba provides a faster and more robust way to install conda packages):
conda install -n base -c conda-forge mamba
- Install snakemake using mamba:
conda activate base
mamba create -c conda-forge -c bioconda -n snakemake snakemake
- Clone the TBSeqPipe repository and activate the snakemake environment:
git clone git@github.com:KevinLYW366/TBSeqPipe.git
conda activate snakemake
The suggested reference database for TBSeqPipe is MiniKraken DB_8GB, a pre-built 8 GB database constructed from complete bacterial, archaeal, and viral genomes in RefSeq.
To run the complete workflow do the following:
- Create a sample list file containing the IDs of all samples you want to analyze, one ID per line.
- Copy all FASTQ files of your samples into one directory.
- Customize the workflow based on your needs in config/configfile.yaml. Parameters in the "Required Parameters" section must be entered manually:
  - sample_list: /path/to/sample_list_file
  - data_dir: /path/to/fastq_files
  - fastq_read_id_format, fastq_suffix_format and data_dir_format: give values based on the FASTQ file directory structure and the format of the FASTQ file names
  - kraken_db: /path/to/minikraken_20171019_8GB
- Move to the directory of TBSeqPipe.
cd /path/to/TBSeqPipe
- A dry-run is recommended at first to check if everything is okay.
snakemake -r -p -n
- If no error message shows up, start the actual run (feel free to modify "-j 40", which controls the number of CPU cores used in parallel).
snakemake --use-conda -r -p -j 40
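The first two steps above can be scripted. The sketch below is not part of TBSeqPipe: `make_sample_list` is a hypothetical helper that derives one sample ID per line from the forward-read file names. It assumes paired files named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz` in a single flat directory, so adjust the suffixes to whatever naming scheme you declare in fastq_read_id_format and fastq_suffix_format.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not shipped with TBSeqPipe).
# Writes one sample ID per line to OUT_FILE, derived from the forward-read
# file names in DATA_DIR, and skips samples whose reverse-read file is missing.
# Assumed naming: <sample>_1.fastq.gz / <sample>_2.fastq.gz.
make_sample_list() {
    local data_dir="$1" out_file="$2"
    local fwd_suffix="${3:-_1.fastq.gz}" rev_suffix="${4:-_2.fastq.gz}"
    local fwd sample
    for fwd in "$data_dir"/*"$fwd_suffix"; do
        [ -e "$fwd" ] || continue   # no match: the glob stays literal, skip it
        sample="$(basename "$fwd" "$fwd_suffix")"
        if [ ! -e "$data_dir/$sample$rev_suffix" ]; then
            echo "warning: no reverse reads for $sample, skipping" >&2
            continue
        fi
        echo "$sample"
    done | sort -u > "$out_file"
}

# Demo on empty placeholder files in a temporary directory.
demo_dir="$(mktemp -d)"
touch "$demo_dir/sampleA_1.fastq.gz" "$demo_dir/sampleA_2.fastq.gz" \
      "$demo_dir/sampleB_1.fastq.gz" "$demo_dir/sampleB_2.fastq.gz" \
      "$demo_dir/sampleC_1.fastq.gz"
make_sample_list "$demo_dir" "$demo_dir/sample_list.txt"
cat "$demo_dir/sample_list.txt"
```

In real use, call the helper on your own FASTQ directory, then point sample_list at the generated file and data_dir at that directory in config/configfile.yaml.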
If the workflow is killed before Snakemake can shut down cleanly, the working directory stays locked. Once you are sure that Snakemake is no longer running (check with ps aux | grep snake), unlock the working directory:
snakemake *.snakemake --unlock
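Before unlocking, it can help to see whether lock files are actually present. The sketch below assumes that Snakemake keeps its lock files under `.snakemake/locks` inside the working directory, which is an implementation detail that may differ between versions. `has_lock_files` is a hypothetical helper, not a Snakemake command, and the lock file name in the demo is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds (exit 0) if WORKDIR/.snakemake/locks exists
# and contains at least one entry.
# Assumption: Snakemake stores its lock files in that directory.
has_lock_files() {
    local lock_dir="$1/.snakemake/locks"
    [ -d "$lock_dir" ] && [ -n "$(ls -A "$lock_dir")" ]
}

# Demo in a temporary directory that mimics a working directory.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/.snakemake/locks"
if has_lock_files "$demo_dir"; then state_before="locked"; else state_before="unlocked"; fi

# Placeholder lock file, standing in for one left behind by a killed run.
touch "$demo_dir/.snakemake/locks/0.input.lock"
if has_lock_files "$demo_dir"; then state_after="locked"; else state_after="unlocked"; fi

echo "before: $state_before, after: $state_after"
```

If the helper reports lock files for your working directory and no Snakemake process is running, the unlock command above is the appropriate next step.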
If Snakemake marked a file as incomplete after a crash, delete it and produce it again:
snakemake *.snakemake --ri
The code is available under the GNU GPLv3 license. The text and data are available under the CC-BY license.
To contact the developer or report an issue, please go to Issues.