PanPhlAn mapping 3_0
panphlan_map.py requires bowtie2 and samtools to map metagenomic samples against the pangenome. The script must be called once for each sample file. The generated output can then be analyzed with panphlan_profile.py.
After the bowtie2 mapping, the Samtools pileup step is run (see the Samtools documentation for details). This produces a file in pileup format that summarizes the aligned reads at each base of the sequence.
Then, for each aligned read, PanPhlAn checks whether the read was aligned to a position (base) that lies within a gene's sequence region. If so, it increases the coverage value of that gene by 1. The value in the output file can therefore be read as the number of reads aligned to that gene.
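The counting rule described above can be illustrated with a minimal sketch. This is not PanPhlAn's actual code: the function name count_gene_coverage, the tuple layouts, and the gene and contig names are assumptions made for illustration only, and each read is reduced to a single aligned position.

```python
from collections import defaultdict


def count_gene_coverage(genes, read_positions):
    """Count, for each gene, how many aligned positions fall inside it.

    genes: list of (gene_id, contig, start, stop), 1-based inclusive
           (hypothetical layout, loosely modelled on a pangenome table).
    read_positions: iterable of (contig, position) pairs.
    """
    # Group gene regions by contig so each read is only compared
    # with genes located on the contig it was aligned to.
    regions_by_contig = defaultdict(list)
    for gene_id, contig, start, stop in genes:
        regions_by_contig[contig].append((start, stop, gene_id))

    coverage = {gene_id: 0 for gene_id, _, _, _ in genes}
    for contig, position in read_positions:
        for start, stop, gene_id in regions_by_contig.get(contig, []):
            if start <= position <= stop:
                coverage[gene_id] += 1
    return coverage


# Toy data, invented for this example.
genes = [
    ("geneA", "contig1", 1, 100),
    ("geneB", "contig1", 150, 300),
    ("geneC", "contig2", 1, 50),
]
reads = [
    ("contig1", 10),
    ("contig1", 99),
    ("contig1", 120),  # falls between geneA and geneB
    ("contig1", 200),
    ("contig2", 60),   # past the end of geneC
]
coverage = count_gene_coverage(genes, reads)
print(coverage)
```

In this toy case geneA receives two reads, geneB one, and geneC none, which mirrors the way the per-gene value in the output file can be interpreted.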
Mapping requires data retrieved using panphlan_download_pangenome.py: the bowtie2 indexes and the pangenome tsv file (location of the genes on the contigs of the reference genomes).
Example:
panphlan_map.py -p Eubacterium_rectale/Eubacterium_rectale_pangenome.tsv \
--indexes Eubacterium_rectale/Eubacterium_rectale \
-i sample01.fastq \
-o map_results/sample01_erectale.csv
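Because panphlan_map.py has to be called once per sample, a small wrapper can help when there are many samples. The following is a sketch only: the helper build_map_command, the second sample name, and the output naming scheme are assumptions, and it uses only the -p, --indexes, -i and -o options shown in the example above.

```python
from pathlib import PurePosixPath


def build_map_command(sample, pangenome, indexes, out_dir, suffix=""):
    """Build the panphlan_map.py argument list for one sample file."""
    # Use the file name up to the first dot as the sample name
    # (hypothetical convention, e.g. sample01.fastq -> sample01).
    stem = PurePosixPath(sample).name.split(".")[0]
    output = PurePosixPath(out_dir) / f"{stem}{suffix}.csv"
    return [
        "panphlan_map.py",
        "-p", str(pangenome),
        "--indexes", str(indexes),
        "-i", str(sample),
        "-o", str(output),
    ]


# sample02.fastq is an invented second sample for illustration.
samples = ["sample01.fastq", "sample02.fastq"]
commands = [
    build_map_command(
        sample,
        pangenome="Eubacterium_rectale/Eubacterium_rectale_pangenome.tsv",
        indexes="Eubacterium_rectale/Eubacterium_rectale",
        out_dir="map_results",
        suffix="_erectale",
    )
    for sample in samples
]

for command in commands:
    print(" ".join(command))
    # To actually run the mapping, one could call
    # subprocess.run(command, check=True) here (requires PanPhlAn,
    # bowtie2 and samtools to be installed).
```

Building the commands as lists rather than shell strings avoids quoting problems if a sample path contains spaces.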
- -i INPUT_FILE
Input path to a metagenomic sample. The following file formats are accepted: .fasta or .fastq, with compression .tar.gz or .tar.bz2, and .sra.
- --indexes INDEXES
Path to the Bowtie2 indexes and indexes prefix.
- -p PANGENOME
Path to the pangenome file.
Results are written in tab-separated values format. The file path and name are given by -o OUTPUT. If no --output argument is provided, results are printed to STDOUT.
usage: panphlan_map.py [-h] -i INPUT --indexes INDEXES -p PANGENOME -o OUTPUT [--tmp TMP] [--bt2 BT2] [-b OUT_BAM] [--nproc NPROC] [--min_read_length MIN_READ_LENGTH] [--th_mismatches TH_MISMATCHES]
[-m SAM_MEMORY] [--fasta] [-v]
optional arguments:
-h, --help show this help message and exit
--tmp TMP Location used for tmp files
--bt2 BT2 Additional bowtie2 mapping options, separated by slash: /-D/20/-R/3/, default: -bt2 /--very-sensitive/
-b OUT_BAM, --out_bam OUT_BAM
Get BAM output file
--nproc NPROC Maximum number of processors to use. Default is 12 or a lower number of available processors.
--min_read_length MIN_READ_LENGTH
Minimum read length, default 70
--th_mismatches TH_MISMATCHES
Number of mismatches to filter (bam)
-m SAM_MEMORY, --sam_memory SAM_MEMORY
Maximum amount of memory for Samtools (in Gb). Default 4
--fasta Read are fasta format. By default considered as fastq
-v, --verbose Show progress information
required arguments:
-i INPUT, --input INPUT
Metagenomic sample to map
--indexes INDEXES Bowtie2 indexes path and file prefix
-p PANGENOME, --pangenome PANGENOME
Path to pangenome tsv file exported from ChocoPhlAn
-o OUTPUT, --output OUTPUT
Path to output file
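The --bt2 option expects additional bowtie2 options joined by slashes, as in /-D/20/-R/3/. As a small convenience, such a value could be assembled from a normal list of options. The helper format_bt2_options below is hypothetical and is not part of PanPhlAn; it only reproduces the string format shown in the help text.

```python
def format_bt2_options(options):
    """Join bowtie2 options into the slash-separated form used by --bt2."""
    if not options:
        raise ValueError("at least one bowtie2 option is required")
    for option in options:
        # A slash inside an option would be ambiguous in this format.
        if "/" in option:
            raise ValueError(f"option must not contain '/': {option!r}")
    return "/" + "/".join(options) + "/"


bt2_value = format_bt2_options(["-D", "20", "-R", "3"])
print(bt2_value)
```

The resulting string can then be passed as the value of --bt2 on the panphlan_map.py command line.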
PanPhlAn is a project of the Computational Metagenomics Lab at CIBIO, University of Trento, Italy.