
PanPhlAn mapping 3_0

leonarDubois edited this page Jan 20, 2021 · 5 revisions

panphlan_map.py requires bowtie2 and samtools to map metagenomic samples against the pangenome. The script must be called once for each sample file. The generated output can then be analyzed by panphlan_profile.py.

After the bowtie2 mapping, the pileup method from Samtools is called (see the Samtools documentation for details). This produces a file in pileup format that summarizes the aligned reads at each base of the sequence.

Then, for each aligned read and each gene, PanPhlAn checks whether the read aligned to a position (base) inside the gene's sequence region. If it did, the coverage value of that gene is increased by 1. The value in the output file can therefore be read as the number of reads aligned to that gene.
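
The counting rule described above can be illustrated with a small Python sketch. This is a simplified illustration, not PanPhlAn's actual code: the function name `count_gene_coverage`, the `(contig, position)` read representation, and the gene coordinates are all hypothetical and chosen only to show the idea.

```python
from collections import defaultdict


def count_gene_coverage(read_positions, gene_regions):
    """Count, for each gene, how many aligned reads fall inside its region.

    read_positions: iterable of (contig, position) pairs, one per aligned read
                    (hypothetical representation, for illustration only).
    gene_regions:   dict mapping gene_id -> (contig, start, end), inclusive.
    """
    coverage = defaultdict(int)
    for contig, pos in read_positions:
        for gene_id, (gene_contig, start, end) in gene_regions.items():
            # A read counts for a gene when it lies on the same contig
            # and its position is inside the gene's region.
            if contig == gene_contig and start <= pos <= end:
                coverage[gene_id] += 1
    return dict(coverage)


# Made-up example data: two genes on contig1, four aligned reads.
genes = {"geneA": ("contig1", 1, 100), "geneB": ("contig1", 150, 300)}
reads = [("contig1", 10), ("contig1", 90), ("contig1", 200), ("contig2", 50)]

coverage = count_gene_coverage(reads, genes)
print(coverage)  # {'geneA': 2, 'geneB': 1}
```

The read on contig2 matches no gene and is not counted, so a gene with no reads does not appear in the result.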

Mapping requires the data retrieved by panphlan_download_pangenome.py: the bowtie2 indexes and the pangenome tsv file (location of each gene on the contigs of the reference genomes).

Example:

panphlan_map.py -p Eubacterium_rectale/Eubacterium_rectale_pangenome.tsv \
                --indexes Eubacterium_rectale/Eubacterium_rectale \
                -i sample01.fastq \
                -o map_results/sample01_erectale.csv
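
Because the script has to be called once per sample, a small wrapper can help when there are many samples. The following Python sketch builds one command per sample using only the flags shown in the example above. The helper name `build_map_command`, the file `sample02.fastq`, and the output naming scheme are assumptions made for illustration.

```python
from pathlib import Path


def build_map_command(sample, pangenome, indexes, out_dir):
    """Return the panphlan_map.py argument list for one sample file."""
    # sample01.fastq -> sample01 (keep only the part before the first dot)
    name = Path(sample).name.split(".")[0]
    output = (Path(out_dir) / f"{name}_erectale.csv").as_posix()
    return [
        "panphlan_map.py",
        "-p", pangenome,
        "--indexes", indexes,
        "-i", sample,
        "-o", output,
    ]


samples = ["sample01.fastq", "sample02.fastq"]  # sample02 is hypothetical
commands = [
    build_map_command(
        s,
        "Eubacterium_rectale/Eubacterium_rectale_pangenome.tsv",
        "Eubacterium_rectale/Eubacterium_rectale",
        "map_results",
    )
    for s in samples
]

for cmd in commands:
    print(" ".join(cmd))
    # To actually run the mapping, one could use:
    # import subprocess; subprocess.run(cmd, check=True)
```

The commands are only printed here, so the sketch runs without PanPhlAn installed.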

Input

  • -i INPUT_FILE Path to a metagenomic sample. The following file formats are accepted: .fasta or .fastq, optionally compressed as .tar.gz or .tar.bz2, and .sra.
  • --indexes INDEXES Path to the Bowtie2 indexes, including the index file prefix
  • -p PANGENOME Path to the pangenome file

Output

Results are written in a tab-separated values format. The file path and name are given by -o OUTPUT. If no --output argument is provided, results are printed to STDOUT.
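
A tab-separated result file can be loaded with Python's standard `csv` module. The sketch below assumes a two-column layout (gene identifier, then coverage value) with no header line. That layout is an assumption for illustration and should be checked against a real output file. The gene names and values are made up.

```python
import csv
import io


def read_coverage(handle):
    """Read gene/coverage pairs from a tab-separated text stream."""
    gene_counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip empty or malformed lines
        gene_id, value = row[0], row[1]
        gene_counts[gene_id] = int(value)
    return gene_counts


# In-memory stand-in for a result file (made-up content).
example = io.StringIO("geneA\t12\ngeneB\t0\n")
gene_counts = read_coverage(example)
print(gene_counts)  # {'geneA': 12, 'geneB': 0}
```

For a file on disk, the same function can be given a handle from `open(path, newline="")`.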

Help -h

usage: panphlan_map.py [-h] -i INPUT --indexes INDEXES -p PANGENOME -o OUTPUT [--tmp TMP] [--bt2 BT2] [-b OUT_BAM] [--nproc NPROC] [--min_read_length MIN_READ_LENGTH] [--th_mismatches TH_MISMATCHES]
                       [-m SAM_MEMORY] [--fasta] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --tmp TMP             Location used for tmp files
  --bt2 BT2             Additional bowtie2 mapping options, separated by slash: /-D/20/-R/3/, default: -bt2 /--very-sensitive/
  -b OUT_BAM, --out_bam OUT_BAM
                        Get BAM output file
  --nproc NPROC         Maximum number of processors to use. Default is 12 or a lower number of available processors.
  --min_read_length MIN_READ_LENGTH
                        Minimum read length, default 70
  --th_mismatches TH_MISMATCHES
                        Number of mismatches to filter (bam)
  -m SAM_MEMORY, --sam_memory SAM_MEMORY
                        Maximum amount of memory for Samtools (in Gb). Default 4
  --fasta               Read are fasta format. By default considered as fastq
  -v, --verbose         Show progress information

required arguments:
  -i INPUT, --input INPUT
                        Metagenomic sample to map
  --indexes INDEXES     Bowtie2 indexes path and file prefix
  -p PANGENOME, --pangenome PANGENOME
                        Path to pangenome tsv file exported from ChocoPhlAn
  -o OUTPUT, --output OUTPUT
                        Path to output file
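
The --bt2 option in the help text above takes bowtie2 options as a slash-separated string such as /-D/20/-R/3/. As a rough illustration of that format, the following Python sketch splits such a string into an argument list. The function `split_bt2_options` is hypothetical and is not claimed to be how panphlan_map.py parses the value.

```python
def split_bt2_options(bt2):
    """Split a slash-separated option string into a list of arguments."""
    # Leading and trailing slashes yield empty strings, which are dropped.
    return [part for part in bt2.split("/") if part]


print(split_bt2_options("/-D/20/-R/3/"))        # ['-D', '20', '-R', '3']
print(split_bt2_options("/--very-sensitive/"))  # ['--very-sensitive']
```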