Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rsamtools::pileup counts overlapping paired-end reads twice #13

Open
alkodsi opened this issue Oct 10, 2019 · 5 comments
Open

Rsamtools::pileup counts overlapping paired-end reads twice #13

alkodsi opened this issue Oct 10, 2019 · 5 comments

Comments

@alkodsi
Copy link

alkodsi commented Oct 10, 2019

Is there a way to disable this behavior?

@mtmorgan
Copy link
Collaborator

can you clarify the problem and indicate the code you are using?

@alkodsi
Copy link
Author

alkodsi commented Oct 10, 2019

The problem is when paired end reads are overlapping, such as the example below:

Read1 (first pair):  ------------------ACGT
Read1 (second pair):                   ACGT--------------------

the pileup function will count the overlapping parts twice although they belong to the same fragment.

A related old issue in samtools:
https://www.biostars.org/p/87299/

The code I used:

gr <- GenomicRanges::GRanges(chr, IRanges::IRanges(pos, pos))
sbp <- Rsamtools::ScanBamParam(which = gr)
pileupParam <- Rsamtools::PileupParam(max_depth = 100000, min_base_quality = 20,  min_mapq = 30, distinguish_strands = F, include_deletions = F, include_insertions = F)
    
p <- Rsamtools::pileup(bam, scanBamParam = sbp, pileupParam = pileupParam)

@veseshan
Copy link

veseshan commented Oct 8, 2021

I am using Rsamtools v2.8.0 and pileup still double counts the overlapping section of read pairs.
Can I request that it mimic the behavior of "samtools mpileup" which by default uses overlap detection and counts it only once.

@hpages
Copy link
Contributor

hpages commented Oct 8, 2021

Hi,

Rsamtools::pileup() was implemented years ago (in 2014) by a Bioconductor core team member (Nathaniel Hayden) who has left the team since then. It was a significant effort i.e. Nathaniel spent a couple of months on it and implemented most of it in C/C++. Unfortunately the core team does not have the resources at the moment to fix the double counting problem but a PR would be welcome.

Best,
H.

@veseshan
Copy link

Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants