duphold use with gVCF files #41
Hi @brentp, just following up on this. Do you think it would be possible to implement such a feature in the near future?
You mean still supplying a CRAM/BAM, but evaluating every site in a gVCF?
Hi @brentp, the CRAM/BAM file is used for coverage-based QC (DHFC, DHBFC, DHFFC), while the BCF (multi-sample or single-sample) is used for SNP/indel annotation, i.e. computing DHGT. For SNP/indel annotation, I would suggest that duphold support gVCF input, i.e. files in which runs of non-variant sites (for example 1:14605-14609 and 1:14611-14652) are each represented as a single block to save disk space, followed by variant-only sites such as 1:14653.
1) At present, if I want to use this non-variant site information to compute DHGT for an SV overlapping these blocks, I have to convert the gVCF files to regular VCF files, which takes up a lot of space.
2) A variant-sites-only VCF (single- or multi-sample) on its own may not be a sufficient estimator of the quality of an SV deletion, e.g. when rejecting a deletion if we find N het calls within it.
3) Using the stretches of non-variant sites from a gVCF to accurately count homozygous-reference sites would complete SNP/indel-based filtering of SV deletions.
Of course, we can infer the number of non-variant sites from a regular, variant-sites-only whole-genome VCF overlapping a deletion of interest, but our pipeline generates gVCF files, so it would help us to use duphold directly with them. Please let me know what you think.
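To make the proposed gVCF-based filtering concrete, here is a minimal sketch of the counting step: it tallies homozygous-reference bases from reference blocks and the number of het calls that overlap a deletion. This is not duphold code; `GvcfRecord` and `summarize_deletion` are hypothetical names, and the sketch assumes records are already parsed, with `end` taken from the INFO/END field for reference blocks (and equal to POS for ordinary variant records), using 1-based inclusive coordinates.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class GvcfRecord:
    """Hypothetical, already-parsed gVCF record (1-based, inclusive)."""
    chrom: str
    pos: int  # VCF POS
    end: int  # INFO/END for reference blocks, otherwise equal to POS
    gt: str   # genotype string such as "0/0", "0/1", "1|1"


def summarize_deletion(records: Iterable[GvcfRecord], chrom: str,
                       del_start: int, del_end: int) -> Tuple[int, int]:
    """Return (hom-ref bases, het calls) overlapping [del_start, del_end]."""
    hom_ref_bases = 0
    het_calls = 0
    for rec in records:
        if rec.chrom != chrom:
            continue
        # Clip the record (or reference block) to the deletion interval.
        lo = max(rec.pos, del_start)
        hi = min(rec.end, del_end)
        if lo > hi:
            continue  # no overlap with the deletion
        alleles = rec.gt.replace("|", "/").split("/")
        if all(a == "0" for a in alleles):
            # A whole reference block counts once per base it covers.
            hom_ref_bases += hi - lo + 1
        elif "." not in alleles and len(set(alleles)) > 1:
            het_calls += 1
    return hom_ref_bases, het_calls


# Usage with the coordinates from the comment; the 0/1 genotype at 1:14653
# is a made-up example value.
records = [
    GvcfRecord("1", 14605, 14609, "0/0"),
    GvcfRecord("1", 14611, 14652, "0/0"),
    GvcfRecord("1", 14653, 14653, "0/1"),
]
print(summarize_deletion(records, "1", 14600, 14660))  # (47, 1)
```

Because reference blocks are clipped rather than expanded, the gVCF never has to be converted to a per-site VCF. The threshold N for rejecting a deletion based on `het_calls` is left to the user.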
Since DHGT is not duphold's primary feature (the primary feature is depth annotation of SVs), I'm less inclined to work on this. That said, I would look into it if you have evidence that DHGT is valuable; I have always found depth to be more reliable than DHGT.
Hello @brentp,
Thanks for this awesome tool!
It would be great if duphold could compute the B-allele frequency / DHGT field directly from gVCF files (where non-variant sites are represented as blocks to save space), rather than being limited to regular VCF files.
Thanks, and please let us know whether you plan to implement such a feature.