-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
locateVariants() and predictCoding() silently drop large INDELs #81
Comments
Thanks very much for this report. Would you be willing to share the VCF or a slice of it that is sufficient to reproduce the problem? |
I've attached a minimal example here. Also, sessionInfo()
|
Wonderful example. I think the problem you are observing arises from
in GenomicFeatures, where there is an overlap test that uses type "within". The initial developers of both GenomicFeatures and VariantAnnotation are long |
Ah, so it might be because the deletion extends beyond the end of a single annotated feature? |
OK -- so what are your thoughts? Would you want to make a PR that identifies the event and emits message or warning, or is the behavior reasonable in this form? We could just improve the documentation. |
I have encountered an issue with large INDELs being silently discarded from my results.
This is the offending VCF entry:
contig_xyz 133527 Sniffles2.DEL.50S0 GGATGAACTCTTGGATTCTTGGGCTCATGTCGGACAGGATGCCGCCCCAGCTTCTCAGGTTGATGCCCTGGGATCTAGAATCCAGCAGTGTGGGCAGCACCCGGAACAGCTTGAAGAAGTCCACGTTGGCGTACAGGGTATCCTCGATCCACTGCAGGGTTCCCTGGCTCAGACTGCACAGGGCATATCTGACGGTCTTGGCGCCTCTCCGCTGGCTGAAGATGATGAACCGTTCCAGCAGGGCCTCAGAACAGGCGATATCCTTCAGGGCGAGATCTGGCACGCCATGAGCAAACTGCTCGGGCCGCACTTGGCTGTTGATCAGCAGGTACACCACGCTGTCGCTCAGGCCGATGTTCTTGATGAGGAACAGTGTCAGGGTTTCCTCGTCCTTCAGGATGTCCCGGATTCTGATGCCCCTGCCGGCGATTCTCTCGGGGTGTGTTCTCAGGGTGTCCATGAACTGGCTCAGGATGTGCAGCTCGGTCCAGATTCTGCCCAGGTGCTGAGACTCAGGGGCGTTCATCAGCAGCTCTTGGAAGTCCCGGTACACTCTGGCCAGGATGCTGTTGTTGTAGTTGGACACGATGCCAGGGCTTTCGCCAGGTGTGGGGCTCTGAAAGCAGGGGTTGTTCACGTTGCAGAAGATGCCCTGCAGCCAAGGCAGCATTCCGGCAGAAGGCATGGCCTTGTTGGGGAAGTGACACTCGTGGTGGCTGTACAGAGGATTGGCGTTCCGCAGCCAGATCAGCACCAGAAACAGGCTCAGGGGCCACACGAGTTCCACCACGAATCTGATTTTCTGCCGCTTCCGCAGGGTCCAGTTCTTCCACAGCAGCAGCTGAATCTGCCGCACGAATCCCATGGTGGCCAGCTCGGTCGTCCCGGGGCCTCTACTGTCCAGAGTCCTCCGCGGATCCCGATCTGACGGTTCACTAAACGAGCTCTGTTTATATAGACCTCCCACCGTACACGCCTACCGCCCATTTGCGTCAACGGGGCGGGCGATCGCAGTTGTTACGACATTTTGGAAAGTCCCGTTGATTTTGGTGCCAAAACAAACTCCCATTGACGTCAATGGGGTGGAGACTTGGAAATCCCCGTGAGTCAAACCGCTATCCACGCCCATTGGTGTACTGCCAAAACCGCATCACCATGGTAATAGCGATGACTAATACGTAGATGTACTGCCAAGTAGGAAAGTCCCGTAAGGTCATGTACTGGGCATAATGCCAGGCGGGCCATTTACCGTCATTGACGTCAATAGGGGGCGGACTTGGCATATGATACACTTGATGTACTGCCAAGTGGGCAGTTTACCGTAAATACTCCACCCATTGACGTCAATGGAAAGTCCCTATTGGCGTTACTATGGGAACATACGTCATTATTGACGTCAATGGGCGGGGGTCGTTGGGCGGTCAGCCAGGCGGGCCATTTACCGTAAGTTATGTAACGCGGAACTCCATATATGGGCTATGAACTAATGACCCCGTAATTGATTACTATTAATAACTAGTCAATAATCAATGCCAACATGGCGGTCATATTGGACATGAGCCAATATAAATGTACATATTATGATATAGATACAACGTATGCAATGGCCAATAGCCAATATTGATTTATGCTATATAACCAATGAATAATATGGCTAATGGCCAATATTGAAGATCCCCGGGTACCGAGCTCGAATTCATCGATGATGATCCACTAGTAACGGCCGCCAGTGTGCTGGAATTCGCCCTTCCCGCATGGCATCTCATTACCGCCCGATCCGGCGGTTTCCGCTTCCGTTCCGCATGCTAACGAGGAACGGGCAGGGGGCGGGGCCCGGGCCCCGACTTCCCGGTTCGGCGGTAATGTGATACGAGCCCCGCGCGCCCGTTGGCCGTCCCCGGGCCCCCGGTCCCGCCCGCCGGACGCCGGGACCAACGGGACGGCGGGCGGCCCTTGGGCCGCCCGCCTTGCCGCCCCCCCATTGGCCGGCGGGCGGGACCGCCCCAAGGGGGCGGGGCCGCCGGGTAAAAGAAGTGAGAACGCGAAGCGTTCGCACTTCGTCCCAATATATATATATTATTAGGGCGAAGTGCGAGCACTGGCGCCGTGCCCGACTCCGCGCCGGCCCCGGGGGCGGACCCGGGCGGCGGGGGGCGGGTCTCTCCGGCGCACATAAAGGCCCGGCGCGACCGACGCCCGCAGACGGCGCCGGCCACGAACGACGGGAGCGGCTGCGGAGCACGCGGACCGGGAGCGGGAGTCGCAGAGGGCCGTCGGAGCGGACGGCGTCGGCATCGCGACGCCCCGGCTCGGGATCGGGATCGCATCGGAAAGGGACACGCGGACGCGGGGGGGAAAGACCCGCCCACCCCACCCACGAAACACAGGGGACGCACCCCGGGGGCCTCCGACGACAGAAACCCACCGGTCCGCCTTTTTTGCACGGGTAAGCACCTTGGGTGGGCGGAGGAGGGGGGACGCGGGGGCGGAGGAGGGGGGACGCGGGGGCCGGAGGAGGGGGGACGCGGGGGCGGAGGAGGGGG G 59 PASS PRECISE;SVTYPE=DEL;SVLEN=-2542;END=136069;SUPPORT=259;COVERAGE=335,326,308,260,257;STRAND=+-;AF=0.869;STDEV_LEN=0;STDEV_POS=0 GT:GQ:DR:DV 1/1:60:39:259
This represents a deletion of 2542 bp.
The large deletion is present when I call
rowRanges(vcf)
So it is successfully being read by
readVcf()
However, when I call:
or
predictCoding(vcf, txdb, seqSource=fasta)
this variant position is not included among the results.
With some trial and error, I have determined that truncating the REF sequence to 800 characters allows the variant to be maintained, but 850 characters fails.
Success, 850 bp
contig_xyz 133527 Sniffles2.DEL.50S0 GGATGAACTCTTGGATTCTTGGGCTCATGTCGGACAGGATGCCGCCCCAGCTTCTCAGGTTGATGCCCTGGGATCTAGAATCCAGCAGTGTGGGCAGCACCCGGAACAGCTTGAAGAAGTCCACGTTGGCGTACAGGGTATCCTCGATCCACTGCAGGGTTCCCTGGCTCAGACTGCACAGGGCATATCTGACGGTCTTGGCGCCTCTCCGCTGGCTGAAGATGATGAACCGTTCCAGCAGGGCCTCAGAACAGGCGATATCCTTCAGGGCGAGATCTGGCACGCCATGAGCAAACTGCTCGGGCCGCACTTGGCTGTTGATCAGCAGGTACACCACGCTGTCGCTCAGGCCGATGTTCTTGATGAGGAACAGTGTCAGGGTTTCCTCGTCCTTCAGGATGTCCCGGATTCTGATGCCCCTGCCGGCGATTCTCTCGGGGTGTGTTCTCAGGGTGTCCATGAACTGGCTCAGGATGTGCAGCTCGGTCCAGATTCTGCCCAGGTGCTGAGACTCAGGGGCGTTCATCAGCAGCTCTTGGAAGTCCCGGTACACTCTGGCCAGGATGCTGTTGTTGTAGTTGGACACGATGCCAGGGCTTTCGCCAGGTGTGGGGCTCTGAAAGCAGGGGTTGTTCACGTTGCAGAAGATGCCCTGCAGCCAAGGCAGCATTCCGGCAGAAGGCATGGCCTTGTTGGGGAAGTGACACTCGTGGTGGCTGTACAGAGGATTGGCGTTCCGCAGCCAGATCAGCACCAGAAACAGGCTCAGGGGCCACACGAGTTCCACCACGAATCTGATTTTCTGCCGCTTCCGCAGGGTCCAGTTCTTCCACAGCAGCAGCTGAATCTG G 59 PASS PRECISE;SVTYPE=DEL;SVLEN=-2542;END=136069;SUPPORT=259;COVERAGE=335,326,308,260,257;STRAND=+-;AF=0.869;STDEV_LEN=0;STDEV_POS=0 GT:GQ:DR:DV 1/1:60:39:259
Fails, 900 bp
contig_xyz 133527 Sniffles2.DEL.50S0 GGATGAACTCTTGGATTCTTGGGCTCATGTCGGACAGGATGCCGCCCCAGCTTCTCAGGTTGATGCCCTGGGATCTAGAATCCAGCAGTGTGGGCAGCACCCGGAACAGCTTGAAGAAGTCCACGTTGGCGTACAGGGTATCCTCGATCCACTGCAGGGTTCCCTGGCTCAGACTGCACAGGGCATATCTGACGGTCTTGGCGCCTCTCCGCTGGCTGAAGATGATGAACCGTTCCAGCAGGGCCTCAGAACAGGCGATATCCTTCAGGGCGAGATCTGGCACGCCATGAGCAAACTGCTCGGGCCGCACTTGGCTGTTGATCAGCAGGTACACCACGCTGTCGCTCAGGCCGATGTTCTTGATGAGGAACAGTGTCAGGGTTTCCTCGTCCTTCAGGATGTCCCGGATTCTGATGCCCCTGCCGGCGATTCTCTCGGGGTGTGTTCTCAGGGTGTCCATGAACTGGCTCAGGATGTGCAGCTCGGTCCAGATTCTGCCCAGGTGCTGAGACTCAGGGGCGTTCATCAGCAGCTCTTGGAAGTCCCGGTACACTCTGGCCAGGATGCTGTTGTTGTAGTTGGACACGATGCCAGGGCTTTCGCCAGGTGTGGGGCTCTGAAAGCAGGGGTTGTTCACGTTGCAGAAGATGCCCTGCAGCCAAGGCAGCATTCCGGCAGAAGGCATGGCCTTGTTGGGGAAGTGACACTCGTGGTGGCTGTACAGAGGATTGGCGTTCCGCAGCCAGATCAGCACCAGAAACAGGCTCAGGGGCCACACGAGTTCCACCACGAATCTGATTTTCTGCCGCTTCCGCAGGGTCCAGTTCTTCCACAGCAGCAGCTGAATCTGCCGCACGAATCCCATGGTGGCCAGCTCGGTCGTCCCGGGGCCTCTACTGT G 59 PASS PRECISE;SVTYPE=DEL;SVLEN=-2542;END=136069;SUPPORT=259;COVERAGE=335,326,308,260,257;STRAND=+-;AF=0.869;STDEV_LEN=0;STDEV_POS=0 GT:GQ:DR:DV 1/1:60:39:259
Can you suggest a way to maintain these large INDELs among the results?
Alternatively, if it is impossible to maintain large INDELs, could a warning or error be returned instead of a silent discard?
Thanks!
The text was updated successfully, but these errors were encountered: