genome reference assembly extraction from variants/reads objects? #13
Comments
Hi Tim, I was actually able to find it using the Java API client with this command:
Then in the
Hope it helps,
That is terrific; it's in there, at least. This sort of thing may become more of an issue (generally) as "variant" calls move towards graph traversals from assembly A to variant assembly representation B. But at least there's a way to extract it and prevent some issues when using the data from R, which avoids certain types of derp. Thank you! --t
Yeah, using a graph representation of the variants will definitely be a different approach, and that's the way GA4GH is also going. It will take a bit of work to do it efficiently, but that's the fun part :) Also below are the steps if you want to do what I did above in Java through R:
~p
This is a great feature request and is something we should do soon. For reads, it's the referenceSetId, which can then be used to look up the reference set.
For variants, it comes from the VCF header, and we can get it from the variant set metadata.
Per @calbach, down the road we should have https://github.com/googlegenomics/api-client-java/issues/66
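For the variants case, here is a minimal Python sketch of pulling the assembly out of VCF header lines. It assumes the header is already available as plain text lines; how the variant set metadata exposes it through the API is not shown. The function name is hypothetical. `##reference=` and the `assembly=` key on `##contig` lines are standard VCF meta-information fields, but either may be absent from a given file.

```python
import re


def assembly_from_vcf_header(header_lines):
    """Return (reference, assemblies) found in VCF meta-information lines.

    reference  -- value of the ##reference= line, or None if absent
    assemblies -- set of assembly= values seen on ##contig lines
    """
    reference = None
    assemblies = set()
    for line in header_lines:
        if not line.startswith("##"):
            break  # meta-information ends at the #CHROM column header row
        if line.startswith("##reference="):
            reference = line[len("##reference="):].strip()
        elif line.startswith("##contig=<"):
            # Simple pattern; assumes the assembly value contains no commas.
            match = re.search(r"assembly=([^,>]+)", line)
            if match:
                assemblies.add(match.group(1).strip('"'))
    return reference, assemblies
```

If `assemblies` ends up with more than one value, the header mixes coordinate systems and probably should not be trusted to tag the data with a single genome.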
One of the handy defaults in Bioconductor GRanges/VRanges objects is a slot for the genome (reference assembly) of each chromosome (should all be the same for any sane object, of course). This helps prevent comparisons of (e.g.) hg18 and hg19, or hg19 and hg38, data as if on the same coordinate system (a practice which is such a terrible idea that it's essentially never worth allowing).
Unfortunately, this safeguard can't be enforced when no genome is specified.
In the course of adding a default seqlevelsStyle for *Ranges, I realized it would be nice to have the genome reference automatically specified when retrieving variants or aligned reads. This doesn't seem to be possible from the current data structure. What am I overlooking here?
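The safeguard described above can be illustrated with a small Python sketch. This is not the Bioconductor implementation; the class and function names are hypothetical, and coordinates are assumed to be closed intervals. It refuses to compare ranges tagged with different genomes, and, as noted, it cannot enforce anything when a genome tag is missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenomicRange:
    chrom: str
    start: int
    end: int
    genome: Optional[str] = None  # e.g. "hg19"; None means unspecified


def overlaps(a, b):
    """Return True if a and b overlap, refusing mismatched genomes."""
    # The check only fires when both genomes are known; an unspecified
    # genome silently passes, which is exactly the gap described above.
    if a.genome is not None and b.genome is not None and a.genome != b.genome:
        raise ValueError(
            "genome mismatch: %s vs %s" % (a.genome, b.genome)
        )
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end
```

Filling in `genome` automatically at retrieval time would close that gap, since every range would then carry a tag that can be checked.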