Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Populate INFO column with VariantAnnotation::writeVcf #61

Open
Al-Murphy opened this issue May 31, 2022 · 4 comments
Open

Populate INFO column with VariantAnnotation::writeVcf #61

Al-Murphy opened this issue May 31, 2022 · 4 comments

Comments

@Al-Murphy
Copy link

Al-Murphy commented May 31, 2022

Hi ,

I'm the creator of the MungeSumstats Bioconductor package in which we use VariantAnnotation::writeVcf as an output options for the formatted sumstats. I want to align the formatting of our sumstats to IEU OpenGWAS VCF, part of which means adding the RSIDs to the INFO column in the VCF. However, I can't seem to populate INFO with VariantAnnotation::writeVcf. Is there any way to do this?

An example:

#get example data 
sumstats_dt <- MungeSumstats::formatted_example()
#convert to VRanges
gr <- to_granges(sumstats_dt)
message("Converting summary statistics to VRanges.")
gr$dummy <- "GWAS"
vr <- VariantAnnotation::makeVRangesFromGRanges(gr,
    ref.field = "A1", 
    alt.field = "A2",
    keep.extra.columns = TRUE,
    sampleNames.field = "dummy"
)
#make SNP field into INFO
vr$INFO <- paste0("RSID=",vr$SNP)
vr$SNP <- NULL
#Output as VCF
VariantAnnotation::writeVcf(
            obj = vr,
            ### Must supply filename without compression suffix
            filename ="~/Downloads/test_vcf.vcf",
            index = FALSE
        )
#inspect
print(readLines("~/Downloads/test_vcf.vcf",100))
#INFO not populated
@vjcitn
Copy link
Contributor

vjcitn commented May 31, 2022

Have you tried running your example, but preceding this with

debug(VariantAnnotation:::.makeVcfMatrix) # note the :::

to get clues as to why the INFO data are lost? Please consider making a PR if you find the answer.

@Al-Murphy
Copy link
Author

Hey Vince!

Tried to look into this but can't seem to install the 1.99.3+ version of Rhtslib on mac, have you come across this issue?

But from what I can tell without getting into the details of Variantannotation, when writeVcf is called on a VRanges object I believe it's the below function the one that's called from here:

setMethod(writeVcf, c("VCF", "connection"),
    function(obj, filename, index = FALSE, ...)
{
    if (!isTRUEorFALSE(index))
        stop("'index' must be TRUE or FALSE")

    if (!isOpen(filename)) {
        open(filename)
        on.exit(close(filename))
    }

    scon <- summary(filename)
    headerNeeded <- !(file.exists(scon$description) &&
                      file.info(scon$description)$size !=0) 

    if (headerNeeded) {
        hdr <- .makeVcfHeader(obj)
        writeLines(hdr, filename)
    }

    if (index)
        obj <- sort(obj)

    if (all(is.na(idx <- .chunkIndex(dim(obj)[1L], ...))))
        .makeVcfMatrix(filename, obj)
    else
        for (i in idx)
            .makeVcfMatrix(filename, obj[i])
    flush(filename)

    if (index) {
        filenameGZ <- bgzip(scon$description, overwrite = TRUE)
        indexTabix(filenameGZ, format = "vcf")
        unlink(scon$description)
        invisible(filenameGZ)
    } else {
        invisible(scon$description)
    }
})

Then the call VariantAnnotation:::.makeVcfMatrix uses the line INFO <- .makeVcfInfo(info(obj), length(rd)) to get the info field where the function info() is defined as:

setMethod("info", "VCFHeader", 
    function(x) 
{
    info <- slot(x, "header")$INFO
    if (is.null(info))
        info <- DataFrame(Number=integer(), Type=character(),
                          Description=character())
    info
})

The issue is that VRanges don't have a slot(x, "header") value so I don't think it would ever pick it up? Unless I misinterpreted the function calls here?

@Al-Murphy
Copy link
Author

Wait I figured a way around it:

VariantAnnotation::writeVcf(
  obj = VariantAnnotation::asVCF(vr,info='RSID'),
  ### Must supply filename without compression suffix
  filename ="~/Downloads/test_vcf.vcf",
  index = FALSE
)

Just converting to VCF class yourself and specifying the column to go into info in the writeVcf call works

@vjcitn
Copy link
Contributor

vjcitn commented Jun 1, 2022

Thank you for persevering. If you see a documentation enhancement that will prevent other users from hitting your problem please make a PR. I think we should leave this open for now.

@vjcitn vjcitn reopened this Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants