Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to use result of lassosum to plink #17

Open
biostat0903 opened this issue Aug 22, 2019 · 16 comments
Open

how to use result of lassosum to plink #17

biostat0903 opened this issue Aug 22, 2019 · 16 comments

Comments

@biostat0903
Copy link

biostat0903 commented Aug 22, 2019

Hi,
When I run the example of lassosum, it is difficult for me to get the same result from lassosum and plink. lassosum code is:

library(lassosum)
setwd(system.file("data", package="lassosum")) 
ss <- fread("summarystats.txt")
ref.bfile <- "refpanel"
test.bfile <- "testsample"
LDblocks <- "EUR.hg19"
cor <- p2cor(p = ss$P_val, n = 60000, sign=log(ss$OR_A1))
out <- lassosum.pipeline(cor=cor, chr=ss$Chr, pos=ss$Position, 
                         A1=ss$A1, A2=ss$A2, # A2 is not required but advised
                         ref.bfile=ref.bfile, test.bfile=test.bfile,
                         LDblocks = LDblocks)
set.seed(20190822)
pheno <- rnorm(nrow.bfile(out$test.bfile))
v <- validate(out, pheno=pheno)
beta <- v$best.beta
idx_eff <- which(beta != 0) ##lasso zero
idx <- out$sumstats$order[idx_eff]
beta_dat <- data.frame(ss[idx, 1], ss[idx, 3], ss[idx, 4], beta[idx_eff])
test_bim <- data.frame(fread("testsample.bim"))
test_snp <- test_bim[match(beta_dat[, 2], test_bim[, 4]), 2]
beta_dat2 <- data.frame(test_snp, beta_dat[, 3], beta_dat[, 4])
write.table(beta_dat2, file = "eff.txt", col.names = F, row.names = F, quote = F)

plink code is:

cd your/lassosum/datapath/
plink-1.9 --bfile testsample --score eff.txt 1 2 3 sum --out pheno_testsample

Result of lassosum:

head(v$beta.pgs)
-0.418 0.46 -0.451 0.165 0.102

Result of plink:

 FID       IID  PHENO    CNT   CNT2 SCORESUM
   0   HG00096     -9   1634    730 -0.307637
   0   HG00097     -9   1634    748 -0.102278
   0   HG00099     -9   1634    761 -0.202595
   0   HG00100     -9   1634    744 -0.0630223
   0   HG00101     -9   1634    734 -0.221991
   0   HG00102     -9   1634    740 -0.139318
   0   HG00103     -9   1634    716 -0.286505
   0   HG00105     -9   1634    722 -0.221223
   0   HG00106     -9   1634    746 -0.237664

Could you tell me the mistake? Thanks!
Best,
Sheng

@wavefancy
Copy link

Following on this thread. If developers can support output the adjusted beta, that would be the best. We can score by plink for support new file format or even dosage format.

Best regards
Wallace

@tshmak
Copy link
Owner

tshmak commented Sep 11, 2019

lassosum and plink are not doing the same thing so I don't understand why you would expect the same result from lassosum and plink.

@tshmak
Copy link
Owner

tshmak commented Sep 11, 2019

What do you mean by adjusted beta? From your code, you have generated beta by beta <- v$best.beta. That is correct. Furthermore, out$keep.test will give you a logical vector to indicate which SNPs in the test .bed file map to the betas.

@wavefancy
Copy link

lassosum and plink are not doing the same thing so I don't understand why you would expect the same result from lassosum and plink.

We don't expect the plink and lassosum do the same thing. We just want to use plink's scoring function to use lassosum's adjusted betas. Only the last step.

Thanks much for the prompt reply.

Best regards
Wallace

@tshmak
Copy link
Owner

tshmak commented Sep 11, 2019

Can you explain what you mean by "adjusted beta" and how it is supposed to differ from "beta"?

@wavefancy
Copy link

Will the lassosum take in the GWAS summary (univariate beta) and LD structure between snps to update the univariate beta to a new beta (adjusted beta) to control LD between SNPs. And then time the new beta (adjusted beta) with effective allele dosage (0/1/2) to have the PRS? Apologize if my understanding is wrong.

Then our purpose is to use the new beta (adjusted beta) to calculate individual PRS by plink other than by lassosum, picking up the best parameter combination by lassosum is OK.

Best regards
Wallace

@tshmak
Copy link
Owner

tshmak commented Sep 12, 2019

If X denote the test genotype matrix, then v$best.pgs = X %*% v$best.beta, which is the "adjusted beta" you described. It could be different from what you get from plink because, (1) plink scales the PGS after X %*% v$best.beta (according the number of SNPs used), (2) the handling of missing genotype in plink maybe different. Try specifying --fill-missing-a2 in plink, and it should have a correlation of nearly 1 with v$best.pgs.

@privefl
Copy link
Contributor

privefl commented Sep 12, 2019

What I understand from the question:

  • v$beta.pgs is already the "adjusted betas" (meaning directly usable on 0/1/2 genotypes?)
  • use these effects to compute individual scores with PLINK
  • individual scores computed with PLINK are not the same as v$best.pgs (correlation of 0.573)
  • PLINK option --fill-missing-a2 does not work with --score (and it does not seem there is missing data in this example data)

@wavefancy
Copy link

@privefl, I observed the same thing a while ago as you did.

Best regards
Wallace

@biostat0903
Copy link
Author

biostat0903 commented Jan 28, 2020

I use the best.pgs from validate. Then I calculate the R2 and MSE between predicted phenotype and testing phenotype. R2 looks good, but MSE is bias.
Best,
Sheng

@dheerajbobbili1988
Copy link

dheerajbobbili1988 commented Jul 22, 2020

Hi,

Following on the same thread, would it be possible to export the "v$best.beta" along with their corresponding SNP ids, a1 and a2 alleles? I managed to export the adjusted betas and the SNPids, however I'm unsure about the a1 that is used by lassosum. Does it swap the alleles while adjusting the betas? If yes, is there a way to export it too?

Best,
Dheeraj.

@biostat0903
Copy link
Author

Hi,
When I use the v0.4.4, the bug is fixed. Both mse and r2 looks good. Thanks!

Best,
Sheng

@tshmak
Copy link
Owner

tshmak commented Aug 13, 2020

@dheerajbobbili1988 ,

Yes. If out is the output from the lassosum.pipeline() function, then the out$sumstats object is a data.frame which gives the details of the SNPs used in v$best.beta. You can simply concatenate the two, e.g. with, sumstats <- cbind(out$sumstats, v$best.beta).

@tshmak
Copy link
Owner

tshmak commented Aug 13, 2020

Incidentally, if you know the A1,A2 of the bfile is the same as that in test.bfile, you can use pgs(test.bfile, v$best.beta) to generate the PGS also. The pgs function also supports parallel processing using the cluster option.

@dheerajbobbili1988
Copy link

Yes, it seemed to work. Thanks a lot!!

@Mahantesh-Biradar
Copy link

I'm trying to export the the final pgs using the option: pgs(out$test.bfile, target.res$best.beta)

However, I'm getting the following error. How to resolve this? TIA

Error in .pgs.default(weights = weights, bfile = bfile, ...) :
Number of rows in (or vector length of) weights does not match number of selected columns in bfile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants