Error in splitvec.from.bfile(bfile) : length(pvec) == length(bfile) is not TRUE #43

Open
rainajia opened this issue Oct 11, 2022 · 6 comments


@rainajia

rainajia commented Oct 11, 2022

Hi, I ran the pipeline chromosome by chromosome, using the same ref.file and test.file for each chromosome, then merged the outputs together using "merge" in a loop.
However, when I ran "validate", it threw the error:
Error in splitvec.from.bfile(bfile) : length(pvec) == length(bfile) is not TRUE
Could you explain what might have caused the error?

@tshmak
Owner

tshmak commented Oct 12, 2022

Can you give me your entire script?

@rainajia
Author

rainajia commented Oct 12, 2022

Hi, my original code is attached below. I have realised that merging the lassosum.pipeline outputs in a loop does not work, but merge(out1, out2, out3...out22) does. However, validate is taking a very long time to run on the merged "out". I have a large sample of ~400k for my phenotype. Which validation method would be the most efficient for large sample sizes?

for (i in 1:22) {
  print(paste0("now processing chromosome ", i))
  bfile <- paste0("./Chr", i)                 # test genotypes for chromosome i
  rfile <- paste0("../Chr", i, "_Random25k")  # reference panel for chromosome i

  tmp <- lassosum.pipeline(
    cor = cor,
    chr = ss$CHR,
    pos = ss$POS,
    A1 = ss$A1,
    A2 = ss$A2,
    ref.bfile = rfile,
    test.bfile = bfile,
    max.ref.bfile.n = 25000,
    LDblocks = LDblocks,
    cluster = cl
  )

  # Pairwise merge as each chromosome finishes (the approach that led to the error)
  if (i == 1) {
    out <- tmp
  } else {
    out <- merge(out, tmp)
  }
}
target.res <- lassosum::validate(out, pheno = as.data.frame(pheno), covar = as.data.frame(cov))
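
The single-call merge that worked, merge(out1, out2, out3...out22), can be written without typing out every object by collecting the per-chromosome results in a list and passing the list to do.call. The sketch below only illustrates that pattern: new_toy and merge.toy_pipeline are hypothetical stand-ins defined here so the example runs on its own, and they are not part of lassosum.

```r
# Toy stand-in for a per-chromosome result; NOT a real lassosum object.
new_toy <- function(chr) structure(list(chr = chr), class = "toy_pipeline")

# Hypothetical merge method that accepts any number of objects,
# mirroring a call of the form merge(out1, out2, ..., out22).
merge.toy_pipeline <- function(x, y, ...) {
  parts <- c(list(x), if (!missing(y)) list(y), list(...))
  structure(list(chr = unlist(lapply(parts, `[[`, "chr"))),
            class = "toy_pipeline")
}

# Collect one result per chromosome in a list ...
outs <- lapply(1:22, new_toy)

# ... then merge them all in one call instead of pairwise inside the loop.
# unname() keeps the list elements positional when they are passed on.
merged <- do.call(merge, unname(outs))
```

With lassosum, the assumption would be to store each lassosum.pipeline result as outs[[i]] inside the loop and call do.call(merge, outs) once afterwards, which should be equivalent to the multi-argument merge call reported as working in this thread.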

@tshmak
Owner

tshmak commented Oct 12, 2022

So are you still getting this error: splitvec.from.bfile(bfile) : length(pvec) == length(bfile) is not TRUE? And if so, at which stage?

@rainajia
Author

rainajia commented Oct 12, 2022

I don't get this error anymore when I run validate(out) where "out <- merge(out1, out2, out3...out22)". The error occurred previously when I ran validate(out) where out was built by merging each chromosome's lassosum.pipeline output in a for loop, as shown in the code above. Sorry about the confusion. My current question is which validation method to use for large samples. I have 400k samples with matched genotype and phenotype, and the previous run of validate(out, pheno, covar) has run for over 9 hours with 40 cores. Is this normal behaviour, or is there a better way to parallelise it?

@tshmak
Owner

tshmak commented Oct 12, 2022

Yes, calculating PGS can take a long time with a large sample size. One way to speed up the calculation is to use multiprocessing (see here). Another is to ensure that covar and pheno are in exactly the same order as test.bfile. (You may also need to ensure there are no missing values, but I can't remember if that's the case.) If everything matches exactly, you will not see the message Calculating PGS..., and it should be very fast.
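
One way to put pheno and covar into the sample order of test.bfile is to reorder them against the .fam file that accompanies the bfile. The following is a minimal base-R sketch, assuming the tables carry FID and IID columns and relying on the PLINK convention that the first two columns of a .fam file are FID and IID. The helper align_to_fam and the toy data are hypothetical, and lassosum itself is not called here.

```r
# Reorder a phenotype/covariate table so its rows follow the .fam file order.
# Assumes `tab` has FID and IID columns.
align_to_fam <- function(tab, fam_path) {
  fam <- read.table(fam_path, header = FALSE, stringsAsFactors = FALSE,
                    colClasses = "character")
  key_fam <- paste(fam[[1]], fam[[2]], sep = ":")
  key_tab <- paste(tab$FID, tab$IID, sep = ":")
  idx <- match(key_fam, key_tab)
  if (anyNA(idx)) {
    stop(sum(is.na(idx)), " sample(s) in the .fam file are missing from the table")
  }
  tab[idx, , drop = FALSE]
}

# Toy demonstration with a temporary three-sample .fam file.
fam_file <- tempfile(fileext = ".fam")
writeLines(c("f1 a 0 0 1 -9", "f2 b 0 0 2 -9", "f3 c 0 0 1 -9"), fam_file)
pheno <- data.frame(FID = c("f3", "f1", "f2"), IID = c("c", "a", "b"),
                    y = c(0.3, 0.1, 0.2), stringsAsFactors = FALSE)
pheno_aligned <- align_to_fam(pheno, fam_file)
```

The helper stops if any genotyped sample has no row in the table, so a mismatch is caught before validate is run. Whether an exact match is enough for validate to skip the Calculating PGS... step depends on lassosum's internals, as described in the comment above.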

@rainajia
Author

Thanks very much, Calculating PGS... was exactly what I have been seeing. I will double check on these points.
