Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

liftover weird behaviour #104

Open
gmauro opened this issue Aug 9, 2024 · 1 comment
Open

liftover weird behaviour #104

gmauro opened this issue Aug 9, 2024 · 1 comment

Comments

@gmauro
Copy link
Contributor

gmauro commented Aug 9, 2024

Hello @Cloufield
I'm getting a weird error when performing a liftover that I can't explain.
Gwaslab is unable to lift 4 SNPs, while when I use the liftover librarydirectly it works.
Can you help me find out what's going on?
Thank you

These are script and data files to reproduce the problem:
test.py.txt
unmapped_fromLB.txt

2024/08/09 10:15:28 GWASLab v3.4.47 https://cloufield.github.io/gwaslab/
2024/08/09 10:15:28 (C) 2022-2024, Yunye He, Kamatani Lab, MIT License, gwaslab@gmail.com
2024/08/09 10:15:28 Start to initialize gl.Sumstats from file :unmapped_fromLB.txt
2024/08/09 10:15:28 -Reading columns : EAF,EA,NEA,CHR,POS,MLOG10P,SE,SNPID,BETA
2024/08/09 10:15:28 -Renaming columns to : EAF,EA,NEA,CHR,POS,MLOG10P,SE,SNPID,BETA
2024/08/09 10:15:28 -Current Dataframe shape : 4 x 9
2024/08/09 10:15:28 -Initiating a status column: STATUS ...
2024/08/09 10:15:28 #WARNING! Version of genomic coordinates is unknown...
2024/08/09 10:15:29 -NEAF is specified...
2024/08/09 10:15:29 -Checking if 0<= NEAF <=1 ...
2024/08/09 10:15:29 -Converted NEAF to EAF.
2024/08/09 10:15:29 -Removed 0 variants with bad NEAF.
2024/08/09 10:15:29 Start to reorder the columns...v3.4.47
2024/08/09 10:15:29 -Current Dataframe shape : 4 x 10 ; Memory usage: 19.94 MB
2024/08/09 10:15:29 -Reordering columns to : SNPID,CHR,POS,EA,NEA,EAF,BETA,SE,MLOG10P,STATUS
2024/08/09 10:15:29 Finished reordering the columns.
2024/08/09 10:15:29 -Column : SNPID CHR POS EA NEA EAF BETA SE MLOG10P STATUS
2024/08/09 10:15:29 -DType : object string int64 category category float64 float64 float64 float64 category
2024/08/09 10:15:29 -Verified: T F T T T T T T T T
2024/08/09 10:15:29 #WARNING! Columns with possibly incompatible dtypes: CHR
2024/08/09 10:15:29 -Current Dataframe memory usage: 19.94 MB
2024/08/09 10:15:29 Finished loading data successfully!
Gwaslab: Number of SNPs before liftover: 4
2024/08/09 10:15:36 Start to perform liftover...v3.4.47
2024/08/09 10:15:36 -Current Dataframe shape : 4 x 10 ; Memory usage: 19.94 MB
2024/08/09 10:15:36 -Number of threads/cores to use: 3
2024/08/09 10:15:36 -Creating converter : hg19 to hg38
2024/08/09 10:15:36 -Converting variants with status code xxx0xxx :4...
2024/08/09 10:15:36 -Removed unmapped variants: 4
2024/08/09 10:15:36 Start to fix chromosome notation (CHR)...v3.4.47
2024/08/09 10:15:36 -Current Dataframe shape : 0 x 10 ; Memory usage: 3.82 MB
2024/08/09 10:15:36 -Checking CHR data type...
2024/08/09 10:15:36 -Variants with standardized chromosome notation: 0
2024/08/09 10:15:36 -All CHR are already fixed...
2024/08/09 10:15:37 Finished fixing chromosome notation (CHR).
2024/08/09 10:15:37 Start to fix basepair positions (POS)...v3.4.47
2024/08/09 10:15:37 -Current Dataframe shape : 0 x 10 ; Memory usage: 3.82 MB
2024/08/09 10:15:37 -Converting to Int64 data type ...
2024/08/09 10:15:38 -Position bound:(0 , 250,000,000)
2024/08/09 10:15:38 -Removed outliers: 0
2024/08/09 10:15:38 -Removed 0 variants with bad positions.
2024/08/09 10:15:38 Finished fixing basepair positions (POS).
2024/08/09 10:15:38 Finished liftover.
Gwaslab: Number of SNPs after liftover: 0

The following instead, making direct use of the Liftover library

19:54784936 has been lifted to [('chr19', 54281081, '+')]
19:54815387 has been lifted to [('chr19', 54304112, '+')]
19:54813762 has been lifted to [('chr19', 54302487, '+')]
22:24334948 has been lifted to [('chr22', 23992755, '+')]

@Cloufield
Copy link
Owner

Hi @gmauro,
thanks for the feedback. I replicated this. But I think the problem might be related to 0/1 coordinate systems (https://www.biostars.org/p/84686/). POS in sumstats is 1-based but for liftover we need 0-based coordinates. (gwaslab converts 1-based POS to 0-based POS and then calls converter, after getting 0-based results, gwaslab converts it back to 1-based.)

When you called liftover directly, you used converter = get_lifter('hg19', 'hg38', one_based=False).
one_based=False (the default setting I think) means that your query is 0-based but the sumstats are actually 1-based.
By one_based=False, converter["19"][54784936] actually converted the position of the variant 19:54784937:X:X instead of 19:54784936:X:X. Theoretically, you might need to use converter["19"][54784936-1] for 19:54784936 or set one_based=True. But in that case, the results are empty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants