Feedback on LocusZoom Plot Enhancements #120

Open
Sandyyy123 opened this issue Dec 2, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@Sandyyy123

Sandyyy123 commented Dec 2, 2024

I wanted to reach out regarding potential improvements to the zoom plot visualization.

It can often be challenging to explain the relationship between the upper and lower panels of a locus zoom plot, especially when the top hit in the lower panel differs from the one in the upper panel. This issue arises because the lower panel does not explicitly indicate which SNP it is showing LD with. Additionally, when the lower panel SNP is different from the upper panel SNP, it is often left unlabeled, leading to further confusion.

Things become even more complicated when an entirely different SNP is the focus, which is neither the top SNP in the upper panel nor in the lower panel (as demonstrated in the attached example).

Would it be possible to include an LD box or label within the lower panel that specifies which SNP the LD is being calculated against, along with corresponding color codes? This enhancement would make it much clearer and eliminate the assumption that LD in the lower panel is always relative to the SNP marked in the upper LD panel box.

Thank you for considering this suggestion, and I look forward to your thoughts!

Best regards,
Screenshot 2024-12-02 014802

@Sandyyy123
Author

Sandyyy123 commented Dec 2, 2024

As the figure shows, there are three different SNPs, but only one is labeled. It is not clear which is the top SNP in EUR, and there is no corresponding LD box showing the colors and the reference SNP. Even for EAS, the top SNP (rs74284577) can only be inferred from the LD box.

@Cloufield
Owner

Hi,
Thank you very much for the feedback. I totally agree with your suggestions, and this is what I am working on right now.
With newer versions (gwaslab>=3.5.0), you can already use region_ld_legends=[indices of the panels] to add LD legends to other panels, for example:

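import gwaslab as gl

# gl1 and gl2 are assumed to be GWASLab sumstats objects loaded beforehand (one per panel)
gl.plot_stacked_mqq(objects=[gl1,gl2],
                    vcfs=[gl.get_path("1kg_eas_hg19"), gl.get_path("1kg_eas_hg19")],
                    region=(19,46214297 - 300000, 46214297 + 300000),
                    build="19",
                    mode="r",
                    anno="SNPID",
                    titles=["Male","Female"],
                    region_ld_legends=[0,1],    # indices of the panels that get an LD legend
                    anno_set2={"rs1055220"},    # trailing 2: anno_set for the second panel
                    region_ref2="rs1055220",    # trailing 2: region_ref for the second panel
                    title_args={"size":20},
                    anno_args={"rotation":0},
                    verbose=True,
                    check=False)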

image

By the way, you can customize an individual panel by appending its number to an option name. For example, anno_set2 passes anno_set to the second plot.
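To make the numbered-suffix convention concrete, here is a minimal sketch of how such options might be split into per-panel settings. It is not gwaslab's actual implementation: `split_panel_options` is a hypothetical helper, and it assumes that the suffix is 1-based (as in "anno_set2 goes to the second plot") and that a panel-specific value overrides a shared one.

```python
import re

def split_panel_options(options, n_panels):
    """Split keyword options into one dict per panel (hypothetical helper).

    A key ending in a digit 1..n_panels (e.g. "anno_set2") applies only to
    that panel, with the digit stripped. Every other key applies to all panels.
    """
    per_panel = [dict() for _ in range(n_panels)]
    panel_specific = []
    for key, value in options.items():
        match = re.fullmatch(r"(.+?)([1-9])", key)
        if match and int(match.group(2)) <= n_panels:
            panel_specific.append((match.group(1), int(match.group(2)), value))
        else:
            # Shared option: copy it to every panel.
            for panel in per_panel:
                panel[key] = value
    # Apply panel-specific values last so they override shared ones.
    for name, number, value in panel_specific:
        per_panel[number - 1][name] = value
    return per_panel

opts = {"anno": "SNPID", "anno_set2": {"rs1055220"}, "region_ref2": "rs1055220"}
panels = split_panel_options(opts, n_panels=2)
```

With these inputs, the first panel receives only the shared `anno` option, while the second also receives `anno_set` and `region_ref`. Note that region_ld_legends in the call above takes 0-based panel indices, whereas the option suffix counts panels from 1.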

You can also pass multiple reference variants, e.g. region_ref=["rs1055220","rs35560038","rs4802274"]:

gl.plot_stacked_mqq(objects=[gl1,gl2],
                    vcfs=[gl.get_path("1kg_eas_hg19"), gl.get_path("1kg_eas_hg19")],
                    region=(19,46214297 - 300000, 46214297 + 300000),
                    build="19",
                    mode="r",
                    anno="SNPID",
                    titles=["Male","Female"],
                    anno_set2={"rs1055220"},
                    region_ref=["rs1055220","rs35560038","rs4802274"],
                    title_args={"size":20},
                    anno_args={"rotation":0}, verbose=True, check=False)

This might make more sense when you have multiple variants to compare.
image

What I am working on now is including the marker in the legends to make this clearer, as you suggested. I will let you know once I have finished.
image

@Cloufield Cloufield added the enhancement New feature or request label Dec 2, 2024
@Sandyyy123
Author

Sandyyy123 commented Dec 2, 2024

Thank you for your great work on this tool. Let us consider a specific scenario where the goal is to highlight the top variant for Females, Males, and the combined Male-Female dataset (specified as the reference in the main script). While there may or may not be overlaps between these three variants, I wanted to share some ideas that might help improve the clarity and usability of the plot.

LD Reference SNP Display
I’m perfectly fine with the current behavior, which uses the top variants in Females and Males as the LD reference SNPs for their respective panels and labels the top variant in the combined dataset within a box. However, it might be an improvement to:
Add SNP labels and a legend for the bottom panel, especially when the top SNP in the combined dataset (specified as the reference SNP in the script) differs from the SNP in the bottom panel.
Add SNP labels for the top panel, particularly when the top SNP in the combined dataset (specified as the reference SNP in the main script) differs.

Label Differentiation
To enhance clarity, you could consider labeling the top SNPs in both the upper and lower panels without enclosing them in boxes when they differ from the top SNP in the combined dataset (specified as the reference SNP in the script). This would visually distinguish them from the reference SNP specified by the user in the combined dataset.

Automation for Identifying Top SNPs
It feels somewhat cumbersome to manually identify and specify the top SNPs for the lower panel using anno_set2 and region_ref2 in the script. It would be helpful if the tool could automatically detect and label the top SNPs for both the upper and lower panels.

Perhaps, one can also have 3 panels in such a scenario. That would be the best solution.

These are just my thoughts as a user. I’m not an expert, so please feel free to correct me or share your perspective. I hope this feedback is helpful and contributes to improving the tool’s functionality.

@Cloufield
Owner

Hi,
Thank you so much for the very detailed feedback on how to improve the stacked regional plot in terms of clarity and usability.
Regarding the three points you mentioned (LD Reference SNP Display, Label Differentiation, and Automation for Identifying Top SNPs), I think these are very useful when plotting a few separate datasets alongside a combined dataset.

To summarize, I guess I will implement the workflow as:

  1. identify top variants within each panel (this could be done by mysumstats.get_lead().)
  2. add a legend to each panel if the top variants differ from each other
  3. label the auto-detected top variants and user-specified variants differently
  4. other customizations
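The first three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the planned gwaslab implementation: each panel is a plain list of rows with hypothetical `SNPID` and `P` fields, the p-values are toy numbers, and `lead_variant` and `plan_labels` are made-up helper names. In gwaslab itself, step 1 could instead rely on mysumstats.get_lead() as noted in the list.

```python
def lead_variant(panel):
    """Step 1: return the SNPID with the smallest p-value in one panel."""
    return min(panel, key=lambda row: row["P"])["SNPID"]

def plan_labels(panels, user_refs=()):
    """Decide, per panel, which variants to label and whether to draw a legend."""
    leads = [lead_variant(panel) for panel in panels]
    # Step 2: legends are only needed when the panels disagree on the top variant.
    need_legend = len(set(leads)) > 1
    plan = []
    for lead in leads:
        # Step 3: tag each label with its origin so the two kinds can be styled differently.
        labels = {lead: "auto"}
        for ref in user_refs:
            labels[ref] = "user"  # a user-specified variant keeps the "user" style
        plan.append({"lead": lead, "legend": need_legend, "labels": labels})
    return plan

# Toy data: p-values are invented for illustration only.
male = [{"SNPID": "rs35560038", "P": 1e-12}, {"SNPID": "rs1055220", "P": 1e-5}]
female = [{"SNPID": "rs1055220", "P": 1e-9}, {"SNPID": "rs35560038", "P": 1e-4}]
plan = plan_labels([male, female], user_refs=["rs1055220"])
```

Here the two panels have different lead variants, so both get a legend, and rs1055220 is marked as user-specified in both panels even though it is also the auto-detected lead in the second one.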

I will try to run some tests and implement this over the next few versions. Thank you again for the feedback. User feedback like yours is really helpful and often makes my own thinking clearer.

BTW, actually you can add up to 9 panels using plot_stacked_mqq() by simply passing a list of objects:

gl.plot_stacked_mqq(objects=[gl1,gl2,gl3],
                    vcfs=[gl.get_path("1kg_eas_hg19"), gl.get_path("1kg_eas_hg19"), gl.get_path("1kg_eas_hg19")],
                    region=(19,46214297 - 300000, 46214297 + 300000),
                    build="19",
                    mode="r",
                    region_ref=["rs35560038","rs76938031","rs16980091"],
                    anno=True,
                    anno_style="tight",
                    titles=["Male","Female","Combined"],
                    title_args={"size":20},
                    anno_args={"rotation":0}, verbose=False, check=False)

image

@Sandyyy123
Author

I wish you all the best. You are doing outstanding work, and your data handling and visualizations are among the best I’ve seen. I eagerly look forward to your updates and hope to incorporate them into my next publication in a few months.

@Sandyyy123
Author

Sandyyy123 commented Dec 2, 2024

Just one final comment: how would you distinguish between top variants in different panels with overlapping LD? For instance, in your latest graph, if the dark red overlaps with the dark green, it could cause some confusion. I think the best approach would be to restrict each panel to use the classical locus zoom coloring for the respective top variants. Alternatively, you could provide users with the option to choose different color schemes for each panel (e.g., red, blue, green) to make them more distinct, rather than mixing all the colors in each panel.

@Cloufield
Owner

Thank you very much for your kind words. Regarding your comment, each SNP is colored according to the reference variant with which it is in highest LD. This is consistent with the original LocusZoom regional plot for multiple reference SNPs (as described in https://genome.sph.umich.edu/wiki/LocusZoom_Standalone#Optional_Input). I will make the coloring style an option so that users can select the one they want.
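As a rough illustration of the rule described here (each SNP takes the color family of the reference variant it is in highest LD with), consider the sketch below. It does not use gwaslab: the r2 values are invented, `assign_reference` and `ld_bin` are hypothetical helpers, and the bin edges 0.2/0.4/0.6/0.8 are assumed to follow the classical LocusZoom five-bin scheme.

```python
def assign_reference(ld_by_ref):
    """Map each SNP to the (reference, r2) pair with the highest r2.

    ld_by_ref has the form {reference_snp: {snp: r2}}.
    """
    best = {}
    for ref, r2_by_snp in ld_by_ref.items():
        for snp, r2 in r2_by_snp.items():
            if snp not in best or r2 > best[snp][1]:
                best[snp] = (ref, r2)
    return best

def ld_bin(r2, edges=(0.2, 0.4, 0.6, 0.8)):
    """Return the LD bin index, 0 (lowest) to 4 (highest), assuming five bins."""
    return sum(r2 >= edge for edge in edges)

# Toy r2 values for two reference variants (invented for illustration).
ld = {
    "rs1055220": {"snpA": 0.9, "snpB": 0.1},
    "rs35560038": {"snpA": 0.3, "snpB": 0.5},
}
best = assign_reference(ld)
colors = {snp: (ref, ld_bin(r2)) for snp, (ref, r2) in best.items()}
```

In this toy case snpA is drawn in the darkest shade of the rs1055220 color family and snpB in a middle shade of the rs35560038 family. The alternative style requested above, one classical color scale per panel, would amount to passing a single reference per panel to the same function.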
