Extend to ordinal conditions #25
Comments
Great question. We’ve recently been implementing and then assessing correlation tests including Pearson and Spearman. I believe these would basically give the detection power of ordinal testing. I like the idea of including this in an eQTL or sQTL framework. We’d have to allow for changing covariates per row of the matrix, but it’s not out of the realm of possibility… What does your genotype matrix look like?
The eQTL tools I work with will take a variety of inputs, but all of them convert the genotypes to a data frame or array of 0/1/2 values (g x n), where n = number of samples and g = number of SNPs within 1e6 bp of the transcriptional start site of the gene of interest. The basic algorithm employed by matrixQTL (R), fastQTL (R + multiple comparison correction via permutation), and tensorQTL (Python, correction, GPU-based):
It seems to me like fishpond's methods would improve the accuracy of the regression step, and the rest of it is just housekeeping. I know that I've struggled with false positive associations caused by outlier/extreme samples skewing my data, and fishpond seems well suited to addressing that problem. The thing I'm unsure about is how easy it would be to adapt a method designed to perform a phe ~ 0/1 significance analysis to a phe ~ 0/1/2 or phe ~ continuous analysis.
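To make the input layout described above concrete, here is a minimal Python sketch of picking the cis SNPs for one gene and keeping their 0/1/2 rows. The function name `cis_snp_indices`, the toy positions, and the inclusive window boundary are assumptions for illustration. None of this is taken from matrixQTL, fastQTL, or tensorQTL, and it assumes every SNP is on the same chromosome as the gene.

```python
CIS_WINDOW_BP = 1_000_000  # 1e6 bp on either side of the TSS


def cis_snp_indices(snp_positions, tss, window=CIS_WINDOW_BP):
    """Return indices of SNPs within `window` bp of the TSS (inclusive)."""
    return [i for i, pos in enumerate(snp_positions) if abs(pos - tss) <= window]


# Toy genotype matrix: 4 SNPs (rows) x 5 samples (columns), coded 0/1/2.
dosages = [
    [0, 1, 2, 1, 0],
    [2, 2, 1, 0, 0],
    [0, 0, 1, 1, 2],
    [1, 1, 1, 0, 2],
]
snp_positions = [600_000, 1_900_000, 2_400_000, 4_000_000]  # hypothetical bp
tss = 1_500_000  # hypothetical transcriptional start site

cis_idx = cis_snp_indices(snp_positions, tss)
cis_dosages = [dosages[i] for i in cis_idx]  # the g x n matrix for this gene
```

Real tools hold this matrix in an array or data frame rather than nested lists; the lists here only keep the sketch dependency-free.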
We have already extensively tested phenotype ~ integer and phenotype ~ continuous, with time series and pseudotime analyses respectively. Things look good in simulation and the real data results look nice as well 😃 Lemme loop back here next week for more thoughts.
Hi @JosephLalli, I wanted to return to this. We've been thinking a lot in the lab about different aspects of modeling QTL. We've focused on distributional questions lately, and less on uncertainty in quantification. I think the fishpond framework is strong, but when you want to add in a lot of covariates (as we often need PCs, factors of unwanted expression variation, etc.), the non-parametric framework starts to be less useful. Happy to chat sometime, but I think we won't be extending fishpond in this direction and will instead focus on other modeling aspects in the future.
Your timing is uncanny @mikelove - I've also been coming back around to this idea. I'd be curious to talk more here vs email about the strengths and drawbacks of using non-parametric methods of calculating mean & var for genotype-phenotype associations. My intuition was that using bootstraps would help address problems I've been encountering with reference bias* and outlier expression values creating false positive results (especially if looking at differential isoform usage). If your group has encountered difficulties applying this method, I'd love to talk more about it.
*Reference bias issues & associated high expression rates of pseudogenes have been a big problem for my dataset. I'm also experimenting with using a modified version of the SEESAW/g2g tools pipeline to address this problem.
Let’s chat on Zoom, as I think there’s a lot to discuss. I’m at ENAR until Wednesday. What’s a good email to reach out?
Hi @mikelove, you can reach me at Lalli at wisc dot edu.
Briefly looking through the code here, it seems like the basic algorithm (correlation + bootstrapping) is extendable to more than two conditions. This would in effect turn your DE software into a potential engine for uncertainty-aware eQTL/sQTL analyses.
I’m sure I’ve missed some major hurdle - as this work is at the far end of my bioinformatics knowledge - but if not, is this a feature you are considering adding?
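As a rough illustration of what "correlation + bootstrapping" with more than two conditions could look like, here is a minimal Python sketch: Spearman correlation between expression and a 0/1/2 genotype is averaged over bootstrap (inferential) replicates, and a permutation of the genotype labels gives a p-value. This is not fishpond's implementation. All function names, the toy data, and the choice of a two-sided permutation test are assumptions, and the sketch ignores covariates and assumes the genotype vector is not constant.

```python
import random
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_replicate_correlation(replicates, genotype):
    """Average the correlation over bootstrap replicates of expression."""
    return mean(spearman(rep, genotype) for rep in replicates)


def permutation_pvalue(replicates, genotype, n_perm=999, seed=0):
    """Two-sided permutation p-value from shuffling genotype labels."""
    rng = random.Random(seed)
    observed = mean_replicate_correlation(replicates, genotype)
    shuffled = list(genotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_replicate_correlation(replicates, shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: one transcript, 10 samples, 3 bootstrap replicates of expression.
genotype = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
replicates = [
    [5.1, 4.8, 5.5, 7.2, 6.9, 7.5, 7.0, 9.1, 8.7, 9.4],
    [5.3, 4.6, 5.2, 7.0, 7.1, 7.8, 6.8, 9.0, 8.9, 9.6],
    [4.9, 5.0, 5.6, 7.4, 6.7, 7.3, 7.1, 9.3, 8.5, 9.2],
]
observed, p_value = permutation_pvalue(replicates, genotype)
```

Using ranks limits the influence of a single extreme sample, and averaging over replicates is one simple way to let quantification uncertainty pull the statistic toward zero when replicates disagree.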