Logical flaw in our propagation of "absent" calls #230

Open
fbastian opened this issue Oct 16, 2020 · 2 comments

Comments

@fbastian
Member

I am copying here the answer I gave to a ticket, which is what allowed me to spot this problem:

Different assays and experiments often give contradictory reports of presence or absence of expression of a gene in a condition. For now in Bgee, "present" calls always win over "absent" calls (we are currently benchmarking improved methods to reconcile such conflicts).

Looking more precisely at your results (information that you can find in our download files with advanced columns, see https://bgee.org/?page=download&action=expr_calls#id15), it appears that 3 RNA-Seq experiments detected expression of ENSCAFG00000011243 in testis, while 1 experiment reported absence of expression.
Even more precisely (information that you can find in our processed annotated data, see https://bgee.org/?page=download&action=proc_values#id15), 11 RNA-Seq libraries from these 3 experiments showed expression of this gene, while 1 library from 1 experiment showed absence of expression (experiment SRP007359, library SRX080218).
=> The report of expression of ENSCAFG00000011243 in testis is reliable.

About the absence of expression in left/right testis: this is where the flaw appears.
Summary: you can discard this report of absence of expression. We will fix this flaw in our upcoming Bgee 14.2 release; you contacted us just in time for that.
Detailed explanation (that might not be of interest to you):
We propagate calls of presence of expression to all parent conditions: for instance, if we find expression of a gene in "midbrain", we also report expression of the gene in "brain", "nervous system", etc.
We also propagate calls of absence of expression one level down, to child anatomical entities. In the case of ENSCAFG00000011243, the absence of expression found in testis in this one experiment (SRP007359) was propagated to left/right testis. Since we had no other data for left/right testis, this call of absence of expression was not contradicted there by any call of presence of expression, even though it was contradicted in testis itself.
=> Obviously, we should not propagate calls of absence of expression that have already been contradicted in the condition they were produced from.

=> Either we should stop propagating calls of absence of expression, or at the very least we should not propagate them when they have been already contradicted.
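To make the flaw and the second option concrete, here is a minimal sketch in Python. It is not Bgee's actual pipeline code: the toy ontology, the `reconcile` and `propagate_absent` helpers, and the `skip_contradicted` flag are all hypothetical names introduced for illustration. Only the library counts (11 present, 1 absent in testis) and the "present always wins" rule come from the explanation above.

```python
from collections import defaultdict

# Hypothetical toy ontology: parent condition -> direct children.
CHILDREN = {"testis": ["left testis", "right testis"]}

# Direct observations in testis, following the counts quoted above:
# 11 libraries with expression, 1 library without.
OBSERVED = {"testis": ["present"] * 11 + ["absent"]}


def reconcile(calls):
    """Apply the 'present always wins' rule; None when there is no data."""
    if not calls:
        return None
    return "present" if "present" in calls else "absent"


def propagate_absent(observed, children, skip_contradicted):
    """Copy 'absent' calls one level down, then reconcile each condition.

    With skip_contradicted=True, an 'absent' call is not propagated when
    the condition it comes from also has a 'present' call (assumed fix).
    """
    calls = defaultdict(list, {c: list(v) for c, v in observed.items()})
    for parent, parent_calls in observed.items():
        if "absent" not in parent_calls:
            continue
        if skip_contradicted and "present" in parent_calls:
            continue
        for child in children.get(parent, []):
            calls[child].append("absent")
    return {cond: reconcile(v) for cond, v in calls.items()}


flawed = propagate_absent(OBSERVED, CHILDREN, skip_contradicted=False)
fixed = propagate_absent(OBSERVED, CHILDREN, skip_contradicted=True)
```

In this sketch, `flawed` reports "present" in testis but "absent" in left/right testis, which is the inconsistency described above, while `fixed` produces no call at all for left/right testis.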

@marcrr

marcrr commented Oct 16, 2020

Suggestion: propagate only the consensus call from all data in a condition.

That poses the problem of selecting calls per data type, since the information would be lost. One answer could be that the propagation of an RNA-seq call is no longer an "RNA-seq" call, but a "propagation" type call.
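A rough sketch of what this could look like, under stated assumptions: the `Call` record, the `consensus` and `propagate_consensus_down` functions, and the `"propagation"` label are hypothetical, and the consensus is computed here with the current "present wins" rule, since the suggestion does not say how the consensus should be derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    condition: str
    data_type: str  # e.g. "RNA-Seq", "Affymetrix", or "propagation"
    state: str      # "present" or "absent"


def consensus(calls):
    """Consensus over all data types (assumed rule: 'present' wins)."""
    states = {c.state for c in calls}
    if not states:
        return None
    return "present" if "present" in states else "absent"


def propagate_consensus_down(calls, parent, children):
    """Propagate only an 'absent' consensus, relabelled as 'propagation'."""
    parent_calls = [c for c in calls if c.condition == parent]
    if consensus(parent_calls) != "absent":
        return []
    return [Call(child, "propagation", "absent") for child in children]


kids = ["left testis", "right testis"]

# An Affymetrix 'absent' cancelled out by an RNA-Seq 'present':
# the consensus is 'present', so nothing is propagated down.
mixed = [Call("testis", "Affymetrix", "absent"),
         Call("testis", "RNA-Seq", "present")]
from_mixed = propagate_consensus_down(mixed, "testis", kids)

# Only an 'absent' call: it is propagated, but typed as 'propagation'.
only_absent = [Call("testis", "Affymetrix", "absent")]
from_absent = propagate_consensus_down(only_absent, "testis", kids)
```

The propagated calls no longer carry the original data type, so filtering calls by data type would have to treat `"propagation"` as its own category.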

@fbastian
Member Author

Good point, I didn't think of the case where an Affymetrix absent call could have been "canceled out" by an RNA-Seq present call, but would still be propagated. I'll think about it.
