Sunday, February 24, 2013

My thoughts on the immortality of television sets

There's a new GB&E manuscript sensationally blasting a certain widely-reported claim of the 2012 ENCODE Consortium paper, namely that the data generated in that project "enabled us to assign biochemical functions for 80% of the genome." I'm one of 400+ authors on that paper, but I was a bit player - not at all involved in the consortium machinations that resulted in that particular wording, which has proven quite controversial, and has already been discussed/clarified by other authors big and small.

The first author of the new criticism, Dan Graur, is an authority on molecular evolution and authored a popular textbook on that topic (one I own!). The manuscript stridently argues that ENCODE erred in using a definition of "functional element" in the human genome based on certain reproducible biochemical activities, rather than a definition based on natural selection and evolutionary conservation. Interestingly, while the consortium was mostly focused on high-throughput experimental assays to identify the biochemical activities, my modest contributions to ENCODE were entirely based on examining evolutionary evidence, through sequence-level comparative genomics. So, a few comments by a former rogue evolutionary ENCODE-insider:

Definitions of function. One practical difficulty with a selection-based definition of biological function is that selection can be very difficult to detect - as Graur et al. discuss. They should also have noted that it's actually difficult for selection to act at all on many traits. They must be well aware that significant phenotypic variants can nonetheless have essentially no effect on reproductive fitness: a disease that manifests only at advanced age, for example. Thus the evolutionary definition, taken too far, also leads to "bizarre outcomes": either we call genetic loci with causal roles in such selectively neutral traits non-functional, or, with selection as the only criterion for inferring function, we abandon hope of identifying their functions through association with those traits. (Josh Whitten touches on this as well.)

Graur et al. present a similarly bad strawman about ENCODE's definition:
"The ENCODE Incongruity implies that a biological function can be maintained without selection, which in turn implies that no deleterious mutations can occur in those genomic sequences described by ENCODE as functional."
This is wrong, because ENCODE didn't claim that biochemical signatures lacking evidence of selection have been, or necessarily will be, maintained over evolutionary timescales. But they may nonetheless prove highly consequential over human timescales, and to human values.

Of course, my expertise being what it is, I sat in the back of conference rooms at ENCODE meetings and thought to myself, all this non-conserved stuff, is it not crap? But the above is a humbling truth I've slowly come to accept.

Affirming the consequent, that is, applying the scientific method. One early section of the Graur et al. manuscript presents a step-by-step walkthrough of affirming the consequent, an error in deductive reasoning. They accuse ENCODE of committing this grave error by inferring function based on indirect biochemical readouts.

As an admirer of Bayesian methodologies, I'm baffled to see an appeal to deductive logic in a paper about data interpretation in the natural sciences. Graur and coauthors are surely acquainted with Bayes' theorem, but their writing here suggests they've yet to grasp it as the essence of reasoning under uncertainty - that is, of practically all reasoning performed in the natural sciences. For Bayes' theorem provides the precise justification for "affirming the consequent" in the presence of uncertainty - which is not erroneous at all, but instead permits inference to the best explanation. A hypothesis is confirmed by any body of data that its truth renders probable. This is the essence of the scientific method, and no progress in our field can be made without it!
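To make that concrete, here is a toy calculation in Python. The numbers are invented purely for illustration (they are not estimates from ENCODE or from Graur et al.), and `posterior` is just a hypothetical helper name. It sketches how observing a biochemical signal that is more likely under "functional" than under "not functional" raises the probability of function without proving it.

```python
def posterior(prior, p_signal_if_functional, p_signal_if_not):
    """P(functional | signal observed), computed with Bayes' theorem."""
    # Total probability of seeing the signal under both hypotheses.
    evidence = (prior * p_signal_if_functional
                + (1 - prior) * p_signal_if_not)
    return prior * p_signal_if_functional / evidence


# Assumed, made-up inputs: a 10% prior that a region is functional,
# and a signal seen in 90% of functional regions but also in 30% of
# non-functional ones.
prior = 0.10
updated = posterior(prior, 0.90, 0.30)
print(f"prior = {prior:.2f}, posterior = {updated:.2f}")

# If the signal is equally likely either way, it carries no
# information and the posterior stays at the prior.
uninformative = posterior(prior, 0.50, 0.50)
print(f"uninformative signal: posterior = {uninformative:.2f}")
```

With these assumed inputs the probability of function rises from 0.10 to 0.25: the observation supports the hypothesis while leaving plenty of room for alternative explanations. How far it rises depends entirely on how much more likely the signal is under function than under its absence, which is where the real argument should be.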

(Dear reader, I expect that on this point you're either totally with me, or else it sounds like metaphysical gibberish. If the latter, I strongly urge you to read this book. It may change your life!)

Graur et al.'s overly conservative definition of biological function, and their flawed view that it's necessary to deductively refute any alternative explanations before even claiming evidence for a hypothesis, generally undermine the several ensuing sections that individually attack the specific assays ENCODE used and the putative associated functions. These sections will probably be addressed by other consortium members far better than I'd be able to - but the two problems I've discussed are the biggest by far, in my opinion. And in fairness, Graur et al. do present numerous specific points on which I'd agree.

On the presentation: I honestly don't mind too much the style in which the Graur et al. manuscript is written. It certainly grabs your attention, and I think the polemics generally serve a rhetorical purpose, albeit over-the-top here and there. (E.T. Jaynes wrote the aforementioned book, one of my all-time favorites, in a similar style.) Were the writing watered down with weasel-words and footnotes and apologies everywhere, maybe it would be more precise, but reading it would be like chewing glass. And we certainly wouldn't be talking about it.

The shoe fits the other foot too, by the way: a lot of ENCODE participants would be more than willing to write a treatise about how the "biochemical function[s]" investigated in the project have only limited immediate implications, how the data is really quite noisy, that cutoff selection is profoundly arbitrary even if you dress it up, that - out of necessity - they invented new statistical methods as they went along, and so on. Who's gonna print that - or read it?

You can be sure there were plenty of heated discussions about those topics in the consortium, on many hours of conference calls and over beers at the Hilton in Rockville, MD. But in the end, publishing a paper in Nature is brutal even after it's accepted: you're lucky if you end up getting to say even 50% of what you'd like to. Reading the text of the consortium paper again, I think the appropriate definitions and qualifications are all strictly present and proper, but frankly there was no extra space to spend bawling over them further. (The subsequent presentation to the general public may be another matter - that was debated pretty thoroughly last fall, so I won't go into it here.)

Lastly, Graur et al. should have shown more appreciation for the fact that a paper cannot be written by unanimous consent of 400+ people (cf. the US Congress). As a participant, if you disagree with some aspect of the wording negotiated by the PIs and editors, one option is to berate the other consortium members, remove yourself from the author list, and decline further funding - I have seen this happen (well, certainly the berating part, with pledges to do the others; I don't know if they actually followed through). Another is to contribute to the best of your abilities, publish your take on it separately, and trust that this will not be seen as an inconsistency to be mocked, but rather as diversity to be appreciated.