Sunday, January 5, 2014

Ribosome profiling confirms widespread stop codon readthrough in flies

A recent eLife paper from the Weissman lab at UCSF uses high-throughput ribosome profiling to show that many Drosophila melanogaster (fruit fly) genes undergo an unusual translation process called stop codon readthrough - confirming many of, and expanding on, our earlier predictions based on computational comparative genomics. Stop codon readthrough occurs when a translating ribosome reaches an in-frame stop codon in an mRNA but, instead of terminating as usual, continues translating as if the stop codon were a sense codon. This gives rise to a protein isoform with an extended C-terminal region, potentially modifying its function or localization.
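To make the geometry concrete: if the ribosome reads through the first in-frame stop codon, the extra C-terminal stretch runs until the next in-frame stop. Here's a minimal sketch that locates that stretch in a cDNA sequence, assuming you already know where the coding sequence starts and that the standard stop codons apply. The function names are hypothetical, just for illustration.

```python
from typing import List, Optional, Tuple

# Standard stop codons, written in the DNA (cDNA) alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq: str, cds_start: int) -> List[int]:
    """Return 0-based positions of stop codons in the frame set by cds_start."""
    seq = seq.upper()
    return [
        i
        for i in range(cds_start, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def readthrough_extension(seq: str, cds_start: int) -> Optional[Tuple[int, int]]:
    """Half-open (start, end) interval of a hypothetical readthrough extension:
    from just after the first in-frame stop codon up to (not including) the
    next in-frame stop codon. Returns None if there is no second in-frame stop."""
    stops = in_frame_stops(seq, cds_start)
    if len(stops) < 2:
        return None
    first_stop, next_stop = stops[0], stops[1]
    return first_stop + 3, next_stop


if __name__ == "__main__":
    mrna = "ATGAAATAGCCCGGGTAAAAA"  # toy sequence, not a real fly gene
    span = readthrough_extension(mrna, 0)
    print(span, mrna[span[0]:span[1]])  # (9, 15) CCCGGG
```

In the toy sequence, normal termination at TAG yields a two-residue peptide, while readthrough adds the codons CCC GGG before stopping at TAA.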

Stop codon readthrough is a long-known but fairly obscure "recoding" mechanism, which wasn't believed to play a widespread role in metazoan gene expression, save for selenoproteins and a few other intriguing but isolated examples. Now we know that it actually affects hundreds of fly genes - and moreover that, in many cases, the products confer biological functions conserved throughout many millions of years of evolution.

The idea to look at evolutionary signatures of stop codon readthrough originally came from a casual remark by Bill Gelbart in 2006, early on in the analysis phase of the 12 Drosophila Genomes project (when Sanger sequencing of twelve insect genomes was downright Big Science!). We (Manolis Kellis and I) had just developed a precursor to PhyloCSF, so I wrote a program using it to screen the regions immediately 3' of known stop codons for continued protein-coding evolutionary signatures. Frankly we weren't expecting much, given the mere handful of known instances. But this became one of the very rare occasions when the data turned out delightfully surprising - dozens and dozens of beautiful examples! Here's one (Fig. 5A of Lin et al., 2007):

We're looking at an excerpt of 12-fly genome alignments, corresponding to the region surrounding the stop codon of Caki (aka CASK). Synonymous substitutions with respect to D. melanogaster are shown in bright green, conservative amino acid substitutions in light green, and others in red. Conserved protein-coding regions characteristically show far more synonymous substitutions than other kinds of substitutions. Here we see this leading up to the annotated stop codon, as expected - but surprisingly, we also see it in the region immediately following. In fact, it continues until the next downstream stop codon, at which point the signature deteriorates. That suggests translation sometimes continues through the first stop codon, and in doing so confers a fitness advantage conserved at least across the Drosophila genus.
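The color coding boils down to classifying each aligned codon against the D. melanogaster codon. Below is a rough sketch of that tally for one informant species, assuming the reference codons are ungapped and in frame. The "conservative" call uses a hypothetical grouping of amino acids by physicochemical properties; that grouping is an illustrative assumption, not the criterion used for the figure, and this is not PhyloCSF, which scores whole alignments with codon substitution models.

```python
from collections import Counter
from typing import Dict

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical amino acid groups used to call a substitution "conservative".
# Illustrative assumption only; real analyses often use a scoring matrix instead.
GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ", "G", "P", "C"]
GROUP_OF = {aa: g for g in GROUPS for aa in g}


def classify_codon(ref: str, other: str) -> str:
    """Classify one aligned codon pair relative to the reference codon."""
    ref, other = ref.upper(), other.upper()
    if ref not in CODON_TABLE or other not in CODON_TABLE:
        return "gap_or_ambiguous"  # gaps, Ns, etc.
    if ref == other:
        return "identical"
    aa_ref, aa_other = CODON_TABLE[ref], CODON_TABLE[other]
    if aa_ref == aa_other:
        return "synonymous"
    group = GROUP_OF.get(aa_ref)  # None for stop codons ("*")
    if group is not None and group == GROUP_OF.get(aa_other):
        return "conservative"
    return "other"


def tally_substitutions(ref_seq: str, other_seq: str, frame_start: int = 0) -> Counter:
    """Count substitution classes codon by codon along two aligned sequences.
    Simplification: assumes gaps never shift the reference reading frame."""
    if len(ref_seq) != len(other_seq):
        raise ValueError("aligned sequences must have equal length")
    counts: Counter = Counter()
    for i in range(frame_start, len(ref_seq) - 2, 3):
        counts[classify_codon(ref_seq[i:i + 3], other_seq[i:i + 3])] += 1
    return counts


if __name__ == "__main__":
    # Toy alignment: CTG->CTA (Leu->Leu), AAA->AGA (Lys->Arg),
    # GAT->TAT (Asp->Tyr), TGG->TGG (identical).
    print(tally_substitutions("CTGAAAGATTGG", "CTAAGATATTGG"))
```

Applied separately to the region before the annotated stop and the region after it, a high share of "synonymous" relative to "other" in both is the kind of pattern the Caki alignment shows.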

We published an initial list of 149 candidate fly readthrough genes in 2007, along with a few other predictions of unusual phenomena like programmed frameshifting (also confirmed in the time since). A couple of years later, Irwin Jungreis and Clara Chan took another look as part of a class project at MIT. Irwin ended up joining the Kellis lab to pursue it as a full research project, and developed a revised and expanded list of 283 readthrough candidates. Importantly, modENCODE RNA-Seq data arrived around this time, showing the readthrough stop codons were not being spliced out of these transcripts prior to translation - a caveat we had not previously been able to rule out. Irwin published these results in 2011, along with many other related analyses and even a few validations with transgenic flies by Kevin White's group.

The new study by Dunn et al. takes an entirely complementary, high-throughput experimental approach to the phenomenon. The Weissman lab had previously pioneered the use of next-generation sequencing to profile transcriptome-wide ribosome activity, and they evidently learned that Drosophila readthrough might be a "fruitful" application from John Atkins - a leading expert on recoding with whom we had also corresponded. They developed a version of their protocol for flies, and applied it to generate about 50 Gb of short read data describing ribosome localization and mRNA expression in early D. melanogaster embryos and S2 cells (a D. melanogaster cell line).

Here's one of their figures which I was especially pleased to see, showing that our predicted readthrough extensions tend to show several times higher relative ribosome localization than other regions 3' of ORFs (Fig. 3E of Dunn et al., 2013):

By studying these localization statistics and other hallmarks of translation in their ribosome profiling data, Dunn et al. found orthogonal evidence for readthrough in 43 of the 283 genes on our list - an impressive fraction considering the measurements came from just one embryonic stage and a cell line. They also found such evidence in a surprisingly large number of genes that weren't on our list - 307, to be exact. As a group these are poorly conserved across the Drosophila genus and most are probably more or less neutral, if functional at all - but they probably also include some relatively recent adaptations in D. melanogaster and its ancestors. Dunn et al. present some initial results on their evolution and function, and going forward it will be really interesting to study them further.
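For readers new to ribosome profiling, here is one simple way to express "relative ribosome localization" in a region 3' of an ORF: footprint reads per nucleotide in that region, divided by footprint reads per nucleotide in the upstream coding sequence. This is a minimal sketch under that assumption; it is not necessarily the exact statistic Dunn et al. computed, the function names are hypothetical, and the read counts in the example are made-up toy numbers.

```python
from statistics import median
from typing import List, Tuple


def footprint_density(reads: int, length_nt: int) -> float:
    """Ribosome footprint reads per nucleotide of a region."""
    if length_nt <= 0:
        raise ValueError("region length must be positive")
    return reads / length_nt


def relative_density(cds_reads: int, cds_len: int, ext_reads: int, ext_len: int) -> float:
    """Footprint density in the 3' region divided by density in the upstream CDS."""
    cds_density = footprint_density(cds_reads, cds_len)
    if cds_density == 0:
        raise ValueError("no CDS footprints; ratio is undefined")
    return footprint_density(ext_reads, ext_len) / cds_density


def median_relative_density(genes: List[Tuple[int, int, int, int]]) -> float:
    """Median ratio over genes given as (cds_reads, cds_len, ext_reads, ext_len)."""
    return median([relative_density(*g) for g in genes])


if __name__ == "__main__":
    # Toy numbers only: 1000 reads over a 2000-nt CDS, 25 reads over a 200-nt extension.
    print(relative_density(1000, 2000, 25, 200))  # 0.25
```

Comparing the median of this ratio for predicted readthrough extensions against the median for other 3' regions is the flavor of group-level comparison described above; normalizing by CDS density keeps highly expressed genes from dominating.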

As a computational biologist, I find it extremely pleasing to have broad experimental confirmation of a surprising prediction that was based entirely on computational analysis of a well-studied genome like D. melanogaster's!