Coding sequences in eukaryotic genomes are frequently interrupted by spliceosomal introns, regions of noncoding DNA that are removed from pre-mRNA transcripts by the spliceosome, a complex of five RNAs and hundreds of proteins [1]. Why do introns exist? No general function for spliceosomal introns has been demonstrated, and both their absence from prokaryotes and the recurrent massive loss of introns in various eukaryotic lineages suggest that no such essential function may exist. One common hypothesis is that introns impose only a small (or no) burden, and so are tolerated in many lineages. However, a recent report by Jaillon et al. [2] reveals widespread intron mis-splicing, suggesting a significant cost associated with spliceosomal introns, and deepening the mystery of intron proliferation and persistence.

If introns are efficiently removed from transcripts before they are exported for translation, they should not respect coding meanings: whether or not an intron sequence contains a termination codon or a frameshift should be determined by chance. Jaillon et al. [2] report that this is not the case, however. Instead, the numerous (average 2.3 per gene) and very short (average length 25 bp) introns of the ciliate Paramecium tetraurelia show a pronounced preference for interrupted reading frames: 81.3% of P. tetraurelia introns have a frameshift (that is, their length is not a multiple of three base pairs), as opposed to the two-thirds frequency expected by chance. Moreover, those without a frameshift are twice as likely as frameshifting introns to have in-frame stop codons. Thus there is strong evolutionary selection against 'read-through' introns that could be translated into protein.
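The two properties discussed above, a frameshift and an in-frame stop codon, can be made concrete with a minimal sketch. It is an illustration only, not the analysis pipeline of Jaillon et al. [2]. The function names, the toy sequences and the `RetentionOutcome` type are hypothetical. The default stop codons are those of the standard genetic code; Paramecium is generally annotated with the ciliate nuclear code, in which TGA is the only stop codon, so `stops={"TGA"}` would be the more appropriate setting for real P. tetraurelia sequences. The two-thirds expectation arises because, if intron lengths fell evenly across the three possible remainders on division by three, two of the three would shift the frame.

```python
from dataclasses import dataclass

# Stop codons of the standard genetic code (assumption: pass {"TGA"} for
# the ciliate nuclear code generally used for Paramecium).
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


@dataclass(frozen=True)
class RetentionOutcome:
    """Hypothetical summary of what a retained intron does to the reading frame."""
    frameshift: bool       # intron length is not a multiple of three
    premature_stop: bool   # an in-frame stop appears earlier than the natural one

    @property
    def read_through(self) -> bool:
        # A 'read-through' intron keeps the frame and introduces no early stop,
        # so the unspliced transcript could be translated into protein.
        return not self.frameshift and not self.premature_stop


def first_in_frame_stop(seq, stops=STANDARD_STOPS):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return i
    return None


def classify_retained_intron(upstream, intron, downstream, stops=STANDARD_STOPS):
    """Classify the effect of leaving `intron` in the transcript.

    `upstream` is the coding sequence from the start codon to the intron;
    `downstream` is the coding sequence after it, including the natural stop.
    """
    natural = first_in_frame_stop(upstream + downstream, stops)
    retained = first_in_frame_stop(upstream + intron + downstream, stops)
    frameshift = len(intron) % 3 != 0
    if retained is None:
        premature = False
    elif natural is None:
        premature = True
    else:
        # In the unspliced transcript the natural stop sits len(intron) later.
        premature = retained < natural + len(intron)
    return RetentionOutcome(frameshift, premature)


# Toy sequences, invented for illustration only.
upstream = "ATGGCC"
downstream = "CATGACCGTTGA"
frameshifting = "GT" + "A" * 21 + "AG"          # 25 bp: shifts the frame
in_frame_clean = "GT" + "A" * 20 + "AG"         # 24 bp, no in-frame stop
in_frame_stop = "GTA" + "TGA" + "A" * 16 + "AG"  # 24 bp with an in-frame TGA

for name, intron in [("25 bp", frameshifting),
                     ("24 bp", in_frame_clean),
                     ("24 bp + stop", in_frame_stop)]:
    print(name, classify_retained_intron(upstream, intron, downstream))
```

In this toy case only the 24 bp intron lacking an in-frame stop is classified as read-through; the other two would yield transcripts with a premature stop codon, the class of transcript that a surveillance pathway can recognize.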

Translation of RNAs carrying premature stop codons can be prevented by the nonsense-mediated mRNA decay pathway (NMD) [3], and Jaillon et al. [2] experimentally tested the hypothesis that this pathway is responsible for removing mis-spliced or unspliced transcripts in P. tetraurelia. By knocking down a component of the NMD machinery, they revealed the intrinsically low efficiency of splicing for many introns and the essential role of NMD in preventing translation of the resulting unspliced transcripts. On the basis of a bioinformatic analysis of intron sequences in other eukaryotes, the authors conclude that such splicing inefficiency is likely to be widespread, at least for short introns.

The costs and benefits of spliceosomal introns

The findings of Jaillon et al. [2] have far-reaching implications for our understanding of the eukaryotic genome. First, the demonstrated inefficiency of splicing suggests that the presence of introns is even more disadvantageous to general fitness than previously appreciated. There is considerable wastage of mRNAs, and the maintenance of a complicated and effective monitoring system for the identification and removal of these transcripts - the NMD machinery - is required. In the case of P. tetraurelia, these burdens are borne by a species in which there is almost no functional alternative splicing [2], and so diversification of the proteome can be ruled out as a reason for intron presence in the genome. The very short lengths of the introns in Paramecium also suggest a minor role, at most, in genomic stability, chromatin structure or in promoting recombination - other proposed advantages of introns. This intensifies the central mystery of eukaryotic gene structure: why did so many costly introns with no apparent function arise in the first place, and why are they retained in such a diverse array of species?

Introns in reduced genomes

Another recent report, by Pleiss et al. [4], shows the functional potential of unspliced transcripts. Many eukaryotes have reduced genomes with few introns. The few introns in Saccharomyces cerevisiae (only 0.05 per gene on average) show a previously mysterious bias towards ribosomal protein genes (0.74 introns per ribosomal gene). Pleiss et al. showed that introns in ribosomal protein genes, but not introns in other genes, are inefficiently spliced under amino acid starvation, resulting in reduced production of protein-encoding spliced transcripts and thus presumably inhibiting ribosome formation and overall protein translation. This is not a general stress response, as the splicing of these introns remains unaffected under other, unrelated stress conditions (such as exposure to toxic levels of ethanol), nor does it reflect a general collapse of cellular processes due to stress, as the splicing of introns in non-ribosomal genes was not reduced [4].

The use of regulated splicing to serve some biological function is thus one potential explanation for the retention of occasional introns in reduced eukaryotic genomes. Together with the finding of widespread mis-splicing in Paramecium [2] and the observed low level of evolutionary conservation of alternative splice forms in metazoans (see [5] for a recent review), these results suggest an intriguingly counterintuitive possibility. This is that a large fraction of the introns in reduced genomes may serve important functions, whereas the numerous introns and frequent alternative splicing of more intron-rich genomes may be largely nonfunctional [6-9].

Early eukaryotic genomes and the origins of alternative splicing

The work in Paramecium [2] also has important implications for the origins of alternative splicing. The inefficient splicing demonstrated suggests that eukaryotes may have been producing variable transcripts - presumably a requisite for the emergence of widespread alternative splicing [8] - early in their history. This would mean that variably spliced genes encoding multiple functional proteins need not have emerged by serendipitous, rare mutations from alleles producing single transcript forms. Instead, preexisting nonfunctional transcript variation is likely to have been co-opted for new functions. If the production of alternative transcripts from the same gene was already common early in eukaryote evolution [8, 10, 11], the question then becomes at what point(s) functional alternative splicing emerged. On the other hand, it should be noted that most well-characterized functional alternative splicing events do not involve the inclusion or exclusion of introns, but the inclusion or exclusion of exons, the evolution of which may proceed differently [10].

A recent flood of comparative genomic data indicates that early eukaryotes had complex genome structures, suggesting that compact intron-poor genomes are the most 'highly evolved' among eukaryotes. It has been shown that the last common ancestor of extant eukaryotes contained a complex spliceosome [12] and a large number of introns [13-18], and that those introns probably had degenerate sequences without strong consensus motifs [11, 19]. The results of Jaillon et al. [2] emphasize how much of a burden those ancestral structures could have imposed - the many degenerate introns are likely to have been only inefficiently spliced, implying large numbers of unspliced and therefore useless transcripts. These results also indicate a strong selective force for the origin of intron-mediated NMD: the presence of large numbers of unspliced transcripts is likely to have driven the evolution of NMD, rather than the NMD system and introns being selected to deal with the (relatively infrequent) transcription errors. It is truly a mystery how such seemingly hapless and inefficient early eukaryotes could have succeeded in a world already well colonized by prokaryotes unburdened by such problems.