Alternative splicing, the process by which multiple mRNA isoforms are generated from a single pre-mRNA species, is an important means of regulating gene expression. Alternative splicing determines cell fate in numerous contexts, such as sexual differentiation in Drosophila and apoptosis in mammals [1], and aberrant regulation of alternative splicing has been implicated in human disease [1,2]. Additional attention is now being given to alternative splicing in the wake of the sequencing of the human genome. On the basis of the initial drafts of the human genome sequence it was estimated that there are 30,000 to 40,000 genes [3,4] - significantly fewer than expected. Although final gene counts may be higher, there is a disparity between the relatively small number of human genes and the complexity of the human proteome. This suggests that alternative splicing is important in the generation of protein diversity. This article describes what is known about the regulatory elements that direct alternative splicing, how genome-wide analyses are being applied to their identification, and suggests directions for future genomic and large-scale studies.

Prediction of alternative splice forms

Estimates of the extent of alternative splicing in humans made on the basis of alignments of expressed sequence tags (ESTs; sequenced portions of cDNAs) range from 30% to 60% of genes, and even 60% might be an underestimate [3], because ESTs are lacking for many tissues and developmental stages and are biased toward the 3' end of mRNAs. The majority of alternative splicing events affect coding regions [5]. Within the coding region, protein domains can be added or removed, the reading frame can be shifted to give rise to an altered protein sequence, or the protein can be truncated by the introduction of a termination codon. Although less common, alternative splicing of 5' and 3' untranslated regions may insert or remove key cis-regulatory elements that affect mRNA localization, stability, and translation. The seven basic types of alternative splicing are illustrated in Figure 1.
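
To make the EST-based approach concrete, the following is a minimal Python sketch of one step: spotting candidate cassette exons (Figure 1a) by comparing intron coordinates from ESTs or cDNAs that have already been spliced-aligned to the genome. It assumes plus-strand, 1-based inclusive coordinates and error-free alignments; the function name `cassette_exons` and the example coordinates are hypothetical, and a real pipeline would also have to handle strand, alignment errors, and the other event types.

```python
from __future__ import annotations


def cassette_exons(transcripts: dict[str, list[tuple[int, int]]]) -> set[tuple[int, int]]:
    """Find candidate cassette exons from spliced alignments of ESTs or cDNAs.

    Each transcript is a list of introns given as (first, last) genomic
    coordinates (1-based, inclusive, plus strand). An exon lying between two
    consecutive introns of one transcript is reported when another transcript
    contains a single intron that splices directly across it.
    """
    # Every intron observed in any transcript.
    all_introns = {intron for chain in transcripts.values() for intron in chain}
    exons = set()
    for chain in transcripts.values():
        ordered = sorted(chain)
        for (start1, end1), (start2, end2) in zip(ordered, ordered[1:]):
            # Intron (start1, end2) would remove the exon between these two introns.
            if (start1, end2) in all_introns:
                exons.add((end1 + 1, start2 - 1))
    return exons


# Hypothetical alignments: est_1 includes exon 201-299, est_2 skips it.
example = {"est_1": [(100, 200), (300, 400)], "est_2": [(100, 400)]}
print(cassette_exons(example))  # {(201, 299)}
```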

Figure 1

Functionally significant examples of different types of alternative splicing. (a) Alternative inclusion of a cassette exon is very common. Neuron-specific inclusion of the N1 exon in the c-src proto-oncogene generates an insertion in the SH3 protein-protein interaction domain that alters its binding to other proteins [34]. (b) Alternative exons may be mutually exclusive, such as exons IIIb and IIIc in the fibroblast growth factor receptor 2 (FGFR-2) gene. Use of IIIb produces a receptor with high affinity for keratinocyte growth factor (KGF), whereas use of IIIc produces a high-affinity FGF receptor. Loss of the IIIb isoform is thought to be important in prostate cancer [35]. (c) The choice of an alternative 5' splice site in the Wilms' tumor suppressor gene Wt1 results in the insertion of the three amino acids lysine, threonine, and serine (KTS). The +KTS and -KTS forms play distinct roles in kidney and gonad formation, and shift of the balance toward the -KTS form is associated with Frasier syndrome [36]. (d) In the transformer (tra) gene in Drosophila, selection of a female-specific alternative 3' splice site produces a single long open reading frame that gives rise to a regulatory protein that controls female somatic sexual differentiation. In male flies, tra mRNAs lack a long open reading frame, and no protein is made [37]. (e) Alternative terminal exons in the gene encoding calcitonin and calcitonin-gene-related peptide (CGRP) give rise to a hormone involved in calcium homeostasis in the thyroid gland, or a neuropeptide involved in vasodilation in the nervous system [38]. (f) Alternative promoter usage in the myosin light chain (MLC) gene leads to different first exons, which pair with mutually exclusive downstream exons to give rise to distinct protein isoforms, namely MLC1 and MLC3 [39]. This type of alternative splicing pattern results primarily from transcriptional regulation, not from the regulation of splice-site choice per se. (g) Intron retention is one of the rarest forms of alternative splicing in humans. Retention of intron 2 in the human muscle-specific chloride channel 1 (ClC-1) mRNA in myotonic dystrophy (DM) patients introduces a premature stop codon and leads to downregulation of ClC-1 expression, contributing to problems in muscle relaxation (myotonia) [2].

The significance of alternative splicing extends beyond the ability to generate different protein isoforms to the ability to modulate the levels of those isoforms. The proportions of different splice forms produced by alternative splicing may vary in different cell contexts, such as by cell type, developmental stage, or disease state. Numerous examples of cell-type-specific alternative splicing have been found (Figure 1), but the number of alternatively spliced genes identified so far is only a fraction of the number that has been predicted. In the last few years, bioinformatic studies comparing ESTs have greatly increased the number of known alternatively spliced genes (reviewed in [5]), and expanding these comparisons to include human genome sequence data holds the promise of finding many more. Genomic, mRNA, and EST sequences are also being used to better characterize alternative exons and their flanking introns [6]. Furthermore, microarray technologies are now evolving to look at splicing variation as well as overall gene expression in different cell contexts [7,8,9]. In future studies, arrays can be designed to screen for alternative splice forms in different tissues, at different developmental stages, in normal versus disease states, or in different mouse models, such as knockout mice lacking auxiliary splicing regulators. The caveats of microarray analyses are that they often cannot determine whether the splicing of multiple variable regions within an individual transcript is coordinated, and they rely on having sequence data (for example, for exon-exon junctions) prior to probe design. Nonetheless, taken together with bioinformatic approaches, microarrays will help to develop splicing profiles that provide a global picture of how alternative splicing is regulated.
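
Because junction probes can only be designed from known sequence, the following sketch shows the basic idea for a cassette exon B flanked by exons A and C: probes straddling the A-B and B-C junctions report inclusion, and a probe across the A-C junction reports skipping. It is an illustration under stated assumptions rather than any platform's design rule: the probe length (2 × `half`) is arbitrary, probes are written as sense-strand sequence (whether they must be reverse-complemented depends on how the target is labeled), and practical designs also balance melting temperature and cross-hybridization. The function names are hypothetical.

```python
from __future__ import annotations


def junction_probe(upstream: str, downstream: str, half: int = 12) -> str:
    """Sense-strand probe spanning an exon-exon junction: the last `half` nt of
    the upstream exon joined to the first `half` nt of the downstream exon."""
    if len(upstream) < half or len(downstream) < half:
        raise ValueError("exon shorter than half the probe length")
    return upstream[-half:] + downstream[:half]


def cassette_probes(exon_a: str, exon_b: str, exon_c: str, half: int = 12) -> dict[str, str]:
    """Junction probes distinguishing inclusion (A-B, B-C) from skipping (A-C)
    of a cassette exon B flanked by constitutive exons A and C."""
    return {
        "inclusion_AB": junction_probe(exon_a, exon_b, half),
        "inclusion_BC": junction_probe(exon_b, exon_c, half),
        "skipping_AC": junction_probe(exon_a, exon_c, half),
    }


# Toy exons; real probes would be much longer than 6 nt.
probes = cassette_probes("A" * 15, "C" * 15, "G" * 15, half=3)
print(probes["skipping_AC"])  # AAAGGG
```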

The cis elements and trans-acting factors that regulate alternative splicing

In addition to identifying alternatively spliced mRNAs, genome-wide analyses will help answer many exciting questions about how alternative splicing is regulated. There is much to be learned about the regulatory factors that mediate cell-context-specific alternative splicing and about the cis elements through which they act. Genomic sequence, EST, and alternative splicing databases (Table 1) can be used to identify alternatively spliced RNAs and the regulatory elements that direct splicing decisions.

Table 1 Useful databases for identifying alternatively spliced RNAs and the regulatory elements of alternative splicing

Auxiliary cis elements

Both constitutive and alternative splicing require the assembly of the basal splicing machinery in spliceosome complexes on consensus sequences present at all boundaries between introns and exons (the 5' and 3' splice sites). The spliceosome has two functions: to recognize and select splice sites, and to catalyze the two sequential transesterification reactions that remove the introns and join the exons together. The efficiency with which the spliceosome acts on an exon is determined by a balance of several features, including the strength of a splice site (that is, its conformity to consensus splice-site sequences), exon size, and the presence of auxiliary cis elements. Exons of ideal size (typically 50 to 300 nucleotides) with strong splice-site sequences are recognized efficiently by the splicing machinery and are constitutively included in the transcript, whereas suboptimal exons require auxiliary elements for recognition. But it is becoming increasingly clear that many constitutive exons also use auxiliary elements to ensure their recognition in all cells expressing the pre-mRNA. For example, a bioinformatic approach was used to identify intronic G-rich elements that facilitate the recognition of small constitutive exons [10]. Alternatively spliced exons are often small and have weak 5' and/or 3' splice sites, but for these exons auxiliary elements serve not only to improve splice site recognition, but also to modulate selection of splice sites used in specific cell contexts.
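
Splice-site strength is usually quantified with probabilistic models such as position weight matrices. As a deliberately crude sketch, the snippet below only counts how many of the nine positions of a 5' splice site (three exonic, six intronic) agree with the MAG|GURAGU consensus, where M is A or C and R is a purine. The consensus string, the IUPAC lookup, and the function name are assumptions chosen for illustration.

```python
# IUPAC codes used in the consensus: M = A or C, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}

# 5' splice site consensus MAG|GURAGU written in the DNA alphabet:
# three exonic positions, then six intronic positions starting with GT.
CONSENSUS_5SS = "MAGGTRAGT"


def consensus_matches(site: str, consensus: str = CONSENSUS_5SS) -> int:
    """Count positions of a 9-nt 5' splice site that agree with the consensus."""
    site = site.upper().replace("U", "T")  # accept RNA or DNA input
    if len(site) != len(consensus):
        raise ValueError(f"expected {len(consensus)} nt, got {len(site)}")
    return sum(base in IUPAC[code] for base, code in zip(site, consensus))


print(consensus_matches("CAGGUAAGU"))  # 9: perfect match
print(consensus_matches("UUGGUAUGU"))  # 6: weaker site
```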

In general, the auxiliary elements that regulate the usage of alternative splice sites share several common features: they are small, variable in sequence, individually weak, and present in multiple copies. They are usually single-stranded, although secondary structure has been implicated in the function of a few elements (see, for example, [11]). Auxiliary elements are often conserved between species and perhaps between similarly regulated genes, but they contain degenerate sequence motifs, making it difficult to identify them. They can be exonic or intronic, and when they are intronic they can lie upstream, downstream, or flanking both sides of the regulated exon. Intronic elements can also be proximal (within 100 nucleotides) or distal (more than one kilobase away from the regulated exon), although they are often located close to the exon. And finally, auxiliary elements can enhance or repress splice-site selection. Depending on their location and their effect on the recognition of alternative splice sites, the elements are referred to as exonic splicing enhancers or silencers or intronic splicing enhancers or silencers (Figure 2). Table 2 lists the intronic splicing enhancers and silencers that have been identified to date; exonic splicing enhancers have recently been cataloged elsewhere [12].

Figure 2

Typical features of alternative exons. Alternative exons are on average less than half the size of constitutive exons and have weak 5' and/or 3' splice sites. Auxiliary elements aid or prevent the recognition of these exons by binding trans-acting factors in different cellular contexts, and how often an exon is included in the mRNA depends on a balance between positive and negative regulation. Enhancer (+) and silencer (-) elements can be found within the alternative exon (yellow box in the center) or the flanking introns (lines). Splicing decisions are controlled by multiple elements, and for a given exon these can be different elements, multiple copies of the same element located at different sites, or a combination of the two (as indicated by the non-yellow colored boxes). Different alternative exons are regulated by different sets of auxiliary elements, but alternative exons that are regulated by the same trans-acting factors have some common elements. Intronic elements can be distal, but are more often located in the introns adjacent to the alternative exon (near the exon-intron boundary), and in some cases can overlap with, or be contained within, the consensus splice site sequences that are recognized by the basal spliceosomal machinery.

Table 2 Known auxiliary intronic elements that regulate alternative splicing

Growing evidence suggests that many alternative splice sites are associated with both enhancers and silencers, and that regulation of alternative splicing is often the result of dynamic antagonism between trans-acting factors binding to these elements (reviewed in [13]). Indeed, results from the best-characterized vertebrate experimental systems (see below) argue that for most alternatively spliced transcripts there is no 'default' or unregulated state; instead, the ratio of alternative splice forms observed for a given pre-mRNA results from a balance between positive and negative regulation.

Trans-acting splicing factors

Numerous exonic splicing enhancers have been shown to bind serine/arginine-rich (SR) proteins, a family of essential splicing factors, to promote inclusion of alternative exons with weak splice sites in the pre-mRNA [14]. The effects of SR proteins on alternative splicing are antagonized by the constitutive splicing factor heterogeneous nuclear ribonucleoprotein A1 (hnRNP A1). Raising the levels of SR proteins or hnRNP A1, either in vitro or by overexpression in transient transfection assays, has a general effect on splicing of alternatively spliced pre-mRNAs, suggesting that alternative splicing of some RNAs can be globally regulated in cell populations by changing the relative ratios of constitutive splicing factors. In at least some cases, hnRNP A1 binds cooperatively to exonic splicing silencer sequences and blocks binding of SR proteins to exonic splicing enhancers within the same exon [15].

Other auxiliary elements mediate their effects by binding to auxiliary splicing factors. For example, muscle-specific splicing elements in the intronic regions flanking alternative exon 5 of the cardiac troponin T (cTNT) pre-mRNA regulate inclusion of exon 5 in transcripts in embryonic striated muscle. Positive muscle-specific splicing elements downstream of exon 5 bind members of the CUG-binding protein (CUG-BP) and embryonically lethal abnormal vision-type RNA binding protein 3 (ETR-3)-like factor (CELF) family; binding of CELF splicing factors promotes exon inclusion [16]. Negative elements antagonize this muscle-specific activity by binding the ubiquitously expressed pyrimidine tract-binding protein (PTB) upstream and downstream of exon 5 [17]. The neuron-specific N1 exon of the c-src proto-oncogene is similarly regulated. PTB binds to intronic sequences flanking the N1 exon to inhibit its inclusion in the c-src mRNA in non-neuronal cells [18]. In neurons, a downstream intronic enhancer region binds several factors to derepress the N1 exon, including KH-type splicing-regulatory protein (KSRP), far upstream element binding protein (FBP), and the heterogeneous nuclear ribonucleoproteins hnRNP H and hnRNP F [18]. The neuron-specific splicing of several other targets is similarly repressed by PTB, and is antagonized in at least some of these cases by binding of the neuron-specific activator neuro-oncological ventral antigen 1 (Nova-1) to enhancer elements [18].

Most cis elements known to regulate alternative splicing were identified using deletion or site-directed mutational analysis of minigenes that were tested in transient transfection assays. This approach is limited by the difficulty in making minigene constructs that preserve the ability to regulate the exon in a cell-culture system (reviewed in [19]). Another caveat of this approach has been that multiple small elements often display functional redundancy, making it hard to identify them by a loss-of-function approach. In contrast, this repetitiveness should be helpful for identification of auxiliary cis elements in large-scale genomic analyses. Sequencing of the human genome has provided large data sets that are invaluable for finding new elements involved in cell-context-specific alternative splicing.

Using genomics to identify alternative splicing elements

There are two basic approaches to using genomic information to investigate alternative splicing elements: first, comparative or computational approaches to identify putative elements followed by validation of the ability of the elements to regulate splicing, and second, experimental approaches to identify sequence motifs followed by searching the genome for natural regulatory sites containing these elements. Both of these approaches have recently been attempted with some early successes.

Computational identification of exonic elements

Large genomic data sets have been used to identify elements by computational analyses. In one study, Fedorov and colleagues [20] found differences in the distribution of pentameric and hexameric nucleotide sequences in the exons of intron-containing and intron-lacking genes, some of which may represent exonic regulatory elements. The differences in nucleotide distributions that they reported were not created by a few strong signals, but rather by the accumulation of multiple weak signals, consistent with our current understanding of exon recognition. The putative exonic elements described by Fedorov et al. [20] did not generally match known exonic splicing enhancers and silencers, but experimentation to test whether these elements do indeed affect splicing was left to future studies.
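
The kind of comparison described above can be sketched as counting overlapping k-mers in two sequence sets and computing a log2 ratio of their relative frequencies, with a pseudocount so that k-mers absent from one set do not produce infinite values. This is a generic enrichment calculation, not the statistics used by Fedorov et al. [20]; the function names, the default k = 6, and the pseudocount are assumptions, and a real analysis would also need a significance test and a correction for multiple comparisons.

```python
from __future__ import annotations

import math
from collections import Counter


def kmer_counts(seqs: list[str], k: int) -> Counter:
    """Count overlapping k-mers, skipping any that contain non-ACGT characters."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper().replace("U", "T")
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def log2_enrichment(foreground: list[str], background: list[str],
                    k: int = 6, pseudocount: float = 1.0) -> dict[str, float]:
    """log2 ratio of each k-mer's relative frequency in foreground vs background."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    kmers = set(fg) | set(bg)
    # Pseudocounts are added to every observed k-mer and to the totals.
    fg_total = sum(fg.values()) + pseudocount * len(kmers)
    bg_total = sum(bg.values()) + pseudocount * len(kmers)
    return {
        kmer: math.log2(((fg[kmer] + pseudocount) / fg_total)
                        / ((bg[kmer] + pseudocount) / bg_total))
        for kmer in kmers
    }
```

In use, the foreground and background would be, for example, exon sequences from two gene classes; positive scores mark k-mers over-represented in the foreground.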

Fairbrother and colleagues [12] used a computational approach to predict exonic splicing enhancer sequences by statistical analysis, followed by experimental verification of enhancer activity. They looked for hexameric sequences that are enriched in exons relative to introns and that are near weak splice sites relative to strong ones. They identified ten classes of putative exonic splicing enhancers, five of which matched known motifs for such elements and five of which were novel. Representatives of all ten motifs had enhancer activity when tested in minigene constructs in vivo. Fairbrother et al. [12] also showed that nine out of the ten elements tested had greater enhancer activity than corresponding mutant sequences that were statistically predicted to lack enhancer activity due to single base-pair substitutions. The predictive ability of their method was further demonstrated by comparing their predicted motifs to hexamers within the human hypoxanthine phosphoribosyltransferase (HPRT) gene. Numerous natural exonic mutations are known to cause exon skipping in the HPRT gene, mutation of which is associated with Lesch-Nyhan syndrome, a neurogenetic disorder caused by a defect in the purine salvage pathway. Of 30 mutations, more than half disrupt predicted exonic splicing enhancer motifs. The results by Fairbrother et al. [12] suggest that it is possible to predict accurately the splicing phenotype of point mutations in human diseases by computational analysis and demonstrate the striking possibilities for predicting splicing elements from genomic sequence. Interestingly, each of the ten predicted exonic splicing enhancer motifs occurred as often or slightly less often in alternative exons than in constitutive exons, so it is likely that the motifs identified in this study play a role in exon recognition in both alternative and constitutive splicing.

Computational identification of intronic elements

On a small scale, a comparative genomic approach was used to identify intronic elements regulating splicing of a single alternative exon in the transcript for the splicing factor hnRNP A1. Human and mouse hnRNP A1 genomic sequences were aligned and conserved intronic stretches were tested to determine whether they play a role in splicing of the hnRNP A1 pre-mRNA [21]. Brudno and colleagues [22] used a computational approach to find candidate intronic regulatory elements involved in cell-type-specific alternative splicing. In this study, a kilobase of intronic sequence from each side of 25 brain-specific internal alternative exons was contrasted to a larger set of intronic sequences flanking constitutive exons. Brudno et al. [22] found a significant enrichment of the hexanucleotide UGCAUG and related pentamers in the region of the downstream intron proximal to the regulated exon. They also found this sequence at a high frequency downstream of a smaller set of known muscle-specific exons, suggesting that this element may play a broader role in cell-type-specific alternative splicing. Strikingly, UGCAUG was previously identified in intronic splicing enhancers defined by functional assays ([23] and references therein), and, furthermore, the splicing factors KSRP and FBP bind to this sequence in the intronic downstream control element of c-src transcripts [18]. Brudno et al. [22] also report that computational analysis of PTB consensus binding sites revealed a statistical over-representation of these sites near brain-specific exons, supporting models in which dynamic antagonism between cell-type-specific activators and the more ubiquitous repressor PTB modulates splicing of alternative exons through binding to positive and negative auxiliary elements, respectively [17,18]. In future studies, this approach could be used to look at other subsets of cell-type-specific alternatively spliced genes, or expanded to encompass more distal intronic regions.
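
A minimal sketch of the positional motif count behind this kind of analysis: the rate of UGCAUG (written TGCATG in the DNA alphabet) per kilobase within the first `window` nucleotides of each downstream intron, which can then be compared between regulated and control exon sets. The window size and function name are assumptions for illustration, and the statistical test needed to call an enrichment significant is omitted.

```python
import re


def motif_rate(introns, motif="TGCATG", window=200):
    """Occurrences of `motif` per kilobase within the first `window` nt of each
    downstream intron (overlapping matches are counted)."""
    pattern = re.compile(f"(?={motif})")  # zero-width lookahead allows overlaps
    hits = total = 0
    for seq in introns:
        region = seq.upper().replace("U", "T")[:window]
        hits += len(pattern.findall(region))
        total += len(region)
    return 1000 * hits / total if total else 0.0


# Usage: compare motif_rate(introns_after_brain_exons) with
# motif_rate(introns_after_constitutive_exons) for the same window.
```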

Functional identification of auxiliary elements

The most common experimental approach used to identify RNA motifs associated with regulatory factors is systematic evolution of ligands by exponential enrichment (SELEX), in which preferred functional sites or binding sites are selected and amplified from an initial pool of random RNA sequences. Several studies have identified potential exonic splicing enhancers by functional selection of exonic sequences that enhance splicing in cell-free splicing assays [24,25,26,27] or in cultured cells [28]. Of these, studies from the Krainer laboratory [26,27] are most striking. In this work, Liu and colleagues used a splicing substrate in which the known exonic splicing enhancer was replaced with a pool of random sequences. To select for sequences that mediate splicing via individual SR proteins, they used a splicing complementation assay in which recombinant SR proteins were added to splicing-deficient cytoplasmic extracts. Liu et al. [26,27] identified four sets of exonic splicing enhancer motifs that are specifically activated by SR proteins, namely ASF/SF2, SRp40, SRp55, and SC35. These motifs matched almost a dozen natural exonic splicing enhancers for which binding specificity of individual SR proteins has been determined. The score matrices generated for the four SR proteins were then used as tools to predict exonic splicing enhancers. Disruption of such a predicted enhancer explains aberrant splicing resulting from a single nonsense point mutation in the BRCA1 gene in breast and ovarian cancer patients [29]. The mutation perturbs an exonic splicing enhancer regulated by ASF/SF2, causing exon skipping, which in turn disrupts the carboxy-terminal end of the BRCA1 protein. This mutation had previously been thought to contribute to disease by introducing a premature termination codon, but this analysis [29] demonstrated that the substitution acts instead as a splicing mutation.
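
Score matrices derived from selected sequences can be sketched as a log2-odds position weight matrix: count each base at each position of aligned winning sequences, add a pseudocount, divide by a background frequency, and scan a query sequence for windows scoring above a threshold. This is a generic PWM construction, not the exact scoring scheme of Liu et al. [26,27]; the uniform background, the pseudocount, the threshold, and the example sites are assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds position weight matrix from aligned, equal-length sites."""
    sites = [s.upper().replace("U", "T") for s in sites]
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(width):
        column = [s[i] for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    seq = seq.upper().replace("U", "T")
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) <= set(BASES):  # skip windows with ambiguous bases
            score = sum(col[b] for col, b in zip(pwm, window))
            if score >= threshold:
                hits.append((start, window, score))
    return hits


# Hypothetical "selected" sites; real SELEX winners would first need aligning.
pwm = build_pwm(["GAAGAA", "GAAGAA", "GGAGAA"])
print(scan("CCGAAGAACC", pwm, threshold=8.0))
```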

Similar analysis of a single base-pair substitution within exon 7 of the survival of motor neuron 2 (SMN2) gene showed that exon skipping and subsequent truncation of the SMN2 protein are due to disruption of an exonic splicing enhancer [30]. Truncation of the SMN2 protein prevents compensation for loss of a nearly identical SMN1 gene in patients with spinal muscular atrophy. These studies dramatically demonstrate the importance of exonic splicing enhancers for exon recognition, as well as the significant role disabled elements play in human disease. Thus far, the sequence motifs identified by SELEX for the four SR proteins studied by Liu et al. [26,27] have been compared to established exonic splicing enhancers and point mutations known to affect splicing. In the future, analyses can be extended to include additional SR proteins and the elements identified by SELEX can be compared to databases of genomic sequences, alternatively spliced exons, or other point mutations associated with human diseases.

One study used sequences found by SELEX to identify alternative splicing elements from genomic sequence. Buckanovich and Darnell [31] used SELEX to identify binding sites for the neuron-specific splicing factor Nova-1. The selected sequence of three intact UCAU repeats was shown experimentally to be both necessary and sufficient for high-affinity Nova-1 binding. Although the sequence was selected on the basis of optimal binding and not function, Buckanovich and Darnell [31] were able to identify natural targets of Nova-1-mediated cell-specific splicing by searching GenBank, a neuron-specific alternatively spliced exon database, and the Nova-1 genomic sequence with the consensus RNA selection sequence. Only two targets (exon 3A of the glycine receptor α2 and exon H of Nova-1) were identified, but at the time of this study, few genomic sequences were available and alternative splicing databases contained few entries. It is interesting to note, however, that Brudno and colleagues [22] found no increase in the frequency of UCAY sequences (where Y is a pyrimidine) in brain-specific alternative introns, suggesting that Nova-1 activity mediated through this element may be important only for a minority of brain-specific exons. Today the approach used by Buckanovich and Darnell [31] could presumably be used to find many natural targets of other cell-context-specific regulators by searching the large genomic and alternative splicing data sets for elements identified by SELEX.
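
A search of this kind can be sketched as a scan for clusters of short motifs. The version below looks for at least three YCAY tetramers (a relaxation of the UCAU repeat to include UCAY and CCAY; treating the first position as a pyrimidine is an assumption here) within a short window. The three-motif, 30-nucleotide criterion and the function name are illustrative assumptions; searching whole genomes would call for indexed search tools rather than a plain regular-expression scan.

```python
import re

# Zero-width lookahead so that overlapping YCAY occurrences are all reported.
YCAY = re.compile(r"(?=[CT]CA[CT])")


def ycay_clusters(seq, min_count=3, window=30):
    """Return (start, end) half-open spans where `min_count` consecutive YCAY
    motifs fit within `window` nucleotides."""
    seq = seq.upper().replace("U", "T")
    starts = [m.start() for m in YCAY.finditer(seq)]
    clusters = []
    for i in range(len(starts) - min_count + 1):
        first, last = starts[i], starts[i + min_count - 1]
        if last + 4 - first <= window:  # each motif is 4 nt long
            clusters.append((first, last + 4))
    return clusters


print(ycay_clusters("UCAUUCAUUCAU"))  # [(0, 12)]
```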

Beyond the genome

Many powerful new tools are emerging that use genomic information and large-scale analyses: investigators can now compare and contrast vast genomic, mRNA, and EST data sets, use computational analyses to predict regulatory targets and elements, and develop array-based expression profiles. A challenge for the new millennium will be to integrate these tools with traditional biochemical and molecular biology approaches to understand how complex processes such as alternative splicing are regulated in specific cellular contexts, such as different cell types or cells at different developmental stages, in response to changing environmental cues, and in human disease. A consolidation of experimental and computational approaches will be required to catalog alternatively spliced genes, to characterize auxiliary cis elements and the trans factors that mediate the use of alternative splice sites, to identify genes with common alternative splicing programs, and to develop profiles of how alternative splicing is regulated in different cellular contexts.

Pioneering studies of this sort have begun in other areas of RNA processing. Darnell and colleagues [32] recently performed SELEX to identify binding motifs for the fragile X mental retardation protein (FMRP), an RNA-binding protein associated with fragile X syndrome that is thought to be involved in regulating mRNA translation, and used the consensus of these motifs to screen 245,000 sets of mammalian genomic sequences to identify natural targets of FMRP. RNAs from six genes bound FMRP and were identified as putative targets regulated by FMRP. In a parallel study, Brown et al. [33] identified a subset of FMRP-associated RNAs by combining microarray analysis of FMRP complexes in the murine brain and polyribosome profiling of cells derived from patients with fragile X syndrome, who harbor a 5' untranslated trinucleotide repeat expansion that leads to transcriptional silencing of the gene encoding FMRP. A total of 14 of the RNAs in this subset were in turn searched for the SELEX-derived FMRP-binding sites, and 7 of 11 putative elements in these RNAs bound with high affinity to FMRP, demonstrating how useful it is to combine these approaches to identify in vivo targets [32]. Similar integrated studies would be valuable for elucidating the elements through which regulatory splicing factors act and the in vivo targets containing these elements. Extending these analyses to model genetic organisms, such as Mus musculus or Drosophila melanogaster, whose genomes have been sequenced, will be especially valuable for defining regulatory networks that coordinate cell-type-specific alternative splicing, allowing us to see the 'big picture' of the transcriptome during development and in models of disease.