Gene organization and evolutionary history

The discovery of SR proteins goes back to studies in Drosophila, where genetic screens identified SWAP (suppressor-of-white-apricot) [1], Tra (transformer) [2] and Tra-2 (transformer-2) [3] as splicing factors. Their sequence characterization led to the identification of a protein domain rich in arginine and serine dipeptides, termed the arginine/serine (RS) domain. The subsequent identification of the splicing factors SF2/ASF and SC35 from human cell lines also revealed extended RS domains in addition to at least one RNA-binding domain of the RNA recognition motif (RRM) type [4-6]. As additional RS-domain-containing proteins were identified, the SR protein family was defined by three criteria: the presence of a phosphoepitope recognized by the monoclonal antibody mAb104 [7], conservation across vertebrates and invertebrates, and activity in splicing complementation assays [8]. In humans, the SR protein family is encoded by nine genes, designated splicing factor, arginine/serine-rich (SFRS) 1-7, 9, and 11 (Table 1). All nine human SR proteins (SF2/ASF, SC35, SRp20, SRp40, SRp55, SRp75, SRp30c, 9G8, and SRp54) share a common structural organization (Figure 1): one or two amino-terminal RNA-binding domains that provide RNA-binding specificity, and a carboxy-terminal RS domain of variable length that functions as a protein-interaction domain [9].

Figure 1

The human SR protein family. The structural organization of the nine human SR proteins is shown. RRM, RNA recognition motif; RRMH, RRM homology; RS, arginine/serine-rich domain; Zn, Zinc knuckle.

Table 1 Human genes encoding SR proteins

More recent genome-wide studies have identified several other RS-domain-containing proteins, most of which are conserved in higher eukaryotes and function in pre-mRNA splicing or RNA metabolism [10]. Because of differences in domain structure, lack of mAb104 recognition, or lack of a prototypical RRM, these proteins are referred to as SR-like or SR-related proteins. An extensive list of SR-related proteins and their functional roles in RNA metabolism was recently discussed [11].

While introns are common to all eukaryotes, the complexity of alternative splicing varies among species. SR proteins exist in all metazoan species [8] as well as in some lower eukaryotes, such as the fission yeast Schizosaccharomyces pombe [12, 13]. However, classical SR proteins are not present in all eukaryotes: they are apparently missing from the budding yeast Saccharomyces cerevisiae, which lacks alternative splicing. Instead, three SR-like proteins have been identified in S. cerevisiae, one of which has been shown to modulate the efficiency of pre-mRNA splicing [14]. In general, the presence of SR proteins in a given species correlates with the presence of RS domains in other components of the general splicing machinery. The observation that the density of RS repeats correlates with the conservation of the branch-point signal, a critical sequence element of the 3' splice site, argues for an ancestral origin of SR proteins [15]. SR proteins therefore appear to be ancestral to eukaryotes and to have been lost independently in some lineages (Figure 2). Phylogenetic analyses further suggest that successive gene duplications played an important role in SR protein evolution [16]. These duplication events were accompanied by elevated rates of nonsynonymous substitution, consistent with positive selection favoring the gain of new functions. This supports the hypothesis that the expansion of RS repeats during evolution played a fundamental part in relaxing splicing signals and in the evolution of regulated splicing.
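
The link between elevated nonsynonymous substitution rates and positive selection is conventionally expressed through the ratio of nonsynonymous to synonymous substitution rates. The standard interpretation is recalled below as general background; it is not necessarily the exact test applied in [16]. Here d_N is the number of nonsynonymous substitutions per nonsynonymous site and d_S the number of synonymous substitutions per synonymous site.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying selection} \\
\omega \approx 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```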

Figure 2

Evolutionary relationship between members of the SR family. The phylogeny was inferred using the neighbor-joining method. ClustalW was used to align sequences and perform phylogenetic analysis. Trees were drawn by CTree. The horizontal lines in each panel indicate the similarity between SR proteins. (a) Phylogenetic tree based on the alignment of the human (Hs) SR protein family. The numbers above each bar indicate the degree of similarity. (b) Phylogenetic tree based on the alignment of Homo sapiens (Hs), Drosophila melanogaster (Dm), Caenorhabditis elegans (Ce), Arabidopsis thaliana (At), and Schizosaccharomyces pombe (Sp) SR protein sequences. Green and blue lines indicate different clusters. Cluster set selection is based on minimizing the subtype diversity ratio, a measure that groups related subclasses.
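
The trees in Figure 2 were inferred by neighbor joining. As a rough illustration of how that method works, here is a minimal textbook neighbor-joining sketch. It is not ClustalW's implementation: the function name `neighbor_joining` and the toy five-taxon distance matrix are hypothetical, and in practice the pairwise distances would be derived from a multiple sequence alignment of the SR protein sequences.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Build an unrooted tree by neighbor joining and return it as a Newick string.

    labels: list of taxon names; dist: symmetric distance matrix (list of lists).
    """
    # Dict-of-dicts copy so new internal nodes can be added and joined nodes retired.
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r(i): sum of distances from i to every active node.
        r = {i: sum(d[i][k] for k in nodes) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j);
        # ties are broken by input order (min keeps the first minimum).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        # Distance from the new node u to each remaining node k.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                dk = 0.5 * (d[i][k] + d[j][k] - d[i][j])
                d[u][k] = dk
                d[k][u] = dk
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes; the full edge length is placed on the first.
    a, b = nodes
    return f"({a}:{d[a][b]:g},{b}:0);"


if __name__ == "__main__":
    # Hypothetical toy distances between five taxa.
    taxa = ["a", "b", "c", "d", "e"]
    D = [[0, 5, 9, 9, 8],
         [5, 0, 10, 10, 9],
         [9, 10, 0, 8, 7],
         [9, 10, 8, 0, 3],
         [8, 9, 7, 3, 0]]
    print(neighbor_joining(taxa, D))
```

Because ties in Q are resolved by input order, other tools may return a different but equally valid topology when several pairs tie.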

Characteristic structural features

All SR proteins share two main structural features: the RS domain and at least one RRM (Figure 1). In most SR proteins with two RNA-binding domains, the second is a poor match to the RRM consensus and is referred to as an RRM homolog (RRMH). The only exception is 9G8, which contains an RRM and a zinc-knuckle domain that is thought to contact the RNA [17]. Where it has been determined, the RNA-binding specificity of each SR protein is distinct but degenerate [18, 19]. The RS domains of SR proteins mediate interactions with a number of other RS-domain-containing splicing factors [20, 21]. These include other SR proteins, SR-related proteins [22], and components of the general splicing machinery [20, 21, 23-25]. Furthermore, the RS domain can function as a nuclear localization signal by mediating the interaction with the SR protein nuclear import receptor, transportin-SR [26-28].

A structure of a complete SR protein has not yet been determined. To date, only isolated RRMs of SR proteins have been analyzed structurally, by nuclear magnetic resonance spectroscopy, and no structural information on the RS domain is available. This may be explained by the poor solubility of these proteins in their free state and by the unknown phosphorylation state of the serines within the RS domain. In addition, the degenerate RNA sequences recognized by SR proteins may have hindered their study in the RNA-bound form. To overcome the solubility problem, the RRMs of SRp20 and 9G8 were either fused to the GB1 solubility tag (the immunoglobulin G-binding domain 1 of Streptococcal protein G) [29], or overexpressed and suspended in a solution containing charged amino acids [30]. These manipulations made it possible to obtain solution structures of the free 9G8 and SRp20 RRMs and of the SRp20 RRM in complex with the RNA sequence 5'-CAUC-3' (Figure 3). The unbound RRMs of SRp20 and 9G8 display an unusually large exposed hydrophobic surface, which could explain why the solubility of SR proteins is so low. In the SRp20 RRM-RNA complex, all four nucleotides are contacted by the RRM, but only the 5' cytosine is recognized in a specific manner. These structural insights provide an explanation for the seemingly low RNA-binding specificity of SRp20 [31, 32].
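
To see why specific recognition of a single nucleotide translates into low sequence specificity, consider an idealized calculation. It assumes a random RNA with equal base frequencies and counts a four-nucleotide window as a site whenever its specifically recognized positions match (N denotes any nucleotide); this is a simplification, since real binding affinities are graded. A 1,000-nt stretch contains L - 3 = 997 such windows.

```latex
\begin{aligned}
P(\mathrm{CAUC}) &= \left(\tfrac{1}{4}\right)^{4} = \tfrac{1}{256}, &
P(\mathrm{CNNN}) &= \tfrac{1}{4} \cdot 1^{3} = \tfrac{1}{4}, \\
E_{L=1000}(\mathrm{CAUC}) &= \tfrac{997}{256} \approx 3.9, &
E_{L=1000}(\mathrm{CNNN}) &= \tfrac{997}{4} \approx 249.
\end{aligned}
```

Under these assumptions, recognizing only the 5' cytosine yields roughly 64 times more candidate sites than recognizing all four nucleotides.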

Figure 3

Solution structure of an SR protein RRM from human SRp20 (blue) in complex with the RNA sequence 5'-CAUC-3' (red). All four nucleotides present are contacted by the RRM, but only the 5' cytosine is recognized specifically. The structure was generated using the Visual Molecular Dynamics program [78] from coordinates deposited in the Brookhaven National Laboratory Protein Data Bank [30].

Localization and function

Many proteins involved in pre-mRNA splicing, including the SR proteins, are enriched in nuclear compartments termed speckles, which occur throughout the nucleus. Speckles comprise two distinct structural elements [33]: interchromatin granule clusters (IGCs), made up of granules about 20-25 nm in diameter, which are storage/reassembly sites for pre-mRNA splicing factors; and perichromatin fibrils, approximately 5 nm in diameter, which are sites of actively transcribing genes and co-transcriptional splicing [34]. The SR proteins are a prominent component of nuclear speckles (Figure 4) [35, 36], and biochemical analyses have indicated that RS domains are responsible for targeting the SR proteins to these structures [26, 37]. The intranuclear organization of SR proteins is dynamic: they are recruited from the IGC storage clusters to the sites of co-transcriptional splicing, the perichromatin fibrils [38, 39]. Interestingly, both the RNA-binding domains and the RS domains are required for recruitment of SR proteins from the IGCs to the perichromatin fibrils, as is phosphorylation of the RS domain [40].

Figure 4

Localization of SR proteins within the nucleus. Left panel: HeLa cells transfected with GFP-SRp20. The GFP fluorescence is visualized directly. Middle panel: cells are also stained with anti-SC35 hybridoma supernatant to highlight clusters of SR proteins in the nucleus (red), which are referred to as nuclear speckles. Speckles are believed to be storage compartments for SR proteins and other splicing factors. Right panel: merge of GFP-SRp20 and SC35 images. The bar in each panel indicates the scale. Images courtesy of Lin Li and Rozanne Sandri-Goldin.

Splicing activation

In classic cases of alternative splicing, cis-acting RNA sequence elements known as splicing enhancers have been shown to increase exon inclusion by serving as sites for recruitment of the splicing machinery, the spliceosome. The spliceosome is a complex of ribonucleoprotein splicing factors, such as the U1 and U2 small nuclear ribonucleoproteins (snRNPs), and their associated proteins, such as U2 auxiliary factor (U2AF); it joins exons together and releases the intron RNA. Splicing enhancers are usually located within the regulated exon and are therefore known as exonic splicing enhancers (ESEs) [41, 42]. ESEs are usually recognized by at least one member of the SR protein family, which in turn recruits the splicing machinery to the adjacent intron [9, 41, 42]. SR proteins act at several steps of the splicing reaction [4, 5, 8, 43-45] and require phosphorylation for efficient splice-site recognition and dephosphorylation for splicing catalysis [46, 47]. A number of kinases that specifically phosphorylate serine residues within the RS domain of SR proteins have been identified. These include SR protein kinase 1 (SRPK1) [48], Clk/Sty kinase [49], cdc2p34 [50], and DNA topoisomerase I [51]. Surprisingly, binding sites for SR proteins are not limited to alternatively spliced exons but have also been verified in exons of constitutively spliced pre-mRNAs [52, 53]. It is therefore likely that SR proteins bind to sequences found in most, if not all, exons.
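
Because SR-protein binding sites are degenerate, candidate ESEs are often written as motifs with IUPAC ambiguity codes (R = A or G, Y = C or U, N = any nucleotide, and so on). The sketch below scans an exon sequence for such a motif and reports overlapping matches. The function `find_sites` and the purine-rich example motif `GARGAR` are hypothetical illustrations, not validated binding sites for any particular SR protein.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def find_sites(seq, motif):
    """Return 0-based start positions of (possibly overlapping) motif matches."""
    pattern = "".join(IUPAC[c] for c in motif.upper())
    # Accept DNA input by converting T to U before scanning.
    rna = seq.upper().replace("T", "U")
    # A zero-width lookahead reports every start position, including overlapping hits.
    return [m.start() for m in re.finditer(f"(?={pattern})", rna)]


if __name__ == "__main__":
    # Hypothetical exon fragment; the two GARGAR hits overlap.
    print(find_sites("CUGGAAGAAGAACU", "GARGAR"))
```

Reporting overlapping matches matters here because degenerate motifs frequently tile across each other within purine-rich stretches.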

One model for the mechanism of splicing activation proposes that the RS domain of an enhancer-bound SR protein interacts directly with other RS-domain-containing splicing factors, thus facilitating the recruitment of spliceosomal components such as U1 snRNP to the 5' splice site or U2AF65 (the large subunit of the splicing factor U2AF) to the 3' splice site [9]. An alternative mode of spliceosomal recruitment was suggested by experiments showing that the RS domains of SR proteins contact the pre-mRNA within the functional spliceosome [54, 55]. Whatever the mode of RS-domain activation, SR proteins facilitate the recruitment of spliceosomal components to the regulated splice site [42, 56] (Figure 5a). Thus, SR proteins bound to ESEs function as general activators of exon definition [57]. Kinetic analyses have shown that the relative activity of ESE-bound SR proteins determines the magnitude of splicing activation. This activity depended on the number of SR proteins assembled on an ESE and on the distance between the ESE and the intron. Activation of splicing was also proportional to the number of serine-arginine repeats within the RS domain of the bound SR protein, so the number of these repeats appears to dictate the activation potential of an SR protein [58].
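
Since activation potential scales with the number of serine-arginine repeats, it is useful to be explicit about how such repeats might be counted. The hypothetical helpers below count RS or SR dipeptides in a protein sequence without reusing any residue, and report the fraction of residues they cover. This counting rule is an assumption for illustration; the definition used in [58] may differ, and the example sequences are toy strings rather than real SR protein sequences.

```python
def count_rs_dipeptides(protein):
    """Count arginine-serine dipeptides (read as RS or SR) without reusing residues."""
    seq = protein.upper()
    count, i = 0, 0
    while i < len(seq) - 1:
        if seq[i:i + 2] in ("RS", "SR"):
            count += 1
            i += 2  # consume both residues so each one is counted at most once
        else:
            i += 1
    return count


def rs_density(protein):
    """Fraction of residues that fall inside the counted RS/SR dipeptides."""
    return 2 * count_rs_dipeptides(protein) / len(protein) if protein else 0.0


if __name__ == "__main__":
    # Toy RS-rich segment: three RS dipeptides followed by two glycines.
    print(count_rs_dipeptides("RSRSRSGG"), rs_density("RSRSRSGG"))
```

A density measure of this kind is also one way to express the "density of RS repeats" used in comparative analyses of SR protein evolution.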

Figure 5

Splicing functions of SR proteins. (a) SR proteins (green) bound to an exonic splicing enhancer (ESE) may function in constitutive splicing by interacting with the splicing factors U2AF bound at the upstream 3' splice site and U1 snRNP bound to the downstream 5' splice site. Py represents the polypyrimidine tract, the binding site for U2AF. (b) Exon-independent functions of SR proteins. SR proteins may have two exon-independent functions. SR proteins facilitate splice-site pairing by simultaneously interacting with U1 snRNP and U2AF across the intron. SR proteins also assist in recruiting the U4/U6•U5 tri-snRNP. (c) Splicing repression is mediated when SR proteins associate with intronic sequences close to the splice sites. Recruitment of spliceosomal components is inhibited through steric hindrance or nonproductive spliceosomal assembly. Adapted with permission from [79].

In addition to their exon-dependent functions, SR proteins have activities that do not require interaction with exon sequences [59]. These exon-independent functions may promote the pairing of 5' and 3' splice sites across the intron or facilitate the incorporation of the U4/U6•U5 tri-snRNP into the spliceosome [44] (Figure 5b). The U4/U6•U5 tri-snRNP is a preassembled complex of three snRNPs whose incorporation is required for the spliceosome to become catalytically active. Although the RRM of the SR protein is essential for its exon-independent activity [59], it is likely that SR proteins interact with the partially assembled spliceosome or the tri-snRNP through RS-domain contacts.

Splicing repression

One striking feature of SR proteins is where they bind within the pre-mRNA. In nearly all cases, SR proteins have been found to interact with exonic sequences. This is surprising, because their relatively promiscuous binding specificity predicts that introns are littered with potential SR-protein-binding sites. The preferential binding of SR proteins to exons therefore suggests that additional requirements must be met for functional SR protein binding to the pre-mRNA. There are, however, some instances of SR proteins binding within introns, where they function as negative regulators of splicing. The best-characterized example occurs during adenovirus infection [60]. In this case, splicing is repressed by the binding of the SR protein SF2/ASF to an intronic repressor element located upstream of the branchpoint sequence of the 3' splice site in the adenovirus pre-mRNA. When bound to the repressor element, SF2/ASF prevents the recruitment of U2 snRNP to the branchpoint sequence, thereby inactivating the 3' splice site (Figure 5c). Other studies have provided further support for the idea that intron-bound SR proteins generally interfere with the productive assembly of spliceosomes [61]. These observations show that exonic splicing enhancers not only function in exon and splice-site recognition, but also act as barriers to prevent exon skipping.
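
The position dependence described above can be caricatured as a simple rule: binding inside an exon is read as enhancing, whereas binding in an intron close to a splice site is read as repressive. The sketch below is a deliberately simplified, hypothetical classifier; the `Exon` type, the `predicted_effect` function and the 50-nt proximity window are illustrative assumptions rather than parameters from the cited studies, and real outcomes depend on sequence context and the proteins involved.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    start: int  # 0-based, inclusive (first exonic nucleotide, next to the 3' splice site)
    end: int    # 0-based, exclusive (last exonic nucleotide is end - 1, next to the 5' splice site)


def predicted_effect(site, exons, window=50):
    """Classify an SR-protein binding position by its location (toy rule)."""
    for ex in exons:
        if ex.start <= site < ex.end:
            return "enhancer"  # exonic binding: promotes exon definition
    for ex in exons:
        # Distance to the nearest exon boundary, i.e. the nearest splice site.
        if min(abs(site - ex.start), abs(site - (ex.end - 1))) <= window:
            return "repressor"  # intronic binding near a splice site
    return "no predicted effect"


if __name__ == "__main__":
    # Hypothetical two-exon pre-mRNA.
    exons = [Exon(100, 200), Exon(400, 500)]
    for pos in (150, 80, 300):
        print(pos, predicted_effect(pos, exons))
```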

Role of SR proteins in mRNA export

Some SR proteins (SF2/ASF, SRp20, and 9G8) shuttle continuously between the nucleus and the cytoplasm [62]. The movement of these proteins requires the phosphorylation of specific residues in the RS domain and the RNA-binding domain. These unusual intracellular transport properties suggest that a subset of SR proteins functions not only in pre-mRNA processing but also in mRNA export [62]. In fact, the SR proteins 9G8 and SRp20 promote nuclear export of the intronless histone H2A mRNA in mammalian cells and Xenopus oocytes [63] by binding to a 22-nucleotide sequence within the H2A mRNA (Figure 6a). In addition, the S. cerevisiae protein Npl3p, which is closely related to the SR proteins, assists in mRNA export in yeast [64]. Here too, phosphorylation of specific serine residues within the RS domain seems to control the efficiency of the mRNA-export function of Npl3p [65]. Given that SR proteins are essential for splicing [9], remain associated with the spliced mRNA after intron removal [66, 67], and shuttle between the nucleus and the cytoplasm [62], it seems highly likely that they also play an important part in the export of spliced mRNAs. Indeed, 9G8 and SRp20 were recently shown to mediate the efficient handover of mRNA to Tip-associated protein (TAP), an essential nuclear export factor [68].

Figure 6

SR protein functions other than splicing. (a) mRNA export. SR proteins associate site-specifically with intronless mRNAs, such as histone H2A mRNA [63], to promote their export (left-hand side). The export machinery is as yet unknown. For intron-containing pre-mRNAs (right-hand side), SR protein association with the spliced mRNA has also been suggested to mediate nuclear export through interactions with the RNA export factor ALY/REF and Tip-associated protein (TAP). (b) Translation initiation. Interactions between mRNA-bound SF2/ASF and the protein kinase mTOR trigger phosphorylation of 4E-BP (eIF4E-binding protein). In its phosphorylated form 4E-BP dissociates from the translation initiation factor eIF4E, thereby releasing eIF4E and activating initiation of cap-dependent translation (green arrow) [71].

SR protein involvement in translation

SR proteins have been shown to influence translation both indirectly and directly. For example, the splicing activity of SF2/ASF influences alternative splicing of the pre-mRNA for MNK2, a protein kinase that regulates translation initiation. High levels of SF2/ASF promote the production of an MNK2 mRNA isoform that enhances cap-dependent translation, whereas low levels have the opposite effect [69]. SF2/ASF is also involved in regulating translation directly. It has been shown to associate with polyribosome fractions isolated from cytoplasmic extracts and to enhance the translation efficiency of an ESE-containing luciferase reporter [70], apparently by mediating the recruitment of components of the mTOR (mammalian target of rapamycin) signaling pathway (Figure 6b). This recruitment leads to phosphorylation of 4E-BP, a competitive inhibitor of cap-dependent translation, which then releases the translation initiation factor eIF4E [71].

Importantly, other SR proteins have also been reported to function in translation. SRp20 promotes translation of a viral RNA initiated at an internal ribosome entry site [72], and 9G8 increases translation efficiency of unspliced mRNA containing a constitutive transport element [73].

Frontiers

The functional characterization of SR proteins has revealed a wealth of information, implicating them in the regulation of constitutive and alternative pre-mRNA splicing, in the efficient transport of mRNAs, and in the modulation of mRNA translation. As such, SR proteins could easily be mistaken for 'Jacks of all trades, masters of none' in mRNA metabolism. However, many studies have demonstrated that they are essential in the cell, despite occasional redundancy. Given the enormous functional real estate this family of proteins covers, one is now pressed to find out how these proteins switch between their roles in the various steps of mRNA processing. Reversible modification, such as serine phosphorylation within the RS domain, is a likely ticket to this functional flexibility [51]. The challenge will be to determine the extent and dynamics of such modifications in SR proteins engaged in each of these activities, and whether changes in modification support the existence of an SR protein-modification code, perhaps similar in principle to the now well-described histone-modification code [74].

An old foe presents another challenge: SR protein structure. For more than 15 years, attempts have been made to obtain high-resolution structures of SR proteins. So far, these attempts have failed because of low solubility and the likely heterogeneity of RS-domain modifications. Clever solubility-enhancing approaches have now yielded high-resolution structures of the SR protein RRM, a significant first step. However, the much more elusive RS domain is still the big prize, and will require further creative approaches to freeze this seemingly unstructured domain in a conformation that permits its structural elucidation.

A different and experimentally challenging puzzle is how the relatively low RNA-binding specificity of SR proteins can be reconciled with their usually specific functional impact. Given that SR proteins generally associate with exon sequences, it is likely that their interaction with the RNA is often aided by other factors. This suggestion is supported by the observation that at least 75% of the nucleotides in a typical human exon are part of sequence motifs that have been found to influence splicing, presumably through the binding of splicing activators, such as SR proteins, or of splicing repressors, such as heterogeneous nuclear RNPs [75]. For example, SR proteins might bind stably to pre-mRNA only when flanked by spliceosomal components such as U2 snRNP auxiliary factor or U1 snRNP, thus establishing a network of protein-protein and protein-RNA interactions. Such a network would permit the stable association of SR proteins with many different target sequences, enabling them to recognize the thousands of different exons present in higher eukaryotes [76]. The relatively low RNA-binding specificity may therefore have evolved to keep SR proteins suited to participating effectively in multiple RNA-processing events.
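
A figure such as "at least 75% of the nucleotides in a typical exon" is a per-nucleotide coverage: the fraction of exon positions that fall inside at least one occurrence of a splicing-regulatory motif. The sketch below computes such a coverage for a given motif list using exact matching. The function `motif_coverage`, the example exon and the motifs are hypothetical; the analysis in [75] used its own motif sets and methods, which this does not reproduce.

```python
def motif_coverage(exon, motifs):
    """Fraction of exon positions covered by at least one exact motif occurrence."""
    seq = exon.upper()
    covered = [False] * len(seq)
    for motif in motifs:
        m = motif.upper()
        start = seq.find(m)
        while start != -1:
            # Mark every nucleotide spanned by this occurrence.
            for k in range(start, start + len(m)):
                covered[k] = True
            # Advance by one position so overlapping occurrences are also found.
            start = seq.find(m, start + 1)
    return sum(covered) / len(seq) if seq else 0.0


if __name__ == "__main__":
    # Hypothetical 10-nt exon and motif list: 8 of 10 positions are covered.
    print(motif_coverage("GAAGAAUUCC", ["GAAGAA", "UU"]))
```

Counting each nucleotide once, however many motifs overlap it, keeps the result a true fraction between 0 and 1.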

Clearly, SR proteins make up a family of regulators with important functions in RNA metabolism. This is underscored by the frequent association of changes in SR protein function or abundance with human disease. For example, SF2/ASF has been described as a proto-oncogene [69], and the misregulation of alternative splicing has been associated with several types of cancer [77]. Given how widespread the involvement of SR proteins in gene expression has proved to be, it would not be surprising if they emerged as critical players in other important biological processes.