Introduction

Inteins were identified 20 years ago, when two groups reported an in-frame insertion in the VMA1 gene, which encodes a vacuolar membrane H+-ATPase of the yeast Saccharomyces cerevisiae (Hirata et al. 1990; Kane et al. 1990). The nucleotide sequence of the VMA1 gene predicts a polypeptide of 1,071 amino acids with a calculated molecular mass of 118 kDa, but the size of the VMA1 protein, as estimated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), is only 67 kDa. Furthermore, the N- and C-terminal regions of the deduced sequence were shown to be very similar to the catalytic subunits of vacuolar membrane H+-ATPases of other organisms, while an internal region of 454 amino acid residues displayed no detectable sequence similarity to any known ATPase subunits. Instead, the internal sequence exhibits similarity to an S. cerevisiae endonuclease encoded by the HO gene. The in-frame insertion was found to be present in the mRNA, translated with the Vma1 protein, and excised posttranslationally (Kane et al. 1990). By analogy to pre-mRNA introns and exons, the segments are called intein for internal protein sequence, and extein for external protein sequence, with upstream exteins termed N-exteins and downstream exteins called C-exteins. The post-translational process that excises the internal region from the precursor protein, with subsequent ligation of the N- and C-exteins, is termed protein splicing (Perler et al. 1994). The products of the protein splicing process are two stable proteins, the mature protein and the intein (Fig. 1). According to accepted nomenclature, intein names include a genus and species designation, abbreviated with three letters, and a host gene designation. For example, the S. cerevisiae VMA1 intein is called Sce VMA1. Multiple inteins from one protein are numbered with Arabic numerals (Perler 2002). Large-scale genome sequencing approaches have identified inteins in all three domains of life, as well as in phages and viruses.
By the end of 2009, the intein registry InBase at http://www.neb.com/neb/inteins.html (Perler 2002) listed more than 450 inteins in the genomes of Eubacteria, Archaea, and Eukarya. In prokaryotes, intein sequences often reside within proteins involved in DNA replication, repair, or transcription, such as DNA and RNA polymerases, RecA, helicases, or gyrases, and in the cell division control protein CDC21. Others are located in metabolic enzymes including ribonucleoside triphosphate reductase, and UDP-glucose dehydrogenase (Perler 2002; Starokadomskyy 2007). Eukaryotic inteins are encoded in the nuclear genes of fungi, and in the nuclear or plastid genes of some unicellular algae. In fungi, intein sequences are found in homologs of the S. cerevisiae VMA1 gene or in the prp8 genes, but they are also found in genes encoding glutamate synthases, chitin synthases, threonyl-tRNA synthetases, and subunits of DNA-directed RNA polymerases (Elleuche and Pöggeler 2009; Poulter et al. 2007). In green and cryptophyte algae, inteins reside within the chloroplast ClpP protease, the RNA polymerase beta subunit, the DnaB helicase and the nuclear RNA polymerase II (Douglas and Penny 1999; Luo and Hall 2007; Turmel et al. 2008; Wang and Liu 1997).
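The size discrepancy that led to the discovery of the VMA1 intein can be checked with a back-of-the-envelope calculation. The sketch below uses the residue counts given above and assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value taken from the cited studies.

```python
# Rough mass estimate for the VMA1 precursor and its splicing products.
# AVG_RESIDUE_DA is an assumed rule-of-thumb value, not a figure from the cited studies.
AVG_RESIDUE_DA = 110.0


def approx_mass_kda(n_residues: int) -> float:
    """Approximate polypeptide mass in kDa from its residue count."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


precursor_len = 1071                      # residues predicted from the VMA1 nucleotide sequence
intein_len = 454                          # internal region with no similarity to ATPase subunits
mature_len = precursor_len - intein_len   # residues remaining after excision of the intein

for label, n in [
    ("precursor", precursor_len),
    ("intein", intein_len),
    ("mature protein", mature_len),
]:
    print(f"{label}: {n} aa, ~{approx_mass_kda(n):.0f} kDa")
```

Under this assumption the precursor comes out near 118 kDa and the mature protein near 68 kDa, which is consistent with the calculated 118 kDa and the 67 kDa band observed on gels.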

Fig. 1

Protein splicing. The intein coding sequence is transcribed into mRNA and translated to a nonfunctional protein precursor, which then undergoes a self-catalyzed rearrangement in which the intein is excised and the exteins are joined to yield the mature protein

Most genes encode only one intein, and inteins found at the same insertion site in homologous extein genes are considered intein alleles (Perler et al. 1997). In rare cases, genes encode more than one intein, such as the ribonucleotide reductase gene of the oceanic N2-fixing cyanobacterium Trichodesmium erythraeum, which encodes four inteins (Liu et al. 2003).
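The intein naming convention can be illustrated with a short helper. This is a minimal sketch under two assumptions that the text does not state explicitly: the three-letter prefix is built from the first letter of the genus and the first two letters of the species designation (this matches Sce VMA1, Ssp DnaB, Mxe GyrA, Mtu RecA, and Pch PRP8), and the Arabic numeral for multiple inteins in one protein is appended with a hyphen. The function name `intein_name` and the placeholder gene label `GENE` are hypothetical.

```python
from typing import Optional


def intein_name(genus: str, species: str, host_gene: str,
                index: Optional[int] = None) -> str:
    """Build an intein name such as 'Sce VMA1'.

    Assumed rule: first letter of the genus plus the first two letters of the
    species designation, followed by the host gene. The hyphenated numeral for
    multiple inteins in one protein is an assumed formatting choice.
    """
    prefix = genus[0].upper() + species[:2].lower()
    name = f"{prefix} {host_gene}"
    if index is not None:
        name += f"-{index}"
    return name


print(intein_name("Saccharomyces", "cerevisiae", "VMA1"))
print(intein_name("Synechocystis", "sp.", "DnaB"))
# Hypothetical host gene label, used only to show the numbering of multiple inteins.
print(intein_name("Trichodesmium", "erythraeum", "GENE", index=2))
```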

Structure of mini-inteins and large inteins

Inteins are classified into two groups, large and minimal (mini) (Liu 2000). Large inteins contain a homing endonuclease domain that is absent in mini-inteins. Homing endonucleases are site-specific, double-strand DNA endonucleases that promote the lateral transfer between genomes of their own coding region with flanking sequences, in a recombination-dependent process known as “homing.” Usually, homing endonucleases are encoded by an open reading frame within an intron or intein (Belfort et al. 2005; Chevalier and Stoddard 2001). Large inteins are bi-functional proteins, with a protein splicing domain and a central endonuclease domain. Splicing-efficient mini-inteins have been engineered from large inteins by deleting the central endonuclease domain, demonstrating that the endonuclease domain is not involved in protein splicing (Chong and Xu 1997; Derbyshire et al. 1997; Shingledecker et al. 1998). The splicing domain is split by the endonuclease domain into N- and C-terminal subdomains, which contain conserved blocks of amino acids, with blocks A, N2, B, and N4 in the N-terminal subdomain, and blocks G and F in the C-terminal subdomain (Perler et al. 1997; Pietrokovski 1994, 1998). These blocks can also be identified in mini-inteins (Fig. 2). The three-dimensional structures of naturally occurring mini-inteins and engineered mini-inteins reveal that the N- and C-terminal splicing subdomains form a common horseshoe-like 12-β-strand scaffold termed the Hedgehog/Intein (HINT) module (Ding et al. 2003; Hall et al. 1997; Klabunde et al. 1998; Koonin 1995; Perler 1998; Sun et al. 2005; Van Roey et al. 2007).

Fig. 2

Structure of large and mini-inteins. Conserved elements in a large intein and mini-intein are indicated. The white and grey areas A, N2, B, N4, C, D, E, H, F, and G are conserved intein motifs identified by Pietrokovski (1994, 1998) and Perler et al. (1994). The exteins are illustrated in black and the intein sequence in blue. The site of insertion of the homing endonuclease in large inteins is indicated by the dark vertical line. Conserved amino acid residues of the intein and the C-extein are indicated below

All known inteins share a low degree of sequence similarity, with conserved residues only at the N- and C-termini. Most inteins begin with Ser or Cys and end in His-Asn, or in His-Gln. The first amino acid of the C-extein is an invariant Ser, Thr, or Cys, but the residue preceding the intein at the N-extein is not conserved (Perler 2002). However, residues proximal to the intein-splicing junction at both the N- and C-terminal exteins were recently found to accelerate or attenuate protein splicing (Amitai et al. 2009).
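The conserved junction residues can be expressed as a simple check on a precursor sequence. The sketch below is illustrative only: the class `Precursor`, the function names, and the toy sequences are hypothetical, and the check covers only the common pattern described above (intein starting with Ser or Cys, ending in His-Asn or His-Gln, and C-extein starting with Ser, Thr, or Cys). Inteins that deviate from this pattern are rejected by this toy check.

```python
from dataclasses import dataclass
from typing import Tuple

INTEIN_FIRST = {"S", "C"}        # Ser or Cys at the first intein position
INTEIN_LAST_TWO = {"HN", "HQ"}   # His-Asn or His-Gln at the intein C-terminus
C_EXTEIN_FIRST = {"S", "T", "C"}  # Ser, Thr, or Cys at the first C-extein position


@dataclass
class Precursor:
    """Unspliced precursor, given as one-letter amino acid sequences."""
    n_extein: str
    intein: str
    c_extein: str


def has_common_junctions(p: Precursor) -> bool:
    """True if the splice junctions follow the common residue pattern."""
    return (
        len(p.intein) >= 3
        and p.intein[0] in INTEIN_FIRST
        and p.intein[-2:] in INTEIN_LAST_TWO
        and len(p.c_extein) >= 1
        and p.c_extein[0] in C_EXTEIN_FIRST
    )


def splice(p: Precursor) -> Tuple[str, str]:
    """Return (mature protein, excised intein) for a precursor with common junctions."""
    if not has_common_junctions(p):
        raise ValueError("splice junctions do not match the common pattern")
    return p.n_extein + p.c_extein, p.intein


# Toy sequences, invented for illustration only.
toy = Precursor(n_extein="MAG", intein="CLAEGHN", c_extein="SGK")
mature, intein = splice(toy)
print(mature, intein)
```

The N-extein residue preceding the intein is deliberately left unchecked, because it is not conserved.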

Cis- and trans-splicing mechanisms of inteins

Protein splicing is a rapid process of four nucleophilic attacks, mediated by three of the four conserved splice junction residues. In step 1, the splicing process begins with an N−O acyl shift if the first intein residue is Ser, or an N−S acyl shift if the first intein residue is Cys. This forms a (thio)ester bond at the N-extein/intein junction. In step 2, the (thio)ester bond is attacked by the OH- or SH-group of the first residue in the C-extein (Cys, Ser, or Thr). This leads to a transesterification, which transfers the N-extein to the side-chain of the first residue of the C-extein. In step 3, the cyclization of the conserved Asn residue at the C-terminus of the intein releases the intein and links the exteins by a (thio)ester bond. Finally, step 4 is a rearrangement of the (thio)ester bond to a peptide bond by a spontaneous S−N or O−N acyl shift (Fig. 3). Details of the chemical process involved in protein splicing of standard (class 1) inteins have been comprehensively described and reviewed (Gogarten et al. 2002; Liu 2000; Noren et al. 2000; Paulus 2000; Saleh and Perler 2006; Starokadomskyy 2007; Tori et al. 2010). In addition to the standard protein splicing pathway, new classes of inteins performing an alternative splicing process have been described recently. These inteins lack the N-terminal Ser or Cys residue and are classified as class 2 and class 3 inteins (Southworth et al. 2000; Tori et al. 2010). Neither can perform the acyl shift that initiates the splicing reaction in class 1 inteins. In class 2 inteins, present in archaeal KlbA proteins, the first residue of the C-extein (nucleophile Cys) directly attacks the amide bond at the N-terminal splice site junction to form a standard branched intermediate (Johnson et al. 2007; Southworth et al. 2000).
In the class 3 intein of the mycobacteriophage Bethlehem DnaB protein, a Cys residue of the conserved block F attacks the peptide bond at the N-terminal splice site junction, forming a branched intermediate with a labile thioester linkage. The N-extein is then transferred by a transesterification to the first residue of the C-extein (Thr), which results in the formation of a standard branched intermediate as in class 1 inteins (Tori et al. 2010).

Fig. 3

Splicing mechanism of inteins. Intein splicing takes place in four reaction steps. See text for details

Site-specific cleavage of the intein−extein junctions in class 1 inteins can be achieved by mutation of the conserved intein residues. Mutation of the Asn residue at the intein C-terminus abolishes steps 3 and 4 of the splicing reaction and results in N-terminal cleavage. Since step 1 still occurs, the (thio)ester bond can spontaneously hydrolyze, separating the N-extein from the intein/C-extein portion. Mutation of the conserved first residue of the intein abolishes steps 1, 2, and 4 of the splicing reaction and leads to C-terminal cleavage. In such a mutated intein, Asn cyclization (step 3) still occurs, to separate the C-extein from the N-extein/intein portion. Controllable cleavage of modified cis-splicing inteins has been adapted for a wide range of useful applications in molecular biology and biotechnology (see below).
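The relationship between junction mutations and cleavage products can be summarized as a small lookup. This is a simplified sketch of the reasoning in this section, not a kinetic model. The dictionary keys and the function name `cleavage_outcome` are hypothetical labels, and the abolished steps are taken directly from the text.

```python
# Which of the four splicing steps each mutation abolishes (as stated in the text).
ALL_STEPS = frozenset({1, 2, 3, 4})

ABOLISHED_STEPS = {
    "none": frozenset(),
    "c_terminal_asn": frozenset({3, 4}),           # no Asn cyclization, no final acyl shift
    "first_intein_residue": frozenset({1, 2, 4}),  # no initial acyl shift, no transesterification
}


def cleavage_outcome(mutation: str) -> str:
    """Predict the product class for a class 1 intein carrying the given mutation."""
    remaining = ALL_STEPS - ABOLISHED_STEPS[mutation]
    if remaining == ALL_STEPS:
        return "splicing"
    if 1 in remaining and 3 not in remaining:
        # The (thio)ester from step 1 can hydrolyze and release the N-extein.
        return "N-terminal cleavage"
    if 3 in remaining and 1 not in remaining:
        # Asn cyclization alone releases the C-extein.
        return "C-terminal cleavage"
    return "no cleavage"


for mutation in ABOLISHED_STEPS:
    print(f"{mutation}: {cleavage_outcome(mutation)}")
```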

Interestingly, inteins can also exist as two fragments encoded by two separately transcribed and translated genes. These so-called split inteins self-associate and catalyze protein-splicing activity in trans. The first native split intein capable of protein trans-splicing was identified in the cyanobacterium Synechocystis sp. strain PCC6803. The N- and C-terminal halves of the Synechocystis catalytic subunit alpha of DNA polymerase III DnaE are encoded by the dnaE-n and dnaE-c genes, which are more than 700 kb apart (Wu et al. 1998).

Split inteins have been identified in diverse cyanobacteria and archaea (Caspi et al. 2003; Choi et al. 2006; Dassa et al. 2007; Liu and Yang 2003; Wu et al. 1998; Zettler et al. 2009), but have not been found in eukaryotes thus far. Recently, a bioinformatic analysis of environmental metagenomic data revealed 26 different loci with a novel genomic arrangement. At each locus, a conserved enzyme coding region is interrupted by a split intein, with a free-standing endonuclease gene inserted between the sections coding for intein subdomains. This fractured gene organization appears to be present mainly in phages (Dassa et al. 2009).

Trans-splicing inteins can also be artificially engineered from cis-splicing bacterial and fungal inteins (Elleuche and Pöggeler 2007; Mills et al. 1998; Mootz and Muir 2002; Southworth et al. 1998). Both naturally and artificially split inteins are mostly separated between motifs B and F, resulting in N-terminal intein fragments (IN) of 70−110 amino acids, and C-terminal intein fragments (IC) of ∼40 amino acids. Inteins can also be artificially split at other sites. The Ssp DnaB mini-intein can be split at different loop regions between β-strands, yet still maintain the ability to splice in trans. An intein split into three pieces can function in protein trans-splicing, even if one piece is only an 11-amino acid IN (Sun et al. 2004). Naturally split inteins and engineered split inteins can be used in various applications. In the following sections, we will briefly summarize some of the many uses of inteins as molecular biology and biotechnology tools.

Applications of inteins in biotechnology

Inteins are valuable tools in a wide range of biotechnological applications. The ligation of peptides and proteins using the natural splicing activity of inteins is known as intein-mediated protein ligation (IPL), or expressed protein ligation (EPL), and is already well established in molecular biology and biotechnology methods (Evans et al. 1999; Muir et al. 1998; Severinov and Muir 1998). Furthermore, inteins have been used for segmental labeling of proteins for NMR analysis, cyclization of proteins, controlled expression of toxic proteins, conjugation of quantum dots to proteins, and incorporation of non-canonical amino acids (Arnold 2009; Charalambous et al. 2009; Oeemig et al. 2009; Seyedsayamdost et al. 2007; Züger and Iwai 2005). In basic research studies, they have been used to monitor in vivo protein−protein interactions, or specifically translocate proteins into cellular organelles (Chong and Xu 2005; Ozawa and Umezawa 2005; Ozawa et al. 2003, 2005). Most of the inteins used in biotechnology are derived from prokaryotic organisms, or are engineered variants of the S. cerevisiae VMA1-intein.

Intein-mediated protein purification

Isolation of large amounts of highly purified proteins is a major task in biotechnology. The development of a wide range of affinity tags has greatly simplified the separation of recombinant proteins from crude extracts. Conventional affinity tags are fused as a tag sequence at the DNA level. Originally developed to isolate proteins on affinity columns or beads, and to detect proteins by Western blot, the fusion of different tag proteins and peptides can also improve the solubility and folding of the target protein (Terpe 2003; Waugh 2005). Furthermore, N-terminal tags have the advantage of enabling the efficient translation of a recombinant protein, by providing a reliable ribosome initiation site. A drawback of this approach is that the affinity tag often needs to be cleaved off the fusion protein by proteolysis with a site-specific endoprotease. For industrial applications, the removal of the affinity tag by endoproteases is the most costly step in protein production, and can interfere with the biological activity of the purified component (Wood et al. 2005). Therefore, intein-mediated bioseparation has become an excellent vehicle for affinity-tag-based protein purification techniques, and is an alternative to conventional cleavage by site-specific endoproteases.

The potential of intein-facilitated purification for a variety of proteins has been described in dozens of reports (Bastings et al. 2008; Chong et al. 1997; Gillies et al. 2008; Liu et al. 2008; Sharma et al. 2006; Singleton et al. 2002; Srinivasa Babu et al. 2009; Zhao et al. 2008).

Intein-mediated protein purification began in 1997 as a new field in biotechnology, when the Sce VMA1 intein was engineered to be used in purification of several prokaryotic and eukaryotic proteins (Chong et al. 1997). In principle, the exteins of an engineered intein are exchanged between the purification tag and the target protein. As described above, complete splicing of the intein can be inhibited by mutation of conserved residues at the splice junction fused to the affinity tag. This results in site-specific cleavage only at the intein−target protein border.

The immature precursor protein is usually produced in a heterologous host, and protein crude extracts are loaded on an affinity column. After immobilization of the engineered protein and washing, the N- or C-terminal cleavage reaction is induced by either a strong nucleophile such as dithiothreitol (DTT) for the N-terminal cleavage, or a pH or temperature shift for the C-terminal cleavage (Chong et al. 1997; Mathys et al. 1999; Wood et al. 1999).

The intein-mediated purification with affinity chitin-binding tag (IMPACT) system is commercially available from New England Biolabs. This system uses a modified Sce VMA1 intein, fused at its C-terminus to the chitin-binding domain (CBD), and at its N-terminus to the protein of interest (Chong et al. 1997) (Fig. 4). Mutation of the C-terminal reactive Asn to Ala in the intein blocks the splicing reaction after the N−S acyl shift, and prevents C-terminal cleavage. The fusion protein accumulates as an unspliced precursor, and is purified by adsorption to a chitin resin. After the addition of thiols, which serve as reactants for the transesterification reaction, N-terminal cleavage is initiated. This leads to release of the target protein with an activated thioester at the C-terminus, while the intein-CBD remains bound to the column. A new version of the IMPACT system (IMPACT CN) allows the fusion of a self-cleavable intein tag to either the C-terminus or the N-terminus of a target protein. In contrast to the first generation IMPACT-vectors, pTWIN-vectors (New England Biolabs) contain two inteins that can be used separately for protein purification, either by fusing the protein of interest to the C-terminus of one intein, or to the N-terminus of a second intein. The inteins can also be used in combination to purify a single protein by independent regulation of the cleavage reactions of intein 1 and intein 2. This method has been demonstrated using green fluorescent protein (Zhao et al. 2008).

Fig. 4

Intein-mediated protein purification. Expression and purification of a protein fused to the N-terminus of a mutated intein variant. The intein is further fused to a chitin binding domain (CBD) at its C-terminus. Purification of the fusion construct is accomplished using a chitin column. The intein is mutated at the C-terminus (Asn), so that cleavage only occurs at the N-terminus, which results in the release of the target protein from the column while the intein-CBD tag remains bound to chitin

Intein-mediated protein purification using non-chromatographic tags

An alternative to classical tagging systems that does not require expensive affinity resins is intein-mediated protein purification with new tags developed for non-chromatographic purification techniques.

One example is the combination of elastin-like polypeptides (ELP) and self-splicing inteins (Wu et al. 2006). Intein-independent protein purification methods using ELP were developed more than 10 years ago (Meyer and Chilkoti 1999). Under high-salt conditions at about 30 °C, ELP reversibly self-associates and forms insoluble aggregates that can be precipitated by centrifugation (Wu et al. 2006). A target protein fused to an intein−ELP fusion can be separated from a crude protein extract by repeated aggregation and centrifugation cycles. Finally, inducible intein cleavage enables recovery of the target protein from the intein−ELP moiety, leading to a highly pure elution fraction (Wu et al. 2006). The aggregation characteristics of ELP can be controlled by temperature, concentration, type of salt, or polypeptide length (Floss et al. 2009; Fong et al. 2009). The intein−ELP approach was recently demonstrated to be a low-cost, convenient, and promising way of generating small antimicrobial peptides (Shen et al. 2010).

Another non-chromatographic purification approach takes advantage of a multiple phasin tag, produced by bacterial species like Cupriavidus necator (formerly known as Alcaligenes eutrophus and Ralstonia eutropha), that specifically binds to polyhydroxybutyrate (PHB) granules in Escherichia coli (Banki et al. 2005; Wieczorek et al. 1995). Polyhydroxyalkanoic acids are naturally produced as a wide range of storage polymers by several bacterial species. Their biosynthesis genes can be heterologously expressed in E. coli, producing polymeric granules that can be used in place of classical affinity resins for protein purification (Choi et al. 1998). The co-production in E. coli of phasin-tagged proteins and PHB granules enables the easy separation of the tagged target protein from crude extract, after cell lysis and centrifugation. In an impressive advancement of this system, inducing cleavage of an engineered intein releases the untagged target protein from an intein−phasin moiety, and from the bound PHB granules (Banki et al. 2005; Georgiou and Jeong 2005). In a system invented by Wood and coworkers, a pH- or temperature-inducible Mtu RecA intein from Mycobacterium tuberculosis is used for phasin tag-mediated protein purification in E. coli, and the authors note that the system is easily applicable to a wide range of host systems (Banki et al. 2005; Gillies et al. 2009).

The ELP and PHB systems are both highly flexible, and function efficiently with a variety of proteins, under many different conditions (Banki et al. 2005; Ge et al. 2005; Wu et al. 2006). Furthermore, both systems have been adapted for the Gateway cloning system (Invitrogen), for rapid and easy characterization of a gene product using different vector systems (Gillies et al. 2008). The Gateway system generates a single Entry clone, from which the gene of interest is introduced directly, by simple recombination, into a number of different vectors. In addition to intein-mediated ELP and phasin fusion-protein purification, the Gateway system has been adapted for intein-mediated protein purification using classical affinity tags like maltose binding protein and CBD (Gillies et al. 2008).

Intein-mediated protein purification in large-scale processes

The use of intein-mediated procedures in bioseparation is well established at the laboratory scale and is attracting increasing interest in biotechnology. The potential of these protein purification techniques for large-scale protein production is clear, but intein-mediated protein purification systems under industrial, scaled-up conditions must be developed. The simplicity of intein-mediated protein purification, with its few purification steps and low requirement for agents, suggests that scale-up approaches have the potential to be economical in the future. Since intein-mediated cleavage does not require further downstream processing, it reduces the costs from expensive protease enzymes. Wood and co-workers designed a hypothetical scale-up method based on the DTT-inducible IMPACT system, and identified the Tris−HCl reaction buffer and the thiol compound to be the most costly ingredients in this process. They suggested exchanging the buffer system with a cheaper phosphate buffer. Cleavage induction by chemical compounds could be circumvented using inteins that are induced by physical changes (Wood et al. 2005). Furthermore, the use of non-chromatographic affinity tags could eliminate the need for expensive columns, and may also be easy to scale up. The development of recently published vectors based on Invitrogen's Gateway cloning system will facilitate production of a target protein fused to four different protein tags. Using this system, a target protein can be easily tested with different tags in a high-throughput manner (Gillies et al. 2008). In a recent review, Fong et al. (2010) describe various non-chromatographic self-cleaving purification tags and their potential industrial applications.

Self-circularization by inteins

The generation of cyclic peptides is a rapidly growing field in molecular biology and chemistry. Several methods have been established for producing cyclic proteins that are exceptionally stable to chemical, thermal, or enzymatic degradation, and exhibit a higher specific activity in circular form. Increased stability is achieved from the resistance to exoproteases, which are not capable of degrading cyclic peptides. A variety of organisms produce circular peptides with a variety of bioactivities, including anti-bacterial, uterotonic, haemolytic, and cytotoxic activity (Craik 2006). For example, the filamentous ascomycete Tolypocladium inflatum produces the well-known cyclosporin A, which has long been used as an immunosuppressive drug (Thali 1995). Cyclosporin A and other known cyclic peptides are small rings of fewer than a dozen amino acids, and are produced by multidomain enzymes called peptide synthetases (Billich and Zocher 1987; Weber et al. 1994). However, many other circular proteins are synthesized as linear chains of amino acids, with the amino terminus of one residue linked to the carboxyl terminus of the next. These include cyclotides, a family of bioactive proteins from plants that contain a head-to-tail cyclized backbone and a knotted arrangement of three disulfide bonds (Craik et al. 1999).

Generally, cyclic antibiotics display increased activity and stability in comparison to their linear analogs. Furthermore, precisely designed small cyclic peptides can have a specificity similar to that of endogenously produced antibiotics (Cheriyan and Perler 2009).

Inteins play a growing role in the production of cyclic peptides through the aforementioned IPL technique (Evans et al. 1999). The protein of interest is fused through its C-terminus, to the N-terminus of an intein, in which the C-terminal Asn has been mutated to be incapable of cleaving the C-terminal binding tag. The N-terminus of the target protein is altered so the second residue after the Met is a Cys. After purification of the target protein, the N-terminal Met is removed using methionyl-aminopeptidase from E. coli, resulting in an N-terminal Cys residue (Sancheti and Camarero 2009; Tavassoli et al. 2005). After purification, including elution from the column by thiol-induced N-terminal cleavage of the intein, the linear peptide contains a C-terminal thioester and a Cys at its N-terminus that can react to form a new peptide bond.

The pTWIN vector of the TWo INtein system contains two engineered inteins (Evans and Xu 1999). The mutated Synechocystis sp. Ssp DnaB intein allows C-terminal cleavage, while the Mycobacterium xenopi Mxe GyrA intein undergoes N-terminal cleavage. The combination of both inteins fused to the N- and C-terminal ends of a target protein enables the production of an N-terminal Cys residue and an activated thioester at the C-terminus, which react, resulting in cyclization (Fig. 5a). A disadvantage of this method is the low cleavage efficiency of the Ssp DnaB intein, which is influenced by the second and third amino acid residues following the required Cys at the N-terminus of the target protein. The introduction of a non-native linker sequence improves cleavage efficiency, but also has the potential to interfere with the biological activity of the cyclic protein. Another problem is the possibility of polymerization instead of cyclization by activated peptides (Xu and Evans 2001).

Fig. 5

Self-circularization of peptides. a Self-circularization of proteins using the TWIN-system. A target protein is embedded between two intein sequences, which are modified for N- or C-terminal cleavage, respectively. The inducible cleavage reactions of the inteins lead to the generation of an activated thioester residue and an N-terminal Cys for the spontaneous circularization of the linear peptide. b Utilization of the Split Intein-mediated Circular Ligation Of Peptides and ProteinS (SICLOPPS) also enables the circularization of peptides. In this system, the order of the naturally split Synechocystis sp. Ssp DnaE intein is inverted (IC–target protein–IN), and the reconstitution of the Ssp DnaE intein allows the efficient cyclization of the target protein

Split inteins have also been applied in the generation of cyclic proteins and peptides. The very timely and elegant Split Intein-mediated Circular Ligation Of Peptides and ProteinS (SICLOPPS) system uses the naturally split Synechocystis sp. Ssp DnaE intein, which is fused in a rearranged order (IC–target protein–IN), allowing the efficient cyclization of the target protein by reconstitution of the Ssp DnaE intein (Fig. 5b). Using this method, it was possible to generate cyclic peptides that are as short as eight amino acid residues. SICLOPPS has been used in inhibitor studies for the rapid synthesis of very large cyclic peptide libraries that are superior to the traditional chemically generated libraries, and which can be screened in vivo for new potent therapeutic drugs (Scott et al. 1999; Tavassoli and Benkovic 2007). In recent years, SICLOPPS has been used, for instance, to identify several inhibitors for the dimerization of ribonucleotide reductase and 5-aminoimidazole-4-carboxyamide-ribotide transformylase (for a recent review, see Cheriyan and Perler 2009).
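The rearranged fragment order used in SICLOPPS can be sketched in a few lines. This is a toy model under stated assumptions: the fragment strings are invented placeholders and not real Ssp DnaE sequences, the function names are hypothetical, and a cyclic peptide is represented by the alphabetically smallest rotation of its sequence so that all rotations of the same ring compare as equal.

```python
def siclopps_precursor(i_c: str, target: str, i_n: str) -> str:
    """Assemble the rearranged precursor in the order IC, target, IN."""
    return i_c + target + i_n


def canonical_cycle(peptide: str) -> str:
    """Represent a cyclic peptide by its alphabetically smallest rotation."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return min(peptide[i:] + peptide[:i] for i in range(len(peptide)))


def cyclize(precursor: str, i_c: str, i_n: str) -> str:
    """Remove both intein fragments and return the cyclic target in canonical form."""
    if not (precursor.startswith(i_c) and precursor.endswith(i_n)):
        raise ValueError("precursor is not in the order IC, target, IN")
    target = precursor[len(i_c):len(precursor) - len(i_n)]
    return canonical_cycle(target)


# Placeholder fragments, invented for illustration only.
I_C, I_N = "ICFRAGMENT", "INFRAGMENT"
precursor = siclopps_precursor(I_C, "SGWKLA", I_N)
ring = cyclize(precursor, I_C, I_N)

# Any rotation of the same ring yields the same canonical form.
print(ring == canonical_cycle("KLASGW"))
```

Treating rotations as equivalent matters when counting distinct members of a cyclic peptide library, because several linear inserts can encode the same ring.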

Inteins in selenoprotein production

The 21st amino acid selenocysteine (Sec) is encoded by a UGA codon in several prokaryotic and eukaryotic proteins. Sec is incorporated during translation in a process known as recoding (Driscoll and Copeland 2003). Many selenoproteins are selenoenzymes with a single Sec residue in the active site. Since prokaryotes and eukaryotes have different UGA recoding machineries, producing selenoproteins and analyzing the characteristics of selenoenzymes in heterologous hosts is challenging (Hondal 2009). The first attempts to produce Sec-containing mammalian thioredoxin reductase (TrxR) heterologously were undertaken in E. coli. In the heterologous host, UGA codes for Sec only when a specific stem-loop structure called the selenocysteine insertion sequence element (SECIS) is present in the mRNA template in close proximity to UGA, and the trans-acting factors SelA-D are also synthesized in the cell. Since the Sec residue of mammalian TrxR is close to the C-terminus, a SECIS element was cloned at the 3′-end of the mammalian gene (Arnér et al. 1999).

Incorporation of internal Sec residues into heterologous proteins is achieved using native chemical ligation (NCL), the related IPL, or chemical conversion of reactive Ser residues. The NCL technique facilitates the synthesis of moderately sized proteins by ligation of a peptide with a reactive thioester at the C-terminus, and a second peptide containing a Cys or a Sec at the N-terminus (Dawson and Kent 2000; Hondal 2009). For IPL, the Sec-containing module is synthetically produced, while the Sec-less protein moiety is synthesized as a recombinant protein in a heterologous host, and is purified and activated by intein-mediated protein purification (Hondal 2009).

A new invention in Sec-protein production named sectein has recently been patented by the Arnér group (Arnér et al. 2009). The sectein system couples expression of an intein sequence with a bacterial SECIS element and combines the advantages of SECIS elements with protein splicing, for a process that is independent of the Sec position or selenoprotein size (Fig. 6). Unlike the IPL method, chemical production of a Sec moiety is not required. The N-terminus of a selenoprotein containing the UGA Sec codon at its 3′-end acts as an N-extein. It is fused to the Penicillium chrysogenum Pch PRP8 intein containing a SECIS element at its 5′-end (Arnér et al. 2009; Elleuche et al. 2006). The SECIS element directs the incorporation of Sec into the peptide during translation. The C-terminus of the selenoprotein acts as the C-extein and is fused to the C-terminus of the intein. After translation, the precursor protein has a Sec residue in the N-extein, directed by the SECIS element in the intein. Through protein splicing, the Sec-containing N-extein is fused to the C-extein and a mature selenoprotein is formed. The SECIS element is excised with the intein (Arnér et al. 2009).

Fig. 6

Intein-mediated production of Sec-containing proteins. A UGA-codon is translated to a Sec, when a SECIS element is in close proximity at the N-terminal part of the modified fungal Pch PRP8-intein. The incorporation of the Sec residue followed by the protein splicing process leads to the production of a mature Sec-containing protein

Outlook

Since their discovery 20 years ago, the application of natural and artificial inteins has become a new and rapidly growing field in molecular biology. Protein splicing not only enriches the possibilities of posttranslational processing, but also has many prospects for applications. The protein splicing process as a protein engineering tool will become more widespread in industrial applications. The challenge is to scale up and optimize intein-mediated techniques, making them applicable and economically attractive for biotechnological processes.