Introduction

Proteins play a crucial role in the fundamental physiological and biochemical functions of life. As the genomes of many organisms are decoded, a vast number of new peptide and protein sequences are being discovered at a rapid rate. The structures, functions, and mechanisms of action of these sequences remain to be elucidated, and such studies begin with the production of the protein of interest. In general, three techniques are used for protein production: in vivo expression, cell-free protein synthesis, and chemical synthesis.

Most studies on the structure and function of proteins are based on recombinant DNA-based expression of proteins in genetically engineered cells. This powerful technology enables the production of large amounts of native protein and allows the amino acid sequence to be changed via site-directed mutagenesis for further research. Although it is the most widely used method to produce a protein of interest, it has disadvantages. First, proteins that are toxic to the cell, such as proteases, are difficult to overexpress [1]. Second, because living cells are used for protein production, such synthesis systems are inherently limited to the 20 genetically encoded amino acids. Although some non-natural amino acids can be incorporated genetically in bacteria, yeast, and mammalian cells through a number of orthogonal tRNA/synthetase pairs, these methods are currently not accessible to most researchers [2,3,4,5].

Cell-free protein expression is an alternative to cell-based methods because it offers a simple and flexible system for the rapid synthesis of folded proteins, drastically reducing the time taken to go from DNA sequence to functional protein [6]. Cell-free protein production is achieved by combining a crude lysate from growing cells, which contains all the necessary enzymes and machinery for protein synthesis (including transcription and translation), with an exogenous supply of amino acids, nucleotides, salts, and energy-generating factors, and then introducing an exogenous template, either messenger RNA (mRNA) or DNA, into the system [7, 8]. The cell-free system presents two obvious advantages over in vivo protein expression. One is that cell-free systems allow the efficient incorporation of non-natural or chemically modified amino acids into the expressed protein at desired positions during translation, thereby generating novel molecules for proteomic applications [9,10,11,12,13,14], although many unnatural amino acids are simply incompatible with ribosomal polypeptide synthesis [15]. The other is that cell-free systems can produce proteins that are not physiologically tolerated by the living cell, e.g., toxic, proteolytically sensitive, or unstable proteins [16].

Chemical synthesis is playing an increasingly important role in protein production, especially since Merrifield’s innovation of solid-phase peptide synthesis (SPPS) in 1963 [17]. Homogeneous long peptides are difficult to obtain through SPPS alone, but chemical ligation, which joins protein segments produced by SPPS, can solve this problem. The biggest advantages of this technology are the free incorporation of unnatural amino acids and the ability to synthesize proteins that are toxic to living cells. In this paper, SPPS and methods of chemical ligation are reviewed.

SPPS

Prior to the development of SPPS, chemists performed peptide synthesis via classical solution-phase methods. Purification had to be performed after every amino acid coupling, by extraction and silica gel column chromatography, which is time consuming and inconvenient. In 1963, Merrifield proposed an alternative approach for preparing peptides [17]. The basic concept was to covalently attach the first amino acid to an insoluble support and to elongate the peptide chain from this support-bound residue. After the desired number of monomers had been incorporated through a series of coupling and deprotection steps, the peptide was removed from the solid support.

The principles of SPPS are illustrated in Fig. 1. The N-protected C-terminal amino acid residue is anchored via its carboxyl group to a hydroxyl or amino resin to yield an ester- or amide-linked peptide that will ultimately produce a C-terminal acid or a C-terminal amide peptide, respectively. After the first amino acid is loaded, the chain is elongated by repeated deprotection and coupling cycles. Because the group removed in every cycle is the one protecting the α-amino group of the growing chain, elongation proceeds from the C-terminus to the N-terminus (C → N strategy). After every deprotection or coupling step, excess reagent is washed away with organic solvents (e.g., DCM and DMF). All functional groups on the amino acid side chains should be protected by semi-permanent protecting groups, which prevent side reactions with the reagents during the synthesis. In general, two strategies are used for SPPS, named after the α-amino protecting group: the Boc and Fmoc approaches. Typically, Boc is removed by 30% TFA in DCM and Fmoc by 20% piperidine in DMF. Final cleavage of the peptide from the resin and side-chain deprotection require a strong acid, such as anhydrous hydrogen fluoride (HF) or trifluoromethanesulfonic acid (TFMSA), in Boc chemistry, and TFA in Fmoc chemistry [18]. The crude peptides can be analyzed and purified by HPLC and characterized by mass spectrometry.
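The deprotection, coupling, and washing cycle described above can be written out as a simple loop. The following Python sketch is only an illustration: the function name `spps_steps` and the step wording are hypothetical, and the reagents are simply those quoted in the text (real protocols add resin-specific loading, capping, and scavenger details that are not modeled here).

```python
from typing import List


def spps_steps(sequence: str, chemistry: str = "Fmoc") -> List[str]:
    """List the bench operations for assembling `sequence` by SPPS.

    `sequence` is written N-terminus to C-terminus in one-letter code.
    Reagents are the typical ones quoted in the text, not a validated protocol.
    """
    deprotect = {
        "Fmoc": "20% piperidine in DMF",
        "Boc": "30% TFA in DCM",
    }[chemistry]
    cleave = {"Fmoc": "TFA", "Boc": "HF or TFMSA"}[chemistry]

    # Synthesis runs from the C-terminus to the N-terminus (C -> N strategy).
    residues = list(sequence)[::-1]

    steps = [f"load {chemistry}-{residues[0]} onto resin", "wash"]
    for aa in residues[1:]:
        steps += [
            f"deprotect alpha-amine with {deprotect}",
            "wash",
            f"couple {chemistry}-{aa}",
            "wash",
        ]
    steps += [
        f"deprotect alpha-amine with {deprotect}",
        "wash",
        f"cleave from resin and deprotect side chains with {cleave}",
    ]
    return steps


if __name__ == "__main__":
    for number, step in enumerate(spps_steps("GAK"), start=1):
        print(number, step)
```

For the tripeptide Gly-Ala-Lys, the list starts by loading lysine (the C-terminal residue) and ends with the single acid cleavage step, which mirrors the C → N direction of chain growth.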

Fig. 1. Principles of SPPS

For Fmoc-based SPPS, Rink amide resin, Pal resin, and Sieber resin are commonly used for the synthesis of C-terminal peptide amides [19,20,21], whereas Wang resin, Sasrin resin, HMPB resin, trityl chloride resin, and 2-chlorotrityl chloride resin are widely used for the synthesis of C-terminal peptide acids [22,23,24,25] (Fig. 2).

Fig. 2. Resins for Fmoc-based SPPS

For Boc-based SPPS, Merrifield resins are commonly used for C-terminal peptide acid synthesis [26], and MBHA resins are widely used for C-terminal peptide amide synthesis [27, 28] (Fig. 3).

Fig. 3. Resins for Boc-based SPPS

Unprotected peptide segments, which can be made by SPPS, have good solubility in solution and can be directly characterized. Using SPPS, peptides can be easily synthesized in repetitive steps of amino acid assembly. However, because no coupling or deprotection step is perfectly efficient and the long, growing protected peptide chains can aggregate with one another, the use of SPPS for the synthesis of very large peptides or proteins remains challenging. For this reason, chemical ligation methods appear attractive from a strategic perspective.
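A short calculation shows why imperfect chemistry limits chain length. The sketch below assumes, purely for illustration, that every deprotection and coupling cycle succeeds with the same probability; the function name and the 99% figure are hypothetical and are not taken from the cited work.

```python
def full_length_fraction(n_residues: int, step_yield: float) -> float:
    """Fraction of resin-bound chains that are full length after assembly.

    Assumes each deprotection + coupling cycle succeeds independently with
    probability `step_yield`, and that the first residue is already loaded,
    so n_residues - 1 cycles are required.
    """
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    if not 0.0 <= step_yield <= 1.0:
        raise ValueError("step_yield must lie between 0 and 1")
    return step_yield ** (n_residues - 1)


if __name__ == "__main__":
    # Compare short, medium, and long targets at an assumed 99% cycle yield.
    for n in (10, 50, 100):
        print(n, round(full_length_fraction(n, 0.99), 3))
```

Because the fraction decays geometrically with length, even a small per-cycle loss leaves a long target as a minor component of the crude product, before aggregation is even considered.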

Chemical Ligations

Because long peptides cannot be easily obtained through SPPS alone, chemical ligation offers a way around the problem: short peptides are first prepared by SPPS and then coupled via chemical ligation (Fig. 4).

Fig. 4. Principle of chemical ligation

Classical solution synthetic chemistry uses fully protected peptide segments for condensation in organic solvents [29]. This approach to the synthesis of long polypeptide chains suffers from a number of shortcomings [30]. First, activating the C-terminal carboxyl group of a protected peptide chain for condensation with the amine nucleophile of another segment gives rise to epimerization of the C-terminal amino acid under the basic coupling conditions. Second, fully protected peptide segments are difficult to purify and characterize effectively. For example, electrospray mass spectrometry, which has become one of the most useful tools for determining the covalent structure of peptides, involves direct ionization of an analyte from aqueous solution. The efficiency of such ionization depends on the presence of multiple ionizable groups in the molecule under study, and the lack of such groups in fully protected peptides precludes direct analysis by this method. Third, and most seriously, many fully protected peptides have only limited solubility in the organic solvents used for peptide synthesis. The resulting low concentrations of reacting peptide segments often lead to slow and incomplete coupling reactions [31, 32].

The method developed by Blake and Li [33] (see Fig. 5) is a useful segment condensation method that uses a unique C-terminal thiocarboxyl group for coupling with an N-terminal amine. The condensation between two peptides is achieved through selective activation of the thiocarboxyl group by silver ions. [Gly17]-β-endorphin, β-lipotropin, α-inhibin-96, human pancreatic growth hormone-releasing factor, and two omission analogs [34] were successfully synthesized using this strategy. Because the thiocarboxylate is unstable and gradually decomposes by oxidative hydrolysis, a thioester can be used in its place [34, 35]. A thioester can also be activated in the presence of silver ions. The activation of a thiocarboxylate or thioester by silver ions is specific and proceeds in the presence of other free carboxyl groups on the amino acid side chains.

Fig. 5. Synthesis of [Gly17]-β-endorphin [33]

However, these condensation methods still require partially protected peptides as building blocks, and the efficiency of the intermolecular coupling reaction is inherently low for large peptide segments because of the difficulty of achieving high molar concentrations for high-molecular-weight compounds [36]. Nevertheless, if the local concentration of the reacting carboxyl and amino groups can be increased by bringing the two peptides together, the proximity effect may facilitate the peptide bond formation reaction.
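The concentration argument can be made concrete with a small calculation. The sketch below converts a mass concentration into a molar concentration and, under the simplifying assumption of a plain second-order rate law with an unchanged rate constant, compares relative coupling rates. The function names and the example numbers are illustrative assumptions only.

```python
def molar_concentration_mM(mass_conc_mg_per_mL: float, mol_weight_Da: float) -> float:
    """Convert a mass concentration (mg/mL, i.e. g/L) to millimolar."""
    if mol_weight_Da <= 0:
        raise ValueError("molecular weight must be positive")
    return mass_conc_mg_per_mL / mol_weight_Da * 1000.0


def relative_bimolecular_rate(conc_a_mM: float, conc_b_mM: float) -> float:
    """Rate of A + B -> product in units of the rate constant (rate = k[A][B])."""
    return conc_a_mM * conc_b_mM


if __name__ == "__main__":
    # The same 10 mg/mL solution of a 1 kDa and a 10 kDa segment.
    small = molar_concentration_mM(10.0, 1_000.0)
    large = molar_concentration_mM(10.0, 10_000.0)
    print("1 kDa segment:", small, "mM")
    print("10 kDa segment:", large, "mM")
    # Ratio of coupling rates when both partners are small versus both large.
    ratio = relative_bimolecular_rate(small, small) / relative_bimolecular_rate(large, large)
    print("rate ratio (small/large):", ratio)
```

At equal mass concentration, a tenfold increase in the molecular weight of both partners lowers each molar concentration tenfold and, under this simple rate law, the intermolecular coupling rate a hundredfold, which is why raising the local concentration through a capture step is attractive.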

Chemoselective ligation, which refers to the aqueous covalent coupling of unprotected, highly functionalized biomolecules that contain mutually and uniquely reactive functional groups, is an efficient approach to peptide coupling. In chemoselective peptide coupling, the pair of functional groups that undergoes ligation reacts exclusively with each other; other functional groups remain unreactive. Therefore, fully unprotected peptides dissolved in aqueous buffer at high concentration can be used directly for peptide ligation. Several different chemoselective ligations are reviewed below.

Thioester-Forming Ligation

Schnölzer and Kent [37] developed a novel total chemical synthesis approach that combined the joining of two peptides with the generation of an isosteric analog of the peptide bond. Using this method, they successfully synthesized an HIV-1 protease (HIV-1 PR) analog with full enzymatic activity. Their strategy was to react the unprotected N-terminal segment of HIV-1 PR with the corresponding unprotected C-terminal segment to obtain a full-length molecule with a newly formed pseudo-peptide bond at the ligation site (Fig. 6). A thioacid sulfur nucleophile at the C-terminus of the N-terminal peptide attacked an alkyl bromide at the N-terminus of the C-terminal peptide in a nucleophilic substitution reaction, thereby covalently linking the two segments through a thioester analog of the peptide bond. The chemical selectivity was so high that no protecting groups were needed on either segment. This chemical ligation approach is also versatile: by replacing bromoacetic acid with another α-bromo carboxylic acid, analogs of other amino acids can be introduced [38].

Fig. 6. Strategy for the total chemical synthesis of the HIV-1 PR analog [37]

Oxime and Hydrazone-Forming Ligation

Thioester-forming ligation is efficient and versatile, but the thioester products are unstable at neutral pH. Rose [40] (Fig. 7) developed oxime-forming ligation to synthesize stable macromolecules of defined structure. First, purified polypeptide fragments, carrying either aldehyde or aminooxy groups but devoid of protecting groups, were prepared. Second, these fragments spontaneously self-assembled on an appropriate synthetic template under very mild conditions through formation of oxime bonds. The resulting oximes were stable in water at room temperature at pH 2–7.

Fig. 7. Oxime-forming ligation

In a reaction similar to oxime-forming ligation, hydrazine-labeled fragments ligate at pH 3–5 with fragments bearing an aldehyde functionality [41, 42] (Fig. 8). The resulting hydrazone products are less thermodynamically stable and more readily reversible than those obtained by the other techniques; they can be dissociated into their individual protein components by an exchange reaction with excess acetyl hydrazide.

Fig. 8. Hydrazone-forming ligation

Thiazolidine/Oxazolidine-Forming Ligation

Thiazolidine/oxazolidine-forming ligation was first developed by Liu and Tam [36]. It involves the formation of a thiazolidine/oxazolidine as the capture device between a C-terminal peptide-ester aldehyde and an N-terminal 1,2-thiol/hydroxyl amine group. The capture reaction is especially efficient for the 1,2-thiol amine moiety, as the aldehyde group selectively condenses with an N-terminal cysteine residue to form a stable thiazolidine intermediate. Subsequently, the acyl transfer reaction occurs through a five-membered ring transition state to yield a proline-like imidic bond (as shown in Fig. 9a). The synthesis of the peptide-ester aldehyde is shown in Fig. 9b.

Fig. 9. (a) Outline of ligation through thiazolidine/oxazolidine capture. (b) Synthesis of peptide-ester aldehyde: (i) DMF, 60 °C, 24 h; (ii) 30% TFA in CH2Cl2 (2–5% H2O)

Although nucleophilic attack on the aldehyde by different amino acid side chains could in principle generate many different ligation products, only the thiazolidine is stable during ligation and present in a significant amount. Ligation at acidic pH avoids ester hydrolysis and side reactions with other nucleophiles, such as the side chains of lysine or arginine [43]. Once the ligation intermediate has formed, intramolecular acyl transfer can be facilitated by raising the reaction pH. Ligation by thiazolidine appeared to be superior to ligation by oxime or hydrazone in reaction rate and product stability [44].

The above-mentioned ligation chemistries give a nonpeptide bond at the site of ligation. Many attempts have also been made to obtain a native amide bond at the ligation site. A study in 1953 reported that peptide bond formation between an amino acid thioester and a cysteine derivative can occur via an intramolecular acyl transfer reaction [45]. The same principle was later applied to the development of the currently known peptide ligation methods, the most popular of which are native chemical ligation (NCL) and thioacid-capture ligation.

Ligation by Disulfide Exchange (Thioacid-Capture Ligation)

In this ligation method, as shown in Fig. 10, the N-terminal Cys thiol group of one peptide is activated by the 3-nitro-2-pyridinesulfenyl (Npys) group and then attacked by the nucleophilic sulfur atom of a C-terminal thioacid of the other peptide. The acyl-disulfide intermediate then rearranges through a six-membered ring transition state to give a ligation product in which the cysteine thiol is derivatized with hydrosulfide (Cys–S–SH). The final ligation product is generated by removing the hydrosulfide with a reducing agent [46]. A 32-residue model peptide was first synthesized by this method by coupling a 17-residue Npys-modified C-terminal peptide and a 15-residue N-terminal peptide thioacid. Capture and acylation were performed in a mixed acetonitrile–water solution. Given the activated nature of the Npys–S bond and the strong nucleophilicity and low pKa of the thioacid compared with the alkylthiol of cysteine, the capture reaction proceeds rapidly and is complete almost instantaneously at pH 2–3. After the pH is adjusted to 5–6, efficient acyl transfer occurs through the six-membered ring, giving the hydrosulfide-derivatized peptide in 90% yield in only 5 min. The desired ligation product is then obtained by treatment with dithiothreitol.

Fig. 10. General scheme of thioacid-capture ligation

NCL

As mentioned earlier, Wieland et al. [45] observed facile peptide bond formation when reacting a thioester with a cysteine compound. They proposed a thiol–thioester exchange reaction that formed a new thioester between the acyl group and the side-chain thiol of cysteine, followed by an intramolecular S → N acyl transfer that formed the peptide bond. In 1994, building on the original principles of chemical ligation, Dawson et al. [47] introduced NCL, an ingenious extension of the chemistries used for the chemoselective reaction of unprotected peptide segments (Fig. 11). Of the two components of the ligation reaction, one has a cysteine residue at its N-terminus and the other has a thioester at its C-terminus. In the presence of a thiol additive (e.g., PhSH), the thiol group of the cysteine residue undergoes an exchange reaction with the thioester in a highly chemoselective manner to form a new thioester. This thioester intermediate spontaneously rearranges, forming an amide bond between the two peptides with the cysteine residue at the ligation site. A feature of NCL is that ligation occurs only at the unique N-terminal Cys residue, regardless of how many additional internal Cys residues are present in either segment [47, 48]. No protecting groups are necessary for any of the side-chain functional groups normally found in proteins, and the ligation product is obtained in quantitative yield.
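Because NCL leaves a native amide bond, the ligation product should have the mass of the ordinary peptide formed by joining the two sequences, which is convenient when checking a ligation by mass spectrometry. The sketch below illustrates this bookkeeping; the function names and the example sequences are hypothetical, the residue masses are standard monoisotopic values, and modifications, counter-ions, and charge states are ignored.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O, monoisotopic


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide (free acid)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def ncl_product(acyl_segment: str, cys_segment: str) -> str:
    """Sequence of the NCL product.

    `acyl_segment` is the sequence of the peptide that carries the C-terminal
    thioester; `cys_segment` must start with the N-terminal cysteine.
    """
    if not cys_segment.startswith("C"):
        raise ValueError("the second segment needs an N-terminal Cys for NCL")
    return acyl_segment + cys_segment


if __name__ == "__main__":
    product = ncl_product("LYRAG", "CASW")
    print(product, round(peptide_mass(product), 4))
```

The thiol of the thioester is released during ligation, so it does not appear in the product mass: the expected value is the sum of the two free-acid segment masses minus one water.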

Fig. 11. Native chemical ligation

Homocysteine can also be used for ligation and subsequently methylated to form a methionine residue at the coupling site [49]. Preparation of peptide thioalkylesters by direct SPPS further enhanced the practical value of this type of chemical ligation [50]. Moreover, adding a mixture of tris(2-carboxyethyl)phosphine (TCEP) and 3-mercaptopropionic acid maintains a reducing environment in the ligation reaction, which suppresses formation of the N,S-bisacylated byproduct and reduces disulfide formation, thereby improving the overall ligation yield [50].

The Kent group [51] studied the effects of thiol catalysts on the ligation rate. Among a set of different thiol catalysts, phenyl thiols proved to be superior ligation catalysts compared with alkanethiols. Although thiol–thioester exchange is rapid and not rate limiting for alkanethiols such as MESNa and benzyl mercaptan, these thiols are poor leaving groups, so transthioesterification by the N-terminal cysteine becomes rate limiting. The ligation is therefore best catalyzed by a phenylthiol that undergoes rapid and complete thiol exchange with the peptide thioester while the resulting thioester remains highly reactive toward transthioesterification by cysteine.

Thioester-mediated NCL is a widely used peptide ligation method. It involves the reaction of a thioester fragment with an N-terminal cysteine fragment, followed by S → N acyl transfer, to yield a ligation product with a cysteine at the ligation junction [47]. In recent years, this process has stimulated considerable interest in developing new and convenient methods to prepare the component compounds for NCL [52, 53]. Cysteinyl peptides can be obtained by standard Fmoc SPPS [54], but unprotected C-terminal thioesters are not easily acquired. Traditionally, peptide thioesters are synthesized by Boc SPPS [17] on a thioester-linked peptidyl resin [50, 55, 56], because the thioester linkage is not stable to the 20% piperidine/DMF treatment used to remove Fmoc groups in Fmoc SPPS. Although the Boc SPPS method is highly effective, the need for a highly hazardous strong acid such as HF at the final cleavage step is a deterrent in many research laboratories and limits the wide use of NCL. Many methods have been developed to overcome this problem.

Direct Fmoc solid-phase synthesis of peptide thioesters on a thioester linker is also possible using a modified Fmoc deprotection protocol. Aimoto has reported a direct preparation of peptide thioesters using 25% (v/v) 1-methylpyrrolidine, 2% (v/v) hexamethyleneimine, and 2% (w/v) HOBt in a one-to-one mixture of NMP-DMSO as the deprotection reagent, which removes the Fmoc group while leaving the thioester intact [57, 58]. However, this method is restricted to relatively small peptides, and racemization of the C-terminal amino acid residue is a non-negligible problem. Considerable efforts have been devoted to developing alternative strategies that obtain thioester peptides indirectly from non-thioester precursors; this type of synthesis is compatible with standard Fmoc chemistry [39, 59,60,61,62,63].

Some compounds, such as glycopeptides or glycoproteins, are damaged by the repeated exposure to TFA and the final cleavage/deprotection with HF used in Boc chemistry. Modified Fmoc chemistry that requires only a single cleavage/deprotection step with TFA is therefore desirable for synthesizing such peptides. An example is the use of an alkanesulfonamide “safety-catch” linker, which permits the solid-phase synthesis of peptide thioesters using standard Fmoc chemistry protocols [61,62,63]. After the peptide has been assembled, iodoacetonitrile is added to activate the linker by alkylating the sulfonamide nitrogen. Addition of a nucleophilic thiol then cleaves the protected peptide from the resin as a thioester (see Fig. 12).

Fig. 12. Strategy for generating thioester by “safety-catch” resin. Adapted from Ref. [52]

Another common approach for synthesizing thioesters employs a highly acid-sensitive resin (e.g., 2-chlorotrityl resin), which allows cleavage under mild acidic conditions to yield protected peptides. The protected peptides after cleavage from the resin can be activated and coupled to a thiol to generate thioesters. Global deprotection finally affords an unprotected peptide thioester (see Fig. 13) [57].

Fig. 13. Strategy for generating thioester. Adapted from Ref. [52]

An ester–thioester rearrangement can also be used to generate thioesters (see Fig. 14) [64]. Peptide assembly is performed on resin-bound S-protected β-mercapto-α-hydroxypropionic acid. After cleavage, thiophenol is added, which removes the disulfide protection and triggers an intramolecular O → S shift to yield the thioester.

Fig. 14. Strategy for generating thioester. Adapted from Ref. [52]

Another protocol that also utilizes a disulfide-protected thiol in the β position employs phenyl ester. After coupling with a protected peptide and deprotection, an excess of thiol is added, which induces the O → S shift and thiol–thioester exchange (see Fig. 15) [65].

Fig. 15. Strategy for generating thioester. Adapted from Ref. [52]

Blanco-Canosa and Dawson [60] developed a new method of generating thioesters by Fmoc chemistry based on an acylurea linker (see Fig. 16). The o-aminoanilide 1 can be efficiently transformed into an aromatic N-acylurea moiety 2 following chain elongation. The resin-bound acylurea peptide 2 can be deprotected and cleaved from the resin (e.g., Rink or Wang resins) with TFA. The resulting mildly activated peptide-Nbz (N-acyl-benzimidazolinone) 3 is stable under the acidic conditions used for peptide handling and purification. At neutral pH, however, the fully unprotected acylurea peptide undergoes rapid thiolysis, so that thioester peptide 4 can be generated either in a separate step or in situ during NCL.

Fig. 16. Synthetic strategy for Fmoc SPPS of thioester peptides. Following SPPS, aminoanilide 1 undergoes specific acylation and cyclization to yield the resin-bound acylurea peptide 2. Following cleavage and deprotection, peptide-Nbz 3 can undergo thiolysis to yield peptide thioester 4, or direct ligation to yield the native peptide 5 [60]

Thereafter, an unprecedented method was developed in which a backbone amide of the peptide is activated by formation of a backbone pyroglutamyl imide; displacement of the imide by a thiol then provides the peptide thioester (see Fig. 17) [59].

Fig. 17. Strategy for the synthesis of peptide thioesters through backbone amide activation. Pg, protecting group; TES, triethylsilane; TFA, trifluoroacetic acid; R1, amino acid side chain

Other systems that utilize N → S acyl transfer to produce thioesters are reviewed below [66,67,68,69,70,71].

Aimoto reported peptide thioester preparation based on an N → S acyl shift reaction mediated by a thiol ligation auxiliary under acidic conditions [67] (see Fig. 18). They developed the N-4,5-dimethoxy-2-mercaptobenzyl (Dmmb) group as an auxiliary for extended chemical ligation. Ollivier et al. [66] described the synthesis of peptide thioesters based on the alkylation of the safety-catch sulfonamide linker with a protected 2-mercaptoethanol derivative.

Fig. 18. 4,5-Dimethoxy-2-mercaptobenzyl group-mediated peptide thioester synthesis [67]

In another method, a peptide carrying a mercaptomethylated proline derivative at its C-terminus is converted to the thioester of 3-mercaptopropionic acid (MPA) by treatment with aqueous MPA under microwave irradiation [68] (see Fig. 19). This post-SPPS thioesterification reaction was successfully applied to the synthesis of a glycopeptide thioester. N-alkyl cysteine derivatives were also used for the preparation of peptide thioesters [69, 70]. Tsuda reported N → S acyl transfer-mediated synthesis of peptide thioesters using anilide derivatives [71].

Fig. 19. Generating thioester through mercaptomethylated proline derivatives

Aimoto’s group reported the use of an autoactivating C-terminal Cys-Pro ester (CPE) to mediate amide-to-thioester conversion at neutral or slightly basic pH, which is driven by diketopiperazine formation to trap the transiently exposed α-amine of Cys through intramolecular aminolysis of the prolyl ester (see Fig. 20) [72]. Despite an inconvenience in loading the first amino acid to the CPE linker and the need for a relatively reactive glycolic ester at the C-terminus, this method is appealing for its clever design.

Fig. 20. Peptide ligation using a CPE as an autoactivating unit

Recently, the Liu group [73] and the Melnyk group [74] showed that a C-terminal BMEA (N,N-bis(2-mercaptoethyl)-amide) peptide can undergo facile intramolecular thiolysis (or N-to-S acyl transfer) to form a thioester (Fig. 21). Although formed transiently, the thioester has a lifetime sufficient to allow its capture by direct reaction with a cysteinyl peptide at weakly acidic pH. Remarkably, low-power microwave irradiation has been shown to be a viable means of assisting BMEA-mediated ligation. The reaction system is applicable to a wide range of C-terminal amino acid residues, pointing to its broad utility. The successful synthesis of histone H3 protein validates the practical value of this BMEA methodology in synthetic protein chemistry. Unlike most existing methods, this system does not require a preactivation step or an intrinsic driving device for the thioester formation reaction, making it one of the simplest and most straightforward Fmoc-based systems for peptide thioester synthesis.

Fig. 21. N-to-S acyl transfer using an N,N-bis(2-mercaptoethyl)-substituted tertiary amide to generate a thioester peptide for NCL

The scope of ligation chemistry can be expanded by the use of removable thiol-based auxiliaries. The auxiliaries are linked synthetically to the N-terminal amine of the second peptide. Once the two peptides are ligated together, the removal of the extraneous atoms of the auxiliary gives the final ligation product (as shown in Fig. 22).

Fig. 22. Auxiliary-mediated peptide ligation

Nα-ethanethiol and Nα-oxyethanethiol were the first auxiliaries used for peptide ligation. The coupling between two glycine residues is highly efficient, whereas no ligation is detected between two nonglycyl residues. After ligation, Nα-oxyethanethiol is released from the peptide by reduction with zinc under acidic conditions [75]. The second class of auxiliaries is based on Nα-2-mercaptobenzylamine. Two peptides are linked together with the help of these auxiliaries in a manner similar to NCL, and acyl transfer occurs via a six-membered ring intermediate. The auxiliaries can be removed under acidic conditions [76, 77]. The third class of auxiliaries, the Nα-(1-phenyl-2-mercaptoethyl) group bearing a nucleophilic alkyl thiol, can enhance transthioesterification during the capture step and mediate acyl transfer via a five-membered ring intermediate [78]. In general, the use of auxiliaries leads to increased steric hindrance at the transition state, limiting ligation mainly to the Gly–Gly junction [79].

A new method for the synthesis of cysteine-free glycopeptides using auxiliary-assisted ligation was developed by Brik et al. [80]. In this ligation (see Fig. 23), a glycopeptide is modified at the second residue with a carbohydrate (N-acetyl glucosamine) derivatized with a mercaptoacetate auxiliary. After linking two peptides through thiol–thioester exchange, S → N acyl transfer proceeds to obtain a ligated product with a native peptide backbone. The final product can be obtained by desulfurization [81]. This method can be applied for the total synthesis of the antibacterial glycoprotein diptericin.

Fig. 23. Glycopeptide ligation through a side-chain auxiliary

Expressed Protein Ligation (EPL)

As described in the NCL section, a thioester and a cysteinyl peptide are combined to form a ligated peptide or protein. However, many proteins, such as RNA polymerase, which contains more than 1000 amino acid residues, are so large that they are beyond the reach of total synthesis. EPL helps solve this problem. In principle, synthetic and recombinant building blocks can be used for NCL in a semisynthetic manner. Thus, one of the two segments can be acquired by recombinant protein expression, and the other segment can be obtained by SPPS. Notably, non-natural amino acids can be introduced into the synthetic segment.

Both the thioester and cysteinyl segments can be acquired by recombinant protein expression. A cysteinyl protein can be expressed directly in prokaryotic systems, with the N-terminal Cys exposed after removal of the initiating Met by endogenous methionyl aminopeptidases (MetAPs) [82, 83]. Expression of a thioester protein, however, is more difficult. The idea was inspired by protein splicing, a posttranslational process first described by Anraku and coworkers in 1990 [84]. Protein splicing involves the excision of an intervening polypeptide (the “intein”) from a precursor protein and the joining of the flanking N-terminal and C-terminal regions (the “exteins”) through a peptide bond, generating an active protein. The mechanism is now relatively well understood [85] (see Fig. 24). Conserved residues are found at both splice junctions in standard inteins. The first residue of the intein and the first residue of the C-extein each bear a nucleophilic side chain (cysteine, serine, or threonine) that can attack an amide or ester bond. The C-terminal residue of the intein is a conserved asparagine, which undergoes spontaneous cyclization that results in deamidation or peptide bond cleavage. The first step is an N → S or N → O acyl shift, in which the amide bond at the C-terminus of the N-extein is attacked by the SH/OH group of the Cys/Ser residue at the N-terminus of the intein under weakly acidic conditions. The side chain of the N-terminal residue of the C-extein then attacks this linear thioester/ester intermediate, giving a branched intermediate by trans(thio)esterification. The branched intermediate is cleaved when the side-chain amide group of the asparagine attacks the carbonyl carbon of the adjacent peptide bond. This cleavage leads to succinimide formation and excision of the intein, and the resulting ligated extein intermediate undergoes a spontaneous S → N or O → N acyl rearrangement that links the exteins through a stable peptide bond.
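The residue requirements described above can be captured in a toy string model. The sketch below is illustrative only: the function names `can_splice` and `splice` and the example sequences are hypothetical, each polypeptide is treated as a one-letter amino acid string, and only the conserved junction residues named in the text are checked, not the full sequence requirements of any real intein.

```python
NUCLEOPHILES = {"C", "S", "T"}  # Cys, Ser, Thr: the nucleophilic side chains named in the text


def can_splice(intein: str, c_extein: str) -> bool:
    """Check the conserved junction residues of a standard intein (toy model)."""
    if not intein or not c_extein:
        return False
    return (
        intein[0] in NUCLEOPHILES        # first residue of the intein
        and intein[-1] == "N"            # C-terminal Asn, needed for succinimide formation
        and c_extein[0] in NUCLEOPHILES  # first residue of the C-extein
    )


def splice(n_extein: str, intein: str, c_extein: str) -> str:
    """Return the joined exteins, or raise if the junction residues are wrong."""
    if not can_splice(intein, c_extein):
        raise ValueError("junction residues do not support splicing in this toy model")
    return n_extein + c_extein  # the intein is excised; the exteins are linked


# Hypothetical sequences, for illustration only.
print(splice("MKTAY", "CLSGDTLIHN", "CFGKL"))
print(can_splice("CLSGDTLIHA", "CFGKL"))  # intein without a C-terminal Asn
```

The model deliberately ignores folding and the individual acyl-shift steps; it only shows which positions must carry which residues for the mechanism to proceed.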

Fig. 24

Mechanism of protein splicing

Mutation of the C-terminal asparagine residue of the intein to alanine blocks the final step of protein splicing by preventing succinimide formation [86], whereas the N → S acyl shift between the extein and intein still occurs and forms a thioester intermediate (see Fig. 25). Addition of a thiol reagent such as MESNa induces thioester exchange and releases a reactive thioester protein, which is the building block for subsequent EPL. Using the modified intein, Xu and coworkers developed a protein preparation and purification system known as IMPACT [87]. In this system, the target protein is expressed as a fusion with the intein and a chitin binding domain (CBD) from Bacillus circulans. When the target protein-intein-CBD fusion passes through a chitin resin, it binds to the resin and is thereby separated from other cellular proteins. Incubation of the resin-bound complex with a nucleophilic thiol (R-SH) cleaves the target protein from the intein as an α-thioester. By the same principle, a thioacid protein can be prepared when the nucleophile is HS– [88]. This strategy has been used successfully to synthesize H3 histone proteins by thioacid-capture ligation [89].

Fig. 25

Protocol for expressed thioester protein

As mentioned above, recombinant protein α-thioesters, thioacids, and cysteinyl proteins can all be obtained by protein expression. Synthetic and recombinant building blocks can then be combined by NCL or thioacid-capture ligation to generate semisynthetic proteins. This approach, named EPL, was first reported in 1998 [90, 91]. Since then, an increasing number of large proteins have been studied using EPL. Most EPL studies involve modifying at least one native residue within the protein. Non-natural amino acids can be introduced into the synthetic segment (prepared by SPPS), which can be ligated to either the N- or C-terminus of the recombinant segment. However, even optimized SPPS approaches reliably yield peptides of only up to ~50 residues [92]. Thus, if the region of interest lies more than ~50 residues from both the N- and C-termini of the protein, additional ligation strategies, such as sequential peptide ligation, are required.
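The ~50-residue rule of thumb can be written as a simple position check. This is a minimal sketch under stated assumptions: the function name `synthetic_segment_for` is hypothetical, `max_synthetic` defaults to the ~50-residue SPPS limit quoted in the text, and the requirement for a suitable ligation junction (such as a Cys residue) is ignored.

```python
from typing import Optional


def synthetic_segment_for(position: int, protein_length: int,
                          max_synthetic: int = 50) -> Optional[str]:
    """Say which terminal synthetic segment could carry a residue, if any.

    position is 1-indexed; max_synthetic is the assumed SPPS length limit.
    """
    if not 1 <= position <= protein_length:
        raise ValueError("position must lie within the protein")
    if position <= max_synthetic:
        return "N-terminal"
    if position > protein_length - max_synthetic:
        return "C-terminal"
    return None  # out of reach of a single ligation; more segments are needed


# Example with a hypothetical 300-residue protein.
for pos in (20, 150, 280):
    print(pos, synthetic_segment_for(pos, protein_length=300))
```

A result of `None` corresponds to the case in which a single EPL step is not enough and a multi-segment strategy has to be considered.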

Sequential Peptide Ligation

Chemical ligation techniques have significantly extended the size limit of SPPS, and many biologically active proteins have been prepared by total chemical synthesis. However, the products formed by ligating two segments are still smaller than the majority of natural proteins. Although protein semisynthesis supplies one segment by recombinant methods to reach full-length proteins, the size of the synthetic segment remains a restriction. Considering that a typical natural protein consists of ~300 amino acids, chemical synthesis from multiple segments is necessary. Sequential peptide ligations in the C-to-N and N-to-C directions are reviewed below.
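To illustrate why multiple segments are needed, note that a ~300-residue protein cut into pieces of at most ~50 residues requires at least six segments. The sketch below plans such a division for NCL, in which every segment after the first must begin with Cys. It is a simplified, hypothetical planner (the name `plan_segments` and the greedy rule are assumptions made here, not a published procedure) and ignores practical factors such as solubility or the identity of the residue preceding each junction.

```python
from typing import List


def plan_segments(sequence: str, max_len: int = 50) -> List[str]:
    """Greedily split a sequence so that every later segment starts with Cys.

    Each cut is placed at the Cys furthest from the current segment start
    that still keeps the segment within max_len residues.
    """
    segments = []
    start = 0
    n = len(sequence)
    while n - start > max_len:
        cut = -1
        # Search from the furthest allowed cut point back toward the start.
        for j in range(start + max_len, start, -1):
            if sequence[j] == "C":
                cut = j
                break
        if cut == -1:
            raise ValueError(f"no Cys junction within {max_len} residues of position {start + 1}")
        segments.append(sequence[start:cut])
        start = cut
    segments.append(sequence[start:])
    return segments


# Hypothetical 25-residue sequence with a 10-residue limit, for illustration.
demo = "A" * 9 + "C" + "A" * 9 + "C" + "A" * 5
print(plan_segments(demo, max_len=10))
```

When no Cys lies within reach, the function raises an error, which mirrors the situation in which an alternative junction or ligation chemistry would have to be chosen.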

Sequential Ligation in the C-to-N Direction

Initially, most sequential peptide ligations were based on the C-to-N strategy. For example, the 166-aa polypeptide chain of Ras was synthesized by NCL (see Fig. 26) [93]. The Ras protein was divided into three unprotected peptide segments. The N-terminal cysteine residue of the central peptide thioester segment was protected as S-Acm (acetamidomethyl) to prevent cyclization or self-ligation. After the first ligation, the Acm group was removed with mercury acetate (or silver acetate), and the reaction was quenched with mercaptoethanol (or dithiothreitol). The Ras protein was obtained successfully after two NCLs. However, the Acm removal procedure produced a metal-thiol precipitate, which caused a significant loss of material. To solve this problem and reduce the number of intermediate purification and lyophilization steps, Kent’s lab developed a one-pot ligation method [94]. They used the 1,3-thiazolidine-4-carboxo (Thz) group in place of Acm to protect the N-terminal Cys of the central peptide segment. The Thz group can be removed simply by adding methoxyamine·HCl, so purification of the deprotected intermediate is unnecessary.

Fig. 26

Strategy for the ligation of multiple fragments: synthetic Ras protein

An alternative way to avoid the multiple time-consuming purification and lyophilization steps is solid-phase EPL [95]. Using this method, a doubly fluorescently labeled Crk II protein was synthesized by NCL and EPL. The protein was again divided into three segments. The C-terminal segment was immobilized on a solid support, so each ligation could be driven to completion with excess reagents, which were then simply washed away. The central thioester segment was produced in bacteria as an intein fusion protein, and its N-terminal Cys residue was generated by embedding the residue in the recognition site of the protease factor Xa. This method proved to be an efficient way to produce large proteins.

Another approach to sequential ligation uses an enzyme to join the segments. A prominent example is the enzymatic synthesis of fully functional ribonuclease A from six synthetic fragments by subtiligase, which efficiently ligates esterified peptides in aqueous solution [96].

Sequential Ligation in the N-to-C Direction

Kent’s lab developed a strategy called “kinetically controlled ligation”, a highly practical, fully convergent approach that was demonstrated on the target protein Crambin [97]. The protein was divided into six segments. The first three segments (the left half) were ligated in the C-to-N direction, using the Thz protecting group for the N-terminal Cys residue of the central segment as described in the previous section. The other three segments (the right half) were ligated in the N-to-C direction by exploiting the different reactivities of aryl and alkyl thioesters in NCL. Alkyl thioesters were sufficiently unreactive that they did not participate in ligation when competing aryl thioesters were present. They even tolerated unprotected N-terminal Cys residues in the same peptide without significant self-ligation, especially when the C-terminal amino acid bearing the α-thioalkyl ester was sterically hindered. Furthermore, these alkyl thioesters could be activated for NCL by adding excess aryl thiol. The left and right halves were then joined in a final NCL to give full-length Crambin (see Fig. 27).
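The selectivity that underlies kinetic control can be estimated with a simple competition model. The sketch below assumes that an aryl and an alkyl thioester compete for the same cysteinyl peptide with simple second-order kinetics, each reaction being first order in its thioester. Under that assumption the remaining fractions obey [alkyl]/[alkyl]0 = ([aryl]/[aryl]0)^(k_alkyl/k_aryl), independent of the nucleophile concentration. The function name and the rate ratio used in the example are hypothetical and are not taken from the cited work.

```python
def alkyl_fraction_consumed(aryl_fraction_consumed: float, rate_ratio: float) -> float:
    """Fraction of alkyl thioester consumed by the time a given fraction
    of aryl thioester has reacted, for competing second-order reactions.

    rate_ratio is k_alkyl / k_aryl (an assumed value, not a measured one).
    """
    if not 0.0 <= aryl_fraction_consumed <= 1.0:
        raise ValueError("aryl_fraction_consumed must be between 0 and 1")
    if rate_ratio <= 0.0:
        raise ValueError("rate_ratio must be positive")
    return 1.0 - (1.0 - aryl_fraction_consumed) ** rate_ratio


# Hypothetical case: the alkyl thioester reacts 100 times more slowly.
side_reaction = alkyl_fraction_consumed(0.99, rate_ratio=0.01)
print(f"alkyl thioester consumed at 99% aryl conversion: {side_reaction:.1%}")
```

With this assumed 100-fold rate difference, only a few percent of the alkyl thioester would react by the time the aryl thioester ligation is essentially complete, which is the qualitative behavior the strategy relies on.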

Fig. 27

Fully convergent synthesis of the model protein Crambin from six peptide segments [97]

Aimoto’s group reported the use of an autoactivating C-terminal cysteinyl prolyl ester (CPE) unit to mediate amide-to-thioester conversion at neutral or slightly basic pH [72]. The conversion is driven by diketopiperazine formation, which traps the transiently exposed α-amine of Cys through intramolecular aminolysis of the prolyl ester. Sequential ligation can be controlled through the CPE unit (see Fig. 28). When the thiol group of the CPE unit was protected, peptide A could not be converted to a thioester. When the protecting group (4-methoxybenzyl) was removed by treatment with 1 mol/L TFMSA in TFA, peptide A was converted to the thioester automatically and then ligated with a second cysteinyl peptide, peptide B. Thus, this method represents a promising alternative for ligation in the N-to-C direction.

Fig. 28

Peptide ligation using a cysteinyl prolyl ester (CPE) as an autoactivating unit

Our lab also developed a ligation method in the N-to-C direction that combines enzymatic peptide ligation with thioacid-capture ligation [98], and demonstrated it on a model peptide (see Fig. 29). The thioacid group is negatively charged and therefore unlikely to be recognized by subtiligase. As a result, a thioacid peptide with a suitable sequence can be ligated to a peptide ester under subtiligase catalysis while its thioacid group remains available for the subsequent thioacid-capture ligation.

Fig. 29

Synthesis of a model peptide by three-step sequential chemoenzymatic ligation

Peptide thioacids have also been applied successfully as central segments in sequential NCL in the N-to-C direction [99]. The thioacids are converted to thioesters by adding an aryl disulfide such as Ellman’s reagent, and TCEP is then added to reduce the excess Ellman’s reagent before the subsequent chemical ligation (see Fig. 30).

Fig. 30

Sequential NCL utilizing peptide thioacids as N-terminal and middle fragments. TCEP tris(2-carboxyethyl)phosphine

Our research group also developed an alternative method for the sequential ligation of several segments in the N-to-C direction. The technique exploits the different reaction kinetics of BMEA peptide ligation and traditional NCL [100]. A BMEA peptide reacts with a cysteinyl peptide only slowly at room temperature, so BMEA-mediated ligation is much slower than NCL. We therefore hypothesized that a thioester could be ligated with a cysteinyl BMEA peptide at room temperature with minimal risk of self-ligation or cyclization of the cysteinyl BMEA peptide. The ligated BMEA peptide can then react with another cysteinyl peptide under microwave irradiation. Using this strategy, we successfully synthesized a medium-length peptide and the protein ubiquitin (Fig. 31).

Fig. 31

Strategy for ligation of multiple segments from the N-to-C direction

Conclusions

Chemical synthesis of peptides and proteins is crucial for analyzing protein structure and function. In this review, we have covered most of the current methodologies for the chemical synthesis of proteins. The successful application of peptide and protein drugs should drive the field of chemical protein synthesis toward broader use.