Review

Although all the information necessary for a protein to reach its native structure is contained in its amino acid sequence, protein folding in vivo requires the participation of protein factors, namely molecular chaperones, folding catalysts and proteases, that monitor or regulate this process. Under physiological conditions, these factors maintain quality control of protein biosynthesis, and errors or failures of the protein folding process are rare. However, incorrectly folded or misfolded proteins can appear as a result of (i) spontaneous or inducible mutations that affect protein folding pathways, (ii) exposure of the cell to environmental stress, such as elevated temperatures or hyperosmolarity, and (iii) overexpression of recombinant genes. In these cases, the polypeptide chain misfolds instead of folding into its native, biologically active state, and may eventually induce a stress response [1]. Misfolded proteins may be degraded by proteases or repaired by chaperones; those that escape protein folding quality control aggregate and are sequestered as inclusion bodies (IBs) [2]. In the cytoplasm of Escherichia coli, this control is performed mainly by a set of stress-inducible chaperones and proteases collectively known as heat shock proteins (Hsps). The presence of membrane-bounded compartments in E. coli further complicates protein biogenesis and implies the existence of distinct folding and degradation machineries (quality control systems) for extracytoplasmic proteins [3]. The bacterial periplasm is separated from the extracellular milieu only by the porous outer membrane, and is therefore more susceptible to environmental changes than the cytoplasm. Protein quality control in the periplasm is thus crucial for the survival of bacterial cells. Here we summarize the main features of protein folding in the periplasm of E. coli, and focus on studies using a model system, the maltose-binding protein.

Biosynthesis of periplasmic proteins

In E. coli, exported proteins destined for the periplasm and outer membrane are synthesized as precursors with a cleavable amino-terminal signal sequence. Depending on the nature of the precursor, different pathways exist for transport across the inner membrane. Although most exported proteins are translocated by the general secretory (Sec) pathway [4], through a channel formed by the heterotrimeric membrane complex SecYEG, a subset of precursors whose signal sequences contain two consecutive arginine residues near their N-terminus uses the twin-arginine translocation (Tat) pathway. The Tat machinery, encoded by the tatABCE genes, exports precursors that must fold completely in the cytoplasm to bind cofactors [5].

Targeting of precursors to the SecYEG translocase

In marked contrast, the Sec pathway transports proteins that are almost unstructured, as confirmed by the recently solved crystal structure of the SecY complex from Methanococcus jannaschii, whose narrow pore can accommodate only an unfolded polypeptide chain [6]. Because SecYEG forms a passive channel, it must associate with other components that provide the driving force for translocating preprotein substrates. Two modes of protein translocation, differing in their degree of coupling to translation, have been identified. In cotranslational translocation, the signal sequence binds the signal recognition particle (SRP) as it emerges from the translating ribosome, and the nascent polypeptide is then targeted to SecYEG in concert with the membrane-bound SRP receptor FtsY. The energy for translocation comes from GTP hydrolysis during translation. In posttranslational translocation, many newly synthesized precursors interact with the chaperone SecB (see below). Both SecB and the preprotein then provide binding sites for SecA, a peripheral membrane ATPase that targets these initial complexes to the translocase. Through repeated cycles of ATP binding and hydrolysis, SecA pushes the polypeptide chain through the SecYEG channel [7]. The available data suggest that the SRP-dependent pathway is mainly involved in the cotranslational assembly of cytoplasmic membrane proteins [8, 9], and that most periplasmic and outer membrane proteins are targeted posttranslationally to the SecA/SecB pathway [9]. However, recent studies indicate that some extracytoplasmic proteins depend on both SRP/FtsY and SecA [10, 11]. Even though protein translocation in E. coli has been well studied, the molecular mechanisms that discriminate between the two pathways converging at the SecYEG translocase remain unknown.

When a nascent chain emerges from the ribosome, a range of folding and targeting factors wait to bind the precursor. The first folding helper encountered is probably trigger factor [12], a ribosome-bound peptidyl-prolyl isomerase [13] that displays chaperone activity in vitro. In addition to trigger factor, SRP and SecA can access the emerging signal sequence [14]. These initial contacts are then modified as the nascent chain grows and binds to SecYEG. In the Sec pathway, during elongation or after release from the ribosome, many, but not all, polypeptide chains bind to SecB, a tetrameric chaperone that stabilizes the precursor in a translocation-competent conformation [15]. The SecB-bound precursors are then targeted to SecYEG in cooperation with SecA. Although other cytoplasmic chaperones, such as DnaK (Hsp70) and GroEL (Hsp60), may participate in protein export by preventing premature folding of precursors [16, 17], these general chaperones play only a marginal role in targeting precursors to the Sec translocase. However, when translocation is impaired, for example by slowly exported proteins or defective signal peptides, precursors can accumulate in the cytoplasm. Structurally, a precursor in an export-competent conformation may resemble an unfolded protein. Indeed, Wild et al. [17] showed that accumulation of precursors in strains lacking SecB induces a stress response mediated by the heat shock sigma factor σ32. Under normal conditions, DnaK and DnaJ sequester σ32 through direct binding, preventing its association with RNA polymerase [18]. Accumulating unfolded precursors displace this complex and allow σ32 to bind RNA polymerase and direct transcription of heat shock genes. It has been suggested that the final distribution of precursors among the different pathways is determined by kinetic partitioning, that is, by the rate of folding relative to the rate of chaperone binding [19].
Ultimately, the outcome of this competition between factors determines whether translocation is initiated efficiently (Figure 1).
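The kinetic-partitioning idea can be illustrated with a minimal sketch. It assumes that premature folding and SecB binding are irreversible, parallel reactions from the same nascent state, and that free SecB stays constant so that binding behaves as a pseudo-first-order process. The function names, rate constants and concentrations are hypothetical and chosen only for illustration; they are not measured values.

```python
from typing import Dict


def kinetic_partition(rates: Dict[str, float]) -> Dict[str, float]:
    """Fraction of molecules that leave one state through each route.

    For parallel, irreversible first-order reactions competing for the same
    starting state, route i captures k_i / sum(k) of the molecules.
    """
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("at least one positive rate constant is required")
    return {route: k / total for route, k in rates.items()}


def precursor_fates(k_fold: float, k_on_secb: float, secb_free: float) -> Dict[str, float]:
    """Hypothetical partition of a precursor between folding and SecB capture.

    SecB binding is second-order; holding free SecB constant makes it
    pseudo-first-order with rate k_on_secb * secb_free.
    """
    return kinetic_partition({
        "folded_in_cytoplasm": k_fold,
        "secb_bound": k_on_secb * secb_free,
    })


if __name__ == "__main__":
    # Arbitrary illustrative values (not measurements).
    for secb in (0.5, 1.0, 3.0):
        print(secb, precursor_fates(k_fold=1.0, k_on_secb=1.0, secb_free=secb))
```

With these arbitrary numbers, raising free SecB from 0.5 to 3.0 (same arbitrary units) raises the SecB-bound fraction from one third to three quarters; slowing folding would have the same effect, which is the intuition behind the partitioning model.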

Figure 1

Alternative folding pathways for periplasmic proteins in E. coli. This scheme illustrates the present discussion and emphasizes the alternative, competing processes of folding and misfolding of periplasmic proteins before and after translocation across the cytoplasmic membrane.

In the next step, the precursor inserts as a loop into the SecYEG channel, with its signal sequence intercalated between two transmembrane helices of SecY, and with its mature region in the pore. The polypeptide chain is then transported through the pore, and the signal sequence is cleaved at some point during translocation by the signal peptidase, a membrane-bound endopeptidase with its catalytic domain facing the periplasm [20].

Protein folding in the periplasm

Once in the periplasm, and after signal sequence cleavage, periplasmic proteins may encounter two types of protein folding catalysts: disulfide oxidoreductases and isomerases (Dsb proteins), which catalyse the formation and rearrangement of disulfide bonds, and peptidyl-prolyl isomerases (PPIases), which catalyse cis-trans isomerization of peptidyl-prolyl bonds. Both types of enzyme are known to catalyse rate-limiting steps of protein folding in vitro.

The formation of disulfide bonds is essential for the oxidative folding, activity and stability of many proteins exported from the cytoplasm. Disulfide bond formation results from elaborate electron transfer pathways between the Dsb oxidoreductases [21]. DsbA, the strongest known thiol oxidant, catalyses the oxidation of cysteines in folding proteins. This reaction proceeds via mixed disulfide complexes between DsbA and its substrates. DsbB, a membrane-bound enzyme, keeps DsbA oxidized by transferring electrons from reduced DsbA to quinones of the respiratory chain [22]. It has become clear that disulfide bond formation is far from perfect and can occur between incorrect pairs of cysteines, producing non-native disulfides. The periplasmic thiol isomerases DsbC and DsbG promote the rearrangement of incorrect disulfide bonds. Maintenance of these thiol isomerases in the reduced state depends on the inner membrane protein DsbD, which uses the cytoplasmic thioredoxin/thioredoxin reductase system as a source of reducing equivalents [23]. Recently, covalent intermediates between DsbA and exported substrate proteins were detected in vivo using a DsbA variant carrying a Pro151 → Thr substitution in its active site [24]. This approach opens the way to determining at what point during protein translocation and folding disulfide bonds form in the periplasm.

Representatives of all three unrelated PPIase families have been identified in the periplasm: the cyclophilin PpiA (also known as CypA or RotA), which is not inhibited by cyclosporin A [25]; FkpA, a member of the FK506-binding protein (FKBP) family [26]; and the two parvulin homologs PpiD [27] and SurA [28]. Although all these PPIases catalyse the rate-limiting step in the refolding of RNase T1 in vitro [29], their cellular role remains enigmatic. With the exception of surA, null mutants show no significant phenotype, indicating that these periplasmic PPIases are either not essential for viability or functionally redundant.

Apart from specialized chaperones such as PapD, which is involved in pilus assembly [30], few periplasmic chaperones have been identified, and there are no classical Hsp chaperones, such as DnaK or GroEL, in this compartment. Indeed, because the periplasm lacks ATP, periplasmic chaperones must be mechanistically distinct from their cytoplasmic counterparts, most of which use ATP to drive their cycles of substrate binding and release. Nevertheless, global searches using genetic selections based on σE activity (see below) identified the surA and skp/ompH genes [28, 31], which are involved in the folding and assembly of outer membrane proteins [32]. Furthermore, in vitro chaperone activity has been reported for several periplasmic proteins, such as DsbC [33], DsbG [34] and substrate-binding proteins [35, 36], but its contribution has not yet been studied in vivo. It should be noted that studying the in vivo function of a putative chaperone is more difficult than assessing its activity in vitro with a classical chaperone substrate. Still, it is tempting to speculate from these observations that, because of their unique active sites, protein folding catalysts can perform chaperone functions that compensate for the apparent lack of general chaperones in the periplasm. Indeed, FkpA and SurA exhibit PPIase-independent chaperone activity in vivo [37, 38]. Although these two PPIases are unrelated, their crystal structures share a similar organization: the chaperone activity resides in the N-terminal domains, and the C-terminal catalytic domains are tethered ~30 Å apart, connected either by a long α-helix [39] or by an extended polypeptide [40]. In other foldases that exhibit both chaperone and folding catalytic activities, the two functions have likewise been assigned to distinct regions or domains of the molecule.
For instance, trigger factor comprises three domains: the central FKBP-like domain displays PPIase activity, and the C-terminal domain has been implicated in chaperone activity [41]. The structures of DsbC and FkpA also show further striking similarities. Both enzymes are homodimers whose subunits each comprise two domains, with the N-domains forming the dimer interface. The C-domains, which carry the catalytic sites, are connected to the N-domains by a pliable helix, giving the dimers a V shape and orienting the two catalytic sites inwards, facing each other. The cleft formed in the DsbC dimer by association of the N-domains is hydrophobic, and it was proposed that this region forms the binding site for unfolded polypeptides in chaperone activity, sequestering the substrate from aggregation while giving it access to the catalytic sites to facilitate disulfide bond exchange [42]. In FkpA, the cleft formed by association of the N-domains has approximately the same dimensions as that of DsbC. In contrast to the hydrophobic DsbC cleft, however, the FkpA cleft has an intermediate negative potential, probably reflecting a difference in substrate specificity. Nevertheless, it is an attractive idea that the V shape of these foldases cradles unfolded polypeptide substrates while giving them access to the isomerase catalytic sites. The innate flexibility of the C-domains would allow adaptability between the chaperone and isomerase activities.

An alternative fate for misfolded periplasmic proteins is degradation by proteases (Figure 1). Although more than 10 periplasmic proteases have been identified in E. coli, DegP (also known as HtrA or Do) is the only one identified as a heat shock protein involved in the degradation of misfolded proteins [43, 44]. Indeed, degP, which is essential for bacterial survival above 40°C, is under the transcriptional control of the alternative heat shock sigma factor σE [45]. The reason for this essential role is not clear, but presumably more misfolded proteins accumulate under these conditions. DegP is a widely conserved serine protease composed of an N-terminal domain believed to have regulatory functions, a trypsin-like protease domain and two PDZ domains [46]. PDZ domains are protein modules that mediate specific protein-protein interactions. The crystal structure of DegP reveals a hexamer formed by the staggered association of two trimeric rings [47]. The proteolytic sites are located in a central cavity that is accessible only laterally, via the PDZ domains, which probably recognize and bind substrates. DegP has been shown to switch between chaperone and protease activities in a temperature-dependent manner [48]: below 28°C, it acts as a chaperone that protects misfolded proteins from irreversible aggregation, whereas above 28°C its proteolytic activity increases dramatically and misfolded proteins are degraded. In vivo, however, DegP does not act as a general chaperone in the periplasm [49]; rather, it is the major proteolytic machine responsible for periplasmic protein quality control.

Extracytoplasmic stress response

Because optimal cellular growth requires that the cell be able to sense and respond to changes in subcellular compartments, it is not surprising that the stress response in Gram-negative bacteria is compartmentalized into cytoplasmic and extracytoplasmic responses. A common feature of inducers of the E. coli extracytoplasmic stress response is their potential to generate excessive amounts of misfolded proteins in the envelope, particularly outer membrane proteins. Many of the proteins involved in periplasmic protein quality control were in fact characterized through studies of the extracytoplasmic stress responses.

In contrast to cytoplasmic stress, where the sensing of misfolded proteins and the accompanying response take place in the same compartment, extracytoplasmic stress signals must be relayed across the cytoplasmic membrane by signal transduction systems. E. coli senses and responds to extracytoplasmic stress via at least two overlapping but distinct transduction pathways, the Cpx two-component system and the σE heat shock pathway (Figure 2). Both regulatory systems control the expression of several genes whose products are envelope-localized protein folding catalysts, chaperones and proteases, as well as genes involved in lipid and lipopolysaccharide metabolism. Together, these proteins ensure proper biogenesis of the bacterial envelope by counteracting perturbations in periplasmic protein folding [50].

Figure 2

Two signaling pathways for extracytoplasmic stress responses in E. coli. Components of these pathways reside on both sides of the cytoplasmic membrane (periplasm and cytoplasm). In the presence of misfolded periplasmic proteins, an unknown mechanism triggers a conformational change in CpxA that activates its kinase activity and promotes an increase in CpxR phosphorylation. Under normal physiological conditions, σE is sequestered by RseA, and DegS is inactive because its PDZ domain inhibits its protease activity. Recognition of misfolded periplasmic proteins by the PDZ domain relieves inhibition of the DegS protease. Activated DegS cleaves RseA, which releases σE. For both signal transduction systems, a partial list of downstream targets is indicated.

The Cpx pathway is a typical two-component signal transduction system, consisting of the histidine kinase CpxA and the response regulator CpxR [51]. The input signal is transduced across the cytoplasmic membrane by the sensor kinase CpxA, which is composed of a periplasmic domain and a conserved cytoplasmic signaling domain separated by two transmembrane α-helices [52]. Upon detection of envelope stress, CpxA autophosphorylates at a conserved histidine using ATP and then transfers the phosphoryl group to a conserved aspartate in the N-terminal domain of CpxR. Phosphorylated CpxR activates transcription of genes whose products are involved in envelope physiology. The Cpx pathway is induced by elevated pH [53], altered inner membrane composition [54], and overproduction of envelope proteins such as the outer membrane lipoprotein NlpE [55] or pilus subunits [56, 57]. The Cpx regulon is positively autoregulated and is feedback-inhibited by CpxP, a small periplasmic protein [58]. The response appears to be modulated by direct interactions involving the periplasmic domain of CpxA, probably in concert with CpxP [59]. Targets regulated by CpxR include the heat shock protease DegP [60], the peptidyl-prolyl cis/trans isomerases PpiA [61] and PpiD [27], and the disulfide oxidoreductase DsbA [61]. Beyond the extracytoplasmic stress response, the Cpx system is implicated in conjugation [62], invasion of host cells [63] and biofilm formation [64, 65].

The σE envelope stress response is induced by heat shock [66, 67] and by perturbations of outer membrane protein folding [31, 68]. Under non-stress conditions, σE activity is negatively regulated by the inner membrane anti-sigma factor RseA and by the periplasmic protein RseB [69–71]. In response to extracytoplasmic stress, RseA is rapidly degraded by the sequential action of the DegS [72] and YaeL/EcfE [73] proteases, which frees σE to associate with RNA polymerase and direct transcription of its regulon. Recently, peptides mimicking the C-terminal sequences of outer membrane proteins were shown to bind the DegS PDZ domain, activate DegS cleavage of RseA and induce σE-dependent transcription [74]. These data suggest that DegS senses extracytoplasmic stress by binding unassembled outer membrane proteins, and that DegS activation involves relief of inhibitory interactions between its PDZ and protease domains. In addition, RseB, which interacts with misfolded periplasmic proteins [71], might influence the accessibility of RseA to cleavage by DegS. At least 20 genes have been identified as σE-regulated [75], including rpoH, degP [45], yaeL/ecfE [76] and fkpA [77].

Focus on the maltose-binding protein

To study protein folding in the periplasm of E. coli, we use a model system, the maltose-binding protein (MalE or MBP). MalE is the soluble periplasmic receptor of the high-affinity transport system for maltose and maltodextrins [78]. Because of this role in maltose transport, correct export and folding of MalE in the periplasm are essential for cells to utilize maltose as a carbon source [79]. This feature has facilitated the development of powerful genetic selections and screens that led to the discovery of several genes of the Sec pathway [80]. MalE has also been used extensively as a model of protein translocation [81]. Translocation of MalE requires that the nascent preMalE chain reach a critical size corresponding to 80% of its final length, and that the precursor exist in an export-competent, partially unfolded conformation [82]. SecB binding mainly serves to maintain this initial conformation. In addition to this chaperone activity, SecB is an important targeting factor for MalE. Indeed, when the signal sequence is deleted, translocation of MalE becomes entirely SecB-dependent [83]. MalE is then translocated through the SecYEG channel and released into the periplasm after signal sequence cleavage, where the newly exported protein folds into its native structure. The crystal structure of MalE shows two discontinuous domains, each built from βαβ units, surrounding a cleft that forms the maltose-binding site [84]. These properties make MalE an ideal model for directly comparing the mechanisms of in vitro and in vivo folding.

MalE31, a model for periplasmic protein misfolding

For many years, well before its 3D structure was solved, the laboratory of M. Hofnung studied the structure-function relationships of MalE. One genetic approach, using random insertion of a BamHI oligonucleotide linker, led to the identification of so-called 'permissive site' mutants of malE, in which insertions or deletions were structurally tolerated without loss of biological properties [85]. In contrast, other random insertions in malE prevented both growth on maltose and periplasmic release of the corresponding proteins [86]. In many cases, a large deletion of several amino acid residues from the mature region of the protein could explain the defective phenotype. In one case, however, malE31, the linker insertion did not modify the length of the protein and resulted in Gly32 → Asp and Ile33 → Pro substitutions. Structurally, these residues are located in a turn connecting helix αI to strand βB of the first βαβ element of the N-domain (Figure 3A). The Ile33 → Pro substitution was identified as the major factor responsible for the Mal− phenotype [87]. Indeed, cells expressing a malE variant carrying only the Gly32 → Asp substitution displayed biological properties similar to those of cells expressing wild-type malE. Determining the subcellular location of these variants revealed that MalE31 was recovered entirely in the membrane fraction. This result suggested either that export of MalE31 was blocked somewhere in the membrane or that MalE31 formed inclusion bodies (IBs), either in the cytoplasm from the precursor before translocation, or in the periplasm from the mature protein after release from the translocase. It is technically very difficult to differentiate rigorously between these cellular fates by cell fractionation alone.
Thus, to determine whether the insoluble MalE31 that co-sedimented with membranes corresponded to aggregated rather than membrane-associated protein, a crude lysate was fractionated by flotation gradient centrifugation: MalE31 remained near the bottom of the gradient [88], as expected for aggregates rather than for membrane-bound proteins, which float. Although cell fractionation from spheroplasts is less prone to artifacts than the osmotic shock procedure, neither method can discriminate between cytoplasmic and periplasmic IBs. However, processing of the precursor, a late step of export (see above), is a reliable indicator that the protein has at least partially crossed the inner membrane. Pulse-chase experiments showed that the MalE31 precursor was processed as efficiently and at the same rate as the wild-type protein, and that overexpression of malE31 caused no export interference or jamming. Finally, transmission electron micrographs of cells overproducing MalE31 revealed small IBs between the inner and outer membranes (Figure 3B). Based on these data, it became clear that the mature MalE31 protein aggregates in the periplasm. Like other aggregation-prone proteins, MalE31 could be purified from IBs after urea solubilization, and the protein renatured in vitro exhibited full maltose-binding activity. We therefore concluded that the modified αβ turn, which is distant from the binding site, perturbed not the function but the periplasmic folding of MalE. Since these initial characterizations, several groups have used MalE31 as a model protein for cellular protein misfolding [71] or aggregation [89].

Figure 3

In vitro and in vivo fates of MalE31. A- The maltose-bound state of MalE31. Helix αI and strand βB of the N-domain (MalE31), colored in magenta, and helix αVII and strand βJ of the C-domain (MalE219), colored in blue, are shown in the crystal structure of MalE. For both turns, the Gly residue is shown in yellow. The bound maltose substrate (black) lies in the cleft between the two domains. B- Periplasmic inclusion bodies of MalE31. Transmission electron micrograph of ultrathin sections of E. coli overproducing MalE31 at 30°C. Black arrows indicate periplasmic inclusion bodies.

Compartmentalized cellular stress responses induced by malE31 mutants

To examine the alternative, competing export and folding processes in the cytoplasm, we analyzed the cellular fates of MalE31 carrying either a wild-type or a defective signal sequence [90]. Because we never observed MalE31 misfolding in the cytoplasm when it was expressed with its wild-type signal sequence, we concluded that the MalE31 precursor rapidly enters the Sec pathway. Deletions or substitutions in the signal sequence have been shown to strongly decrease the efficiency of MalE export [91]. Combining these altered signal sequences with the folding-defective malE31 mutation resulted in cytoplasmic accumulation of precursors. At steady state, the soluble fraction of MalE31 was much higher in the cytoplasm when the protein was expressed without a signal sequence (69%) than in the periplasm when it was expressed with a wild-type signal sequence (5%). In the former case, the increased levels of cytoplasmic chaperones (DnaK and GroEL) resulting from induction of the σ32 stress response largely suppress the aggregation of MalE31 in the cytoplasm. Although overproduction of MalE31 carrying a wild-type signal sequence induced an extracytoplasmic stress response, the increased level of periplasmic Hsps failed to prevent MalE31 aggregation. Thus, there appear to be insufficient or no general heat shock chaperones in the periplasm. Furthermore, we showed that aggregation of MalE31 in the periplasm did not induce a σ32-dependent stress response, unlike its expression with defective signal sequences, confirming that the stress responses induced by misfolded proteins in E. coli are compartmentalized and controlled by different pathways [90]. Technically, monitoring stress responses in E. coli with transcriptional gene fusions allows rapid discrimination of the cellular compartment in which misfolding occurs.

One physiological consequence of MalE31 overproduction was the induction of an extracytoplasmic stress response, with increased expression of the heat shock protease DegP. Because the Cpx and σE regulatory systems both respond to misfolded exported proteins, we determined which pathway was more specifically affected by MalE31. Using two independently regulated transcriptional fusions, one controlled by σE (P3rpoH-lacZ) and the other by CpxR (PcpxP-lacZ), we showed that overproduction of MalE31 induced the Cpx two-component system [49]. Because the exact nature of the inducers that activate this signaling pathway remained unclear, we examined whether periplasmic inclusion bodies were required. Interestingly, the Cpx system was also induced by production of the destabilized MalE219 variant, which remains entirely soluble and functional in the periplasm (see below). This observation was further confirmed with truncated MalE variants that did not accumulate at steady state [49]. It therefore seems that the signal sensed by the Cpx pathway is the presence of unfolded or misfolded periplasmic proteins rather than envelope alterations resulting from IB formation.

Structure-dependence of the malE31 mutation

Although the two substitutions act synergistically, the Ile33 → Pro substitution alone was the major factor responsible for MalE31 aggregation. Introducing a proline residue into the first αβ turn of the N-domain could therefore cause either conformational strain in the folded state or reduced conformational freedom of the polypeptide backbone in a critical folding intermediate, because of the restricted range of proline's backbone dihedral angles. To probe the structural role of this surface turn genetically, we generated a library of random mutations at codons 32 and 33 of malE and screened for folding-defective MalE variants [88]. Although many amino acid pairs could replace the native residues at both positions, underlining the passive role of this turn in determining the folded structure of MalE, most folding-defective substitutions contained a proline residue at both positions. A second class of critical pairs combined a bulky hydrophobic residue (Phe, Leu or Val) at position 32 with a basic residue (Lys or Arg) at position 33. Close examination of the wild-type MalE crystal structure suggested that these defects might arise from the combined negative effects of decreased conformational flexibility, a large exposed hydrophobic side chain at position 32, and a charged side chain buried at the hydrophobic position 33. Since this study, the crystal structure of MalE31 has been determined at 1.85 Å resolution [92], providing experimental evidence that MalE31 can attain a wild-type folded conformation in vitro and supporting a model in which the malE31 mutation exerts its structural effect at the level of folding intermediates rather than of the native conformation.

The dependence of the malE31 mutation on its secondary and tertiary context was also assessed by testing whether an equivalent turn of the C-domain tolerates the same double amino acid substitution. Among the four αβ turns of the MalE structure, we selected the turn connecting helix αVII to strand βJ of the C-domain and introduced the corresponding Gly220 → Asp and Glu221 → Pro substitutions by site-directed mutagenesis [93], defining the MalE219 variant (Figure 3A). In contrast to MalE31, this modified-turn MalE219 variant was correctly folded and functionally active in the bacterial periplasm. Furthermore, in vitro unfolding/refolding experiments showed that both turn variants were destabilized, but their intrinsic tendencies to aggregate correlated well with their periplasmic fates in E. coli. These tendencies arose from differences in refolding rates: folding intermediates of MalE31, probably kinetically trapped in an off-pathway folding reaction, persisted long enough to aggregate. These data supported the notion that formation of this βαβ supersecondary structure of the N-domain is a rate-limiting step in the folding pathway of MalE. Indeed, intragenic suppressors of export-defective MalE signal peptides had previously been mapped to this structural element, and analysis of the refolding properties of the corresponding variants led to the same conclusion [94]. Together, these results argued that the information directing the polypeptide chain into the misfolding pathway is determined not only by the local amino acid sequence (the primary structure) but also by the tertiary structure surrounding the critical MalE31 substitutions.
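The argument that slower refolding lets intermediates persist long enough to aggregate can be made concrete with a textbook kinetic-competition calculation. This is a simplified, hypothetical scheme, not the analysis used in the studies cited above: a pool of intermediate I0 either folds (first-order, k_fold) or is lost to aggregation (second-order, k_agg). Integrating dN/dI = -k_fold/(k_fold + k_agg·I) from I0 to 0 gives a folded yield of ln(1 + x)/x with x = k_agg·I0/k_fold. The function name and all values are illustrative assumptions.

```python
import math


def folded_yield(k_fold: float, k_agg: float, i0: float) -> float:
    """Fraction of an intermediate pool i0 that reaches the folded state.

    Hypothetical scheme:
        dI/dt = -k_fold * I - k_agg * I**2   (folding vs. aggregation)
        dN/dt =  k_fold * I                  (folded product)
    which integrates to N_inf / i0 = ln(1 + x) / x, with x = k_agg * i0 / k_fold.
    """
    if k_fold <= 0 or i0 <= 0:
        raise ValueError("k_fold and i0 must be positive")
    x = k_agg * i0 / k_fold
    if x == 0:
        return 1.0  # no aggregation: every intermediate folds
    return math.log1p(x) / x


if __name__ == "__main__":
    # Same aggregation propensity and intermediate concentration,
    # tenfold slower folding in the second case (arbitrary units).
    print(folded_yield(k_fold=1.0, k_agg=1.0, i0=1.0))
    print(folded_yield(k_fold=0.1, k_agg=1.0, i0=1.0))
```

With these arbitrary values, a tenfold slower folding rate lowers the folded yield from about 69% (ln 2) to about 24% (ln 11 / 10), even though the aggregation step itself is unchanged.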

Cellular factors influencing inclusion body formation of MalE31

In a subsequent series of experiments, we manipulated two cellular factors affecting protein folding pathways: the activity of the malE31 promoter, which modulates the rate of protein biosynthesis, and the status of the periplasmic DegP protease, which determines the cellular fate of misfolded MalE31 [95].

When expressed from its own promoter, the cellular level of MalE31 is controlled by the transcriptional activator MalT. Depending on whether the host cells carried wild-type MalT or a constitutive variant (MalTC), MalE31 production varied from a very low level to a high, constitutive level. We took advantage of this regulation to modulate the biosynthetic rate of MalE31, and observed that at a low level of expression MalE31 was rapidly degraded by the DegP protease, whereas at a high level of expression the misfolded protein formed IBs [95]. Surprisingly, the steady-state level of soluble MalE31 was dictated only by the ratio of the rate constants governing the misfolding and folding steps, and was independent of promoter activity. The high production level of MalE31 in the malTC context had two important consequences. First, increased amounts of aggregation-prone MalE31 species enhanced the formation of IBs, whose compact nature renders them resistant to proteases. Second, although overproduction of MalE31 increased the rate of DegP synthesis via the Cpx system [49], most of the additional protease was found in the insoluble fraction. Because DegP is intimately involved in the degradation of misfolded proteins in the periplasm, it is plausible that DegP co-aggregated with misfolded MalE31 during IB formation. These data were quantified and analyzed by numerical simulations [95] based on the working model presented in Figure 4. Because the kinetics of signal sequence processing of precursors were unaffected by MalE31 overproduction, export to the periplasm does not constitute a rate-limiting step in the periplasmic folding pathway of MalE31. The extent of aggregation is determined both by the rate of protein biosynthesis, which for MalE31 depends on the activity of its promoter, and by the rates of folding and misfolding. At a high level of expression, second-order aggregation dominates over first-order folding. The fate of misfolded proteins also depends on the presence of cellular proteases. At the misfolding stage, a second kinetic competition between degradation and aggregation may occur, and a further increase in the rate of expression (as in the malTC background) will favor the aggregation reaction. At the time this model was proposed, we speculated that periplasmic chaperones and foldases might act directly on some of these kinetic partitioning steps.

Figure 4

Kinetic competition between folding, degradation and aggregation in the periplasm. The model represents the export (zero-order rate constant, k_export), folding (first-order rate constant, k_fold), degradation (first-order rate constant, k_deg), and aggregation reactions (second-order rate constant, k_agg). In the postulated scheme, a newly exported protein has two alternative pathways: it either folds correctly or proceeds via a side reaction to a misfolded state. A second kinetic competition is then thought to occur between degradation and aggregation of the misfolded protein.
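The Figure 4 scheme can be explored with a small numerical simulation. The sketch below is our own minimal reading of the scheme, not the simulation reported in [95]. It assumes a hypothetical first-order misfolding constant k_mis for the side reaction (the caption names no constant for this step), writes the aggregation flux as k_agg·M^2 and counts aggregated protein in monomer equivalents, and integrates by forward Euler with arbitrary parameter values; `simulate` and `Outcome` are illustrative names. Because folding and misfolding are both first order in newly exported protein, the folded fraction settles at k_fold/(k_fold + k_mis) whatever the export rate, whereas the share of misfolded protein that aggregates rather than being degraded grows with k_export.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    exported: float    # total protein exported (k_export * time)
    folded: float      # cumulative correctly folded protein
    degraded: float    # cumulative misfolded protein removed by proteolysis
    aggregated: float  # cumulative misfolded protein in aggregates (monomer units)

    @property
    def fraction_folded(self) -> float:
        return self.folded / self.exported

    @property
    def fraction_misfolded_aggregated(self) -> float:
        return self.aggregated / (self.aggregated + self.degraded)


def simulate(k_export: float, k_fold: float, k_mis: float, k_deg: float,
             k_agg: float, t_end: float = 200.0, dt: float = 0.01) -> Outcome:
    u = m = 0.0  # newly exported (u) and misfolded (m) pools
    folded = degraded = aggregated = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        fold = k_fold * u      # first-order folding
        misfold = k_mis * u    # first-order side reaction (hypothetical k_mis)
        deg = k_deg * m        # first-order degradation
        agg = k_agg * m * m    # second-order aggregation
        u += (k_export - fold - misfold) * dt  # zero-order export feeds u
        m += (misfold - deg - agg) * dt
        folded += fold * dt
        degraded += deg * dt
        aggregated += agg * dt
    return Outcome(k_export * steps * dt, folded, degraded, aggregated)


for k_export in (0.01, 10.0):  # low vs high promoter activity (arbitrary units)
    out = simulate(k_export, k_fold=0.5, k_mis=0.5, k_deg=0.2, k_agg=1.0)
    print(f"k_export={k_export}: folded {out.fraction_folded:.3f}, "
          f"aggregated share of misfolded {out.fraction_misfolded_aggregated:.3f}")
```

In this sketch, raising k_deg or lowering k_export shifts the second partition back toward degradation without changing the folded fraction.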

FkpA suppresses the formation of MalE31 inclusion bodies

The potential role of the three known periplasmic PPIases was studied both in vivo, by overproducing them, and in vitro, by recording the refolding kinetics of MalE31 in the presence of the purified PPIases. Among these folding catalysts, only FkpA prevented the periplasmic aggregation of MalE31 [37]. Using active-site and deletion variants, we demonstrated that the chaperone activity of FkpA was independent of its prolyl isomerase catalytic activity, but required an intact dimerization N-domain [39]. In vitro, stoichiometric amounts of FkpA in the refolding buffer did not change the rate of the slow refolding phase of MalE31, but increased its relative amplitude, as is typical of chaperones [37]. These results suggested that FkpA increases the yield of soluble MalE31 by blocking the initial misfolding step, binding newly translocated proteins rather than the misfolded proteins that enter the second competition between aggregation and degradation (Figure 4). Indeed, the steady-state level of MalE31 was the same with or without FkpA, whereas any change in the balance between the rates of aggregation and degradation would have altered this level, as we observed with the protease DegP. FkpA would thus bind newly exported MalE31 in the periplasm, consistent with the definition of 'holder' chaperones, which prevent the aggregation of unfolded proteins without mediating their reactivation.
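The distinction between a change in rate and a change in amplitude can be illustrated with synthetic biphasic refolding curves. The sketch below assumes that refolding is the sum of two parallel first-order phases with plateau A_fast + A_slow, and recovers the slow-phase rate constant from the late-time slope of ln(plateau - S(t)), once the fast phase has decayed. All parameter values and function names are hypothetical: the two curves share the same k_slow but differ in slow-phase amplitude, mimicking the qualitative effect attributed to FkpA, and the estimated rate is the same for both.

```python
import math


def biphasic(t: float, a_fast: float, k_fast: float,
             a_slow: float, k_slow: float) -> float:
    """Refolded fraction at time t for two parallel first-order phases."""
    return (a_fast * (1.0 - math.exp(-k_fast * t))
            + a_slow * (1.0 - math.exp(-k_slow * t)))


def slow_rate_from_tail(times, signal, plateau):
    """Least-squares slope of ln(plateau - S) versus t, sign-flipped.

    Valid only at late times, when the fast phase has fully decayed and
    plateau - S(t) is approximately a_slow * exp(-k_slow * t).
    """
    ys = [math.log(plateau - s) for s in signal]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    var = sum((t - mean_t) ** 2 for t in times)
    return -cov / var


# Hypothetical curves: identical rate constants, larger slow-phase amplitude
# in the second ("with chaperone") case.
late_times = [20.0 + 2.0 * i for i in range(21)]  # 20 to 60, arbitrary units
for label, a_slow in (("without", 0.2), ("with", 0.4)):
    params = dict(a_fast=0.5, k_fast=1.0, a_slow=a_slow, k_slow=0.05)
    curve = [biphasic(t, **params) for t in late_times]
    k_est = slow_rate_from_tail(late_times, curve, plateau=0.5 + a_slow)
    rel_amp = a_slow / (0.5 + a_slow)
    print(f"{label}: k_slow ~ {k_est:.4f}, relative slow amplitude {rel_amp:.2f}")
```

With real data, the plateau and the start of the tail window would come from the measurements, and fitting both phases simultaneously by nonlinear least squares is the more usual approach; the log-linear tail estimate is used here only because it needs no external libraries.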

Temperature effect on MalE31 inclusion body formation

In general, aggregation is more extensive at higher temperatures because the hydrophobic interactions that dominate protein aggregation are strongly temperature-dependent. Although MalE31 aggregated at 30°C, the formation of periplasmic IBs did not affect bacterial growth in rich medium [87]. In contrast, at 37°C, the accumulation of IBs became highly toxic [49]. Surprisingly, at 42°C, IBs did not accumulate and bacterial growth was restored. To explain this toxicity, we hypothesized that some periplasmic or outer membrane proteins that are essential for envelope biogenesis or cell division could be incorporated into IBs and therefore lost to the cells. Consistent with this idea, DegP [95] and RseB [71] had previously been found trapped in MalE31 aggregates at 30°C. Because no IBs were detected under heat shock conditions, we examined whether a temperature upshift could suppress the growth defect, by growing cells at 37°C and then shifting them to 42°C. Interestingly, after transfer to 42°C bacterial growth resumed, whereas the turbidity of cultures left at 37°C decreased. We showed that the suppression of MalE31 toxicity after the temperature upshift resulted from the degradation of pre-existing IBs accumulated at 37°C. The only clearly documented heat shock protease that degrades misfolded periplasmic proteins is DegP. Like other members of the σE regulon, DegP is induced at 42°C, and its proteolytic activity also increases significantly with rising temperature [96]. Together, these effects could raise periplasmic proteolytic activity at 42°C, so that degradation of MalE31 would be favored over aggregation in the model (Figure 4). In addition, because the steady-state level of wild-type MalE was slightly lower at 42°C [49], a corresponding decrease in the rate of MalE31 synthesis at this temperature would also favor the degradation reaction. However, highly aggregated proteins have been reported not to be proteolyzed by DegP in vitro [97], and we do not know whether DegP is able to degrade IBs. Nor do we have evidence that misfolded intermediates of MalE31 can dissociate from IBs and then be degraded by DegP. Recent studies have revealed that IBs are unexpectedly dynamic [98], steadily releasing aggregated proteins, or disaggregating, so that these proteins can be refolded or degraded by intracellular proteases [99]. Nevertheless, it is clear that under heat shock conditions the progressive clearing of periplasmic IBs eliminated toxicity and restored bacterial growth.

Perspectives

Protein folding is one of the great unsolved problems of modern molecular biology. Not only is this question intrinsically fascinating, but it is also closely tied to a wide range of applications and debilitating diseases, which makes it critically important in both biotechnology and molecular medicine. Most proteins that require chaperones to fold efficiently are poor models for biophysical folding experiments and, conversely, proteins suitable for in vitro folding studies do not require chaperone assistance. The thorough characterization of MalE as a model protein, together with the availability of complementary in vivo and in vitro approaches, makes it a particularly attractive system for mechanistic studies of this central issue. Our understanding of a number of cellular factors relevant to protein folding and assembly has increased enormously during the past five years, and remarkable progress has been made in applying this knowledge.

Although many alternative organisms and expression systems are now being developed, E. coli remains the most widely used host for high-level expression of recombinant proteins. Many proteins important for therapeutic or diagnostic applications, including secreted enzymes, contain disulfide bonds that are required for correct folding. Studies of periplasmic protein folding therefore have significant ramifications for biotechnology. However, many of these proteins are produced in biologically inactive, aggregated forms (IBs). Although solubilization and refolding procedures can yield soluble, functional proteins, as demonstrated with MalE31, they are lengthy and cannot be generalized. For these proteins, modifying the amino acid sequence (the folding code) by mutagenesis may be one way to obtain a soluble form, and we envision the development of genetic screens or selections for rapidly identifying correctly folded proteins expressed in the bacterial periplasm.

Other potential applications include the use of envelope folding factors as targets for the development of new antimicrobial drugs. Indeed, several proteins involved in periplasmic protein quality control are either essential or play an important role in the virulence of many pathogenic Gram-negative bacteria.