Introduction

The serpin superfamily, with members in every phylogenetic kingdom, underwent a marked expansion in diversity following the plant-animal split1,2,3. The majority of serpins inhibit serine proteases, although some have been found to target classes of cysteine protease4,5,6 or perform alternative roles such as hormone transport, as chaperones and as tumour suppressors7,8. In contrast to most proteins9, the serpin native conformation is a kinetically-trapped, thermodynamically metastable folding intermediate. This characteristic is intimately associated with the mechanism of inhibition10, the essence of which is a transition of a central 5-stranded β-sheet (denoted ‘β-sheet A’) to a thermodynamically-preferred 6-stranded conformation. The serpin fold (Fig. 1A) is highly conserved despite this functional diversity, with structural deviations largely restricted to extensions at the N- and C- termini and in a defined inter-helical region1. The unadorned core structure is epitomised by one of the most studied serpins, α1-antitrypsin (α1-AT).

Figure 1

Properties of a conserpin variant. (A) The serpin fold is comprised of three β-sheets and eight or nine α-helices, illustrated using a cartoon representation of native conserpin (PDB 5CDX)34. Connecting β-strand 5A and β-strand 1C is the solvent exposed reactive centre loop (RCL) which dictates the target protease specificity of inhibitory serpins (unresolved in the structure and therefore denoted by a dashed line). Upon proteolytic cleavage, the RCL becomes a 6th central strand of the normally 5-stranded β-sheet A. The positions of the tryptophan residues mutated in this study are indicated. (B) Upper panel: Radial plot of the average sequence identity between the consensus sequence and members of individual serpin phylogenetic clades1, calculated using all alignment positions (white central area), or the 75%, 50%, 25% and 10% lowest variability sites (shaded from light grey to dark grey). Middle panel: Reciprocal RMSD of native conserpin (PDB 5CDX) compared with the common core of representative structures from different phylogenetic clades. Lower panel: A comparison between latent conserpin (PDB 5CDZ) and representative cleaved/latent structures. (C) Left: SDS-PAGE (10% w/v, see full gels in Supplementary Fig. S1) section of cAT after 20 hours of auto-induction, visualised using Coomassie Blue stain. Right: Western blot of the corresponding SDS-PAGE, developed using an anti-His6 antibody. Lane MM, molecular weight markers; I, insoluble fraction; S, soluble lysate fraction. cAT is indicated by the arrow. (D) Far-UV CD spectrum of cAT (black) compared with α1-AT (grey) at a concentration of 0.2 mg mL−1, using a path length of 0.1 cm.

Like most mesophilic serpins, α1-antitrypsin follows a three-state folding mechanism that proceeds from an unfolded state via an intermediate to a fully folded form11. The intermediate is of great interest, as it has been proposed to be the aggregating species central to the pathology of a number of hereditary conditions involving serpin deficiency12,13. Mutations, or environmental conditions, that increase the intermediate population by perturbing the energy landscape increase the formation of non-functional serpin polymers12,14,15,16. Conversely, minimal population of this intermediate has been observed in serpins produced by hyperthermophiles17,18, and is proposed as a mechanism whereby folding is possible, and activity is maintained, under destabilising conditions. The deficiency in functional α1-AT resulting from mutations such as Z, MMalton or Siiyama is associated with neonatal hepatitis, liver disease, hepatocellular carcinoma and emphysema. Other serpins important to human physiology, such as antithrombin III, α1-antichymotrypsin, C1-inhibitor and plasminogen activator inhibitor-1, show a similar vulnerability to mutation19.

The formation of serpin polymers can be reduced under experimental conditions by stabilisation of the native state of the protein. This has been achieved by the rational introduction of point mutations to fill in surface cavities of α1-AT20, disulphide constraints of structural elements involved in conformational change13,21,22,23, the addition of osmolytes23,24,25 and random mutagenesis26. However, serpin function is intimately associated with a finely-tuned native state instability that is distributed throughout the molecule23,27, a context in which some, but not all, stabilising mutations compromise inhibitory activity28.

A further strategy that has been employed to increase the thermodynamic stability of a protein is the consensus approach to rational protein design. This technique identifies residues in a native protein that are putatively the most compatible with a given fold based on conservation between members of a protein family. Mutations have systematically been identified that are compatible with function and enhance stability for a range of proteins that include DNA binding proteins29, antibodies30,31, leucine rich repeat proteins32, and enzymes33. By extending this approach to the whole protein rather than selected residues, conserpin, an entirely artificial serpin, has recently been engineered34 with a sequence that reflects the most frequently observed residue at each site in an alignment of eukaryotic serpins1. This protein was found to adopt the canonical native, metastable conformation and showed a remarkable stability to inactivation by heat, an atypical fully reversible unfolding pathway, and an absence of polymer formation. Structural comparison with other eukaryotic serpins revealed features proposed to contribute to its stability, including stabilising interactions around α-helix D and the β-sheet B/C barrel, an extended electrostatic network in the ‘breach’ region, and tight packing in the hydrophobic core and at the β-sheet A/helix F interface. While conserpin showed a two-state unfolding profile atypical of serpins by intrinsic tryptophan fluorescence, the presence of folding intermediates was inferred from kinetic refolding experiments and 4,4′-dianilino-1,1′-binaphthyl-5,5′-disulfonic acid (bis-ANS) binding34.

Here we report the design, production and folding properties of a conserpin variant that incorporates the specificity-conferring reactive centre loop (RCL) residues of the disease-associated serpin, α1-AT. In common with its progenitor the resulting synthetic protein, cAT (consensus α1-AT), exhibited high thermal stability and reversible folding. We exploited the three tryptophans present in conserpin by generating single-tryptophan variants to probe unfolding in the proximity of helix F, the ‘breach’ at the top of β-sheet A, and helix H. These data, with the additional use of circular dichroism as a probe of change in secondary structure, extend the previous study by demonstrating recruitment of the whole molecule during folding. In combination with size-exclusion chromatography, we provide direct evidence of the presence of folding intermediate states inferred previously from kinetic data. We find the altered RCL favours inactivation by oligomerisation over transition to the latent form as observed for the parental protein, and utilise this property to assess the kinetic stability of the native state. Collectively, the data reflect a molecule with a highly efficient, concerted folding mechanism that minimises the accumulation of an aggregation-prone intermediate species as a result of a pronounced kinetic stability.

Results

Engineering a conserpin variant with α1-AT RCL residues

The design of conserpin has been described in detail previously34. The consensus sequence it embodies, when placed in a phylogenetic context1, has a slightly higher similarity to α1-AT-like, intracellular, and antithrombin clades A-C over other subfamilies (Fig. 1B, upper panel). This is also reflected by a comparison between the conserpin structure34 and other serpins in the native conformation (Fig. 1B, middle panel) but not between loop-inserted conserpin and cleaved structures (Fig. 1B, lower panel). This highlights a dichotomy between the serpin metastable native conformation and the thermodynamically stabilised loop-inserted form.

The RCL sequence of conserpin was previously found to support inhibitory activity against trypsin, with a stoichiometry of inhibition (SI) of 1.8 (ref. 34), equivalent to a ~45% non-productive turnover rate. The conserpin sequence represents a hybrid of the subsite preferences of a broad spectrum of target proteases. However, as the interaction between two defined partners is generally determined by multiple specific subsite interactions within the binding cleft35, this consensus RCL may not fully capture the potential inhibitory activity of the conserpin scaffold against a single target. Furthermore, the conserpin RCL is 1–4 residues shorter on the P’ side than many inhibitory serpins. To introduce a sequence with a known cognate proteolytic profile, the P7 to P2′ region of the RCL was replaced by the corresponding region of α1-AT, to yield a functional variant, cAT. The purified protein (Fig. 1C) was found to exhibit a mixed α-helix/β-sheet far-UV circular dichroism profile consistent with a folded serpin (Fig. 1D).

cAT is a functional serpin with α1-AT-like specificity

To determine whether the RCL substitution conferred α1-AT-like activity on cAT, the SI was determined against chymotrypsin. SI is a measure of the efficiency of the inhibitory mechanism, governed by efficient loop incorporation and effective trapping of the tethered protease36,37. With a molar ratio of serpin-to-protease of 1.46 ± 0.03:1 (SEM, n = 4) to achieve full inhibition (Fig. 2A), this interaction was more efficient than the interaction of conserpin with trypsin, with around a third of cAT molecules following the substrate pathway. cAT was also found to form an SDS-stable inhibitory complex – a hallmark of many serpin-enzyme interactions38 – with human neutrophil elastase (Fig. 2B). This complex did not show evidence of secondary degradation by-products, in contrast to the interaction between conserpin and trypsin34. The second order association rate constant of cAT with chymotrypsin (k’ass), corrected for non-productive turnover, was 7.7 ± 0.2 × 10⁵ M−1 s−1 at 25 °C (SE of the regression, n = 4), slightly faster than that of α1-AT under the same experimental conditions, 6.9 ± 0.4 × 10⁵ M−1 s−1 (Fig. 2C). Thus cAT can present its α1-AT RCL in an effective orientation for engagement with a protease and is capable of a sufficiently rapid conformational change to act as an inhibitor.
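The arithmetic that links a stoichiometry of inhibition to the fraction of serpin consumed as substrate can be sketched in a few lines. This is an illustrative Python sketch rather than the analysis code used in the study: the function names and the titration points are hypothetical, and the relation used (fraction on the substrate pathway = 1 − 1/SI) is the usual branched-pathway interpretation, assumed here.

```python
def x_intercept(xs, ys):
    """Fit a least-squares line to (xs, ys) and return where it crosses y = 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope


def substrate_fraction(si):
    """Fraction of serpin cleaved non-productively, assuming 1 - 1/SI."""
    return 1.0 - 1.0 / si


# Hypothetical titration: residual protease activity falling linearly to
# zero at a serpin:protease molar ratio of 1.46 (the SI reported for cAT).
ratios = [0.0, 0.25, 0.5, 0.75, 1.0]
activity = [1.0 - r / 1.46 for r in ratios]

si_estimate = x_intercept(ratios, activity)
print(round(si_estimate, 2))
print(round(substrate_fraction(1.46), 2))  # cAT with chymotrypsin
print(round(substrate_fraction(1.8), 2))   # conserpin with trypsin
```

Under this assumption, an SI of 1.46 corresponds to roughly 32% of molecules following the substrate pathway, and an SI of 1.8 to roughly 44%, in line with the "around a third" and "~45%" figures quoted in the text.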

Figure 2

Serpin-enzyme complex formation by cAT. (A) Residual protease activity following the incubation at 25 °C of cAT with chymotrypsin at various molar ratios; the intercept of the linear regression with the abscissa is equivalent to the stoichiometry of inhibition for cAT (solid line) in comparison with α1-AT (dashed line). Error bars represent ± SEM for four independent experiments. (B) SDS-PAGE (10% w/v) of cAT incubated with human neutrophil elastase (HNE) at 37 °C for 2 hours. MM, molecular marker; lane 1, cAT control; lane 2, HNE; lane 4, cAT and HNE incubated at equimolar concentration. Monomer, cleaved and serpin-enzyme complex bands are labelled. (C) The inhibition of chymotrypsin by several concentrations of cAT and α1-AT was followed over time at 25 °C, from which the observed pseudo-first-order rate constant (kobs) was calculated. The slope of the relationship between inhibitor concentration and kobs provides the uncorrected second-order rate constant, kapp. Error bars represent ± SEM (n = 4). The inset graph shows representative progress curves of cAT (c) and α1-AT (A) with approximately equal kobs values. (D) The dissociation of the complex between chymotrypsin and cAT or α1-AT was followed as the rate of regain of protease activity at 25 °C following dilution from 5 µM to the concentrations shown. The slope of the resulting regression provides the apparent rate constant, kdiss,app. Error bars represent ± SEM (n = 4).

The inhibitory progress curves did exhibit a difference in behaviour between the two proteins, however, with a noticeably higher residual protease activity for cAT that increased towards the end of the experiment (Fig. 2C, inset). This is characteristic of complex dissociation. To assess this, 5 µM inhibitory complex was diluted to 0–25 nM, and the regain of protease activity monitored by chromogenic substrate turnover. The apparent rate of complex dissociation observed for the cAT-chymotrypsin complex, 6.9 ± 1.0 × 10⁻⁵ s−1, was around 10-fold greater than that of the α1-AT-chymotrypsin complex, at 7.5 ± 1.2 × 10⁻⁶ s−1 (SEM, n = 4) (Fig. 2D). Coupled with a stoichiometry above parity, this is suggestive of a stability-function trade-off that yields an active but partially perturbed inhibitory mechanism.
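To put the two dissociation rate constants on a more intuitive scale, they can be converted to complex half-lives. The sketch below assumes simple first-order dissociation (half-life = ln 2 / k), which is an assumption of this example rather than a statement from the study; the function name is hypothetical.

```python
import math


def half_life_hours(k_per_second):
    """Half-life in hours for a first-order process with rate constant k (s^-1)."""
    return math.log(2) / k_per_second / 3600.0


k_cat_complex = 6.9e-5    # cAT-chymotrypsin, s^-1 (value quoted in the text)
k_a1at_complex = 7.5e-6   # alpha1-AT-chymotrypsin, s^-1 (value quoted in the text)

fold_difference = k_cat_complex / k_a1at_complex
print(round(fold_difference, 1))                   # about 9-fold
print(round(half_life_hours(k_cat_complex), 1))    # a few hours
print(round(half_life_hours(k_a1at_complex), 1))   # roughly a day
```

On this basis the cAT complex would have a half-life of just under 3 hours, compared with about 26 hours for the α1-AT complex.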

cAT forms higher-order species when heated

In common with the parental protein34, cAT was found to be resistant to heat-induced denaturation, with minimal detectable change in secondary structure over the course of a thermal denaturation assay from 25 °C to 95 °C at 1 °C min−1 (Fig. 3A, upper), requiring the addition of denaturant to unfold. In the presence of 2 M guanidine hydrochloride (GdnHCl) (Fig. 3A, lower), the midpoint temperature of denaturation (Tm) of cAT was 70.0 ± 0.1 °C (SEM, n = 3), which is within 2.5 °C of the value for conserpin34. Following a decrease in temperature from 95 °C to 25 °C, the material showed no visible sign of precipitation, and retained ~50% activity (an SI value of 2.8 ± 0.2, SEM, n = 3) with respect to the pre-denaturation control. This partial loss of activity was not accompanied by a loss of circular dichroism (CD) signal, indicating that it was not associated with gross structural changes in, or significant precipitation of, the sample (Fig. 3A, lower). The thermal refolding curve revealed a similar transition midpoint of 69.5 °C, albeit with less co-operative behaviour than seen during unfolding, indicating that, in the presence of denaturant, the forward and reverse pathways of cAT are different.

Figure 3

Heat-induced oligomerisation of cAT. (A) Upper panel: Thermal unfolding of 0.2 mg mL−1 cAT, heated from 25 °C to 95 °C at a rate of 1 °C min−1, monitored by the change in CD signal at 222 nm (black). The sample was then returned to the starting temperature at the same rate (grey). Lower panel: A representative thermal melt performed in the presence of 2 M GdnHCl. (B) cAT (at 10 µM) was heated at 75 °C, 3 µL aliquots removed at various timepoints, and the oligomerisation state resolved by 6% non-denaturing PAGE. No monomer was visible at the conclusion of the experiment. (C) Left panel: cAT or α1-AT labelled with Atto 488 and Atto 594 fluorescent dyes at a concentration of 0.1 mg mL−1 in PBS was heated at temperatures between 80–87 °C and 60–68 °C, respectively, and the increase in FRET between the fluorophores determined over time. Points reflect representative data normalised between 0 and 1, with the curves of best fit as solid lines. Right panel: The natural logarithm of the half-times of the change in FRET was plotted against the reciprocal absolute temperature; the slopes of the regressions are proportional to the apparent activation energy of polymerisation under the conditions of the experiment. Error bars reflect ± SEM (n = 4).

Several serpins are known to form ordered long-chain polymers when heated due to population of an oligomerisation-prone intermediate state23,39,40. The latent conformation is an alternate monomeric end-state, in which the RCL is incorporated into β-sheet A without cleavage; this form can neither inhibit proteases nor polymerise41, and it is also proposed to be accessed via an intermediate42. After heating at 76 °C for 5 hours, conserpin had been observed to fully convert to this monomeric latent conformation34. In contrast, incubation of cAT at 75 °C led to a gradual decrease in the monomer band intensity with a concurrent appearance of higher molecular mass species by non-denaturing PAGE, including a component unable to migrate into the gel (Fig. 3B). After four hours a complete loss of monomer was seen, with no residual monomer band that could indicate the presence of the latent species41. Thus the substitution of RCL residues in cAT confers behaviour more consistent with polymerisation-prone eukaryotic serpins, and by analogy the ability to adopt a polymerisation intermediate state.

To assess this possibility and follow polymerisation in real-time, mixtures of cAT and α1-AT (at 0.1 mg mL−1) labelled with Atto 488 and Atto 594 dyes were heated, and reaction progress monitored by a change in Förster resonance energy transfer (FRET). The resulting curves reported an increased proximity between donor and acceptor fluorophores to within ~80 Å, consistent with intermolecular association between differentially labelled proteins (Fig. 3C, left). The temperature dependence of the rate of polymerisation was then used to calculate the apparent activation energy of this oligomerisation (Fig. 3C, right). Under these conditions, the energetic barrier to polymerisation was found to be around 3-fold higher for cAT than α1-AT, at 139 ± 12 kcal mol−1 and 48 ± 6 kcal mol−1, respectively (±SE of regression, n = 4). Thus, the cAT native state exhibits significant kinetic stability against heat-induced inactivation.
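The way an apparent activation energy follows from the temperature dependence of the polymerisation half-time can be illustrated with an Arrhenius-type regression. The sketch below is not the authors' fitting procedure: it assumes that ln(half-time) is linear in 1/T with slope Ea/R, and it uses synthetic half-times generated from a hypothetical prefactor purely to show that the regression recovers the input value. All function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def linear_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def apparent_activation_energy(temps_celsius, half_times_s):
    """Ea (kcal/mol) from the slope of ln(half-time) versus 1/T (T in kelvin)."""
    inv_t = [1.0 / (t + 273.15) for t in temps_celsius]
    ln_half = [math.log(h) for h in half_times_s]
    return linear_slope(inv_t, ln_half) * R_KCAL


def synthetic_half_time(temp_celsius, ea_kcal, ln_prefactor=-190.0):
    """Hypothetical half-time (s); ln_prefactor is an arbitrary illustrative offset."""
    return math.exp(ln_prefactor + ea_kcal / (R_KCAL * (temp_celsius + 273.15)))


temps = [80.0, 82.0, 84.0, 86.0]  # within the 80-87 degC range used for cAT
half_times = [synthetic_half_time(t, 139.0) for t in temps]
ea_recovered = apparent_activation_energy(temps, half_times)
print(round(ea_recovered, 1))
```

Because half-time is inversely related to rate, the slope of this plot is positive, and a steeper slope corresponds to a higher barrier; the ratio of the two reported values, 139/48, is about 2.9.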

cAT unfolds via a poorly populated intermediate

In the seminal conserpin study34, intrinsic tryptophan fluorescence reported an apparently two-state equilibrium unfolding profile. This was inconsistent with data obtained using bis-ANS and rapid folding kinetics, which suggested the presence of a folding intermediate, and hence a three-state mechanism. This discrepancy could be explained either by a poorly populated intermediate, or well-populated intermediate with fluorescence properties very similar to either the native or unfolded state34. To investigate this further, GdnHCl-mediated equilibrium unfolding of cAT was undertaken whilst monitoring changes in far-UV CD signal, to use the global structure of the protein as an additional reporter of structural change. The resulting curves (Fig. 4, upper panel) showed an apparent two-state, strongly cooperative and near-contemporaneous transition in both fluorescence and CD, with the mid-point of denaturation (D50%) centred at 2.9 M. As noted for conserpin, this contrasts with the three-state unfolding profile exhibited by most serpins12,40,43. The equilibrium unfolding and refolding data were superimposable – consistent with a fully reversible folding pathway – and protein refolded from 6 M GdnHCl to 0.2 M GdnHCl had an inhibitory activity close to the starting value (SI of 1.7 ± 0.1; SEM, n = 3). Bis-ANS, an environment-sensitive probe whose quantum yield increases upon binding to hydrophobic regions of equilibrium intermediates12,34,43, was found to exhibit a small peak in mid-range concentrations of denaturant (Fig. 4, middle panel), centred at 2.8 ± 0.1 M GdnHCl (SE of global fit, n = 3). This corresponded well with fluorescence and far-UV CD D50% values. Notably, the bis-ANS peak appeared over a noticeably narrower concentration range than seen for α1-AT or α1-antichymotrypsin, with a width of ~1.7 M compared with 3 M for the latter two proteins12,43. These data are consistent with a minimally-populated folding intermediate and the simultaneous loss of secondary and tertiary structure at increasing concentrations of denaturant.
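A two-state equilibrium unfolding curve of the kind described can be sketched with the linear extrapolation model, in which the free energy of unfolding falls linearly with denaturant concentration and equals zero at the midpoint. This is a minimal Python sketch under stated assumptions: the m-value used below is hypothetical (the fitted values are not reproduced here), and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def fraction_unfolded(denaturant_m, d50_m, m_value, temp_k=298.15):
    """Two-state linear extrapolation model.

    delta_g = m_value * (d50_m - denaturant_m) is the free energy of
    unfolding (kcal/mol); it is zero at the midpoint, so half the
    population is unfolded there.
    """
    delta_g = m_value * (d50_m - denaturant_m)
    k_unfold = math.exp(-delta_g / (R_KCAL * temp_k))
    return k_unfold / (1.0 + k_unfold)


D50 = 2.9      # M GdnHCl, midpoint quoted in the text for cAT
M_VALUE = 5.0  # kcal mol^-1 M^-1, hypothetical value for illustration only

for conc in (0.0, 2.0, 2.9, 4.0, 6.0):
    print(conc, round(fraction_unfolded(conc, D50, M_VALUE), 4))
```

With a steep (large) m-value the transition is sharp, so the protein is almost entirely native at 2 M and almost entirely unfolded at 4 M, which is one way of picturing the strong cooperativity reported above.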

Figure 4

cAT populates an intermediate state over a narrow range of denaturant. Upper panel: Equilibrium GdnHCl-mediated unfolding (solid lines) and refolding (dotted lines) for cAT followed via the change in CD signal at 222 nm (black lines) and intrinsic fluorescence at 330 nm (red lines). Each dataset is the result of at least three independent experiments. A two-state unfolding curve was satisfactorily fit to the data; midpoints are listed in Table 1. Middle panel: cAT was incubated in varying concentrations of GdnHCl at 25 °C and bis-ANS added at 5 times the concentration of the protein. The fluorescence intensity was measured at 480 nm, with an excitation wavelength of 390 nm, and slit widths of 5 nm. Error bars reflect SD from three independent experiments, whose profiles were normalised according to their total integrated fluorescence intensity. The curve reflects the sum of an empirically-determined single exponential decay and Gaussian function. Lower panel: GdnHCl unfolding was monitored by size exclusion chromatography using a Superose 12 10/300 column, at the denaturant concentrations shown. The absorbance at 280 nm is shown in grey. The deconvoluted components of the 2.8 M sample are shown as dotted lines, the sums of the fitted components shown as dashed lines, and the experimental data as solid lines. Arrows indicate the expanded (top) and compact (bottom) intermediates.
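The empirical curve described for the middle panel, a single exponential decay plus a Gaussian, can be written out explicitly. The sketch below only evaluates such a model; it assumes the decay is a function of denaturant concentration, and every parameter other than the 2.8 M peak centre quoted in the text is a hypothetical placeholder rather than a fitted value.

```python
import math


def bis_ans_model(denaturant_m, decay_amp, decay_rate,
                  peak_amp, peak_centre, peak_sigma):
    """Sum of a single exponential decay and a Gaussian peak."""
    decay = decay_amp * math.exp(-decay_rate * denaturant_m)
    peak = peak_amp * math.exp(
        -((denaturant_m - peak_centre) ** 2) / (2.0 * peak_sigma ** 2)
    )
    return decay + peak


# Hypothetical parameters; only peak_centre reflects a value from the text.
params = dict(decay_amp=1.0, decay_rate=1.0,
              peak_amp=0.3, peak_centre=2.8, peak_sigma=0.3)

grid = [i * 0.05 for i in range(121)]  # 0 to 6 M GdnHCl in 0.05 M steps
signal = [bis_ans_model(d, **params) for d in grid]
print(round(bis_ans_model(2.8, **params), 3))
```

In a fit of this form the Gaussian centre and width report the position and breadth of the denaturant window over which the dye-binding intermediate is populated.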

Analytical gel filtration reveals species with expanded and compact conformations

Size exclusion chromatography (SEC) can be used to detect distinct intermediate ensembles with different hydrodynamic volumes to that of the native and unfolded conformations18. Accordingly, cAT was incubated in five different concentrations of GdnHCl chosen on the basis of the spectroscopic data, and the SEC elution profile of each sample determined (Fig. 4, lower panel). In the absence of denaturant, the native protein was found to elute at around 13 mL, while in 6 M GdnHCl, the expanded unfolded species eluted at 9.5 mL. At 2.8 M, which corresponds with the spectroscopic D50% value and the peak in bis-ANS fluorescence, both of these species were evident, but with additional peaks at 11.5 mL and 13.7 mL. These peaks are indicative of a population of expanded molecules, as expected for the molten globule-like unfolding intermediate of α1-AT44, and additionally a compact ensemble with respect to the native state. Notably, re-folding rate data indirectly suggested that conserpin populates two distinct intermediates34, an inference which appears to be consistent with the species directly observed here. Interestingly, it has been proposed that the heat-induced intermediate of α1-AT shows compaction with respect to the native state45,46. At either 2 M or 4 M GdnHCl, these additional species were absent. In combination with the spectroscopic data, these profiles are consistent with the presence of a poorly populated intermediate ensemble, comprised of two species with distinct properties, which exist over a narrow denaturant concentration range.

Single-tryptophan variants as probes of local folding

Tryptophan variants can be used as site-specific probes of the local folding behaviour of a protein, as described previously for α1-AT (refs 47,48) and plasminogen activator inhibitor-1 (ref. 49). The cAT sequence contains tryptophans at positions 160, 194 and 275 (all designations made using α1-AT numbering), corresponding with helix F, the loop connecting strand 3 of β-sheet A and strand 4 of β-sheet C (situated in the ‘breach’ region at the top of β-sheet A), and helix H respectively (Figs 1A and 5A). In order to use these residues as reporters of local structural change, single tryptophan variants were generated by systematically mutating two at a time to phenylalanine, to form the variants cATW160, cATW194, and cATW275. The CD profiles of the resulting proteins were consistent with that of the wild-type cAT (Fig. 5B), and the substitutions did not result in a loss of inhibitory activity (Table 1). Thus, the double tryptophan-to-phenylalanine substitutions did not substantially alter global structure or function.

Figure 5

Properties of single-tryptophan mutants of cAT. (A) Residues and structural elements proximate to tryptophans 160, 194 and 275 (α1-AT numbering) present in the structure of native conserpin (5CDX) are shown. (B) Far-UV spectra of cATW160 (blue), cATW194 (red) and cATW275 (green) are shown with respect to cAT (black dashes), at a concentration of 0.2 mg mL−1, using a path length of 0.1 cm, at 25 °C. (C) Fluorescence emission spectra for cAT (black), cATW160 (blue), cATW194 (red) and cATW275 (green), in the presence of 6 M GdnHCl (upper panel) and 8 M urea (lower panel). The dashed spectrum is the summation of all tryptophan mutants. Sample fluorescence, with excitation at 295 nm and 5 nm slit widths, was buffer-corrected. (D) As in panel C, but in the absence of denaturant. (E) Single-tryptophan variants cATW160 (blue), cATW194 (red) and cATW275 (green) at 0.2 mg mL−1, heated from 25 °C to 95 °C at a rate of 1 °C min−1, with unfolding monitored by the change in CD signal at 222 nm. (F) As in panel E, performed in the presence of 2 M GdnHCl.

Table 1 Comparison of cAT and single tryptophan mutants.

In the presence of 6 M GdnHCl, the intrinsic fluorescence emission spectra of the denatured proteins were fully additive (Fig. 5C, upper panel). Spectra of cAT and variants were also recorded in the absence of denaturant (Fig. 5D). The sum of the emission spectra of all single tryptophan variants yielded a maximal intensity about 120% of that of cAT, with comparable peak emission wavelengths at 331–332 nm. A previous analysis of plasminogen activator inhibitor-1, with three of the four tryptophan residues at identical positions to those in cAT, revealed significant quenching of Trp275 by Trp194 (ref. 49). In the context of similar inhibitory activity, far-UV CD profiles and the peak fluorescence emission wavelength across the three cAT variants, this 20% increase in intensity was therefore more likely the consequence of resonance energy transfer in cAT rather than the result of marked structural perturbation.

Intrinsic fluorescence scans were also performed in the presence of 8 M urea. Two of the variants had identical emission profiles to that seen when unfolded in 6 M GdnHCl. The exception, cATW275 (Fig. 5C, lower panel), displayed an intensity at the emission maximum slightly higher than that of the other variants, coupled with a 5 nm blue shift. This indicates that full solvation was not achieved at position Trp275 and some structure persists around helix H. The vicinity of helix H is thus more resistant to unfolding, as seen previously in α1-antichymotrypsin50; the folding nucleus around which the rest of the serpin scaffold condenses therefore appears to be conserved in cAT.

Tryptophan residues contribute to the stability of cAT

Thermal unfolding experiments were used to assess whether the double tryptophan substitutions resulted in a change in stability. In the absence of denaturant, the variants exhibited an observable unfolding profile, with Tm values of approximately 80 °C (Table 1 and Fig. 5E). This is appreciably lower than wild-type cAT but considerably higher than any single eukaryotic serpin studied. Similarly, in the presence of 2 M GdnHCl, the variants had a Tm 6–8 °C lower than that of parental cAT (see Fig. 5F and Table 1). These data highlight the importance of the hydrophobic packing mediated by each of these residues, within the key helix F/β-sheet A, breach and sheet B/C barrel regions (Fig. 5A).

cAT folding and unfolding is cooperative

Equilibrium unfolding experiments were then performed on the mutants, with measurements by both CD and intrinsic fluorescence (Fig. 6A). The calculated D50% values (Table 1) reflected a marginally lower thermodynamic stability than the wild-type cAT protein (2.5–2.6 M as compared with 2.9 M) with a ∆∆GD-N of greater than 4.4 kcal mol−1 between cAT and the single tryptophan variants. These data point to the importance of the local interactions made by the tryptophan residues. However, reversibility was not affected: similar curves were seen when the refolding of each variant was monitored in the same manner. The close correspondence in behaviour reported by tryptophan residues on different structural elements, by global (CD) and by local (intrinsic fluorescence) measures of unfolding, highlights the highly cooperative nature of cAT folding.

Figure 6

Equilibrium unfolding of single-tryptophan variants of cAT. (A) Equilibrium GdnHCl-mediated unfolding (solid lines) and refolding (dotted lines) for cATW160, cATW194 and cATW275 followed by the change in CD signal at 222 nm (black lines) and intrinsic fluorescence at 330 nm (red lines). Each dataset is the result of at least three independent experiments. A two-state unfolding curve was fitted to the data. The concentration at which half the protein is unfolded (D50%) or refolded (D50% refold) is shown in Table 1. (B) Normalised fluorescence emission spectra of parental cAT, cATW160, cATW194 and cATW275 in buffer without denaturant (—), and in the presence of 2.6 M or 3 M of GdnHCl (---) and 6 M GdnHCl (···). All scans were conducted at 25 °C, with an excitation wavelength of 295 nm and slit widths of 5 nm.

Intrinsic fluorescence spectra reveal the presence of a folding intermediate

Fluorescence scans of α1-AT at the midpoint of chemical denaturation exhibited a red shift from 330 nm to 343 nm, which has been ascribed to the presence of an intermediate ensemble47. Similar scans were conducted here at 3 M and 2.6 M GdnHCl for cAT and the tryptophan variants, respectively (Fig. 6B). Under these conditions, all double mutants showed a λmax of approximately 348 nm, signifying the partial exposure of each individual tryptophan to the solvent due to local conformational changes. This common behaviour is consistent with a molten globule intermediate51. These data, in combination with the results obtained using bis-ANS and SEC, support the conclusion that cAT folds through a three-state mechanism via a molten globule intermediate. Therefore, the reversible nature of cAT unfolding is a consequence of a poorly populated, rather than absent, folding intermediate, facilitated by a native state with pronounced thermodynamic and kinetic stability and a highly efficient and cooperative folding pathway.

Discussion

Serpins are present in every taxonomic phylum, including prokaryotes that live at extremes of temperature. However, well-characterised serpins relevant to human physiology generally possess a finely-balanced stability easily perturbed by mutation, which on the face of it appears to be a necessary compromise to maintain the unique mechanism of action. It follows that prokaryotic hyperstable serpins should therefore be the product of specialised evolutionary adaptations to an intrinsically unstable scaffold. It is notable then that conserpin, a protein representative of eukaryotic serpins, exhibits both hyperstability and inhibitory activity. Indeed, this protein exhibits structural features that are distinct from the kinds of changes typically associated with adaptation to destabilising environments. Instead, it is proposed that the folding landscape has been smoothed, reducing opportunities for populating aggregation-prone intermediate states34.

The whole-protein consensus approach to the design of conserpin was simple: each residue corresponds to the most frequently observed amino acid at each position in an alignment of eukaryotic serpins. The underlying premise is that, over the course of evolution, extensive residue-level sampling has occurred, and changes which are the least incompatible with function and stability will be over-represented in modern-day sequences52. This is an approach distinct from the derivation of an ancestral sequence, which reconstructs a protein based on phylogenetic relationships, and which by definition attempts to negate the influence of changes in sequence that have occurred subsequent to the chosen branch-point. Thus it is unsurprising that there is a higher representation of residues in conserpin from the most populous phylogenetic clades, α1-AT-like clade A and intracellular clade B (Fig. 1B).
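The per-position rule described above, taking the most frequently observed residue in each alignment column, can be sketched as follows. This is an illustrative Python sketch, not the pipeline used to design conserpin: the function name and toy alignment are hypothetical, and two choices are assumptions of this example, namely that gap characters are ignored when counting and that ties go to the residue seen first in the column.

```python
from collections import Counter


def consensus_sequence(aligned_sequences, gap="-"):
    """Return the most frequent non-gap residue at each alignment column.

    Columns consisting only of gaps are skipped. Ties are resolved in
    favour of the residue encountered first (an arbitrary choice here).
    """
    if not aligned_sequences:
        raise ValueError("alignment is empty")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences must all have the same length")

    consensus = []
    for column in zip(*aligned_sequences):
        counts = Counter(res for res in column if res != gap)
        if counts:
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Toy alignment for illustration only; these are not serpin sequences.
toy_alignment = ["MKV-L", "MRVAL", "MKIAL", "MKV-F"]
toy_consensus = consensus_sequence(toy_alignment)
print(toy_consensus)
```

A real design would also need a policy for weighting over-represented clades and for columns dominated by gaps; the sketch leaves both aside, which is consistent with the observation above that the most populous clades contribute disproportionately to the consensus.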

While the consensus requirements of the N-terminal RCL hinge region are well-known53, the specificity of a serpin is reliant on an effective substrate-like interaction between the C-terminal region of the RCL and a target protease. This in turn depends on protease subsite preferences and active site topology54, in which both RCL sequence and length play a role. Thus the observation that around 45% of conserpin molecules are cleaved non-productively may be a consequence of an RCL sequence built on consensus principles. To test this, and generate a tool protein with a known inhibitory profile, the specificity-determining residues of the α1-AT RCL were introduced onto the conserpin scaffold. Whilst the resulting protein exhibited a rate of association with chymotrypsin comparable to that of α1-AT, and a higher degree of inhibition than conserpin, this change did not convert it into a stoichiometric inhibitor (Fig. 2A and C). Additionally, the inhibitory complex exhibited compromised stability with respect to α1-AT, with a gradual regain of protease activity at an accelerated rate (Fig. 2D). The basis for this is not immediately clear; the two have a comparable RCL length and the resting place of the inhibited protease is structurally similar in both. It has been shown that the protease catalytic triad can be distorted to various extents in some complexes55 and that partial translocation of a protease results in a relatively unstable inhibitory state56. It is possible that the instability is due to an impedance of full translocation, or an otherwise incompletely inhibited catalytic triad. Thus, it appears that the less efficient inhibitory mechanism is a trade-off for the gain in stability.

Like its progenitor protein, cAT exhibited a remarkable thermal stability at odds with the eukaryotic serpins from which the consensus sequence was derived (Fig. 3A); the only naturally-occurring serpin characterised to date with a higher stability is aeropin, produced by an archaeon that lives at temperatures close to 100 °C18. This was associated with a pronounced increase in the kinetic barrier to polymerisation (Fig. 3C), while thermodynamic stability with respect to the unfolded state was partially attributable to the three tryptophan residues in helix F, at the top of β-sheet A, and helix H (Fig. 5E). The lack of distinction between denaturant and thermal measures of stability (Figs 5E,F and 6A) reinforces a contribution to global stability and the high degree of cooperativity between the three locales during folding. It has been reported that introduction of a tryptophan at position 160 of α1-AT elevated the Tm from 59 to 65 °C48; correspondingly, this side-chain contributes to an enhanced packing with β-sheet A in the conserpin structure34. Trp194 is the most highly conserved of the three, situated in the breach region, and substitution with a phenylalanine residue has been found to decrease the kinetic stability of α1-antichymotrypsin57, although not the thermodynamic stability of α1-AT47. The loss of Trp275 does not compromise reversibility or lead to a disproportionate impact on thermodynamic stability with respect to the other tryptophan residues; thus, it does not appear to be required for the putative folding nucleus (Fig. 5C, lower panel).

The equilibrium unfolding and refolding profiles of cAT and its tryptophan variants were consistent with those observed for conserpin: they showed a two-state transition, full reversibility and a high degree of cooperativity (Figs 4 and 6A). In contrast, all characterised eukaryotic serpins have been found to unfold via an intermediate species18 often described as molten globule-like in structure51, and no other eukaryotic serpin has been shown to unfold reversibly and refold fully. Despite the apparent two-state behaviour of cAT, intrinsic fluorescence spectra of individual tryptophan residues and analytical size exclusion chromatography confirmed the presence of a distinct expanded intermediate ensemble (Figs 4 and 6B). The resolution of two novel components over a narrow denaturant range is consistent with indirect evidence of intermediates inferred from rapid folding of conserpin34 and with the observation of distinct transition midpoints in plasma α1-AT reported by different spectroscopic approaches58. The concurrent appearance of these intermediates at equilibrium under restrictive denaturant concentrations suggests that they are interconvertible, directly or indirectly. In the context of folding from a denatured state, this further supports a highly efficient collapse from expanded to compact ensembles.

Polymers and the inactive latent conformation form via intermediate states that can be induced under destabilising conditions23,39,40,42. In α1-AT, the degree of native state perturbation is associated with generation of two different polymer configurations, attained through two different intermediates: a ‘denaturant-induced’ expanded molten globule unfolding intermediate44 and a compact ‘heat-induced’ polymerisation intermediate45,46. Whilst definitive equivalence cannot be determined from these data, it remains noteworthy that the hydrodynamic volumes of cAT intermediate species are consistent with those implicated in α1-AT polymerisation. Further investigation would be required to establish whether the intermediate states identified here represent branch points leading to different conformational outcomes. Nevertheless, with intermediates apparently representing an obligate component of the serpin folding pathway, it is most likely their low abundance that exerts the greatest influence on thermal stability and reversibility of unfolding of cAT. Indeed, our data combined with published observations of conserpin34 and aeropin18 support a general mechanism in which serpin stability against misfolding and polymerisation is achieved not by elimination of such intermediates, but by limiting the extent to which they are populated.

When considering mechanisms of inactivation, a noteworthy divergence in behaviour of cAT from conserpin is the appearance of higher-order species during prolonged heating at high temperature (Fig. 3B). Under these conditions the parental conserpin protein also undergoes an inactivating change, but in contrast does so in a unimolecular fashion, resulting in formation of the latent species34. The prevailing evidence is that polymerisation and latency arise through a common process: both have been observed to occur simultaneously with neuroserpin42, Z α1-AT and M α1-AT in the presence of citrate41,59, and thus have been proposed to represent alternate outcomes determined by a decision point along the pathway42. Structurally, both forms require a mobile strand 1 of β-sheet C, and at least partial insertion of the RCL as an additional central 6th strand in β-sheet A14,21,60,61. As cAT and conserpin undergo conformational change with a similar half-time of around 1-2 hours under equivalent conditions, the substitution of the α1-AT RCL residues has not made cAT substantially more prone to inactivation, but instead shifted the balance towards the formation of oligomers. A basis for this could be greater compatibility of the P7-P1′ RCL residues with the inserted state, in line with an improved stoichiometry of inhibition relative to the parental conserpin protein (Fig. 2A). Whether this is the case or not, these data indicate that an attempt to further limit inactivating conformational change should focus in the first instance on residues outside of the RCL.

In α1-AT, the intermediate ensemble has been characterised as partially folded, possessing an intact β-sheet B and helices G and H, an expanded β-sheet A and a disrupted helix F47,48,51,62,63. This is compatible with the presence of residual structure in the vicinity of Trp275 (on helix H) in 8 M urea and the disordered helix F and breach regions (Trp160 and Trp194 respectively) observed here. By extension, this suggests that not only are the native structure and function of the consensus serpin protein conserved, but the folding pathway is as well.

The development of a synthetic serpin using a consensus strategy was found to yield a functional protein that is highly thermostable, resistant to polymerisation and able to fold reversibly. This study has provided direct evidence for two transiently populated intermediates, and the use of single-tryptophan variants has revealed the folding pathway to exhibit a high degree of cooperativity and conservation of the folding nucleus. Substitution of the RCL has improved inhibitory efficiency but suggests that further optimisation of the conserpin core may be necessary to maximise this. Even so, with a fully reversible folding pathway, pronounced stability, and adaptable specificity, conserpin and the derivatives characterised here provide a useful platform for the investigation of conformational change and stability in the serpin superfamily.

Methods

Materials and software

All reagents were from Sigma Aldrich unless specified. The concentration of stock solutions of guanidine hydrochloride (GdnHCl) was determined using refractive index measurements64. Molecular weight markers were from Life Technologies and Fermentas. A primary mouse anti-histidine tag antibody (AbD Serotec) and secondary sheep anti-mouse antibody (Chemicon) were used for western blots. Bovine α-chymotrypsin was stored in 1 mM HCl. Human neutrophil elastase (HNE) was from Calbiochem and prepared in 50 mM sodium acetate, 200 mM NaCl, pH 5. Non-linear regression and numerical calculation of reaction half-times were performed using Prism (GraphPad Inc.) and GNU Octave. Structural representations were generated using PyMOL (Schrödinger Inc.).

Design of the consensus serpin (cAT)

The engineering of the conserpin sequence has been described previously34. Residues P7-P1′ (GVEIVPRS) were replaced with the P7-P2′ region of the α1-AT RCL (FLEAIPMSI). The constructs used for the single tryptophan mutation experiments were codon-optimised for E. coli and commercially synthesised by DNA 2.0 (USA).
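As a reading aid, the two RCL segments quoted above can be labelled with the conventional P/P′ nomenclature and compared position by position. The sketch below is illustrative only: the function names are hypothetical, and the assumption that both segments begin at P7 is taken from the position ranges stated in the text.

```python
CONSERPIN_RCL_SEGMENT = "GVEIVPRS"   # P7-P1' of conserpin, as quoted in the text
A1AT_RCL_SEGMENT = "FLEAIPMSI"       # P7-P2' of alpha1-AT, as quoted in the text

def label_rcl_positions(segment, n_unprimed=7):
    """Map position labels (P7 ... P1, P1', P2' ...) to residues.

    n_unprimed is the number of residues N-terminal to the scissile
    bond; both quoted segments start at P7, hence the default of 7.
    """
    labels = {}
    for i, residue in enumerate(segment):
        if i < n_unprimed:
            labels[f"P{n_unprimed - i}"] = residue
        else:
            labels[f"P{i - n_unprimed + 1}'"] = residue
    return labels

def changed_positions(old_segment, new_segment):
    """List the shared positions at which the two segments differ."""
    old = label_rcl_positions(old_segment)
    new = label_rcl_positions(new_segment)
    return [pos for pos in old if pos in new and old[pos] != new[pos]]

print(label_rcl_positions(A1AT_RCL_SEGMENT))
print(changed_positions(CONSERPIN_RCL_SEGMENT, A1AT_RCL_SEGMENT))
```

Only positions present in both quoted segments are compared, so the P2′ residue of the α1-AT segment is labelled but not reported as a change.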

Expression constructs

Plasmids encoding cAT and the mutants cATW160 (W194F/W275F), cATW194 (W160F/W275F), and cATW275 (W160F/W194F) were generated using ligation-independent cloning with the pLIC-HIS vector using standard protocols; these constructs were transformed into BL21(DE3) E. coli and subjected to small-scale expression65. Colonies were screened for expression using a crude activity assay in which 90 µL of the lysate was mixed with 10 µL of 10 µM chymotrypsin.

Protein expression and purification

Protein was expressed using Overnight Express™ Instant TB Medium (Merck-Millipore) as described previously65. Cells were harvested and lysed in 40 mL lysis buffer (10 mM imidazole, 25 mM NaH2PO4, 300 mM NaCl, pH 8.0) supplemented with 2 mM β-mercaptoethanol, 0.125 mM PMSF, 0.25 mg mL−1 lysozyme and 1 mg of DNase I. Following centrifugation, the soluble fraction was filtered through a 0.22 µm filter membrane (Millipore) and applied to a pre-equilibrated 5 mL HisTrap HP column (GE Healthcare), washed with 6 volumes of 20 mM imidazole, 25 mM NaH2PO4, 300 mM NaCl, pH 8.0 and eluted with 500 mM imidazole, 25 mM NaH2PO4, 300 mM NaCl, pH 8.0. Peak fractions were loaded onto a Superdex 200 16/60 column and eluted with 50 mM Tris, 90 mM NaCl, pH 8.0; protein was concentrated to ~300 µM and stored at −80 °C.

Characterisation of inhibitory properties

The stoichiometry of inhibition (SI) and association rate constant (kass) of cAT and single tryptophan variants against bovine chymotrypsin were determined as described previously66 using protease assay buffer (20 mM Tris, 100 mM NaCl, 0.1% (w/v) PEG 8000, 10 mM CaCl2, pH 8.0) and 200 µM N-succinyl-Ala-Ala-Pro-Phe-p-nitroanilide substrate. The refolding SI was measured using protein unfolded in 6 M GdnHCl, 90 mM NaCl and 50 mM Tris, pH 8.0 for 2 hours, and then refolded for an additional 2 hours by dilution into 50 mM Tris, 90 mM NaCl, pH 8.0 to a final GdnHCl concentration of 0.2 M. For complex dissociation experiments, 5 µM serpin-chymotrypsin complex was diluted 200–2000-fold into assay buffer containing substrate, and the resulting progress curves describing the regain of activity were fit by a quadratic equation as described67 with the turnover number calculated separately for each discrete experiment.

Heat-induced oligomerisation

The disappearance of monomer was analysed by incubating 10 µL samples of 10 µM cAT at 75 °C for different times. 3 µL of the protein sample was then added to 1 µL of native-PAGE loading buffer and resolved by 6% native-PAGE as described68. For FRET experiments, α1-AT or cAT, in phosphate-buffered saline (PBS), were incubated with a 2–3-fold molar excess of either NHS-Atto 488 or NHS-Atto 594 dyes (Atto-Tec, Germany) in separate reactions for 4 hours at 25 °C. Following the quenching of the reaction by 10 mM hydroxylamine, the preparations were separated from unconjugated dye by anion exchange chromatography as described66. Polymerisation experiments, with protein at a concentration of 0.1 mg mL−1 in PBS and a total volume of 20 µL, were performed using a Mastercycler Realplex4 instrument (Eppendorf) and the resulting curves were well-described by a double exponential equation as noted previously66. The temperature dependence of these reaction half-times was assessed using an Arrhenius plot, with the apparent activation energy (Eact,app) calculated by multiplying the slope of the regression by the gas constant.
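The Arrhenius calculation described above can be sketched as follows. This is a minimal illustration and not the analysis script used in the study: it assumes that ln(half-time) is regressed against 1/T with T in kelvin, so that the slope multiplied by the gas constant gives Eact,app directly, and it uses synthetic half-times generated from an arbitrary barrier of 300 kJ mol−1 that is not a value from this work.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def apparent_activation_energy(temps_celsius, half_times):
    """Slope of ln(half-time) against 1/T (T in K), multiplied by R.

    Returns the apparent activation energy in J mol^-1. Half-times may
    be in any unit, provided the same unit is used throughout.
    """
    inv_temp = [1.0 / (t + 273.15) for t in temps_celsius]
    ln_half = [math.log(h) for h in half_times]
    slope, _ = linear_fit(inv_temp, ln_half)
    return slope * R_GAS

# Synthetic half-times generated from an ASSUMED barrier of 300 kJ mol^-1;
# neither the barrier nor the temperatures are results from the study.
TRUE_EA = 300e3
temps = [70.0, 75.0, 80.0, 85.0]
half_times = [math.exp(-100.0 + TRUE_EA / (R_GAS * (t + 273.15))) for t in temps]
print(apparent_activation_energy(temps, half_times) / 1000.0)  # kJ mol^-1
```

Because half-times rather than rate constants are plotted, the slope is positive; regressing ln(k) against 1/T instead would give a slope of opposite sign.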

Circular dichroism scans and thermal denaturation

Circular dichroism (CD) measurements were performed on a Jasco J-815 CD spectrometer (Jasco) at a protein concentration of 0.2 mg mL−1 in 90 mM NaCl and 50 mM Tris, pH 8.0 using a quartz cell with a path-length of 0.1 cm. Far-UV scans were performed from 200 to 260 nm in the same buffer. For thermal denaturation, a heating rate of 1 °C min−1 from 25 °C to 95 °C was used, with the change in signal measured at 222 nm. Refolding was measured directly after the thermal melt by holding the temperature at 95 °C for 1 min before the temperature was decreased to 25 °C at the same rate. The midpoint of transition (Tm) was obtained by fitting the data with a Boltzmann sigmoidal curve in accordance with the method described69 for both forward and reverse thermal denaturation experiments.
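To illustrate how a Tm is extracted from a thermal melt, the sketch below fits a four-parameter Boltzmann sigmoid to a synthetic curve. It is not the fitting procedure used in the study: a coarse grid search over the midpoint and slope factor, with the two plateaux solved by linear least squares, is substituted for non-linear regression purely to keep the example free of dependencies, and all parameter values and names are hypothetical.

```python
import math

def boltzmann(temp, bottom, top, tm, slope):
    """Four-parameter Boltzmann sigmoid; tm is the transition midpoint."""
    return bottom + (top - bottom) / (1.0 + math.exp((tm - temp) / slope))

def fit_boltzmann(temps, signal, tm_grid, slope_grid):
    """Grid search over (tm, slope); returns (bottom, top, tm, slope).

    For each trial pair the model is linear in the two plateaux, so
    these are obtained by regressing the signal on the sigmoid term.
    """
    n = len(temps)
    mean_y = sum(signal) / n
    best = None
    for tm in tm_grid:
        for slope in slope_grid:
            f = [1.0 / (1.0 + math.exp((tm - t) / slope)) for t in temps]
            mean_f = sum(f) / n
            sff = sum((fi - mean_f) ** 2 for fi in f)
            if sff == 0.0:
                continue  # sigmoid term is flat over the data; skip
            span = sum((fi - mean_f) * (yi - mean_y)
                       for fi, yi in zip(f, signal)) / sff
            bottom = mean_y - span * mean_f
            sse = sum((bottom + span * fi - yi) ** 2
                      for fi, yi in zip(f, signal))
            if best is None or sse < best[0]:
                best = (sse, bottom, bottom + span, tm, slope)
    return best[1:]

# Synthetic melt from 25 to 95 C; every parameter value here is hypothetical.
temps = [float(t) for t in range(25, 96)]
signal = [boltzmann(t, -20.0, -5.0, 72.5, 2.0) for t in temps]
tm_grid = [60.0 + 0.5 * i for i in range(61)]     # 60.0 ... 90.0 C
slope_grid = [1.0 + 0.5 * i for i in range(9)]    # 1.0 ... 5.0 C
bottom, top, tm, slope = fit_boltzmann(temps, signal, tm_grid, slope_grid)
print(tm, slope)
```

The resolution of the recovered Tm is limited by the grid spacing (0.5 °C here); a proper non-linear regression would refine it further and provide confidence intervals.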

Size exclusion chromatography (SEC)

SEC was performed on a Superose 12 10/300 gel filtration column equilibrated with the respective buffer that the protein was incubated in: 0 M, 2 M, 2.8 M, 4 M or 6 M GdnHCl in 90 mM NaCl and 50 mM Tris, pH 8.0. The elution profiles, reflecting the absorbance at 280 nm, were fit to the sum of one or more Gaussian functions.
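The sum-of-Gaussians model can be written down compactly. The sketch below only evaluates the model and the relative area of each component; the fitting itself would be done by non-linear regression. The component parameters (amplitude, elution volume, width) are hypothetical and do not correspond to any profile reported here.

```python
import math

def gaussian(x, amplitude, centre, sigma):
    """Single Gaussian peak evaluated at elution volume x."""
    return amplitude * math.exp(-0.5 * ((x - centre) / sigma) ** 2)

def sum_of_gaussians(x, components):
    """Model absorbance at x for a list of (amplitude, centre, sigma)."""
    return sum(gaussian(x, a, c, s) for a, c, s in components)

def component_fractions(components):
    """Fraction of the total peak area contributed by each component.

    The area of a Gaussian is amplitude * sigma * sqrt(2 * pi).
    """
    areas = [a * s * math.sqrt(2.0 * math.pi) for a, _, s in components]
    total = sum(areas)
    return [area / total for area in areas]

# Hypothetical two-species profile: (amplitude, centre in mL, sigma in mL).
components = [(1.0, 12.0, 0.3), (0.5, 11.0, 0.3)]
print(sum_of_gaussians(12.0, components))
print(component_fractions(components))
```

Reporting relative areas rather than peak heights avoids bias when components have different widths, although converting areas to populations assumes that all species absorb equally at 280 nm.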

Tryptophan fluorescence scans

Intrinsic fluorescence was measured using a FluoroMax-4 spectrofluorometer (HORIBA Jobin Yvon) with 0.5 µM of each protein in a 1 cm path-length quartz cell at 25 °C. The excitation wavelength (λex) was 295 nm and the emission wavelength (λem) was 330 nm, with 5 nm slit widths.

Equilibrium unfolding and refolding

Equilibrium unfolding was performed as described18 using both CD and intrinsic fluorescence. Refolding curves were obtained by first unfolding the protein in 6 M GdnHCl, 90 mM NaCl and 50 mM Tris, pH 8.0 for 2 hours and then refolding into buffer containing various concentrations of GdnHCl with 50 mM Tris, 90 mM NaCl, pH 8.0 for an additional 2 hours. The fraction of unfolded protein was plotted against the final GdnHCl concentration.
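The conversion of a raw signal into a fraction unfolded can be sketched as below. This is a simplified illustration that assumes flat native and unfolded baselines and locates the midpoint by linear interpolation; the study may have used sloping baselines or a fitted model, and the data points are hypothetical.

```python
def fraction_unfolded(signal, native_signal, unfolded_signal):
    """Normalise a signal between the native (0) and unfolded (1) baselines."""
    if unfolded_signal == native_signal:
        raise ValueError("native and unfolded baselines must differ")
    return (signal - native_signal) / (unfolded_signal - native_signal)

def transition_midpoint(concentrations, fractions):
    """Denaturant concentration at which the fraction unfolded crosses 0.5.

    Uses linear interpolation between the two bracketing points and
    returns None if the curve never crosses 0.5.
    """
    points = list(zip(concentrations, fractions))
    for (c0, f0), (c1, f1) in zip(points, points[1:]):
        if (f0 - 0.5) * (f1 - 0.5) <= 0 and f0 != f1:
            return c0 + (0.5 - f0) * (c1 - c0) / (f1 - f0)
    return None

# Hypothetical titration: GdnHCl concentration (M) and raw signal.
gdnhcl = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
raw = [-20.0, -20.0, -18.5, -14.0, -8.0, -5.0, -5.0]
fractions = [fraction_unfolded(s, raw[0], raw[-1]) for s in raw]
print(fractions)
print(transition_midpoint(gdnhcl, fractions))
```

Overlaying the unfolding and refolding curves after this normalisation is what allows reversibility to be judged: coincident curves indicate that the same equilibrium is reached from either direction.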

Sequence and structural comparisons

MEGA 670 was used to calculate the average pairwise sequence identity between conserpin and aligned eukaryotic serpins separated into phylogenetic clades1. To produce sub-alignments in which the most variable sites were removed at different thresholds, positions were ranked by their Kabat variability score (calculated as the number of different amino acids at a site ÷ the frequency of the most common amino acid). For structural comparisons, the native (PDB 5CDX) and latent (PDB 5CDZ) conformations of conserpin34 were aligned to native and loop-inserted serpin structures, respectively, using SUPERPOSE71. Two rounds of superposition were performed: following the first, those residue positions in conserpin that failed to match in any pairwise comparison were excluded from the subsequent round, following which root-mean-square deviations (RMSD) were calculated.
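The site-ranking step can be illustrated with a short sketch. It is an assumption-laden example and not the authors' pipeline: gaps are ignored when scoring a column, the frequency of the most common residue is expressed as a fraction, ties are broken by alignment position, and the toy alignment and function names are hypothetical.

```python
from collections import Counter

def kabat_variability(column, gap="-"):
    """Number of different residues divided by the frequency (as a
    fraction) of the most common residue; gaps are ignored (assumption)."""
    residues = [r for r in column if r != gap]
    if not residues:
        return float("inf")  # an all-gap column is ranked as most variable
    counts = Counter(residues)
    top_frequency = max(counts.values()) / len(residues)
    return len(counts) / top_frequency

def least_variable_sites(alignment, keep_fraction):
    """Indices of the keep_fraction lowest-variability columns."""
    scores = [kabat_variability(col) for col in zip(*alignment)]
    n_keep = max(1, int(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: (scores[i], i))
    return sorted(ranked[:n_keep])

def percent_identity(seq_a, seq_b, sites, gap="-"):
    """Percentage of the selected sites at which two sequences match."""
    matches = sum(1 for i in sites if seq_a[i] == seq_b[i] and seq_a[i] != gap)
    return 100.0 * matches / len(sites)

# Hypothetical toy alignment, for illustration only.
toy_alignment = ["MKVA", "MRVA", "MKIG", "MKLA"]
sites = least_variable_sites(toy_alignment, 0.5)
print(sites)
print(percent_identity(toy_alignment[0], toy_alignment[1], sites))
```

Restricting the comparison to progressively less variable sites, as in the 75%, 50%, 25% and 10% sub-alignments, shows whether identity to the consensus is concentrated at conserved positions.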