1 Introduction

Cystine knots are a structural motif with three disulfides (six cysteine residues in close proximity in a protein backbone), in which one of the disulfides passes through a ring formed by the other two disulfide bonds and their connecting backbone segments [1]. Cystine knots are known to enhance protein structural stability, and they are found in many proteins with a wide range of biological functions, such as inhibition, growth stimulation, and cyclization [2, 3]. However, the cystine-knot family shows low sequence homology, which makes cystine-knot signatures hard to predict by sequence alignment. Furthermore, six cysteines can be paired into three disulfides in 15 different ways, so determining the correct disulfide connectivity of a cystine knot is a challenge. There is a clear need for methodology that can be used routinely to determine disulfide linkages and thus provide structural information relevant to protein function and stability.
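The figure of 15 comes from counting how many ways an even number n of cysteines can be paired into disulfides, which is the double factorial (n − 1)!! = (n − 1)(n − 3)···1. A minimal Python sketch of that count follows; the function name is ours, chosen for illustration.

```python
def disulfide_pairings_count(n_cysteines: int) -> int:
    """Number of ways to pair n cysteines into n/2 disulfides, i.e. (n - 1)!!."""
    if n_cysteines < 0 or n_cysteines % 2:
        raise ValueError("need a non-negative, even number of cysteines")
    count = 1
    # The first cysteine has n-1 possible partners, the next unpaired one has
    # n-3, and so on down to 1.
    for k in range(n_cysteines - 1, 0, -2):
        count *= k
    return count


for n in (2, 4, 6):
    print(n, "cysteines ->", disulfide_pairings_count(n), "pairings")
```

Only one of the 15 pairings of six cysteines is the native one, so the count is the number of alternatives an assignment has to exclude; for four cysteines (as in a nested disulfide pair) the same formula gives 3.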

Arylsulfatase A (ARSA), a lysosomal enzyme, contains a cystine knot. ARSA catalyzes the hydrolysis of cerebroside sulfate to cerebroside and sulfate. Deficiency of this enzyme causes cerebroside sulfate to accumulate, which leads to the destruction of myelin in the central and peripheral nervous systems and results in a progressive demyelinating disease known as metachromatic leukodystrophy (MLD) [4]. MLD is a terminal autosomal recessive disease with late infantile, juvenile, and adult forms. Most children with the infantile form die by age 5 years. In the juvenile form, symptoms progress and death occurs 10 to 20 years after onset, and persons affected by the adult form typically die within 6 to 14 years after the onset of symptoms.

Patients with MLD have been reported to have a disrupted cystine knot caused by mutation of Cys470 to Arg [5]. Recent studies have shown that a partially or fully reduced cystine knot makes the protein susceptible to chemical or proteolytic degradation [6]. ARSA forms a homodimer at neutral pH and a homo-octamer at acidic pH (i.e., in the lysosome), and both conformations require proper disulfide linkages. The stability of the enzyme appears to be related to the dimer-to-octamer transition in the lysosomal milieu, and octamer formation has been shown to be disrupted by replacement of Cys282 with phenylalanine [7].

Recombinant human arylsulfatase A (rhASA), which shares sequence homology with ARSA, has been investigated for use in enzyme replacement therapy, a potential treatment for MLD patients [8, 9]. The disulfide structure of rhASA is therefore an important structural attribute to characterize in biopharmaceutical manufacturing in order to maintain drug function and stability. Current experimental approaches for characterizing cystine knots include X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy [10], which can be time-consuming (X-ray) or inconclusive because of the close proximity of the six cysteines (NMR). An effective and robust methodology is needed so that manufacturers can routinely monitor and control drug quality.

A typical strategy for cysteine characterization includes proteolytic cleavage of the backbone with an appropriate endoprotease, followed by comparison of the reduced and non-reduced peptide maps to identify bound and free cysteines [11, 12]. However, it is challenging to assign the disulfide linkages correctly for nested disulfides or cystine knots with this methodology. Recently, Wu et al. established a methodology using LC-MS with collision-induced dissociation (CID), electron transfer dissociation (ETD), and CID of the isolated charge-reduced ions (MS3) to determine complicated and intertwined disulfides, including scrambled disulfides [13–15]. In this work, we applied a multi-enzyme digestion strategy combined with CID, ETD, and CID-MS3 to characterize the complete cysteine status of rhASA. Several digestion protocols at different pH were evaluated to determine whether the free cysteines of the protein promote disulfide scrambling under the alkaline conditions typically used for enzymatic digestion. By optimizing a specific protocol for each cysteine, we successfully determined the status of all the cysteines in rhASA, including the disulfide linkages of the cystine knot and the nested disulfides.

2 Experimental

2.1 Samples

rhASA, GMP lot JPT11001, manufactured by Shire Human Genetic Therapies (Lexington, MA, USA), was provided at 39.1 mg/mL. The sample was aliquoted (10 μL or 391 μg per vial) and stored at –80 °C before analysis.

2.2 Reagents

Sequencing-grade trypsin was purchased from Promega (Madison, WI, USA). Mass spectrometry grade lysyl endopeptidase (Lys-C) was obtained from Wako (Richmond, VA, USA). Pepsin was purchased from MP Biomedicals (Santa Ana, CA, USA). Asp-N, PNGase F, ammonium bicarbonate (NH4HCO3), and formic acid (FA) were from Sigma-Aldrich (St. Louis, MO, USA). LC-MS grade water was purchased from J.T. Baker (Phillipsburg, NJ, USA), and HPLC grade acetonitrile from ThermoFisher Scientific (Fairlawn, NJ, USA). Amicon centrifugal filter units (10 kDa MWCO) were obtained from Millipore (Bedford, MA, USA).

2.3 Enzymatic Digestion

The protein solution (2.5 μL of 39.1 mg/mL) was buffer exchanged into 100 mM ammonium bicarbonate (pH 8) or 50 mM Tris-HCl (pH 6.8) using a 10 kDa molecular weight cutoff filter and concentrated to 2 mg/mL (49 μL). In a separate study, the near-neutral pH 6.8 was used to examine the effect of pH on the formation of alternative disulfide linkages during digestion. If a difference was observed, pepsin digestion at pH 2 was used to eliminate the scrambling that can occur under basic conditions. For pepsin digestion, the protein solution was buffer exchanged into 10 mM HCl (pH 2). Pepsin (1:10, wt/wt) was added to the protein solution, and the mixture was incubated at 37 °C for 30 min. The reaction was quenched by adjusting the pH to 6 with 10 mM sodium hydroxide. For Lys-C plus trypsin digestion, the protein solution (pH 6.8 or 8) was incubated with endoproteinase Lys-C (1:50 wt/wt) and trypsin (1:50 wt/wt) for 8 h at room temperature. The enzymes were added a second time (1:50 wt/wt each), and digestion continued for an additional 12 h at room temperature. For Lys-C plus trypsin, Asp-N, and PNGase F digestion, the protein solution (pH 6.8 or 8) was treated with a combination of endoproteinase Lys-C (1:50 wt/wt), trypsin (1:50 wt/wt), Asp-N (1:50 wt/wt), and PNGase F (1 unit/10 μg) for 8 h at room temperature; the enzyme mixture was then added a second time (same ratio for each enzyme), and digestion continued for an additional 12 h at room temperature. No difference in digestion efficiency was observed whether PNGase F was added together with the other enzymes or before them; for simplicity, PNGase F was added in the same mixture. In all cases except pepsin digestion, digestion was terminated by the addition of 1 % formic acid. An aliquot of 2 μg of the digest was analyzed per LC-MS run.
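The ratios above translate into small absolute amounts per digest. As a back-of-the-envelope sketch (not part of the original protocol; the variable and function names are ours), the quantities can be computed from the stated volume, concentrations, and enzyme:substrate ratios:

```python
STOCK_MG_PER_ML = 39.1   # rhASA stock concentration
STOCK_VOLUME_UL = 2.5    # stock volume taken for one digest
TARGET_MG_PER_ML = 2.0   # concentration after buffer exchange

# mg/mL is numerically equal to ug/uL, so this product is the protein amount in ug.
protein_ug = STOCK_MG_PER_ML * STOCK_VOLUME_UL
final_volume_ul = protein_ug / TARGET_MG_PER_ML


def enzyme_ug(substrate_ug: float, substrate_per_enzyme: float) -> float:
    """Enzyme amount for a 1:N (enzyme:substrate, wt/wt) ratio, with N passed in."""
    return substrate_ug / substrate_per_enzyme


pepsin_ug = enzyme_ug(protein_ug, 10)       # 1:10 pepsin
per_enzyme_ug = enzyme_ug(protein_ug, 50)   # each 1:50 enzyme, per addition
pngase_units = protein_ug / 10.0            # 1 unit PNGase F per 10 ug protein

print(f"protein: {protein_ug:.2f} ug in {final_volume_ul:.1f} uL")
print(f"pepsin: {pepsin_ug:.2f} ug; each 1:50 enzyme: {per_enzyme_ug:.2f} ug per addition")
print(f"PNGase F: {pngase_units:.2f} units per addition")
```

Because the enzymes are added twice in the Lys-C/trypsin protocols, the totals over the whole digestion are double the per-addition values.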

2.4 LC-MS

An Ultimate 3000 nano-LC pump (Dionex, Mountain View, CA, USA) and a self-packed C18 column (Magic C18, 200 Å pore size, 5 μm particle size, 75 μm i.d. × 15 cm; Magic C18 particles from Michrom Bioresources, Auburn, CA, USA) were coupled to an LTQ-Orbitrap-ETD XL mass spectrometer (ThermoFisher Scientific, San Jose, CA, USA) equipped with a nanospray ion source (New Objective, Woburn, MA, USA). Mobile phase A was 0.1 % formic acid in water, and mobile phase B was 0.1 % formic acid in acetonitrile. Peptides were eluted at 200 nL/min using a linear gradient from 2 % to 60 % B in 90 min, followed by 60 % to 80 % B over 10 min. The LTQ-Orbitrap-ETD XL mass spectrometer was operated in data-dependent mode, switching automatically between MS (scan 1, Orbitrap), CID-MS2 (scan 2, LTQ), and ETD-MS2 (scan 3, LTQ). Briefly, after a survey MS spectrum from m/z 300 to 2000, CID-MS2 and ETD-MS2 were performed on the same precursor ion with a ±2.5 m/z isolation width, and the ion/ion reaction time was held constant at 100 ms throughout the experiment [14]. CID-MS2 and ETD-MS2 spectra were then repeated by targeting specific ions to obtain linkage information not acquired in the initial run. These targeted experiments, using the Orbitrap for scans 2 and 3 if needed, were repeated (e.g., targeting multiple charge states of a precursor ion, or the same disulfide-linked peptide with different enzymatic cleavage sites or missed cleavages) until the linkage information was complete. If necessary, ions of interest from the ETD-MS2 spectra were selected for CID-MS3.
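The elution program can be written as a table of (time, %B) breakpoints with linear interpolation between them. The sketch below assumes the gradient starts at t = 0 and simply holds 80 % B after 100 min, because the wash and re-equilibration steps are not described; `GRADIENT` and `percent_b` are illustrative names.

```python
# (time in min, % mobile phase B) breakpoints taken from the method description
GRADIENT = [(0.0, 2.0), (90.0, 60.0), (100.0, 80.0)]


def percent_b(t_min: float, program=GRADIENT) -> float:
    """Linearly interpolate %B at t_min; hold the end values outside the program."""
    if t_min <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return program[-1][1]  # assumption: hold the final composition


for t in (0, 30, 60, 90, 95, 100):
    print(f"{t:>3} min: {percent_b(t):.1f} % B")
```

The main segment rises at 58/90, or about 0.64 % B per minute, which is the slope that determines peptide separation in this method.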

2.5 Disulfide Assignment

The expected masses of disulfide-linked tryptic or multi-enzyme peptides at different charge states were first calculated and then matched to the observed masses in the LC-MS chromatogram. The matched masses (with <5 ppm mass accuracy for highly abundant ions and <20 ppm for low-abundance ions, as determined by 10 % above or below the main peak) were further verified by analysis of the corresponding CID-MS2 and ETD-MS2 fragmentation spectra and, as needed, the CID-MS3 fragmentation spectra. Any internal cleavages (e.g., simultaneous cleavages in both the P1 and P2 polypeptides, or two cleavages within an intra-linked disulfide) were assigned manually. For these assignments, the complementary fragments (i.e., the portions released by the internal cleavages) had to be found to confirm the assignment of the internal cleavages. In our experience, internal cleavages occur more often in disulfide-linked peptides than in typical peptides (without disulfide linkages) and therefore require particular attention during assignment.
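The first step, calculating expected masses of disulfide-linked peptides, is simple arithmetic: sum the monoisotopic masses of the linked peptides, subtract two hydrogen atoms per disulfide, and convert to m/z for each charge state. Below is a minimal sketch using standard monoisotopic residue masses; the peptide sequences in the usage lines are hypothetical placeholders, not rhASA peptides.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added per linear peptide
H_ATOM = 1.007825   # hydrogen atom
PROTON = 1.007276   # charge carrier for [M + zH]z+


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def linked_mass(peptides: list[str], n_disulfides: int) -> float:
    """Neutral mass of peptides joined by disulfides; each bond loses 2 H.

    An intrachain disulfide is a single peptide with n_disulfides=1.
    """
    return sum(peptide_mass(p) for p in peptides) - 2 * H_ATOM * n_disulfides


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


# Hypothetical example: two peptides joined by one interchain disulfide.
mass = linked_mass(["ACDK", "GCFR"], n_disulfides=1)
for z in (2, 3, 4):
    print(f"{z}+: m/z {mz(mass, z):.4f}")
```

Listing several charge states per candidate mirrors the practice of targeting multiple charge states of the same precursor.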

3 Results and Discussion

The primary structure of recombinant human arylsulfatase A (rhASA), with six disulfide linkages and three unpaired cysteines, is shown in Figure 1. The crystal structure of human ARSA (without glycosylation) revealed these six disulfide linkages and three free cysteines, including one that is post-translationally modified to formylglycine [16]. The cystine knot is formed at the C-terminal end of the molecule by the six cysteine residues indicated in the figure. ARSA is also a glycoprotein, with glycosylation sites indicated in Figure 1. The complexity of this structure makes accurate determination of the status of all cysteines very challenging, and multiple strategies are necessary, as described below.

Figure 1

Primary structure of rhASA with disulfide linkages and unpaired cysteines

3.1 Digestion Strategy

A multi-enzyme digestion strategy is needed for the complicated disulfide and unpaired cysteine structure of rhASA. In principle, identification of a single disulfide linkage is straightforward because there is usually only one possible connection. Consequently, proteases that cut proteins into peptides containing only a single disulfide are desired. However, intertwined disulfides or a cysteine-rich region, as in rhASA, may prevent digestion to the desired peptide size. Peptides of 1 to 5 kDa are preferred, since recovery and electrospray ionization efficiency can be problems for larger peptides, whereas peptides smaller than 1 kDa may be poorly retained on a reversed-phase column. In some cases, disulfide assignment requires further adjustment of peptide size to generate peptides with sufficiently high charge states for effective ETD fragmentation [13]. The enzymatic cleavage sites of the protein are the same with either trypsin alone or Lys-C plus trypsin. Nevertheless, Lys-C plus trypsin appears to yield slightly higher digestion efficiency than trypsin alone, possibly because Lys-C first reduces the size of the protein, leading to more effective trypsin digestion. Thus, the selection of specific enzymes needs to be carefully considered. For disulfide-linked peptides containing N-linked glycosylation, an additional PNGase F treatment should be considered to reduce the complexity of the mass spectra. For peptides containing free cysteines, the digestion pH for the selected enzymes also needs to be optimized to maintain sufficient enzyme activity while avoiding scrambling. In this study, after surveying several enzyme combinations (Lys-C, trypsin, Asp-N, pepsin, and PNGase F), several protocols were developed to determine the full cysteine status of rhASA. Table 1 lists the digestion protocols, including the fragmentation methods used for the specific assignments.
The following sections describe these steps in detail, beginning with the unpaired cysteines, followed by the single disulfide and the nested disulfides, and ending with the complicated cystine knot.
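Choosing enzymes to hit the 1 to 5 kDa window can be explored in silico before any digestion is run. The sketch below applies simplified cleavage rules, which are assumptions rather than complete specificities (trypsin after K/R unless followed by P, Lys-C after every K, Asp-N before D), and screens the resulting linear peptides by mass. The toy sequence is hypothetical, and because disulfide-linked peptides stay joined after digestion, the window should really be applied to the linked assembly rather than to each linear piece.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def cleavage_sites(sequence: str, enzymes: set[str]) -> list[int]:
    """Indices i where the bond between sequence[i-1] and sequence[i] is cut.

    Simplified, assumed rules: trypsin after K/R unless the next residue is P;
    Lys-C after K; Asp-N before D.
    """
    sites = set()
    for i in range(1, len(sequence)):
        before, after = sequence[i - 1], sequence[i]
        if "trypsin" in enzymes and before in "KR" and after != "P":
            sites.add(i)
        if "lys-c" in enzymes and before == "K":
            sites.add(i)
        if "asp-n" in enzymes and after == "D":
            sites.add(i)
    return sorted(sites)


def digest(sequence: str, enzymes: set[str]) -> list[str]:
    """Split a sequence at every predicted cleavage site (no missed cleavages)."""
    bounds = [0] + cleavage_sites(sequence, enzymes) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def in_size_window(peptides, low=1000.0, high=5000.0):
    """Keep peptides whose neutral mass falls in the preferred 1-5 kDa range."""
    return [p for p in peptides if low <= peptide_mass(p) <= high]


toy = "GCAKLDCRPGK"  # hypothetical sequence for illustration only
print(digest(toy, {"trypsin"}))
print(digest(toy, {"trypsin", "asp-n"}))
```

Comparing the outputs for different enzyme sets shows how an added enzyme such as Asp-N introduces new cut points, which is the lever used later for the nested disulfides.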

Table 1 Unpaired cysteine and disulfide linkages in rhASA

3.2 Unpaired Cysteines: Cys20, Cys51, Cys276

When trypsin or Lys-C plus trypsin digestion was used to assign the unpaired cysteines, disulfide scrambling, which formed various disulfides mainly among the free cysteines, was observed with a standard digestion buffer at pH 8 (~40 %) and to a lesser extent at pH 6.8 (~5 %). As expected, no scrambled disulfides were observed with pepsin digestion at pH 2. Notably, the scrambled disulfides obtained at higher pH (i.e., pH 8) revealed which scrambled species form and how they are linked, so these species could be targeted in the lower-pH analyses. Although the amount of scrambled disulfides is expected to be lower at lower pH, the targeted approach (extraction of specific ions for targeted MSn) should be more sensitive than the initial discovery mode. Although pepsin digestion is nonspecific, major cleavages were found C-terminal to leucine residues, followed by aromatic amino acids, proline, and glutamic acid residues, so we could also focus on these cleavages when searching for potential scrambled disulfides. The major pepsin fragment containing the unpaired Cys20 was identified; its mass and charge are shown in Figure 2a. The precursor ion scan was performed in the Orbitrap, and the monoisotopic mass accurately matched the theoretical mass of the peptide with an unmodified free cysteine: m/z 667.2965 (observed) versus m/z 667.2957 (theoretical) for the 2+ charge state. The site of the free cysteine was determined by CID-MS2 of the precursor ion, as shown in Figure 2b.
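The Cys20 match can be checked against the ppm criteria stated in Section 2.5. A minimal sketch of the mass-error calculation, using the observed and theoretical m/z values quoted above (the function names are ours):

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the absolute mass error is within tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Cys20 pepsin peptide, 2+ charge state (values from the text)
err = ppm_error(667.2965, 667.2957)
print(f"{err:.1f} ppm; passes 5 ppm criterion: {within_tolerance(667.2965, 667.2957, 5.0)}")
```

The error is about 1.2 ppm, comfortably inside the <5 ppm window used for abundant ions.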

Figure 2

(a) Mass and charge of the pepsin-digested peptide with an unmodified (free) Cys20, and (b) CID-MS2 spectrum of the precursor from (a). The sequence and theoretical mass of the peptide are indicated in the inset of (a)

The remaining unpaired cysteines were identified in a similar manner, as shown in the supplementary material: Figure S1A and B (Cys51 converted to formylglycine), Figure S2A and B (Cys51 as a free cysteine), and Figure S3A and B (Cys276, a free cysteine). Table 1 (#1, #2, #3, and #4) summarizes the assignments for all the unpaired cysteines. More than 70 % of Cys51 is present as formylglycine. Because the ionization efficiency of the peptide containing the free cysteine may differ from that of the peptide containing formylglycine, this ratio is only a rough estimate of formylglycine conversion.
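The rough conversion estimate is a ratio of peptide signals. The sketch below shows that calculation with hypothetical intensities, plus an optional relative response factor, a hypothetical parameter that would require independent calibration, to show how unequal ionization efficiency would shift the estimate. It also computes the Cys-to-formylglycine mass shift from elemental composition (side-chain CH2-SH becomes CHO: lose S and 2 H, gain O), which locates the formylglycine peptide relative to the free-cysteine peptide.

```python
# Monoisotopic atomic masses (Da)
O, S, H = 15.9949146, 31.9720707, 1.00782503

# Cys -> formylglycine: net change is -S, -2H, +O (about -17.993 Da)
FGLY_SHIFT = O - S - 2 * H


def fgly_fraction(intensity_fgly: float, intensity_cys: float,
                  relative_response: float = 1.0) -> float:
    """Estimated fraction of a site present as formylglycine.

    relative_response is the assumed ionization efficiency of the fGly peptide
    relative to the Cys peptide; 1.0 means no correction (the rough estimate).
    """
    fgly = intensity_fgly / relative_response
    return fgly / (fgly + intensity_cys)


# Hypothetical peak intensities, for illustration only
print(f"mass shift: {FGLY_SHIFT:.4f} Da")
print(f"uncorrected: {fgly_fraction(7.5e6, 2.5e6):.2f}")
print(f"if fGly ionizes 1.5x better: {fgly_fraction(7.5e6, 2.5e6, 1.5):.2f}")
```

The second estimate illustrates why the text calls the ratio rough: without a measured response factor, the true fraction cannot be pinned down from intensities alone.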

3.3 Single Disulfide: Cys282–Cys396

For the peptide with a single disulfide (Cys282–Cys396, #5 in Table 1), the linkage assignment was straightforward. Although alkaline pH (i.e., pH 8) should not by itself cause the disulfide-linked cysteines to scramble, the free cysteines elsewhere in the protein could potentially cross-react with the disulfide at alkaline pH. Indeed, a minute amount of cross-reacted disulfides was observed with digestion buffer at pH 8. As expected, no cross-reacted disulfides were observed when trypsin at pH 6.8 or pepsin at pH 2 was used for digestion. The assignment of the disulfide-linked peptide is illustrated with the pepsin digestion protocol in the supplementary material (Figure S4). In Figure S4A, the observed accurate mass matched the theoretical peptide mass with one disulfide (a loss of 2H from the backbone sequence). The b and y ions in the corresponding CID-MS2 spectrum (Figure S4B) verified the sequence. In the corresponding ETD-MS2 spectrum, the disulfide bond was preferentially dissociated as expected [14, 15], yielding two released peptides, designated P1 and P2 (Figure S4C), which confirms that the two peptides are linked.

For this disulfide linkage assignment, we used low pH rather than alkylation of the free cysteines to prevent scrambling, because alkylation is difficult to control properly. Under denaturing conditions, scrambling (particularly involving free cysteines) can occur quickly, before the alkylation step. Under native conditions, the alkylation step is often not optimized, and incomplete alkylation can leave free cysteines that scramble as well. Therefore, we used low pH to protonate (inactivate) the free cysteines while assigning the disulfide linkages. In addition, because the free cysteines are not alkylated at low pH, they can be assigned directly as well, as shown in Figure 2 and in Figures S2 and S3 in the supplementary material.
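The rationale for low pH can be illustrated with the Henderson–Hasselbalch equation for the thiol/thiolate equilibrium, since thiol-disulfide exchange proceeds through the thiolate anion. The sketch assumes a pKa of 8.3, a typical value for a free cysteine thiol; cysteine pKa values in proteins vary widely with local environment, so the numbers show only the trend and are not a prediction of the scrambling levels reported in this study.

```python
def thiolate_fraction(ph: float, pka: float = 8.3) -> float:
    """Fraction of a cysteine thiol present as the reactive thiolate (S-).

    pka=8.3 is an assumed, typical free-cysteine value.
    """
    return 1.0 / (1.0 + 10 ** (pka - ph))


for ph in (8.0, 6.8, 2.0):
    print(f"pH {ph}: {thiolate_fraction(ph):.2e} thiolate")
```

Under this assumption roughly a third of a free thiol is deprotonated at pH 8, about 3 % at pH 6.8, and well under one part per million at pH 2, consistent in direction with the decreasing scrambling observed from pH 8 to pH 6.8 to pH 2.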

3.4 Nested Disulfide: Cys138–Cys154 and Cys143–Cys150

As shown in Figure 1, the nested disulfides are Cys138–Cys154 and Cys143–Cys150. Because four cysteines are involved, the alternative linkages would be either two separate disulfides (Cys138–Cys143 and Cys150–Cys154) or two crossed disulfides (Cys138–Cys150 and Cys143–Cys154) (see Figure S5 in the supplementary material). The complexity is further increased by two N-linked glycosylation sites, one within and the other next to the two disulfides (N underlined in Figure 1). To reduce the complexity, the two N-linked glycans were removed with PNGase F, converting Asn (N) to Asp (D). This conversion provided targets for Asp-N digestion. Thus, in addition to Lys-C plus trypsin (to reduce the protein size), the addition of PNGase F and Asp-N effectively cut the disulfide-linked peptide to a suitable size for mass spectrometric analysis (see Figure S6 in the supplementary material). The nested disulfide bonds form a ring, which significantly reduces CID fragmentation efficiency for the amino acids inside the ring [13] and thereby complicates the assignment of disulfide linkages inside the ring. Although ETD is effective at breaking disulfides, the peptides obtained with trypsin or pepsin alone are too large for effective fragmentation (m/z >1000) [13]. The digestion protocol required four enzymes to obtain a size suitable for effective fragmentation by mass spectrometry (see Figures S6 and S7 in the supplementary material). The mass spectra used to assign the disulfide-linked peptide are shown in Figure 3. In Figure 3a, the observed accurate mass matched the theoretical peptide mass with two disulfides (a loss of 4H from the backbone sequence). Because the additional Asp-N digestion broke the ring structure formed by the nested disulfides, the disulfide linkages could be assigned conclusively, provided that backbone cleavages were observed between the residues of the CDGGC sequence.
As shown in Figure 3b, the y1, y3, b11, and b12 fragments in the CID-MS2 spectrum provide strong evidence for the linkages Cys138–Cys154 and Cys143–Cys150. In addition, the corresponding ETD-MS2 spectrum (Figure S8 in the supplementary material) confirms that the two linked peptides (P1 and P2) are connected.
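The three candidate patterns for four cysteines can be enumerated and classified mechanically. In the sketch below, "sequential" is our label for the two separate disulfides, "nested" for one bond inside the other, and "crossed" for interleaved bonds; the residue numbers are those given in the text, and the function names are ours.

```python
def pairings(cysteines):
    """Yield every complete set of disulfides for an even number of cysteines."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        for others in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + others


def topology(bond_a, bond_b):
    """Classify two disulfides by their order along the sequence."""
    (a, b), (c, d) = sorted([tuple(sorted(bond_a)), tuple(sorted(bond_b))])
    if b < c:
        return "sequential"  # a-b ... c-d
    if d < b:
        return "nested"      # a ... c-d ... b
    return "crossed"         # a ... c ... b ... d


for bonds in pairings((138, 143, 150, 154)):
    print(bonds, topology(*bonds))
```

Exactly one pattern of each type exists for four cysteines, so the CID fragments that cut between the cysteines of CDGGC only need to discriminate among these three.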

Figure 3

(a) Mass and charge of the Lys-C plus trypsin plus Asp-N plus PNGase F-digested peptide with two disulfides (Cys138 with Cys154, and Cys143 with Cys150), and (b) CID-MS2 spectrum of the 2+ charged precursor from (a). The sequence and theoretical mass of the peptide are indicated in the inset of (a)

It should be noted that although Asp-N should cleave N-terminal to aspartic acid residues in the protein backbone, the aspartic acid residue adjacent to a cysteine (a disulfide) inside the ring was not cleaved (see Figure S6C in the supplementary material). For digestion at pH 8, a significant amount of a scrambled disulfide was observed at a different LC retention time (see Figure S9A and B in the supplementary material), with a structure resembling scramble 1 in Figure S5 (scramble 2 was not observed). Without chromatographic separation, it would be difficult to distinguish disulfide isomers, since they are often isobaric. However, scrambled disulfides often adopt configurations different from the correct one, so a peak at a different LC retention time, or a shoulder peak with the same mass as the correct one, is often worth examining for scrambling. At pH 6.8, the scrambled disulfide was reduced to a trace amount, and it was not observed at pH 2 with pepsin. Although the pepsin-digested disulfide could not be fragmented effectively by CID, the fragmentation obtained indirectly confirmed the nested disulfide linkage (see Figure S10 in the supplementary material). ETD was also tested on the pepsin-digested disulfide but was not successful, yielding minimal fragmentation and mainly charge-reduced species. CID-MS3 and even MS4 of the charge-reduced species were attempted, but the fragmentation efficiency remained poor for a peptide of this size. As described above, the use of an additional enzyme (i.e., Asp-N) to obtain a suitable size and configuration of the disulfides was critical.
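The retention-time heuristic for spotting isobaric disulfide isomers can be sketched as a simple feature-grouping step: group features of the same charge whose m/z agree within a ppm tolerance, then flag groups that elute at more than one retention time. The `Feature` record, the 5 ppm tolerance, the 0.5 min gap, and the feature values are all hypothetical choices for illustration, not the processing used in this work.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    mz: float         # observed m/z
    charge: int
    rt_min: float     # LC retention time (min)
    intensity: float


def isobaric_rt_groups(features, tol_ppm=5.0, min_rt_gap=0.5):
    """Return groups of same-charge, same-m/z features eluting at distinct times.

    Each feature is compared with the first member of the current group.
    Within a flagged group the most intense peak is listed first; the others
    are candidates to examine for scrambling (a heuristic, not a proof).
    """
    groups = []
    for f in sorted(features, key=lambda f: (f.charge, f.mz)):
        ref = groups[-1][0] if groups else None
        if (ref is not None and ref.charge == f.charge
                and abs(f.mz - ref.mz) / ref.mz * 1e6 <= tol_ppm):
            groups[-1].append(f)
        else:
            groups.append([f])
    flagged = []
    for group in groups:
        rts = sorted(f.rt_min for f in group)
        if any(later - earlier >= min_rt_gap for earlier, later in zip(rts, rts[1:])):
            flagged.append(sorted(group, key=lambda f: -f.intensity))
    return flagged


# Hypothetical features: one isobaric pair at different retention times
features = [
    Feature(850.3712, 3, 42.1, 1.0e7),
    Feature(850.3715, 3, 44.8, 4.0e5),
    Feature(612.2481, 2, 30.0, 5.0e6),
]
for group in isobaric_rt_groups(features):
    print([(f.mz, f.rt_min) for f in group])
```

A flagged group only marks where to look; as in the text, the linkage of each isomer still has to be established from its fragmentation spectra.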

3.5 Cystine Knot: Cys470–Cys482, Cys471–Cys484, and Cys475–Cys481

The cystine knot could not be broken by any of the enzymes or enzyme combinations employed. In addition, CID fragmentation could not produce backbone cleavages within the cystine knot. ETD was therefore examined. For this region, pepsin digestion was selected to obtain a suitable peptide length with fewer acidic residues for effective ETD fragmentation (i.e., it eliminated additional glutamic and aspartic acid residues present in the corresponding tryptic fragment). The mass and charge of the pepsin-digested peptide are shown in Figure S11 (in the supplementary material). The monoisotopic mass matched the expected peptide mass with three disulfides (a loss of 6H from the backbone sequence). As expected, CID-MS2 provided limited sequence information (Figure S12). Significantly, however, ETD-MS2 dissociated the disulfides, which allowed cleavage of the peptide backbone, as shown in Figure 4a and b. The fragmentation of this disulfide-linked peptide at two different charge states is shown in Figure 4a (m/z 656.30, 4+) and Figure 4b (m/z 525.20, 5+). The fragmentation data from the two charge states are consistent with respect to cleavage sites and verify that the linkage assignments are correct. Because the peptide is linked through three intertwined disulfides, dissociation (partial reduction) of an individual disulfide shifts fragment masses by only 1 Da, so high-resolution accurate-mass detection in the Orbitrap combined with ETD provided even more convincing evidence for the disulfide bond assignments. In both ETD spectra (Figure 4a and b), z7 and c18, together with the internal cleavages from the dissociated disulfide, confirm the connection between Cys471 and Cys484. In addition, one of the charge-reduced species in the ETD spectrum (m/z 1312.6, [M + 4H]2+••) was further fragmented by CID-MS3 in the Orbitrap, as shown in Figure 4c.
The MS3 spectrum contains additional disulfide and backbone cleavages, such as y17 and b8, confirming the connection between Cys470 and Cys482. The same fragmentation pattern and assignments were also observed in the CID-MS3 spectrum acquired in the LTQ ion trap (Figure S13 in the supplementary material), which makes the method applicable even with low-resolution MS instruments. Once these two disulfide linkages were assigned, the remaining (non-dissociated) disulfide had only one possible connection, Cys475–Cys481. In summary, the combination of ETD-MS2 and CID-MS3 confirms the linkages as Cys470 with Cys482, Cys471 with Cys484, and Cys475 with Cys481. The theoretical and observed fragment ions are listed in the supplementary material (Table S1).
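The final inference is a constraint-elimination argument: of the 15 possible pairings of the six knot cysteines, only those containing every experimentally assigned bond survive. A minimal sketch using the residue numbers from the text (the function names are ours):

```python
def pairings(cysteines):
    """Yield every complete set of disulfides for an even number of cysteines.

    With sorted input, each bond is emitted as (lower, higher) residue number.
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        for others in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + others


def consistent_pairings(cysteines, assigned):
    """Complete disulfide patterns that contain every assigned bond."""
    required = {tuple(sorted(bond)) for bond in assigned}
    return [p for p in pairings(tuple(sorted(cysteines))) if required <= set(p)]


knot = (470, 471, 475, 481, 482, 484)
print(len(list(pairings(knot))), "possible patterns")
print(consistent_pairings(knot, [(471, 484)]))              # after ETD-MS2
print(consistent_pairings(knot, [(471, 484), (470, 482)]))  # after CID-MS3
```

One assigned bond still leaves three candidates; two assigned bonds leave a single pattern, which is why the third linkage follows without being dissociated.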

Figure 4

(a) ETD-MS2 spectrum (Orbitrap) of the cystine knot precursor (m/z 656.30, 4+), (b) ETD-MS2 spectrum (Orbitrap) of the same peptide as in (a) but from a different charge state (m/z 525.20, 5+), and (c) CID-MS3 spectrum (Orbitrap) of m/z 1312.6 from (a). For the Orbitrap spectra (a), (b), and (c), only the most abundant isotopic peak is shown for each ion. The monoisotopic masses of these individual ions are listed in Table S1

4 Conclusions

In this study, in-depth LC-MS protocols were developed to assign the status of all 15 cysteine residues in rhASA, including the disulfide linkages of the nested disulfides and the cystine knot. Although both cystine knots and nested disulfides are difficult to resolve, strategies combining different enzymes and MS fragmentation methods successfully determined the assignments. The successful assignment of the disulfide linkages in the cystine knot demonstrates the power of the approach, which should be generally applicable to other cystine knots. With the described methods, routine monitoring of the disulfide linkages of rhASA becomes feasible.