Mapping disulfide bonds from sub-micrograms of purified proteins or micrograms of complex protein mixtures

Disulfide bonds are vital for protein functions, but locating the linkage sites has been a challenge in protein chemistry, especially when the quantity of a sample is small or the complexity is high. In 2015, our laboratory developed a sensitive and efficient method for mapping protein disulfide bonds from simple or complex samples (Lu et al. in Nat Methods 12:329, 2015). This method is based on liquid chromatography–mass spectrometry (LC–MS) and a powerful data analysis software tool named pLink. To facilitate application of this method, we present step-by-step disulfide mapping protocols for three types of samples—purified proteins in solution, proteins in SDS-PAGE gels, and complex protein mixtures in solution. The minimum amount of protein required for this method can be as low as several hundred nanograms for purified proteins, or tens of micrograms for a mixture of hundreds of proteins. The entire workflow—from sample preparation to LC–MS and data analysis—is described in great detail. We believe that this protocol can be easily implemented in any laboratory with access to a fast-scanning, high-resolution, and accurate-mass LC–MS system.


Functions of disulfide bonds
Formation of disulfide bonds is a common posttranslational modification that has important biological functions. Many secreted proteins such as antibodies, growth factors, extracellular matrix proteins, and cell surface receptors or transporters, which happen to be of great therapeutic interest, are rich in disulfide bonds. As a structural building block, a disulfide bond covalently links two cysteine residues in the same protein or in different proteins to strengthen the correct conformation of a protein or protein complex, thereby improving stability. Further, the reversible nature of a disulfide bond enables it to act as a molecular switch to regulate the activity of enzymes or transcription factors in response to the redox state of the environment (Hogg 2003). To fully understand the biological function of a disulfide-containing protein and its regulation, it is necessary to map precisely the position of each disulfide bond and to determine the redox state of the two cysteine residues involved (in the disulfide form, as free thiols, or otherwise) under the conditions studied.

Methods for disulfide bond analysis
In the past, a variety of methods have been used for the analysis of protein disulfide bonds, including Edman degradation (Haniu et al. 1994), diagonal electrophoresis (McDonagh 2009), mutagenesis of cysteine residues coupled with reducing and non-reducing SDS-PAGE (Itakura et al. 1994), X-ray crystallography (McCarthy et al. 2000), and nuclear magnetic resonance spectroscopy (Sharma and Rajarathnam 2000). However, none of these methods is ideal: the ones that can provide precise linkage information of disulfide bonds demand highly specialized skills and devoted efforts of structural biologists, and the ones that can be executed in an average biology lab do not afford linkage information directly. Further, they usually require milligrams of purified proteins, and none of them work on complex samples.
Shan Lu and Yong Cao have contributed equally to this work.
In recent years, rapid technological development in liquid chromatography-mass spectrometry (LC-MS) has made it possible to map disulfide bonds from as little as micrograms of proteins in a relatively high-throughput way (Choi et al. 2009; Götze et al. 2012; Huang et al. 2014; Liu et al. 2014, 2017a; Murad and Singh 2013; Wang et al. 2014; Wefing et al. 2006; Wu et al. 2008; Xu et al. 2007). Among the methods in this category, the more straightforward ones do not reduce disulfide bonds prior to LC-MS analysis, so the linkage information can be extracted from the fragmentation spectra of peptides containing disulfide bonds. Different fragmentation methods including collision-induced dissociation (CID), higher-energy collisional dissociation (HCD), electron-transfer dissociation (ETD), and electron-transfer/higher-energy collision dissociation (EThcD) (Liu et al. 2014) have been used with varying degrees of success. Table 1 summarizes the data analysis tools that have been developed for LC-MS analysis of disulfide bonds. Most of them are designed to identify disulfide bonds directly from fragmentation spectra. From these endeavors, two challenges have become apparent: one is to identify all the disulfide bonds in a protein, and the other is to identify disulfide bonds at a proteome scale, that is, from highly complex samples such as cell lysates, isolated mitochondria, or secretomes. Another constant problem is false identification of disulfide bonds, the source of which may be faulty data analysis or disulfide bond scrambling during sample preparation.

About the method used in this protocol
With these problems in mind, we developed a method that enabled us to prevent most, if not all, disulfide scrambling events and to identify all the native disulfide bonds of a single protein or a mixture of ten proteins from micrograms or even a few hundred nanograms of samples.
It also enabled us to map native disulfide bonds at a proteome scale: for instance, 199 disulfide bonds were identified from a periplasmic fraction of Escherichia coli cells and 568 disulfide bonds were identified from proteins secreted by human umbilical vein endothelial cells. This method was published in 2015 (Lu et al. 2015). Since then, to our knowledge, it has been used successfully in dozens of studies and seven of them have been published (Hartman et al. 2016; Hung et al. 2016; Liu et al. 2017b; Mauney et al. 2017; Wang et al. 2016; Wu et al. 2016, 2017). The three key components of this method are as follows. First, disulfide bond scrambling is prevented by blocking free thiols with N-ethylmaleimide (NEM) and by maintaining an acidic pH throughout the sample preparation process, the latter of which includes precipitating freshly prepared protein samples with trichloroacetic acid (TCA) as early as possible and carrying out all protease digestions at pH 6.5 (Fig. 1). Second, to identify all the disulfide bonds of a protein, multiple proteases are utilized, even non-specific ones such as proteinase K. This is because some disulfide bonds may be present in a complex form that is difficult to identify if the sample is digested only with Lys-C and trypsin (Fig. 2). Last and most important, a data analysis program called pLink-SS has been developed and carefully tuned to identify disulfide-bonded peptides from HCD spectra. The types of disulfide-bonded peptides that can be identified using pLink-SS are shown in Fig. 3. Presently, pLink-SS has been incorporated into pLink 2, which is an upgraded version of pLink and remains free for academic users. pLink 2 is ~40 times faster than pLink, with a friendly graphical interface and some further improvements in accuracy. pLink 2 was officially released on January 1, 2018 and can be downloaded at http://pfind.ict.ac.cn/software/pLink/.
In this paper, we present a step-by-step disulfide mapping protocol using the method we developed in 2015. As shown in Fig. 4, this protocol contains three alternative sub-protocols that are each optimized for low-complexity samples in solution, protein gel bands, or high-complexity samples.

• Refrigerated bench top centrifuge (Eppendorf)
• SpeedVac™ concentrator (Fisher Scientific)
• A common laboratory oven or dryer
• Laser-based micropipette puller, model P-2000 (Sutter Instrument)
• Column packing setup, consisting of a pressure injection cell (also known as pressure loading cell or bomb loader) model PC77-MAG (Next Advance), a high-purity nitrogen gas canister (from a local supplier) fitted with a high-pressure regulator, and a stainless steel 1/8-inch diameter tubing that connects the regulator to the pressure injection cell. The last two items are parts of a column packing kit (Next Advance)

LC-MS system
• Easy-nLC 1000 liquid chromatography system (Thermo Fisher Scientific) or a similar nano-flow (200-600 nl/min) HPLC system with an autosampler
• Q-Exactive™ Q-Orbitrap mass spectrometer (Thermo Fisher Scientific) or a similar fast-scanning, high-resolution, accurate-mass MS instrument that can collect ten or more high-resolution (R > 7000) MS2 spectra per second

Do-it-yourself capillary chromatography columns
Skip this section if you choose to purchase pre-packed columns of equivalent properties.

RP analytical column with a spray tip
i. Cut a 50-cm-long 75-μm ID fused silica tubing, burn the center segment (2-3 cm) over an ethanol burner, and wipe the blackened coating off the tubing with a sheet of Kimwipe moistened with methanol.
iii. Dismount the two empty columns each with a pulled tip.
iv. In a tube or a glass vial that will fit inside the pressure injection cell, add a small amount (roughly the size of a grain of millet) of Welch UHPLC-XB-C18 resin into methanol and make a slurry.
v. Using the column packing setup, pack the 3-μm Luna C18 resin into a 75-μm ID analytical column with a pulled tip for a length of ~2 cm, then switch the resin to 1.8-μm Welch UHPLC-XB-C18, and pack another 10-12 cm. The total length of the reverse phase is 13 ± 1 cm.
vi. Condition the column by running an RP gradient through it and ending with a Buffer A wash.

RP trap column
i. Take from the Frit kit 60 μl Kasil-1624 and 20 μl Kasil-1, mix well in a small tube, then add 20 μl formamide and mix well.
ii. Cut a 20-cm-long 75-μm ID fused silica tubing, dip one end into the mixture just made, and then pull out immediately. Inspect the column for the appearance of a segment of liquid inside.
iii. Place the column in an oven at 100°C for 4-12 h to obtain a porous frit at one end of the column. Before polymerization, handle the column with great care to avoid displacing the liquid away from the end. Keep fritted columns at RT for long-term storage.
iv. Before packing a column, cut the frit end with a tubing cutter so only 1-2 mm of frit is left.
v. In a tube or a glass vial that will fit inside the pressure injection cell, add a small amount (roughly the size of a grain of millet) of Luna C18 resin into methanol and make a slurry.
vi. Using the column packing setup, pack the 10-μm YMC*GEL C18 resin into an empty, 75-μm ID column against the frit for a length of 7 ± 1 cm.
vii. Condition the trap column by passing Buffer A through it.

SCX fractionation column
i. Take from the Frit kit 60 μl Kasil-1624 and 20 μl Kasil-1, mix well in a small tube, then add 20 μl formamide and mix well.
ii. Cut a 25-cm-long 200-μm ID fused silica tubing, dip one end into the mixture just made, and then pull out immediately. Inspect the column for the appearance of a segment of liquid inside.
iii. Place the column in an oven at 100°C for 4-12 h to obtain a porous frit at one end of the column. Before polymerization, handle the column with great care to avoid displacing the liquid away from the end. Keep fritted columns at RT for long-term storage.
iv. Before packing a column, cut the frit end with a tubing cutter so only 1-2 mm of frit is left.
v. In a tube or a glass vial that will fit inside the pressure injection cell, add a small amount (roughly the size of a grain of millet) of Luna SCX resin into methanol and make a slurry.
vi. Using the column packing setup, pack the SCX resin into an empty column against the frit for a length of 2-3 cm.
vii. Pass Buffer A through the column to pack the SCX segment more tightly.
viii. Change the packing material to 3-μm Luna C18 resin and pack an RP segment of 2-3 cm.
ix. Condition the SCX column by passing Buffer A through it.

SOFTWARE
• Computer workstation with Microsoft Windows 7 or a newer operating system
• .NET Framework 4.5
• MSFileReader, both 32-bit and 64-bit versions, for pLink 2 to access the raw file
• pLink 2, version 2.3.0
• Python 3.6

REAGENT SETUP

1 mol/L NEM stock solution
Dissolve 125.1 mg of solid N-ethylmaleimide in 1 ml of 100% ACN, dispense into 4-μl aliquots, store in a desiccator at -20°C, and use within two months. Each aliquot is for a single use.

8 mol/L urea in 100 mmol/L Tris pH 6.5
Dissolve 240 mg of solid urea in 100 μl of 0.5 mol/L Tris at pH 6.5 and 220 μl of H2O. Prepare the solution fresh each time to minimize urea degradation and subsequent carbamylation of proteins or peptides.
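The two recipes above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the molar masses (125.13 g/mol for NEM, 60.06 g/mol for urea) are standard values that the protocol does not state, the function names are hypothetical, and the final volume of the urea solution is inferred from the stated Tris dilution rather than measured.

```python
# Molar masses in g/mol (standard values, assumed; not given in the protocol).
NEM_MW = 125.13
UREA_MW = 60.06


def molarity(mass_mg, mw, volume_ml):
    """Concentration in mol/L for mass_mg of solute in a final volume of volume_ml."""
    return (mass_mg / mw) / volume_ml  # mg / (g/mol) = mmol; mmol / ml = mol/L


def final_volume_ul(stock_molar, stock_ul, target_molar):
    """Final volume implied when stock_ul of stock ends up at target_molar."""
    return stock_molar * stock_ul / target_molar


# NEM stock: 125.1 mg in 1 ml of ACN.
nem_molar = molarity(125.1, NEM_MW, 1.0)

# Urea buffer: 100 ul of 0.5 mol/L Tris ending at 100 mmol/L fixes the final volume.
buffer_ul = final_volume_ul(0.5, 100.0, 0.1)
urea_molar = molarity(240.0, UREA_MW, buffer_ul / 1000.0)

print(f"NEM stock: {nem_molar:.2f} mol/L")
print(f"Urea buffer: {buffer_ul:.0f} ul final volume, {urea_molar:.2f} mol/L urea")
```

The added liquids total only 320 μl, so under this reading the remaining volume comes from the dissolved urea itself.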

Stock solutions of proteases
Prepare 0.5 μg/μL trypsin or a different protease in H2O or a stock buffer specified by the vendor, dispense into aliquots of 10 μl or less, and store at -80°C. Ideally, each aliquot is for a single use.

LC-MS
On Q-Exactive, generate an MS method as follows: spray voltage 2.0-2.3 kV; data-dependent mode; full scan resolution 140,000; MS2 scan resolution 17,500; isolation window 2.0 m/z; AGC target 1e6 for FTMS full scan and 5e4 for MS2; minimal signal threshold for MS2 4e4; normalized collision energy 27%; peptide match preferred; HCD spectra collected for the ten most intense precursors carrying +3, +4, …, or +7 charges; dynamic exclusion 60 s. To increase the identification of loop-linked disulfide bonds, a technical repeat run that also includes +2 precursors is recommended. For this run, generate a second MS method that is the same as above except that +2 precursors are not excluded.
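The difference between the two MS methods comes down to which charge states are eligible for fragmentation. The following toy model illustrates that selection rule only; it is not instrument method code, it ignores dynamic exclusion and the signal threshold, and the function name and the example scan are made up.

```python
def select_precursors(precursors, include_2plus=False, top_n=10):
    """Pick the top_n most intense precursors with an allowed charge state.

    precursors: list of (mz, charge, intensity) tuples from one full scan.
    include_2plus: False mimics the first method (+3 to +7 only),
                   True mimics the repeat run that also accepts +2.
    """
    min_charge = 2 if include_2plus else 3
    allowed = [p for p in precursors if min_charge <= p[1] <= 7]
    allowed.sort(key=lambda p: p[2], reverse=True)  # most intense first
    return allowed[:top_n]


# Hypothetical full scan: (m/z, charge, intensity).
scan = [
    (500.1, 2, 9e5),
    (620.3, 3, 5e5),
    (710.4, 4, 7e5),
    (450.2, 1, 1e6),
    (800.5, 8, 8e5),
]

first_run = select_precursors(scan)
repeat_run = select_precursors(scan, include_2plus=True)
print([p[1] for p in first_run], [p[1] for p in repeat_run])
```

In this example the +2 ion is the most intense eligible precursor in the repeat run but is never fragmented in the first run.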
On Easy-nLC 1000 UHPLC, set the sample loading and RP gradient method as follows. For each sample, 0.5 μg of digested peptides are loaded onto the trap column at 1 μl/min and desalted with 10 μl of Buffer A. The peptides are separated through a 100-min linear gradient from 100% Buffer A to 30% Buffer B and then going up to 100% Buffer B in 1 min, followed by a 3-min 100% Buffer B wash before returning to 100% Buffer A in 2 min and maintaining at 100% Buffer A for 4 min. Set the flow rate at 250 nl/min.
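Written out as a time table, the gradient above looks as follows. This is a minimal sketch for checking the method by hand, not an export format for the instrument software; the segment list is transcribed from the paragraph above and the helper names are illustrative.

```python
# (segment duration in min, %B at the end of the segment), from the text above.
SEGMENTS = [
    (100, 30),  # linear gradient from 0% to 30% Buffer B
    (1, 100),   # ramp to 100% Buffer B
    (3, 100),   # 100% Buffer B wash
    (2, 0),     # return to 100% Buffer A
    (4, 0),     # hold at 100% Buffer A
]
FLOW_NL_PER_MIN = 250


def gradient_table(segments, start_percent_b=0):
    """Convert segment durations into cumulative (time in min, %B) points."""
    table = [(0, start_percent_b)]
    elapsed = 0
    for duration, percent_b in segments:
        elapsed += duration
        table.append((elapsed, percent_b))
    return table


def percent_b_at(table, t):
    """Linearly interpolate %B at time t (min)."""
    for (t0, b0), (t1, b1) in zip(table, table[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    raise ValueError("time outside the method")


table = gradient_table(SEGMENTS)
total_min = table[-1][0]
solvent_ul = total_min * FLOW_NL_PER_MIN / 1000  # nl to ul

for time_min, percent_b in table:
    print(f"{time_min:>4} min  {percent_b:>3}% B")
print(f"total {total_min} min, {solvent_ul} ul of mobile phase")
```

The segments add up to a 110-min method per injection.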
[CRITICAL] A license is required to run pLink 2 the first time. To receive the license file, send an e-mail to pLink@ict.ac.cn. Follow the instructions during installation. The pLink 2 installation package includes pParse, an MS data conversion tool, and pLabel, a very convenient and powerful tool for annotating peaks in an MS2 spectrum. Once pLink 2 is installed, pParse and pLabel are ready to use.
[CRITICAL] Here we provide a Python script to organize the pLink 2 search results and generate a simple and informative report file. To run this script, Python 3.6 and the packages openpyxl and xlrd are required.

In-solution digestion (low-complexity samples) [TIMING~1 day]
Note: Low-complexity samples refer to a purified protein or a protein complex of no more than 50 subunits.
(1) Determine the concentration of a freshly purified protein sample using a BCA protein assay kit. Do not rely on OD280 measurements.
(2) Based on the sequences of the proteins in question and the amount of proteins available, decide which proteases to use and how many digestions to carry out. Common choices are Lys-C/trypsin, Lys-C/trypsin/Glu-C, Lys-C/Asp-N, and Lys-C/trypsin/Asp-N; additional options include Lys-C/elastase, subtilisin, and proteinase K.
(3) For each digestion, take 4 μg of a freshly prepared protein sample and precipitate with 25% TCA. Specifically, add 1/3 volume of 100% TCA, mix well, and leave on ice for 30 min to overnight. Spin at 4°C in a bench top centrifuge at top speed for 30 min to pellet proteins, wash with 0.5 ml cold acetone twice, and air dry the pellet.
(4) Dissolve 4 μg of the freshly precipitated protein sample in 10 μl of 8 mol/L urea, 100 mmol/L Tris, pH 6.5; add NEM to a final concentration of 2 mmol/L; and incubate at 37°C for 2 h.
[CRITICAL] NEM can gradually hydrolyze in water and lose activity, so 15 min before use, transfer a frozen aliquot of 1 mol/L NEM to a desiccator at RT.
[? TROUBLESHOOTING]
(5) Digest the protein(s) with one or more proteases of choice. The digestion conditions of various proteases are listed in Table 2. If two or more proteases are to be combined in one digestion, perform the digestion sequentially to reduce mutual digestion between proteases. For example, if a sample is to be digested with Lys-C/trypsin/Glu-C, digest with Lys-C first in 8 mol/L urea at 37°C for 2 h; then dilute to 2 mol/L urea with 30 μl of 100 mmol/L Tris, pH 6.5, add trypsin, and incubate at 37°C for 12 h; and lastly, dilute to 1 mol/L urea with 40 μl of 100 mmol/L Tris, pH 6.5, add Glu-C, and incubate at 37°C for 12 h.
[CRITICAL] As disulfide-linked proteins are often resistant to proteases, digestion in the presence of a high concentration of denaturant is necessary. Therefore, Lys-C digestion in 8 mol/L urea usually precedes other proteases except for subtilisin and proteinase K, both of which have activity high enough for digesting any protein, even at pH 6.5. Avoid under- or over-digestion.
[? TROUBLESHOOTING]
(6) To remove glycosylation, which may interfere with data analysis, add PNGase F (112 NEB units per 6 μg of proteins) to the digest and incubate at 37°C for 2 h.
[CRITICAL] Many secreted proteins are glycosylated and disulfide-linked. Unexpected glycosylation prevents identification of disulfide bonds. PNGase F retains its activity in 2 mol/L urea or 1 mol/L GndCl but loses most of its activity in 2 mol/L GndCl, so dilute the proteinase K digest with an equal volume of 100 mmol/L Tris, pH 6.5 before adding PNGase F.
(7) Quench the reaction by adding 90% FA to a 5% final concentration. For LC-MS analysis of a low-complexity sample, load 0.2-0.5 μg of protein for a single run.
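The volume bookkeeping across steps (4) to (7) is easy to get wrong, so a minimal sketch of it follows for the Lys-C/trypsin/Glu-C example. It assumes that the volumes of added enzyme, NEM, and PNGase F are negligible, which the protocol does not state, and the function names are illustrative.

```python
def dilute(volume_ul, conc, added_ul):
    """Return (new volume, new concentration) after adding solute-free buffer."""
    new_volume = volume_ul + added_ul
    return new_volume, conc * volume_ul / new_volume


def stock_to_add(volume_ul, stock_percent, target_percent):
    """Volume of stock needed so that the mixture reaches target_percent (v/v)."""
    if not 0 < target_percent < stock_percent:
        raise ValueError("target must lie between 0 and the stock concentration")
    return volume_ul * target_percent / (stock_percent - target_percent)


def pngase_units(protein_ug, units_per_6_ug=112.0):
    """Scale the PNGase F dose stated in step (6) to the protein amount."""
    return protein_ug * units_per_6_ug / 6.0


# Step (4): 4 ug of protein in 10 ul of 8 mol/L urea; Lys-C digestion happens here.
volume, urea = 10.0, 8.0
# Step (5): add 30 ul of Tris before trypsin, then 40 ul before Glu-C.
volume, urea = dilute(volume, urea, 30.0)
urea_for_trypsin = urea
volume, urea = dilute(volume, urea, 40.0)
urea_for_glu_c = urea
# Step (6): PNGase F dose for 4 ug of protein.
units = pngase_units(4.0)
# Step (7): 90% FA needed to reach 5% in the final digest.
fa_ul = stock_to_add(volume, 90.0, 5.0)

print(f"urea: {urea_for_trypsin:g} mol/L for trypsin, {urea_for_glu_c:g} mol/L for Glu-C")
print(f"final digest volume: {volume:g} ul")
print(f"PNGase F: {units:.0f} NEB units; 90% FA to add: {fa_ul:.1f} ul")
```

Under these assumptions the stated buffer additions reproduce the 2 mol/L and 1 mol/L urea targets exactly, and the quench volume follows from the 80-μl final digest.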
[PAUSE POINT] The samples can be stored up to several weeks at -20°C or -80°C before LC-MS analysis.

In-solution digestion (high-complexity samples) [TIMING 1-2 days]
Note: High-complexity samples refer to whole-cell lysates, subcellular fractions, or crude immunoprecipitated proteins that contain hundreds or thousands of proteins.
(1) Precipitate 30 μg of a freshly prepared protein sample with 25% TCA and wash with cold acetone as described above. For details, see "In-solution digestion (low-complexity samples)".
(2) Dissolve precipitated proteins in 25 μl of 8 mol/L urea, 100 mmol/L Tris, pH 6.5; add NEM to a final concentration of 2 mmol/L; and incubate at 37°C for 2 h.
[CRITICAL] Avoid using proteases of poor specificity to digest high-complexity samples as it will lead to search space explosion in data analysis, reducing the speed and sensitivity of identification.

PROTOCOL
In our experience, adding Glu-C on top of Lys-C/trypsin digestion significantly increases the number of disulfide bond identifications. If desired, Lys-C/trypsin/Asp-N is another option.
(4) Quench the digestion by adding FA to a final concentration of 5%.
(5) Using a pressure injection cell, load digested peptides onto an SCX fractionation column. For better flow-rate control, connect the fritted end of the SCX column to an empty 75-μm ID, 360-μm OD fused silica tubing (with a pulled tip if necessary) with a MicroTight union. Wash the column with 15 μl of 0.1% FA, followed by 15 μl of 80% ACN, 0.1% FA, then 10 μl of Buffer A, at a flow rate of 1 μl/min. Now the peptides are bound to the SCX resin and ready to be fractionated.
(6) Elute sequentially with 20 μl of 5% Buffer B (5% ACN, 0.1% FA) containing, in turn, 25, 50, 75, 100, 500, and 1000 mmol/L ammonium acetate, pH 2-3, at a flow rate of 1.0-2.0 μl/min. Collect each of the six fractions into an Eppendorf tube. Load one-fifth of each fraction for a subsequent reverse-phase LC-MS run.
[PAUSE POINT] The samples can be stored up to several weeks at -20°C or -80°C before LC-MS analysis.

In-gel digestion (protein bands of interest) [TIMING 1-2 days]
[CRITICAL] To maintain protein disulfide bonds during SDS-PAGE analysis, reducing reagents are forbidden and 20 mmol/L NEM must be present in the sample loading buffer.
(1) Excise the gel band of interest and dice into 1 mm³ pieces.
(2) Destain the gel with 50% methanol and wash with ddH2O twice. Then dehydrate with 100% acetonitrile.
(3) Rehydrate into 100 mmol/L Tris, pH 6.5 containing 0.5 mmol/L NEM, 5 ng/μL Lys-C, and 10 ng/μL of another protease of choice (trypsin, Glu-C, or Asp-N).
(4) Digest for 12 h at 37°C.
(5) Extract peptides with 50-100 μl of extraction Buffer I (50% ACN, 0.5% FA) and then with 50-100 μl of extraction Buffer II (75% ACN, 1% FA).
(6) Concentrate the sample to 4-8 μl in a SpeedVac™ concentrator at 2.5 Torr, RT. If the sample dries out accidentally, reconstitute it with 6 μl of 0.1% FA, 1% ACN. Calculate or estimate the amount of protein in the gel band and prepare to load about 0.2 μg for LC-MS analysis. If this is not practical, load 1/5 of the sample.
[PAUSE POINT] The samples can be stored up to several weeks at -20°C or -80°C before LC-MS analysis.

LC-MS ANALYSIS [TIMING~1-30 h, DEPENDING ON SAMPLE COMPLEXITY]
[CRITICAL] Each sample is to be analyzed twice: reject 2+ precursor ions in one run (to increase identification of inter-linked peptides) but not in the other (to increase identification of loop-linked peptides).
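Because every sample is injected twice, instrument time scales quickly with the number of fractions. The rough estimate below counts gradient time only, using the segment lengths of the LC method described earlier (100 + 1 + 3 + 2 + 4 min); sample loading and other overhead are not counted, and the function name is illustrative.

```python
RUN_MIN = 100 + 1 + 3 + 2 + 4  # gradient segments of the LC method, in minutes


def instrument_hours(n_samples, runs_per_sample=2, run_min=RUN_MIN):
    """Gradient time in hours for n_samples, each analyzed runs_per_sample times."""
    return n_samples * runs_per_sample * run_min / 60


single_digest = instrument_hours(1)  # one in-solution or in-gel digest
scx_fractions = instrument_hours(6)  # six SCX fractions of a high-complexity sample

print(f"single digest: {single_digest:.1f} h")
print(f"six SCX fractions: {scx_fractions:.1f} h")
```

This gives about 3.7 h for a single digest and 22 h for six SCX fractions, both within the 1-30 h range quoted for this stage.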
(1) Connect the trap column and the analytical column to an Easy-nLC 1000 UHPLC according to the 2-column setup scheme. Cut the tail end of each column to reduce dead volumes.

Import .raw files and set up the extraction parameters (Fig. 5)
[CRITICAL] Although pLink 2 can import ".mgf" files, we recommend the use of ".raw" files as input. The ".mgf" files extracted using other software tools may not be supported by pLink 2, and ".ms2" files are not allowed.
(10) Click the "Add" button and choose the input file(s).
(11) Make sure to uncheck "Mixture Spectra". The default setting is on, but for disulfide bond identification it is better to turn it off.

Set up the identification parameters (Fig. 6)
(12) Click to switch to the "Identification" panel.
[CRITICAL] We recommend that you append the database of common contaminant proteins stored in pLink 2 to your "fasta" database file.
(16) Select the enzymes that were used to digest the samples, for example, "Trypsin" for Lys-C/trypsin digestion, "Glu-C.Trypsin" for Lys-C/trypsin/Glu-C digestion, and "non-specific" for Lys-C/elastase or proteinase K digestion.

RESULT ANALYSIS
(1) Rank the identification results by the cysteine sites. If a cysteine residue is found to form disulfide bonds with more than one cysteine residue, it could be a result of disulfide scrambling or false identification. Compare the E values, spectral counts, and intensities of these potentially conflicting identification results to try to distinguish a native disulfide bond from scrambled ones. This is based on the assumption that the native disulfide bond is the major form, so it should have higher signal intensity, higher spectral counts, and a smaller E value, which indicates a higher confidence in identification.

Fig. 9 Find a representative spectrum of a disulfide-linked peptide or peptide pair for manual inspection
(2) For purified proteins, such as pharmaceutical protein drugs, it is often required to map all the disulfide bonds in a protein. When some cysteines are missing from the disulfide identification results, one possibility is that they exist as free cysteines and the other is that they form disulfide bonds that have escaped identification. For the first possibility, free cysteines are expected to be modified by NEM, so the corresponding modified linear peptides should be identifiable using conventional database search engines such as pFind. To find out whether the second possibility is true, we recommend reduction of disulfide bonds followed by alkylation and a conventional database search to identify linear peptides; the cysteine-containing linear peptides can then be compared with those identified from the matched, non-reduced sample. If a disulfide bond has escaped identification, first consider that the disulfide-containing peptide(s) may be too long or too short. In that case, choose proteases that will generate, for the cysteine residues in question, peptides of 6-20 (or better, 8-16) amino acids. Also, there may be an unknown modification near the missing disulfide bond. In this case, infer what it might be by sequence analysis with the help of modification prediction software, or use open search tools such as pFind 3.0 or Peaks to find possible modifications. Then, add the variable modification in the pLink 2 search and see if it helps. Lastly, manual spectrum interpretation may help, but it requires a lot of time and experience.
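The comparison described in step (1) can be sketched in a few lines. The record layout below (keys `site_a`, `site_b`, `e_value`, `spectra`, `intensity`) is hypothetical and is not the pLink 2 report format, the example values are made up, and the ordering (E value first, then spectral count, then intensity) is just one simple way to combine the three criteria.

```python
from collections import defaultdict


def conflicting_sites(bonds):
    """Map each cysteine site with more than one partner to its bonds, best first."""
    by_site = defaultdict(list)
    for bond in bonds:
        by_site[bond["site_a"]].append(bond)
        if bond["site_b"] != bond["site_a"]:
            by_site[bond["site_b"]].append(bond)

    conflicts = {}
    for site, candidates in by_site.items():
        partners = {
            b["site_a"] if b["site_b"] == site else b["site_b"] for b in candidates
        }
        if len(partners) > 1:
            # Smaller E value, then more spectra, then higher intensity.
            conflicts[site] = sorted(
                candidates,
                key=lambda b: (b["e_value"], -b["spectra"], -b["intensity"]),
            )
    return conflicts


# Made-up identifications for illustration.
bonds = [
    {"site_a": "P1:C26", "site_b": "P1:C84", "e_value": 1e-12, "spectra": 35, "intensity": 4.0e8},
    {"site_a": "P1:C26", "site_b": "P1:C110", "e_value": 1e-4, "spectra": 2, "intensity": 3.0e6},
    {"site_a": "P1:C58", "site_b": "P1:C110", "e_value": 1e-10, "spectra": 20, "intensity": 2.0e8},
]

conflicts = conflicting_sites(bonds)
for site, ranked in conflicts.items():
    best = ranked[0]
    print(site, "-> most likely native:", best["site_a"], "-", best["site_b"])
```

Sites that appear in the result still need manual judgment; the ranking only puts the candidate most consistent with the native-bond assumption first.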
[? TROUBLESHOOTING] Troubleshooting advice can be found in Table 3.
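When choosing a protease for a cysteine whose disulfide bond has escaped identification, a quick in-silico digestion shows whether the resulting peptide falls within the 6-20 residue range recommended in the result analysis. The sketch below uses the commonly applied simplification of trypsin specificity (cleavage after K or R unless followed by P) with no missed cleavages; both the rule and the example sequence are assumptions for illustration, and other proteases would need their own rules.

```python
def digest(sequence, cleave_after="KR", blocked_by="P"):
    """Fully digest a sequence; return (1-based start position, peptide) pairs."""
    peptides, start = [], 0
    for i in range(len(sequence) - 1):
        if sequence[i] in cleave_after and sequence[i + 1] not in blocked_by:
            peptides.append((start + 1, sequence[start:i + 1]))
            start = i + 1
    peptides.append((start + 1, sequence[start:]))
    return peptides


def cysteine_peptides(sequence, min_len=6, max_len=20, **rules):
    """List (Cys position, peptide, length within range?) for every cysteine."""
    report = []
    for start, peptide in digest(sequence, **rules):
        for offset, residue in enumerate(peptide):
            if residue == "C":
                in_range = min_len <= len(peptide) <= max_len
                report.append((start + offset, peptide, in_range))
    return report


# Made-up sequence with two cysteines.
sequence = "AKCDEFGHIKLCRMNPQR"
for position, peptide, ok in cysteine_peptides(sequence):
    status = "ok" if ok else "outside 6-20 residues, consider another protease"
    print(f"C{position}: {peptide} ({len(peptide)} aa) {status}")
```

Passing `min_len=8, max_len=16` applies the stricter range mentioned in the text.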
Human and animal rights and informed consent
This article does not contain any studies with human or animal subjects performed by any of the authors.

Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.