Software Lock Mass by Two-Dimensional Minimization of Peptide Mass Errors
Mass accuracy is a key parameter in proteomic experiments, improving the specificity and success rate of peptide identification. Advances in instrumentation now make it possible to routinely obtain high resolution data in proteomic experiments. To compensate for drifts in instrument calibration, a compound of known mass is often employed. This ‘lock mass’ provides an internal mass standard in every spectrum. Here we take advantage of the complexity of typical peptide mixtures in proteomics to eliminate the requirement for a physical lock mass. We find that mass scale drift is primarily a function of the m/z and the elution time dimensions. Using a subset of high confidence peptide identifications from a first pass database search, which effectively substitute for the lock mass, we set up a global mathematical minimization problem. We perform a simultaneous fit in two dimensions using a function whose parameterization is automatically adjusted to the complexity of the analyzed peptide mixture. Mass deviation of the high confidence peptides from their calculated values is then minimized globally as a function of both m/z value and elution time. The resulting recalibration function performs as well as or better than adding a lock mass from laboratory air to LTQ-Orbitrap spectra. This ‘software lock mass’ drastically improves mass accuracy compared with mass measurement without lock mass (up to 10-fold), with none of the experimental cost of a physical lock mass, and it is integrated into the freely available MaxQuant analysis pipeline (www.maxquant.org).
Key words: Mass accuracy, MaxQuant, Proteomics, Database search, Orbitrap, Peptide mass measurement, Lock mass
Mass spectrometry (MS)-based proteomics [1, 2, 3] greatly benefits from high resolution and high mass accuracy measurements. For example, resolving co-eluting peptides of similar mass is a prerequisite for their accurate quantification, and high accuracy measurement of peptide masses greatly aids in their identification by providing stringent filters on possible candidates. Several definitions of mass accuracy are commonly used, and this important parameter is often only assessed anecdotally. In proteomics, the operationally important definition is the best mass estimate from the MS measurement together with a statistical confidence interval. This interval can then be used as the basis for setting a permissible mass deviation window for peptide identification in databases. Such confidence intervals can be assigned to each peptide separately. They are obtained from the measured values from consecutive scans and isotope states, weighted by the signal for each data point. We have previously described principles of extracting these mass values from large scale data sets and implemented the corresponding algorithms in the MaxQuant computational proteomics analysis pipeline [4, 6]. As a result of applying these computational algorithms, peptide mass accuracies are frequently improved to the sub-ppm range. This makes the precursor mass value an important search parameter and allows a corresponding drop in the required quality of the MS/MS spectra while still maintaining a 1% false discovery rate for peptide identifications.
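The signal-weighted mass estimate described above can be illustrated with a minimal sketch. The exact weighting and confidence interval used in MaxQuant are not given in this text, so the intensity-weighted mean and weighted standard deviation below are assumptions, and the function names are hypothetical.

```python
import math

def weighted_mass_estimate(masses, intensities):
    """Intensity-weighted mean mass and weighted standard deviation.

    Assumption: each data point (one scan / isotope state) is weighted by
    its signal intensity; the real pipeline may weight differently.
    """
    total = sum(intensities)
    mean = sum(m * w for m, w in zip(masses, intensities)) / total
    var = sum(w * (m - mean) ** 2 for m, w in zip(masses, intensities)) / total
    return mean, math.sqrt(var)

def ppm_error(measured, theoretical):
    """Relative mass deviation in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

# Two equally intense measurements of the same peptide mass.
mean, spread = weighted_mass_estimate([1000.000, 1000.002], [1.0, 1.0])
deviation = ppm_error(mean, 1000.000)
```

A tolerance window for the database search could then be derived from `spread`, scaled by whatever confidence level is wanted.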
A precondition for the above analysis was the elimination of the systematic mass drift by using a lock mass [7, 8]. A lock mass is a defined compound of known composition that is added to the MS analysis. Some instruments feature a separate electrospray source, which is used to spray the reference compound [9, 10]. Alternatively, the reference compound could be mixed into the analyte directly, but this has disadvantages because the compound may interfere with analysis of low abundance samples or it may not be detectable in high abundance samples. In electrospray, charged droplets are formed in laboratory air and analyte ions are desorbed from them. However, these charged droplets can also absorb and ionize background chemicals that are always present in laboratory air [11, 12]. On the LTQ-Orbitrap family of instruments these ions, specifically polycyclodimethylsiloxanes, can be separately isolated in the linear ion trap and injected into the C-trap, which is an intermediate storage trap. In the C-trap the lock mass ions are mixed with the MS or MS/MS ions to be analyzed and co-injected into the Orbitrap analyzer. The ion is recognized by the data system in real time and the mass scale is automatically adjusted. While this procedure is sufficiently fast to be routinely applicable in proteomic experiments, there is some time requirement for isolating the lock mass ions, adding to overall MS and MS/MS cycle times. In addition, it can often be desirable to suppress background ions in laboratory air (e.g., by the ABIRD device: www.esisourcesolutions.com). This has the side effect that the lock mass is no longer available. For these reasons, an alternative to the lock mass would be beneficial.
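As a rough illustration of what a single-point lock mass adjustment does, the sketch below rescales all m/z values in a spectrum by one factor derived from the lock mass ion. The purely multiplicative correction is an assumption made for illustration, not a description of the instrument firmware; the function name is hypothetical, and 445.1200 Th is the PCM-6 value quoted in the LC-MS/MS methods of this paper.

```python
LOCK_MASS_MZ = 445.1200  # protonated PCM-6, as quoted in this paper (Th)

def lock_mass_correct(mz_values, observed_lock_mz, true_lock_mz=LOCK_MASS_MZ):
    """Rescale a spectrum so the observed lock mass lands on its known m/z.

    Assumption: one multiplicative factor per spectrum. A single reference
    point cannot capture any m/z-dependent nonlinearity.
    """
    scale = true_lock_mz / observed_lock_mz
    return [mz * scale for mz in mz_values]

# The lock mass reads about 0.9 ppm too high in this made-up spectrum.
corrected = lock_mass_correct([445.1204, 890.2408], observed_lock_mz=445.1204)
```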
In proteomics experiments, typically hundreds or thousands of peptides are identified in every LC-MS/MS run. Many of these peptides have very information-rich MS/MS spectra, and they can be unambiguously identified even with large mass tolerances. We have previously made use of this fact by implementing a two-pass search, where the top identified peptides serve as mass references for calibration. However, this recalibration was done globally for the entire LC-MS run, was only applicable to time of flight data, and did not attempt to reach sub-ppm mass accuracy. For Orbitrap data, the simple mass scale adjustments would not be applicable. In this paper we set out to develop algorithms to replace the physical lock mass with a software algorithm that performs at least as well in global recalibration of Orbitrap data.
2.1 Protein Digestion
Total HeLa cell lysate was treated with a urea (6 M) and thiourea (2 M) solution followed by reduction with dithiothreitol (DTT) (1 mM) for 30 min and alkylation with iodoacetamide (IAA) (55 mM) for 20 min at room temperature. The proteins were digested with Lys-C (1 μg/50 μg protein) (Wako, Neuss, Germany) for 3 h at room temperature. The mixture was diluted with water (1:4) before incubation with trypsin (1 μg/50 μg protein) (Promega, Mannheim, Germany) for 12 h at room temperature. The digestion was stopped by addition of formic acid (3%) and the samples stored on StageTips.
2.2 LC-MS/MS Analysis
The peptide mixture was loaded onto a C18-reversed phase column (15 cm long, 75 μm i.d.) that was packed in-house with ReproSil-Pur C18-AQ 3 μm resin (Dr. Maisch) in buffer A (0.5% acetic acid). The peptide mixture was separated with a linear gradient of 5%–60% buffer B (80% ACN and 0.5% acetic acid) at a flow rate of 250 nL/min on a nanoflow HPLC (Proxeon Easy HPLC; Thermo Fisher Scientific). On-line coupling of the HPLC system to an LTQ Orbitrap XL mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) was achieved using a nanoelectrospray ion source (Proxeon Biosystems, now Thermo Fisher Scientific). Data were acquired in a data-dependent ‘top5’ format, selecting the most abundant precursor ions from the survey scan (mass range 300–1650 Th) in order to isolate them in the linear ion trap and fragment them by CID with a normalized collision energy of 35 eV. Survey scans were acquired with a resolution of 60,000 at m/z 400 and with a target value of 10^6 in the Orbitrap analyzer. The MS/MS scans were acquired with unit mass resolution in the LTQ using 3000 as target value. Dynamic exclusion was defined by a list size of 500 features and exclusion duration of 90 s. Early expiration was set to expiration count 3 and S/N threshold 3. The lower threshold for targeting a precursor ion in the MS scans was 1000 counts. Three technical replicates were acquired without using the lock mass option in Xcalibur. In three separate technical replicates protonated polycyclodimethylsiloxane (PCM-6) with exact m/z 445.1200 Th was selected as lock mass for the measurement.
Data were analyzed by MaxQuant using the Andromeda search engine. The IPI human database (containing 87,061 entries), combined with 262 common contaminants and concatenated with the reversed versions of all sequences, was used for peptide identification. Enzyme specificity was set to trypsin, allowing cleavage N-terminal to proline. Further modifications were cysteine carbamidomethylation (fixed) as well as protein N-terminal acetylation and methionine oxidation (variable).
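The concatenation of a database with the reversed versions of all its sequences can be sketched as follows. This is a minimal illustration, not the MaxQuant/Andromeda implementation; the function name, the `REV_` prefix, and the toy entry are hypothetical.

```python
def add_reversed_decoys(proteins, prefix="REV_"):
    """Return the target entries plus one reversed decoy per entry.

    `proteins` maps an identifier to its amino acid sequence. The decoy
    label `prefix` is a placeholder chosen for this sketch.
    """
    combined = dict(proteins)
    for name, sequence in proteins.items():
        combined[prefix + name] = sequence[::-1]
    return combined

# Toy database with a single made-up entry.
database = add_reversed_decoys({"P1": "MKTAYR"})
```

Hits against the decoy entries then give an estimate of how many target hits are false at a given score threshold.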
2.3 Computational Methods
Data were analyzed with the MaxQuant framework, which is written in C# in the Microsoft .NET environment. Algorithmic parts of MaxQuant are available as source code, and the entire program can also be freely downloaded from www.maxquant.org. Detailed instructions for installation and support programs are also available.
3 Results and Discussion
3.1 Time and m/z Dependence of the Mass Error
3.2 The Software Lock Mass Optimization Problem
Note that this equation does not assume linearity in any of the variables or parameters, but only that the contribution of each variable can be represented as a sum of nonlinear terms. The explicit form of the parameterization of the functions f and g is described below.
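The equation referred to here is not present in this version of the text. A form consistent with the surrounding description, given as an assumption rather than the authors' exact formula, models the mass error of high confidence peptide i as a sum of one function of m/z and one function of elution time, and minimizes the squared deviations over all N such peptides. The symbol \Delta m_i (measured minus calculated mass, e.g., in ppm) and the unweighted sum of squares are assumptions of this sketch.

```latex
\Delta m_i \;\approx\; f\!\left((m/z)_i\right) + g\!\left(t_i\right),
\qquad
\min_{f,\,g}\; \sum_{i=1}^{N}
\left[\, \Delta m_i - f\!\left((m/z)_i\right) - g\!\left(t_i\right) \right]^2
```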
For this purpose, the functions f and g have to be parameterized in a suitable way. We use piecewise linear functions for f and g. First, the x-positions of these functions are adapted to the data, and they are then treated as constant during the minimization. The number of x-positions and their exact location are chosen such that the number of degrees of freedom adjusts itself to the complexity of the data. Roughly speaking, the more data are available, the more complex the parameterizations of the functions can be. The x-positions are chosen such that there are at least 80 data points per x-position in the m/z direction and 50 data points per x-position in the time direction. Furthermore, the x-positions have to be at least 50 Th apart in the m/z direction. The numbers that are being determined during the optimization are the y-values at these fixed x-positions, typically several dozen or hundreds of coefficients. The functions f and g are linearly interpolated between these positions. One of the y-values has to be fixed to an arbitrary value since the system otherwise has a zero mode. The numerical solution of this minimization problem is obtained by the Levenberg-Marquardt method (see, e.g., reference [18] for an introduction). After the parameters of f and g have been determined, we can subtract the systematic mass error from the measured mass of each MS isotope pattern in the LC-MS run. Subsequently, the actual database search (second pass search) is performed with individualized peptide mass tolerances inside the MaxQuant framework as before.
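The parameterization described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MaxQuant code: all function names are hypothetical, the data are synthetic, and because a piecewise linear model is linear in its y-values, the sketch solves the normal equations directly instead of running Levenberg-Marquardt. The zero mode is removed by fixing the first y-value of g to 0.

```python
import bisect
import random

def choose_knots(xs, min_points, min_gap=0.0):
    """Knots such that each interval holds >= min_points data points and
    neighbouring knots are >= min_gap apart (needs two distinct x values)."""
    xs = sorted(xs)
    knots, count = [xs[0]], 0
    for x in xs:
        count += 1
        if count >= min_points and x > knots[-1] and x - knots[-1] >= min_gap:
            knots.append(x)
            count = 0
    if len(knots) == 1:
        knots.append(xs[-1])
    else:
        knots[-1] = xs[-1]  # stretch the last interval over leftover points
    return knots

def hat_weights(knots, x):
    """Linear-interpolation weights of x on its two neighbouring knots."""
    x = min(max(x, knots[0]), knots[-1])
    i = min(max(bisect.bisect_left(knots, x), 1), len(knots) - 1)
    w = (x - knots[i - 1]) / (knots[i] - knots[i - 1])
    return {i - 1: 1.0 - w, i: w}

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (M[c][n] - sum(M[c][k] * x[k] for k in range(c + 1, n))) / M[c][c]
    return x

def fit_additive(mzs, ts, errs, mz_knots, t_knots):
    """Least-squares y-values of f (on mz_knots) and g (on t_knots)."""
    nf, n = len(mz_knots), len(mz_knots) + len(t_knots) - 1
    A = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for mz, t, e in zip(mzs, ts, errs):
        row = dict(hat_weights(mz_knots, mz))
        for k, w in hat_weights(t_knots, t).items():
            if k > 0:  # g's first y-value is fixed at 0 (removes the zero mode)
                row[nf + k - 1] = w
        for j, wj in row.items():
            rhs[j] += wj * e
            for k, wk in row.items():
                A[j][k] += wj * wk
    y = solve(A, rhs)
    return y[:nf], [0.0] + y[nf:]

def predict(mz, t, mz_knots, fy, t_knots, gy):
    """Systematic mass error f(m/z) + g(t) to subtract from a measurement."""
    return (sum(w * fy[k] for k, w in hat_weights(mz_knots, mz).items())
            + sum(w * gy[k] for k, w in hat_weights(t_knots, t).items()))

# Synthetic "high confidence peptides" with a made-up drift (in ppm).
random.seed(0)
mzs = [random.uniform(300, 1650) for _ in range(800)]
ts = [random.uniform(0, 120) for _ in range(800)]
errs = [0.002 * (mz - 900) + 0.05 * (t - 60) for mz, t in zip(mzs, ts)]

mz_knots = choose_knots(mzs, min_points=80, min_gap=50.0)
t_knots = choose_knots(ts, min_points=50)
fy, gy = fit_additive(mzs, ts, errs, mz_knots, t_knots)
max_residual = max(abs(e - predict(mz, t, mz_knots, fy, t_knots, gy))
                   for mz, t, e in zip(mzs, ts, errs))
```

In this sketch the number of fitted coefficients grows with the number of data points, since more points allow more knots, which mirrors the self-adjusting number of degrees of freedom described in the text.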
3.3 Performance of the Software Lock Mass
Here we have investigated the concept of a software lock mass, a replacement for its physical version, which is integrated into the MaxQuant/Andromeda computational proteomics workflow. We have demonstrated that it performs at least as well as the physical lock mass on typical complex proteome data. Even data that were acquired with a lock mass may benefit from the application of our recalibration workflow, especially in cases where the lock mass performance was not optimal. In contrast to the hardware lock mass option, the software lock mass can correct nonlinearities in the mass scale. Here, we have demonstrated the method on an Orbitrap instrument. However, we speculate that other instrument types would also benefit from the software lock mass approach. For instance, mass calibration drift typically is an issue of practical importance for time of flight instruments. Furthermore, while shown here for MS spectra, the benefits of the software lock mass also carry over to high-resolution MS/MS spectra.
Importantly, use of the software lock mass is completely free from an experimental point of view. All it requires is a peptide mixture of sufficient complexity. In contrast, a physical lock mass, even if derived from laboratory air, always has some experimental cost, such as additional hardware, influence on the spectra, or a slight increase in cycle time. Since the software lock mass is an unmitigated benefit, it can be adopted for all proteomics experiments, as we have done for some time in our laboratory.
The authors thank all the other members of the Proteomics and Signal Transduction group for help and discussions. This work was partially supported by the European Union 7th Framework Program (HEALTH-F4-2008-201648/PROSPECTS).
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
- 10. Nepomuceno, A.I., Muddiman, D.C., Bergen, H.R. 3rd, Craighead, J.R., Burke, M.J., Caskey, P.E., Allan, J.A.: Dual electrospray ionization source for confident generation of accurate mass tags using liquid chromatography Fourier transform ion cyclotron resonance mass spectrometry. Anal. Chem. 75(14), 3411–3418 (2003)
- 11. Fenn, J.: Proceedings of the 34th ASMS, Cincinnati, OH; p. 507–509 (1986)
- 14. Mortensen, P., Gouw, J.W., Olsen, J.V., Ong, S.E., Rigbolt, K.T., Bunkenborg, J., Cox, J., Foster, L.J., Heck, A.J., Blagoev, B., Andersen, J.S., Mann, M.: MSQuant, an open source platform for mass spectrometry-based quantitative proteomics. J. Proteome Res. 9(1), 393–403 (2010)
- 16. Cox, J., Neuhauser, N., Michalski, A., Scheltema, R.A., Olsen, J.V., Mann, M.: Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10(4), 1794–1805 (2011)
- 17. Cox, J., Matic, I., Hilger, M., Nagaraj, N., Selbach, M., Olsen, J.V., Mann, M.: A practical guide to the MaxQuant computational platform for SILAC-based quantitative proteomics. Nat. Protoc. 4(5), 698–705 (2009)
- 18. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical recipes: the art of scientific computing, 3rd edn. Cambridge University Press, Cambridge, 683–688 (2007)