mProphet: automated data processing and statistical validation for large-scale SRM experiments

Reiter, Lukas; Rinner, Oliver; Picotti, Paola; Hüttenhain, Ruth; Beck, Martin; Brusniak, Mi-Youn; Hengartner, Michael O; Aebersold, Ruedi

doi:10.1038/nmeth.1584

Published: 20 March 2011

Nature Methods, Volume 8, pages 430–435 (2011)



This article has been updated

Abstract

Selected reaction monitoring (SRM) is a targeted mass spectrometric method that is increasingly used in proteomics for the detection and quantification of sets of preselected proteins at high sensitivity, reproducibility and accuracy. Currently, data from SRM measurements are mostly evaluated subjectively by manual inspection on the basis of ad hoc criteria, precluding the consistent analysis of different data sets and an objective assessment of their error rates. Here we present mProphet, a fully automated system that computes accurate error rates for the identification of targeted peptides in SRM data sets and maximizes specificity and sensitivity by combining relevant features in the data into a statistical model.


**Figure 1: Structure of SRM data and definition of terms.**

**Figure 2: Generation of a gold-standard data set with assigned true peak groups.**

**Figure 3: Combining features improves the separation of true and false peak groups.**

**Figure 4: Separation of true from false peak group signals in a total human U2OS cell line lysate using decoy transitions and mProphet scoring.**


Change history

06 April 2011
In the version of this article initially published online, a 'greater than' sign was inadvertently reversed, and an author contribution was incorrectly attributed. These errors have been corrected for the print, PDF and HTML versions of this article.


Acknowledgements

We thank J. Malmström and M. Jovanovic for providing the samples that were used as background matrix in the gold-standard data set, M. Jovanovic for careful reading of the manuscript, A. Srebniak for help in generating a software package, and H. Wenschuh. We acknowledge M. Claassen for discussions on machine learning. This work was supported by grants from the Forschungskredit of the University of Zurich, University of Zurich Research Priority Program in Systems Biology and Functional Genomics, GEBERT-RÜF Stiftung and Swiss National Science Foundation (grant 31000-10767), with funds from the US National Heart, Lung, and Blood Institute and the US National Institutes of Health (contract N01-HV-28179), and by SystemsX.ch, the Swiss initiative for systems biology.

Author information

Paola Picotti & Martin Beck
Present addresses: Institute of Biochemistry, Department of Biology, ETH Zurich, Zurich, Switzerland (P.P.) and European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany (M.B.).
Lukas Reiter and Oliver Rinner: These authors contributed equally to this work.

Authors and Affiliations

Biognosys AG, Zurich, Switzerland
Lukas Reiter & Oliver Rinner
Department of Biology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology (ETH) Zurich, Zurich, Switzerland
Lukas Reiter, Oliver Rinner, Paola Picotti, Ruth Hüttenhain, Martin Beck & Ruedi Aebersold
Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
Lukas Reiter & Michael O Hengartner
PhD Program in Molecular Life Sciences Zurich, Zurich, Switzerland
Lukas Reiter
Competence Center for Systems Physiology and Metabolic Diseases, Zurich, Switzerland
Ruth Hüttenhain & Ruedi Aebersold
Institute for Systems Biology, Seattle, Washington, USA
Mi-Youn Brusniak
Faculty of Science, University of Zurich, Zurich, Switzerland
Ruedi Aebersold


Contributions

L.R., O.R., P.P., M.-Y.B. and R.A. designed the gold-standard data set. P.P. carried out the measurements on the gold-standard data set. L.R., O.R. and R.A. wrote the paper. L.R. and O.R. wrote the software and did the data analysis. L.R. did most of the statistical data analysis. R.H. contributed to the experiment involving the human plasma N-glycopeptide-enriched samples. M.B. contributed to the experiment involving the human U2OS cell line. M.O.H. provided critical input on the project. R.A. supervised the project.

Corresponding author

Correspondence to Ruedi Aebersold.

Ethics declarations

Competing interests

O.R. and L.R. are employees of Biognosys AG. This company funded parts of the work.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–12, Supplementary Table 1, Supplementary Results and Supplementary Note (PDF 7030 kb)

Supplementary Data 1

Table of transitions, table of peak groups, table with identification statistics and classifier of the gold-standard data set analysis. The transitions sheet contains the precursor m/z (Q1), the fragment ion m/z (Q3), an id that groups the transitions by precursor (transition group id), an id for the transition (transition id), a string describing the isotopic labeling of the peptide (isotype), the collision energy used (CE), the expected retention time used for scheduled SRM (tR), the expected relative intensity of the fragment ions (relative intensity %), a string indicating whether the transition is a decoy or a target (decoy) and an id that groups corresponding target and decoy transition groups (target decoy transition group id). The mProphet peak groups sheet contains one row per peak group. Its most important columns are an id for a transition group measurement (transition_group_record), the features used for scoring (all columns starting with main_var or var_), the dilution of the synthetic peptides in the specific matrix (dilution), the species used for the background matrix (background), the class of the peak group in terms of identity as determined by the dilution alignment (real_class), a boolean indicating whether the peak group was derived from decoy or target transitions (real_decoy), a boolean indicating whether it was treated as decoy or target in the mProphet analysis (decoy) and the mProphet discrimination score (d_score). The mProphet all peak groups sheet contains all peak groups of the analysis, not only the highest-ranking one per transition group record (peak_group_rank). The mProphet stat sheet relates the mProphet discrimination score (cutoff) to the false discovery rate (FDR) and the sensitivity (sens). The mProphet classifier weight sheet contains the weights that were determined using the semi-supervised learning approach. (XLS 2515 kb)
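The relation in the stat sheet between a discriminant-score cutoff, the FDR and the sensitivity can be sketched as a target-decoy calculation. This is a minimal sketch under stated assumptions: only the top-ranked peak group per `transition_group_record` is scored, decoy counts are rescaled to the number of target records, and the fraction of false targets (pi0) is either supplied or estimated with a Storey-style single-lambda estimate from decoy-based empirical p-values. The column names `transition_group_record`, `d_score` and `decoy` follow the legend above; the functions `top_ranked`, `split_scores`, `estimate_pi0` and `stat_table` and their exact formulas are hypothetical and are not claimed to match mProphet's implementation.

```python
def top_ranked(peak_groups):
    """Keep the highest-d_score peak group (rank 1) per transition_group_record."""
    best = {}
    for pg in peak_groups:
        key = pg["transition_group_record"]
        if key not in best or pg["d_score"] > best[key]["d_score"]:
            best[key] = pg
    return list(best.values())


def split_scores(peak_groups):
    """Return (target d_scores, decoy d_scores) of the top-ranked peak groups."""
    top = top_ranked(peak_groups)
    targets = [pg["d_score"] for pg in top if not pg["decoy"]]
    decoys = [pg["d_score"] for pg in top if pg["decoy"]]
    return targets, decoys


def estimate_pi0(target_scores, decoy_scores, lam=0.5):
    """Storey-style estimate of the false fraction among targets (single lambda)."""
    n_d = len(decoy_scores)
    # Empirical p-value of a target score: fraction of decoys scoring at least as high.
    pvals = [sum(d >= s for d in decoy_scores) / n_d for s in target_scores]
    return min(1.0, sum(p > lam for p in pvals) / (len(target_scores) * (1.0 - lam)))


def stat_table(peak_groups, cutoffs, pi0=None):
    """Estimate FDR and sensitivity ('sens') at each d_score cutoff."""
    targets, decoys = split_scores(peak_groups)
    if pi0 is None:
        pi0 = estimate_pi0(targets, decoys)
    ratio = len(targets) / len(decoys)  # rescale decoy counts to the target set size
    n_true = (1.0 - pi0) * len(targets)  # estimated number of true target peak groups
    table = []
    for c in cutoffs:
        t = sum(s >= c for s in targets)
        f = pi0 * ratio * sum(s >= c for s in decoys)  # estimated false targets above c
        fdr = min(1.0, f / t) if t else 0.0
        sens = min(1.0, max(0.0, t - f) / n_true) if n_true else float("nan")
        table.append({"cutoff": c, "FDR": fdr, "sens": sens})
    return table
```

Given a list of peak-group dicts with those three keys, `stat_table(rows, [0.0, 1.0, 2.0])` returns one dict per cutoff with `cutoff`, `FDR` and `sens` entries. Raising the cutoff trades sensitivity for a lower estimated FDR, which is the trade-off the stat sheet tabulates.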

Supplementary Data 2

Table of transitions, table of peak groups, table with identification statistics and classifier of the human U2OS cell line analysis. For a detailed description of the sheets see the Supplementary Data 1 legend. (XLS 3791 kb)

Supplementary Data 3

Table of transitions, table of peak groups, table with identification statistics and classifier of the human plasma analysis. For a detailed description of the sheets see the Supplementary Data 1 legend. (XLS 1166 kb)

Supplementary Data 4

Table of transitions and peak groups for the measurement of yeast target and decoy transitions in human plasma. The transitions sheet contains target transitions of yeast peptides and the corresponding decoy transitions generated by two different decoy transition generation algorithms (ADD_RANDOM and REVERSE_PEP_AND_INCREASE_Q1). The mQuest peak groups sheet contains the data processed with mQuest. The mProphet analysis does not give meaningful results because the data contain no positive target measurements. For a detailed description of the sheets see the Supplementary Data 1 legend. (XLS 675 kb)


About this article

Cite this article

Reiter, L., Rinner, O., Picotti, P. et al. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat Methods 8, 430–435 (2011). https://doi.org/10.1038/nmeth.1584


Received: 28 April 2010
Accepted: 11 February 2011
Published: 20 March 2011
Issue Date: May 2011
DOI: https://doi.org/10.1038/nmeth.1584
Springer Nature America, Inc.

