Bioinformatics Challenges in Mass Spectrometry-Driven Proteomics

  • Lennart Martens
Part of the Methods in Molecular Biology book series (MIMB, volume 753)


Mass spectrometry-based proteomics has become an essential part of the analytical toolbox of the life sciences. With the ability to identify and quantify hundreds to thousands of proteins in high throughput, the field has contributed its fair share to the data avalanche coming from the so-called omics fields. As a result, the challenges involved in processing and managing this flood of data have grown as well. This chapter points out and discusses these challenges, from the processing of raw mass spectrometry data into peaks, through the identification of peptides and proteins, to the quantification of the identified molecules. Finally, the informatics aspects of the nascent field of targeted proteomics are outlined.

Key words

Bioinformatics; databases; mass spectrometry; identification; quantification



Abbreviations

FDR: False discovery rate
m/z: Mass-to-charge ratio
MS: Mass spectrometry
SRM: Selected reaction monitoring



LM would like to thank Joël Vandekerckhove for his support.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Medical Protein Research, VIB, Ghent University, Ghent, Belgium
  2. Department of Biochemistry, Ghent University, Ghent, Belgium
