Bioinformatics for Proteomics: Opportunities at the Interface Between the Scientists, Their Experiments, and the Community

  • Marc Vaudel
  • Harald Barsnes
  • Lennart Martens
  • Frode S. Berven
Part of the Methods in Molecular Biology book series (MIMB, volume 1156)


Within the last decade, bioinformatics has moved from command-line scripts dedicated to single experiments towards production-grade software integrated into experimental workflows, providing a rich environment for biological investigation. Located at the interface between the scientists, their experiments, and the community, bioinformatics acts as a gateway to a wide range of information. Rather than listing tools and methods, this chapter hints at how bioinformatics can help improve biological projects, all the way from their initial design to the dissemination of the results.

Key words

Bioinformatics, Experimental design



H.B. is supported by the Research Council of Norway. L.M. acknowledges the support of Ghent University (Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to networks”), the PRIME-XS project, grant agreement number 262067, and the “ProteomeXchange” project, grant agreement number 260558, both funded by the European Union 7th Framework Program. The authors have no competing financial or commercial interests.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Marc Vaudel (1)
  • Harald Barsnes (1)
  • Lennart Martens (2, 3)
  • Frode S. Berven (1)

  1. Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
  2. Department of Biochemistry, Ghent University, Ghent, Belgium
  3. Department of Medical Protein Research, VIB, Ghent, Belgium
