Quantitative Proteomics Data in the Public Domain: Challenges and Opportunities

  • Andrew F. Jarnuczak
  • Tobias Ternent
  • Juan Antonio Vizcaíno
Part of the Methods in Molecular Biology book series (MIMB, volume 1977)


Mass spectrometry-based proteomics is no longer only a qualitative discipline and can be successfully employed to obtain a truly multidimensional view of the proteome. In particular, systematic protein expression profiling is now a routine part of many studies in the field and beyond. The large growth in the number of quantitative studies is accompanied by a trend to publicly share the associated analysis results and the underlying raw data. This trend, established and strongly supported by public repositories such as the PRIDE database at the European Bioinformatics Institute, opens up enormous possibilities to explore the data beyond the original publications, for instance by reusing, reanalyzing, and performing different flavors of meta-analysis studies. To help researchers realize this potential, here we describe the mainstream public proteomics resources containing quantitative proteomics data, including the processed analysis results and/or the underlying raw data. We then present and discuss the most important points to consider when attempting to (re)use proteomics data in the public domain. We conclude by highlighting potential pitfalls of (re)using quantitative data and discussing some of our own experiences in this context.

Key words

Mass spectrometry; Data (re)analysis; Quantitative proteomics; Data repository; PRIDE database



Abbreviations

CPTAC   Clinical Proteomic Tumor Analysis Consortium
DDA     Data-dependent acquisition
DIA     Data-independent acquisition
iTRAQ   Isobaric tags for relative and absolute quantification
MIAME   Minimum information about a microarray experiment
MIAPE   Minimum information about a proteomics experiment
MRM     Multiple reaction monitoring
MS      Mass spectrometry
PRM     Parallel reaction monitoring
PSM     Peptide spectrum match
PTM     Posttranslational modification
SILAC   Stable isotope labeling by amino acids in cell culture
TMT     Tandem mass tag



The authors want to acknowledge financial support from the Wellcome Trust [grant numbers WT101477MA and 208391/Z/17/Z] and from EMBL core funds.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Andrew F. Jarnuczak (1)
  • Tobias Ternent (1)
  • Juan Antonio Vizcaíno (1)

  1. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK