Quantitative Proteomics Data in the Public Domain: Challenges and Opportunities

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1977)

Abstract

Mass spectrometry-based proteomics is no longer only a qualitative discipline and can be successfully employed to obtain a truly multidimensional view of the proteome. In particular, systematic protein expression profiling is now a routine part of many studies in the field and beyond. The large growth in the number of quantitative studies is accompanied by a trend to publicly share the associated analysis results and the underlying raw data. This trend, established and strongly supported by public repositories such as the PRIDE database at the European Bioinformatics Institute, opens up enormous possibilities to explore the data beyond the original publications, for instance by reusing the data, reanalyzing them, and performing different kinds of meta-analysis studies. To help researchers realize this potential, we describe here the mainstream public proteomics resources containing quantitative proteomics data, including the processed analysis results and/or the underlying raw data. We then present and discuss the most important points to consider when attempting to (re)use proteomics data in the public domain. We conclude by highlighting potential pitfalls of (re)using quantitative data and discussing some of our own experiences in this context.

Abbreviations

CPTAC: Clinical Proteomic Tumor Analysis Consortium
DDA: Data-dependent acquisition
DIA: Data-independent acquisition
iTRAQ: Isobaric tags for relative and absolute quantification
MIAME: Minimum information about a microarray experiment
MIAPE: Minimum information about a proteomics experiment
MRM: Multiple reaction monitoring
MS: Mass spectrometry
PRM: Parallel reaction monitoring
PSM: Peptide spectrum match
PTM: Posttranslational modification
PX: ProteomeXchange
SILAC: Stable isotope labeling by amino acids in cell culture
TMT: Tandem mass tag

Acknowledgements

The authors want to acknowledge financial support from the Wellcome Trust [grant numbers WT101477MA and 208391/Z/17/Z] and from EMBL core funds.

Author information

Corresponding author

Correspondence to Juan Antonio Vizcaíno.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Jarnuczak, A.F., Ternent, T., Vizcaíno, J.A. (2019). Quantitative Proteomics Data in the Public Domain: Challenges and Opportunities. In: Evans, C., Wright, P., Noirel, J. (eds) Mass Spectrometry of Proteins. Methods in Molecular Biology, vol 1977. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-9232-4_14

  • DOI: https://doi.org/10.1007/978-1-4939-9232-4_14

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-9231-7

  • Online ISBN: 978-1-4939-9232-4

  • eBook Packages: Springer Protocols
