Abstract
Mass spectrometry-based proteomics is no longer only a qualitative discipline and can be successfully employed to obtain a truly multidimensional view of the proteome. In particular, systematic protein expression profiling is now a routine part of many studies in the field and beyond. The large growth in the number of quantitative studies is accompanied by a trend to publicly share the associated analysis results and the underlying raw data. This trend, established and strongly supported by public repositories such as the PRIDE database at the European Bioinformatics Institute, opens up enormous possibilities to explore the data beyond the original publications, for instance by reusing, reanalyzing, and performing different flavors of meta-analysis studies. To help researchers realize this potential, here we describe the mainstream public proteomics resources containing quantitative proteomics data, including the processed analysis results and/or the underlying raw data. We then present and discuss the most important points to consider when attempting to (re)use proteomics data in the public domain. We conclude by highlighting potential pitfalls of (re)using quantitative data and discuss some of our own experiences in this context.
Abbreviations
- CPTAC: Clinical Proteomic Tumor Analysis Consortium
- DDA: Data-dependent acquisition
- DIA: Data-independent acquisition
- iTRAQ: Isobaric tags for relative and absolute quantification
- MIAME: Minimum information about a microarray experiment
- MIAPE: Minimum information about a proteomics experiment
- MRM: Multiple reaction monitoring
- MS: Mass spectrometry
- PRM: Parallel reaction monitoring
- PSM: Peptide spectrum match
- PTM: Posttranslational modification
- PX: ProteomeXchange
- SILAC: Stable isotope labeling by amino acids in cell culture
- TMT: Tandem mass tag
Acknowledgements
The authors want to acknowledge financial support from the Wellcome Trust [grant numbers WT101477MA and 208391/Z/17/Z] and from EMBL core funds.
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Jarnuczak, A.F., Ternent, T., Vizcaíno, J.A. (2019). Quantitative Proteomics Data in the Public Domain: Challenges and Opportunities. In: Evans, C., Wright, P., Noirel, J. (eds) Mass Spectrometry of Proteins. Methods in Molecular Biology, vol 1977. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-9232-4_14
DOI: https://doi.org/10.1007/978-1-4939-9232-4_14
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-9231-7
Online ISBN: 978-1-4939-9232-4
eBook Packages: Springer Protocols