Bioinformatics for Comparative Proteomics pp 213-227
Proteomics Databases and Repositories
Abstract
With the advent of more powerful and sensitive analytical techniques and instruments, the field of mass spectrometry-based proteomics has seen a considerable increase in the amount of generated data. Correspondingly, the need to make these data publicly available in centralized online databases has also become more pressing. As a result, several such databases have been created, and steps are currently being taken to integrate these different systems under a single worldwide data-sharing umbrella. This chapter will discuss the importance of such databases and the infrastructure that they require for efficient operation. Furthermore, the various kinds of information that proteomics databases can store will be described, along with the different types of databases that are available today. Finally, a selection of prominent repositories will be described in more detail, together with the international ProteomExchange consortium, which aims to unite all the different databases in a global data-sharing collaboration.
Key words
Proteomics, Mass spectrometry, Identifications, Database, Repository, ProteomExchange
Notes
Acknowledgements
The author would like to thank Henning Hermjakob and Rolf Apweiler for their support.