Abstract
Functional proteomics can be defined as a strategy that couples proteomic information with biochemical and physiological analyses in order to better understand the functions of proteins in normal and diseased organs. In recent years, a variety of publicly available bioinformatics databases have been developed to support the management of protein-related information and biological knowledge discovery. Beyond their use in annotating the proteome, these resources offer the opportunity to develop global approaches to studying the functional roles of proteins in both health and disease. Here, we present a comprehensive review of the major human protein bioinformatics databases. We conclude by discussing a few examples that illustrate the importance of these databases in functional proteomics research.
Additional information
This article is published with open access at Springerlink.com
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Zhang, Y., Zhu, Y. & He, F. An overview of human protein databases and their application to functional proteomics in health and disease. Sci. China Life Sci. 54, 988–998 (2011). https://doi.org/10.1007/s11427-011-4247-x
DOI: https://doi.org/10.1007/s11427-011-4247-x