Biological Databases

  • Mario Cannataro
  • Pietro H. Guzzi
  • Giuseppe Tradigo
  • Pierangelo Veltri


Biological databases constitute the data layer of molecular biology and bioinformatics, and they are becoming a central component of emerging fields such as clinical bioinformatics and translational and personalized medicine. Biological databases have been built either around the different representations of molecular entities, such as sequences and structures, or, more recently, around the high-throughput platforms used to investigate cells and organisms, such as microarray and mass spectrometry technologies. This chapter provides an overview of the main biological databases currently available and highlights open problems and future trends.

This chapter presents examples of existing biological databases, with information about their use and application in the life sciences. We cover sequence, structure, interactomics, and proteomics databases. In particular, Sect. 26.1 discusses sequence databases; Sect. 26.2 presents structure databases, including protein contact maps; Sect. 26.3 introduces a novel class of databases representing the interactions among proteins; Sect. 26.4 describes proteomics databases, an area that is continuously enriched by proteomics experiments; and Sect. 26.5 concludes the chapter by outlining future developments and the evolution of biological databases.


Keywords: Protein Data Bank, Biological Database, European Molecular Biology Laboratory, Flat File, Protein Data Bank Entry







Abbreviations

  • application programming interface
  • basic local alignment search tool
  • Biomedical Proteomics Research Group
  • class, architecture, topology, and homologous
  • DNA Data Bank of Japan
  • deoxyribonucleic acid
  • European Bioinformatics Institute
  • European Molecular Biology Laboratory
  • expressed sequence tag
  • Global Proteome Machine
  • HUPO–Proteomics Standards Initiative
  • images line
  • liquid chromatography
  • matrix assisted laser desorption/ionization
  • minimum information about a proteomics experiment
  • Molecular INTeraction
  • tandem mass spectrometry
  • mass spectrometry
  • master line
  • National Center for Biotechnology Information
  • nuclear magnetic resonance
  • polyacrylamide gel electrophoresis
  • protein data bank
  • protein information resource project
  • protein–protein interaction
  • Proteomics Identifications Database
  • proteomics standard initiative
  • Systems Biology Experiment Analysis Management System
  • structural classification of proteins
  • surface-enhanced laser desorption ionization
  • Swiss Institute of Bioinformatics
  • structured query language
  • sequence retrieval system
  • sequential structure alignment program for protein structure comparison
  • third party annotation
  • XML information about a proteomics experiment
  • extensible markup language
  • file transfer protocol
  • macromolecular crystallographic information file
  • mass spectrometry data


Copyright information

© Springer-Verlag 2014

Authors and Affiliations

  1. Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, Italy
  2. Surgical and Medical Sciences, University Magna Graecia of Catanzaro, Catanzaro, Italy
  3. Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, Italy
  4. Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, Italy
