Biological Databases

Abstract

Biological databases constitute the data layer of molecular biology and bioinformatics and are becoming a central component of emerging fields such as clinical bioinformatics and translational and personalized medicine. They have been built either around the different representations of molecular entities, such as sequences and structures, or, more recently, around the high-throughput platforms used to investigate cells and organisms, such as microarray and mass spectrometry technologies. This chapter provides an overview of the main biological databases currently available and highlights open problems and future trends.

This chapter presents examples of existing biological databases, with information about their use and application in the life sciences. The examples cover sequence, structure, interactomics, and proteomics databases. In particular, Sect. 26.1 discusses sequence databases; Sect. 26.2 presents structure databases, including protein contact maps; Sect. 26.3 introduces a novel class of databases representing the interactions among proteins; and Sect. 26.4 describes proteomics databases, an area that is continuously enriched by proteomics experiments. Finally, Sect. 26.5 concludes the chapter by outlining future developments and the evolution of biological databases.

Keywords

Protein Data Bank; Biological Database; European Molecular Biology Laboratory; Flat File; Protein Data Bank Entry

Abbreviations

2-D: two-dimensional
3-D: three-dimensional
API: application programming interface
BLAST: basic local alignment search tool
BPRG: Biomedical Proteomics Research Group
CATH: class, architecture, topology, and homologous superfamily
DDBJ: DNA Data Bank of Japan
DNA: deoxyribonucleic acid
EBI: European Bioinformatics Institute
EMBL: European Molecular Biology Laboratory
EST: expressed sequence tag
GPM: Global Proteome Machine
HUPO-PSI: HUPO–Proteomics Standards Initiative
IM: images line
LC: liquid chromatography
MALDI: matrix-assisted laser desorption/ionization
MIAPE: minimum information about a proteomics experiment
MINT: Molecular INTeraction
MS/MS: tandem mass spectrometry
MS: mass spectrometry
MT: master line
NCBI: National Center for Biotechnology Information
NMR: nuclear magnetic resonance
PAGE: polyacrylamide gel electrophoresis
PDB: Protein Data Bank
PIR: protein information resource project
PPI: protein–protein interaction
PRIDE: Proteomics Identifications Database
PSI: Proteomics Standards Initiative
SBEAMS: Systems Biology Experiment Analysis Management System
SCOP: structural classification of proteins
SELDI: surface-enhanced laser desorption/ionization
SIB: Swiss Institute of Bioinformatics
SQL: structured query language
SRS: sequence retrieval system
SSAP: sequential structure alignment program for protein structure comparison
TOF: time-of-flight
TPA: third party annotation
XIAPE: XML information about a proteomics experiment
XML: extensible markup language
ftp: file transfer protocol
mmCIF: macromolecular crystallographic information file
mzData: mass spectrometry data

Copyright information

© Springer-Verlag 2014

Authors and Affiliations

  1. Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, Italy
  2. Surgical and Medical Sciences, University Magna Graecia of Catanzaro, Catanzaro, Italy
  3. Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, Italy
  4. Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, Italy
