Addressing the problems with life-science databases for traditional uses and systems biology

Philippi, Stephan; Köhler, Jacob

doi:10.1038/nrg1872

Opinion
Published: 09 May 2006

Volume 7, pages 482–488, (2006)

Stephan Philippi¹ &
Jacob Köhler²

Abstract

A prerequisite to systems biology is the integration of heterogeneous experimental data, which are stored in numerous life-science databases. However, a wide range of obstacles that relate to access, handling and integration impede the efficient use of the contents of these databases. Addressing these issues will not only be essential for progress in systems biology, it will also be crucial for sustaining the more traditional uses of life-science databases.

**Figure 1: Classical and systems biology roles of life-science databases.**

**Figure 2: The database integration process: a database warehouse as an example.**

**Figure 3: Alternative representations of metabolic pathways: alcohol dehydrogenase as an example.**

Acknowledgements

The authors would like to thank C. Rawlings and P. Verrier for commenting on an earlier version of this article. Furthermore, we would like to thank the following individuals for exploring with us the pitfalls of life-science databases over the past years: J. Baumbach, J. Butz, E. Kirchem, F. Klingert, S. Knop, B. Kormeier, I. Kupp, A. Neu, A. Rüegg, A. Skusa, B. Steuernagel, J. Taubert, P. Verrier and R. Winnenburg. S.P. gratefully acknowledges funding by the European Science Foundation. Rothamsted Research receives grant-aided support from the UK Biotechnology and Biological Sciences Research Council.

Author information

Authors and Affiliations

Stephan Philippi is at the Department of Computer Science, University of Koblenz, PO Box 201602, Koblenz, 56016, Germany
Jacob Köhler is at the Biomathematics and Bioinformatics Division, Rothamsted Research, Harpenden, AL5 2JQ, Hertfordshire, UK

Corresponding author

Correspondence to Stephan Philippi.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Controlled vocabulary: A standardized set of terms that can be used in a given application domain. A prominent example is the enzyme class nomenclature, which describes classes of biochemical reaction.
Database management system: A system that provides a means of storing, modifying and extracting data from a database.
Evidence code: A controlled vocabulary that is used to track the types of evidence that support a gene annotation.
Flat file: Human readable, non-standardized files that can be used to exchange the contents of life-science databases.
Ontology: A commonly agreed definition of real-world concepts, such as 'protein' and 'enzyme', and their particular relationships, for example, an enzyme 'is a' protein.
Parser: Software that reads a given input, such as a flat file, for further processing.
Web service: A standardized way to allow for interoperable machine-to-machine interaction over a network.
XML: The extensible markup language (XML) is a standard for the creation of application-specific, self-descriptive markup languages, which, for example, can be used for the definition of data-exchange formats.
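
To make the 'flat file' and 'parser' entries concrete, the following is a minimal sketch of a parser for a line-oriented flat file. The two-letter line codes (`ID`, `DE`, `EC`), the `//` record terminator and the sample records are assumptions chosen for illustration; they do not reproduce the exact format of any particular database, and the function name `parse_flat_file` is hypothetical.

```python
from typing import Dict, List

# Hypothetical flat-file layout: a two-letter line code, some spaces, then a
# value, with "//" closing each record. This layout is an assumption made for
# illustration, not the specification of a real database format.
SAMPLE_FLAT_FILE = """\
ID   ADH1
DE   Alcohol dehydrogenase 1
EC   1.1.1.1
//
ID   HXK2
DE   Hexokinase 2
EC   2.7.1.1
//
"""


def parse_flat_file(text: str) -> List[Dict[str, str]]:
    """Split flat-file text into records, one dict of line code -> value each."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("//"):
            # "//" terminates a record in this hypothetical layout
            if current:
                records.append(current)
            current = {}
            continue
        # Everything before the first space is the line code, the rest the value
        code, _, value = line.partition(" ")
        current[code] = value.strip()
    if current:
        records.append(current)  # tolerate a missing final terminator
    return records


records = parse_flat_file(SAMPLE_FLAT_FILE)
```

Because a flat file of this kind is not standardized, a parser like this has to be written, and kept up to date, separately for each database format; an XML-based exchange format would instead let a generic XML parser handle the syntax.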

About this article

Cite this article

Philippi, S., Köhler, J. Addressing the problems with life-science databases for traditional uses and systems biology. Nat Rev Genet 7, 482–488 (2006). https://doi.org/10.1038/nrg1872

Published: 09 May 2006
Issue Date: 01 June 2006
DOI: https://doi.org/10.1038/nrg1872
Springer Nature Limited
