Ontologies in Bioinformatics

  • Robert Stevens
  • Chris Wroe
  • Phillip Lord
  • Carole Goble
Part of the International Handbooks on Information Systems book series (INFOSYS)

Summary

Molecular biology offers a large, complex and volatile domain that tests knowledge representation techniques to the limit of their fidelity, precision, expressivity and adaptability. The disciplines of molecular biology and bioinformatics rely heavily on community knowledge, rather than laws and axioms, to further understanding and generate new knowledge. This knowledge has traditionally been kept as natural language. Given the exponential growth of already large quantities of data and associated knowledge, this is an unsustainable form of representation. The knowledge needs to be stored in a computationally amenable form, and ontologies offer a mechanism for capturing a community's shared understanding in a form usable by both humans and computers. Ontologies have been built and used for many domains, and this chapter explores their role within bioinformatics. Structured classifications have a long history in biology, not least in the Linnean description of species; the explicit use of ontologies, however, is more recent. This chapter surveys the need for ontologies, the nature of the domain and the knowledge tasks involved, and then gives an overview of ontology work in the discipline. The widest use of ontologies within biology is for conceptual annotation: a representation of stored knowledge that is more computationally amenable than natural language. An ontology also offers a means of creating the illusion of a common query interface over diverse, distributed information sources. Here the ontology both gives the user a shared understanding and provides a means of computationally reconciling heterogeneities between the resources. Ontologies also offer a means of defining schemas with the complexity and precision required by biology's knowledge bases.
Coming right up to date, bioinformatics is well placed to serve as an exemplar of the Semantic Web: it offers both web-accessible content and services, conceptually marked up so that its resources can be exploited computationally. This theme is explored through the myGRID services ontology. Ontologies in bioinformatics cover a wide range of uses and representation styles. Bioinformatics offers an exciting application area in which the community can see a real need for ontology-based technology to work and deliver on its promise.
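
As a minimal sketch of what conceptual annotation offers over free text, assuming a toy is-a hierarchy: the terms, the gene names and the helper functions `ancestors` and `annotated_with` below are all hypothetical and are not taken from any real ontology or from the chapter. Because each annotation points at a term in a shared hierarchy rather than at a phrase, a query for a general term can also retrieve items annotated with more specific terms, which keyword matching over natural-language descriptions cannot do reliably.

```python
from collections import deque

# Hypothetical is-a hierarchy: each term maps to its direct parent terms.
IS_A = {
    "catalytic activity": [],
    "transferase activity": ["catalytic activity"],
    "kinase activity": ["transferase activity"],
    "protein kinase activity": ["kinase activity"],
    "hydrolase activity": ["catalytic activity"],
}

# Hypothetical conceptual annotations: gene product -> set of ontology terms.
ANNOTATIONS = {
    "geneA": {"protein kinase activity"},
    "geneB": {"hydrolase activity"},
    "geneC": {"kinase activity"},
}


def ancestors(term):
    """Return the term itself plus every term reachable by following is-a links upward."""
    seen = {term}
    queue = deque([term])
    while queue:  # breadth-first walk up the hierarchy
        current = queue.popleft()
        for parent in IS_A[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def annotated_with(term):
    """Gene products annotated with `term` or with any more specific term."""
    return sorted(
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(term in ancestors(t) for t in terms)
    )


print(annotated_with("kinase activity"))     # ['geneA', 'geneC']
print(annotated_with("catalytic activity"))  # ['geneA', 'geneB', 'geneC']
```

The sketch follows only is-a links and ignores any other relation types an ontology may define; it is meant to show why annotating against shared terms makes stored knowledge queryable, not how any particular bioinformatics resource implements this.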

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Robert Stevens (1)
  • Chris Wroe (1)
  • Phillip Lord (1)
  • Carole Goble (1)
  1. Department of Computer Science, University of Manchester, Manchester, UK