Bioinformatics Data Source Integration Based on Semantic Relationships Across Species

  • Badr Al-Daihani
  • Alex Gray
  • Peter Kille
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4316)

Abstract

Bioinformatics databases are heterogeneous: they differ in their representation and in their query capabilities, and the information they hold is spread across distributed, autonomous resources. Current approaches to integrating heterogeneous bioinformatics data sources rely on one of three mechanisms: a common field, an ontology, or a cross-reference. In this paper we investigate the use of semantic relationships across species to link, integrate and annotate genes from publicly available data sources. We introduce a novel Soft Link approach that links information across species held in biological databases by providing a flexible method of joining related information from different databases, including non-bioinformatics databases. A measure of relationship closeness gives biologists a new tool for analysis. Soft Links are identified as interrelated concepts and can be used to create a rich set of possible relation types, supporting the investigation of alternative hypotheses.
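The idea of joining records by a relationship with a closeness measure, rather than by an exact shared key, can be illustrated with a minimal sketch. This is not the authors' Soft Link Model: the `GeneRecord` type, the source names, the term sets, the Jaccard overlap used as the closeness score, and the 0.5 threshold are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical record; the fields are illustrative, not from the paper."""
    source: str        # name of the data source the record came from
    gene_id: str       # identifier local to that source
    species: str
    terms: frozenset   # descriptive terms attached to the gene


def closeness(a: GeneRecord, b: GeneRecord) -> float:
    """Assumed closeness measure: Jaccard overlap of term sets, in [0, 1]."""
    union = a.terms | b.terms
    if not union:
        return 0.0
    return len(a.terms & b.terms) / len(union)


def soft_links(left, right, threshold=0.5):
    """Pair records from different species whose closeness meets the threshold."""
    links = []
    for a in left:
        for b in right:
            if a.species == b.species:
                continue  # only cross-species pairs are of interest here
            score = closeness(a, b)
            if score >= threshold:
                links.append((a.gene_id, b.gene_id, score))
    # Closest relationships first
    return sorted(links, key=lambda link: link[2], reverse=True)


human = [
    GeneRecord("source_a", "H1", "human",
               frozenset({"metal binding", "stress response"})),
]
worm = [
    GeneRecord("source_b", "W1", "worm",
               frozenset({"metal binding", "stress response", "detoxification"})),
    GeneRecord("source_b", "W2", "worm", frozenset({"cell cycle"})),
]

links = soft_links(human, worm)
```

Unlike a join on a common field, the two sources here share no identifier; the link exists only because the records are related closely enough under the chosen relationship type, and the score lets a biologist rank or filter candidate links.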

Keywords

Biological Object; Relationship Type; Semantic Relationship; Gene Expression Dataset; Flexible Linkage

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Badr Al-Daihani (1)
  • Alex Gray (1)
  • Peter Kille (2)
  1. School of Computer Sciences, Cardiff University, Cardiff, UK
  2. School of Biosciences, Cardiff University, Cardiff, UK
