Flexible Integration of Molecular-Biological Annotation Data: The GenMapper Approach

  • Hong-Hai Do
  • Erhard Rahm
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2992)

Abstract

Molecular-biological annotation data is continuously being collected, curated and made accessible in numerous public data sources. Integrating this data is a major challenge in bioinformatics. We present the GenMapper system, which physically integrates heterogeneous annotation data in a flexible way and supports large-scale analysis of the integrated data. It uses a generic data model to uniformly represent different kinds of annotations originating from different data sources. Existing associations between objects, which represent valuable biological knowledge, are explicitly used to drive data integration and to combine annotation knowledge from different sources. To serve specific analysis needs, powerful operators are provided to derive tailored annotation views from the generic data representation. GenMapper is operational and has been successfully used for large-scale functional profiling of genes. Interactive access is available at http://www.izbi.de.
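The abstract does not give the data model or the operators themselves, so the following is only a minimal sketch of the general idea, under stated assumptions: annotations are stored generically as associations between objects of different sources, and views are derived from them on demand. All names (AnnotationStore, get_map, compose, annotation_view, and the source and accession labels) are hypothetical illustrations and are not taken from GenMapper.

```python
from collections import defaultdict


class AnnotationStore:
    """Hypothetical generic store: every fact is an association between
    an object of one source and an object of another source."""

    def __init__(self):
        # (source, target) -> set of (source_accession, target_accession)
        self._maps = defaultdict(set)

    def add(self, source, accession, target, target_accession):
        self._maps[(source, target)].add((accession, target_accession))

    def get_map(self, source, target):
        """Return the directly stored associations between two sources."""
        return set(self._maps.get((source, target), set()))

    def compose(self, source, via, target):
        """Derive source->target associations by joining over a shared
        intermediate source (source->via joined with via->target)."""
        by_via = defaultdict(set)
        for mid, tgt in self.get_map(via, target):
            by_via[mid].add(tgt)
        return {(src, tgt)
                for src, mid in self.get_map(source, via)
                for tgt in by_via.get(mid, ())}


def annotation_view(store, source, targets):
    """Build a tailored view: one row per source object, one column per
    requested target source, each cell holding a set of accessions."""
    rows = {}
    for target in targets:
        for src, tgt in store.get_map(source, target):
            rows.setdefault(src, {t: set() for t in targets})[target].add(tgt)
    return rows


# Invented example data; the source names and accessions are placeholders.
store = AnnotationStore()
store.add("GeneSource", "g1", "ProteinSource", "p1")
store.add("GeneSource", "g2", "ProteinSource", "p2")
store.add("ProteinSource", "p1", "FunctionSource", "f1")
store.add("ProteinSource", "p1", "FunctionSource", "f2")
store.add("GeneSource", "g1", "LocationSource", "chr1")

derived = store.compose("GeneSource", "ProteinSource", "FunctionSource")
view = annotation_view(store, "GeneSource", ["ProteinSource", "LocationSource"])
```

In this sketch, adding a new data source requires no schema change, only new association pairs, which is one plausible reading of what a generic representation buys; the system's actual model and operators may differ.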



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Hong-Hai Do (1)
  • Erhard Rahm (2)
  1. Interdisciplinary Centre for Bioinformatics
  2. Department of Computer Science, University of Leipzig, Germany
