Abstract
Molecular-biological annotation data is continuously being collected, curated and made accessible in numerous public data sources. Integrating this data is a major challenge in bioinformatics. We present the GenMapper system, which physically integrates heterogeneous annotation data in a flexible way and supports large-scale analysis of the integrated data. It uses a generic data model to uniformly represent different kinds of annotations originating from different data sources. Existing associations between objects, which represent valuable biological knowledge, are explicitly used to drive data integration and to combine annotation knowledge from different sources. To serve specific analysis needs, powerful operators are provided to derive tailored annotation views from the generic data representation. GenMapper is operational and has been successfully used for large-scale functional profiling of genes. Interactive access is provided at http://www.izbi.de.
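The abstract does not spell out the data model or the operators, so the following is only a minimal sketch of the general idea, under stated assumptions: every object is identified by a (source, accession) pair, associations between objects are stored as mappings between two sources, and a tailored view is derived by looking up or composing such mappings. The names `AnnotationStore`, `compose` and `annotation_view`, and the identifiers `locus_1`, `cluster_A` and `term_X`, are hypothetical illustrations and are not taken from the paper.

```python
from collections import defaultdict


class AnnotationStore:
    """Hypothetical generic store: associations grouped by (source, target) pair."""

    def __init__(self):
        # (from_source, to_source) -> {from_accession -> set of to_accessions}
        self._mappings = defaultdict(lambda: defaultdict(set))

    def add_association(self, src, acc, tgt, tgt_acc):
        """Record that object `acc` of source `src` is linked to `tgt_acc` of `tgt`."""
        self._mappings[(src, tgt)][acc].add(tgt_acc)

    def mapping(self, src, tgt):
        """Return a copy of the stored mapping from `src` to `tgt` (empty if none)."""
        stored = self._mappings.get((src, tgt), {})
        return {acc: set(targets) for acc, targets in stored.items()}


def compose(first, second):
    """Derive a mapping A -> C from mappings A -> B and B -> C."""
    result = {}
    for acc, intermediates in first.items():
        targets = set()
        for mid in intermediates:
            targets |= second.get(mid, set())
        if targets:
            result[acc] = targets
    return result


def annotation_view(store, src, accessions, targets):
    """Build one row per object, with one column per requested target source."""
    view = {}
    for acc in accessions:
        view[acc] = {
            tgt: sorted(store.mapping(src, tgt).get(acc, set())) for tgt in targets
        }
    return view


# Illustrative data only: the accessions below are invented placeholders.
store = AnnotationStore()
store.add_association("LocusLink", "locus_1", "Unigene", "cluster_A")
store.add_association("Unigene", "cluster_A", "GeneOntology", "term_X")

# Combine two existing mappings to obtain one that was never stored directly.
derived = compose(
    store.mapping("LocusLink", "Unigene"),
    store.mapping("Unigene", "GeneOntology"),
)

# View of one LocusLink object against a directly associated source.
view = annotation_view(store, "LocusLink", ["locus_1"], ["Unigene"])
```

In this sketch, supporting a new data source requires no schema change, only additional associations, which is one plausible reading of what a generic representation buys; how GenMapper itself achieves this is described in the full paper, not here.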
© 2004 Springer-Verlag Berlin Heidelberg
Do, H.-H., Rahm, E. (2004). Flexible Integration of Molecular-Biological Annotation Data: The GenMapper Approach. In: Bertino, E., et al. (eds.) Advances in Database Technology - EDBT 2004. Lecture Notes in Computer Science, vol 2992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24741-8_47
DOI: https://doi.org/10.1007/978-3-540-24741-8_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21200-3
Online ISBN: 978-3-540-24741-8
eBook Packages: Springer Book Archive