Improving Cross Mapping in Biomedical Databases

  • Joel Arrais
  • João E. Pereira
  • Pedro Lopes
  • Sérgio Matos
  • José Luis Oliveira
Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 74)


The complete analysis of many large-scale experiments requires comparing the output produced with data available in public databases. Because each database uses its own nomenclature to classify entries, this task frequently requires converting identifiers and, because the mappings between those identifiers are incomplete, it commonly causes loss of information.

In this paper, we propose a methodology to improve the coverage of the mapping between database identifiers. As a starting point, we use a local warehouse with the default mappings from the most relevant biological databases. We then apply the methodology to four database identifiers (Ensembl, Entrez Gene, KEGG and UniProt). The results show improved coverage for all relationships, exceeding 10% in three relations and 7% in five relations.
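The abstract does not spell out how coverage is measured or how new mappings are obtained, so the following is only a minimal sketch under stated assumptions. It assumes coverage means the fraction of source identifiers with at least one mapped target, and it assumes missing links can be inferred by chaining two default mappings through an intermediate database. The function names and all identifiers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def coverage(source_ids, mapping):
    """Fraction of source identifiers that have at least one mapped target."""
    if not source_ids:
        return 0.0
    mapped = sum(1 for s in source_ids if mapping.get(s))
    return mapped / len(source_ids)


def compose(a_to_b, b_to_c):
    """Infer A->C links by chaining A->B and B->C mappings (assumed technique)."""
    a_to_c = defaultdict(set)
    for a, bs in a_to_b.items():
        for b in bs:
            a_to_c[a] |= b_to_c.get(b, set())
    # Drop sources for which the chain produced no target.
    return {a: cs for a, cs in a_to_c.items() if cs}


def merge(direct, inferred):
    """Union of the default (direct) mapping and the inferred one."""
    merged = {k: set(v) for k, v in direct.items()}
    for k, v in inferred.items():
        merged.setdefault(k, set()).update(v)
    return merged


# Toy identifiers, invented for illustration only.
ensembl_ids = ["ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"]
ensembl_to_uniprot = {"ENSG_A": {"P_1"}}
ensembl_to_entrez = {"ENSG_A": {"E_1"}, "ENSG_B": {"E_2"}, "ENSG_C": {"E_3"}}
entrez_to_uniprot = {"E_1": {"P_1"}, "E_2": {"P_2"}}

before = coverage(ensembl_ids, ensembl_to_uniprot)
inferred = compose(ensembl_to_entrez, entrez_to_uniprot)
after = coverage(ensembl_ids, merge(ensembl_to_uniprot, inferred))
print(f"coverage before: {before:.2f}, after: {after:.2f}")
```

In this toy data, one extra Ensembl entry gains a UniProt target through the Entrez Gene intermediate. Chained links of this kind can also introduce false positives, so a real pipeline would likely need a validation step that this sketch omits.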


Biomedical databases, identifiers mapping





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Joel Arrais
    • 1
  • João E. Pereira
    • 1
  • Pedro Lopes
    • 1
  • Sérgio Matos
    • 1
  • José Luis Oliveira
    • 1
  1. DETI/IEETA, University of Aveiro, Aveiro, Portugal
