Data Mining and Knowledge Discovery, Volume 15, Issue 2, pp 275–296

Detecting inconsistency in biological molecular databases using ontologies

  • Qingfeng Chen
  • Yi-Ping Phoebe Chen
  • Chengqi Zhang


The rapid growth of life science databases demands the fusion of knowledge from heterogeneous databases to answer complex biological questions. However, discrepancies in nomenclature, diverse schemas and incompatible formats result in a significant lack of interoperability among biological databases. Data preparation is therefore a key prerequisite for biological database mining. Integrating diverse biological molecular databases is essential for coping with their heterogeneity and ensuring efficient data mining, but inconsistency among these databases is a key obstacle to data integration. This paper proposes a framework that uses ontologies to detect inconsistency in biological databases. A numeric estimate is provided to measure the inconsistency and to identify the biological databases that are appropriate for further mining applications. This helps to improve the quality of databases and to ensure accurate and efficient mining of biological databases.
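The abstract does not spell out the measure itself, so the following is only a rough illustration of the general idea under our own assumptions, not the measure defined in the paper. An ontology is used to map different spellings of the same annotation to one canonical term, so that naming discrepancies are not mistaken for conflicts; each database then receives a score in [0, 1] from how often the other databases disagree with its normalized annotations, and a threshold picks the databases considered fit for mining. The synonym table, the scoring rule, the threshold value and the helper names (`normalize`, `inconsistency_scores`, `select_for_mining`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical ontology fragment: surface spellings used by different databases
# are mapped to one canonical Gene Ontology term ID, so naming discrepancies
# are not mistaken for genuine conflicts.
SYNONYMS = {
    "ATP binding": "GO:0005524",
    "ATP-binding": "GO:0005524",
    "DNA binding": "GO:0003677",
    "DNA-binding": "GO:0003677",
}


def normalize(term):
    """Map an annotation string to its canonical ontology ID (unknown terms pass through)."""
    return SYNONYMS.get(term, term)


def inconsistency_scores(databases):
    """Score each database in [0, 1] (hypothetical scoring rule).

    A record's score is the fraction of *other* databases annotating the same
    gene that disagree with it after normalization; a database's score is the
    mean over its records. Genes that no other database covers contribute 0.
    """
    by_gene = defaultdict(dict)  # gene -> {database name: canonical term}
    for db, records in databases.items():
        for gene, term in records.items():
            by_gene[gene][db] = normalize(term)

    scores = {}
    for db, records in databases.items():
        record_scores = []
        for gene in records:
            mine = by_gene[gene][db]
            others = [t for other, t in by_gene[gene].items() if other != db]
            disagree = sum(1 for t in others if t != mine)
            record_scores.append(disagree / len(others) if others else 0.0)
        scores[db] = sum(record_scores) / len(record_scores) if record_scores else 0.0
    return scores


def select_for_mining(scores, threshold=0.3):
    """Return, sorted, the databases whose score is at most the (assumed) threshold."""
    return sorted(db for db, s in scores.items() if s <= threshold)


if __name__ == "__main__":
    databases = {
        "db_a": {"g1": "ATP binding", "g2": "DNA binding", "g3": "ATP binding", "g4": "DNA binding"},
        "db_b": {"g1": "ATP-binding", "g2": "DNA-binding", "g3": "ATP-binding", "g4": "DNA-binding"},
        "db_c": {"g1": "DNA binding", "g2": "ATP binding", "g3": "ATP-binding", "g4": "DNA-binding"},
    }
    scores = inconsistency_scores(databases)
    print(scores)  # {'db_a': 0.25, 'db_b': 0.25, 'db_c': 0.5}
    print(select_for_mining(scores))  # ['db_a', 'db_b']
```

In this toy run, `db_a` and `db_b` use different spellings but agree once normalized, so only the two genes where `db_c` swaps the annotations count as conflicts. Without the ontology step, every "ATP binding" versus "ATP-binding" pair would also be counted, which illustrates why normalizing nomenclature has to come before measuring inconsistency.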


Keywords: Data preparation, Inconsistency, Ontology, Measure, Biological molecular databases, Integration


Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Qingfeng Chen (1)
  • Yi-Ping Phoebe Chen (1, 2)
  • Chengqi Zhang (3)

  1. School of Information Technology, Deakin University, Melbourne, Australia
  2. ARC Centre in Bioinformatics, Melbourne, Australia
  3. Faculty of Information Technology, University of Technology, Sydney, Australia
