An Ontology-Based Method to Link Database Integration and Data Mining within a Biomedical Distributed KDD

  • David Perez-Rey
  • Victor Maojo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5651)


In recent years, collaborative research has grown steadily in many scientific areas, such as biomedicine. However, traditional Knowledge Discovery in Databases (KDD) processes generally adopt centralized approaches that do not fully address many research needs in these distributed environments. This paper presents a method to improve traditional centralized KDD by adopting an ontology-based distributed model. Ontologies are used within this model (i) as Virtual Schemas (VS) to resolve structural heterogeneities among databases and (ii) as Preprocessing Ontologies (PO), frameworks that guide automatic transformations when users retrieve data. Both types of ontologies aim to facilitate data gathering and preprocessing while keeping the data sources decentralized. This ontology-based approach makes it possible to link database integration and data mining, improving final results, reusability and interoperability. The results obtained show improvements in outcome performance and new capabilities compared to traditional KDD processes.
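The two roles that ontologies play in this model can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain dictionaries stand in for the Virtual Schema and the Preprocessing Ontology, and every name in it (the sources `hospital_a` and `hospital_b`, their field names, the functions `retrieve` and `query_all`, and the sample records) is hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

# Hypothetical stand-in for a Virtual Schema: for each source, map its
# local field names onto shared concept names.
VIRTUAL_SCHEMA: Dict[str, Dict[str, str]] = {
    "hospital_a": {"pat_sex": "sex", "wt_lb": "weight"},
    "hospital_b": {"gender": "sex", "weight_kg": "weight"},
}

# Hypothetical stand-in for a Preprocessing Ontology: for each source and
# concept, a transformation applied automatically when data is retrieved.
PREPROCESSING: Dict[str, Dict[str, Callable[[object], object]]] = {
    "hospital_a": {
        "sex": lambda v: {"M": "male", "F": "female"}[v],
        "weight": lambda v: round(float(v) * 0.45359237, 1),  # pounds to kg
    },
    "hospital_b": {
        "sex": lambda v: str(v).lower(),
    },
}

# Illustrative sample data; each source keeps its own records and format.
SOURCES: Dict[str, List[Record]] = {
    "hospital_a": [{"pat_sex": "M", "wt_lb": 176}],
    "hospital_b": [{"gender": "Female", "weight_kg": 61.5}],
}


def retrieve(source: str, rows: Iterable[Record]) -> List[Record]:
    """Rename local fields to shared concepts, then apply preprocessing rules."""
    mapping = VIRTUAL_SCHEMA[source]
    rules = PREPROCESSING.get(source, {})
    unified: List[Record] = []
    for row in rows:
        out: Record = {}
        for local_name, value in row.items():
            concept = mapping.get(local_name)
            if concept is None:
                continue  # field is not covered by the virtual schema
            transform = rules.get(concept)
            out[concept] = transform(value) if transform else value
        unified.append(out)
    return unified


def query_all() -> List[Record]:
    """Gather unified, preprocessed records from every source at query time."""
    result: List[Record] = []
    for name, rows in SOURCES.items():
        result.extend(retrieve(name, rows))
    return result


if __name__ == "__main__":
    for record in query_all():
        print(record)
```

In this sketch the sources are never merged into a central store: the mapping and the transformations are applied only when a query runs, so a downstream data mining step sees homogeneous records while each source stays in its original form.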


Keywords: Database Integration, Distributed KDD, Ontologies, Preprocessing, Data Mining





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • David Perez-Rey (1)
  • Victor Maojo (1)
  1. Artificial Intelligence Department, Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain
