Data Quality Enhancement of Databases Using Ontologies and Inductive Reasoning

  • Olivier Curé
  • Robert Jeansoulin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4803)

Abstract

The objective of this paper is twofold: to create domain ontologies by induction on source databases, and to enhance data quality features in relational databases using these ontologies. The proposed method consists of the following steps: (1) transforming domain-specific controlled terminologies into Semantic Web-compliant Description Logics, (2) associating new axioms with concepts of these ontologies based on inductive reasoning on source databases, and (3) providing domain experts with an ontology-based tool to enhance the data quality of source databases. This last step aggregates tuples using ontology concepts and checks the characteristics of those tuples against the concept’s properties. We present a concrete example of this solution on a medical application using well-established drug-related terminologies.
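Steps (2) and (3) can be pictured with a minimal sketch. It is not the authors' implementation: the tuples, the concept codes, the helper names (`induce_axioms`, `flag_violations`) and the 0.75 support threshold are all hypothetical, and a plain majority rule stands in for whatever inductive procedure the paper actually uses.

```python
from collections import Counter, defaultdict

# Hypothetical drug tuples: (drug_id, concept code, contraindication).
# Data and threshold are illustrative assumptions, not taken from the paper.
ROWS = [
    ("d1", "C07", "asthma"),
    ("d2", "C07", "asthma"),
    ("d3", "C07", "asthma"),
    ("d4", "C07", None),         # missing value: a candidate quality issue
    ("d5", "N02", "ulcer"),
    ("d6", "N02", "pregnancy"),
]


def induce_axioms(rows, min_support=0.75):
    """Map each concept to the value shared by at least min_support of its tuples."""
    by_concept = defaultdict(list)
    for _, concept, value in rows:
        by_concept[concept].append(value)

    axioms = {}
    for concept, values in by_concept.items():
        counts = Counter(v for v in values if v is not None)
        if not counts:
            continue  # nothing to generalize from
        value, count = counts.most_common(1)[0]
        if count / len(values) >= min_support:
            axioms[concept] = value
    return axioms


def flag_violations(rows, axioms):
    """Return ids of tuples whose value disagrees with their concept's induced axiom."""
    return [
        drug_id
        for drug_id, concept, value in rows
        if concept in axioms and value != axioms[concept]
    ]


AXIOMS = induce_axioms(ROWS)               # step (2): induce a property per concept
SUSPECTS = flag_violations(ROWS, AXIOMS)   # step (3): tuples for an expert to review
```

In this sketch the flagged tuples are only candidates: the method described above leaves the decision to a domain expert rather than repairing the data automatically.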

Keywords

Relational Database, Inductive Reasoning, Database Schema, Source Database, Drug Database
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Olivier Curé (1)
  • Robert Jeansoulin (2)
  1. S3IS, Université Paris Est, Marne-la-Vallée, France
  2. IGM, Université Paris Est, Marne-la-Vallée, France