How to Semantically Enhance a Data Mining Process?

  • Laurent Brisson
  • Martine Collard
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 19)

Abstract

This paper presents KEOPS, a data mining methodology centered on the integration of domain knowledge. KEOPS is a CRISP-DM-compliant methodology that integrates a knowledge base and an ontology. In this paper, we first focus on the pre-processing steps of business understanding and data understanding, which are used to build an ontology-driven information system (ODIS). We then show how the knowledge base is used in the post-processing step of model interpretation. We detail the role of the ontology and define a part-way interestingness measure that integrates both objective and subjective criteria in order to evaluate model relevance according to expert knowledge. Finally, we present experiments conducted on real data and their results.
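The abstract does not spell out the measure itself. As a purely illustrative sketch, the Python below shows one way an interestingness score could mix an objective criterion (rule confidence over the data) with a subjective, ontology-based criterion (how far a rule's concepts lie from an expert's expected rule in a taxonomy). The taxonomy, the concept names, the path-length similarity, the reading of the subjective part as "unexpectedness", and the equal weighting are all hypothetical assumptions; this is not the KEOPS part-way measure.

```python
"""Hypothetical sketch: combining objective and subjective interestingness.

Not the KEOPS part-way measure; every name and formula here is an assumption.
"""

# Hypothetical toy taxonomy (child -> parent), standing in for a real ontology.
TAXONOMY = {
    "mobile_plan": "product",
    "landline_plan": "product",
    "young_customer": "customer",
    "senior_customer": "customer",
    "product": "root",
    "customer": "root",
}


def ancestors(concept):
    """Return the chain from a concept up to its topmost ancestor, concept first."""
    path = [concept]
    while concept in TAXONOMY:
        concept = TAXONOMY[concept]
        path.append(concept)
    return path


def path_similarity(a, b):
    """Path-length similarity: 1 / (1 + edges between a and b via their lowest common ancestor)."""
    pa, pb = ancestors(a), ancestors(b)
    common = next((c for c in pa if c in pb), None)
    if common is None:  # concepts in disconnected parts of the taxonomy
        return 0.0
    return 1.0 / (1.0 + pa.index(common) + pb.index(common))


def confidence(rule, transactions):
    """Objective criterion: share of transactions covering the antecedent that also contain the consequent."""
    antecedent, consequent = rule
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(1 for t in covered if consequent <= t) / len(covered)


def expectedness(rule, expert_rule):
    """Mean best-match similarity between the rule's concepts and the expert's expected concepts."""
    items = rule[0] | rule[1]
    expected = expert_rule[0] | expert_rule[1]
    return sum(max(path_similarity(i, e) for e in expected) for i in items) / len(items)


def interestingness(rule, transactions, expert_rule, weight=0.5):
    """Hypothetical weighted mix of confidence and unexpectedness (1 - expectedness)."""
    objective = confidence(rule, transactions)
    subjective = 1.0 - expectedness(rule, expert_rule)
    return weight * objective + (1.0 - weight) * subjective


# Hypothetical sample data, for illustration only.
TRANSACTIONS = [
    {"young_customer", "mobile_plan"},
    {"young_customer", "mobile_plan"},
    {"young_customer", "landline_plan"},
    {"senior_customer", "landline_plan"},
]
RULE = (frozenset({"young_customer"}), frozenset({"mobile_plan"}))
EXPERT_RULE = (frozenset({"senior_customer"}), frozenset({"landline_plan"}))

if __name__ == "__main__":
    print(round(interestingness(RULE, TRANSACTIONS, EXPERT_RULE), 3))  # prints 0.667
```

With this toy data, the rule young_customer → mobile_plan has confidence 2/3, and each of its concepts lies two taxonomy edges from its nearest concept in the expert's expected rule, so it scores about 0.667. Raising `weight` favours the data-driven criterion; lowering it favours rules that depart from expert expectations.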

Keywords

Data mining, Knowledge integration, Ontology Driven Information System

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Laurent Brisson (1, 3)
  • Martine Collard (2, 4)
  1. Institut TELECOM, TELECOM Bretagne, CNRS UMR 3192 LAB-STICC, Technopôle Brest-Iroise, CS 83818, Brest Cedex 3, France
  2. INRIA Sophia Antipolis, Sophia Antipolis, France
  3. Université européenne de Bretagne, France
  4. Université Nice Sophia Antipolis, France
