Itemset-Based Variable Construction in Multi-relational Supervised Learning

  • Dhafer Lahbib
  • Marc Boullé
  • Dominique Laurent
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7842)

Abstract

In multi-relational data mining, data are represented in a relational form where the individuals of the target table are potentially related to several records in secondary tables through one-to-many relationships. In this paper, we introduce an itemset-based framework for constructing variables in secondary tables and evaluating their conditional information for the supervised classification task. We introduce a space of itemset-based models in the secondary table, together with a conditional density estimation of the related constructed variables. A prior distribution is defined on this model space, resulting in a parameter-free criterion to assess the relevance of the constructed variables. A greedy algorithm is then proposed to explore the space of the considered itemsets. Experiments on multi-relational datasets confirm the advantage of the approach.
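The abstract does not spell out the construction, so the following is only a minimal sketch under stated assumptions, not the authors' method. An itemset is assumed to be a conjunction of attribute = value items over the secondary table. The constructed variable is assumed to count, for each target individual, the secondary records that the itemset covers. A greedy search adds one item at a time. The score is plain conditional entropy H(Y | X) of the class given whether that count is non-zero. This is a swapped-in stand-in, not the prior-based, parameter-free criterion described above. All function names and the toy tables are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical toy data. The target table maps individual id -> class label.
# The secondary table holds (individual id, record) pairs in a one-to-many relation.
TOY_LABELS = {1: "pos", 2: "pos", 3: "neg", 4: "neg"}
TOY_SECONDARY = [
    (1, {"color": "red", "size": "S"}),
    (1, {"color": "blue", "size": "L"}),
    (2, {"color": "red", "size": "L"}),
    (3, {"color": "blue", "size": "S"}),
    (3, {"color": "blue", "size": "L"}),
    (4, {"color": "green", "size": "S"}),
]


def covers(itemset, record):
    """A record is covered when it matches every (attribute, value) item."""
    return all(record.get(attr) == val for attr, val in itemset)


def construct_variable(itemset, labels, secondary):
    """Constructed variable: per individual, the number of covered secondary records."""
    counts = {i: 0 for i in labels}
    for ind, record in secondary:
        if covers(itemset, record):
            counts[ind] += 1
    return counts


def conditional_entropy(values, labels):
    """H(Y | X) in bits, where X = values[i] and Y = labels[i]."""
    n = len(labels)
    groups = defaultdict(Counter)
    for i, y in labels.items():
        groups[values[i]][y] += 1
    h = 0.0
    for counter in groups.values():
        m = sum(counter.values())
        for c in counter.values():
            p = c / m
            h -= (m / n) * p * log2(p)
    return h


def score(itemset, labels, secondary):
    """Stand-in score (lower is better): entropy of the class given 'count > 0'."""
    counts = construct_variable(itemset, labels, secondary)
    present = {i: c > 0 for i, c in counts.items()}
    return conditional_entropy(present, labels)


def greedy_itemset(labels, secondary):
    """Grow an itemset one item at a time while the stand-in score strictly improves."""
    items = sorted({(a, v) for _, rec in secondary for a, v in rec.items()})
    current = frozenset()
    best = score(current, labels, secondary)
    while True:
        best_candidate = None
        used_attrs = {a for a, _ in current}
        for item in items:
            if item[0] in used_attrs:
                continue  # two values of one attribute can never match the same record
            candidate = current | {item}
            s = score(candidate, labels, secondary)
            if s < best - 1e-12:
                best, best_candidate = s, candidate
        if best_candidate is None:
            return current, best
        current = best_candidate


if __name__ == "__main__":
    print(greedy_itemset(TOY_LABELS, TOY_SECONDARY))
```

On the toy tables, the search keeps the single item `("color", "red")`. Only the positive individuals own a red record, so the stand-in entropy drops to zero. Unlike the Bayesian criterion described in the abstract, conditional entropy carries no penalty for itemset complexity, so on real data this stand-in offers no protection against overly specific itemsets.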

Keywords

Supervised Learning; Multi-Relational Data Mining; one-to-many relationship; variable selection; variable construction

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Dhafer Lahbib (1)
  • Marc Boullé (1)
  • Dominique Laurent (2)
  1. Orange Labs - 2, Lannion, France
  2. ETIS - CNRS - Université de Cergy Pontoise - ENSEA, Cergy Pontoise, France