Relational and Semantic Data Mining

— Invited Talk —
  • Nada Lavrač
  • Anže Vavpetič
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9345)


Inductive Logic Programming (ILP) and Relational Data Mining (RDM) address the task of inducing models or patterns from multi-relational data. One established approach to RDM is propositionalization, which transforms a relational database into a single-table representation. After introducing ILP and RDM, the paper gives an overview of propositionalization algorithms, which have been made publicly available through the web-based ClowdFlows data mining platform. The paper concludes by presenting recent advances in Semantic Data Mining, which exploits relational background knowledge in the form of domain ontologies during model and pattern construction.
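
To make the single-table transformation concrete, the following is a minimal sketch of aggregation-based propositionalization in Python. It is an illustration only: the toy tables, the column names, and the `propositionalize` function are hypothetical and are not taken from the paper or from the ClowdFlows platform, and the aggregates chosen (count, sum, category indicators) are just one simple way to flatten a one-to-many relation.

```python
from collections import defaultdict

# Hypothetical toy database: a target table and a related one-to-many table.
customers = [
    {"id": 1, "label": "good"},
    {"id": 2, "label": "bad"},
]
orders = [
    {"customer_id": 1, "amount": 20.0, "category": "books"},
    {"customer_id": 1, "amount": 5.0, "category": "food"},
    {"customer_id": 2, "amount": 100.0, "category": "books"},
]


def propositionalize(targets, related, key, foreign_key):
    """Flatten a one-to-many relation into one feature row per target record."""
    # Group the related rows by the target record they refer to.
    groups = defaultdict(list)
    for row in related:
        groups[row[foreign_key]].append(row)

    # Every category seen in the related table becomes one boolean feature.
    categories = sorted({row["category"] for row in related})

    table = []
    for target in targets:
        rows = groups.get(target[key], [])
        features = dict(target)  # keep the original attributes and class label
        features["n_orders"] = len(rows)
        features["total_amount"] = sum(r["amount"] for r in rows)
        for cat in categories:
            features[f"has_{cat}"] = any(r["category"] == cat for r in rows)
        table.append(features)
    return table


single_table = propositionalize(customers, orders, "id", "customer_id")
for row in single_table:
    print(row)
```

Each row of `single_table` describes one customer by fixed-length features, so any standard attribute-value learner could be applied to it afterwards.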


Keywords: Inductive Logic Programming, Relational Data Mining, Semantic Data Mining, Propositionalization



This work was supported by the Slovenian Ministry of Higher Education, Science and Technology [grant number P2-0103], the Slovenian Research Agency [grant number PR-04431], and the SemDM project (Development and application of new semantic data mining methods in life sciences) [grant number J2-5478].



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Jožef Stefan Institute, Ljubljana, Slovenia
  2. Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
  3. University of Nova Gorica, Nova Gorica, Slovenia
