Data Mining and Knowledge Discovery

Volume 28, Issue 3, pp 808–849

Interesting pattern mining in multi-relational data


Abstract

Mining patterns from multi-relational data is a problem attracting increasing interest within the data mining community. Traditional data mining approaches are typically developed for single-table databases and are not directly applicable to multi-relational data. Nevertheless, multi-relational data is a more truthful and therefore often also a more powerful representation of reality. Mining patterns of a suitably expressive syntax directly from this representation is thus a research problem of great importance. In this paper we introduce a novel approach to mining patterns in multi-relational data. We propose a new syntax for multi-relational patterns as complete connected subsets of database entities. We show how this pattern syntax is generally applicable to multi-relational data, while it reduces to the well-known tiles (Geerts et al. 2004) when the data is a simple binary or attribute-value table. We propose RMiner, a simple yet practically efficient divide-and-conquer algorithm to mine such patterns, which is an instantiation of an algorithmic framework for efficiently enumerating all fixed points of a suitable closure operator (Boley et al. 2010). We show how the interestingness of patterns of the proposed syntax can conveniently be quantified using a general framework for quantifying the subjective interestingness of patterns (De Bie 2011b). Finally, we illustrate the usefulness and general applicability of our approach by discussing results on real-world and synthetic databases.
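The abstract notes that, for a single binary table, the proposed patterns reduce to tiles (Geerts et al. 2004), and that RMiner enumerates the fixed points of a closure operator. To illustrate that special case only, the following Python sketch enumerates the fixed points of the standard Galois closure on a small binary table. Each fixed point, paired with its supporting rows, is a maximal tile: an all-ones submatrix to which no row or column can be added. This is not RMiner and does not handle multi-relational data. It tests every column subset, which is exponential in the number of columns, whereas the paper's algorithm is a divide-and-conquer instantiation of the framework of Boley et al. (2010). The table `DATA` and the functions `support_rows`, `closure` and `closed_tiles` are hypothetical names introduced for this example.

```python
from itertools import combinations

# Hypothetical toy binary table: each row maps to the set of columns that hold a 1.
DATA = {
    "r1": {"a", "b", "c"},
    "r2": {"a", "b"},
    "r3": {"b", "c"},
    "r4": {"a", "b", "c"},
}
ITEMS = sorted(set().union(*DATA.values()))


def support_rows(items):
    """Rows that contain every column in `items` (all rows for the empty set)."""
    return frozenset(r for r, row in DATA.items() if items <= row)


def closure(items):
    """Galois closure: the columns shared by every row that supports `items`."""
    rows = support_rows(items)
    if not rows:
        # No supporting row: by convention the closure is the full column set.
        return frozenset(ITEMS)
    return frozenset.intersection(*(frozenset(DATA[r]) for r in rows))


def closed_tiles():
    """Brute-force enumeration of the fixed points of `closure`.

    Each fixed point C, paired with its supporting rows, is a maximal tile.
    Exponential in len(ITEMS); for illustration only, not RMiner.
    """
    tiles = set()
    for k in range(len(ITEMS) + 1):
        for combo in combinations(ITEMS, k):
            items = frozenset(combo)
            if closure(items) == items:
                tiles.add((items, support_rows(items)))
    return tiles


if __name__ == "__main__":
    for cols, rows in sorted(closed_tiles(), key=lambda t: (len(t[0]), sorted(t[0]))):
        print(sorted(cols), "x", sorted(rows))
```

With this four-row table, the closed column sets are {b}, {a, b}, {b, c} and {a, b, c}. For instance, {a, b} is supported by rows r1, r2 and r4, which gives a tile covering 3 rows and 2 columns.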

Keywords

Multi-relational data mining; Pattern mining; Interestingness measures; Maximum entropy modelling; K-partite graphs

References

  1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases (VLDB), pp 487–499
  2. Angles R, Gutierrez C (2008) Survey of graph database models. ACM Comput Surv 40(1):1:1–1:39
  3. Birkhoff G (1967) Lattice theory. American Mathematical Society, Providence
  4. Boley M (2011) The efficient discovery of interesting closed pattern collections. PhD thesis, University of Bonn, Bonn
  5. Boley M, Horvath T, Poigné A, Wrobel S (2010) Listing closed sets of strongly accessible set systems with applications to data mining. Theor Comput Sci 411(3):691–700
  6. Bron C, Kerbosch J (1973) Algorithm 457: finding all cliques of an undirected graph. Commun ACM 16(9):575–577
  7. Burdick D, Calimlim M, Flannick J, Gehrke J, Yiu T (2005) MAFIA: a maximal frequent itemset algorithm. IEEE Trans Knowl Data Eng 17(11):1490–1504
  8. Calders T, Goethals B (2007) Non-derivable itemset mining. Data Min Knowl Discov 14(1):171–206
  9. Cerf L, Besson J, Robardet C, Boulicaut JF (2009) Closed patterns meet n-ary relations. ACM Trans Knowl Discov Data 3(1):3:1–3:36
  10. Cover TM, Thomas JA (2005) Elements of information theory. Wiley, Hoboken
  11. De Bie T (2011a) An information theoretic framework for data mining. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 564–572
  12. De Bie T (2011b) Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Min Knowl Discov 23(3):407–446
  13. De Bie T, Kontonasios KN, Spyropoulou E (2010) A framework for mining interesting pattern sets. In: SIGKDD explorations, pp 92–100
  14. De Raedt L, Zimmermann A (2007) Constraint-based pattern set mining. In: Proceedings of the SIAM international conference on data mining (SDM), pp 237–248
  15. Dehaspe L, Toivonen H (1999) Discovery of frequent datalog patterns. Data Min Knowl Discov 3:7–36
  16. Elmasri R, Navathe SB (2006) Fundamentals of database systems. Addison Wesley, Boston
  17. Garriga GC, Khardon R, De Raedt L (2007) On mining closed sets in multi-relational data. In: Proceedings of the 20th international joint conference on artificial intelligence (IJCAI), pp 804–809
  18. Geerts F, Goethals B, Mielikäinen T (2004) Tiling databases. In: Proceedings of discovery science, pp 278–289
  19. Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. In: ACM computing surveys, vol 38. ACM, New York
  20. Gionis A, Mannila H, Mielikäinen T, Tsaparas P (2007) Assessing data mining results via swap randomization. ACM Trans Knowl Discov Data 1(3):14
  21. Goethals B, Le Page W (2008) Mining association rules of simple conjunctive queries. In: Proceedings of the SIAM international conference on data mining (SDM), Atlanta
  22. Goethals B, Le Page W, Mampaey M (2010) Mining interesting sets and rules in relational databases. In: Proceedings of the ACM symposium on applied computing (SAC), pp 997–1001
  23. Gupta R, Fang G, Field B, Steinbach M, Kumar V (2008) Quantitative evaluation of approximate frequent pattern mining algorithms. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 301–309
  24. Hanhijärvi S, Ojala M, Vuokko N, Puolamäki K, Tatti N, Mannila H (2009) Tell me something I don’t know: randomization strategies for iterative data mining. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, New York, pp 379–388
  25. Jäschke R, Hotho A, Schmitz C, Ganter B, Stumme G (2008) Discovering shared conceptualizations in folksonomies. Web Semant 6(1):38–53
  26. Jen TY, Laurent D, Spyratos N (2010) Computing supports of conjunctive queries on relational tables with functional dependencies. Fundam Inf 99(3):263–292
  27. Ji M, Han J, Danilevsky M (2011) Ranking-based classification of heterogeneous information networks. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 1298–1306
  28. Ji M, Sun Y, Danilevsky M, Han J, Gao J (2010) Graph regularized transductive classification on heterogeneous information networks. In: ECML/PKDD (1), pp 570–586
  29. Ji L, Tan KL, Tung AKH (2006) Mining frequent closed cubes in 3D datasets. In: Proceedings of the international conference on very large data bases (VLDB), VLDB Endowment, pp 811–822
  30. Kontonasios K, Spyropoulou E, De Bie T (2012) Knowledge discovery interestingness measures based on unexpectedness. In: Wiley interdisciplinary reviews: data mining and knowledge discovery, pp 386–399
  31. Koopman A, Siebes A (2008) Discovering relational item sets efficiently. In: Proceedings of the SIAM conference on data mining (SDM), pp 108–119
  32. Koopman A, Siebes A (2009) Characteristic relational patterns. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 437–446
  33. Korte B, Lovász L (1985) Relations between subclasses of greedoids. Math Methods Oper Res 29:249–267
  34. Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 313–320
  35. Lawler EL, Lenstra JK, Kan AHGR (1980) Generating all maximal independent sets: NP-hardness and polynomial-time algorithms. SIAM J Comput 9(3):558–565
  36. Makino K, Uno T (2004) New algorithms for enumerating all maximal cliques. In: Scandinavian workshop on algorithm theory (SWAT), pp 260–272
  37. Maruhashi K, Guo F, Faloutsos C (2011) MultiAspectForensics: pattern mining on large-scale heterogeneous networks with tensor analysis. In: Proceedings of the international conference on advances in social networks analysis and mining (ASONAM ’11), pp 203–210
  38. Ng EKK, Ng K, Fu AWC, Wang K (2002) Mining association rules from stars. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 322–329
  39. Nijssen S, Jiménez A, Guns T (2011) Constraint-based pattern mining in multi-relational databases. In: ICDM workshops, pp 1120–1127
  40. Nijssen S, Kok J (2003) Efficient frequent query discovery in FARMER. In: Proceedings of the European conference on principles and practice of knowledge discovery in databases (PKDD), pp 350–362
  41. Ojala M, Garriga GC, Gionis A, Mannila H (2010) Evaluating query result significance in databases via randomizations. In: Proceedings of the SIAM conference on data mining (SDM), pp 906–917
  42. Pardalos PM, Xue J (1994) The maximum clique problem. J Glob Optim 4:301–328
  43. Poernomo AK, Gopalkrishnan V (2009) Towards efficient mining of proportional fault-tolerant frequent itemsets. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 697–706
  44. Siebes A, Vreeken J, van Leeuwen M (2006) Item sets that compress. In: Proceedings of the SIAM conference on data mining (SDM), pp 393–404
  45. Spyropoulou E, De Bie T (2011) Interesting multi-relational patterns. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 675–684
  46. Srikant R, Agrawal R (1996) Mining quantitative association rules in large relational tables. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 1–12
  47. Sun Y, Han J, Aggarwal CC, Chawla NV (2012a) When will it happen?: relationship prediction in heterogeneous information networks. In: Proceedings of the fifth ACM international conference on web search and data mining (WSDM ’12), pp 663–672
  48. Sun Y, Norick B, Han J, Yan X, Yu PS, Yu X (2012b) Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. In: KDD, pp 1348–1356
  49. Sun Y, Yu Y, Han J (2009) Ranking-based clustering of heterogeneous information networks with star network schema. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 797–806
  50. Tang L, Wang X, Liu H (2012) Community detection via heterogeneous interaction analysis. Data Min Knowl Discov 25(1):1–33
  51. Trabelsi C, Jelassi N, Ben Yahia S (2012) Scalable mining of frequent tri-concepts from folksonomies. In: Advances in knowledge discovery and data mining, pp 231–242
  52. Uno T, Asai T, Uchida Y, Arimura H (2004a) An efficient algorithm for enumerating closed patterns in transaction databases. In: Discovery science, pp 16–31
  53. Uno T, Kiyomi M, Arimura H (2004b) LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of the IEEE ICDM workshop on frequent itemset mining implementations (FIMI), Brighton
  54. Voutsadakis G (2002) Polyadic concept analysis. Order 19(3):295–304
  55. Ben Yahia S, Hamrouni T, Mephu Nguifo E (2006) Frequent closed itemset based algorithms: a thorough structural and analytical survey. SIGKDD Explor Newsl 8(1):93–104
  56. Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 721–730
  57. Yan X, Han J (2003) CloseGraph: mining closed frequent graph patterns. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 286–295
  58. Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390
  59. Zaki M, Hsiao CJ (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4):462–478
  60. Zaki MJ, Peters M, Assent I, Seidl T (2007) CLICKS: an effective algorithm for mining subspace clusters in categorical datasets. Data Knowl Eng 60(1):51–70
  61. Zaki M, Hsiao CJ (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceedings of the SIAM international conference on data mining (SDM), pp 457–473
  62. Zaki M, Ogihara M (1998) Theoretical foundations of association rules. In: Proceedings of the ACM SIGMOD workshop on research issues in data mining and knowledge discovery, San Diego

Copyright information

© The Author(s) 2013

Authors and Affiliations

  1. Intelligent Systems Laboratory, University of Bristol, Bristol, UK
  2. Fraunhofer IAIS, Schloss Birlinghoven, Sankt Augustin, Germany
