Data Mining and Knowledge Discovery, Volume 33, Issue 5, pp 1298–1322

Mining skypatterns in fuzzy tensors

  • Nicolas Nadisic
  • Aurélien Coussat
  • Loïc Cerf
Article
Part of the topical collection: Journal Track of ECML PKDD 2019

Abstract

Many data mining tasks rely on pattern mining. To identify the patterns of interest in a dataset, an analyst may define several measures that score, in different ways, the relevance of a pattern. Until recently, most algorithms could only handle such measures efficiently as constraints, i.e., every measure had to be associated with a user-defined threshold, which can be tricky to determine. Skypatterns were introduced to let analysts simply define the measures of interest and obtain, as a result, a set of globally optimal and semantically relevant patterns. Skypatterns are Pareto-optimal patterns: no other pattern scores better on one of the chosen measures and at least as well on every remaining measure. This article tackles the search for skypatterns in a more general context than the 0/1 (aka Boolean) matrix: the fuzzy tensor. The proposed solution supports a large class of measures. After explaining why and how a mathematical property shared by these measures enables safe pruning of the search space, the article presents an algorithm. It builds upon multidupehack, a generalist pattern mining framework, which can now efficiently list skypatterns in addition to enforcing constraints on them. Experiments on two real-world fuzzy tensors illustrate the versatility of the proposal. Other experiments show that it is typically more than one order of magnitude faster than the state-of-the-art algorithms, which can only mine 0/1 matrices.
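To make the Pareto-optimality definition concrete, here is a minimal brute-force sketch in Python. It is not multidupehack and performs no pruning: it enumerates every pattern of a tiny fuzzy 3-way tensor (cell values are membership degrees in [0, 1]) and keeps the non-dominated ones. The pattern representation (one non-empty index subset per mode) and the two measures (area and mean membership degree, both maximized) are assumptions chosen for illustration, and the function names `measures`, `dominates` and `skypatterns` are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Assumed representation (for illustration only): a pattern selects one
# non-empty subset of indices on each of the three modes of the tensor.
Pattern = Tuple[FrozenSet[int], ...]
Tensor3 = List[List[List[float]]]  # fuzzy tensor: membership degrees in [0, 1]


def nonempty_subsets(n: int) -> List[FrozenSet[int]]:
    """All non-empty subsets of {0, ..., n - 1}."""
    return [frozenset(c) for r in range(1, n + 1)
            for c in combinations(range(n), r)]


def measures(t: Tensor3, p: Pattern) -> Tuple[float, float]:
    """Two hypothetical measures, both maximized: area and mean membership."""
    cells = [t[i][j][k] for i, j, k in product(*p)]
    return float(len(cells)), sum(cells) / len(cells)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance: a is at least as good as b on every measure
    and strictly better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skypatterns(t: Tensor3) -> Dict[Pattern, Tuple[float, float]]:
    """Brute force, no pruning: score every pattern, keep the undominated ones."""
    dims = (len(t), len(t[0]), len(t[0][0]))
    scores = {p: measures(t, p)
              for p in product(*(nonempty_subsets(n) for n in dims))}
    # A pattern never dominates itself, so comparing against all scores is safe.
    return {p: s for p, s in scores.items()
            if not any(dominates(o, s) for o in scores.values())}


if __name__ == "__main__":
    tensor = [[[1.0, 0.9], [0.8, 0.2]],
              [[0.7, 0.1], [0.3, 0.0]]]
    for p, (area, mean) in sorted(skypatterns(tensor).items(),
                                  key=lambda kv: kv[1]):
        print([sorted(s) for s in p], int(area), round(mean, 3))
```

Running the file prints each skypattern with its area and mean membership degree. On this toy tensor, the full tensor (the only pattern of maximal area) and the single cell of degree 1.0 (the only pattern of maximal mean) are necessarily skypatterns, with trade-offs in between. Enumeration is exponential in the tensor size, so this sketch only suits toy inputs; safely pruning that search space is what the article's algorithm addresses.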

Keywords

Pattern mining · Skypattern · Fuzzy tensor · Search space pruning

Notes

Acknowledgements

We would like to thank Willy Ugarte, Bruno Crémilleux, Chedy Raïssi and Benjamin Négrevergne for providing the source code of their algorithms and for their valuable comments.

Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Mathematics and Operational Research, University of Mons, Mons, Belgium
  2. Université de Lyon, INSA-Lyon, Université Claude Bernard Lyon 1, UJM-Saint Etienne, CNRS, Inserm, CREATIS UMR 5220, U1206, Lyon, France
  3. Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
