# Mining skypatterns in fuzzy tensors

## Abstract

Many data mining tasks rely on pattern mining. To identify the patterns of interest in a dataset, an analyst may define several measures that score, in different ways, the relevance of a pattern. Until recently, most algorithms could only handle constraints efficiently, i.e., every measure had to be associated with a user-defined threshold, which can be tricky to determine. *Skypatterns* were introduced to allow analysts to simply define the measures of interest and to obtain, as a result, a set of globally optimal and semantically relevant patterns. Skypatterns are Pareto-optimal patterns: no other pattern scores better on one of the chosen measures and scores at least as well on every remaining measure. This article tackles the search for skypatterns in a more general context than the 0/1 (aka Boolean) matrix: the *fuzzy tensor*. The proposed solution supports a large class of measures. After explaining why and how their common mathematical property enables a safe pruning of the search space, an algorithm is presented. It builds upon multidupehack, a generalist pattern mining framework, which is now able to efficiently list skypatterns in addition to enforcing constraints on them. Experiments on two real-world fuzzy tensors illustrate the versatility of the proposal. Other experiments show it is typically more than one order of magnitude faster than the state-of-the-art algorithms, which can only mine 0/1 matrices.
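The Pareto-optimality condition that defines skypatterns can be illustrated with a minimal Python sketch. It assumes that every measure is to be maximized and that each candidate pattern has already been scored. The pattern names, the score values, and the functions `dominates` and `skypatterns` are hypothetical and serve only as an illustration. The sketch filters an explicit list of candidates by pairwise comparison; it is not the search space pruning that the article builds into multidupehack.

```python
from typing import Dict, Tuple

# One score per measure; higher is assumed to be better (an assumption of this sketch).
Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if `a` is at least as good as `b` on every measure
    and strictly better on at least one of them."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def skypatterns(scored: Dict[str, Scores]) -> Dict[str, Scores]:
    """Keep the patterns that no other pattern dominates (naive quadratic filter)."""
    return {
        name: scores
        for name, scores in scored.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in scored.items()
            if other_name != name
        )
    }


# Hypothetical candidates scored on two hypothetical measures.
candidates = {
    "P1": (5.0, 2.0),
    "P2": (3.0, 4.0),
    "P3": (3.0, 2.0),  # dominated by P1: worse on the first measure, tied on the second
    "P4": (5.0, 2.0),  # ties with P1 on both measures, so neither dominates the other
}

result = skypatterns(candidates)
print(sorted(result))
```

Patterns with identical scores, such as `P1` and `P4` here, are both kept because neither is strictly better than the other on any measure.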

## Keywords

Pattern mining, Skypattern, Fuzzy tensor, Search space pruning

## Notes

### Acknowledgements

We would like to thank Willy Ugarte, Bruno Crémilleux, Chedy Raïssi and Benjamin Négrevergne for providing the source code of their algorithms and for their valuable comments.
