# Exact and Approximate Minimal Pattern Mining

## Abstract

Condensed representations have been studied extensively for 15 years. In particular, the maximal patterns of the equivalence classes have received much attention, with very general proposals. In contrast, the minimal patterns have remained in the shadows, in particular because they are too numerous and difficult to extract. In this paper, we present a generic framework for *exact* and *approximate* minimal pattern mining by introducing the concept of a minimizable set system. This framework, based on set systems, addresses various languages such as itemsets or strings and, at the same time, different metrics such as frequency. For instance, the free, \(\delta \)-free, and essential patterns are naturally handled by our approach, as are the minimal strings. Then, for any minimizable set system, we introduce a fast minimality checking method that is easy to incorporate into a depth-first search algorithm for mining the \(\delta \)-minimal patterns. We demonstrate that this algorithm is polynomial-delay and polynomial-space. Experiments on traditional benchmarks complete our study by showing that our approach is competitive with the best proposals.
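To make the notion of a \(\delta \)-free itemset concrete, the following is a minimal Python sketch of a depth-first enumeration with a naive minimality check. It is not the fast checking method introduced in the paper: it simply recounts supports on each test, using the standard characterization that an itemset \(X\) is \(\delta \)-free when removing any single item increases its support by more than \(\delta \). The function names (`support`, `is_delta_free`, `delta_free_itemsets`) and the toy dataset are assumptions made for illustration only.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_delta_free(itemset, transactions, delta=0):
    """Naive check (illustrative, not the paper's method): dropping any
    single item must gain strictly more than `delta` transactions."""
    s = support(itemset, transactions)
    return all(
        support(itemset - {e}, transactions) - s > delta for e in itemset
    )


def delta_free_itemsets(transactions, delta=0, min_support=1):
    """Depth-first enumeration of the non-empty frequent delta-free itemsets.

    Pruning is safe because delta-freeness and frequency are both
    anti-monotone: a superset of a rejected candidate is rejected too.
    """
    items = sorted(set().union(*transactions))
    result = []

    def extend(prefix, start):
        for i in range(start, len(items)):
            candidate = prefix | {items[i]}
            if support(candidate, transactions) < min_support:
                continue
            if not is_delta_free(candidate, transactions, delta):
                continue
            result.append(candidate)
            extend(candidate, i + 1)

    extend(frozenset(), 0)
    return result


if __name__ == "__main__":
    # Hypothetical toy dataset: item "a" occurs in every transaction,
    # so no itemset containing "a" is free.
    toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
    print(delta_free_itemsets(toy, delta=0))
    print(delta_free_itemsets(toy, delta=1))
```

With \(\delta = 0\) the sketch yields the free itemsets (minimal generators); larger values of \(\delta \) give the approximate variant, which retains fewer patterns.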

## Keywords

Cover Operator, Critical Object, Pattern Mining, Memory Consumption, Condensed Representation

## Notes

### Acknowledgments

This article has been partially funded by the Hybride project (ANR-11-BS02-0002).
