Advertisement

Exact and Approximate Minimal Pattern Mining

  • Arnaud Soulet
  • François Rioult
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 665)

Abstract

Condensed representations have been studied extensively for 15 years. In particular, the maximal patterns of the equivalence classes have received much attention with very general proposals. In contrast, the minimal patterns remained in the shadows in particular because they are too numerous and they are difficult to extract. In this paper, we present a generic framework for exact and approximate minimal patterns mining by introducing the concept of minimizable set system. This framework based on set systems addresses various languages such as itemsets or strings, and at the same time, different metrics such as frequency. For instance, the free, \(\delta \)-free and the essential patterns are naturally handled by our approach, just as the minimal strings. Then, for any minimizable set system, we introduce a fast minimality checking method that is easy to incorporate in a depth-first search algorithm for mining the \(\delta \)-minimal patterns. We demonstrate that it is polynomial-delay and polynomial-space. Experiments on traditional benchmarks complete our study by showing that our approach is competitive with the best proposals.

Keywords

Cover Operator Critical Object Pattern Mining Memory Consumption Condensed Representation 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgments

This article has been partially funded by the Hybride project (ANR-11-BS02-0002).

References

  1. Arimura, H., & Uno, T. (2009). Polynomial-delay and polynomial-space algorithms for mining closed sequences, graphs, and pictures in accessible set systems. In SDM (pp. 1087–1098). SIAM.Google Scholar
  2. Boulicaut, J.-F., Bykowski, A., & Rigotti, C. (2000). Approximation of frequency queries by means of free-sets. In D. A. Zighed, J. Komorowski & J. Żytkow (Eds.), PKDD. LNCS (Vol. 1910, pp. 75–85). Heidelberg: Springer.Google Scholar
  3. Boulicaut, J.-F., Bykowski, A., & Rigotti, C. (2003). Free-sets: A condensed representation of boolean data for the approximation of frequency queries. Data Mining and Knowledge Discovery, 7(1), 5–22.MathSciNetCrossRefGoogle Scholar
  4. Calders, T., & Goethals, B. (2003). Minimal k-free representations of frequent sets. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2003) (pp. 71–82). Heidelberg: Springer.Google Scholar
  5. Calders, T., & Goethals, B. (2005). Depth-first non-derivable itemset mining. In SDM (pp. 250–261).Google Scholar
  6. Calders, T., Rigotti, C., & Boulicaut, J. F. (2004). A survey on condensed representations for frequent sets. In J.-F. Boulicaut, L. De Raedt, & H. Mannila (Eds.), Constraint-based mining and inductive databases. Lecture notes in computer science (Vol. 3848, pp. 64–80). Heidelberg: Springer.Google Scholar
  7. Casali, A., Cicchetti, R., & Lakhal, L. (2005). Essential patterns: A perfect cover of frequent patterns. In A. M. Tjoa & J. Trujillo (Eds.), DaWaK. Lecture notes in computer science (Vol. 3589, pp. 428–437). Heidelberg: Springer.Google Scholar
  8. Crémilleux, B., & Boulicaut, J.-F. (2003). Simplest rules characterizing classes generated by \(\delta \)-free sets. In M. Bramer, A. Preece, & F. Coenen (Eds.), Research and development in intelligent systems XIX (pp. 33–46). London: Springer.CrossRefGoogle Scholar
  9. Eiter, T., & Gottlob, G. (2002). Hypergraph transversal computation and related problems in logic and AI. In S. Flesca, S. Greco, G. Ianni, & N. Leone (Eds.), JELIA. Lecture notes in computer science (Vol. 2424, pp. 549–564). Heidelberg: Springer.Google Scholar
  10. Gao, C., Wang, J., He, Y., & Zhou, L. (2008). Efficient mining of frequent sequence generators. In WWW (pp. 1051–1052). ACM.Google Scholar
  11. Gasmi, G., Yahia, S. B., Nguifo, E. M., & Bouker, S. (2007). Extraction of association rules based on literalsets. In Y. Song, J. Eder, & T. M. Nguyen (Eds.), DaWaK. Lecture notes in computer science (Vol. 4654, pp. 293–302). Heidelberg: Springer.Google Scholar
  12. Giacometti, A., Li, D. H., Marcel, P., & Soulet, A. (2013). 20 years of pattern mining: a bibliometric survey. SIGKDD Explorations, 15(1), 41–50.CrossRefGoogle Scholar
  13. Hamrouni, T. (2012). Key roles of closed sets and minimal generators in concise representations of frequent patterns. Intelligent Data Analysis, 16(4), 581–631.Google Scholar
  14. Hébert, C., & Crémilleux, B. (2005). Mining frequent delta-free patterns in large databases. In A. Hoffmann, H. Motoda, & T. Scheffer (Eds.), Discovery science. Lecture notes in computer science (Vol. 3735, pp. 124–136). Heidelberg: Springer.Google Scholar
  15. Jelassi, M. N., Largeron, C., & Yahia, S. B. (2014). Efficient unveiling of multi-members in a social network. Journal of Systems and Software, 94, 30–38.CrossRefGoogle Scholar
  16. Kryszkiewicz, M. (2005). Generalized disjunction-free representation of frequent patterns with negation. Journal of Experimental and Theoretical Artificial Intelligence, 17(1–2), 63–82.CrossRefzbMATHGoogle Scholar
  17. Li, J., Li, H., Wong, L., Pei, J. & Dong, G. (2006). Minimum description length principle: Generators are preferable to closed patterns. In AAAI (pp. 409–414).Google Scholar
  18. Liu, B., Hsu, W. & Ma, Y. (1998). Integrating classification and association rule mining. In KDD (pp. 80–86).Google Scholar
  19. Liu, G., Li, J., & Wong, L. (2008). A new concise representation of frequent itemsets using generators and a positive border. Knowledge and Information Systems, 17(1), 35–56.MathSciNetCrossRefGoogle Scholar
  20. Lo, D., Khoo, S. -C., & Li, J. (2008). Mining and ranking generators of sequential patterns. In SDM (pp. 553–564). SIAM.Google Scholar
  21. Lo, D., Khoo, S.-C., & Wong, L. (2009). Non-redundant sequential rules-theory and algorithm. Information Systems, 34(4–5), 438–453.CrossRefGoogle Scholar
  22. Mannila, H. & Toivonen, H. (1996). Multiple uses of frequent sets and condensed representations (extended abstract). In E. Simoudis, J. Han & U. M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, USA (pp. 189–194). AAAI Press.Google Scholar
  23. Murakami, K. & Uno, T. (2013). Efficient algorithms for dualizing large-scale hypergraphs. In ALENEX (pp. 1–13).Google Scholar
  24. Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Efficient mining of association rules using closed itemset lattices. Information Systems, 24(1), 25–46.CrossRefzbMATHGoogle Scholar
  25. Rioult, F., Zanuttini, B., & Crémilleux, B. (2010). Nonredundant generalized rules and their impact in classification. In Z. W. Ras & L.-S. Tsay (Eds.), Advances in intelligent information systems. Studies in computational intelligence (Vol. 265, pp. 3–25). Heidelberg: Springer.Google Scholar
  26. Soulet, A., & Crémilleux, B. (2008). Adequate condensed representations of patterns. Data Mining and Knowledge Discovery, 17(1), 94–110.MathSciNetCrossRefGoogle Scholar
  27. Soulet, A., Crémilleux, B., & Rioult, F. (2004). Condensed representation of EPs and patterns quantified by frequency-based measures. In Post-proceedings of knowledge discovery in inductive databases, pise. Heidelberg: Springer.Google Scholar
  28. Soulet, A., & Rioult, F. (2014). Efficiently depth-first minimal pattern mining. In V. S. Tseng., T. B. Ho., Z. Zhou., A. L. P. Chen., & H. Kao (Eds.), Proceedings 18th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2014, Part I, Tainan, Taiwan, May 13–16, 2014. Lecture notes in computer science (Vol. 8443, pp. 28–39). Heidelberg: Springer.Google Scholar
  29. Szathmary, L., Valtchev, P., Napoli, A., & Godin, R. (2009). Efficient vertical mining of frequent closures and generators. In IDA. LNCS (Vol. 5772, pp. 393–404). Heidelberg: Springer.Google Scholar
  30. Zaki, M.J. (2000). Generating non-redundant association rules. In KDD (pp. 34–43).Google Scholar
  31. Zeng, Z., Wang, J., Zhang, J., & Zhou, L. (2009). FOGGER: an algorithm for graph generator discovery. In EDBT (pp. 517–528).Google Scholar

Copyright information

© Springer International Publishing Switzerland 2017

Authors and Affiliations

  1. 1.LIUniversité François Rabelais ToursToursFrance
  2. 2.GREYCUniversité de Caen NormandieCaenFrance

Personalised recommendations