A Survey on Condensed Representations for Frequent Sets

Calders, Toon; Rigotti, Christophe; Boulicaut, Jean-François

doi:10.1007/11615576_4


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3848)

Abstract

Solving inductive queries that must return complete collections of patterns satisfying a given predicate has been studied extensively over the last few years. The specific problem of frequent set mining from potentially huge boolean matrices has given rise to dozens of efficient solvers. Frequent sets are indeed useful for many data mining tasks, including the popular association rule mining task but also feature construction, association-based classification, clustering, etc. Research in this area has been boosted by the fascinating concept of condensed representations w.r.t. frequency queries. Such representations can be used to support the discovery of every frequent set and its support without looking back at the data. Interestingly, the size of condensed representations can be several orders of magnitude smaller than the size of frequent set collections. Most of the proposals concern exact representations, while it is also possible to consider approximate ones, i.e., to trade computational complexity for a bounded approximation error on the computed support values. This paper surveys the core concepts used in recent work on condensed representations for frequent sets.
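
To make the idea of recovering every frequent set and its support "without looking back at the data" concrete, here is a minimal sketch using closed frequent sets, one well-known exact condensed representation. It is an illustration only, not the chapter's own algorithm: the toy transactions, the threshold, and all function names are assumptions made for this example, and the brute-force enumeration is only suitable for tiny inputs.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is one row of a boolean matrix.
TRANSACTIONS = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("d"),
]
MIN_SUPPORT = 2  # absolute support threshold (assumed for this example)


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_sets(transactions, min_support):
    """Brute-force enumeration of all non-empty frequent sets with supports."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                result[candidate] = count
    return result


def closed_sets(frequent):
    """Keep the frequent sets that have no proper superset of equal support."""
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < t and c == frequent[t] for t in frequent)
    }


def derive_support(itemset, closed):
    """Recover a support from the closed sets alone; None means not frequent."""
    counts = [c for s, c in closed.items() if itemset <= s]
    return max(counts) if counts else None


FREQUENT = frequent_sets(TRANSACTIONS, MIN_SUPPORT)
CLOSED = closed_sets(FREQUENT)
```

On this toy input the closed sets form a strict subset of the frequent sets, yet `derive_support` returns the same support for each frequent set as a direct count over the transactions, and `None` for sets that are not frequent. The support of a set equals the largest support among the closed sets that contain it, which is why the transactions are not needed once the closed sets are stored.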

Author information

Authors and Affiliations

University of Antwerp, Belgium
Toon Calders
INSA Lyon, LIRIS CNRS UMR 5205, France
Christophe Rigotti & Jean-François Boulicaut

Editor information

Editors and Affiliations

INSA-Lyon, LIRIS CNRS UMR5205, F-69621, Villeurbanne, France
Jean-François Boulicaut
Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Heverlee, Belgium
Luc De Raedt
HIIT, Helsinki University of Technology and, University of Helsinki, Finland
Heikki Mannila

About this paper

Cite this paper

Calders, T., Rigotti, C., Boulicaut, JF. (2006). A Survey on Condensed Representations for Frequent Sets. In: Boulicaut, JF., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science, vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_4

DOI: https://doi.org/10.1007/11615576_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31331-1
Online ISBN: 978-3-540-31351-9
eBook Packages: Computer Science (R0)
