A Survey on Condensed Representations for Frequent Sets

  • Conference paper
Constraint-Based Mining and Inductive Databases

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3848)

Abstract

Solving inductive queries that must return complete collections of patterns satisfying a given predicate has been studied extensively over the last few years. The specific problem of frequent set mining from potentially huge boolean matrices has given rise to dozens of efficient solvers. Frequent sets are indeed useful for many data mining tasks, including the popular association rule mining task but also feature construction, association-based classification, clustering, etc. Research in this area has been boosted by the concept of condensed representations w.r.t. frequency queries. Such representations can be used to recover every frequent set and its support without going back to the data. Interestingly, a condensed representation can be several orders of magnitude smaller than the corresponding collection of frequent sets. Most proposals concern exact representations, but it is also possible to consider approximate ones, i.e., to trade computational complexity for a bounded approximation of the computed support values. This paper surveys the core concepts used in recent work on condensed representations for frequent sets.
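
To make the idea of recovering supports "without going back to the data" concrete, here is a minimal Python sketch. It uses frequent closed sets (sets with no proper superset of equal support), which are one well-known exact condensed representation; the abstract does not single out any particular representation, so this choice is only an illustration. The toy dataset, the threshold, and all function names are hypothetical, and the brute-force enumeration is for readability rather than efficiency.

```python
from itertools import combinations

# Hypothetical toy dataset: each transaction is the set of items it contains
# (one row of a boolean matrix).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"c"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_sets(transactions, min_support):
    """Brute-force enumeration of all non-empty frequent sets with their support."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = support(s, transactions)
            if count >= min_support:
                result[s] = count
    return result


def closed_sets(frequent):
    """Keep only the frequent sets that have no proper superset of equal support."""
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < t and c == ct for t, ct in frequent.items())
    }


def derive_support(itemset, closed):
    """Recover a support from the closed sets alone, without reading the data.

    The support of a frequent set equals the largest support among its closed
    supersets; None means the set is not frequent.
    """
    candidates = [c for s, c in closed.items() if itemset <= s]
    return max(candidates) if candidates else None


FREQUENT = frequent_sets(TRANSACTIONS, min_support=2)
CLOSED = closed_sets(FREQUENT)

for s, c in sorted(FREQUENT.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), c, derive_support(s, CLOSED))
```

On this toy dataset the closed sets form a smaller collection than the frequent sets, yet `derive_support` returns the same support as a direct count for every frequent set. On real data the size reduction depends heavily on how correlated the items are.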

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Calders, T., Rigotti, C., Boulicaut, JF. (2006). A Survey on Condensed Representations for Frequent Sets. In: Boulicaut, JF., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science, vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_4

  • DOI: https://doi.org/10.1007/11615576_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31331-1

  • Online ISBN: 978-3-540-31351-9

  • eBook Packages: Computer Science, Computer Science (R0)
