Data Mining and Knowledge Discovery, Volume 14, Issue 1, pp 171–206

Non-derivable itemset mining

  • Toon Calders
  • Bart Goethals
Open Access
Article

Abstract

All frequent itemset mining algorithms rely heavily on the monotonicity principle for pruning. This principle allows candidate itemsets to be excluded from the expensive counting phase. In this paper, we present sound and complete deduction rules to derive bounds on the support of an itemset. Based on these deduction rules, we construct a condensed representation of all frequent itemsets by removing those itemsets whose support can be derived, resulting in the so-called Non-Derivable Itemsets (NDI) representation. We also present connections between our proposal and other recent proposals for condensed representations of frequent itemsets. Experiments on real-life datasets show the effectiveness of the NDI representation, making the search for frequent non-derivable itemsets a useful and tractable alternative to mining all frequent itemsets.
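
The abstract does not spell out the deduction rules themselves. As an illustration only, the Python sketch below assumes the inclusion-exclusion form commonly associated with non-derivable itemsets: for an itemset I and each subset X of I, the signed sum of the supports of all J with X ⊆ J ⊊ I gives an upper bound on the support of I when |I \ X| is odd and a lower bound when it is even. The function names `support` and `support_bounds` and the toy database are hypothetical, and the code recounts supports from the transactions for simplicity rather than reusing stored counts as a real miner would.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t)


def support_bounds(itemset, transactions):
    """Return (lower, upper) bounds on supp(itemset).

    Only supports of proper subsets of `itemset` are used, following the
    assumed inclusion-exclusion rule described in the lead-in.
    """
    I = frozenset(itemset)
    items = sorted(I)
    lower, upper = 0, len(transactions)
    # One rule per subset X of I.
    for k in range(len(items) + 1):
        for X in combinations(items, k):
            X = frozenset(X)
            rest = sorted(I - X)
            delta = 0
            # Sum over all J with X subset of J and J a proper subset of I;
            # m < len(rest) guarantees J never equals I.
            for m in range(len(rest)):
                for extra in combinations(rest, m):
                    J = X | frozenset(extra)
                    sign = (-1) ** (len(I) - len(J) + 1)
                    delta += sign * support(J, transactions)
            if len(rest) % 2 == 1:
                upper = min(upper, delta)  # odd |I \ X|: upper bound
            else:
                lower = max(lower, delta)  # even |I \ X|: lower bound
    return lower, upper


# Toy database (hypothetical): item "a" occurs in every transaction.
db = [frozenset("ab"), frozenset("ab"), frozenset("a")]
lo, hi = support_bounds("ab", db)
derivable = lo == hi  # equal bounds mean the support need not be counted
```

In this toy database the two bounds coincide, so the support of {a, b} follows from the supports of its subsets; an itemset of this kind is what the abstract calls derivable and would be left out of the condensed representation.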

Keywords

Data mining, Itemsets, Condensed representation

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Eindhoven Technical University, Eindhoven, The Netherlands
  2. University of Antwerp, Antwerp, Belgium
