Journal of Computer Science and Technology, Volume 23, Issue 1, pp 77–102

Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy

Regular Paper

Abstract

This paper presents new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations of all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items rather than a flat list of items. This produces more natural frequent itemsets and associations, such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can resolve a standard dilemma in mining patterns: with small support and confidence thresholds, the user is overwhelmed by the extraordinary number of identified patterns and associations, but with large thresholds, some interesting patterns and associations fail to be identified.
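To make the (meat, milk) example concrete, the sketch below counts generalized support by extending each transaction with the ancestors of its items, so that buying beef also counts as buying meat. This is a minimal illustration under stated assumptions, not the paper's MFGI_class algorithm: the taxonomy, the transactions, the threshold, and the helper names `ancestors`, `extend` and `g_support` are all hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy taxonomy (child -> parent); not taken from the paper's datasets.
PARENT: Dict[str, str] = {"beef": "meat", "chicken": "meat"}


def ancestors(item: str) -> Set[str]:
    """Return every proper ancestor of `item` by walking up the taxonomy."""
    found: Set[str] = set()
    while item in PARENT:
        item = PARENT[item]
        found.add(item)
    return found


def extend(transaction: Iterable[str]) -> Set[str]:
    """Add all ancestors of each purchased item, so buying beef also counts as buying meat."""
    extended = set(transaction)
    for item in list(extended):  # iterate over a copy while growing the set
        extended |= ancestors(item)
    return extended


def g_support(itemset: Iterable[str], transactions: List[Set[str]]) -> int:
    """Count transactions whose extended form contains every item of the g-itemset."""
    target = set(itemset)
    return sum(1 for t in transactions if target <= extend(t))


# Hypothetical transactions and absolute support threshold.
transactions = [
    {"beef", "milk"},
    {"chicken", "milk"},
    {"beef", "bread"},
    {"chicken", "milk", "bread"},
]
min_support = 3
for candidate in (["beef", "milk"], ["chicken", "milk"], ["meat", "milk"]):
    s = g_support(candidate, transactions)
    print(candidate, s, "frequent" if s >= min_support else "infrequent")
```

With these toy data, (beef, milk) and (chicken, milk) each fall below the threshold, while the generalized pair (meat, milk) is supported by three transactions and is frequent, which is the kind of pattern a flat item list would miss.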

Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we perform it in order to compare against existing algorithms that handle only ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of times faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with depths of seven or more, whereas experimental results for previous algorithms were limited to taxonomies of depth at most three or four.
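The expansion step can be sketched as follows, assuming generalized support is computed on ancestor-extended transactions. Under that assumption, dropping an item or replacing an item by one of its ancestors never lowers support, so every frequent g-itemset can be recovered from some max frequent g-itemset by dropping and/or generalizing items. The taxonomy, the helper names, and the rule that skips g-itemsets containing both an item and one of its ancestors (a convention common in the generalized-rule literature) are hypothetical choices for illustration; the paper's own expansion procedure may differ.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical taxonomy (child -> parent), two levels deep for meat.
PARENT: Dict[str, str] = {
    "beef": "meat", "chicken": "meat", "meat": "food", "milk": "dairy",
}


def ancestors(item: str) -> Set[str]:
    """Return every proper ancestor of `item`."""
    found: Set[str] = set()
    while item in PARENT:
        item = PARENT[item]
        found.add(item)
    return found


def self_and_ancestors(item: str) -> List[str]:
    """The item itself followed by all of its ancestors."""
    return [item] + sorted(ancestors(item))


def is_redundant(itemset: FrozenSet[str]) -> bool:
    """Assumed convention: skip g-itemsets that pair an item with one of its own ancestors."""
    return any(ancestors(item) & itemset for item in itemset)


def expand(max_itemsets: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Enumerate every g-itemset reachable from a max g-itemset by dropping and/or generalizing items."""
    result: Set[FrozenSet[str]] = set()
    for max_itemset in max_itemsets:
        items = sorted(max_itemset)
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                # Replace each chosen item by itself or any of its ancestors.
                for choice in product(*(self_and_ancestors(i) for i in subset)):
                    candidate = frozenset(choice)  # merges items generalized to the same ancestor
                    if not is_redundant(candidate):
                        result.add(candidate)
    return result


expanded = expand([{"beef", "milk"}])
print(len(expanded))
print(sorted(sorted(s) for s in expanded))
```

Even a single two-item max g-itemset expands into 11 g-itemsets here, and the count grows multiplicatively with taxonomy depth, which illustrates why the compact max representation is preferable to the full expansion in practice.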

For each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classification-based algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. Both feature conceptual classification trees together with dynamic generation and pruning algorithms.

Keywords

generalized association rules, frequent generalized itemsets, redundancy avoidance

Copyright information

© Science Press, Beijing, China and Springer Science + Business Media, LLC, USA 2008

Authors and Affiliations

  1. College of Computer and Information Science, Northeastern University, Boston, U.S.A.
