Data Mining and Knowledge Discovery, Volume 16, Issue 2, pp 221–249

Effective elimination of redundant association rules

  • James Cheng
  • Yiping Ke
  • Wilfred Ng

Abstract

It is well recognized that the main factor hindering the application of Association Rules (ARs) is the huge number of ARs returned by the mining process. In this paper, we propose an effective solution that presents concise mining results by eliminating the redundancy in the set of ARs. We adopt the concept of δ-tolerance to define the set of δ-Tolerance ARs (δ-TARs), which is a concise representation of the set of ARs. The notion of δ-tolerance is a relaxation of the closure defined on the support of frequent itemsets, thus allowing us to effectively prune the redundant ARs. We devise a set of inference rules, with which we prove that the set of δ-TARs is a non-redundant representation of ARs. In addition, we prove that the set of ARs derived from the δ-TARs by the inference rules is sound and complete. We also develop a compact tree structure called the δ-TAR tree, which facilitates the efficient generation of the δ-TARs and the derivation of other ARs. Experimental results verify the efficiency of using the δ-TAR tree to generate the δ-TARs and to query the ARs. The set of δ-TARs is shown to be significantly smaller than the state-of-the-art concise representations of ARs. In addition, the approximations of the support and confidence of the ARs derived from the δ-TARs are highly accurate.
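The quantities the abstract refers to can be illustrated with a minimal sketch. The support and confidence functions follow the standard definitions for association rules. The function `is_relaxed_closed` is an assumption made here for illustration only: it relaxes the usual closure test by a factor δ on immediate supersets and is not claimed to be the paper's exact definition of δ-tolerance. The transaction data and all names are hypothetical.

```python
# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b"},
]


def support(itemset, transactions=TRANSACTIONS):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """Confidence of the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base


def is_relaxed_closed(itemset, delta, transactions=TRANSACTIONS):
    """Assumed, simplified relaxation of closure (not the paper's definition).

    The itemset counts as closed when every one-item extension has support
    strictly below (1 - delta) * support(itemset). With delta = 0 this is
    the ordinary closure test: no superset has the same support.
    """
    items = set(itemset)
    universe = set().union(*transactions)
    threshold = (1 - delta) * support(items, transactions)
    return all(
        support(items | {extra}, transactions) < threshold
        for extra in universe - items
    )
```

In this sketch, a larger δ lets more itemsets be absorbed by their supersets, so fewer itemsets (and hence fewer rules) need to be kept, at the cost of only approximate support and confidence for the rules that are derived rather than stored.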

Keywords

Association rules, Redundancy elimination, Concise representation

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong, China