Applied Intelligence, Volume 34, Issue 1, pp 74–86

Building a highly compact and accurate associative classifier


Abstract

Associative classification has attracted significant research attention in recent years because its rule-based form delivers satisfactory accuracy. However, the rules in associative classifiers derived from typical association rule mining (e.g., Apriori-type algorithms) can easily become too numerous to understand, and are sometimes redundant or even conflicting. To address these concerns, a recently proposed approach, GARC, appears superior to other existing approaches (e.g., C4.5-type, NN, SVM, CBA) in two respects: its classification accuracy is equally satisfactory, and the generated classifier is more compact, consisting of far fewer rules. Along this line of methodological thinking, this paper presents a novel GARC-type approach, namely GEAR, that builds an associative classifier with three distinctive and desirable features. First, the rules in the GEAR classifier are more intuitively appealing; second, GEAR's classification accuracy is improved or at least as good as that of other approaches; and third, the GEAR classifier is significantly more compact in size. To this end, a number of notions, including rule redundancy and the compact set, are introduced, together with related properties that can be incorporated into the rule mining process as algorithmic pruning strategies. Experimental results on benchmark datasets also reveal that GEAR outperforms GARC and the other approaches.
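The abstract's notions of rule redundancy and a compact set can be illustrated with a minimal sketch. The exact GEAR definitions are not given in this excerpt, so the following assumes one common formulation: a rule is redundant if a more general rule (with a subset antecedent and the same class) already holds with at least the same confidence. The rule representation and the `prune_redundant` helper are hypothetical, for illustration only.

```python
def prune_redundant(rules):
    """Return a compact rule set under an assumed redundancy criterion:
    a rule (antecedent, class, confidence) is dropped if a more general
    rule -- subset antecedent, same class, confidence at least as high --
    is already kept."""
    compact = []
    # Process shorter (more general) antecedents first, so generalizing
    # rules are in `compact` before the rules they may make redundant.
    for ant, cls, conf in sorted(rules, key=lambda r: len(r[0])):
        covered = any(
            g_ant <= ant and g_cls == cls and g_conf >= conf
            for g_ant, g_cls, g_conf in compact
        )
        if not covered:
            compact.append((ant, cls, conf))
    return compact

# Hypothetical toy rules: {a} -> yes subsumes {a, b} -> yes here,
# since it is more general and has higher confidence.
rules = [
    (frozenset({"a"}), "yes", 0.90),
    (frozenset({"a", "b"}), "yes", 0.85),  # redundant w.r.t. the rule above
    (frozenset({"c"}), "no", 0.80),
]
compact = prune_redundant(rules)
```

In a GARC/GEAR-style miner such a check would be applied during rule generation as a pruning strategy, rather than as a post-processing pass over the full Apriori output.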

Keywords

Associative classification · Compact set · Information gain · GARC · Association rule



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Management Science and Engineering, School of Economics and Management, Tsinghua University, Beijing, China
