Classification Using Association Rules: Weaknesses and Enhancements

Chapter in: Data Mining for Scientific and Engineering Applications

Part of the book series: Massive Computing (MACO, volume 2)

Abstract

Existing classification and rule-learning algorithms in machine learning mainly use heuristic/greedy search to find a subset of the regularities in data (e.g., a decision tree or a set of rules) for classification. In the past few years, extensive research has been done in the database community on learning rules by exhaustive search, under the name of association rule mining. The objective there is to find all rules in the data that satisfy user-specified minimum support and minimum confidence constraints. Although the whole set of rules may not be usable directly for accurate classification, effective and efficient classifiers have been built from such rules. This paper aims to improve one such exhaustive-search-based classification system, CBA (Classification Based on Associations). The main strength of CBA is that it is able to use the most accurate rules for classification; however, it also has weaknesses. This paper proposes two new techniques to deal with these weaknesses, resulting in remarkably accurate classifiers. Experiments on 34 benchmark datasets show that on average the new techniques reduce the error of CBA by 17% and are superior to CBA on 26 of the 34 datasets. They reduce the error of the decision tree classifier C4.5 by 19% and outperform it on 29 datasets. Similarly good results are achieved against the existing classification systems RIPPER, LB, and a Naïve-Bayes classifier.
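
To make the scheme concrete, the Python sketch below illustrates the general idea of classifying with class association rules: candidate rules are scored by support and confidence, thresholded by the user-specified minimums, ranked CBA-style (higher confidence first, ties broken by support), and an example is labeled by the first matching rule, with a fallback default class. This is a minimal illustration, not the authors' implementation; candidate rule generation (e.g., an Apriori-style search [1]) is assumed to have run already, and the names Rule, rank_rules, and classify are illustrative only.

from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: frozenset   # items as (attribute, value) pairs
    label: str              # the class the rule predicts
    support: float = 0.0
    confidence: float = 0.0

def rank_rules(rules, data, min_support=0.01, min_confidence=0.5):
    """Score candidate rules over (items, label) records, keep those
    meeting the user-specified thresholds, and sort by CBA-style
    precedence: confidence first, ties broken by support."""
    n = len(data)
    kept = []
    for r in rules:
        matched = [label for items, label in data if r.antecedent <= items]
        if not matched:
            continue
        correct = sum(1 for label in matched if label == r.label)
        r.support = correct / n               # antecedent and class together
        r.confidence = correct / len(matched)
        if r.support >= min_support and r.confidence >= min_confidence:
            kept.append(r)
    kept.sort(key=lambda r: (-r.confidence, -r.support))
    return kept

def classify(ranked_rules, items, default_label):
    """Label an example with the highest-precedence matching rule,
    falling back to a default class when no rule covers it."""
    for r in ranked_rules:
        if r.antecedent <= items:
            return r.label
    return default_label

For example, with data = [({("outlook", "sunny")}, "no"), ({("outlook", "rain")}, "yes")] and the candidate Rule(frozenset({("outlook", "sunny")}), "no"), rank_rules assigns support 0.5 and confidence 1.0, so the rule is kept and classify applies it to any example containing ("outlook", "sunny").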


References

  1. R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. In Proceedings of VLDB-94, 1994.

  2. K. Ali, S. Manganaris, and R. Srikant. Partial Classification Using Association Rules. In Proceedings of KDD-97, 115–118, 1997.

  3. K. Ali and M. Pazzani. Error Reduction through Learning Multiple Descriptions. Machine Learning, 24:3, 1996.

  4. R. J. Bayardo. Brute-Force Mining of High-Confidence Classification Rules. In Proceedings of KDD-97, 1997.

  5. P. Chan and S. Stolfo. Experiments on Multistrategy Learning by Meta-Learning. In Proceedings of the Second International Conference on Information and Knowledge Management (CIKM-93), 314–323, 1993.

  6. P. Clark and T. Niblett. The CN2 Induction Algorithm. Machine Learning, 3(1), 1989.

  7. W. Cohen. Fast Effective Rule Induction. In Proceedings of ICML-95, 1995.

  8. W. Cohen and Y. Singer. A Simple, Fast, and Effective Rule Learner. In Proceedings of AAAI-99, 1999.

  9. P. Domingos and M. Pazzani. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning, 29, 1997.

  10. G. Dong, X. Zhang, L. Wong, and J. Li. CAEP: Classification by Aggregating Emerging Patterns. In Proceedings of Discovery Science-99, 1999.

  11. J. Dougherty, R. Kohavi, and M. Sahami. Supervised and Unsupervised Discretization of Continuous Features. In Proceedings of ICML-95, 1995.

  12. R. Duda and P. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.

  13. U. Fayyad and K. Irani. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In Proceedings of IJCAI-93, 1022–1027, 1993.

  14. Y. Freund and R. Schapire. Experiments with a New Boosting Algorithm. In Proceedings of ICML-96, 1996.

  15. J. Fürnkranz and G. Widmer. Incremental Reduced Error Pruning. In Proceedings of ICML-94, 1994.

  16. R. Kohavi. Scaling Up the Accuracy of Naïve-Bayes Classifiers: A Decision-Tree Hybrid. In Proceedings of KDD-96, 1996.

  17. R. Kohavi, G. John, R. Long, D. Manley, and K. Pfleger. MLC++: A Machine-Learning Library in C++. In Proceedings of the Conference on Tools with Artificial Intelligence, 740–743, 1994.

  18. N. Littlestone and M. Warmuth. The Weighted Majority Algorithm. Technical Report UCSC-CRL-89-16, UC Santa Cruz, 1989.

  19. B. Liu, W. Hsu, and Y. Ma. Integrating Classification and Association Rule Mining. In Proceedings of KDD-98, 1998.

  20. B. Liu, W. Hsu, and Y. Ma. Mining Association Rules with Multiple Minimum Supports. In Proceedings of KDD-99, 1999.

  21. B. Liu, Y. Ma, and C.-K. Wong. Improving an Exhaustive Search Based Rule Learner. In Proceedings of the Fourth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-2000), 2000.

  22. H. Lu and H.-Y. Liu. Decision Tables: Scalable Classification Exploring RDBMS Capabilities. In Proceedings of VLDB-2000, 2000.

  23. D. Meretakis and B. Wüthrich. Extending Naïve Bayes Classifiers Using Long Itemsets. In Proceedings of KDD-99, 1999.

  24. C. J. Merz and P. Murphy. UCI Repository of Machine Learning Databases. [http://www.cs.uci.edu/~mlearn], 1996.

  25. R. Michalski. Pattern Recognition as Rule-Guided Inductive Inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 349–361, 1980.

  26. P. Murphy and M. Pazzani. Exploring the Decision Forest: An Empirical Investigation of Occam’s Razor in Decision Tree Induction. Journal of Artificial Intelligence Research, 1:257–275, 1994.

  27. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1992.

  28. J. R. Quinlan. Combining Instance-Based and Model-Based Learning. In Proceedings of ICML-94, 1994.

  29. R. Rymon. SE-trees Outperform Decision Trees in Noisy Domains. In Proceedings of KDD-96, 331–336, 1996.

  30. K. Wang, S. Zhou, and Y. He. Growing Decision Trees on Support-less Association Rules. In Proceedings of KDD-2000, 2000.

  31. G. Webb. Systematic Search for Categorical Attribute-Value Data-Driven Machine Learning. In Proceedings of the Australian Conference on Artificial Intelligence, 1993.

  32. D. Wolpert. Stacked Generalization. Neural Networks, 5:241–259, 1992.

  33. Z. Zheng and G. Webb. Stochastic Attribute Selection Committees with Multiple Boosting: Learning More Accurate and More Stable Classifier Committees. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-99), 1999.


Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Liu, B., Ma, Y., Wong, CK. (2001). Classification Using Association Rules: Weaknesses and Enhancements. In: Grossman, R.L., Kamath, C., Kegelmeyer, P., Kumar, V., Namburu, R.R. (eds) Data Mining for Scientific and Engineering Applications. Massive Computing, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1733-7_30


  • DOI: https://doi.org/10.1007/978-1-4615-1733-7_30

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4020-0114-7

  • Online ISBN: 978-1-4615-1733-7

