Abstract
Existing classification and rule learning algorithms in machine learning mainly use heuristic/greedy search to find a subset of regularities (e.g., a decision tree or a set of rules) in data for classification. In the past few years, extensive research was done in the database community on learning rules using exhaustive search under the name of association rule mining. The objective there is to find all rules in data that satisfy the user-specified minimum support and minimum confidence. Although the whole set of rules may not be used directly for accurate classification, effective and efficient classifiers have been built using the rules. This paper aims to improve such an exhaustive search based classification system, CBA (Classification Based on Associations). The main strength of this system is that it is able to use the most accurate rules for classification. However, it also has weaknesses. This paper proposes two new techniques to deal with these weaknesses, resulting in remarkably accurate classifiers. Experiments on a set of 34 benchmark datasets show that on average the new techniques reduce the error of CBA by 17% and are superior to CBA on 26 of the 34 datasets. They reduce the error of the decision tree classifier C4.5 by 19%, and improve performance on 29 datasets. Similar good results are also achieved against the existing classification systems RIPPER, LB and a Naïve-Bayes classifier.
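To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch (not the authors' CBA implementation) of mining class association rules by exhaustive search: every small itemset of attribute values is paired with a class label, and a rule is kept only if it meets user-specified minimum support and minimum confidence thresholds. The dataset, thresholds, and function names are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical thresholds, as described in the abstract.
MIN_SUP = 0.4
MIN_CONF = 0.7

# Toy dataset: each row is (set of attribute-value items, class label).
data = [
    ({"outlook=sunny", "windy=no"}, "play"),
    ({"outlook=sunny", "windy=yes"}, "dont"),
    ({"outlook=rain", "windy=no"}, "play"),
    ({"outlook=rain", "windy=no"}, "play"),
    ({"outlook=sunny", "windy=no"}, "play"),
]

def mine_rules(data, min_sup, min_conf):
    """Exhaustively enumerate antecedents of size 1 and 2 and keep
    rules 'antecedent -> class' meeting support and confidence."""
    n = len(data)
    items = sorted({i for t, _ in data for i in t})
    labels = {c for _, c in data}
    rules = []
    for k in (1, 2):                      # antecedent sizes considered
        for ante in combinations(items, k):
            ante = set(ante)
            # Class labels of all rows covered by the antecedent.
            covered = [c for t, c in data if ante <= t]
            if not covered:
                continue
            sup_ante = len(covered) / n
            for label in labels:
                sup_rule = covered.count(label) / n
                conf = sup_rule / sup_ante
                if sup_rule >= min_sup and conf >= min_conf:
                    rules.append((frozenset(ante), label, sup_rule, conf))
    return rules
```

A real system such as CBA would additionally prune the search with the Apriori property, rank the rules, and select a subset to build the final classifier; the sketch only illustrates the support/confidence filter the abstract refers to.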
References
R. Agrawal, and R. Srikant. Fast Algorithms for Mining Association Rules. In Proceedings of VLDB-94, 1994.
K. Ali, S. Manganaris and R. Srikant. Partial Classification Using Association Rules. In Proceedings of KDD-97, 115–118, 1997.
K. Ali and M. Pazzani. Error Reduction through Learning Multiple Descriptions. Machine Learning, 24:3, 1996.
R. J. Bayardo. Brute-force Mining of High-confidence Classification Rules. In Proceedings of KDD-97, 1997.
P. Chan and S. J. Stolfo. Experiments on Multistrategy Learning by Meta-learning. In Proceedings of the Second International Conference on Information and Knowledge Management (CIKM-93), 314–323, 1993.
P. Clark and T. Niblett. The CN2 Induction Algorithm. Machine Learning 3(1), 1989.
W. Cohen. Fast Effective Rule Induction. In Proceedings of ICML-95, 1995.
W. Cohen, and Y. Singer. A Simple, Fast, and Effective Rule Learner. In Proceedings of AAAI-99, 1999.
P. Domingos, and M. Pazzani. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning, 29, 1997.
G. Dong, X. Zhang, L. Wong, and J. Li. CAEP: Classification by Aggregating Emerging Patterns. In Proceedings of Discovery-Science-99, 1999.
J. Dougherty, R. Kohavi, and M. Sahami. Supervised and Unsupervised Discretization of Continuous Features. In Proceedings of ICML-95, 1995.
R. Duda, and P. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
U. Fayyad, and K. Irani. Multi-interval Discretization of Continuous-valued Attributes for Classification Learning. In Proceedings of IJCAI-93, 1022–1027, 1993.
Y. Freund, and R. Schapire. Experiments with a New Boosting Algorithm. In Proceedings of ICML-96, 1996.
J. Fürnkranz and G. Widmer. Incremental Reduced Error Pruning. In Proceedings of ICML-94, 1994.
R. Kohavi. Scaling up the Accuracy of Naïve-Bayes Classifiers: A Decision-tree Hybrid. In Proceedings of KDD-96, 1996.
R. Kohavi, G. John, R. Long, D. Manley, and K. Pfleger. MLC++: A Machine-learning Library in C++. Tools with artificial intelligence, 740–743, 1994.
N. Littlestone and M. Warmuth. The Weighted Majority Algorithm. Technical Report UCSC-CRL-89-16, UC Santa Cruz, 1989.
B. Liu, W. Hsu, and Y. Ma. Integrating Classification and Association Rule Mining. In Proceedings of KDD-98, 1998.
B. Liu, W. Hsu, and Y. Ma. Mining Association Rules with Multiple Minimum Supports. In Proceedings of KDD-99, 1999.
B. Liu, Y. Ma and C.-K. Wong. Improving an Exhaustive Search Based Rule Learner. In Proceedings of the Fourth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-2000), 2000.
H. Lu, and H-Y. Liu. Decision Tables: Scalable Classification Exploring RDBMS Capabilities. VLDB-2000, 2000.
D. Meretakis and B. Wüthrich. Extending Naïve Bayes Classifiers Using Long Itemsets. In Proceedings of KDD-99, 1999.
C. J. Merz and P. Murphy. UCI Repository of Machine Learning Databases. [http://www.cs.uci.edu/~mlearn], 1996.
R. Michalski. Pattern Recognition as Rule-Guided Inductive Inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 349–361, 1980.
P. Murphy and M. Pazzani. Exploring the Decision Forest: An Empirical Investigation of Occam's Razor in Decision Tree Induction. Journal of AI Research, 1:257–275, 1994.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1992.
J. R. Quinlan. Combining Instance-based and Model-Based Learning. In Proceedings of ICML-94, 1994.
R. Rymon. SE-tree Outperforms Decision Trees in Noisy Domains. In Proceedings of KDD-96, 331–336, 1996.
K. Wang, S. Zhou, and Y. He. Growing Decision Trees on Support-less Association Rules. In Proceedings of KDD-2000, 2000.
G. Webb. Systematic Search for Categorical Attribute-value Data-driven Machine Learning. In Proceedings of Australian conference on Artificial Intelligence, 1993.
D. Wolpert. Stacked Generalization. Neural networks, 5:241–259, 1992.
Z. Zheng and G. Webb. Stochastic Attribute Selection Committees with Multiple Boosting: Learning More Accurate and More Stable Classifier Committees. In Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-99), 1999.
© 2001 Springer Science+Business Media Dordrecht
Liu, B., Ma, Y., Wong, CK. (2001). Classification Using Association Rules: Weaknesses and Enhancements. In: Grossman, R.L., Kamath, C., Kegelmeyer, P., Kumar, V., Namburu, R.R. (eds) Data Mining for Scientific and Engineering Applications. Massive Computing, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1733-7_30
Print ISBN: 978-1-4020-0114-7
Online ISBN: 978-1-4615-1733-7