Abstract
A hierarchical classification framework is proposed for discriminating rare classes in imprecise domains, which are characterized by rarity (of both classes and cases), noise and low class separability. The framework couples each rule of a rule-based classifier with a local probabilistic generative model. Each local model is trained over the coverage of its rule, so as to better capture cases and classes that are rare globally but become less rare within that coverage. Two novel schemes for tightly integrating rule-based and probabilistic classification are introduced; both classify unlabeled cases by considering multiple classifier rules together with their local probabilistic counterparts. An extensive evaluation shows that the proposed framework is competitive with, and often more accurate than, established competitors, while outperforming them on rare classes.
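The abstract does not spell out the two integration schemes, so the following Python sketch only illustrates the general idea under stated assumptions. Rules are assumed to be conjunctions of attribute = value tests. Each rule's local generative model is assumed to be a categorical naive Bayes with Laplace smoothing, fitted only on the cases that rule covers. An unlabeled case is classified by averaging the posteriors of all rules that fire, with a fallback to a global model when none does. The names `LocalNB`, `Rule`, `fit_local_models` and `classify`, the averaging rule, and the toy data are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter, defaultdict


class LocalNB:
    """Categorical naive Bayes with Laplace smoothing (an assumed stand-in
    for the paper's local probabilistic generative model)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y, classes):
        self.classes = list(classes)
        self.n = len(y)
        self.class_counts = Counter(y)
        self.feat_counts = defaultdict(Counter)  # (class, attribute) -> value counts
        self.values = defaultdict(set)           # attribute -> values seen in training
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.feat_counts[(c, j)][v] += 1
                self.values[j].add(v)
        return self

    def predict_proba(self, row):
        k = len(self.classes)
        log_joint = {}
        for c in self.classes:
            lp = math.log((self.class_counts[c] + self.alpha) / (self.n + self.alpha * k))
            for j, v in enumerate(row):
                n_vals = len(self.values[j]) + 1  # +1 leaves mass for unseen values
                num = self.feat_counts[(c, j)][v] + self.alpha
                lp += math.log(num / (self.class_counts[c] + self.alpha * n_vals))
            log_joint[c] = lp
        m = max(log_joint.values())  # log-sum-exp normalisation
        z = sum(math.exp(v - m) for v in log_joint.values())
        return {c: math.exp(v - m) / z for c, v in log_joint.items()}


class Rule:
    """Hypothetical rule: antecedent maps attribute index -> required value.
    The consequent is kept for reference; this sketch lets the local model decide."""

    def __init__(self, antecedent, consequent):
        self.antecedent = antecedent
        self.consequent = consequent
        self.model = None

    def covers(self, row):
        return all(row[j] == v for j, v in self.antecedent.items())


def fit_local_models(rules, X, y, classes, alpha=1.0):
    """Train one local model per rule, using only the cases the rule covers."""
    for r in rules:
        idx = [i for i, row in enumerate(X) if r.covers(row)]
        r.model = LocalNB(alpha).fit([X[i] for i in idx], [y[i] for i in idx], classes)
    return rules


def classify(rules, row, default_model):
    """Average the local posteriors of all firing rules (an assumed combination
    scheme); fall back to a global model when no rule fires."""
    fired = [r for r in rules if r.covers(row)]
    if not fired:
        probs = default_model.predict_proba(row)
    else:
        acc = Counter()
        for r in fired:
            for c, p in r.model.predict_proba(row).items():
                acc[c] += p / len(fired)
        probs = dict(acc)
    return max(probs, key=probs.get), probs


# Toy data: class "R" is rare globally (2 of 8) but makes up half of each rule's coverage.
X = [("a", "y"), ("a", "y"), ("a", "x"), ("a", "x"),
     ("b", "x"), ("b", "y"), ("b", "x"), ("b", "y")]
y = ["R", "R", "C", "C", "C", "C", "C", "C"]
classes = sorted(set(y))
rules = fit_local_models([Rule({0: "a"}, "C"), Rule({1: "y"}, "C")], X, y, classes)
global_model = LocalNB().fit(X, y, classes)

label, probs = classify(rules, ("a", "y"), global_model)
print(label, round(probs["R"], 2))  # R 0.75
```

In the toy data, class R accounts for 2 of the 8 cases overall but for 2 of the 4 cases covered by each rule, so both local models assign it a posterior of 0.75 for the case (a, y), even though both rules predict C. Averaging is only one possible way to combine multiple firing rules; weighting by rule confidence or picking the most confident local model are equally plausible variants.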
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Costa, G., Guarascio, M., Manco, G., Ortale, R., Ritacco, E. (2009). Rule Learning with Probabilistic Smoothing. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2009. Lecture Notes in Computer Science, vol 5691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03730-6_34
DOI: https://doi.org/10.1007/978-3-642-03730-6_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03729-0
Online ISBN: 978-3-642-03730-6
eBook Packages: Computer Science; Computer Science (R0)