A Decomposition of Classes via Clustering to Explain and Improve Naive Bayes

Vilalta, Ricardo; Rish, Irina

doi:10.1007/978-3-540-39857-8_40

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2837)

Abstract

We propose a method to improve the probability estimates made by Naive Bayes. It targets the poor class-conditional probabilities that product distributions produce when a class spreads over multiple regions. Our approach applies a clustering algorithm to each subset of examples that belong to the same class and then treats each resulting cluster as a class of its own. Experiments on 26 real-world datasets show a significant improvement in performance when this class decomposition is applied, particularly when the mean number of clusters per class is large.
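The abstract describes the decomposition only at a high level, so the following is a minimal sketch of the idea under assumptions that are ours, not the paper's. Continuous features are modelled by a Gaussian Naive Bayes. K-means with deterministic farthest-first seeding stands in for the per-class clustering algorithm. Every class gets the same hypothetical number of clusters (`clusters_per_class`). A prediction takes the most probable (class, cluster) subclass and returns its parent class. The toy data is a hypothetical XOR-style layout in which each class occupies two separated regions.

```python
import math
from collections import defaultdict

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-first seeding; one label in range(k) per point."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Seed the next center at the point farthest from all existing centers.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each non-empty cluster's center to its members' mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per (class, feature) pair."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, c in zip(X, y):
            groups[c].append(x)
        self.params = {}
        for c, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            # A small variance floor guards against zero-variance features.
            variances = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-6)
                         for col, m in zip(cols, means)]
            self.params[c] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def log_joint(self, x, c):
        """log P(c) + sum_i log N(x_i; mean_ci, var_ci)."""
        log_prior, means, variances = self.params[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))

    def predict(self, x):
        return max(self.params, key=lambda c: self.log_joint(x, c))

class ClassDecompositionNB:
    """Cluster each class, then train Naive Bayes on (class, cluster) labels."""

    def __init__(self, clusters_per_class=2):
        self.k = clusters_per_class  # hypothetical: same k for every class

    def fit(self, X, y):
        by_class = defaultdict(list)
        for i, c in enumerate(y):
            by_class[c].append(i)
        sub_y = [None] * len(X)
        for c, idx in by_class.items():
            labels = kmeans([X[i] for i in idx], min(self.k, len(idx)))
            for i, j in zip(idx, labels):
                sub_y[i] = (c, j)  # each cluster becomes a class of its own
        self.nb = GaussianNaiveBayes().fit(X, sub_y)
        return self

    def predict(self, x):
        parent_class, _cluster = self.nb.predict(x)
        return parent_class  # map the winning subclass back to its class

def accuracy(model, X, y):
    return sum(model.predict(x) == c for x, c in zip(X, y)) / len(y)

# Hypothetical XOR-style data: each class occupies two separated 3x3 blobs.
def blob(cx, cy):
    return [(cx + dx, cy + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

X = blob(0, 0) + blob(10, 10) + blob(0, 10) + blob(10, 0)
y = ["A"] * 18 + ["B"] * 18

plain_acc = accuracy(GaussianNaiveBayes().fit(X, y), X, y)
decomposed_acc = accuracy(ClassDecompositionNB(2).fit(X, y), X, y)
print(plain_acc, decomposed_acc)  # 0.5 1.0
```

On this toy layout both classes have identical per-feature means and variances, so a single product distribution per class cannot tell them apart and the plain model predicts one class everywhere. After decomposition each region gets its own product distribution, and the training points are classified correctly. How many clusters to use per class is left open in this sketch.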

Author information

Authors and Affiliations

Department of Computer Science, University of Houston, 4800 Calhoun Rd., Houston, TX, 77204-3010, USA
Ricardo Vilalta
IBM T.J. Watson Research Center, 19 Skyline Dr., Hawthorne, N.Y., 10532, USA
Irina Rish

Editor information

Editors and Affiliations

University of Nova Gorica, Nova Gorica, Slovenia
Nada Lavrač
Rudjer Bošković Institute, Bijenička 54, 10000, Zagreb, Croatia
Dragan Gamberger
Leiden Institute of Advanced Computer Science, Leiden University
Hendrik Blockeel
Jozef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Ljupčo Todorovski

About this paper

Cite this paper

Vilalta, R., Rish, I. (2003). A Decomposition of Classes via Clustering to Explain and Improve Naive Bayes. In: Lavrač, N., Gamberger, D., Blockeel, H., Todorovski, L. (eds) Machine Learning: ECML 2003. Lecture Notes in Computer Science, vol 2837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39857-8_40

DOI: https://doi.org/10.1007/978-3-540-39857-8_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20121-2
Online ISBN: 978-3-540-39857-8
eBook Packages: Springer Book Archive
