On Decision Boundaries of Naïve Bayes in Continuous Domains

Elomaa, Tapio; Rousu, Juho

doi:10.1007/978-3-540-39804-2_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2838)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Naïve Bayesian classifiers assume that attribute values are conditionally independent given the class. Although this assumption is often violated in practice, these simple classifiers have been found to be efficient, effective, and robust to noise.
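
For reference, the independence assumption leads to the standard form of the naïve Bayes decision rule, sketched below. The symbols are assumptions introduced here for illustration, not notation taken from the chapter: $x = (x_1, \dots, x_n)$ is an example described by $n$ attribute values, $c$ ranges over the class labels, and $\hat{c}(x)$ is the predicted class.

```latex
\hat{c}(x) \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The product over attributes is where conditional independence is used: the joint likelihood $P(x_1, \dots, x_n \mid c)$ is replaced by the product of the per-attribute likelihoods.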

Discretization of continuous attributes in naïve Bayesian classifiers has received considerable attention recently. Continuous attributes need not be discretized, but discretization unifies their handling with that of nominal attributes and can improve classifier performance.

We show that optimal partitioning results from decision tree learning carry over to Naïve Bayes as well. In particular, Naïve Bayes sets decision boundaries on the borders of segments with equal class frequency distribution. An optimal univariate discretization with respect to the Naïve Bayes rule can be found in linear time but, unfortunately, optimal multivariate discretization is intractable.
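
To make the notion of segment borders concrete, the following is a minimal sketch, not the authors' algorithm. It assumes one bin per distinct attribute value and treats adjacent bins with identical relative class frequency distributions as part of the same segment. The function name `segment_borders` and the choice of midpoints as cut points are illustrative assumptions. The sketch only lists candidate borders; it does not search for an optimal discretization.

```python
from collections import Counter
from fractions import Fraction
from itertools import groupby


def segment_borders(values, labels):
    """Return candidate cut points between adjacent bins whose relative
    class frequency distributions differ (hypothetical helper)."""
    classes = sorted(set(labels))
    pairs = sorted(zip(values, labels), key=lambda p: p[0])

    # One bin per distinct attribute value, with its class distribution.
    bins = []
    for value, group in groupby(pairs, key=lambda p: p[0]):
        counts = Counter(label for _, label in group)
        total = sum(counts.values())
        # Exact fractions avoid floating-point issues when comparing.
        dist = tuple(Fraction(counts[c], total) for c in classes)
        bins.append((value, dist))

    # A border lies between two adjacent bins with different distributions;
    # adjacent bins with equal distributions belong to the same segment.
    borders = []
    for (v_left, d_left), (v_right, d_right) in zip(bins, bins[1:]):
        if d_left != d_right:
            borders.append((v_left + v_right) / 2)
    return borders
```

After the initial sort, the scan over bins is a single pass. In this sketch, values 1 and 2 with the same half-and-half class mix would fall into one segment, so no cut point is proposed between them.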

Author information

Authors and Affiliations

Department of Computer Science, University of Helsinki, Finland
Tapio Elomaa & Juho Rousu

Editor information

Editors and Affiliations

University of Nova Gorica, Nova Gorica, Slovenia
Nada Lavrač
Rudjer Bošković Institute, Bijenička 54, 10000, Zagreb, Croatia
Dragan Gamberger
Jozef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Ljupčo Todorovski
Leiden Institute of Advanced Computer Science, Leiden University,
Hendrik Blockeel

About this paper

Cite this paper

Elomaa, T., Rousu, J. (2003). On Decision Boundaries of Naïve Bayes in Continuous Domains. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds) Knowledge Discovery in Databases: PKDD 2003. Lecture Notes in Computer Science, vol 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_15

DOI: https://doi.org/10.1007/978-3-540-39804-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2
eBook Packages: Springer Book Archive
