Abstract
This paper argues that two commonly used discretization approaches, fixed k-interval discretization and entropy-based discretization, have suboptimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts both the number and the size of the discretized intervals to the number of training instances, thus seeking an appropriate trade-off between the bias and variance of the probability estimation for naive-Bayes classifiers. We justify PKID in theory and test it on a wide cross-section of datasets. Our experimental results suggest that, in comparison to its alternatives, PKID provides naive-Bayes classifiers with competitive classification performance on smaller datasets and better classification performance on larger datasets.
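The core idea in the abstract — coupling both the number of intervals and the instances per interval to the training-set size, so that each is roughly the square root of n — can be sketched as a simple equal-frequency procedure. This is an illustrative sketch only, not the authors' implementation: it assumes k = floor(sqrt(n)) intervals and ignores the paper's finer boundary rules (e.g. handling of ties and leftover instances).

```python
import math

def pkid_discretize(values):
    """Sketch of Proportional k-Interval Discretization (PKID).

    Both the number of intervals and the instances per interval
    are set to roughly sqrt(n), so both grow with training size:
    more data buys lower-variance estimates AND finer intervals
    (lower bias), rather than fixing one of the two in advance.
    """
    n = len(values)
    k = max(1, int(math.sqrt(n)))   # ~sqrt(n) intervals
    size = n // k                   # ~sqrt(n) instances per interval
    ordered = sorted(values)
    # Cut points fall between consecutive equal-frequency blocks.
    cuts = [ordered[i * size] for i in range(1, k)]

    def interval_index(x):
        # Interval id = number of cut points strictly below x.
        return sum(1 for c in cuts if x > c)

    return cuts, interval_index
```

A naive-Bayes learner would then estimate, for each class, the frequency of each interval index in place of a density over the raw continuous attribute; with a fixed small k those frequency estimates have low variance but high bias, while PKID lets both shrink as n grows.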
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Yang, Y., Webb, G.I. (2001). Proportional k-Interval Discretization for Naive-Bayes Classifiers. In: De Raedt, L., Flach, P. (eds) Machine Learning: ECML 2001. ECML 2001. Lecture Notes in Computer Science(), vol 2167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44795-4_48
DOI: https://doi.org/10.1007/3-540-44795-4_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42536-6
Online ISBN: 978-3-540-44795-5