European Conference on Machine Learning

Machine Learning: ECML 2001, pp 564-575

Proportional k-Interval Discretization for Naive-Bayes Classifiers

  • Ying Yang
  • Geoffrey I. Webb
Conference paper

DOI: 10.1007/3-540-44795-4_48

Volume 2167 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Yang Y., Webb G.I. (2001) Proportional k-Interval Discretization for Naive-Bayes Classifiers. In: De Raedt L., Flach P. (eds) Machine Learning: ECML 2001. ECML 2001. Lecture Notes in Computer Science, vol 2167. Springer, Berlin, Heidelberg

Abstract

This paper argues that two commonly-used discretization approaches, fixed k-interval discretization and entropy-based discretization, have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances, thus seeking an appropriate trade-off between the bias and variance of the probability estimation for naive-Bayes classifiers. We justify PKID in theory, as well as test it on a wide cross-section of datasets. Our experimental results suggest that, in comparison to its alternatives, PKID provides naive-Bayes classifiers with competitive classification performance for smaller datasets and better classification performance for larger datasets.
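The abstract's core idea, tying both the number of intervals and the instances per interval to the training-set size n, can be sketched as equal-frequency discretization with roughly sqrt(n) intervals of roughly sqrt(n) instances each. The sketch below is a minimal illustration under that assumption, not the authors' reference implementation; the function names are invented here, and ties among duplicate values are handled less carefully than a full implementation would.

```python
import math

def pkid_cut_points(values):
    """Sketch of Proportional k-Interval Discretization (PKID).

    Both the number of intervals and the approximate number of
    training instances per interval are set to floor(sqrt(n)), so
    each grows with the training data -- trading discretization
    bias against variance as the paper describes.
    Returns the cut points separating adjacent intervals.
    """
    n = len(values)
    k = int(math.sqrt(n))        # interval count == interval size, roughly sqrt(n)
    if k < 2:
        return []                # too few instances to split
    ordered = sorted(values)
    size = n // k                # instances per interval
    cuts = []
    for i in range(1, k):
        lo = ordered[i * size - 1]
        hi = ordered[i * size]
        cuts.append((lo + hi) / 2)   # midpoint between adjacent instances
    return cuts

def discretize(value, cuts):
    """Map a numeric value to its interval index given the cut points."""
    for i, c in enumerate(cuts):
        if value <= c:
            return i
    return len(cuts)
```

With 100 training values, this yields 10 intervals of 10 instances each; a naive-Bayes learner would then estimate class-conditional probabilities per interval index rather than per raw numeric value.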


Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Ying Yang (1)
  • Geoffrey I. Webb (1)
  1. School of Computing and Mathematics, Deakin University, Australia