Improved Algorithms for Univariate Discretization of Continuous Features

  • Jussi Kujala
  • Tapio Elomaa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4702)


In discretization of a continuous variable, its numerical value range is divided into a few intervals that are then used in classification. Naïve Bayes, for example, can benefit from this preprocessing. A commonly used supervised discretization method is Fayyad and Irani’s recursive entropy-based splitting of a value range. The technique uses ent-mdl as a model selection criterion to decide whether to accept a proposed split.

We argue that, in theory, the method is not always close to ideal for this application, and empirical experiments support this finding. To increase performance, we give a statistical rule that does not use the ad hoc rule of Fayyad and Irani’s approach. This rule, though, is quite time-consuming to compute. We also demonstrate that a very simple Bayesian method performs better than ent-mdl as a model selection criterion.
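The recursive entropy-based splitting mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`entropy`, `mdl_accepts`, `discretize`) are hypothetical, the acceptance test follows the MDL criterion as it is commonly stated for Fayyad and Irani's method (the formula does not appear in this abstract), and placing each cut at the midpoint between neighbouring values is an assumption.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdl_accepts(labels, left, right):
    """MDL acceptance test as commonly stated for Fayyad and Irani's method.

    Accept the split when the information gain exceeds
    (log2(n - 1) + delta) / n. The formula is an assumption here,
    since the abstract does not spell it out.
    """
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    ent, ent1, ent2 = entropy(labels), entropy(left), entropy(right)
    gain = ent - (len(left) / n) * ent1 - (len(right) / n) * ent2
    delta = math.log2(3 ** k - 2) - (k * ent - k1 * ent1 - k2 * ent2)
    return gain > (math.log2(n - 1) + delta) / n


def _split(xs, ys):
    """Recursively split a sorted value range; return the accepted cut points."""
    n = len(xs)
    best = None  # (weighted entropy, index) of the best candidate so far
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # a cut is only possible between distinct values
        cost = (i / n) * entropy(ys[:i]) + ((n - i) / n) * entropy(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return []
    i = best[1]
    if not mdl_accepts(ys, ys[:i], ys[i:]):
        return []  # the criterion rejects the split, so recursion stops here
    cut = (xs[i - 1] + xs[i]) / 2  # midpoint placement is an assumption
    return _split(xs[:i], ys[:i]) + [cut] + _split(xs[i:], ys[i:])


def discretize(values, labels):
    """Return the sorted cut points for one continuous feature."""
    pairs = sorted(zip(values, labels))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return _split(xs, ys)
```

For instance, calling `discretize` on forty evenly spaced values whose first half belongs to one class and second half to another should yield a single cut between the two halves, while a feature whose labels alternate along the value range should yield no cut at all, because the gain never clears the threshold.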


Keywords: Continuous Feature; Improve Algorithm; Discretization Algorithm; Model Selection Criterion; Bayesian Model Selection



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Jussi Kujala (1)
  • Tapio Elomaa (1)
  1. Institute of Software Systems, Tampere University of Technology, P.O. Box 553, FI-33101 Tampere, Finland
