Ranking the Uniformity of Interval Pairs

  • Jussi Kujala
  • Tapio Elomaa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5211)


We study the problem of finding the most uniform partition of the class label distribution on an interval. This problem occurs, e.g., in supervised discretization of continuous features, where evaluation heuristics must find the best place to split the current feature. The weighted average of the empirical entropies of the interval label distributions is often used for this task. We observe that this rule is suboptimal because it favours short intervals too strongly. We therefore study alternative approaches. A compression-based solution turns out to be the best in our empirical experiments. We also study how these alternative methods affect the performance of classification algorithms.
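As a minimal illustration (a sketch, not the authors' code), the Python below computes the weighted average of empirical entropies for every binary cut of a label sequence, assuming the labels are already sorted by the value of the continuous feature, and returns the cut with the lowest score. The function names `entropy`, `weighted_entropy`, and `best_split` are hypothetical. Because a one-element interval always has zero empirical entropy, the criterion can favour cutting off a single example even when the labels show no class structure, which is one way to see the preference for short intervals noted above.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    if n == 0:
        return 0.0
    # Sum of p * log2(1/p) over the observed label frequencies.
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def weighted_entropy(labels: Sequence[Hashable], k: int) -> float:
    """Weighted average entropy of the binary split labels[:k] | labels[k:]."""
    n = len(labels)
    left, right = labels[:k], labels[k:]
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def best_split(labels: Sequence[Hashable]) -> int:
    """Cut index k (1 <= k < n) that minimizes the weighted average entropy.

    Assumes `labels` is sorted by the continuous feature's value.
    Ties go to the smallest k, because min() keeps the first minimum.
    """
    return min(range(1, len(labels)), key=lambda k: weighted_entropy(labels, k))


clean = list("aaaaabbbbb")  # one clean class boundary
noise = list("ababababab")  # no real class structure

print(best_split(clean))  # 5: both halves are pure
print(best_split(noise))  # 1: a single-example interval has zero entropy
```

On the alternating sequence, cutting off one example at either end scores lower than any balanced cut, even though no cut separates the classes.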

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jussi Kujala (1)
  • Tapio Elomaa (1)
  1. Department of Software Systems, Tampere University of Technology, Tampere, Finland