Abstract
We study the problem of finding the most uniform partition of the class label distribution on an interval. This problem occurs, e.g., in supervised discretization of continuous features, where evaluation heuristics need to locate the best place to split the current feature. The weighted average of the empirical entropies of the interval label distributions is often used for this task. We observe that this rule is suboptimal because it favors short intervals too heavily. We therefore study alternative approaches; a solution based on compression turns out to be the best in our empirical experiments. We also study how these alternative methods affect the performance of classification algorithms.
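As a point of reference, the conventional rule the abstract refers to scores a candidate cut point by the weighted average of the empirical (plug-in) entropies of the two resulting intervals. The following minimal Python sketch (not taken from the paper; the function names and the toy label sequence are illustrative assumptions) shows how that baseline criterion is typically computed and minimized over cut points:

```python
import math
from collections import Counter

def empirical_entropy(labels):
    """Empirical (plug-in) entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def weighted_split_entropy(labels, cut):
    """Weighted average of the empirical entropies of the two intervals
    obtained by cutting the feature-sorted label sequence at index `cut`."""
    left, right = labels[:cut], labels[cut:]
    n = len(labels)
    return (len(left) / n) * empirical_entropy(left) \
         + (len(right) / n) * empirical_entropy(right)

# Illustrative usage: the conventional rule picks the cut point that
# minimizes the weighted split entropy over all candidate positions.
labels = ['a', 'a', 'b', 'a', 'b', 'b', 'b']
best_cut = min(range(1, len(labels)),
               key=lambda i: weighted_split_entropy(labels, i))
```

The paper's observation is that this criterion tends to reward very short, pure intervals, which motivates the alternative, compression-based ranking it proposes.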
Keywords
- Concave Function
- Split Point
- Empirical Frequency
- Label Distribution
- Minimum Description Length Principle
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kujala, J., Elomaa, T. (2008). Ranking the Uniformity of Interval Pairs. In: Daelemans, W., Goethals, B., Morik, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008. Lecture Notes in Computer Science, vol 5211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87479-9_60
DOI: https://doi.org/10.1007/978-3-540-87479-9_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87478-2
Online ISBN: 978-3-540-87479-9
eBook Packages: Computer Science; Computer Science (R0)