Data discretization unification

  • Ruoming Jin
  • Yuri Breitbart
  • Chibuike Muoh
Regular Paper


Data discretization is defined as a process of converting continuous data attribute values into a finite set of intervals with minimal loss of information. In this paper, we prove that discretization methods based on informational theoretical complexity and the methods based on statistical measures of data dependency are asymptotically equivalent. Furthermore, we define a notion of generalized entropy and prove that discretization methods based on Minimal description length principle, Gini index, AIC, BIC, and Pearson’s X 2 and G 2 statistics are all derivable from the generalized entropy function. We design a dynamic programming algorithm that guarantees the best discretization based on the generalized entropy notion. Furthermore, we conducted an extensive performance evaluation of our method for several publicly available data sets. Our results show that our method delivers on the average 31% less classification errors than many previously known discretization methods.


Discretization Entropy Gini index MDLP Chi-square test G2 test 


  1. 1.
    Agresti A (1990) Categorical data analysis. Wiley, New YorkzbMATHGoogle Scholar
  2. 2.
    Auer P, Holte R, Maass W (1995) Theory and applications of agnostic pac-learning with small decision trees. In: Machine learning: proceedings of the twelth international conference. Morgan KaufmannGoogle Scholar
  3. 3.
    Bay SD (2001) Multivariate discretization for set mining. Knowl Inf Syst 3(4): 491–512zbMATHCrossRefGoogle Scholar
  4. 4.
    Breiman L, Friedman J, Olshen R, Stone C (1998) Classification and regression trees. CRC PressGoogle Scholar
  5. 5.
    Boulle M (2004) Khiops: a statistical discretization method of continuous attributes. Mach Learn 55: 53–69zbMATHCrossRefGoogle Scholar
  6. 6.
    Boulle M (2006) MODL: a Bayes optimal discretization method for continuous attributes. Mach Learn 65(1): 131–165CrossRefGoogle Scholar
  7. 7.
    Casella G, Berger RL (2001) Statistical inference, 2nd edn. Duxbury PressGoogle Scholar
  8. 8.
    Catlett J (1991) On changing continuous attributes into ordered discrete attributes. In: Proceedings of European working session on learning, pp 164–178Google Scholar
  9. 9.
    Ching JY, Wong AKC, Chan KCC (1995) Class-dependent discretization for inductive learning from continuous and mixed-mode data. IEEE Trans Pattern Anal Mach Intell 17(7): 641–651CrossRefGoogle Scholar
  10. 10.
    Chmielewski MR, Grzymala-Busse JW (1996) Global discretization of continuous attributes as preprocessing for machine learning. Int J Approx Reason 15Google Scholar
  11. 11.
    Cover TM, Thomas JA (2006) Elements of information thoery, 2nd edn. Wiley, New YorkGoogle Scholar
  12. 12.
    Dougherty J, Kohavi R, Sahavi M (1995) Supervised and unsupervised discretization of continuous attributes. In: Proceedings of the 12th international conference on machine learning, pp 194–202Google Scholar
  13. 13.
    Elomaa T, Rousu J (2003) Necessary and sufficient pre-processing in numerical range discretization. Knowl Inf Syst 5(2): 162–182CrossRefGoogle Scholar
  14. 14.
    Elomaa T, Rousu J (2004) Efficient multisplitting revisited: optima-preserving elimination of partition candidates. Data Mining Knowl Discovery 8: 97–126CrossRefMathSciNetGoogle Scholar
  15. 15.
    Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th joint conference on artificial intelligence, pp 1022–1029Google Scholar
  16. 16.
    Girosi F, Jones M, Poggio T (1995) Regularization theory and neural networks architectures. Neural Comput 7(2): 219–269CrossRefGoogle Scholar
  17. 17.
    Hand D, Mannila H, Smyth P (2001) Principles of data mining. MIT PressGoogle Scholar
  18. 18.
    Hansen MH, Yu B (2001) Model selection and the principle of minimum description length. J Am Statist Assci 96: 454MathSciNetGoogle Scholar
  19. 19.
    Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, HeidelbergzbMATHGoogle Scholar
  20. 20.
    Holte RC (1993) Very simple calssification rules perform well on most commonly used datasets. Mach Learn 11: 63–90zbMATHCrossRefGoogle Scholar
  21. 21.
    Johnson N, Kotz S, Balakrishnan N (1994) Continuous univariate distributions, 2nd edn. Wiley, New YorkzbMATHGoogle Scholar
  22. 22.
    Jin R, Breitbart Y (2007) Data discretization unification. Technical Report, Department of Computer Science, Kent State University.
  23. 23.
    Kerber R (1992) ChiMerge: discretization of numeric attributes. In: National conference on artificial intelligenceGoogle Scholar
  24. 24.
    Kurgan LA, Cios KJ (2004) CAIM discretization algorithm. IEEE Trans Knowl Data Eng 16(2): 145–153CrossRefGoogle Scholar
  25. 25.
    Kohavi R, Sahami M (1996) Error-based and entropy-based discretization of continuous features. In: Proceedings of the second international conference on knowledge discovery and data mining. Menlo Park. AAAI Press, pp 114–119Google Scholar
  26. 26.
    Liu H, Hussain F, Tan CL, Dash M (2002) Discretization: an enabling technique. Data Mining Knowl Discovery 6: 393–423CrossRefMathSciNetGoogle Scholar
  27. 27.
    Liu H, Setiono R (1995) Chi2: feature selection and discretization of numeric attributes. In: Proceedings of 7th IEEE int’l conference on tools with artificial intelligenceGoogle Scholar
  28. 28.
    Liu X, Wang H (2005) A discretization algorithm based on a heterogeneity criterion. IEEE Trans Knowl Data Eng 17(9): 1166–1173CrossRefGoogle Scholar
  29. 29.
    Mussard S, Seyte F, Terraza M (2003) Decomposition of Gini and the generalized entropy inequality measures. Econ Bull 4(7): 1–6Google Scholar
  30. 30.
    Pfahringer B (1995) Supervised and unsupervised discretization of continuous features. In: Proceedings of 12th international conference on machine learning, pp 456–463Google Scholar
  31. 31.
    Rissanen J (1978) Modeling by shortest data description. Automatica 14: 465–471zbMATHCrossRefGoogle Scholar
  32. 32.
    Simovici DA, Jaroszewicz S (2002) An axiomatization of partition entropy. IEEE Trans Inf Theory 48(7): 2138–2142zbMATHCrossRefMathSciNetGoogle Scholar
  33. 33.
    Wallace DL (1959) Bounds on normal approximations to Student’s and the Chi-square distributions. Ann Mathe Stat 30(4): 1121–1130zbMATHCrossRefMathSciNetGoogle Scholar
  34. 34.
    Wallace DL (1960) Correction to “Bounds on Normal Approximations to Student’s and the Chi-Square Distributions”. Ann Math Statist 31(3): 810CrossRefGoogle Scholar
  35. 35.
    Wong AKC, Chiu DKY (1987) Synthesizing statistical knowledge from incomplete mixed-mode data. IEEE Trans Pattern Anal Mach Intell 9(6): 796–805CrossRefGoogle Scholar
  36. 36.
    Yang Y, Webb GI (2003) Weighted proportional k-interval discretization for naive–Bayes classifiers. In: Advances in knowledge discovery and data mining: 7th Pacific-Asia Conference, PAKDD, pp 501–512Google Scholar
  37. 37.
    UCI Machine Learning Repository (2007)
  38. 38.
    Weka 3 (2007) Data mining software in Java.

Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  1. 1.Department of Computer ScienceKent State UniversityKentUSA

Personalised recommendations