Data Mining and Knowledge Discovery, Volume 6, Issue 4, pp. 393–423

Discretization: An Enabling Technique

  • Huan Liu
  • Farhad Hussain
  • Chew Lim Tan
  • Manoranjan Dash


Discrete values play an important role in data mining and knowledge discovery. They represent intervals of numbers, which are more concise to represent and specify, and easier to use and comprehend, because they are closer to a knowledge-level representation than continuous values. Many studies show that induction tasks can benefit from discretization: rules with discrete values are normally shorter and more understandable, and discretization can lead to improved predictive accuracy. Furthermore, many induction algorithms in the literature require discrete features. All of this prompts researchers and practitioners to discretize continuous features before or during a machine learning or data mining task. Numerous discretization methods are available in the literature. It is time to examine these seemingly different methods and find out how different they really are, what the key components of a discretization process are, and how we can improve the current level of research both for new development and for the use of existing methods. This paper aims at a systematic study of discretization methods, covering their history of development, their effect on classification, and the trade-off between speed and accuracy. The contributions of this paper are: an abstract description summarizing existing discretization methods; a hierarchical framework that categorizes the existing methods and paves the way for further development; concise discussions of representative discretization methods; extensive experiments and their analysis; and guidelines on how to choose a discretization method under various circumstances. We also identify open issues and future research directions for discretization.
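To make the core idea concrete, here is a minimal sketch of the simplest unsupervised discretization scheme, equal-width binning, in which a continuous feature's range is split into k intervals of identical width and each value is replaced by the index of the interval containing it. The function name and the sample data are illustrative assumptions, not taken from the paper.

```python
def equal_width_bins(values, k):
    """Discretize continuous values into k equal-width intervals.

    Returns, for each value, the index (0 .. k-1) of the interval
    it falls into. This is a sketch of plain equal-width binning,
    one of the unsupervised methods surveyed in the paper.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    indices = []
    for v in values:
        # The maximum value would land in bin k, so clamp to k - 1.
        i = min(int((v - lo) / width), k - 1)
        indices.append(i)
    return indices

# Hypothetical ages discretized into 4 intervals of width (80-3)/4 = 19.25
ages = [3, 15, 22, 37, 41, 58, 64, 80]
print(equal_width_bins(ages, 4))  # → [0, 0, 0, 1, 1, 2, 3, 3]
```

Equal-width binning ignores the class labels entirely; supervised methods discussed in the paper (e.g., entropy-based splitting) instead choose cut points that separate the classes well.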

Keywords: discretization, continuous feature, data mining, classification



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Huan Liu
  • Farhad Hussain
  • Chew Lim Tan
  • Manoranjan Dash

  School of Computing, National University of Singapore, Singapore