Discretization: An Enabling Technique

Abstract

Discrete values play an important role in data mining and knowledge discovery. They represent intervals of numbers, which are more concise to represent and specify, and easier to use and comprehend, because they are closer to a knowledge-level representation than continuous values. Many studies show that induction tasks can benefit from discretization: rules with discrete values are normally shorter and more understandable, and discretization can lead to improved predictive accuracy. Furthermore, many induction algorithms in the literature require discrete features. All of this prompts researchers and practitioners to discretize continuous features before or during a machine learning or data mining task. Numerous discretization methods are available in the literature. It is time to examine these seemingly different methods and find out how different they really are, what the key components of a discretization process are, and how we can improve the current level of research, both for new development and for the use of existing methods. This paper presents a systematic study of discretization methods, covering their history of development, their effect on classification, and the trade-off between speed and accuracy. The contributions of this paper are an abstract description summarizing existing discretization methods, a hierarchical framework that categorizes the existing methods and paves the way for further development, concise discussions of representative discretization methods, extensive experiments and their analysis, and guidelines on how to choose a discretization method under various circumstances. We also identify open issues and directions for future research in discretization.
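
The core operation the paper studies is simple to state: replace a continuous feature with a small set of intervals. As a concrete illustration (a minimal sketch, not an implementation of any specific method surveyed in the paper), the following Python fragment performs unsupervised equal-width binning; the function name, the use of NumPy, and the fixed choice of k bins are assumptions made for this example.

```python
import numpy as np

def equal_width_discretize(values, k):
    """Map each continuous value to one of k equal-width intervals.

    Illustrative only: equal-width binning is the simplest unsupervised
    discretization scheme, not a method proposed by this paper.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    # k - 1 interior cut points split [lo, hi] into k equal-width bins.
    cuts = np.linspace(lo, hi, k + 1)[1:-1]
    # Each value is replaced by the index of the interval it falls into.
    return np.digitize(values, cuts)

# Example: seven ages reduced to three interval labels.
ages = [3, 18, 21, 35, 42, 67, 80]
print(equal_width_discretize(ages, 3))  # -> [0 0 0 1 1 2 2]
```

Supervised discretization methods differ mainly in how they choose the cut points: rather than spacing them uniformly, they consult class labels, for example through entropy-based or chi-square-based criteria.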

Cite this article

Liu, H., Hussain, F., Tan, C.L. et al. Discretization: An Enabling Technique. Data Mining and Knowledge Discovery 6, 393–423 (2002). https://doi.org/10.1023/A:1016304305535
