Abstract
Discrete values play an important role in data mining and knowledge discovery. They represent intervals of numbers, which are more concise to represent and specify, and easier to use and comprehend, because they are closer to a knowledge-level representation than continuous values. Many studies show that induction tasks can benefit from discretization: rules with discrete values are normally shorter and more understandable, and discretization can lead to improved predictive accuracy. Furthermore, many induction algorithms in the literature require discrete features. All of this prompts researchers and practitioners to discretize continuous features before or during a machine learning or data mining task. Numerous discretization methods are available in the literature. It is time to examine these seemingly different methods and find out how different they really are, what the key components of a discretization process are, and how we can improve the current state of research, both for new development and for the use of existing methods. This paper aims at a systematic study of discretization methods, covering their history of development, their effect on classification, and the trade-off between speed and accuracy. The contributions of this paper are an abstract description summarizing existing discretization methods, a hierarchical framework that categorizes the existing methods and paves the way for further development, concise discussions of representative discretization methods, extensive experiments and their analysis, and guidelines on how to choose a discretization method under various circumstances. We also identify open issues and directions for future research on discretization.
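To make the notion of "intervals of numbers" concrete, the following is a minimal sketch of one classic unsupervised discretization scheme, equal-width binning; the function name and parameters are illustrative and not taken from the paper.

```python
def equal_width_bins(values, k):
    """Split the range of `values` into k equal-width intervals and
    return the interval index (0..k-1) for each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = []
    for v in values:
        # clamp so the maximum value falls in the last interval;
        # degenerate case: all values identical -> single interval 0
        idx = min(int((v - lo) / width), k - 1) if width > 0 else 0
        bins.append(idx)
    return bins

ages = [3, 15, 22, 37, 48, 61, 70]
print(equal_width_bins(ages, 3))  # -> [0, 0, 0, 1, 2, 2, 2]
```

Each continuous value is thus replaced by a symbolic interval label, which is exactly the kind of transformation that supervised methods (e.g. entropy-based splitting) refine by choosing cut points with respect to the class attribute.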
Cite this article
Liu, H., Hussain, F., Tan, C.L. et al. Discretization: An Enabling Technique. Data Mining and Knowledge Discovery 6, 393–423 (2002). https://doi.org/10.1023/A:1016304305535