Building Decision Trees with Constraints

Published in Data Mining and Knowledge Discovery.

Abstract

Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description of each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to generate class models. Frequently, however, the constructed trees are complex, with hundreds of nodes, and thus difficult to comprehend, a fact that calls into question the often-cited benefit that decision trees are easy to interpret. In this paper, we address the problem of constructing “simple” decision trees with few nodes that are easy for humans to interpret. By permitting users to specify constraints on tree size or accuracy, and then building the “best” tree that satisfies those constraints, we ensure that the final tree is both easy to understand and accurate. We develop novel branch-and-bound algorithms that push the constraints into the building phase of the classifier and prune, early in the process, tree nodes that cannot possibly satisfy the constraints. Our experimental results on real-life and synthetic data sets demonstrate that incorporating knowledge of the constraints into the building step, as opposed to applying the constraints after the entire tree is built, yields significant performance speedups and reductions in the number of nodes expanded.
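The idea of enforcing a constraint during building rather than by post-pruning can be illustrated with a minimal sketch. This is an illustration of the general principle, not the paper's branch-and-bound algorithm: a greedy builder refuses to split a node whenever the remaining node budget cannot pay for the split, so the size constraint holds by construction. All function names and the toy data are illustrative assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) pair minimizing weighted Gini,
    or None if no split improves on the parent's impurity."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best_score, best = score, (f, t)
    return best

def build(rows, labels, budget):
    """Grow a tree using at most `budget` nodes. A split costs at least
    three nodes (the internal node plus two children), so we refuse to
    split when the budget cannot pay for it: the size constraint is
    enforced while building, not by pruning afterwards."""
    majority = Counter(labels).most_common(1)[0][0]
    if budget < 3 or gini(labels) == 0.0:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:
        return {"leaf": majority}
    f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    half = (budget - 1) // 2  # share the remaining budget between children
    return {
        "feature": f, "threshold": t,
        "left": build([rows[i] for i in li], [labels[i] for i in li], half),
        "right": build([rows[i] for i in ri], [labels[i] for i in ri], half),
    }

def count_nodes(tree):
    if "leaf" in tree:
        return 1
    return 1 + count_nodes(tree["left"]) + count_nodes(tree["right"])

# Toy data: label is 1 when both features exceed 5.
rows = [(2, 3), (8, 7), (6, 9), (1, 8), (9, 2), (7, 7), (3, 1), (8, 8)]
labels = [0, 1, 1, 0, 0, 1, 0, 1]

tree = build(rows, labels, budget=7)
assert count_nodes(tree) <= 7
```

The branch-and-bound algorithms in the paper go further than this sketch: rather than a fixed per-child budget, they compute lower bounds on the cost of completing a subtree and prune any node that provably cannot lead to a tree satisfying the user's size or accuracy constraint.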

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kyuseok Shim.

About this article

Cite this article

Garofalakis, M., Hyun, D., Rastogi, R. et al. Building Decision Trees with Constraints. Data Mining and Knowledge Discovery 7, 187–214 (2003). https://doi.org/10.1023/A:1022445500761
