Abstract
Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to generate class models. Frequently, however, the constructed trees are complex, with hundreds of nodes, and thus difficult to comprehend, a fact that calls into question the often-cited benefit that decision trees are easy to interpret. In this paper, we address the problem of constructing “simple” decision trees with few nodes that humans can readily interpret. By permitting users to specify constraints on tree size or accuracy, and then building the “best” tree that satisfies those constraints, we ensure that the final tree is both easy to understand and accurate. We develop novel branch-and-bound algorithms that push the constraints into the building phase of classifiers, pruning early those tree nodes that cannot possibly satisfy the constraints. Our experimental results with real-life and synthetic data sets demonstrate that incorporating knowledge of the constraints into the building step, rather than applying the constraints after the entire tree is built, yields significant performance speedups and reductions in the number of nodes expanded.
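The paper's branch-and-bound algorithms integrate the constraints directly into classifier construction. As a rough illustration only, not the authors' algorithm, the sketch below grows a greedy binary decision tree under a user-specified node budget, abandoning any split that the remaining budget cannot accommodate. The function names (`build_tree`, `gini`, `classify`) and the budget-halving heuristic are our own assumptions for this sketch.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty multiset of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, features, budget):
    """Greedily grow a decision tree, never exceeding `budget` nodes.

    Returns (tree, nodes_used). A tree is either a class label (leaf)
    or a dict {"feature", "left", "right"} splitting on a binary feature.
    The "bound" step: a split costs at least 2 extra nodes, so splitting
    is abandoned whenever the remaining budget cannot accommodate it.
    """
    if budget < 3 or gini(labels) == 0.0 or not features:
        return Counter(labels).most_common(1)[0][0], 1  # majority-class leaf
    # Pick the binary feature whose split lowers weighted impurity the most.
    best = None
    for f in features:
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None or best[0] >= gini(labels):
        return Counter(labels).most_common(1)[0][0], 1  # no useful split
    f = best[1]
    rest = [g for g in features if g != f]
    lx = [(x, y) for x, y in zip(rows, labels) if x[f] == 0]
    rx = [(x, y) for x, y in zip(rows, labels) if x[f] == 1]
    # Heuristic: give each child half the budget left after this node.
    half = (budget - 1) // 2
    lt, ln = build_tree([x for x, _ in lx], [y for _, y in lx], rest, half)
    rt, rn = build_tree([x for x, _ in rx], [y for _, y in rx], rest,
                        budget - 1 - ln)
    return {"feature": f, "left": lt, "right": rt}, 1 + ln + rn

def classify(tree, row):
    """Follow splits down to a leaf and return its class label."""
    while isinstance(tree, dict):
        tree = tree["right"] if row[tree["feature"]] == 1 else tree["left"]
    return tree

# Example: learning AND of two binary features under a 7-node budget.
rows, labels = [(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1]
tree, used = build_tree(rows, labels, [0, 1], budget=7)
```

With a tighter budget (say 3 nodes) the same builder stops after a single split and returns majority-class leaves, trading accuracy for the smaller, more interpretable tree, which is the trade-off the paper lets users control via explicit constraints.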
Garofalakis, M., Hyun, D., Rastogi, R. et al. Building Decision Trees with Constraints. Data Mining and Knowledge Discovery 7, 187–214 (2003). https://doi.org/10.1023/A:1022445500761