Abstract
We describe a decision tree pruning method based on the Minimum Description Length (MDL) principle. A subtree is considered unreliable, and is therefore pruned, if the description length of the classification of the corresponding subsets of training instances, together with the description lengths of the paths in the subtree, is greater than the description length of the classification of the whole subset of training instances in the current node. We compare the performance of our simple, parameterless, and well-founded MDL method with other methods on 18 datasets. The classification accuracy of MDL pruning is comparable to that of other approaches, and the decision trees are nearly optimally pruned, which makes our method an attractive tool for obtaining a first approximation of the target decision tree during the knowledge discovery process.
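The pruning test stated above can be illustrated with a minimal sketch. The abstract does not give the exact coding scheme, so the code lengths used here are assumptions: class labels are encoded with a multinomial term plus a term for the class distribution, and the cost of each path is simply supplied by the caller. The names `class_code_length`, `should_prune`, `leaf_counts`, and `path_bits` are hypothetical and chosen only for illustration.

```python
from math import lgamma, log


def log2_factorial(n):
    """log2(n!) computed through the log-gamma function."""
    return lgamma(n + 1) / log(2)


def log2_binomial(n, k):
    """log2 of the binomial coefficient C(n, k)."""
    return log2_factorial(n) - log2_factorial(k) - log2_factorial(n - k)


def class_code_length(counts):
    """Bits needed to encode the class labels of one set of instances.

    ASSUMED coding (not taken from the abstract): log2 of the multinomial
    coefficient of the class counts, plus log2 of the number of possible
    class distributions over n instances and k classes.
    """
    n = sum(counts)
    k = len(counts)
    multinomial = log2_factorial(n) - sum(log2_factorial(c) for c in counts)
    distribution = log2_binomial(n + k - 1, k - 1)
    return multinomial + distribution


def should_prune(leaf_counts, path_bits):
    """Return True if the subtree is longer to describe than its root node.

    leaf_counts: one list of class counts per leaf of the subtree.
    path_bits:   assumed cost in bits of describing the path to each leaf.
    """
    # Class counts in the current node are the column sums over the leaves.
    node_counts = [sum(column) for column in zip(*leaf_counts)]
    node_bits = class_code_length(node_counts)
    subtree_bits = sum(class_code_length(c) for c in leaf_counts) + sum(path_bits)
    return subtree_bits > node_bits


# A split that separates the classes perfectly is kept.
print(should_prune([[10, 0], [0, 10]], [1.0, 1.0]))
# A split that leaves both leaves as mixed as the parent is pruned.
print(should_prune([[5, 5], [5, 5]], [1.0, 1.0]))
```

Under these assumed code lengths, an informative split shortens the total description and survives, while an uninformative one only adds the cost of the extra leaves and paths and is removed. No threshold or tuning parameter enters the comparison, which is consistent with the description of the method as parameterless.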
© 1998 Springer-Verlag Berlin Heidelberg
Kononenko, I. (1998). The minimum description length based decision tree pruning. In: Lee, HY., Motoda, H. (eds) PRICAI’98: Topics in Artificial Intelligence. PRICAI 1998. Lecture Notes in Computer Science, vol 1531. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0095272
DOI: https://doi.org/10.1007/BFb0095272
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65271-7
Online ISBN: 978-3-540-49461-4