The problem of missing values in decision tree grafting

  • Geoffrey I. Webb
Scientific Track
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1502)

Abstract

Decision tree grafting adds nodes to inferred decision trees. Previous research has demonstrated that appropriate grafting techniques can improve predictive accuracy across a wide cross-section of domains. However, previous decision tree grafting systems are shown to have a serious deficiency for some data sets containing missing values. The problem arises from the method for handling missing values employed by C4.5, within which the grafting systems have been embedded. This paper explains the problem, presents a solution, and provides experimental evidence of the solution's efficacy.
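
The deficiency arises from C4.5's fractional-weighting treatment of unknown attribute values: when an instance lacks a value for a test attribute, C4.5 sends it down every branch of the test, with its weight scaled by the fraction of known-valued training instances that followed each branch. The sketch below (in Python) illustrates only this general mechanism; it is not the paper's or C4.5's code, and all names in it (distribute, the fractions field, the toy tree) are illustrative assumptions.

    # Minimal sketch of C4.5-style fractional weighting for missing values.
    # Not the paper's code; data structures and names are hypothetical.

    def distribute(instance, weight, node):
        """Assign an instance's weight to the leaves it (fractionally) reaches."""
        if node["kind"] == "leaf":
            cls = instance["class"]
            node["counts"][cls] = node["counts"].get(cls, 0.0) + weight
            return
        value = instance.get(node["attribute"])  # None when the value is missing
        if value is not None:
            distribute(instance, weight, node["branches"][value])
        else:
            # Missing value: split the weight across all branches in proportion
            # to the fraction of known-valued instances each branch received.
            for branch_value, child in node["branches"].items():
                distribute(instance, weight * node["fractions"][branch_value], child)

    tree = {
        "kind": "test",
        "attribute": "outlook",
        "fractions": {"sunny": 0.6, "rain": 0.4},
        "branches": {
            "sunny": {"kind": "leaf", "counts": {}},
            "rain": {"kind": "leaf", "counts": {}},
        },
    }
    distribute({"class": "play"}, 1.0, tree)    # "outlook" is missing
    print(tree["branches"]["sunny"]["counts"])  # {'play': 0.6}
    print(tree["branches"]["rain"]["counts"])   # {'play': 0.4}

The point relevant here is that fractional instances can reach leaves from which their (unknown) attribute values might otherwise exclude them, and a grafting system embedded in C4.5 inherits this behaviour; the paper explains how this interaction produces the deficiency and how to correct it.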

Key words

Grafting · Decision Tree Learning · Missing Values

References

  1. Ali, K., Brunk, C., & Pazzani, M. (1994). On learning multiple descriptions of a concept. In Proceedings of Tools with Artificial Intelligence, pp. 476–483, New Orleans, LA.
  2. Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123–140.
  3. Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth International, Belmont, CA.
  4. Dietterich, T. G., & Bakiri, G. (1994). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2, 263–286.
  5. Freund, Y., & Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the Second European Conference on Computational Learning Theory, pp. 23–37. Springer-Verlag.
  6. Kwok, S. W., & Carter, C. (1990). Multiple decision trees. In Shachter, R. D., Levitt, T. S., Kanal, L. N., & Lemmer, J. F. (Eds.), Uncertainty in Artificial Intelligence 4, pp. 327–335. North-Holland, Amsterdam.
  7. Merz, C. J., & Murphy, P. M. (1998). UCI repository of machine learning databases [machine-readable data repository]. University of California, Department of Information and Computer Science, Irvine, CA.
  8. Niblett, T., & Bratko, I. (1986). Learning decision rules in noisy domains. In Bramer, M. A. (Ed.), Research and Development in Expert Systems III, pp. 25–34. Cambridge University Press, Cambridge.
  9. Nock, R., & Gascuel, O. (1995). On learning decision committees. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 413–420, Tahoe City, CA. Morgan Kaufmann.
  10. Oliver, J. J., & Hand, D. J. (1995). On pruning and averaging decision trees. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 430–437, Tahoe City, CA. Morgan Kaufmann.
  11. Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.
  12. Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5, 197–227.
  13. Webb, G. I. (1996). Further experimental evidence against the utility of Occam's razor. Journal of Artificial Intelligence Research, 4, 397–417.
  14. Webb, G. I. (1997). Decision tree grafting. In IJCAI-97: Fifteenth International Joint Conference on Artificial Intelligence, pp. 846–851, Nagoya, Japan. Morgan Kaufmann.
  15. Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5, 241–259.

Copyright information

© Springer-Verlag 1998

Authors and Affiliations

  • Geoffrey I. Webb
  1. School of Computing and Mathematics, Deakin University, Geelong, Australia
