Machine Learning, Volume 19, Issue 1, pp 45–77

Multivariate Decision Trees

  • Carla E. Brodley
  • Paul E. Utgoff

Abstract

Unlike a univariate decision tree, a multivariate decision tree is not restricted to splits of the instance space that are orthogonal to the features' axes. This article addresses several issues in constructing multivariate decision trees: representing a multivariate test, including symbolic and numeric features, learning the coefficients of a multivariate test, selecting the features to include in a test, and pruning multivariate decision trees. We present several new methods for forming multivariate decision trees and compare them with several well-known methods. We compare the methods across a variety of learning tasks in order to assess each method's ability to find concise, accurate decision trees. The results demonstrate that some multivariate methods are in general more effective than others (in the context of our experimental assumptions). In addition, the experiments confirm that allowing multivariate tests generally improves the accuracy of the resulting decision tree over that of a univariate tree.
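To make the contrast concrete, the minimal Python sketch below (our illustration, not code from the article) compares an axis-orthogonal univariate test with a multivariate linear test, and learns the coefficients of the linear test with a simple perceptron-style rule. The function names, learning rate, and epoch count are assumptions chosen for the example; perceptron training is only one of several coefficient-learning procedures of the kind the article compares.

```python
import numpy as np

# Illustration only (not code from the article): a univariate test splits
# on one feature, while a multivariate linear test splits on a weighted
# combination of features, producing an oblique hyperplane.

def univariate_test(x, feature, threshold):
    """Axis-orthogonal split: is x[feature] > threshold?"""
    return x[feature] > threshold

def multivariate_test(x, weights, threshold):
    """Oblique split: is sum_i weights[i] * x[i] > threshold?"""
    return np.dot(weights, x) > threshold

def learn_linear_test(X, y, epochs=100, lr=0.1):
    """Learn (weights, threshold) with a perceptron-style update rule.

    X is an (n, d) array of instances; y holds labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1] + 1)                 # last entry acts as bias
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append constant feature
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * np.dot(w, xi) <= 0:          # misclassified: nudge w
                w += lr * yi * xi
    return w[:-1], -w[-1]                        # w . x > threshold form

# Usage: an oblique concept (x0 + x1 > 1) that no single univariate split
# can represent exactly, but one multivariate test captures directly.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 1.0, 1, -1)
weights, threshold = learn_linear_test(X, y)
print(multivariate_test(np.array([0.9, 0.8]), weights, threshold))  # True expected
```

A univariate tree must approximate such an oblique boundary with a staircase of axis-parallel splits, which is why multivariate tests can yield smaller and often more accurate trees.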

Keywords: decision trees, multivariate tests, linear discriminant functions, inductive learning

Copyright information

© Kluwer Academic Publishers 1995

Authors and Affiliations

  • Carla E. Brodley, School of Electrical Engineering, Purdue University, West Lafayette
  • Paul E. Utgoff, Department of Computer Science, University of Massachusetts, Amherst
