
Classification trees with bivariate splits


Abstract

We extend the recursive partitioning approach to classifier learning to allow more complex types of split at each decision node. The new split types we permit are bivariate and can thus be interpreted visually in plots and tables. To find optimal splits of these new types, we introduce a new split criterion that allows the development of divide-and-conquer algorithms. Two experiments are presented in which the bivariate trees, both with the Gini split criterion and with the new split criterion, are compared to a traditional univariate tree-growing procedure. With the Gini criterion, the bivariate trees show a slight improvement in predictive accuracy and a considerable improvement in tree size over univariate trees. Under the new split criterion, accuracy is also improved, but there is no consistent improvement in tree size.
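The abstract leaves the exact bivariate split families and the new split criterion unspecified, so the following is only a minimal sketch of the general idea: it scores one plausible bivariate split form, a conjunction of two univariate thresholds, by the weighted Gini impurity named above. The helper names (`gini`, `bivariate_split_score`) and the split form are illustrative assumptions, not the paper's algorithm.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def bivariate_split_score(X, y, i, j, s, t):
    """Weighted Gini impurity of the two-way partition induced by the
    hypothetical bivariate test (x[i] <= s) and (x[j] <= t)."""
    left = [label for x, label in zip(X, y) if x[i] <= s and x[j] <= t]
    right = [label for x, label in zip(X, y) if not (x[i] <= s and x[j] <= t)]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy usage: brute-force search over observed value pairs for features 0 and 1.
X = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (4.0, 2.0)]
y = ["a", "a", "b", "b"]
best = min(((x[0], z[1]) for x in X for z in X),
           key=lambda st: bivariate_split_score(X, y, 0, 1, st[0], st[1]))
print(best, bivariate_split_score(X, y, 0, 1, *best))
```

A brute-force search like this examines every threshold pair for every pair of features; the new split criterion described in the paper is motivated by exactly this cost, since it permits divide-and-conquer algorithms for finding optimal splits.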




Additional information

Much of this work was completed while the author was an employee of AT&T Bell Laboratories.


Cite this article

Lubinsky, D. Classification trees with bivariate splits. Appl Intell 4, 283–296 (1994). https://doi.org/10.1007/BF00872094
