Machine Learning, Volume 40, Issue 3, pp 203–228

A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms

  • Tjen-Sien Lim
  • Wei-Yin Loh
  • Yu-Shan Shih

Abstract

Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based algorithm called POLYCLASS at the top, although its accuracy is not statistically significantly different from that of twenty other algorithms. Another statistical algorithm, logistic regression, is second on both accuracy criteria. The most accurate decision tree algorithm is QUEST with linear splits, which ranks fourth and fifth on the two criteria, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. POLYCLASS, for example, is third from last in terms of median training time; it often requires hours of training, compared with seconds for other algorithms. The QUEST and logistic regression algorithms are substantially faster. Among decision tree algorithms with univariate splits, C4.5, IND-CART, and QUEST have the best combinations of error rate and speed, but C4.5 tends to produce trees with twice as many leaves as those from IND-CART and QUEST.

Keywords: classification tree, decision tree, neural net, statistical classifier
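The two accuracy criteria named in the abstract, mean error rate and mean rank of error rate, can be computed from a table of per-dataset test error rates. The sketch below is not part of the original article; the algorithm names and error values in it are synthetic and purely illustrative, and it only shows one plausible way to reproduce the two criteria.

# Hedged sketch (not the authors' code): compute mean error rate and mean rank
# of error rate for a hypothetical table of test error rates, with one row per
# dataset and one column per algorithm. All names and numbers are illustrative.
import numpy as np
from scipy.stats import rankdata

algorithms = ["POLYCLASS", "Logistic regression", "QUEST (linear)", "C4.5"]
error_rates = np.array([
    [0.21, 0.22, 0.24, 0.26],   # dataset 1
    [0.10, 0.12, 0.11, 0.15],   # dataset 2
    [0.33, 0.31, 0.35, 0.34],   # dataset 3
])

# Criterion 1: mean error rate of each algorithm across datasets.
mean_error = error_rates.mean(axis=0)

# Criterion 2: rank the algorithms within each dataset (1 = lowest error,
# ties receive average ranks), then average the ranks across datasets.
ranks = np.apply_along_axis(rankdata, 1, error_rates)
mean_rank = ranks.mean(axis=0)

# Report algorithms ordered by mean rank (best first).
for name, err, rk in sorted(zip(algorithms, mean_error, mean_rank), key=lambda t: t[2]):
    print(f"{name:20s}  mean error = {err:.3f}  mean rank = {rk:.2f}")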


Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Tjen-Sien Lim, Department of Statistics, University of Wisconsin, Madison, USA
  • Wei-Yin Loh, Department of Statistics, University of Wisconsin, Madison, USA
  • Yu-Shan Shih, Department of Mathematics, National Chung Cheng University, Chiayi, Taiwan, R.O.C.
