Robust Linear Discriminant Trees

  • George H. John
Part of the Lecture Notes in Statistics book series (LNS, volume 112)


We present a new method for inducing classification trees that use linear discriminants as the partitioning function at each internal node. The paper makes two main contributions: first, a novel objective function called soft entropy, used to identify optimal coefficients for the linear discriminants; second, a novel outlier-removal method called iterative refiltering, which boosts performance on many datasets. The two ideas are combined in a single learning algorithm, DT-SEPIR, which is compared with the CART and OC1 algorithms.
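The soft-entropy objective itself is not reproduced in this preview. A plausible reading, consistent with the general idea of optimizing a split criterion by gradient methods, is the usual weighted split entropy computed with soft (sigmoid) child memberships rather than a hard partition, so that the criterion is differentiable in the discriminant coefficients. The sketch below is an illustrative assumption, not the paper's exact formulation; the function names and the sigmoid parameterization are not from the source.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a class-probability vector (0 log 0 := 0)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def soft_entropy(w, b, X, y, n_classes):
    """Differentiable impurity of the linear split w.x + b >= 0.

    Each point gets a *soft* membership in the two children via a
    sigmoid, so the objective is smooth in (w, b) and can be minimized
    with gradient-based optimization instead of combinatorial search.
    """
    s = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # soft membership in right child
    left = np.zeros(n_classes)
    right = np.zeros(n_classes)
    for c in range(n_classes):
        mask = (y == c)
        right[c] = s[mask].sum()            # soft class counts, right child
        left[c] = (1.0 - s[mask]).sum()     # soft class counts, left child
    n = len(y)
    wl, wr = left.sum() / n, right.sum() / n
    hl = entropy(left / max(left.sum(), 1e-12))
    hr = entropy(right / max(right.sum(), 1e-12))
    return wl * hl + wr * hr
```

On linearly separable data, a discriminant aligned with the class boundary drives this objective toward zero, while an uninformative one (w = 0) leaves each child with the full class mixture.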


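The outlier-removal step described in the abstract can be sketched as a fit-filter-refit loop: train a classifier, drop the training points it misclassifies, and retrain until nothing more is removed. The nearest-centroid learner below is a stand-in for the paper's tree learner, chosen only to keep the example self-contained; `iterative_refilter` and its stopping rule are illustrative assumptions, not the published algorithm.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Toy stand-in for the tree learner: one mean vector per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(model, X):
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def iterative_refilter(X, y, max_iter=10):
    """Repeatedly fit, drop misclassified training points, and refit.

    Stops when an iteration removes nothing (a fixed point) or after
    max_iter rounds; returns the surviving indices and the final model.
    """
    keep = np.arange(len(y))
    model = nearest_centroid_fit(X, y)
    for _ in range(max_iter):
        pred = nearest_centroid_predict(model, X[keep])
        ok = pred == y[keep]
        if ok.all():
            break                     # fixed point: nothing left to remove
        keep = keep[ok]               # discard suspected outliers
        model = nearest_centroid_fit(X[keep], y[keep])
    return keep, model
```

With a single mislabeled point planted deep inside the opposite class, one round of filtering removes it and the refit model then classifies the remaining data cleanly.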




  1. Bennett, K. P. & Mangasarian, O. L. (1992), Neural network training via linear programming, in P. M. Pardalos, ed., “Advances in Optimization and Parallel Computing”, North Holland, Amsterdam, pp. 56–67.
  2. Bichsel, M. & Seitz, P. (1989), “Minimum class entropy: A maximum information approach to layered networks”, Neural Networks 2, 133–141.
  3. Breiman, L., Friedman, J., Olshen, R. & Stone, C. (1984), Classification and Regression Trees, Chapman & Hall, New York.
  4. Brent, R. P. (1991), “Fast training algorithms for neural networks”, IEEE Transactions on Neural Networks 2(3), 346–354.
  5. Brodley, C. E. & Utgoff, P. E. (1995), “Multivariate decision trees”, Machine Learning 19, 45–76.
  6. Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977), “Maximum likelihood from incomplete data via the EM algorithm”, Journal of the Royal Statistical Society B 39, 1–38.
  7. Duda, R. & Hart, P. (1973), Pattern Classification and Scene Analysis, Wiley.
  8. Fayyad, U. M. & Irani, K. B. (1992), The attribute selection problem in decision tree generation, in “AAAI-92: Proceedings of the Tenth National Conference on Artificial Intelligence”, AAAI Press/MIT Press, pp. 104–110.
  9. Friedman, J. H. (1977), “A recursive partitioning decision rule for nonparametric classification”, IEEE Transactions on Computers, pp. 404–408.
  10. Guyon, I., Boser, B. & Vapnik, V. (1993), Automatic capacity tuning of very large VC-dimension classifiers, in S. J. Hanson, J. Cowan & C. L. Giles, eds, “Advances in Neural Information Processing Systems”, Vol. 5, Morgan Kaufmann, pp. 147–154.
  11. Hastie, T. J. & Tibshirani, R. J. (1990), Generalized Additive Models, Chapman and Hall.
  12. Heath, D., Kasif, S. & Salzberg, S. (1993), Induction of oblique decision trees, in R. Bajcsy, ed., “Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence”, Morgan Kaufmann.
  13. Henrichon, Jr., E. G. & Fu, K.-S. (1969), “A nonparametric partitioning procedure for pattern classification”, IEEE Transactions on Computers C-18(7), 614–624.
  14. Huber, P. J. (1977), Robust Statistical Procedures, Society for Industrial and Applied Mathematics, Philadelphia, PA.
  15. John, G. H. (1994), Finding multivariate splits in decision trees using function optimization, in “AAAI-94: Proceedings of the Twelfth National Conference on Artificial Intelligence”, AAAI Press/MIT Press, p. 1463.
  16. John, G. H. (1995), Robust decision trees: Removing outliers in databases, in “First International Conference on Knowledge Discovery and Data Mining (KDD-95)”, AAAI Press, Menlo Park, CA, pp. 174–179.
  17. Jordan, M. I. & Jacobs, R. A. (1993), Supervised learning and divide-and-conquer: A statistical approach, in P. Utgoff, ed., “Proceedings of the Tenth International Conference on Machine Learning”, Morgan Kaufmann.
  18. Koutsougeras, C. & Papachristou, C. A. (1988), Training of a neural network for pattern classification based on an entropy measure, in “IEEE International Conference on Neural Networks”, IEEE Press, pp. 247–254.
  19. Lin, Y. K. & Fu, K. S. (1983), “Automatic classification of cervical cells using a binary tree classifier”, Pattern Recognition 16(1), 69–80.
  20. Loh, W.-Y. & Vanichsetakul, N. (1988), “Tree-structured classification via generalized discriminant analysis”, Journal of the American Statistical Association 83(403), 715–725.
  21. Michie, D., Spiegelhalter, D. J. & Taylor, C. C. (1994), Machine Learning, Neural and Statistical Classification, Prentice Hall.
  22. Morgan, J. N. & Messenger, R. C. (1973), THAID: a sequential analysis program for the analysis of nominal scale dependent variables, University of Michigan.
  23. Murphy, P. M. & Aha, D. W. (1994), “UCI repository of machine learning databases”, Available by anonymous ftp in the pub/machine-learning-databases directory.
  24. Murthy, S. K., Salzberg, S. & Kasif, S. (1993), “OC1”, Available by anonymous ftp.
  25. Murthy, S., Kasif, S. & Salzberg, S. (1994), “A system for induction of oblique decision trees”, Journal of Artificial Intelligence Research 2, 1–32.
  26. Qing-Yun, S. & Fu, K. S. (1983), “A method for the design of binary-tree classifiers”, Pattern Recognition 16(6), 593–603.
  27. Quinlan, J. R. (1986), “Induction of decision trees”, Machine Learning 1, 81–106.
  28. Quinlan, J. R. (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann.
  29. Sahami, M. (1993), Learning non-linearly separable boolean functions with linear threshold unit trees and madaline-style networks, in “AAAI-93: Proceedings of the Eleventh National Conference on Artificial Intelligence”, AAAI Press/MIT Press, pp. 335–341.
  30. Sankar, A. & Mammone, R. J. (1991), Optimal pruning of neural tree networks for improved generalization, in “IJCNN-91-SEATTLE: International Joint Conference on Neural Networks”, IEEE Press, Seattle, WA, pp. II:219–224.
  31. Sethi, I. K. (1990), “Entropy nets: from decision trees to neural networks”, Proceedings of the IEEE 78(10), 1605–1613.

Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  • George H. John, Computer Science Department, Stanford University, USA
