Learning from Data, pp. 375–385

# Robust Linear Discriminant Trees

Chapter

## Abstract

We present a new method for the induction of classification trees with linear discriminants as the partitioning function at each internal node. The chapter makes two main contributions: first, a novel objective function called *soft entropy*, used to identify optimal coefficients for the linear discriminants; and second, a novel outlier-removal method called *iterative refiltering*, which boosts performance on many datasets. These two ideas are presented in the context of a single learning algorithm called DT-SEPIR, which is compared with the CART and OC1 algorithms.
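The two ideas named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the chapter's actual DT-SEPIR implementation: `soft_entropy` smooths the usual split entropy by giving each point a sigmoid-weighted *soft* membership in the two children (so the impurity is differentiable in the hyperplane coefficients), `fit_discriminant` is a crude class-means stand-in for the chapter's coefficient optimiser, and `iterative_refilter` is a generic fit/drop-worst/refit loop in the spirit of outlier refiltering.

```python
import numpy as np

def soft_entropy(X, y, w, b, temperature=1.0):
    """Smoothed impurity of the linear split w.x + b = 0.

    Each point gets a sigmoid soft membership in the right child,
    so class proportions (and hence the entropy) vary smoothly
    with (w, b). Sketch only; not the chapter's exact objective.
    """
    z = (X @ w + b) / temperature
    p_right = 1.0 / (1.0 + np.exp(-z))           # soft membership, right child
    total = len(y)
    impurity = 0.0
    for side in (p_right, 1.0 - p_right):        # right child, then left
        mass = side.sum()                        # soft cardinality of the child
        if mass < 1e-12:
            continue
        h = 0.0
        for c in np.unique(y):
            p_c = side[y == c].sum() / mass      # soft class proportion
            if p_c > 0:
                h -= p_c * np.log2(p_c)
        impurity += (mass / total) * h           # child weighted by soft mass
    return impurity

def fit_discriminant(X, y):
    """Hypothetical stand-in for soft-entropy optimisation: the
    hyperplane bisecting the segment between the two class means."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    w = m1 - m0
    b = -w @ (m0 + m1) / 2.0
    return w, b

def iterative_refilter(X, y, rounds=3, drop_frac=0.05):
    """Generic refiltering loop: fit, discard the worst-fitting
    fraction of the remaining points, refit on the survivors."""
    keep = np.ones(len(y), dtype=bool)
    for _ in range(rounds):
        w, b = fit_discriminant(X[keep], y[keep])
        margin = (2 * y - 1) * (X @ w + b)       # positive = correct side
        thresh = np.quantile(margin[keep], drop_frac)
        keep &= margin > thresh                  # drop lowest-margin points
    return keep, fit_discriminant(X[keep], y[keep])
```

On two well-separated Gaussian classes, a discriminant through the data has much lower soft entropy than the degenerate split `w = 0` (which assigns every point 0.5 membership to each child and scores exactly one bit for balanced binary labels).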

## Keywords

Splitting Function, Cardinality Measure, Pruning Method, Splitting Criterion, Regularization Algorithm


## References

- Bennett, K. P. & Mangasarian, O. L. (1992), Neural network training via linear programming, *in* P. M. Pardalos, ed., *Advances in Optimization and Parallel Computing*, North Holland, Amsterdam, pp. 56–67.
- Bichsel, M. & Seitz, P. (1989), “Minimum class entropy: A maximum information approach to layered networks”, *Neural Networks* **2**, 133–141.
- Breiman, L., Friedman, J., Olshen, R. & Stone, C. (1984), *Classification and Regression Trees*, Chapman & Hall, New York.
- Brent, R. P. (1991), “Fast training algorithms for neural networks”, *IEEE Transactions on Neural Networks* **2**(3), 346–354.
- Brodley, C. E. & Utgoff, P. E. (1995), “Multivariate decision trees”, *Machine Learning* **19**, 45–76.
- Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977), “Maximum likelihood from incomplete data via the EM algorithm”, *Journal of the Royal Statistical Society B* **39**, 1–38.
- Duda, R. & Hart, P. (1973), *Pattern Classification and Scene Analysis*, Wiley.
- Fayyad, U. M. & Irani, K. B. (1992), The attribute selection problem in decision tree generation, *in* “AAAI-92: Proceedings of the Tenth National Conference on Artificial Intelligence”, AAAI Press/MIT Press, pp. 104–110.
- Friedman, J. H. (1977), “A recursive partitioning decision rule for nonparametric classification”, *IEEE Transactions on Computers*, pp. 404–408.
- Guyon, I., Boser, B. & Vapnik, V. (1993), Automatic capacity tuning of very large VC-dimension classifiers, *in* S. J. Hanson, J. Cowan & C. L. Giles, eds, “Advances in Neural Information Processing Systems”, Vol. 5, Morgan Kaufmann, pp. 147–154.
- Hastie, T. J. & Tibshirani, R. J. (1990), *Generalized Additive Models*, Chapman and Hall.
- Heath, D., Kasif, S. & Salzberg, S. (1993), Induction of oblique decision trees, *in* R. Bajcsy, ed., “Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence”, Morgan Kaufmann.
- Henrichon, Jr., E. G. & Fu, K.-S. (1969), “A nonparametric partitioning procedure for pattern classification”, *IEEE Transactions on Computers* **C-18**(7), 614–624.
- Huber, P. J. (1977), *Robust Statistical Procedures*, Society for Industrial and Applied Mathematics, Philadelphia, PA.
- John, G. H. (1994), Finding multivariate splits in decision trees using function optimization, *in* “AAAI-94: Proceedings of the Twelfth National Conference on Artificial Intelligence”, AAAI Press/MIT Press, p. 1463.
- John, G. H. (1995), Robust decision trees: Removing outliers from databases, *in* “First International Conference on Knowledge Discovery and Data Mining (KDD-95)”, AAAI Press, Menlo Park, CA, pp. 174–179.
- Jordan, M. I. & Jacobs, R. A. (1993), Supervised learning and divide-and-conquer: A statistical approach, *in* P. Utgoff, ed., “Proceedings of the Tenth International Conference on Machine Learning”, Morgan Kaufmann.
- Koutsougeras, C. & Papachristou, C. A. (1988), Training of a neural network for pattern classification based on an entropy measure, *in* “IEEE International Conference on Neural Networks”, IEEE Press, pp. 247–254.
- Lin, Y. K. & Fu, K. S. (1983), “Automatic classification of cervical cells using a binary tree classifier”, *Pattern Recognition* **16**(1), 69–80.
- Loh, W.-Y. & Vanichsetakul, N. (1988), “Tree-structured classification via generalized discriminant analysis”, *Journal of the American Statistical Association* **83**(403), 715–725.
- Michie, D., Spiegelhalter, D. J. & Taylor, C. C. (1994), *Machine Learning, Neural and Statistical Classification*, Prentice Hall.
- Morgan, J. N. & Messenger, R. C. (1973), *THAID: A Sequential Analysis Program for the Analysis of Nominal Scale Dependent Variables*, University of Michigan.
- Murphy, P. M. & Aha, D. W. (1994), “UCI repository of machine learning databases”. Available by anonymous ftp to ics.uci.edu in the pub/machine-learning-databases directory.
- Murthy, S. K., Salzberg, S. & Kasif, S. (1993), “OC1”. Available by anonymous ftp from blaze.cs.jhu.edu:pub/oc1.
- Murthy, S., Kasif, S. & Salzberg, S. (1994), “A system for induction of oblique decision trees”, *Journal of Artificial Intelligence Research* **2**, 1–32.
- Qing-Yun, S. & Fu, K. S. (1983), “A method for the design of binary-tree classifiers”, *Pattern Recognition* **16**(6), 593–603.
- Quinlan, J. R. (1993), *C4.5: Programs for Machine Learning*, Morgan Kaufmann.
- Sahami, M. (1993), Learning non-linearly separable boolean functions with linear threshold unit trees and Madaline-style networks, *in* “AAAI-93: Proceedings of the Eleventh National Conference on Artificial Intelligence”, AAAI Press/MIT Press, pp. 335–341.
- Sankar, A. & Mammone, R. J. (1991), Optimal pruning of neural tree networks for improved generalization, *in* “IJCNN-91-SEATTLE: International Joint Conference on Neural Networks”, IEEE Press, Seattle, WA, pp. II:219–224.
- Sethi, I. K. (1990), “Entropy nets: from decision trees to neural networks”, *Proceedings of the IEEE* **78**(10), 1605–1613.

## Copyright information

© Springer-Verlag New York, Inc. 1996