Data Mining and Knowledge Discovery, Volume 2, Issue 4, pp 345–389

Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey

  • Sreerama K. Murthy


Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art.

Keywords: classification, tree-structured classifiers, data compaction
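The greedy, top-down construction strategy that most of the surveyed families (CART, ID3 and their relatives) share can be sketched in a few lines. The toy weather-style data below is illustrative only, and the entropy-based information gain shown is just one of the many splitting criteria the survey compares:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in label entropy from partitioning the rows on `attr`."""
    n = len(labels)
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def best_split(rows, labels):
    """One greedy step: pick the attribute with the highest information gain."""
    return max(rows[0], key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data: 'outlook' separates the classes perfectly, 'windy' not at all.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain",  "windy": "no"},
    {"outlook": "rain",  "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
print(best_split(rows, labels))  # -> outlook
```

A full induction algorithm applies `best_split` recursively to each partition until the leaves are pure (or a stopping or pruning rule fires); the surveyed methods differ mainly in the splitting criterion, the form of split allowed, and the pruning strategy.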




  1. AAAI-92: Proc. of the Tenth National Conf. on Artificial Intelligence, San Jose, CA, 12-16th July 1992. AAAI Press / The MIT Press.
  2. AAAI-93: Proc. of the Eleventh National Conf. on Artificial Intelligence, Washington, DC, 11-15th July 1993. AAAI Press / The MIT Press.
  3. AAAI-94: Proc. of the Twelfth National Conf. on Artificial Intelligence, volume 1, Seattle, WA, 31st July - 4th August 1994. AAAI Press / The MIT Press.
  4. Aczel, J. and J. Daroczy. On Measures of Information and Their Characterizations. Academic Pub., New York, 1975.
  5. Aha, David W. and Richard L. Bankert. A comparative evaluation of sequential feature selection algorithms. In (AI&Stats-95), pages 1-7.
  6. AI&Stats-93: Preliminary Papers of the Fourth Int. Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, 3rd-6th January 1993. Society for AI and Statistics.
  7. AI&Stats-95: Preliminary Papers of the Fifth Int. Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, 4-7th January 1995. Society for AI and Statistics.
  8. Aldrich, C., D. W. Moolman, F. S. Gouws, and G. P. J. Schmitz. Machine learning strategies for control of flotation plants. Control Eng. Practice, 5(2):263-269, February 1997.
  9. Ali, Kamal M. and Michael J. Pazzani. On the link between error correlation and error reduction in decision tree ensembles. Technical Report ICS-TR-95-38, University of California, Irvine, Department of Information and Computer Science, September 1995.
  10. Almuallim, Hussein and Thomas G. Dietterich. Learning boolean concepts in the presence of many irrelevant features. Artificial Intelligence, 69:279-305, 1994.
  11. Argentiero, Peter, Roland Chin, and Paul Beaudet. An automated approach to the design of decision tree classifiers. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-4(1):51-57, January 1982.
  12. Atlas, Les, Ronald Cole, Yeshwant Muthuswamy, Alan Lipman, Jerome Connor, Dong Park, Muhammed El-Sharkawi, and Robert J. Marks II. A performance comparison of trained multilayer perceptrons and trained classification trees. Proc. of the IEEE, 78(10):1614-1619, 1990.
  13. Aytug, Haldun, Siddhartha Bhattacharya, Gary J. Koehler, and Jane L. Snowdon. A review of machine learning in scheduling. IEEE Trans. on Eng. Management, 41(2):165-171, May 1994.
  14. Bahl, L., P. F. Brown, P. V. de Souza, and R. L. Mercer. A tree-based statistical language model for natural language speech recognition. IEEE Trans. on Acoustics, Speech and Signal Processing, 37(7):1001-1008, 1989.
  15. Baker, E. and A. K. Jain. On feature ordering in practice and some finite sample effects. In Proc. of the Third Int. Joint Conf. on Pattern Recognition, pages 45-49, San Diego, CA, 1976.
  16. Baker, F. A., David L. Verbyla, C. S. Hodges Jr., and E. W. Ross. Classification and regression tree analysis for assessing hazard of pine mortality caused by Heterobasidion annosum. Plant Disease, 77(2):136, February 1993.
  17. Belson, W. A. Matching and prediction on the principle of biological classification. Applied Statistics, 8:65-75, 1959.
  18. Ben-Bassat, Moshe. Myopic policies in sequential classification. IEEE Trans. on Computers, 27(2):170-174, February 1978.
  19. Ben-Bassat, Moshe. Use of distance measures, information measures and error bounds on feature evaluation. In Krishnaiah and Kanal (Krishnaiah and Kanal, 1987), pages 773-791.
  20. Bennett, K. P. and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1:23-34, 1992.
  21. Bennett, K. P. and O. L. Mangasarian. Multicategory discrimination via linear programming. Optimization Methods and Software, 3:29-39, 1994.
  22. Bennett, Kristin P. Decision tree construction via linear programming. In Proc. of the 4th Midwest Artificial Intelligence and Cognitive Science Society Conf., pages 97-101, 1992.
  23. Bennett, Kristin P. Global tree optimization: A non-greedy decision tree algorithm. In Proc. of Interface 94: The 26th Symposium on the Interface, Research Triangle Park, North Carolina, 1994.
  24. Blum, A. and R. Rivest. Training a 3-node neural network is NP-complete. In Proc. of the 1988 Workshop on Computational Learning Theory, pages 9-18, Boston, MA, 1988. Morgan Kaufmann.
  25. Bohanec, Marko and Ivan Bratko. Trading accuracy for simplicity in decision trees. Machine Learning, 15:223-250, 1994.
  26. Bowser-Chao, David and Debra L. Dzialo. Comparison of the use of binary decision trees and neural networks in top quark detection. Physical Review D: Particles and Fields, 47(5):1900, March 1993.
  27. Boyce, D., A. Farhi, and R. Weishedel. Optimal Subset Selection. Springer-Verlag, 1974.
  28. Bramanti-Gregor, Anna and Henry W. Davis. The statistical learning of accurate heuristics. In (IJCAI-93), pages 1079-1085.
  29. Brandman, Y., A. Orlitsky, and J. Hennessy. A spectral lower bound technique for the size of decision trees and two-level AND/OR circuits. IEEE Trans. on Comp., 39(2):282-286, February 1990.
  30. Breiman, Leo. Bagging predictors. Technical report, Department of Statistics, Univ. of California, Berkeley, CA, 1994.
  31. Breiman, Leo, Jerome Friedman, Richard Olshen, and Charles Stone. Classification and Regression Trees. Wadsworth Int. Group, 1984.
  32. Brent, Richard P. Fast training algorithms for multilayer neural nets. IEEE Trans. on Neural Networks, 2(3):346-354, May 1991.
  33. Breslow, Leonard A. and David W. Aha. Simplifying decision trees: A survey. Technical Report AIC-96-014, Navy Center for Applied Research in Artificial Intelligence, Naval Research Lab., Washington, DC, 1996.
  34. Brodley, Carla E. Recursive Automatic Algorithm Selection for Inductive Learning. PhD thesis, Univ. of Massachusetts, Amherst, MA, 1994.
  35. Brodley, Carla E. and Paul E. Utgoff. Multivariate decision trees. Machine Learning, 19:45-77, 1995.
  36. Brown, Donald E., Vincent Corruble, and Clarence Louis Pittard. A comparison of decision tree classifiers with backpropagation neural networks for multimodal classification problems. Pattern Recognition, 26(6):953-961, 1993.
  37. Brown, Donald E. and Clarence Louis Pittard. Classification trees with optimal multivariate splits. In Proc. of the Int. Conf. on Systems, Man and Cybernetics, volume 3, pages 475-477, Le Touquet, France, 17-20th October 1993. IEEE, New York.
  38. Bucy, R. S. and R. S. Diesposti. Decision tree design by simulated annealing. Mathematical Modelling and Numerical Analysis, 27(5):515-534, 1993. A RAIRO journal.
  39. Buluswer, Shashi D. and Bruce A. Draper. Non-parametric classification of pixels under varying illumination. SPIE: The Int. Society for Optical Eng., 2353:529-536, November 1994.
  40. Buntine, W. and T. Niblett. A further comparison of splitting rules for decision-tree induction. Machine Learning, 8:75-85, 1992.
  41. Buntine, W. L. Decision tree induction systems: a Bayesian analysis. In L. N. Kanal, T. S. Levitt, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence 3. Elsevier Science Publishers, Amsterdam, 1989.
  42. Buntine, Wray. A Theory of Learning Classification Rules. PhD thesis, Univ. of Technology, Sydney, Australia, 1991.
  43. Buntine, Wray. Learning classification trees. Statistics and Computing, 2:63-73, 1992.
  44. Buntine, Wray. A guide to the literature on learning probabilistic networks from data. IEEE Trans. on Knowledge and Data Engineering, 1996.
  45. Callahan, Janice D. and Stephen W. Sorensen. Rule induction for group decisions with statistical data - an example. J. of the Operational Research Society, 42(3):227-234, March 1991.
  46. Caruana, Rich and Dayne Freitag. Greedy attribute selection. In (ML-94), pages 28-36.
  47. Casey, Richard G. and George Nagy. Decision tree design using a probabilistic model. IEEE Trans. on Information Theory, IT-30(1):93-99, January 1984.
  48. Catlett, Jason. Megainduction. PhD thesis, Basser Department of Computer Science, Univ. of Sydney, Australia, 1991.
  49. Catlett, Jason. Tailoring rulesets to misclassification costs. In (AI&Stats-95), pages 88-94.
  50. Chai, Bing-Bing, Xinhua Zhuang, Yunxin Zhao, and Jack Sklansky. Binary linear decision tree with genetic algorithm. In Proc. of the 13th Int. Conf. on Pattern Recognition, volume 4. IEEE Computer Society Press, Los Alamitos, CA, 1996.
  51. Chandrasekaran, B. From numbers to symbols to knowledge structures: Pattern Recognition and Artificial Intelligence perspectives on the classification task. Volume 2, pages 547-559. Elsevier Science, Amsterdam, The Netherlands, 1986.
  52. Chandrasekaran, B. and A. K. Jain. Quantization complexity and independent measurements. IEEE Trans. on Comp., C-23(1):102-106, January 1974.
  53. Chaudhuri, P., W. D. Lo, W. Y. Loh, and C. C. Yang. Generalized regression trees. Statistica Sinica, 5(2):641-666, 1995.
  54. Chou, Philip A. Applications of Information Theory to Pattern Recognition and the Design of Decision Trees and Trellises. PhD thesis, Stanford Univ., 1988.
  55. Chou, Philip A. Optimal partitioning for classification and regression trees. IEEE Trans. on Pattern Analysis and Machine Intelligence, 13(4):340-354, April 1991.
  56. Chou, Philip A. and Robert M. Gray. On decision trees for pattern recognition. In Proc. of the IEEE Symposium on Information Theory, page 69, Ann Arbor, MI, 1986.
  57. Cios, Krzysztof J. and Ning Liu. A machine learning method for generation of a neural network architecture: A continuous ID3 algorithm. IEEE Trans. on Neural Networks, 3(2):280-291, March 1992.
  58. Cloete, I. and H. Theron. CID3: An extension of ID3 for attributes with ordered domains. South African Computer J., 4:10-16, March 1991.
  59. Cockett, J. R. B. and J. A. Herrera. Decision tree reduction. J. of the ACM, 37(4):815-842, October 1990.
  60. Cohen, W. W. Efficient pruning methods for separate-and-conquer rule learning systems. In (IJCAI-93), pages 988-994.
  61. Comer, Douglas and Ravi Sethi. The complexity of trie index construction. J. of the ACM, 24(3):428-440, July 1977.
  62. Cover, T. M. and J. M. Van Campenhout. On the possible orderings in the measurement selection problem. IEEE Trans. on Systems, Man and Cybernetics, SMC-7(9), 1977.
  63. Cox, Louis Anthony. Using causal knowledge to learn more useful decision rules from data. In (AI&Stats-95), pages 151-160.
  64. Cox, Louis Anthony and Yuping Qiu. Minimizing the expected costs of classifying patterns by sequential costly inspections. In (AI&Stats-93).
  65. Cox, Louis Anthony, Yuping Qiu, and Warren Kuehner. Heuristic least-cost computation of discrete classification functions with uncertain argument values. Annals of Operations Research, 21(1):1-30, 1989.
  66. Craven, Mark W. Extracting comprehensible models from trained neural networks. Technical Report CS-TR-96-1326, University of Wisconsin, Madison, September 1996.
  67. Crawford, Stuart L. Extensions to the CART algorithm. Int. J. of Man-Machine Studies, 31(2):197-217, August 1989.
  68. Curram, Stephen P. and John Mingers. Neural networks, decision tree induction and discriminant analysis: An empirical comparison. J. of the Operational Research Society, 45(4):440-450, April 1994.
  69. Dago, K. T., R. Luthringer, R. Lengelle, G. Rinaudo, and J. P. Matcher. Statistical decision tree: A tool for studying pharmaco-EEG effects of CNS-active drugs. Neuropsychobiology, 29(2):91-96, 1994.
  70. d'Alché-Buc, Florence, Didier Zwierski, and Jean-Pierre Nadal. Trio learning: A new strategy for building hybrid neural trees. Int. J. of Neural Systems, 5(4):259-274, December 1994.
  71. Das, S. K. and S. Bhambri. A decision tree approach for selecting between demand based, reorder and JIT/kanban methods for material procurement. Production Planning and Control, 5(4):342, 1994.
  72. Dasarathy, Belur V., editor. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos, CA, 1991.
  73. Dasarathy, Belur V. Minimal consistent set (MCS) identification for optimal nearest neighbor systems design. IEEE Trans. on Systems, Man and Cybernetics, 24(3):511-517, 1994.
  74. Dattatreya, G. R. and Laveen N. Kanal. Decision trees in pattern recognition. In Kanal and Rosenfeld, editors, Progress in Pattern Recognition, volume 2, pages 189-239. Elsevier Science, 1985.
  75. Dattatreya, G. R. and V. V. S. Sarma. Bayesian and decision tree approaches to pattern recognition including feature measurement costs. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-3(3):293-298, 1981.
  76. Dietterich, Thomas G., Hermann Hild, and Ghulum Bakiri. A comparison of ID3 and backpropagation for English text-to-speech mapping. Machine Learning, 18:51-80, 1995.
  77. Dietterich, Thomas G. and Eun Bae Kong. Machine learning bias, statistical bias and statistical variance of decision tree algorithms. In (ML-95).
  78. Dietterich, Thomas G. and Ryszard S. Michalski. A comparative view of selected methods for learning from examples. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, volume 1, pages 41-81. Morgan Kaufmann, San Mateo, CA, 1983.
  79. Doak, Justin. An evaluation of search algorithms for feature selection. Technical report, Graduate Group in Computer Science, Univ. of California at Davis; and Safeguards Systems Group, Los Alamos National Lab., January 1994.
  80. Dowe, D. L. and N. Krusel. Decision tree models of bushfire activity. AI Applications, 8(3):71-72, 1994.
  81. Draper, B. A., Carla E. Brodley, and Paul E. Utgoff. Goal-directed classification using linear machine decision trees. IEEE Trans. on Pattern Analysis and Machine Intelligence, 16(9):888, 1994.
  82. Draper, N. R. and H. Smith. Applied Regression Analysis. Wiley, New York, 1966. 2nd edition in 1981.
  83. Duda, R. and P. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.
  84. Eades and Staples. On optimal trees. Journal of Algorithms, 2(4):369-384, 1981.
  85. Efron, Bradley. Estimating the error rate of a prediction rule: improvements on cross-validation. J. of the American Statistical Association, 78(382):316-331, June 1983.
  86. Elder IV, John F. Heuristic search for model structure. In (AI&Stats-95), pages 199-210.
  87. Elomaa, Tapio. In defence of C4.5: Notes on learning one-level decision trees. In (ML-94), pages 62-69.
  88. Ercil, A. Classification trees prove useful in nondestructive testing of spotweld quality. Welding J., 72(9):59, September 1993. Issue title: Special emphasis: Rebuilding America's roads, railways and bridges.
  89. Esposito, Floriana, Donato Malerba, and Giovanni Semeraro. A further study of pruning methods in decision tree induction. In (AI&Stats-95), pages 211-218.
  90. Evans, Bob and Doug Fisher. Overcoming process delays with decision tree induction. IEEE Expert, pages 60-66, February 1994.
  91. Everitt, Brian. Cluster Analysis, 3rd edition. E. Arnold Press, London, 1993.
  92. Falconer, Judith A., Bruce J. Naughton, Dorothy D. Dunlop, Elliot J. Roth, and Dale C. Strasser. Predicting stroke inpatient rehabilitation outcome using a classification tree approach. Archives of Physical Medicine and Rehabilitation, 75(6):619, June 1994.
  93. Famili, A. Use of decision tree induction for process optimization and knowledge refinement of an industrial process. Artificial Intelligence for Eng. Design, Analysis and Manufacturing (AI EDAM), 8(1):63-75, Winter 1994.
  94. Fano, R. M. Transmission of Information. MIT Press, Cambridge, MA, 1961.
  95. Fayyad, Usama M. and Keki B. Irani. What should be minimized in a decision tree? In AAAI-90: Proc. of the National Conf. on Artificial Intelligence, volume 2, pages 749-754. AAAI, 1990.
  96. Fayyad, Usama M. and Keki B. Irani. (1992a). The attribute specification problem in decision tree generation. In (AAAI-92), pages 104-110.
  97. Fayyad, Usama M. and Keki B. Irani. (1992b). On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8(2):87-102, 1992.
  98. Fayyad, Usama M. and Keki B. Irani. Multi-interval discretization of continuous valued attributes for classification learning. In (IJCAI-93), pages 1022-1027.
  99. Feigenbaum, Edward A. Expert systems in the 1980s. In A. Bond, editor, State of the Art in Machine Intelligence. Pergamon-Infotech, Maidenhead, 1981.
  100. Feng, C., A. Sutherland, R. King, S. Muggleton, and R. Henery. Comparison of machine learning classifiers to statistics and neural networks. In (AI&Stats-93), pages 41-52.
  101. Fielding, A. Binary segmentation: the automatic interaction detector and related techniques for exploring data structure. In (O'Muircheartaigh and Payne, 1977), pages 221-257.
  102. File, P. E., P. I. Dugard, and A. S. Houston. Evaluation of the use of induction in the development of a medical expert system. Comp. and Biomedical Research, 27(5):383-395, October 1994.
  103. Fisher, Douglas. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2:130-172, 1987.
  104. Fisher, Douglas and Kathleen McKusick. An empirical comparison of ID3 and back propagation. In (IJCAI-89).
  105. Fletcher, R. and M. J. D. Powell. A rapidly convergent descent method for minimization. Computer J., 6(2):163-168, 1963.
  106. Foley, D. H. Considerations of sample and feature size. IEEE Trans. on Information Theory, IT-18:618-626, 1972.
  107. Forouraghi, F., L. W. Schmerr, and G. M. Prabhu. Induction of multivariate regression trees for design optimization. In (AAAI-94), pages 607-612.
  108. Foroutan, Iman. Feature Selection for Piecewise Linear Classifiers. PhD thesis, Univ. of California, Irvine, CA, 1985.
  109. Foroutan, Iman and Jack Sklansky. Feature selection for automatic classification of non-Gaussian data. IEEE Trans. on Systems, Man and Cybernetics, 17(2):187-198, March/April 1987.
  110. Forsyth, Richard S., David D. Clarke, and Richard L. Wright. Overfitting revisited: an information-theoretic approach to simplifying discrimination trees. J. of Experimental and Theoretical Artificial Intelligence, 6(3):289-302, July-September 1994.
  111. Friedman, Jerome H. A recursive partitioning decision rule for nonparametric classifiers. IEEE Trans. on Comp., C-26:404-408, April 1977.
  112. Fukunaga, Keinosuke and R. A. Hayes. Effect of sample size in classifier design. IEEE Trans. on Pattern Analysis and Machine Intelligence, 11:873-885, 1989.
  113. Fulton, Truxton K., Simon Kasif, and Steven Salzberg. An efficient algorithm for finding multi-way splits for decision trees. In (ML-95).
  114. Furnkranz, J., J. Petrak, and R. Trappl. Knowledge discovery in international conflict databases. Applied Artificial Intelligence, 11:91-118, 1997.
  115. Garey, Michael R. and Ronald L. Graham. Performance bounds on the splitting algorithm for binary testing. Acta Informatica, 3(Fasc. 4):347-355, 1974.
  116. Gelfand, S. B. and C. S. Ravishankar. A tree-structured piecewise-linear adaptive filter. IEEE Trans. on Information Theory, 39(6):1907-1922, November 1993.
  117. Gelfand, Saul B., C. S. Ravishankar, and Edward J. Delp. An iterative growing and pruning algorithm for classification tree design. IEEE Trans. on Pattern Analysis and Machine Intelligence, 13(2):163-174, February 1991.
  118. Gelsema, Edzard S. and Laveen S. Kanal, editors. Pattern Recognition in Practice IV: Multiple Paradigms, Comparative Studies and Hybrid Systems, volume 16 of Machine Intelligence and Pattern Recognition. Elsevier, 1994. Series editors: Kanal, L. S. and Rosenfeld, A.
  119. Gennari, G. H., Pat Langley, and Douglas Fisher. Models of incremental concept formation. Artificial Intelligence, 40(1-3):11-62, September 1989.
  120. Gersho, Allen and Robert M. Gray. Vector Quantization and Signal Compression. Kluwer Academic Pub., 1991.
  121. Gibb, W. J., D. M. Auslander, and J. C. Griffin. Selection of myocardial electrogram features for use by implantable devices. IEEE Trans. on Biomedical Eng., 40(8):727-735, August 1993.
  122. Gillo, M. W. MAID: A Honeywell 600 program for an automatised survey analysis. Behavioral Science, 17:251-252, 1972.
  123. Gilpin, Elizabeth A., Richard A. Olshen, Kanu Chatterjee, John Kjekshus, Arthur J. Moss, Hartmut Henning, Robert Engler, A. Robert Blacky, Howard Dittrich, and John Ross Jr. Predicting 1-year outcome following acute myocardial infarction. Comp. and Biomedical Research, 23(1):46-63, February 1990.
  124. Gleser, Malcolm A. and Morris F. Collen. Towards automated medical decisions. Comp. and Biomedical Research, 5(2):180-189, April 1972.
  125. Golea, M. and M. Marchand. A growth algorithm for neural network decision trees. EuroPhysics Letters, 12(3):205-210, June 1990.
  126. Goodman, Rodney M. and Padhraic J. Smyth. Decision tree design from a communication theory standpoint. IEEE Trans. on Information Theory, 34(5):979-994, September 1988.
  127. Goodman, Rodney M. and Padhraic J. Smyth. Decision tree design using information theory. Knowledge Acquisition, 2:1-19, 1990.
  128. Goodrich, Michael T., Vincent Mirelli, Mark Orletsky, and Jeffery Salowe. Decision tree construction in fixed dimensions: Being global is hard but local greed is good. Technical Report TR-95-1, Johns Hopkins Univ., Department of Computer Science, Baltimore, MD, May 1995.
  129. Gordon, L. and R. A. Olshen. Asymptotically efficient solutions to the classification problem. Annals of Statistics, 6(3):515-533, 1978.
  130. Gray, N. A. B. Capturing knowledge through top-down induction of decision trees. IEEE Expert, 5(3):41-50, June 1990.
  131. Grewe, L. and A. C. Kak. Interactive learning of a multi-attribute hash table classifier for fast object recognition. Computer Vision and Image Understanding, 61(3):387-416, May 1995.
  132. Guo, Heng and Saul B. Gelfand. Classification trees with neural network feature extraction. IEEE Trans. on Neural Networks, 3(6):923-933, November 1992.
  133. Guo, Y. and K. J. Dooley. Distinguishing between mean, variance and autocorrelation changes in statistical quality control. Int. J. of Production Research, 33(2):497-510, February 1995.
  134. Guur-Ali, Ouzden and William A. Wallace. Induction of rules subject to a quality constraint: Probabilistic inductive learning. IEEE Trans. on Knowledge and Data Eng., 5(6):979-984, December 1993. Special issue on learning and discovery in knowledge-based databases.
  135. Hampson, S. E. and D. J. Volper. Linear function neurons: Structure and training. Biological Cybernetics, 53(4):203-217, 1986.
  136. Hand, D. J. Discrimination and Classification. Wiley, Chichester, UK, 1981.
  137. Hanisch, W. Design and optimization of a hierarchical classifier. J. of New Generation Computer Systems, 3(2):159-173, 1990.
  138. Hansen, L. K. and P. Salomon. Neural network ensembles. IEEE Trans. on Pattern Analysis and Machine Intelligence, 12(10):993-1001, 1990.
  139. Hart, A. Experience in the use of an inductive system in knowledge engineering. In M. Bramer, editor, Research and Development in Expert Systems. Cambridge Univ. Press, Cambridge, 1984.
  140. Hartmann, Carlos R. P., Pramod K. Varshney, Kishan G. Mehrotra, and Carl L. Gerberich. Application of information theory to the construction of efficient decision trees. IEEE Trans. on Information Theory, IT-28(4):565-577, July 1982.
  141. Haskell, R. E. and A. Noui-Mehidi. Design of hierarchical classifiers. In N. A. Sherwani, E. de Doncker, and J. A. Kapenga, editors, Computing in the 90's: The First Great Lakes Computer Science Conf. Proc., pages 118-124, Berlin, 1991. Springer-Verlag. Conf. held in Kalamazoo, MI, 18th-20th October 1989.
  142. Hatziargyriou, N. D., G. C. Contaxis, and N. C. Sideris. A decision tree method for on-line steady state security assessment. IEEE Trans. on Power Systems, 9(2):1052, 1994.
  143. Heath, D. A Geometric Framework for Machine Learning. PhD thesis, Johns Hopkins Univ., Baltimore, MD, 1992.
  144. Heath, D., S. Kasif, and S. Salzberg. (1993a). k-DT: A multi-tree learning method. In Proc. of the Second Int. Workshop on Multistrategy Learning, pages 138-149, Harpers Ferry, WV, 1993. George Mason Univ.
  145. Heath, D., S. Kasif, and S. Salzberg. (1993b). Learning oblique decision trees. In (IJCAI-93), pages 1002-1007.
  146. Helmbold, D. P. and R. E. Schapire. Predicting nearly as well as the best pruning of a decision tree. Machine Learning, pages 51-68, 1997. Earlier version in COLT '95.
  147. Henrichon Jr., Ernest G. and King-Sun Fu. A nonparametric partitioning procedure for pattern classification. IEEE Trans. on Comp., C-18(7):614-624, July 1969.
  148. Herman, Gabor T. and K. T. Daniel Yeung. On piecewise-linear classification. IEEE Trans. on Pattern Analysis and Machine Intelligence, 14(7):782-786, July 1992.
  149. Hoeffgen, Klaus-U., Hans-U. Simon, and Kevin S. Van Horn. Robust trainability of single neurons. J. of Computer and System Sciences, 50(1):114-125, 1995.
  150. Holte, R. Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11(1):63-90, 1993.
  151. Hughes, G. E. On the mean accuracy of statistical pattern recognition. IEEE Trans. on Information Theory, IT-14(1):55-63, January 1968.
  152. Hunt, K. J. Classification by induction: Applications to modelling and control of non-linear dynamic systems. Intelligent Systems Eng., 2(4):231-245, Winter 1993.
  153. Hyafil, Laurent and Ronald L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15-17, 1976.
  154. Ibaraki, Toshihide and Saburo Muroga. Adaptive linear classifiers by linear programming. Technical Report 284, Department of Computer Science, Univ. of Illinois, Urbana-Champaign, 1968.
  155. Ichino, M. and Jack Sklansky. Optimum feature selection by zero-one integer programming. IEEE Trans. on Systems, Man and Cybernetics, SMC-14:737-746, September/October 1984.
  156. Iikura, Y. and Y. Yasuoka. Utilization of a best linear discriminant function for designing the binary decision tree. Int. Journal of Remote Sensing, 12(1):55-67, January 1991.
  157. IJCAI-89: Proc. of the Eleventh Int. Joint Conf. on Artificial Intelligence. Morgan Kaufmann, San Mateo, CA, 1989. Editor: N. S. Sridharan.
  158. IJCAI-93: Proc. of the Thirteenth Int. Joint Conf. on Artificial Intelligence, volume 2, Chambery, France, 28th August - 3rd September 1993. Morgan Kaufmann, San Mateo, CA. Editor: Ruzena Bajcsy.
  159. IJCAI-95: Proc. of the Fourteenth Int. Joint Conf. on Artificial Intelligence, Montreal, Canada, 16th-21st August 1995. Morgan Kaufmann, San Mateo, CA. Editor: Chris Mellish.
  160. Imam, I. F. and Ryszard S. Michalski. Should decision trees be learned from examples or from decision rules? In Methodologies for Intelligent Systems: 7th Int. Symposium, ISMIS'93, volume 689 of LNCS, pages 395-404. Springer-Verlag, Trondheim, Norway, June 1993.
  161. Irani, Keki B., Cheng Jie, Usama M. Fayyad, and Qian Zhaogang. Applying machine learning to semiconductor manufacturing. IEEE Expert, 8(1):41-47, February 1993.
  162. Israel, P. and C. Koutsougeras. A hybrid electro-optical architecture for classification trees and associative memory mechanisms. Int. J. on Artificial Intelligence Tools (Architectures, Languages, Algorithms), 2(3):373-393, September 1993.
  163. Ittner, Andreas and Michael Schlosser. Non-linear decision trees - NDT. In Proc. of the Int. Conf. on Machine Learning, 1996.
  164. Jain, A. K. and B. Chandrasekaran. Dimensionality and sample size considerations in pattern recognition. In Krishnaiah and Kanal (Krishnaiah and Kanal, 1987), pages 835-855.
  165. John, George H. Robust linear discriminant trees. In (AI&Stats-95), pages 285-291.
  166. John, George H., Ron Kohavi, and Karl Pfleger. Irrelevant features and the subset selection problem. In (ML-94), pages 121-129.
  167. Jordan, Michael I. A statistical approach to decision tree modeling. In Proc. of the Seventh Annual ACM Conf. on Computational Learning Theory, pages 13-20, New Brunswick, NJ, 1994. ACM Press.
  168. Jordan, Michael I. and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6:181-214, 1994.
  169. Judmaier, J., P. Meyersbach, G. Weiss, H. Wachter, and G. Reibnegger. The role of Neopterin in assessing disease activity in Crohn's disease: Classification and regression trees. The American J. of Gastroenterology, 88(5):706, May 1993.
  170. Kalkanis, G. The application of confidence interval error analysis to the design of decision tree classifiers. Pattern Recognition Letters, 14(5):355-361, May 1993.
  171. Kanal, Laveen N. Patterns in pattern recognition: 1968-1974. IEEE Trans. on Information Theory, 20:697-722, 1974.
  172. Kanal, Laveen N. Problem solving methods and search strategies for pattern recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-1:193-201, 1979.
  173. Kanal, Laveen N. and B. Chandrasekaran. On dimensionality and sample size in statistical pattern classification. Pattern Recognition, 3:225-234, 1971.
  174. Kass, G. V. An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29(2):119-127, 1980.
  175. Kearns, Michael. Boosting theory towards practice: Recent developments in decision tree induction and the weak learning framework. In Proc. of the Thirteenth National Conf. on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conf., pages 1337-1339, Menlo Park, 1996. AAAI Press / The MIT Press.
  176. Kearns, Michael and Yishay Mansour. On the boosting ability of top-down decision tree learning algorithms. In Proc. of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, pages 459-468, Philadelphia, PA, 1996.
  177. Kennedy, Davis M. Decision tree bears fruit. Products Finishing, 57(10):66, July 1993.
  178. Kennefick, J. D., R. R. Carvalho, S. G. Djorgovski, M. M. Wilber, E. S. Dickson, N. Weir, U. Fayyad, and J. Roden. The discovery of five quasars at z > 4 using the second Palomar Sky Survey. The Astronomical J., 110(1):78, 1995.
  179. Kerber, Randy. Chimerge: Discretization of numeric attributes. In (AAAI-92), pages 123-128.Google Scholar
  180. Kim, Byungyong and David Landgrebe. Hierarchical decision tree classifiers in high-dimensional and large class data. IEEE Trans. on Geoscience and Remote Sensing, 29(4):518-528, July 1991.CrossRefGoogle Scholar
  181. Kim, Hyunsoo and G. J. Koehler. An investigation on the conditions of pruning an induced decision tree. European J. of Operational Research, 77(1):82, August 1994.CrossRefGoogle Scholar
  182. Kim, Sung-Ho. A general property among nested, pruned subtrees of a decision support tree. Communications in Statistics-Theory and Methods, 23(4):1227-1238, April 1994.Google Scholar
  183. Kira, Kenji and Larry A. Rendell. The feature selection problem: Traditional methods and a new algorithm. In (AAAI-92), pages 129-134.Google Scholar
  184. Kodratoff, Y. and M. Manago. Generalization and noise. Int. J. of Man-Machine Studies, 27:181-204, 1987.Google Scholar
  185. Kodratoff, Y. and S. Moscatelli. Machine learning for object recognition and scene analysis. Internationa J. of Pattern recognition and AI, 8(1):259-304, 1994.Google Scholar
  186. Kohavi, Ron. Bottom-up induction of oblivious, read-once decision graphs: Strengths and limitations. In (AAAI-94).Google Scholar
  187. Kohavi, Ron. (1995a). The power of decision tables. In The European Conference on Machine Learning, 1995.Google Scholar
  188. Kohavi, Ron. (1995b). A study of cross-validation and bootstrap for accuracy estimation and model selection. In (IJCAI-95), pages 1137-1143. Editor: Chris Mellish.Google Scholar
  189. Kohavi, Ron. (1995c). Wrappers for performance enhancements and oblivious decision graphs. Ph.D. Thesis CS-TR-95-1560, Stanford University, Department of Computer Science, September 1995.Google Scholar
  190. Kokol, P., M. Mernik, J. Zavrsnik, and K. Kancler. Decision trees based on automatic learning and their use in cardiology. Journal of Medical Systems, 18(4):201, 1994.Google Scholar
  191. Kononenko, Igor. On biases in estimating multi-valued attributes. In (IJCAI-95), pages 1034-1040. Editor: Chris Mellish.Google Scholar
  192. Kononenko, Igor and Ivan Bratko. Information based evaluation criterion for classifier's performance. Machine Learning, 6(1):67-80, January 1991.CrossRefGoogle Scholar
  193. Kors, J. A. and J. H. Van Bemmel. Classification methods for computerized interpretation of the electrocardiogram. Methods of Information in Medicine, 29(4):330-336, September 1990.Google Scholar
  194. Kovalevsky, V. A. The problem of character recognition from the point of view of mathematical statistics. In V. A. Kovalevsky, editor, Character Readers and Pattern Recognition. Spartan, New York, 1968.Google Scholar
  195. Koza, J. R. Concept formation and decision tree induction using the genetic programming paradigm. In H. P. Schwefel and R. Männer, editors, Parallel Problem Solving from Nature-Proc. of 1st Workshop, PPSN 1, volume 496 of LNCS, pages 124-128, Dortmund, Germany, October 1991. Springer-Verlag, Berlin, Germany.Google Scholar
  196. Krishnaiah, Paruchuri Rama and Laveen N. Kanal, editors. Classification, Pattern Recognition and Reduction of Dimensionality, volume 2 of Handbook of Statistics. North-Holland Publishing Company, Amsterdam, 1987.Google Scholar
  197. Krishnamoorthy, Srinivasan and Douglas Fisher. Machine learning approaches to estimating software development effort. IEEE Trans. on Software Eng., 21(2):126-137, February 1995.CrossRefGoogle Scholar
  198. Kroger, M. Optimization of classification trees: strategy and algorithm improvement. Computer Physics Communications, 99(1):81-93, December 1996.CrossRefGoogle Scholar
  199. Kubat, M., G. Pfurtscheller, and D. Flotzinger. AI-based approach to automatic sleep classification. Biological Cybernetics, 70(5):443-448, 1994.CrossRefGoogle Scholar
  200. Kulkarni, Ashok K. On the mean accuracy of hierarchical classifiers. IEEE Trans. on Comp., C-27(8):771-776, August 1978.Google Scholar
  201. Kurtz, Michael J. Astronomical object classification. In E. S. Gelsema and Laveen N. Kanal, editors, Pattern Recognition and Artificial Intelligence, pages 317-328. Elsevier Science Pub., Amsterdam, 1988.Google Scholar
  202. Kurzynski, M.W. The optimal strategy of a tree classifier. Pattern Recognition, 16:81-87, 1983.CrossRefGoogle Scholar
  203. Kurzynski, M. W. On the multi-stage Bayes classifier. Pattern Recognition, 21(4):355-365, 1988.CrossRefGoogle Scholar
  204. Kurzynski, M. W. On the identity of optimal strategies for multi-stage classifiers. Pattern Recognition Letters, 10(1):39-46, July 1989.CrossRefGoogle Scholar
  205. Kwok, S. and Carter. C. Multiple decision trees. In R.D. Schachter, T.S. Levitt, L.N. Kanal, and J.F. Lemmer, editors, Uncertainty in Artificial Intelligence, volume 4, pages 327-335. Elsevier Science, Amsterdam, 1990.Google Scholar
  206. Lagacherie, P. and S. Holmes. Addressing geographical data errors in classification tree for soil unit prediction. Int. J. of Geographical Information Science, 11(2):183-198, March 1997.CrossRefGoogle Scholar
  207. Landeweerd, G., T. Timmers, E. Gersema, M. Bins, and M. Halic. Binary tree versus single level tree classification of white blood cells. Pattern Recognition, 16:571-577, 1983.CrossRefGoogle Scholar
  208. Langley, Pat and Stephanie Sage. Scaling to domains with irrelevant features. In Russell Greiner, Thomas Petsche, and Stephen Jose Hanson, editors, Computational Learning Theory and Natural Learning Systems, volume IV. MIT Press, 1997.
  209. Lee, Seong-Whan. Noisy Hangul character recognition with fuzzy tree classifier. Proc. of SPIE, 1661:127-136, 1992. Volume title: Machine vision applications in character recognition and industrial inspection. Conf. location: San Jose, CA, 10th-12th February 1992.
  210. Lehnert, Wendy, Stephen Soderland, David Aronow, Fangfang Feng, and Avinoam Shmueli. Inductive text classification for medical applications. J. of Experimental and Theoretical Artificial Intelligence, 7(1):49-80, January-March 1995.
  211. Lewis, P. M. The characteristic selection problem in recognition systems. IRE Trans. on Information Theory, IT-8:171-178, 1962.
  212. Li, Xiaobo and Richard C. Dubes. Tree classifier design with a permutation statistic. Pattern Recognition, 19(3):229-235, 1986.
  213. Lin, Jianhua and J. A. Storer. Design and performance of tree structured vector quantizers. Information Processing and Management, 30(6):851-862, 1994.
  214. Lin, Jianhua, J. A. Storer, and M. Cohn. Optimal pruning for tree-structured vector quantizers. Information Processing and Management, 28(6):723-733, 1992.
  215. Lin, Jyh-Han and J. S. Vitter. Nearly optimal vector quantization via linear programming. In J. A. Storer and M. Cohn, editors, DCC 92: Data Compression Conf., pages 22-31, Los Alamitos, CA, March 24th-27th 1992. IEEE Computer Society Press.
  216. Lin, Y. K. and King-Sun Fu. Automatic classification of cervical cells using a binary tree classifier. Pattern Recognition, 16(1):69-80, 1983.
  217. Liu, W. Z. and A. P. White. The importance of attribute selection measures in decision tree induction. Machine Learning, 15:25-41, 1994.
  218. Loh, Wei-Yin and Nunta Vanichsetakul. Tree-structured classification via generalized discriminant analysis. J. of the American Statistical Association, 83(403):715-728, September 1988.
  219. Long, William J., John L. Griffith, Harry P. Selker, and Ralph B. D'Agostino. A comparison of logistic regression to decision tree induction in a medical domain. Comp. and Biomedical Research, 26(1):74-97, February 1993.
  220. Loveland, D. W. Performance bounds for binary testing with arbitrary weights. Acta Informatica, 22:101-114, 1985.
  221. Lubinsky, David. Algorithmic speedups in growing classification trees by using an additive split criterion. In (AI&Stats-93), pages 435-444.
  222. Lubinsky, David. (1994a). Bivariate splits and consistent split criteria in dichotomous classification trees. PhD thesis, Department of Computer Science, Rutgers Univ., New Brunswick, NJ, 1994.
  223. Lubinsky, David. (1994b). Classification trees with bivariate splits. Applied Intelligence: The Int. J. of Artificial Intelligence, Neural Networks and Complex Problem-Solving Technologies, 4(3):283-296, July 1994.
  224. Lubinsky, David. Tree structured interpretable regression. In (AI&Stats-95), pages 331-340.
  225. Luo, Ren C., Ralph S. Scherp, and Mark Lanzo. Object identification using automated decision tree construction approach for robotics applications. J. of Robotic Systems, 4(3):423-433, June 1987.
  226. Lutsko, J. F. and B. Kuijpers. Simulated annealing in the construction of near-optimal decision trees. In (AI&Stats-93).
  227. Magerman, David M. Natural language parsing as statistical pattern recognition. Thesis CS-TR-94-1502, Stanford University, Department of Computer Science, February 1994.
  228. Mangasarian, Olvi. Mathematical programming in neural networks. ORSA J. on Computing, 5(4):349-360, Fall 1993.
  229. Mangasarian, Olvi L. Misclassification minimization, 1994. Unpublished manuscript.
  230. Mangasarian, Olvi L., R. Setiono, and W. Wolberg. Pattern recognition via linear programming: Theory and application to medical diagnosis. In SIAM Workshop on Optimization, 1990.
  231. López de Màntaras, Ramon. Technical note: A distance-based attribute selection measure for decision tree induction. Machine Learning, 6(1):81-92, 1991.
  232. Martin, J. Kent. Evaluating and comparing classifiers: complexity measures. In (AI&Stats-95), pages 372-378.
  233. Martin, J. Kent. An exact probability metric for decision tree splitting and stopping. Machine Learning, 28:257-291, 1997.
  234. Martin, J. Kent and Daniel S. Hirschberg. The time complexity of decision tree induction. Technical Report ICS-TR-95-27, University of California, Irvine, Department of Information and Computer Science, August 1995.
  235. McKenzie, Dean P. and Lee Hun Low. The construction of computerized classification systems using machine learning algorithms: An overview. Comp. in Human Behaviour, 8(2/3):155-167, 1992.
  236. McKenzie, Dean P., P. D. McGorry, C. S. Wallace, Lee Hun Low, D. L. Copolov, and B. S. Singh. Constructing a minimal diagnostic decision tree. Methods of Information in Medicine, 32(2):161-166, April 1993.
  237. McQueen, R. J., S. R. Garner, C. G. Nevill-Manning, and I. H. Witten. Applying machine learning to agricultural data. Comp. and Electronics in Agriculture, 12(4):275-293, June 1995.
  238. Megiddo, Nimrod. On the complexity of polyhedral separability. Discrete and Computational Geometry, 3:325-337, 1988.
  239. Meisel, William S. and Demetrios A. Michalopoulos. A partitioning algorithm with application in pattern classification and the optimization of decision trees. IEEE Trans. on Comp., C-22(1):93-103, January 1973.
  240. Mezrich, Joseph J. When is a tree a hedge? Financial Analysts J., pages 75-81, November-December 1994.
  241. Michie, Donald. The superarticulatory phenomenon in the context of software manufacture. Proc. of the Royal Society of London, 405A:185-212, 1986.
  242. Michie, D., D. J. Spiegelhalter, and C. C. Taylor, editors. Machine Learning, Neural and Statistical Classification. Ellis Horwood, 1994. The StatLog Project.
  243. Miller, A. J. Subset Selection in Regression. Chapman and Hall, 1990.
  244. Mingers, John. Expert systems - rule induction with statistical data. J. of the Operational Research Society, 38(1):39-47, 1987.
  245. Mingers, John. (1989a). An empirical comparison of pruning methods for decision tree induction. Machine Learning, 4(2):227-243, 1989.
  246. Mingers, John. (1989b). An empirical comparison of selection measures for decision tree induction. Machine Learning, 3:319-342, 1989.
  247. Minsky, M. and S. Papert. Perceptrons. MIT Press, Cambridge, MA, 1969.
  248. Mitchell, Tom, Rich Caruana, Dayne Freitag, John McDermott, and David Zabowski. Experience with a learning personal assistant. Communications of the ACM, July 1994.
  249. Miyakawa, Masahiro. Optimum decision trees - an optimal variable theorem and its related applications. Acta Informatica, 22(5):475-498, 1985.
  250. Miyakawa, Masahiro. Criteria for selecting a variable in the construction of efficient decision trees. IEEE Trans. on Comp., 38(1):130-141, January 1989.
  251. ML-93: Machine Learning: Proc. of the Tenth Int. Conf., Univ. of Massachusetts, Amherst, MA, 27-29th, June 1993. Morgan Kaufmann Pub. Inc. Editor: Paul E. Utgoff.
  252. ML-94: Machine Learning: Proc. of the Eleventh Int. Conf., Rutgers Univ., New Brunswick, NJ, 10-13th, July 1994. Morgan Kaufmann Pub. Inc. Editors: William W. Cohen and Haym Hirsh.
  253. ML-95: Machine Learning: Proc. of the Twelfth Int. Conf., Tahoe City, CA, 10-13th, July 1995. Morgan Kaufmann Pub. Inc., San Mateo, CA. Editor: Jeffrey Schlimmer.
  254. Mogre, Advait, Robert McLaren, James Keller, and Raghuram Krishnapuram. Uncertainty management for rule-based systems with application to image analysis. IEEE Trans. on Systems, Man and Cybernetics, 24(3):470-481, March 1994.
  255. Moore, Andrew W. and Mary S. Lee. Efficient algorithms for minimizing cross validation error. In (ML-94), pages 190-198. Editors: William W. Cohen and Haym Hirsh.
  256. Moret, Bernard M. E., M. G. Thomason, and R. C. Gonzalez. The activity of a variable and its relation to decision trees. ACM Trans. on Programming Languages and Systems, 2(4):580-595, October 1980.
  257. Moret, Bernard M. E. Decision trees and diagrams. Computing Surveys, 14(4):593-623, December 1982.
  258. Morgan, J. N. and R. C. Messenger. THAID: a sequential search program for the analysis of nominal scale dependent variables. Technical report, Institute for Social Research, Univ. of Michigan, Ann Arbor, MI, 1973.
  259. Morris, D. T. and D. Kalles. Decision trees and domain knowledge in pattern recognition. In (Gelsema and Kanal, 1994), pages 25-36.
  260. Mucciardi, A. N. and E. E. Gose. A comparison of seven techniques for choosing subsets of pattern recognition properties. IEEE Trans. on Comp., C-20(9):1023-1031, September 1971.
  261. Muller, W. and F. Wysotzki. Automatic construction of decision trees for classification. Annals of Operations Research, 52:231, 1994.
  262. Murphy, O. J. and R. L. McCraw. Designing storage efficient decision trees. IEEE Trans. on Comp., 40(3):315-319, March 1991.
  263. Murphy, Patrick M. An empirical analysis of the benefit of decision tree size biases as a function of concept distribution. Submitted to the Machine Learning journal, July 1994.
  264. Murphy, Patrick M. and David Aha. UCI repository of machine learning databases - a machine-readable data repository. Maintained at the Department of Information and Computer Science, Univ. of California, Irvine. Available via anonymous FTP in the directory pub/machine-learning-databases, 1994.
  265. Murphy, Patrick M. and Michael J. Pazzani. Exploring the decision forest: An empirical investigation of Occam's Razor in decision tree induction. J. of Artificial Intelligence Research, 1:257-275, 1994.
  266. Murthy, Sreerama K., S. Kasif, S. Salzberg, and R. Beigel. OC1: Randomized induction of oblique decision trees. In (AAAI-93), pages 322-327.
  267. Murthy, Sreerama K., Simon Kasif, and Steven Salzberg. A system for induction of oblique decision trees. J. of Artificial Intelligence Research, 2:1-33, August 1994.
  268. Murthy, Sreerama K. and Steven Salzberg. (1995a). Decision tree induction: How effective is the greedy heuristic? In Proc. of the First Int. Conf. on Knowledge Discovery in Databases, Montreal, Canada, August 1995.
  269. Murthy, Sreerama K. and Steven Salzberg. (1995b). Lookahead and pathology in decision tree induction. In (IJCAI-95). To appear.
  270. Narendra, P. M. and K. Fukunaga. A branch and bound algorithm for feature subset selection. IEEE Trans. on Comp., C-26(9):917-922, 1977.
  271. Nau, Dana S. Decision quality as a function of search depth on game trees. J. of the Association for Computing Machinery, 30(4):687-708, October 1983.
  272. Naumov, G. E. NP-completeness of problems of construction of optimal decision trees. Soviet Physics, Doklady, 36(4):270-271, April 1991.
  273. Niblett, T. Constructing decision trees in noisy domains. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning. Sigma Press, England, 1986.
  274. Nilsson, N. J. Learning Machines. Morgan Kaufmann, 1990.
  275. Nilsson, T., T. Lundgren, H. Odelius, R. Sillen, and J. G. Noren. A computerized induction analysis of possible covariations among different elements in human tooth enamel. Artificial Intelligence in Medicine, 8(6):515-526, November 1996.
  276. Norton, Steven W. Generating better decision trees. In (IJCAI-89), pages 800-805. Editor: N. S. Sridharan.
  277. Núñez, M. The use of background knowledge in decision tree induction. Machine Learning, 6:231-250, 1991.
  278. Oates, Tim and David Jensen. The effects of training set size on decision tree complexity. In Proc. of the Fourteenth Int. Conf. on Machine Learning, pages 254-262. Morgan Kaufmann, 1997.
  279. Oliver, J. Decision graphs-an extension of decision trees. In (AI&Stats-93).
  280. O'Muircheartaigh, Colm A. Statistical analysis in the context of survey research. In (O'Muircheartaigh and Payne, 1977), pages 1-40.
  281. O'Muircheartaigh, Colm A. and Clive Payne, editors. The analysis of survey data, volume I. John Wiley & Sons, Chichester, UK, 1977.
  282. Pagallo, Giulia M. and D. Haussler. Boolean feature discovery in empirical learning. Machine Learning, 5(1):71-99, March 1990.
  283. Page, C. D. and S. Muggleton. How U-learnability fits machine learning practice: a learnability result for the decision tree learner CART. In Proc. of the Conf. on Applied Decision Technologies (ADT'95), Volume 1: Computational Learning and Probabilistic Reasoning, pages 325-342, Uxbridge, UK, April 1995. Unicom Seminars.
  284. Pal, N. R., S. Chakraborty, and A. Bagchi. RID3: An ID3-like algorithm for real data. Information Sciences, 96(3-4):271-290, February 1997.
  285. Palvia, Shailendra C. and Steven R. Gordon. Tables, trees and formulas in decision analysis. Communications of the ACM, 35(10):104-113, October 1992.
  286. Park, Youngtae. A comparison of neural net classifiers and linear tree classifiers: Their similarities and differences. Pattern Recognition, 27(11):1493-1503, 1994.
  287. Park, Youngtae and Jack Sklansky. Automated design of linear tree classifiers. Pattern Recognition, 23(12):1393-1412, 1990.
  288. Park, Youngtae and Jack Sklansky. Automated design of multiple-class piecewise linear classifiers. J. of Classification, 6:195-222, 1989.
  289. Pattipati, Krishna R. and Mark G. Alexandridis. Application of heuristic search and information theory to sequential fault diagnosis. IEEE Trans. on Systems, Man and Cybernetics, 20(4):872-887, July/August 1990.
  290. Payne, R. W. and D. A. Preece. Identification keys and diagnostic tables: A review. J. of the Royal Statistical Society, Series A, 143:253, 1980.
  291. Pearson, R. A. and P. E. Stokes. Vector evaluation in induction algorithms. Int. J. of High Speed Computing, 2(1):25-100, March 1990.
  292. Perner, P., T. B. Belikova, and N. I. Yashunskaya. Knowledge acquisition by symbolic decision tree induction for interpretation of digital images in radiology. Lecture Notes in Computer Science, 1121:208, 1996.
  293. Pipitone, F., K. A. De Jong, and W. M. Spears. An artificial intelligence approach to analog systems diagnosis. In Ruey-wen Liu, editor, Testing and Diagnosis of Analog Circuits and Systems. Van Nostrand-Reinhold, New York, 1991.
  294. Piramuthu, Selwyn, Narayan Raman, and Michael J. Shaw. Learning-based scheduling in a flexible manufacturing flow line. IEEE Trans. on Eng. Management, 41(2):172-182, May 1994.
  295. Pizzi, N. J. and D. Jackson. Comparative review of knowledge eng. and inductive learning using data in a medical domain. Proc. of the SPIE: The Int. Society for Optical Eng., 1293(2):671-679, April 1990.
  296. Qing-Yun, Shi and King-Sun Fu. A method for the design of binary tree classifiers. Pattern Recognition, 16:593-603, 1983.
  297. Quinlan, John Ross. Discovering rules by induction from large collections of examples. In Donald Michie, editor, Expert Systems in the Micro Electronic Age. Edinburgh Univ. Press, Edinburgh, UK, 1979.
  298. Quinlan, John Ross. (1986a). The effect of noise on concept learning. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, volume 2. Morgan Kaufmann, San Mateo, CA, 1986.
  299. Quinlan, John Ross. (1986b). Induction of decision trees. Machine Learning, 1:81-106, 1986.
  300. Quinlan, John Ross. Simplifying decision trees. Int. J. of Man-Machine Studies, 27:221-234, 1987.
  301. Quinlan, John Ross. An empirical comparison of genetic and decision tree classifiers. In Fifth Int. Conf. on Machine Learning, pages 135-141, Ann Arbor, MI, 1988. Morgan Kaufmann.
  302. Quinlan, John Ross. Unknown attribute values in induction. In Proc. of the Sixth Int. Workshop on Machine Learning, pages 164-168, San Mateo, CA, 1989. Morgan Kaufmann.
  303. Quinlan, John Ross. Probabilistic decision trees. In R. S. Michalski and Y. Kodratoff, editors, Machine Learning: An Artificial Intelligence Approach, Volume 3. Morgan Kaufmann, San Mateo, CA, 1990.
  304. Quinlan, John Ross. (1993a). C4.5: Programs for Machine Learning. Morgan Kaufmann Pub., San Mateo, CA, 1993.
  305. Quinlan, John Ross. (1993b). Comparing connectionist and symbolic learning methods. In S. Hanson, G. Drastal, and R. Rivest, editors, Computational Learning Theory and Natural Learning Systems: Constraints and Prospects. MIT Press, 1993.
  306. Quinlan, John Ross. Improved use of continuous attributes in C4.5. J. of Artificial Intelligence Research, 4:77-90, March 1996.
  307. Quinlan, John Ross and Ronald L. Rivest. Inferring decision trees using the minimum description length principle. Information and Computation, 80(3):227-248, March 1989.
  308. Ragavan, Harish and Larry Rendell. Lookahead feature construction for learning hard concepts. In (ML-93), pages 252-259. Editor: Paul E. Utgoff.
  309. Rendell, Larry and Harish Ragavan. Improving the design of induction methods by analyzing algorithm functionality and data-based concept complexity. In (IJCAI-93), pages 952-958. Editor: Ruzena Bajcsy.
  310. Renyi, Alfred and Laszlo Vekerdi. Probability Theory. North-Holland Publishing Company, Amsterdam, 1970.
  311. Riddle, P., R. Segal, and O. Etzioni. Representation design and brute-force induction in a Boeing manufacturing domain. Applied Artificial Intelligence, 8(1):125-147, January-March 1994.
  312. Rissanen, Jorma. Stochastic Complexity in Statistical Inquiry. World Scientific, 1989.
  313. Riskin, Eve A. and Robert M. Gray. Lookahead in growing tree-structured vector quantizers. In ICASSP 91: Int. Conf. on Acoustics, Speech and Signal Processing, volume 4, pages 2289-2292, Toronto, Ontario, May 14th-17th 1991. IEEE.
  314. Rounds, E. A combined non-parametric approach to feature selection and binary decision tree design. Pattern Recognition, 12:313-317, 1980.
  315. Rovnyak, Steven, Stein Kretsinger, James Thorp, and Donald Brown. Decision trees for real time transient stability prediction. IEEE Trans. on Power Systems, 9(3):1417-1426, August 1994.
  316. Rymon, Ron. An SE-tree based characterization of the induction problem. In (ML-93), pages 268-275. Editor: Paul E. Utgoff.
  317. Rymon, Ron and N. M. Short, Jr. Automatic cataloging and characterization of earth science data using set enumeration trees. Telematics and Informatics, 11(4):309-318, Fall 1994.
  318. Safavian, S. Rasoul and David Landgrebe. A survey of decision tree classifier methodology. IEEE Trans. on Systems, Man and Cybernetics, 21(3):660-674, May/June 1991.
  319. Sahami, M. Learning non-linearly separable boolean functions with linear threshold unit trees and madaline-style networks. In (AAAI-93), pages 335-341.
  320. Salzberg, Steven. Locating protein coding regions in human DNA using a decision tree algorithm. J. of Computational Biology, 2(3):473-485, 1995.
  321. Salzberg, Steven, Rupali Chandar, Holland Ford, Sreerama Murthy, and Rick White. Decision trees for automated identification of cosmic-ray hits in Hubble Space Telescope images. Publications of the Astronomical Society of the Pacific, 107:1-10, March 1995.
  322. Sankar, Anant and Richard J. Mammone. Growing and pruning neural tree networks. IEEE Trans. on Comp., 42(3):291-299, March 1993.
  323. Saul, Lawrence and Michael I. Jordan. Learning in Boltzmann trees. Neural Computation, 6(6):1174-1184, November 1994.
  324. Schaffer, Cullen. Overfitting avoidance as bias. Machine Learning, 10:153-178, 1993.
  325. Schaffer, Cullen. A conservation law for generalization performance. In (ML-94), pages 259-265. Editors: William W. Cohen and Haym Hirsh.
  326. Schaffer, Cullen. Conservation of generalization: A case study. Technical report, Department of Computer Science, CUNY/Hunter College, February 1995.
  327. Schmidl, T. M., P. C. Cosman, and Robert M. Gray. Unbalanced non-binary tree-structured vector quantizers. In A. Singh, editor, Conf. Record of the Twenty-Seventh Asilomar Conf. on Signals, Systems and Comp., volume 2, pages 1519-1523, Los Alamitos, CA, November 1st-3rd 1993. IEEE Computer Society Press. Conf. held at Pacific Grove, CA.
  328. Schuermann, J. and W. Doster. A decision-theoretic approach in hierarchical classifier design. Pattern Recognition, 17:359-369, 1984.
  329. Schwartz, S., J. Wiles, I. Gough, and S. Philips. Connectionist, rule-based and Bayesian decision aids: An empirical comparison. In D. J. Hand, editor, AI Frontiers in Statistics III, pages 264-278. Chapman & Hall, London, 1993.
  330. Sethi, Ishwar Krishnan. Entropy nets: From decision trees to neural networks. Proc. of the IEEE, 78(10), October 1990.Google Scholar
  331. Sethi, Ishwar Krishnan and B. Chatterjee. Efficient decision tree design for discrete variable pattern recognition problems. Pattern Recognition, 9:197-206, 1977.CrossRefGoogle Scholar
  332. Sethi, Ishwar Krishnan and G.P.R. Sarvarayudu. Hierarchical classifier design using mutual information. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-4(4):441-445, July 1982.Google Scholar
  333. Sethi, Ishwar Krishnan and J. H. Yoo. Design of multicategory, multifeature split decision trees using perceptron learning. Pattern Recognition, 27(7):939-947, 1994.CrossRefGoogle Scholar
  334. Shang, Nong and Leo Breiman. Distribution based trees are more accurate. In Proc. of the Int. Conf. on Neural Information Processing, pages 133-138. 1996.Google Scholar
  335. Shannon, C. E. A mathematical theory of communication. Bell System Technical J., 27:379-423, 623-656, 1948.
  336. Shavlik, Jude W., R. J. Mooney, and G. G. Towell. Symbolic and neural learning algorithms: An empirical comparison. Machine Learning, 6(2):111-144, 1991.
  337. Shimozono, S., A. Shinohara, T. Shinohara, S. Miyano, S. Kuhara, and S. Arikawa. Knowledge acquisition from amino acid sequences by machine learning system BONSAI. Trans. of the Information Processing Society of Japan, 35(10):2009-2018, October 1994.
  338. Shlien, Seymour. Multiple binary decision tree classifiers. Pattern Recognition, 23(7):757-763, 1990.
  339. Shlien, Seymour. Nonparametric classification using matched binary decision trees. Pattern Recognition Letters, 13(2):83-88, February 1992.
  340. Siedlecki, W. and J. Sklansky. On automatic feature selection. Int. J. of Pattern Recognition and Artificial Intelligence, 2(2):197-220, 1988.
  341. Sirat, J.A. and J.-P. Nadal. Neural trees: A new tool for classification. Network: Computation in Neural Systems, 1(4):423-438, October 1990.
  342. Sklansky, Jack and Leo Michelotti. Locally trained piecewise linear classifiers. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-2(2):101-111, March 1980.
  343. Sklansky, Jack and Gustav Nicholas Wassel. Pattern classifiers and trainable machines. Springer-Verlag, New York, 1981.
  344. Smyth, Padhraic, Alexander Gray, and Usama M. Fayyad. Retrofitting decision tree classifiers using kernel density estimation. In Proc. of the Twelfth Int. Conf. on Machine Learning, pages 506-514. Morgan Kaufmann, 1995.
  345. Sonquist, J.A., E. L. Baker, and J. N. Morgan. Searching for Structure. Institute for Social Research, Univ. of Michigan, Ann Arbor, MI, 1971.
  346. Suen, C. Y. and Qing Ren Wang. ISOETRP - an interactive clustering algorithm with new objectives. Pattern Recognition, 17:211-219, 1984.
  347. Sun, Xiaorong, Yuping Qiu, and Louis Anthony Cox. A hill-climbing approach to construct near-optimal decision trees. In (AI&Stats-95), pages 513-519.
  348. Swain, P. and H. Hauska. The decision tree classifier design and potential. IEEE Trans. on Geoscience and Electronics, GE-15:142-147, 1977.
  349. Talmon, Jan L. A multiclass nonparametric partitioning algorithm. Pattern Recognition Letters, 4:31-38, 1986.
  350. Talmon, Jan L., Willem R. M. Dassen, and Vincent Karthaus. Neural nets and classification trees: A comparison in the domain of ECG analysis. In (Gelsema and Kanal, 1994), pages 415-423.
  351. Talmon, Jan L. and P. McNair. The effect of noise and biases on the performance of machine learning algorithms. Int. J. of Bio-Medical Computing, 31(1):45-57, July 1992.
  352. Tan, Ming. Cost-sensitive learning of classification knowledge and its applications in robotics. Machine Learning, 13:7-33, 1993.
  353. Taylor, Paul C. and Bernard W. Silverman. Block diagrams and splitting criteria for classification trees. Statistics and Computing, 3(4):147-161, December 1993.
  354. Thrun, Sebastian, et al. The MONK's problems: A performance comparison of different learning algorithms. Technical Report CMU-CS-91-197, School of Computer Science, Carnegie-Mellon Univ., Pittsburgh, PA, 1991.
  355. Todeschini, R. and E. Marengo. Linear discriminant classification tree: a user-driven multicriteria classification method. Chemometrics and Intelligent Lab. Systems, 16:25-35, 1992.
  356. Tu, Pei-Lei and Jen-Yao Chung. A new decision-tree classification algorithm for machine learning. In Proc. of the IEEE Int. Conf. on Tools with AI, pages 370-377, Arlington, Virginia, November 1992.
  357. Turksen, I. B. and H. Zhao. An equivalence between inductive learning and pseudo-Boolean logic simplification: a rule generation and reduction scheme. IEEE Trans. on Systems, Man and Cybernetics, 23(3):907-917, May-June 1993.
  358. Turney, Peter D. Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research, 2:369-409, March 1995.
  359. Utgoff, Paul E. (1989a). Incremental induction of decision trees. Machine Learning, 4:161-186, 1989.
  360. Utgoff, Paul E. (1989b). Perceptron trees: A case study in hybrid concept representations. Connection Science, 1(4):377-391, 1989.
  361. Utgoff, Paul E. An improved algorithm for incremental induction of decision trees. In (ML-94), pages 318-325. Editors: William W. Cohen and Haym Hirsh.
  362. Utgoff, Paul E., Neil C. Berkman, and Jeffery A. Clouse. Decision tree induction based on efficient tree restructuring. Machine Learning, 29:5-44, 1997.
  363. Utgoff, Paul E. and Carla E. Brodley. An incremental method for finding multivariate splits for decision trees. In Proc. of the Seventh Int. Conf. on Machine Learning, pages 58-65, Los Altos, CA, 1990. Morgan Kaufmann.
  364. Van Campenhout, J.M. On the Problem of Measurement Selection. PhD thesis, Stanford Univ., Dept. of Electrical Eng., 1978.
  365. Van Campenhout, Jan M. Topics in measurement selection. In (Krishnaiah and Kanal, 1987), pages 793-803.
  366. Van de Merckt, Thierry. Decision trees in numerical attribute spaces. In (IJCAI-93), pages 1016-1021. Editor: Ruzena Bajcsy.
  367. Van de Velde, Walter. Incremental induction of topologically minimal trees. In Bruce W. Porter and Ray J. Mooney, editors, Proc. of the Seventh Int. Conf. on Machine Learning, pages 66-74, Austin, Texas, 1990.
  368. Varshney, P.K., C.R.P. Hartmann, and J.M. De Faria Jr. Applications of information theory to sequential fault diagnosis. IEEE Trans. on Comp., C-31(2):164-170, 1982.
  369. Wallace, C.S. and D. M. Boulton. An information measure for classification. Computer J., 11:185-194, 1968.
  370. Wallace, C.S. and J. D. Patrick. Coding decision trees. Machine Learning, 11(1):7-22, April 1993.
  371. Wang, Qing Ren and C. Y. Suen. Analysis and design of a decision tree based on entropy reduction and its application to large character set recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 6:406-417, 1984.
  372. Wang, Qing Ren and Ching Y. Suen. Large tree classifier with heuristic search and global training. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-9(1):91-102, January 1987.
  373. Wassel, Gustav Nicholas and Jack Sklansky. Training a one-dimensional classifier to minimize the probability of error. IEEE Trans. on Systems, Man and Cybernetics, SMC-2:533-541, September 1972.
  374. Watanabe, Larry and Larry Rendell. Learning structural decision trees from examples. In Proc. of the Twelfth Int. Joint Conf. on Artificial Intelligence, volume 2, pages 770-776, Sydney, Australia, August 1991. Morgan Kaufmann, San Mateo, CA. Editors: John Mylopoulos and Ray Reiter.
  375. Watanabe, S. Pattern recognition as a quest for minimum entropy. Pattern Recognition, 13:381-387, 1981.
  376. Weir, Nicholas, S. Djorgovski, and Usama M. Fayyad. Initial galaxy counts from digitized POSS-II. The Astronomical J., 110(1):1, 1995.
  377. Weir, Nicholas, Usama M. Fayyad, and S. Djorgovski. Automated star/galaxy classification for digitized POSS-II. The Astronomical J., 109(6):2401, 1995.
  378. Weiss, S. and I. Kapouleas. An empirical comparison of pattern recognition, neural nets, and machine learning classification methods. In (IJCAI-89), pages 781-787. Editor: N. S. Sridharan.
  379. White, Allan P. and Wei Zhang Liu. Technical note: Bias in information-based measures in decision tree induction. Machine Learning, 15(3):321-329, June 1994.
  380. Wilks, P.A.D. and M.J. English. Accurate segmentation of respiration waveforms from infants enabling identification and classification of irregular breathing patterns. Medical Eng. and Physics, 16(1):19-23, January 1994.
  381. Wirth, J. and J. Catlett. Experiments on the costs and benefits of windowing in ID3. In Fifth Int. Conf. on Machine Learning, pages 87-99, Ann Arbor, Michigan, 1988. Morgan Kaufmann.
  382. Wolpert, David H. (1992a). On overfitting avoidance as bias. Technical Report SFI TR 92-03-5001, The Santa Fe Institute, 1992.
  383. Wolpert, David H. (1992b). On the connection between in-sample testing and generalization error. Complex Systems, 6:47-94, 1992.
  384. Woods, K. S., C. C. Doss, K. W. Bowyer, J. L. Solka, C. E. Priebe, and W. P. Kegelmeyer Jr. Comparative evaluation of pattern recognition techniques for detection of microcalcifications in mammography. Int. J. of Pattern Recognition and Artificial Intelligence, 7(6):1417-1436, December 1993.
  385. You, K.C. and King-Sun Fu. An approach to the design of a linear binary tree classifier. In Proc. of the Third Symposium on Machine Processing of Remotely Sensed Data, West Lafayette, IN, 1976. Purdue Univ.
  386. Yuan, Y. and M. J. Shaw. Induction of fuzzy decision trees. Fuzzy Sets and Systems, 69(2):125, 1995.
  387. Zhengou, Wang and Lin Yan. A new inductive learning algorithm: Separability-Based Inductive learning algorithm. Acta Automatica Sinica, 5(3):267-270, 1993. Translated into Chinese J. of Automation.
  388. Zhou, Xiao Jia and Tharam S. Dillon. A statistical-heuristic feature selection criterion for decision tree induction. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-13(8):834-841, August 1991.
  389. Zimmerman, Seth. An optimal search procedure. The American Mathematical Monthly, 66(8):690-693, March 1959.

Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Sreerama K. Murthy, Siemens Corporate Research, Princeton, USA