Heuristic Search for Model Structure: the Benefits of Restraining Greed

  • John F. Elder IV
Part of the Lecture Notes in Statistics book series (LNS, volume 112)


Inductive modeling or “machine learning” algorithms are able to discover structure in high-dimensional data in a nearly automated fashion. These adaptive statistical methods — including decision trees, polynomial networks, projection pursuit models, and additive networks — repeatedly search for, and add on, the model component judged best at that state. Because of the huge model space of possible components, the choice is typically greedy; that is, optimal only in the very short term. In fact, it is usual for the analyst and algorithm to be greedy at three levels: when choosing a 1) term within a model, 2) model within a family, and 3) family within a wide collection of methods. It is better, we argue, to “take a longer view” in each stage. For the first stage (term selection) examples are presented for classification using decision trees and estimation using regression. To improve the third stage (method selection) we propose fusing information from disparate models to make a combined model more robust. (Fused models merge their output estimates but also share information on, for example, variables to employ and cases to ignore.) Benefits of fusing are demonstrated on a challenging classification dataset, where the task is to infer the species of a bat from its chirps.
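The cost of greed at the first stage (term selection) can be illustrated with a small regression sketch. The code below is not taken from the chapter: the synthetic data, the function names (`rss`, `greedy_forward`, `best_subset`), and the choice of residual sum of squares as the score are all assumptions made for illustration. It compares stepwise forward selection, which adds the single best term at each step, with an exhaustive search over all two-term subsets, on data built so that the best single predictor does not belong to the best pair.

```python
from itertools import combinations

import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of a least-squares fit on `cols` plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def greedy_forward(X, y, k):
    """Forward selection: at each step add the one column that lowers RSS most."""
    chosen = []
    for _ in range(k):
        remaining = [c for c in range(X.shape[1]) if c not in chosen]
        best = min(remaining, key=lambda c: rss(X, y, chosen + [c]))
        chosen.append(best)
    return tuple(sorted(chosen))


def best_subset(X, y, k):
    """Exhaustive search: score every k-column subset and keep the best."""
    return min(combinations(range(X.shape[1]), k), key=lambda cols: rss(X, y, cols))


# Hypothetical data (an assumption for this sketch):
#   column 0 is a noisy copy of the target, so it is the best single predictor;
#   columns 1 and 2 are individually weak, but their difference equals the target.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
e = rng.normal(size=n)
noise = rng.normal(size=n)
y = z
X = np.column_stack([z + 0.5 * noise, z + 2.0 * e, 2.0 * e])

greedy_cols = greedy_forward(X, y, 2)
best_cols = best_subset(X, y, 2)
greedy_rss = rss(X, y, greedy_cols)
best_rss = rss(X, y, best_cols)

print("greedy subset:    ", greedy_cols, "RSS =", greedy_rss)
print("exhaustive subset:", best_cols, "RSS =", best_rss)
```

In this construction the greedy search commits to column 0 at the first step and cannot undo that choice, while the exhaustive search finds the pair of columns 1 and 2, whose difference reproduces the target. Exhaustive search is only feasible here because there are three candidate terms; the number of subsets grows combinatorially, which is why practical algorithms search heuristically.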


Keywords: Inductive Modeling; Optimal Subset; Decision Tree Algorithm; Greedy Method; Automate Induction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  • John F. Elder IV, Computational & Applied Mathematics Dept. & Center for Research on Parallel Computation, Rice University, Houston, USA
