The Racing Algorithm: Model Selection for Lazy Learners
Oded Maron, Andrew W. Moore
Abstract
Given a set of models and some training data, we would like to find the model that best describes the data. Finding the model with the lowest generalization error is a computationally expensive process, especially if the number of testing points or the number of models is large. Optimization techniques such as hill climbing or genetic algorithms are only partly helpful: they can end up with a model that is arbitrarily worse than the best one, or they cannot be applied at all because there is no distance metric on the space of discrete models. In this paper we develop a technique called “racing” that tests the set of models in parallel, quickly discards those models that are clearly inferior, and concentrates the computational effort on differentiating among the better models. Racing is especially suitable for selecting among lazy learners, since training requires negligible expense and incremental testing using leave-one-out cross-validation is efficient. We use racing to select among various lazy learning algorithms and to find relevant features in applications ranging from robot juggling to lesion detection in MRI scans.
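The elimination idea the abstract describes (developed as "Hoeffding races" in the authors' earlier NIPS paper) can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: the function names, the `bound` parameter, and the loss setup are assumptions; the key ingredient is the Hoeffding confidence interval, which shrinks as more test points are seen and lets clearly inferior models be discarded early.

```python
import math

def hoeffding_eps(n, bound, delta=0.05):
    # Half-width of a Hoeffding confidence interval after n test points,
    # for per-point losses in [0, bound], with failure probability delta.
    return bound * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def race(models, test_points, loss, bound, delta=0.05):
    """Evaluate all models in parallel, one test point at a time.
    After each point, drop any model whose optimistic (lower-bound)
    error already exceeds the best model's pessimistic (upper-bound)
    error; stop early if only one model survives."""
    sums = {m: 0.0 for m in models}          # running loss totals
    for n, x in enumerate(test_points, start=1):
        for m in sums:
            sums[m] += loss(m, x)
        eps = hoeffding_eps(n, bound, delta)
        best_upper = min(s / n for s in sums.values()) + eps
        sums = {m: s for m, s in sums.items() if s / n - eps <= best_upper}
        if len(sums) == 1:
            break
    return list(sums)                        # surviving model(s)
```

For example, racing four constant predictors `[0, 1, 2, 3]` against test points centered at 0 with absolute-error loss eliminates the three inferior models well before all the test points are consumed, which is the source of racing's computational savings over evaluating every model on every point.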
 Title
 The Racing Algorithm: Model Selection for Lazy Learners
 Journal

Artificial Intelligence Review
Volume 11, Issue 1–5, pp. 193–225
 Cover Date
1997-02-01
 DOI
 10.1023/A:1006556606079
 Print ISSN
0269-2821
 Online ISSN
1573-7462
 Publisher
 Kluwer Academic Publishers
 Keywords

 lazy learning
 model selection
 cross validation
 optimization
 attribute selection
 Authors

 Oded Maron ^{(1)}
 Andrew W. Moore ^{(2)}
 Author Affiliations

 1. M.I.T. Artificial Intelligence Lab, NE45755, 545 Technology Square, Cambridge, MA, 02139
 2. Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213