Abstract
Speedup learning systems are typically evaluated by measuring their impact on a problem solver's performance. The impact is measured by running the problem solver, before and after learning, on a sample of problems drawn at random from some distribution. Often, the experimenter imposes a bound on the CPU time the problem solver may spend on any individual problem. Segre et al. (1991) argue that the experimenter's choice of time bound can bias the results of the experiment. To address this problem, we present statistical hypothesis tests designed specifically to analyze speedup data and eliminate this bias. We apply the tests to the data reported by Etzioni (1990a) and show that most, but not all, of the observed speedups are statistically significant.
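To make the setting concrete, the following is a minimal sketch of a one-sided sign test on paired CPU times that are censored at a time bound. It is an illustration under stated assumptions, not necessarily the tests developed in the article: the function name `censored_sign_test`, its arguments, and the choice to drop ties (including pairs where both runs hit the bound) are all hypothetical.

```python
from math import comb


def censored_sign_test(before, after, bound):
    """One-sided sign test on paired CPU times censored at `bound`.

    Assumption (not from the article): a pair in which both runs reach
    the bound carries no ordering information and is dropped as a tie,
    as is any pair with equal observed times.
    Returns (wins, losses, p_value), where a win means the run after
    learning was faster than the run before learning.
    """
    wins = losses = 0
    for b, a in zip(before, after):
        # Clip to the bound: a time at the bound means "did not finish".
        b, a = min(b, bound), min(a, bound)
        if a < b:
            wins += 1
        elif a > b:
            losses += 1
        # Equal values (including both censored) are ignored.
    n = wins + losses
    if n == 0:
        return wins, losses, 1.0
    # P(X >= wins) for X ~ Binomial(n, 1/2): the null hypothesis is that
    # learning is equally likely to speed up or slow down a problem.
    p_value = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
    return wins, losses, p_value


# Hypothetical data: CPU seconds per problem, with a 100-second bound.
before = [10, 20, 30, 100, 100, 5]
after = [5, 10, 15, 50, 100, 6]
print(censored_sign_test(before, after, bound=100))
```

Because the test uses only the sign of each comparable pair, a pair in which exactly one run is cut off at the bound still counts, since the censored run is known to be the slower one.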
References
Brown, B.W. Jr., & Hollander, M. (1977). Statistics: A biomedical introduction. New York: Wiley.
Cohen, Paul R., & Kim, John B. (1993). A bootstrap test for comparing performance of programs when data are censored, and comparisons to Etzioni's test. Unpublished manuscript, University of Massachusetts, Amherst.
DeGroot, Morris H. (1986). Probability and statistics (2nd ed.). Reading, MA: Addison Wesley.
Etzioni, Oren. (1990a). A structural theory of explanation-based learning. Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, PA. (Available as technical report CMU-CS-90-185.)
Etzioni, Oren. (1990b). Why Prodigy/EBL works. In Proceedings of AAAI-90.
Gibbons, Jean Dickinson. (1971). Nonparametric statistical inference. New York: McGraw-Hill.
Hajek, J., & Sidak, Z. (1967). Theory of rank tests. New York: Academic Press.
Hemelryk, J. (1952). A theorem on the sign test when ties are present. Indagationes Mathematica, 14, 322–326.
Holt, J.D., & Prentice, R.L. (1974). Survival analysis in twin studies and matched pair experiments. Biometrika, 61, 17–30.
Kalbfleisch, J.D., & Prentice, R.L. (1980). The statistical analysis of failure time data. New York: Wiley.
Kambhampati, Subbarao, & Chen, Jengchin. (1993). Relative utility of EBG based plan reuse in partial ordering vs. total ordering planning. In Proceedings of the 11th National Conference on Artificial Intelligence (AAAI-93). Cambridge, MA: MIT Press (AAAI).
Knoblock, Craig A. (1990). Learning abstraction hierarchies for problem solving. In Proceedings of the Eighth National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.
Knoblock, Craig A. (In press). Automatically generating abstractions for planning. Artificial Intelligence.
Lehmann, E.L. (1975). Nonparametrics: Statistical methods based on ranks. San Francisco: Holden Day.
Minton, Steven. (1988a). Quantitative results concerning the utility of explanation-based learning. In Proceedings of AAAI-88 (pp. 564–569).
Minton, Steven. (1988b). Learning effective search control knowledge: An explanation-based approach. Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, PA. (Available as technical report CMU-CS-88-133.)
Minton, Steven. (1993). Integrating heuristics for constraint satisfaction problems: A case study. In AAAI-93 Proceedings.
Mooney, Raymond J. (1989). The effect of rule use on the utility of explanation-based learning. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 725–730).
O'Rorke, P. (1989). LT revisited: Explanation-based learning and the logic of Principia Mathematica. Machine Learning, 4(2), 117–160.
Segre, Alberto, Elkan, Charles, & Russell, Alexander. (1991). A critical look at experimental evaluations of EBL. Machine Learning, 6(2).
Shavlik, Jude W. (1990). Acquiring recursive concepts and iterative concepts with explanation-based learning. Machine Learning, 5(1).
Wilks, Samuel S. (1962). Mathematical statistics. New York: John Wiley & Sons.
Woolson, R.F., & Lachenbruch, P.A. (1980). Rank tests for censored matched pairs. Biometrika, 67, 597–606.
Cite this article
Etzioni, O., Etzioni, R. Statistical methods for analyzing speedup learning experiments. Mach Learn 14, 333–347 (1994). https://doi.org/10.1007/BF00993983
DOI: https://doi.org/10.1007/BF00993983