Machine Learning, Volume 54, Issue 3, pp 211–254

Optimal Ordered Problem Solver

  • Jürgen Schmidhuber


We introduce a general and, in a certain sense, time-optimal way of solving one problem after another, efficiently searching the space of programs that compute solution candidates, including those programs that organize, manage, adapt, and reuse earlier acquired knowledge. The Optimal Ordered Problem Solver (OOPS) draws inspiration from Levin's Universal Search designed for single problems and universal Turing machines. It spends part of the total search time for a new problem on testing programs that exploit previous solution-computing programs in computable ways. If the new problem can be solved faster by copy-editing or invoking previous code than by solving it from scratch, then OOPS will find this out. If not, then at least the previous solutions will not cause much harm. We introduce an efficient, recursive, backtracking-based way of implementing OOPS on realistic computers with limited storage. Experiments illustrate how OOPS can greatly profit from metalearning or metasearching, that is, searching for faster search procedures.
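
The time-sharing idea in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: the two-instruction language, the prior of 3^-(length+1), the doubling phases, and the equal split between fresh programs and programs that extend the previous solution are all assumptions made for this example, and the names run, prior, and search are hypothetical.

```python
from itertools import product

ALPHABET = "+d"  # toy instructions (assumed): "+" adds 1, "d" doubles


def run(program, max_steps, start=0):
    """Execute a toy program; return the register value, or None on timeout."""
    value = start
    for steps, op in enumerate(program, start=1):
        if steps > max_steps:
            return None
        value = value + 1 if op == "+" else value * 2
    return value


def prior(length):
    """Prior of one program of this length: two instructions plus an
    implicit halt symbol, so the priors of all programs sum to 1."""
    return (len(ALPHABET) + 1) ** -(length + 1)


def search(target, frozen="", max_phase=2**20):
    """Return (program, phase) for the first program whose output is target.

    In phase T, a candidate may run for at most T * share * prior steps.
    Without frozen code the whole budget goes to fresh programs; with
    frozen code, half goes to fresh programs and half to programs that
    extend the frozen code (the equal split is an assumption).
    """
    prefixes = ["", frozen] if frozen else [""]
    share = 1 / len(prefixes)
    T = 1
    while T <= max_phase:
        length = 0
        while True:
            limit = int(T * share * prior(length))
            if limit < length:  # no program this long can finish in time
                break
            for suffix in product(ALPHABET, repeat=length):
                for prefix in prefixes:
                    program = prefix + "".join(suffix)
                    if run(program, limit) == target:
                        return program, T
            length += 1
        T *= 2  # double the total budget and start over
    return None


if __name__ == "__main__":
    first, phase1 = search(10)                 # solve task 1 from scratch
    reuse, phase2 = search(21, frozen=first)   # task 2, may extend task 1 code
    scratch, phase3 = search(21)               # task 2 without reuse
    print(first, phase1)
    print(reuse, phase2)
    print(scratch, phase3)
```

In this toy setting the second task is solved in a much earlier phase when the search may extend the frozen code (phase 512) than when it must start from scratch (phase 65536), which mirrors the claim that reuse is discovered when it pays off. Because half of the budget still goes to fresh programs, an unhelpful frozen solution would cost at most a constant factor.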

Keywords: OOPS; bias-optimality; incremental optimal universal search; efficient planning and backtracking in program space; metasearching and metalearning; self-improvement



Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Jürgen Schmidhuber (1)
  1. IDSIA, Manno-Lugano, Switzerland
