Universal Algorithmic Intelligence: A Mathematical Top→Down Approach

  • Marcus Hutter
Part of the Cognitive Technologies book series (COGTECH)


Sequential decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown prior distributions. We combine both ideas and get a parameter-free theory of universal Artificial Intelligence. We give strong arguments that the resulting AIXI model is the most intelligent unbiased agent possible. We outline how the AIXI model can formally solve a number of problem classes, including sequence prediction, strategic games, function minimization, and reinforcement and supervised learning. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXItl that is still effectively more intelligent than any other time-t and length-l bounded agent. The computation time of AIXItl is of the order t·2^l. The discussion includes formal definitions of intelligence order relations, the horizon problem, and relations of the AIXI theory to other AI approaches.
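AIXI itself is incomputable, but the expectimax idea the abstract combines with universal induction can be illustrated on a drastically simplified, computable caricature: a *finite* Bayes mixture of deterministic toy environments with fixed simplicity-style weights standing in for the universal prior 2^(-l(q)). All names here (`expectimax_action`, `env_a`, `env_b`) and the two-environment setup are hypothetical illustrations, not part of the AIXI construction.

```python
# Toy sketch (NOT AIXI, which is incomputable): expectimax action
# selection over a finite mixture of deterministic environments.
# Each environment maps an action history to a reward in {0, 1};
# the fixed weight w_i crudely plays the role of 2^{-l(q)} in the
# universal mixture.  Posterior updating is omitted for brevity.

def expectimax_action(envs, weights, horizon, actions=(0, 1), history=()):
    """Return the action maximizing expected total reward up to `horizon`."""
    def value(hist):
        if len(hist) == horizon:
            return 0.0
        best = float("-inf")
        for a in actions:
            new_hist = hist + (a,)
            # Mixture-expected immediate reward for taking action a.
            r = sum(w * env(new_hist) for env, w in zip(envs, weights))
            best = max(best, r + value(new_hist))
        return best

    scores = {}
    for a in actions:
        h = history + (a,)
        r = sum(w * env(h) for env, w in zip(envs, weights))
        scores[a] = r + value(h)
    return max(scores, key=scores.get)

# Two hypothetical environments: one rewards action 1, one rewards action 0.
env_a = lambda hist: 1 if hist[-1] == 1 else 0
env_b = lambda hist: 1 if hist[-1] == 0 else 0

# Prior favours env_a (as if its "program" were shorter), so the
# mixture-optimal agent picks action 1.
print(expectimax_action([env_a, env_b], [0.7, 0.3], horizon=3))  # -> 1
```

The real AIXI model replaces this finite mixture with a sum over all programs of a universal Turing machine and interleaves observations with posterior reweighting, which is what makes it uncomputable; AIXItl recovers computability by restricting to proofs and programs of bounded length l and runtime t.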


Keywords: Turing Machine · Travelling Salesman Problem · Kolmogorov Complexity · Strategic Game · Future Reward





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Marcus Hutter
  1. IDSIA, Manno-Lugano, Switzerland
  2. RSISE/ANU/NICTA, Canberra, Australia
