Intelligence as Inference or Forcing Occam on the World

  • Peter Sunehag
  • Marcus Hutter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8598)


We propose to perform the optimization task of Universal Artificial Intelligence (UAI) by learning a reference machine on which good programs are short. Since the choice of reference machine underlying the UAI objective is arbitrary, we learn a machine suited to the environment we are in. This rests on viewing Occam’s razor as an imperative rather than as a proposition about the world: the principle cannot hold for all reference machines, so we must find a machine that makes it hold. We want both good policies and the environment itself to have short implementations on this machine. Such a machine is learnt iteratively through a procedure that generalizes the principle underlying the Expectation-Maximization (EM) algorithm.
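The iterative idea above can be illustrated with a deliberately simplified sketch (all names and the setup below are our own assumptions, not the paper's construction): model the reference machine only by the prior it induces over a finite pool of candidate programs, with code length given by the negative log of that prior. An E-step weights each program by its prior times an exponential of its return; an M-step adopts the normalized weights as the new prior, so high-return programs end up with short codes.

```python
import math

# Toy pool of "programs" with fixed returns (hypothetical values).
programs = {"p_good": 1.0, "p_ok": 0.3, "p_bad": -0.5}
prior = {p: 1.0 / len(programs) for p in programs}  # uniform start

def code_length(prior, p):
    """Code length in bits of program p under the machine's prior."""
    return -math.log2(prior[p])

for step in range(20):
    # E-step: responsibility of each program, weighted by its return.
    weights = {p: prior[p] * math.exp(r) for p, r in programs.items()}
    total = sum(weights.values())
    # M-step: the new prior makes currently good programs cheaper to code.
    prior = {p: w / total for p, w in weights.items()}

lengths = {p: code_length(prior, p) for p in programs}
print(lengths)  # "p_good" should receive the shortest code
```

This is only a finite caricature of the paper's procedure: in the actual setting the object being learnt is a reference machine (and with it a universal prior over all programs), not a categorical distribution over three candidates, but the alternation between scoring programs and re-coding the machine follows the same EM-style pattern.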


Keywords: Reinforcement Learning · Return Function · Turing Machine · Good Program · Deep Belief Network





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Peter Sunehag (1)
  • Marcus Hutter (1)

  1. Research School of Computer Science, Australian National University, Canberra, Australia
