Abstract
We propose to perform the optimization task of Universal Artificial Intelligence (UAI) by learning a reference machine on which good programs are short. We also acknowledge that the choice of reference machine on which the UAI objective is based is arbitrary, and we therefore learn a machine suited to the environment we are in. This approach rests on viewing Occam's razor as an imperative rather than as a proposition about the world. Since the principle cannot be true for all reference machines, we need to find a machine that makes it true. We want both good policies and the environment to have short implementations on the machine. Such a machine is learnt iteratively through a procedure that generalizes the principle underlying the Expectation-Maximization algorithm.
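The idea of iteratively learning a machine on which good programs are short, via an EM-like procedure, can be illustrated with a deliberately small toy. This is a sketch under stated assumptions, not the authors' procedure: it assumes a finite, hypothetical set of candidate programs with known nonnegative returns, models the "reference machine" simply as a prior q over those programs (so a program's code length is -log2 q(p)), ignores the environment model entirely, and uses a reward-weighted EM update in which the return plays the role of a likelihood. The names `em_step`, `code_lengths`, `eps` and the policy labels are illustrative assumptions.

```python
import math

def code_lengths(prior):
    """Code length in bits of each program under the toy machine: -log2 q(p)."""
    return {p: -math.log2(q) for p, q in prior.items()}

def em_step(prior, returns, eps=0.01):
    """One EM-style update of a toy 'reference machine' (a prior over programs).

    E-step: weight each program by prior(p) * return(p), treating the
    nonnegative return as a likelihood (reward-weighted EM).
    M-step: refit the prior to those normalized weights, mixed with a small
    uniform component (eps) so that every program keeps a finite code length.
    """
    weights = {p: prior[p] * returns[p] for p in prior}
    total = sum(weights.values())
    n = len(prior)
    return {p: (1 - eps) * weights[p] / total + eps / n for p in prior}

# Hypothetical candidate programs and their assumed-known nonnegative returns.
returns = {"policy_a": 1.0, "policy_b": 2.0, "policy_c": 4.0}
prior = {p: 1 / len(returns) for p in returns}  # start from a uniform machine

for _ in range(20):
    prior = em_step(prior, returns)

lengths = code_lengths(prior)
best = min(lengths, key=lengths.get)  # program with the shortest code
print(best, round(lengths[best], 3))
```

After a few iterations the highest-return program has the shortest code on the learned machine, so "good programs are short" holds by construction. The uniform mixing term is one way to keep every program expressible; without it the prior collapses onto a single program and all others get infinite code length.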
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sunehag, P., Hutter, M. (2014). Intelligence as Inference or Forcing Occam on the World. In: Goertzel, B., Orseau, L., Snaider, J. (eds) Artificial General Intelligence. AGI 2014. Lecture Notes in Computer Science, vol 8598. Springer, Cham. https://doi.org/10.1007/978-3-319-09274-4_18
DOI: https://doi.org/10.1007/978-3-319-09274-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09273-7
Online ISBN: 978-3-319-09274-4
eBook Packages: Computer Science; Computer Science (R0)