Biological Cybernetics, Volume 112, Issue 6, pp 547–573

Deep active inference

  • Kai Ueltzhöffer
Original Article


This work combines the free energy principle and the ensuing active inference dynamics with recent advances in variational inference in deep generative models and with evolution strategies to introduce the “deep active inference” agent. This agent minimises a variational free energy bound on the average surprise of its sensations, which is motivated by a homeostatic argument. It does so by optimising the parameters of a generative latent variable model of its sensory inputs, together with a variational density approximating the posterior distribution over the latent variables, given its observations, and by acting on its environment to actively sample input that is likely under this generative model. The internal dynamics of the agent are implemented using deep and recurrent neural networks, as used in machine learning, making the deep active inference agent a scalable and very flexible class of active inference agents. Using the mountain car problem, we show how goal-directed behaviour can be implemented by defining appropriate priors on the latent states in the agent’s model. Furthermore, we show that the deep active inference agent can learn a generative model of the environment, which can be sampled from to understand the agent’s beliefs about the environment and its interaction with it.
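The bound mentioned in the abstract can be illustrated with a toy example. The sketch below is not the agent described in the paper: it assumes a hypothetical model with two discrete latent states and two possible observations, and all names (`prior`, `likelihood`, `free_energy`, and so on) are chosen here for illustration. It only shows that the variational free energy of any approximate posterior q is at least the surprise, -ln p(o), and that the two coincide when q equals the exact posterior.

```python
import math


def surprise(prior, likelihood, o):
    """Surprise -ln p(o), where p(o) = sum_s p(o|s) p(s)."""
    evidence = sum(p_s * likelihood[s][o] for s, p_s in enumerate(prior))
    return -math.log(evidence)


def exact_posterior(prior, likelihood, o):
    """Exact posterior p(s|o) obtained from Bayes' rule."""
    joint = [p_s * likelihood[s][o] for s, p_s in enumerate(prior)]
    total = sum(joint)
    return [j / total for j in joint]


def free_energy(prior, likelihood, q, o):
    """Variational free energy F = E_q[ln q(s) - ln p(o, s)]."""
    return sum(
        q_s * (math.log(q_s) - math.log(prior[s] * likelihood[s][o]))
        for s, q_s in enumerate(q)
        if q_s > 0.0  # terms with q(s) = 0 contribute nothing
    )


# Hypothetical toy model (assumed values, not taken from the paper).
prior = [0.7, 0.3]                     # p(s)
likelihood = [[0.9, 0.1], [0.2, 0.8]]  # likelihood[s][o] = p(o|s)
o = 1                                  # index of the observed outcome

rough_q = [0.5, 0.5]                   # an arbitrary approximate posterior
best_q = exact_posterior(prior, likelihood, o)

print("surprise:", surprise(prior, likelihood, o))
print("F with rough q:", free_energy(prior, likelihood, rough_q, o))
print("F with exact posterior:", free_energy(prior, likelihood, best_q, o))
```

In this toy setting the gap between the free energy and the surprise is the Kullback-Leibler divergence between q and the exact posterior, which is why improving q tightens the bound.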


Variational inference, Deep learning, Generative models, Cognition, Action, Perception



The author would like to thank Karl Friston, Thorben Kröger, Manuel Baltieri and Annina Luck for insightful comments on earlier versions of this manuscript and the participants and organisers of the Computational Psychiatry Course 2016 for stimulating lectures and discussions. Furthermore, he would like to thank the three anonymous reviewers for very constructive and tremendously helpful comments, which significantly improved the quality of the paper.

Supplementary material

422_2018_785_MOESM1_ESM.pdf (543 kb)
Supplementary material 1 (pdf 542 KB)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Heidelberg, Germany
