Active inference and agency: optimal control without cost functions

Friston, Karl; Samothrakis, Spyridon; Montague, Read

doi:10.1007/s00422-012-0512-8

Prospects
Open access
Published: 03 August 2012

Volume 106, pages 523–541, (2012)

Biological Cybernetics

Karl Friston¹,
Spyridon Samothrakis² &
Read Montague³

Abstract

This paper describes a variational free-energy formulation of (partially observable) Markov decision problems in decision making under uncertainty. We show that optimal control can be cast as active inference. In active inference, both action and posterior beliefs about hidden states minimise a free energy bound on the negative log-likelihood of observed states, under a generative model. In this setting, reward or cost functions are absorbed into prior beliefs about state transitions and terminal states. Effectively, this converts optimal control into a pure inference problem, enabling the application of standard Bayesian filtering techniques. We then consider optimal trajectories that rest on posterior beliefs about hidden states in the future. Crucially, this entails modelling control as a hidden state that endows the generative model with a representation of agency. This leads to a distinction between models with and without inference on hidden control states; namely, agency-based and agency-free models, respectively.
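
The free energy bound mentioned in the abstract can be illustrated numerically. What follows is a minimal sketch, not the scheme used in the paper: it assumes a discrete generative model with three hidden states and a single observed outcome, and it covers only the perceptual half of active inference (beliefs, not action). The probabilities and the names `free_energy` and `exact_posterior` are hypothetical and chosen purely for illustration. The sketch evaluates F = E_q[ln q(s) - ln p(o, s)] and compares it with the negative log-likelihood -ln p(o) of the observation.

```python
import math


def free_energy(q, prior, likelihood_o):
    """Variational free energy F = E_q[ln q(s) - ln p(o, s)] for one observation o.

    q            : approximate posterior over hidden states, q(s)
    prior        : prior over hidden states, p(s)
    likelihood_o : likelihood of the observed outcome under each state, p(o | s)
    """
    return sum(
        qs * (math.log(qs) - math.log(ps * ls))
        for qs, ps, ls in zip(q, prior, likelihood_o)
        if qs > 0.0  # terms with q(s) = 0 contribute nothing
    )


def exact_posterior(prior, likelihood_o):
    """Return the exact posterior p(s | o) and the model evidence p(o)."""
    joint = [ps * ls for ps, ls in zip(prior, likelihood_o)]
    evidence = sum(joint)
    return [j / evidence for j in joint], evidence


# Hypothetical numbers, not taken from the paper.
prior = [0.5, 0.3, 0.2]         # p(s)
likelihood_o = [0.9, 0.2, 0.1]  # p(o | s) for the outcome actually observed

posterior, evidence = exact_posterior(prior, likelihood_o)
surprise = -math.log(evidence)  # negative log-likelihood of the observation

f_prior = free_energy(prior, prior, likelihood_o)      # beliefs left at the prior
f_post = free_energy(posterior, prior, likelihood_o)   # beliefs at the exact posterior

print("negative log-likelihood:", surprise)
print("free energy, q = prior:", f_prior)
print("free energy, q = posterior:", f_post)
```

Under these assumptions the free energy is never smaller than the negative log-likelihood, and the two coincide when the beliefs equal the exact posterior, because the gap between them is the Kullback-Leibler divergence between q(s) and p(s | o).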

Acknowledgments

We would like to thank Peter Dayan for invaluable comments on this work and also acknowledge the very helpful comments and guidance from anonymous reviewers of this work.

Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Author information

Authors and Affiliations

The Wellcome Trust Centre for Neuroimaging, UCL, Institute of Neurology, 12 Queen Square, London, WC1N 3BG, UK
Karl Friston
School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, UK
Spyridon Samothrakis
Department of Physics, Virginia Tech Carilion Research Institute, Virginia Tech, 2 Riverside Circle, Roanoke, VA, 24016, USA
Read Montague

Corresponding author

Correspondence to Karl Friston.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Friston, K., Samothrakis, S. & Montague, R. Active inference and agency: optimal control without cost functions. Biol Cybern 106, 523–541 (2012). https://doi.org/10.1007/s00422-012-0512-8

Received: 01 February 2012
Accepted: 16 July 2012
Published: 03 August 2012
Issue Date: October 2012
DOI: https://doi.org/10.1007/s00422-012-0512-8
