Abstract
Active inference offers a principled account of behavior as minimizing average sensory surprise over time. Applications of active inference to control problems have heretofore tended to focus on finite-horizon or discounted-surprise problems, despite deriving from the infinite-horizon, average-surprise imperative of the free-energy principle. Here we derive an infinite-horizon, average-surprise formulation of active inference from optimal control principles. Our formulation returns to the roots of active inference in neuroanatomy and neurophysiology, formally reconnecting active inference to optimal feedback control. Our formulation provides a unified objective functional for sensorimotor control and allows for reference states to vary over time.
A Detailed Derivations
This appendix provides detailed derivations for equations used in the main text, in cases where including them inline would have disrupted the flow of the paper.
Proposition 1 (Variational free energy as divergence from an unnormalized joint distribution)
The variational free energy (Eq. 9) is defined as the Kullback-Leibler divergence of the recognition model \(q_\phi \) from the unnormalized joint distribution of the generative model \(p_\theta \)
and therefore equals a sum of the cross entropy between the recognition model and the sensory likelihood and the exclusive KL divergence from the recognition model to the generative model over the latent variables
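The displayed equations for this proposition did not survive extraction. Writing \(\textbf{o}\) for sensory observations and \(\textbf{s}\) for latent states (assumed symbols; the paper's originals are unavailable here), the statement takes the standard form

\[
\mathcal {F}(\phi , \theta ) = D_{\mathrm {KL}}\left [ q_\phi (\textbf{s}) \,\middle \|\, p_\theta (\textbf{o}, \textbf{s}) \right ] = \mathbb {E}_{q_\phi (\textbf{s})}\left [ -\log p_\theta (\textbf{o} \mid \textbf{s}) \right ] + D_{\mathrm {KL}}\left [ q_\phi (\textbf{s}) \,\middle \|\, p_\theta (\textbf{s}) \right ].
\]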
Proof
Taking the divergence between the (normalized) recognition model and the (unnormalized) joint generative model yields
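A minimal sketch of the algebra, under the same assumed notation (\(\textbf{o}\) for observations, \(\textbf{s}\) for latent states):

\[
\begin{aligned}
D_{\mathrm {KL}}\left [ q_\phi (\textbf{s}) \,\middle \|\, p_\theta (\textbf{o}, \textbf{s}) \right ]
&= \mathbb {E}_{q_\phi (\textbf{s})}\left [ \log q_\phi (\textbf{s}) - \log p_\theta (\textbf{o} \mid \textbf{s}) - \log p_\theta (\textbf{s}) \right ] \\
&= \mathbb {E}_{q_\phi (\textbf{s})}\left [ -\log p_\theta (\textbf{o} \mid \textbf{s}) \right ] + D_{\mathrm {KL}}\left [ q_\phi (\textbf{s}) \,\middle \|\, p_\theta (\textbf{s}) \right ],
\end{aligned}
\]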
as required.
Proposition 2 (KL divergence of the optimal feedback controller from the feedforward controller)
The exclusive Kullback-Leibler divergence of the optimal feedback controller \(q^{*}\) from the feedforward generative model \(p_\theta \) is
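The displayed equation is missing from the extraction. Assuming Eq. 21 has the standard linearly solvable (KL-control) form \(q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) \propto p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) \exp \left ( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right )\), the divergence would read

\[
D_{\mathrm {KL}}\left [ q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) \,\middle \|\, p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) \right ]
= -\,\mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})}\left [ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ]
- \log \mathbb {E}_{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})}\left [ \exp \left ( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ) \right ].
\]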
Proof
We begin by writing out the definition of a KL divergence
The definition of \(q^{*}\) in terms of \(p_\theta \) (Eq. 21) allows the inner ratio of densities to simplify to
This simplified ratio therefore has the logarithm
and the divergence becomes
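The displayed equations in this proof did not survive extraction. Under the assumed form of Eq. 21, \(q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) = p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) \exp \left ( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ) / Z(\textbf{s}_{t})\) with normalizer \(Z(\textbf{s}_{t}) = \mathbb {E}_{p_\theta }\left [ \exp \left ( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ) \right ]\), the chain of steps above would read

\[
\begin{aligned}
D_{\mathrm {KL}}\left [ q^{*} \,\middle \|\, p_\theta \right ]
&= \mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})}\left [ \log \frac{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})}{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})} \right ], \qquad
\frac{q^{*}}{p_\theta } = \frac{\exp \left ( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right )}{Z(\textbf{s}_{t})}, \\
\log \frac{q^{*}}{p_\theta } &= -\tilde{H}^{*}(t+1; \textbf{s}_{0}) - \log Z(\textbf{s}_{t}), \\
D_{\mathrm {KL}}\left [ q^{*} \,\middle \|\, p_\theta \right ] &= -\,\mathbb {E}_{q^{*}}\left [ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ] - \log Z(\textbf{s}_{t}).
\end{aligned}
\]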
Proposition 3 (Path-integral expression for the optimal differential surprise-to-go)
The optimal differential surprise-to-go function defined by the Bellman equation (Eq. 20)
can be simplified by substituting in \(q^{*}\) to obtain a path-integral expression
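The displayed equations here are missing. As a hedged sketch only: in the standard average-cost KL-control setting, writing \(\ell (\textbf{s}_{t})\) for the per-step surprise and \(\lambda \) for its long-run average (both hypothetical symbols standing in for the paper's Eq. 14 cost), the Bellman equation, its soft-minimized form, and the unrolled path integral would take the shape

\[
\begin{aligned}
\tilde{H}^{*}(t; \textbf{s}_{0}) &= \ell (\textbf{s}_{t}) - \lambda + \min _{q} \left \{ D_{\mathrm {KL}}\left [ q(\textbf{s}_{t+1} \mid \textbf{s}_{t}) \,\middle \|\, p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) \right ] + \mathbb {E}_{q}\left [ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ] \right \} \\
&= \ell (\textbf{s}_{t}) - \lambda - \log \mathbb {E}_{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})}\left [ \exp \left ( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ) \right ] \\
&= \ell (\textbf{s}_{t}) - \lambda - \log \mathbb {E}_{p_\theta }\left [ \exp \left ( -\sum _{\tau > t} \left ( \ell (\textbf{s}_{\tau }) - \lambda \right ) \right ) \right ].
\end{aligned}
\]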
Proof
Substituting Eq. 21 into Eq. 20 yields
whose recursive term is \(\mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right] \). The divergence term in \(\mathcal {J}\) (Eq. 14) cancels this term. By Proposition 2, the divergence equals
Substituting Eq. 25 into Eq. 14 yields
whose first term cancels the recursive optimization when substituted into Eq. 26. The result is a “smoothly minimizing” expression for the optimal differential surprise-to-go
and after unfolding of the recursive expectation, a path-integral expression for the optimal differential surprise-to-go
Sampling a trajectory of states from a feedback controller \(q_\phi \) instead of the feedforward planner \(p_\theta \) then results in a nonzero divergence term
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Sennesh, E., Theriault, J., van de Meent, JW., Barrett, L.F., Quigley, K. (2023). Deriving Time-Averaged Active Inference from Control Principles. In: Buckley, C.L., et al. Active Inference. IWAI 2022. Communications in Computer and Information Science, vol 1721. Springer, Cham. https://doi.org/10.1007/978-3-031-28719-0_25
Print ISBN: 978-3-031-28718-3
Online ISBN: 978-3-031-28719-0