Abstract
Active inference offers a principled account of behavior as minimizing average sensory surprise over time. Applications of active inference to control problems have heretofore tended to focus on finite-horizon or discounted-surprise problems, despite deriving from the infinite-horizon, average-surprise imperative of the free-energy principle. Here we derive an infinite-horizon, average-surprise formulation of active inference from optimal control principles. Our formulation returns to the roots of active inference in neuroanatomy and neurophysiology, formally reconnecting active inference to optimal feedback control. Our formulation provides a unified objective functional for sensorimotor control and allows for reference states to vary over time.
A Detailed Derivations
This appendix provides detailed derivations for equations used in the main text, in cases where including them inline would have disrupted the flow of the paper.
Proposition 1 (Variational free energy as divergence from an unnormalized joint distribution)
The variational free energy (Eq. 9) is defined as the Kullback-Leibler divergence of the recognition model \(q_\phi \) from the unnormalized joint distribution of the generative model \(p_\theta \)
and therefore equals a sum of the cross entropy between the recognition model and the sensory likelihood and the exclusive KL divergence from the recognition model to the generative model over the latent variables
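The displayed equations for this proposition did not survive extraction. Writing \(\textbf{o}\) for sensory observations and \(\textbf{s}\) for latent states (assumed symbols; the paper's originals are unavailable here), the statement takes the standard form

\[
\mathcal {F}(\phi , \theta ) = D_{\mathrm {KL}}\left [ q_\phi (\textbf{s}) \,\middle \|\, p_\theta (\textbf{o}, \textbf{s}) \right ] = \mathbb {E}_{q_\phi (\textbf{s})}\left [ -\log p_\theta (\textbf{o} \mid \textbf{s}) \right ] + D_{\mathrm {KL}}\left [ q_\phi (\textbf{s}) \,\middle \|\, p_\theta (\textbf{s}) \right ].
\]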
Proof
Taking the divergence between the (normalized) recognition model and the (unnormalized) joint generative model yields
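A minimal sketch of the algebra, under the same assumed notation (\(\textbf{o}\) for observations, \(\textbf{s}\) for latent states):

\[
\begin{aligned}
D_{\mathrm {KL}}\left [ q_\phi (\textbf{s}) \,\middle \|\, p_\theta (\textbf{o}, \textbf{s}) \right ]
&= \mathbb {E}_{q_\phi (\textbf{s})}\left [ \log q_\phi (\textbf{s}) - \log p_\theta (\textbf{o} \mid \textbf{s}) - \log p_\theta (\textbf{s}) \right ] \\
&= \mathbb {E}_{q_\phi (\textbf{s})}\left [ -\log p_\theta (\textbf{o} \mid \textbf{s}) \right ] + D_{\mathrm {KL}}\left [ q_\phi (\textbf{s}) \,\middle \|\, p_\theta (\textbf{s}) \right ],
\end{aligned}
\]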
as required.
Proposition 2 (KL divergence of the optimal feedback controller from the feedforward controller)
The exclusive Kullback-Leibler divergence of the optimal feedback controller \(q^{*}\) from the feedforward generative model \(p_\theta \) is
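The displayed equation is missing from the extraction. Assuming Eq. 21 has the standard linearly solvable (KL-control) form \(q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) \propto p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) \exp \left ( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right )\), the divergence would read

\[
D_{\mathrm {KL}}\left [ q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) \,\middle \|\, p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) \right ]
= -\,\mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})}\left [ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ]
- \log \mathbb {E}_{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})}\left [ \exp \left ( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ) \right ].
\]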
Proof
We begin by writing out the definition of a KL divergence
The definition of \(q^{*}\) in terms of \(p_\theta \) (Eq. 21) allows the inner ratio of densities to simplify to
This simplified ratio therefore has the logarithm
and the divergence becomes
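The displayed equations in this proof did not survive extraction. Under the assumed form of Eq. 21, \(q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) = p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) \exp \left ( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ) / Z(\textbf{s}_{t})\) with normalizer \(Z(\textbf{s}_{t}) = \mathbb {E}_{p_\theta }\left [ \exp \left ( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ) \right ]\), the chain of steps above would read

\[
\begin{aligned}
D_{\mathrm {KL}}\left [ q^{*} \,\middle \|\, p_\theta \right ]
&= \mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})}\left [ \log \frac{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})}{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})} \right ], \qquad
\frac{q^{*}}{p_\theta } = \frac{\exp \left ( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right )}{Z(\textbf{s}_{t})}, \\
\log \frac{q^{*}}{p_\theta } &= -\tilde{H}^{*}(t+1; \textbf{s}_{0}) - \log Z(\textbf{s}_{t}), \\
D_{\mathrm {KL}}\left [ q^{*} \,\middle \|\, p_\theta \right ] &= -\,\mathbb {E}_{q^{*}}\left [ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ] - \log Z(\textbf{s}_{t}).
\end{aligned}
\]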
Proposition 3 (Path-integral expression for the optimal differential surprise-to-go)
The optimal differential surprise-to-go function defined by the Bellman equation (Eq. 20)
can be simplified by substituting in \(q^{*}\) to obtain a path-integral expression
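The displayed equations here are missing. As a hedged sketch only: in the standard average-cost KL-control setting, writing \(\ell (\textbf{s}_{t})\) for the per-step surprise and \(\lambda \) for its long-run average (both hypothetical symbols standing in for the paper's Eq. 14 cost), the Bellman equation, its soft-minimized form, and the unrolled path integral would take the shape

\[
\begin{aligned}
\tilde{H}^{*}(t; \textbf{s}_{0}) &= \ell (\textbf{s}_{t}) - \lambda + \min _{q} \left \{ D_{\mathrm {KL}}\left [ q(\textbf{s}_{t+1} \mid \textbf{s}_{t}) \,\middle \|\, p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) \right ] + \mathbb {E}_{q}\left [ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ] \right \} \\
&= \ell (\textbf{s}_{t}) - \lambda - \log \mathbb {E}_{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})}\left [ \exp \left ( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right ) \right ] \\
&= \ell (\textbf{s}_{t}) - \lambda - \log \mathbb {E}_{p_\theta }\left [ \exp \left ( -\sum _{\tau > t} \left ( \ell (\textbf{s}_{\tau }) - \lambda \right ) \right ) \right ].
\end{aligned}
\]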
Proof
Substituting Eq. 21 into Eq. 20 yields
whose recursive term is \(\mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right] \). The divergence term in \(\mathcal {J}\) (Eq. 14) cancels this term. By Proposition 2, the divergence equals
Substituting Eq. 25 into Eq. 14 yields
whose first term cancels the recursive optimization when substituted into Eq. 26. The result is a “smoothly minimizing” expression for the optimal differential surprise-to-go
and after unfolding of the recursive expectation, a path-integral expression for the optimal differential surprise-to-go
Sampling a trajectory of states from a feedback controller \(q_\phi \) instead of the feedforward planner \(p_\theta \) then results in a nonzero divergence term
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Sennesh, E., Theriault, J., van de Meent, JW., Barrett, L.F., Quigley, K. (2023). Deriving Time-Averaged Active Inference from Control Principles. In: Buckley, C.L., et al. Active Inference. IWAI 2022. Communications in Computer and Information Science, vol 1721. Springer, Cham. https://doi.org/10.1007/978-3-031-28719-0_25
Print ISBN: 978-3-031-28718-3
Online ISBN: 978-3-031-28719-0