Deriving Time-Averaged Active Inference from Control Principles

  • Conference paper
Active Inference (IWAI 2022)

Abstract

Active inference offers a principled account of behavior as minimizing average sensory surprise over time. Applications of active inference to control problems have heretofore tended to focus on finite-horizon or discounted-surprise problems, despite deriving from the infinite-horizon, average-surprise imperative of the free-energy principle. Here we derive an infinite-horizon, average-surprise formulation of active inference from optimal control principles. Our formulation returns to the roots of active inference in neuroanatomy and neurophysiology, formally reconnecting active inference to optimal feedback control. Our formulation provides a unified objective functional for sensorimotor control and allows for reference states to vary over time.



Author information

Correspondence to Eli Sennesh.


A Detailed Derivations

This appendix provides detailed derivations of equations used in the main text, where including them inline would have distracted from the flow of the paper.

Proposition 1 (Variational free energy as divergence from an unnormalized joint distribution)

The variational free energy (Eq. 9) is defined as the Kullback-Leibler divergence of the recognition model \(q_\phi \) from the unnormalized joint distribution of the generative model \(p_\theta \)

$$\begin{aligned} \mathcal {F}_{\theta , \phi }(t)&= D_{\text {KL}}\left( q_\phi (s^{(1:2)}_{t}, a^{(1:2)}_{t} \mid o_{t}, \textbf{s}_{t+1}, \textbf{s}_{t-1}) \Vert p_\theta (\textbf{s}_{t} \mid \textbf{s}_{t-1}) \right) , \end{aligned}$$

and therefore equals the sum of two terms: the cross entropy between the recognition model and the sensory likelihood, and the exclusive KL divergence from the recognition model to the generative model over the latent variables

$$\begin{aligned} \mathcal {F}_{\theta , \phi }(t) ={}& \mathbb {E}_{q_\phi } \left[ - \log p_{\theta }(o_t \mid a^{(1)}_t, s^{(1)}_t) \right] \\ &\quad + D_{\text {KL}}\left( q_\phi (s^{(1:2)}_{t}, a^{(1:2)}_{t} \mid o_{t}, \textbf{s}_{t+1}, \textbf{s}_{t-1}) \Vert p_{\theta }(s^{(1:2)}_t, a^{(1:2)}_t \mid \textbf{s}_{t-1}) \right) . \end{aligned}$$

Proof

Taking a divergence between the (normalized) recognition model and the (unnormalized) joint generative model will yield

$$\begin{aligned} \mathcal {F}_{\theta , \phi }(t)&= D_{\text {KL}}\left( q_\phi (s^{(1:2)}_{t}, a^{(1:2)}_{t} \mid o_{t}, \textbf{s}_{t+1}, \textbf{s}_{t-1}) \Vert p_\theta (\textbf{s}_{t} \mid \textbf{s}_{t-1}) \right) \\&= \mathbb {E}_{ q_\phi (s^{(1:2)}_{t}, a^{(1:2)}_{t} \mid o_{t}, \textbf{s}_{t+1}, \textbf{s}_{t-1}) } \left[ -\log \frac{ p_\theta (\textbf{s}_{t} \mid \textbf{s}_{t-1}) }{ q_\phi (s^{(1:2)}_{t}, a^{(1:2)}_{t} \mid o_{t}, \textbf{s}_{t+1}, \textbf{s}_{t-1}) } \right] \\&= \mathbb {E}_{ q_\phi (s^{(1:2)}_{t}, a^{(1:2)}_{t} \mid o_{t}, \textbf{s}_{t+1}, \textbf{s}_{t-1}) } \left[ - \log \frac{ p_{\theta }(o_t \mid a^{(1)}_t, s^{(1)}_t) p_{\theta }(s^{(1:2)}_t, a^{(1:2)}_t \mid \textbf{s}_{t-1}) }{ q_\phi (s^{(1:2)}_{t}, a^{(1:2)}_{t} \mid o_{t}, \textbf{s}_{t+1}, \textbf{s}_{t-1}) } \right] \\&= \mathbb {E}_{q_\phi } \left[ - \log p_{\theta }(o_t \mid a^{(1)}_t, s^{(1)}_t) \right] - \mathbb {E}_{ q_\phi } \left[ \log \frac{ p_{\theta }(s^{(1:2)}_t, a^{(1:2)}_t \mid \textbf{s}_{t-1}) }{ q_\phi (s^{(1:2)}_{t}, a^{(1:2)}_{t} \mid o_{t}, \textbf{s}_{t+1}, \textbf{s}_{t-1}) } \right] , \end{aligned}$$

as required.
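As a sanity check (not part of the original derivation), the identity can be verified numerically on a toy discrete model: the divergence from the unnormalized joint equals the expected negative sensory log-likelihood plus the KL divergence over the latent variables. The distributions and variable names below are illustrative assumptions, not the paper's generative model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete setting: one latent index z (states/actions collapsed), one fixed
# observation o. All densities here are illustrative, not the paper's model.
n_latent = 6
prior = rng.dirichlet(np.ones(n_latent))          # stands in for p_theta(z | s_{t-1})
lik_o_given_z = rng.uniform(0.1, 0.9, n_latent)   # stands in for p_theta(o | z), o fixed
q = rng.dirichlet(np.ones(n_latent))              # stands in for the recognition model q_phi

# Unnormalized generative joint over z for the fixed observation o
joint = lik_o_given_z * prior

# Left-hand side: KL divergence of q from the *unnormalized* joint
free_energy = np.sum(q * (np.log(q) - np.log(joint)))

# Right-hand side: expected negative log-likelihood + KL(q || prior) over the latents
cross_entropy = np.sum(q * (-np.log(lik_o_given_z)))
kl_latent = np.sum(q * (np.log(q) - np.log(prior)))

assert np.isclose(free_energy, cross_entropy + kl_latent)
print(free_energy, cross_entropy + kl_latent)
```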

Proposition 2 (KL divergence of the optimal feedback controller from the feedforward controller)

The exclusive Kullback-Leibler divergence of the optimal feedback controller \(q^{*}\) from the feedforward generative model \(p_\theta \) is

$$\begin{aligned} D_{\text {KL}}\left( q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) \Vert p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) \right) ={}& -\mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right] \nonumber \\ &\quad - \log \mathbb {E}_{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \exp {\left( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right) } \right] . \end{aligned}$$
(25)

Proof

We begin by writing out the definition of a KL divergence

$$\begin{aligned} D_{\text {KL}}\left( q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) \Vert p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) \right)&= \mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ -\log \frac{ p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) }{ q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) } \right] . \end{aligned}$$

The definition of \(q^{*}\) in terms of \(p_\theta \) (Eq. 21) allows the inner ratio of densities to simplify to

$$\begin{aligned} \frac{ p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) }{ q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) }&= \frac{ \mathbb {E}_{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \exp {\left( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right) } \right] }{ \exp {\left( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right) } } . \end{aligned}$$

This simplified ratio therefore has the logarithm

$$\begin{aligned} \log \frac{ p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) }{ q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) }&= \log \mathbb {E}_{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \exp {\left( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right) } \right] + \tilde{H}^{*}(t+1; \textbf{s}_{0}) \end{aligned}$$

and the divergence becomes

$$\begin{aligned} D_{\text {KL}}\left( q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) \Vert p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) \right) ={}& -\mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right] \\ &\quad - \log \mathbb {E}_{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \exp {\left( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right) } \right] . \end{aligned}$$

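The same identity is easy to confirm numerically. The sketch below assumes only the exponential-tilting form of \(q^{*}\) implied by the proof (Eq. 21 is not reproduced in this appendix); the next-state distribution and surprise-to-go values are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy discrete next-state distribution and an arbitrary surprise-to-go vector.
# p_next stands in for p_theta(s_{t+1} | s_t); H for H~*(t+1; s_0). Illustrative only.
n_states = 8
p_next = rng.dirichlet(np.ones(n_states))
H = rng.normal(size=n_states)

# Optimal feedback controller as an exponential tilting of the feedforward model
weights = p_next * np.exp(-H)
q_star = weights / weights.sum()

# Direct evaluation of the KL divergence
kl_direct = np.sum(q_star * (np.log(q_star) - np.log(p_next)))

# Proposition 2's expression (Eq. 25)
kl_prop2 = -np.sum(q_star * H) - np.log(np.sum(p_next * np.exp(-H)))

assert np.isclose(kl_direct, kl_prop2)
print(kl_direct, kl_prop2)
```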
Proposition 3 (Path-integral expression for the optimal differential surprise-to-go)

The optimal differential surprise-to-go function defined by the Bellman equation (Eq. 20)

$$\begin{aligned} \tilde{H}^{*}(t; \textbf{s}_{0})&= h(t; \textbf{s}_{0}) + \min _{q_\phi } \mathbb {E}_{\textbf{s}_{t+1}\sim q_\phi (\cdot \mid \textbf{s}_{t})} \left[ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right] \end{aligned}$$

can be simplified by substituting in \(q^{*}\) to obtain a path-integral expression

$$\begin{aligned} \tilde{H}^{*}(\textbf{s}_{0})&= -\log \mathbb {E}_{ p_\theta (\textbf{s}_{1:T} \mid \textbf{s}_{0}) } \left[ \exp {\left( \sum _{t=1}^{T} \left( J(\textbf{s}_{t}) + L(\textbf{s}_{t}) \right) - \bar{\mathcal {J}}(\textbf{s}_{0}) \right) } \right] , \\&= -\log \mathbb {E}_{ q_\phi (\textbf{s}_{1:T} \mid \textbf{s}_{0}) } \left[ \exp {\left( \sum _{t=1}^{T} \mathcal {J}_{\theta ,\phi }(t) - \bar{\mathcal {J}}(\textbf{s}_{0}) \right) } \right] . \end{aligned}$$

Proof

Substituting Eq. 21 into Eq. 20 yields

$$\begin{aligned} \tilde{H}^{*}(t; \textbf{s}_{0})&= \bar{\mathcal {J}}(\textbf{s}_{0}) - \mathcal {J}_{\theta ,\phi }(t) + \mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right] , \end{aligned}$$
(26)

whose recursive term is \(\mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right] \). The divergence term in \(\mathcal {J}\) (Eq. 14) will cancel this term. By Proposition 2 the divergence equals

$$\begin{aligned} D_{\text {KL}}\left( q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t}) \Vert p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t}) \right) ={}& -\mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right] \\ &\quad - \log \mathbb {E}_{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \exp {\left( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right) } \right] . \end{aligned}$$

Substituting Eq. 25 into Eq. 14 will yield

$$\begin{aligned} -\mathcal {J}_{\theta ,\phi }(t) ={}& \mathbb {E}_{q^{*}(\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \tilde{H}^{*}(t+1; \textbf{s}_{0}) \right] + \log \mathbb {E}_{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \exp {\left( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right) } \right] \\ &\quad + \mathbb {E}_{q_\phi } \left[ -J(\textbf{s}_{t}) \right] + \mathbb {E}_{q_\phi } \left[ -L(\textbf{s}_{t}) \right] , \end{aligned}$$

whose first term will cancel the recursive optimization when substituted into Eq. 26. The result will be a “smoothly minimizing” expression for the optimal differential surprise-to-go

$$\begin{aligned} \tilde{H}^{*}(t; \textbf{s}_{0}) ={}& \bar{\mathcal {J}}(\textbf{s}_{0}) - \left( J(\textbf{s}_{t}) + L(\textbf{s}_{t})\right) \\ &\quad - \log \mathbb {E}_{p_\theta (\textbf{s}_{t+1} \mid \textbf{s}_{t})} \left[ \exp {\left( -\tilde{H}^{*}(t+1; \textbf{s}_{0}) \right) } \right] , \end{aligned}$$

and after unfolding of the recursive expectation, a path-integral expression for the optimal differential surprise-to-go

$$\begin{aligned} \tilde{H}^{*}(\textbf{s}_{0})&= -\log \mathbb {E}_{ p_\theta (\textbf{s}_{1:T} \mid \textbf{s}_{0}) } \left[ \exp {\left( \sum _{t=1}^{T} \left( J(\textbf{s}_{t}) + L(\textbf{s}_{t}) \right) - \bar{\mathcal {J}}(\textbf{s}_{0}) \right) } \right] . \end{aligned}$$

Sampling a trajectory of states from a feedback controller \(q_\phi \) instead of the feedforward planner \(p_\theta \) will then result in a nonzero divergence term

$$\begin{aligned} \tilde{H}^{*}(\textbf{s}_{0})&= -\log \mathbb {E}_{ q_\phi (\textbf{s}_{1:T} \mid \textbf{s}_{0}) } \left[ \exp {\left( \sum _{t=1}^{T} \mathcal {J}_{\theta ,\phi }(t) - \bar{\mathcal {J}}(\textbf{s}_{0}) \right) } \right] . \end{aligned}$$
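The unrolling step can likewise be checked numerically on a tiny Markov chain: a backward log-expectation recursion with a generic per-step differential surprise \(c(\textbf{s}_t)\) (standing in for \(J + L\) relative to \(\bar{\mathcal {J}}\); its exact form follows Eq. 14 in the main text, which is not reproduced here) coincides with an exhaustive path integral over trajectories drawn from the feedforward model. Everything below is an illustrative sketch under those assumptions, not the paper's implementation.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)

# Tiny Markov chain standing in for p_theta(s_{t+1} | s_t); c[s] stands in for the
# per-step differential surprise at state s. Both are illustrative placeholders.
n_states, T = 3, 4
P = rng.dirichlet(np.ones(n_states), size=n_states)   # P[s, s'] transition matrix
c = rng.normal(size=n_states)                          # per-step cost c(s)
s0 = 0

# Backward recursion: V_t(s) = c(s) - log E_{s'|s}[exp(-V_{t+1}(s'))], with V_{T+1} = 0
V = np.zeros(n_states)
for t in range(T, 0, -1):
    V = c - np.log(P @ np.exp(-V))

# Differential surprise-to-go at s0: one more log-expectation step, no cost at s0
H_recursive = -np.log(P[s0] @ np.exp(-V))

# Exhaustive path integral over all trajectories s_{1:T} starting from s0
total = 0.0
for path in product(range(n_states), repeat=T):
    prob, prev = 1.0, s0
    for s in path:
        prob *= P[prev, s]
        prev = s
    total += prob * np.exp(-sum(c[s] for s in path))
H_path = -np.log(total)

assert np.isclose(H_recursive, H_path)
print(H_recursive, H_path)
```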


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sennesh, E., Theriault, J., van de Meent, JW., Barrett, L.F., Quigley, K. (2023). Deriving Time-Averaged Active Inference from Control Principles. In: Buckley, C.L., et al. Active Inference. IWAI 2022. Communications in Computer and Information Science, vol 1721. Springer, Cham. https://doi.org/10.1007/978-3-031-28719-0_25


  • DOI: https://doi.org/10.1007/978-3-031-28719-0_25


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28718-3

  • Online ISBN: 978-3-031-28719-0

