
Unsupervised Real-Time Control Through Variational Empowerment

  • Conference paper
  • Robotics Research (ISRR 2019)

Abstract

Intrinsic motivation is vital for living beings. It enables skill acquisition, triggers explorative behaviour, and thereby enhances cognitive capabilities. One way of formalising the variety of behaviours induced by intrinsic motivation is empowerment, an information-theoretic measure of the influence an agent exerts on its environment. Formally, empowerment is the maximum mutual information between actions and the resulting states, a quantity that is prohibitively hard to compute, especially in nonlinear continuous spaces. In this work, we introduce a method for efficiently computing a lower bound on empowerment, enabling its use as an unsupervised cost function for real-time control. We demonstrate that our algorithm can reliably handle continuous dynamical systems even when the system dynamics are learnt from raw data. The resulting empowerment-maximising policies consistently drive the agents into states with high potential impact.
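The abstract's central quantity can be made concrete. Empowerment at a state s is the channel capacity from actions to successor states, E(s) = max over the action source ω of I(A; S' | s), and a tractable lower bound of the variational (Barber-Agakov) form is I(A; S' | s) >= E[log q(a | s, s') - log ω(a)], where q is an approximate "planning" distribution over the action given the observed transition. The following is a minimal Monte Carlo sketch of that bound; the linear toy dynamics and the hand-picked Gaussian q are illustrative assumptions, not the paper's learnt models:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_logpdf(x, mean, std):
    # Log-density of a diagonal Gaussian, summed over the last axis.
    return np.sum(-0.5 * ((x - mean) / std) ** 2
                  - np.log(std) - 0.5 * np.log(2 * np.pi), axis=-1)

def empowerment_lower_bound(s, n_samples=10_000):
    """Monte Carlo estimate of the variational empowerment bound at state s."""
    # Action source omega(a): a zero-mean unit-variance Gaussian.
    src_std = 1.0
    a = rng.normal(0.0, src_std, size=(n_samples, 1))
    # Toy dynamics (illustrative): s' = s + a + small transition noise.
    s_next = s + a + rng.normal(0.0, 0.1, size=a.shape)
    # Hand-picked variational planning distribution q(a | s, s'):
    # a Gaussian centred on the action that best explains the transition.
    q_logp = gaussian_logpdf(a, s_next - s, 0.15)
    src_logp = gaussian_logpdf(a, 0.0, src_std)
    # I(A; S' | s) >= E[log q(a | s, s') - log omega(a)]
    return float(np.mean(q_logp - src_logp))

bound = empowerment_lower_bound(s=0.0)
```

On this toy channel the analytic capacity is 0.5 * log(1 + 1.0 / 0.01), roughly 2.31 nats, and the estimate should come out somewhat below it, as a lower bound must. In the paper's setting, ω and q are instead parameterised and optimised jointly, and the dynamics are themselves learnt from raw data.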


Notes

  1. We adopted the term source from the channel capacity literature.

  2. https://developer.nvidia.com/physx-sdk.


Author information


Correspondence to Maximilian Karl.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (ppt 355 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Karl, M., Becker-Ehmck, P., Soelch, M., Benbouzid, D., van der Smagt, P., Bayer, J. (2022). Unsupervised Real-Time Control Through Variational Empowerment. In: Asfour, T., Yoshida, E., Park, J., Christensen, H., Khatib, O. (eds) Robotics Research. ISRR 2019. Springer Proceedings in Advanced Robotics, vol 20. Springer, Cham. https://doi.org/10.1007/978-3-030-95459-8_10

