
Deep Active Inference for Pixel-Based Discrete Control: Evaluation on the Car Racing Problem

  • Conference paper
  • In: Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021)

Abstract

Despite the potential of active inference for vision-based control, learning the model and the preferences (priors) while interacting with the environment is challenging. Here, we study the performance of a deep active inference (dAIF) agent on OpenAI's car racing benchmark, where there is no access to the car's state. The agent learns to encode the world's state from high-dimensional input through unsupervised representation learning. State inference and control are learned end-to-end by optimizing the expected free energy. Results show that our model achieves performance comparable to deep Q-learning. However, vanilla dAIF does not reach state-of-the-art performance compared to other world-model approaches. Hence, we discuss the limitations of the current implementation and potential architectures to overcome them.
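As a rough illustration of the pipeline described in the abstract, and not the authors' exact architecture, the following PyTorch-style sketch shows how a convolutional encoder can map pixel observations to a Gaussian latent state and how a separate network can score a discretised action set by expected free energy. All layer sizes, module names, and the number of actions are illustrative assumptions.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Maps a stack of pixel frames to the parameters of a Gaussian latent state q(s_k).

    Layer sizes and the latent dimension are illustrative, not the paper's exact values.
    """
    def __init__(self, in_channels=4, latent_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.mu = nn.LazyLinear(latent_dim)      # mean of q(s_k)
        self.logvar = nn.LazyLinear(latent_dim)  # log-variance of q(s_k)

    def forward(self, obs):
        h = self.conv(obs)
        return self.mu(h), self.logvar(h)


class EFENetwork(nn.Module):
    """Estimates the expected free energy of each discrete action from a latent state."""
    def __init__(self, latent_dim=32, n_actions=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, s):
        return self.net(s)


def select_action(encoder, efe_net, obs):
    """Sample a latent state with the reparameterisation trick and pick the action
    with the lowest estimated expected free energy."""
    mu, logvar = encoder(obs)
    s = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    return efe_net(s).argmin(dim=-1)
```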


Notes

  1. The code can be found at https://github.com/NTAvanHoeffelen/DAIF_CarRacing.

  2. The full derivation can be found in Appendix D.


Author information

Correspondence to N. T. A. van Hoeffelen.

Appendices

A Model Parameters

Table 2. General parameters
Table 3. DQN parameters
Table 4. dAIF parameters

B DQN: Policy Network

Table 5. Layers DQN policy network

C VAE

Table 6. VAE layers

D Derivations

Derivation of the EFE for a single time step:

$$\begin{aligned} G(s_k, o_k)&= \mathbf {KL}\left[q(s_k) \mid \mid p(s_k, o_k)\right] \\&= \int q(s_k) \ln \frac{q(s_k)}{p(s_k,o_k)} \, ds_k \\&= \int q(s_k) \left[\ln {q(s_k)} - \ln {p(s_k,o_k)}\right] ds_k \\&= \int q(s_k) \left[\ln {q(s_k)} - \ln {p(s_k \mid o_k)} - \ln {p(o_k)}\right] ds_k \\&\approx \int q(s_k) \left[\ln {q(s_k)} - \ln {q(s_k \mid o_k)} - \ln {p(o_k)}\right] ds_k \\&\approx - \ln {p(o_k)} + \int q(s_k) \left[\ln {q(s_k)} - \ln {q(s_k \mid o_k)}\right] ds_k \\&\approx - \ln {p(o_k)} + \int q(s_k) \ln {\frac{q(s_k)}{q(s_k \mid o_k)}} \, ds_k \\&\approx - \ln {p(o_k)} + \mathbf {KL}\left[q(s_k) \mid \mid q(s_k \mid o_k)\right] \\&\approx -r(o_k) + \mathbf {KL}\left[q(s_k) \mid \mid q(s_k \mid o_k)\right] \end{aligned}$$
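The last line is the quantity the agent estimates at each step: a reward term plus a KL term between the state prior and the observation-conditioned posterior. As a minimal sketch of how this single-step EFE could be evaluated, assuming both densities are modelled as diagonal Gaussians (an illustrative choice, not a claim about the authors' code), the function names below are hypothetical:

```python
import torch


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL divergence KL[q || p] between two diagonal Gaussians, summed over latent dims."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    return 0.5 * torch.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0,
        dim=-1,
    )


def efe_single_step(reward, mu_prior, logvar_prior, mu_post, logvar_post):
    """G(s_k, o_k) ≈ -r(o_k) + KL[q(s_k) || q(s_k | o_k)]."""
    return -reward + gaussian_kl(mu_prior, logvar_prior, mu_post, logvar_post)
```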

E Average Reward Over 100 Episodes

Fig. 5. Average reward over 100 test episodes for DQN and dAIF. The bright lines show the mean over episodes; the transparent lines show the reward obtained in each individual episode.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

van Hoeffelen, N.T.A., Lanillos, P. (2021). Deep Active Inference for Pixel-Based Discrete Control: Evaluation on the Car Racing Problem. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham. https://doi.org/10.1007/978-3-030-93736-2_60

  • DOI: https://doi.org/10.1007/978-3-030-93736-2_60

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93735-5

  • Online ISBN: 978-3-030-93736-2

