Abstract
We propose an embodied system based on the free energy principle (FEP) for sensorimotor visual perception (SMVP). Although the FEP mathematically describes the rules that living things obey, modeling SMVP additionally requires the constraints imposed by embodiment. The proposed system consists of a body, which partially observes the environment, and a memory, which retains classified knowledge about the environment as a generative model; on this basis, the system performs active and perceptual inference. Evaluation on the MNIST dataset showed that the proposed system recognizes characters through active and perceptual inference, and that an intentionality corresponding to human confirmation bias is reproduced in the system.
Acknowledgements
The authors thank Dr. Qinghua Sun from Hitachi Ltd. for his constructive comments and suggestions for improving this paper.
Appendix
The generative model described in this paper is a combination of a variational autoencoder (VAE) and a fully connected neural network (FNN). The architecture is shown in Fig. 7. The encoder consists of four 2D convolutional layers, each followed by batch normalization and a rectified linear unit. The bottleneck consists of two linear transformation layers that compute the mean and the variance, with a reparameterization function. The decoder consists of four 2D transposed convolutional layers, each followed by batch normalization and a rectified linear unit (a sigmoid unit for the last layer). The classifier consists of a linear transformation layer followed by a rectified linear unit and a linear transformation layer followed by a softmax unit. The model was trained with the Adam optimizer (learning rate: 0.001) on the sum of the VAE loss and the FNN loss.
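The architecture above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the channel counts, latent dimension (16), classifier width (64), and the choice to classify from the latent mean are all assumptions, since Fig. 7 is not reproduced here.

```python
import torch
import torch.nn as nn

class VAEWithClassifier(nn.Module):
    """Sketch of the generative model: a convolutional VAE plus an FNN
    classifier on the latent code. Layer sizes are illustrative assumptions."""

    def __init__(self, latent_dim=16, n_classes=10):
        super().__init__()
        # Encoder: four 2D conv layers, each followed by batch norm + ReLU.
        def block(c_in, c_out, k, s, p):
            return nn.Sequential(nn.Conv2d(c_in, c_out, k, s, p),
                                 nn.BatchNorm2d(c_out), nn.ReLU())
        self.encoder = nn.Sequential(
            block(1, 32, 4, 2, 1),     # 28 -> 14
            block(32, 64, 4, 2, 1),    # 14 -> 7
            block(64, 128, 3, 2, 1),   # 7  -> 4
            block(128, 256, 3, 2, 1),  # 4  -> 2
            nn.Flatten())
        # Bottleneck: two linear layers for the mean and log-variance.
        self.fc_mu = nn.Linear(256 * 2 * 2, latent_dim)
        self.fc_logvar = nn.Linear(256 * 2 * 2, latent_dim)
        # Decoder: four transposed conv layers, BN + ReLU (sigmoid on the last).
        self.fc_dec = nn.Linear(latent_dim, 256 * 2 * 2)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),  # 2 -> 4
            nn.ConvTranspose2d(128, 64, 3, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),    # 4 -> 7
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),     # 7 -> 14
            nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Sigmoid())                       # 14 -> 28
        # Classifier FNN: linear + ReLU, then linear + softmax.
        self.classifier = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, n_classes), nn.Softmax(dim=1))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization: z = mu + sigma * eps.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.decoder(self.fc_dec(z).view(-1, 256, 2, 2))
        return recon, mu, logvar, self.classifier(mu)
```

Training would then minimize the sum of the VAE loss (reconstruction plus KL term) and the FNN classification loss with `torch.optim.Adam(model.parameters(), lr=0.001)`, matching the optimizer settings stated above.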
Algorithm 1 shows the pseudocode of the processing flow. The generative model \({p}_{{a}_{t-1}}\left({s}_{t},{x}_{t}\right)\) is pre-trained on training pairs \(\left({s}_{t},{x}_{t}\right)\). All training images \({s}_{t}\) are pre-processed so that the center of gravity of each image is shifted to the center position. During operation of the proposed embodied system, the process from line 2 to line 12 is repeated. First, an attention image \(s^{\prime}_{t}\) is obtained from the vision sensor. The past sensory input images are composed with the obtained \(s^{\prime}_{t}\) while maintaining each relative attention position. The center of gravity of the composed image is calculated, and the composed image is shifted so that the center of gravity lies at the center of the image; the shifted composed image is \({s}_{t}\). Then, \(q\left({x}_{t}|{\phi }_{{x}_{t}}\right)\) is calculated by inputting \({s}_{t}\) into \({p}_{{a}_{t-1}}\left({s}_{t},{x}_{t}\right)\). After that, the sub-function starting at line 14 is called to generate the expected sensory input images \({s}_{t+1}\). In the sub-function, \({q}_{img}\left({x}_{t}|{\phi }_{{x}_{t}}\right)\) is calculated by inputting \({s}_{t}\) into \({p}_{{a}_{t-1}}\left({s}_{t},{x}_{t}\right)\). A template image is generated by detecting the bounding rectangle of the non-zero pixels in \({s}_{t}\) and extracting that area from \({s}_{t}\). Template matching is carried out in \({q}_{img}\left({x}_{t}|{\phi }_{{x}_{t}}\right)\), yielding the representative position \(u_{cur}\) of the current \(s_{t}\). To calculate the next candidate attention positions \({u}_{next}\), a candidate region for \({u}_{next}\) is set. The candidate region is obtained by adding a fixed pixel margin to the region of \({s}_{t}\) in \({q}_{img}\left({x}_{t}|{\phi }_{{x}_{t}}\right)\); the region of \({s}_{t}\) is defined by \({u}_{cur}\) and the size of the template image.
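The center-of-gravity shift used both in pre-processing and when composing sensory inputs can be sketched as follows. This is a minimal numpy version under stated assumptions: the circular shift via `np.roll` is an assumption, since the paper does not specify boundary handling.

```python
import numpy as np

def recenter(img):
    """Shift a grayscale image so that its intensity center of gravity
    lies at the geometric center of the frame (boundary handling via a
    circular shift is an assumption of this sketch)."""
    h, w = img.shape
    total = img.sum()
    if total == 0:
        return img  # nothing to center
    ys, xs = np.mgrid[0:h, 0:w]
    cy = (ys * img).sum() / total  # center of gravity, row coordinate
    cx = (xs * img).sum() / total  # center of gravity, column coordinate
    dy = int(round(h / 2 - cy))
    dx = int(round(w / 2 - cx))
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)
```

In the composition step above, each new attention image \(s^{\prime}_{t}\) would first be pasted at its relative attention position before this recentering is applied to the composed image.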
The \({u}_{next}\) are calculated by sliding a window of the size of \(s^{\prime}_{t}\) over the candidate region with a fixed pixel stride; the representative positions of all window positions during sliding constitute \({u}_{next}\). The \({s}_{t+1}\) are generated by extracting the region of \({s}_{t}\) and the regions of the next candidate attention images \(s^{\prime}_{t + 1}\) from \({q}_{img}\left({x}_{t}|{\phi }_{{x}_{t}}\right)\). The region of \({s}_{t}\) is defined by \({u}_{cur}\) and the size of the template image, as mentioned above. The region of each \(s^{\prime}_{t + 1}\) is defined by \({u}_{next}\) and the size of \(s^{\prime}_{t + 1}\). The extracted images are clipped or zero-padded to the same size as \({s}_{t}\). Each approximate posterior distribution \(q\left({x}_{t+1}|{\phi }_{{x}_{t+1}}\right)\) is calculated by inputting each image included in \({s}_{t+1}\) into \({p}_{{a}_{t-1}}\left({s}_{t},{x}_{t}\right)\). Note that \(q\left({x}_{t}|{\phi }_{{x}_{t}}\right)\) is calculated from the current \({s}_{t}\), whereas \(q\left({x}_{t+1}|{\phi }_{{x}_{t+1}}\right)\) is calculated from \({s}_{t+1}\). The entropy of each \(q\left({x}_{t+1}|{\phi }_{{x}_{t+1}}\right)\) is calculated and added to the uncertainty map \(M\). Finally, the attention position with the minimum value in \(M\) is selected as the next attention position.
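The final selection step, scoring each candidate attention position by the entropy of its predicted posterior and fixating where the entropy is minimal, can be sketched as follows (function names are illustrative, not the paper's):

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    p = np.clip(p, 1e-12, 1.0)  # guard against log(0)
    return -(p * np.log(p)).sum()

def select_next_attention(candidates):
    """candidates: list of (position, posterior) pairs, one per window
    position u_next. Returns the position whose predicted posterior
    q(x_{t+1} | phi_{x_{t+1}}) has minimum entropy, i.e. the minimum
    of the uncertainty map M described above."""
    return min(candidates, key=lambda c: entropy(c[1]))[0]
```

Selecting the minimum of \(M\) drives the system toward fixations whose predicted outcome it is already most confident about, which is how the confirmation-bias-like intentionality reported in the evaluation arises.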
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Esaki, K., Matsumura, T., Ito, K., Mizuno, H. (2021). Sensorimotor Visual Perception on Embodied System Using Free Energy Principle. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham. https://doi.org/10.1007/978-3-030-93736-2_62
DOI: https://doi.org/10.1007/978-3-030-93736-2_62
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93735-5
Online ISBN: 978-3-030-93736-2
eBook Packages: Computer Science (R0)