Embodied Language Understanding with a Multiple Timescale Recurrent Neural Network

Heinrich, Stefan; Weber, Cornelius; Wermter, Stefan

doi:10.1007/978-3-642-40728-4_27

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8131)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

How the human brain understands natural language, and what we can learn from it for intelligent systems, remain open research questions. Researchers have recently claimed that language is embodied in most, if not all, sensory and sensorimotor modalities and that the brain’s architecture favours the emergence of language. In this paper we investigate the characteristics of such an architecture and propose a model based on the Multiple Timescale Recurrent Neural Network (MTRNN), extended by embodied visual perception. We show that such an architecture can learn the meaning of utterances with respect to visual perception and that it can produce verbal utterances that correctly describe previously unknown scenes.

Author information

Authors and Affiliations

Department of Informatics, Knowledge Technology, University of Hamburg, Vogt-Kölln-Straße 30, D - 22527, Hamburg, Germany
Stefan Heinrich, Cornelius Weber & Stefan Wermter

Editor information

Editors and Affiliations

Faculty Automation, Technical University of Sofia, 8 St. Kl. Ohridski Blvd., 1000, Sofia, Bulgaria
Valeri Mladenov
Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. G. Bonchev str. bl.25A, 1113, Sofia, Bulgaria
Petia Koprinkova-Hristova
Institute of Neural Information Processing, University of Ulm, 89075, Ulm, Germany
Günther Palm
Quartier UNIL-Dorigny, Bâtiment Internef, Université de Lausanne, 1015, Lausanne, Switzerland
Alessandro E. P. Villa
Department of Computer Science, University of Milano, Via Comelico, 39, 20135, Milano, Italy
Bruno Appollini
Knowledge Engineering, School of Computing and Mathematical Sciences, Auckland University of Technology, 120 Mayoral Drive, 3rd floor, 1010, Auckland, New Zealand
Nikola Kasabov

About this paper

Cite this paper

Heinrich, S., Weber, C., Wermter, S. (2013). Embodied Language Understanding with a Multiple Timescale Recurrent Neural Network. In: Mladenov, V., Koprinkova-Hristova, P., Palm, G., Villa, A.E.P., Appollini, B., Kasabov, N. (eds) Artificial Neural Networks and Machine Learning – ICANN 2013. ICANN 2013. Lecture Notes in Computer Science, vol 8131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40728-4_27

DOI: https://doi.org/10.1007/978-3-642-40728-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40727-7
Online ISBN: 978-3-642-40728-4
eBook Packages: Computer Science, Computer Science (R0)
