Interactive Language Understanding with Multiple Timescale Recurrent Neural Networks

  • Stefan Heinrich
  • Stefan Wermter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8681)


Natural language processing in the human brain is complex and dynamic. Models for understanding how the brain’s architecture acquires language need to take into account the temporal dynamics of verbal utterances as well as of action and visual embodied perception. We propose an architecture based on three Multiple Timescale Recurrent Neural Networks (MTRNNs) interlinked in a cell assembly that learns verbal utterances grounded in dynamic proprioceptive and visual information. Results show that the architecture can describe novel dynamic actions with correct novel utterances, and they also indicate that multi-modal integration allows concepts to be disambiguated.
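
For readers unfamiliar with the term, the "multiple timescale" idea can be illustrated with a minimal sketch of a leaky-integrator update, in which each unit has its own time constant. This is not the architecture proposed in the paper: the function name `mtrnn_step`, the tanh activation, and the toy values for `tau`, `weights`, and `bias` are all assumptions chosen for illustration only.

```python
import math

def mtrnn_step(u, weights, bias, tau):
    """One discrete leaky-integrator update for all units (illustrative only).

    u            internal states, one per unit
    weights[i][j] connection strength from unit j to unit i
    bias[i]      constant input to unit i
    tau[i]       timescale of unit i (larger means slower change)
    """
    # Activations computed from the current internal states (tanh is an assumption).
    y = [math.tanh(v) for v in u]
    new_u = []
    for i in range(len(u)):
        # Weighted input from all units plus the bias term.
        drive = sum(weights[i][j] * y[j] for j in range(len(u))) + bias[i]
        # Leaky integration: keep a fraction (1 - 1/tau) of the old state.
        new_u.append((1.0 - 1.0 / tau[i]) * u[i] + drive / tau[i])
    return new_u

# Hypothetical toy setup: one fast unit and one slow unit, no recurrent coupling.
tau = [2.0, 50.0]
weights = [[0.0, 0.0], [0.0, 0.0]]
bias = [1.0, 1.0]
u = [0.0, 0.0]
for _ in range(10):
    u = mtrnn_step(u, weights, bias, tau)

print(u)  # the fast unit is close to its target, the slow unit lags behind
```

With identical input, the unit with the small time constant settles within a few steps while the unit with the large time constant changes slowly. Units of the first kind can follow fast-changing signals, and units of the second kind can hold information over longer spans of a sequence.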


Actions Embodied MTRNN Language Acquisition 





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Stefan Heinrich (1)
  • Stefan Wermter (1)
  1. Department of Informatics, Knowledge Technology, University of Hamburg, Hamburg, Germany
