Vision, action and language unified through embodiment

Caligiore, Daniele; Fischer, Martin H.

doi:10.1007/s00426-012-0417-0

Editorial
Published: 07 February 2012

Volume 77, pages 1–6, (2013)

Psychological Research

Introduction

Increasing evidence shows that vision, action and language should not be regarded as a set of disembodied processes. Instead, they form a closely integrated and highly dynamic system that is attuned to the constraints of its bodily implementation as well as to the constraints coming from the world with which this body interacts. One consequence of such embodiment of cognition is that seeing an object, even when there is no intention to handle it, activates plans for actions directed toward it (e.g., Tucker & Ellis, 1998, 2001; Fischer & Dahl, 2007). Using object names induces action planning effects similar to those induced by seeing the objects themselves (Tucker & Ellis, 2004; Borghi, Glenberg, & Kaschak, 2004). Depending on linguistic context, different object features can be activated for action planning, as indicated by facilitated manual responses or “affordance effects” (e.g., Borghi, 2004; Glenberg & Robertson, 2000; Zwaan, 2004). Similarly, different action intentions direct attention differently to object features for processing (e.g., Bekkering & Neggers, 2002; Fischer & Hoellen, 2004; Symes, Tucker, Ellis, Vainio, & Ottoboni, 2008). Eye movements during visually guided actions shed further light on the close relationship between vision, action and language (Land & Furneaux, 1997; Johansson, Westling, Bäckström, & Flanagan, 2001). For example, when humans interact with objects, their eyes move ahead of their hands to support the on-line control of grasping (e.g., Bekkering & Neggers, 2002).

These behavioral results are supported by brain imaging studies of object affordances in humans (e.g., Grèzes, Tucker, Armony, Ellis, & Passingham, 2003) and single cell recordings in monkeys (e.g., Sakata, Taira, Mine, & Murata, 1992; Fadiga, Fogassi, Gallese, & Rizzolatti, 2000). Together, these behavioral and neuroscientific studies have recently begun to inform computational models of embodied cognition. For example, Tsiotas, Borghi and Parisi (2005) devised an artificial life simulation to give an evolutionary account of some affordance effects, and Caligiore, Borghi, Parisi, and Baldassarre (2010) proposed a computational model to account for several affordance-related effects in grasping, reaching, and language. The neuroscientific constraints implemented in the design of the model allow its authors to investigate the neural mechanisms underlying affordance selection and control. The present special issue brings together recent developments at the intersection between behavioral, neuroscientific, and computational approaches to embodied cognition.

Strong support for the close link between vision, action and language comes from studies that highlight how language processing and comprehension make use of neural systems ordinarily used for perception and action (Lakoff, 1987; Zwaan, 2004; Barsalou, 1999; Glenberg & Robertson, 1999; Gallese, 2008; Glenberg, 2010). For example, when humans process the word “cup” they seem to reenact (and therefore internally simulate) many of the perceptual, motor and affective representations related to a cup (Barsalou, 1999). In a similar way, sentences and abstract words are understood by creating a simulation of the actions underlying them (Glenberg & Kaschak, 2002; see also Borghi & Cimatti, 2009, for a new formulation of embodiment of abstract words which includes social aspects). Moreover, when hearing a verbal description of a visually available scene, humans tend to look at objects that are about to be mentioned (Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995), indicating rapid and predictive comprehension that is tightly linked to action. Several computational models have therefore implemented simple learning mechanisms (such as Hebbian rules) to create associations between patterns of active neurons representing the phonological aspects of words and internal simulations (i.e., representations of object features involved in perception and action, cf. Jeannerod, 2007; Mayor & Plunkett, 2010; Caligiore et al., 2010; Li, Farkas, & MacWhinney, 2004).
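
To make the idea of a Hebbian word-to-simulation association concrete, the following minimal sketch trains a one-layer associator on two toy pairs. It is an illustration only and does not reproduce any of the cited models: the function names, the one-hot "phonological" codes, the three-feature "sensorimotor" vectors, the learning rate and the threshold are all assumptions chosen for readability.

```python
# Minimal Hebbian associator (illustrative sketch; every name and value here
# is hypothetical and not taken from the cited models).

def hebbian_train(pairs, n_in, n_out, lr=1.0):
    """Return weights W[j][i] after one Hebbian pass over (word, features) pairs."""
    weights = [[0.0] * n_in for _ in range(n_out)]
    for word, features in pairs:
        for j in range(n_out):
            for i in range(n_in):
                # Hebb's rule: strengthen a connection when the input unit
                # and the output unit are active at the same time.
                weights[j][i] += lr * features[j] * word[i]
    return weights


def recall(weights, word, threshold=0.5):
    """Activate each output unit whose summed weighted input exceeds the threshold."""
    return [
        1 if sum(w_ji * x_i for w_ji, x_i in zip(row, word)) > threshold else 0
        for row in weights
    ]


# Toy word codes (one-hot) and toy feature patterns.
# Assumed feature order: [graspable, has_handle, rollable]
CUP = [1, 0]
BALL = [0, 1]
CUP_FEATURES = [1, 1, 0]
BALL_FEATURES = [1, 0, 1]

W = hebbian_train([(CUP, CUP_FEATURES), (BALL, BALL_FEATURES)], n_in=2, n_out=3)
cup_recalled = recall(W, CUP)    # feature pattern evoked by the word code for "cup"
ball_recalled = recall(W, BALL)  # feature pattern evoked by the word code for "ball"
```

After training, presenting a word code alone reactivates the feature pattern it was paired with, which is the sense in which such associations can stand in for an internal simulation. The one-hot word codes do not overlap, so recall is exact here; overlapping codes would produce crosstalk that this sketch does not handle.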

Grounded cognition theories have found a neurophysiological basis in the recent discovery, in monkeys as well as in humans, of two kinds of visuomotor neurons: canonical and mirror neurons (Rizzolatti & Craighero, 2004; Buccino, Binkofski, & Riggio, 2004). Canonical neurons discharge on the visual presentation of objects that can be grasped with the specific type of prehension (object-directed action) that these neurons code motorically, even when a grasping movement is not required. Mirror neurons, instead, fire when the monkey makes a goal-directed action and also when it observes another monkey or an experimenter performing the same or a similar action. Recent studies, mainly based on brain imaging techniques, indicate the existence of both canonical and mirror neurons in humans (Buccino et al., 2001; Grèzes et al., 2003; Johnson-Frey et al., 2003; Fadiga et al., 2006). Using fMRI it has been shown that in humans the observation of both object-directed actions and mimed actions leads to activation of different regions in the premotor cortex, including Broca’s region (Buccino et al., 2001), and the parietal cortex (Fogassi, Ferrari, Gesierich, Rozzi, Chersi, & Rizzolatti, 2005). The relationship between canonical and mirror neurons and their roles in different cognitive functions, including language processing, requires further investigation (see Thill, Caligiore, Borghi, Ziemke, & Baldassarre, 2012, submitted, for an up-to-date review on these topics).

In an influential paper, Rizzolatti and Arbib (1998) proposed that the matching process embodied by mirror neurons represents the basic mechanism from which language evolved. In the last decade this claim has been strongly supported by a series of experimental studies (for reviews see Pulvermüller, 2005; Willems & Hagoort, 2007). First, in an event-related fMRI study, the silent reading of words referring to face, arm or leg actions activated premotor–motor areas related to the word meanings (Hauk, Johnsrude, & Pulvermüller, 2004). An MEG study showed that reading action verbs activates motor and premotor cortices both rapidly and in a somatotopic fashion (Pulvermüller, 2005), thus suggesting that motor activation is inherent to lexical processing. In a further fMRI study, listening to sentences expressing mouth, hand and foot actions produced activation of effector-congruent sectors of the premotor cortex (Tettamanti et al., 2005). Interestingly, these distinct sectors coincide, albeit only approximately, with those active during the observation of hand, mouth and foot actions (Buccino et al., 2001).

These data support the notion that the mirror neuron system is involved not only in understanding visually presented actions, but also in coding acoustically presented action-related sentences. Several studies showed that similar mechanisms of motor resonance are active when we understand hand and mouth actions, including speech production. First, grasping movements influence syllable pronunciation when executed (Gentilucci, Benuzzi, Gangitano, & Grimaldi, 2001) as well as when merely observed (Gentilucci, 2003). Second, both listening to and observing speech movements cause an increase of motor evoked potentials recorded from tongue and lip muscles (Watkins, Strafella, & Paus, 2003; Pulvermüller & Fadiga, 2010). Finally, evidence for a link between gesturing and the speech system also comes from clinical studies: Hanlon, Brown and Gerstman (1990) showed that aphasic patients’ object naming benefits from pointing with the right hand to the referents.

Investigations into the integration of vision, action and language have greatly benefited from the use of computational models. The linguistic abilities of an artificial agent whose behavior is established by a computational model are strictly dependent on, and grounded in, other perceptual and motor skills (MacWhinney, 1998; Cangelosi & Riga, 2006; Cangelosi & Parisi, 2002). Such a grounded and embodied approach to language design is consistent with the theories of the grounding of language discussed above. In these models there exists an intrinsic link between the communication symbols (words) used by the agent and its own cognitive representations (meanings) of the perceptual and sensorimotor interaction with the external world (referents) (Steels & Vogt, 1997; Steels, 2003; Yoon, Heinke, & Humphreys, 2002). Cangelosi, Hourdakis, and Tikhanoff (2006) proposed a neural network model in a robotic set-up as a model of language acquisition. The authors show how a robot can acquire new concepts of actions via linguistic instructions. Moreover, the associative mechanisms involving words and categorical representations of objects are used to transfer the compositionality properties of language to sensorimotor representations. Along the same lines, Chersi, Thill, Ziemke and Borghi (2010) recently used a computational model to show how sentence processing might involve chaining mechanisms similar to those of action sequence organization (Fogassi et al., 2005). Many other embodied modeling issues remain to be resolved (Pezzulo, Barsalou, Cangelosi, Fischer, Spivey, & McRae, 2011).

The various studies mentioned above investigate the integration of vision, action, and language through embodiment from rather different perspectives. We have discussed results from behavioral experiments as well as from neuroscientific and computational modeling. Unfortunately, despite the converging results obtained across these disciplines, there is currently very little discussion and exchange of views among the experts who use these different perspectives. We believe that this kind of multi-methodological and multidisciplinary discussion is a useful and necessary step to share the advantages of each approach and to achieve a cumulative understanding of the neural mechanisms the brain uses to deal with the integration of vision, action and language into embodied behavior. Instead of stressing the differences between the approaches, it could be productive to focus on their overlapping traits and on their enormous potential for cross-fertilization. The aim of this special issue is, therefore, to direct the attention of scientists who work with different approaches toward an inter-disciplinary discussion about vision, action and language unified through embodiment.

Content of the special issue

This issue of Psychological Research has a clear focus: understanding how the results from different scientific approaches could be shared to energize the empirical and theoretical discussion on the integration of vision, action and language by embodiment. The contributions to this special issue cover a wide range of methodologies, from psychophysics to computational modeling, from classical behavioral methods to neuropsychological and brain imaging approaches. The majority of contributions go beyond the popular reaction time methodology and use movement-related performance to strengthen the case for an embodiment of concept activation.

Themes explored in the special issue include the following: What organization (at several levels, such as brain, functional, and computational) is needed to support the integration of vision, action and language? How does the timing in language processing influence the embodiment of language representations? What is the relationship between canonical and mirror neurons in action and language organization? What is the social influence on affordance perception and organization? Is the motor system involved in the understanding of concrete nouns, as it is for concrete verbs? The range of perspectives of the papers in this special issue offers psychological, neuroscientific and computational modeling evidence bearing on these questions.

Caligiore, Borghi, Parisi, Ellis, Cangelosi, and Baldassarre (2012) propose an extended version of the TRoPICALS computational model (Caligiore et al., 2010) aimed at better understanding the mechanisms underlying positive as well as negative compatibility effects observed in behavioral experiments. The model addresses the case of distractor objects which, although irrelevant for the agent’s goals, activate affordances that have to be actively suppressed. The simulations fully replicate the findings reported in the literature. The authors further simulate damage to the model similar to that found in Parkinson’s disease in order to predict compatibility effects that might be found with these patients in future experiments.

De Vega, Moreno, and Castillo (2012) present two experiments that look at changes in motor compatibility effects during comprehension based on the relative timing of the motor response to the processing of action-relevant language. The authors show that at short stimulus onset asynchrony, the traditional motor compatibility effect is reversed: participants are faster to respond when the direction of the action in the sentence mismatches the direction of the motor response that needs to be made. The work deals with a timely and important issue, and the data help to reconcile some differing results that have been reported about facilitating and interfering observations of motor compatibility effects.

Ellis, Swabey, Bridgeman, May, Tucker, and Hyne (2012) report a behavioral study investigating the interaction of the mirror neuron system and the canonical neuron system when humans observe other agents acting on objects irrespective of their goals. They make a case for regarding the two systems as different aspects of a common system for orchestrating the actions of agents.

Gianelli, Scorolli, and Borghi (2012) present an empirical study investigating the effects of social influences on kinematic features of reaching and grasping movements. They recorded reaching and grasping movements in the presence of a second person, who could be either a friend or a non-friend. The authors demonstrate that the social relationship between a performer and a second person affected kinematic features of the task. Moreover, speaking sentences related to the reaching and grasping task had an effect depending on whether “I” or “you” was used as a pronoun. These results point in the direction of social motor control as a novel field of embodiment research.

Iizuka, Marocco, Ando, and Maeda (2012) present an empirical analysis of how a communication system emerges spontaneously between two interacting individuals in the absence of a specifically predefined communication channel. Participants tried to communicate to each other the identity of viewed objects by sliding fingers on a signaling device. The emerging communication patterns suggested gradual emergence of turn taking, association between behavior and perceptual categories, and the acquisition of novel meanings. These observations bear on the foundations of our sociality.

Marino, Gough, Gallese, Riggio, and Buccino (2012) offer new empirical evidence of embodied meaning associated with action-related nouns rather than verbs. The work addresses the crucial open question of whether the motor system is involved during the understanding of concrete nouns, as it is for concrete verbs. The results are discussed in terms of motor processes in the left brain hemisphere associated with action nouns.

Weiner and Grill-Spector (2012) summarize the results of two recently published studies (Weiner & Grill-Spector, 2010, 2011) that investigated the distribution of face and limb selectivity in human visual cortex. They propose a new three-stream model of high-level visual cortex which includes ventral, lateral and dorsal areas where multimodal processing related to vision, action and language might converge. Like the other contributions to this special issue, this programmatic proposal sets a framework for a much-needed dialog between disciplines.

Toward a common framework to study the embodiment of vision, action and language

The accumulation of evidence in favor of embodied cognition, which comes from such different disciplines as psychology, neuroscience and robotics, confirms the importance of the topic of this special issue for the wider scientific community. However, the multi-disciplinary and multi-methodological nature of the available data raises an important question: is it possible to find a common framework to interpret and explain data deriving from such vastly different methods? This is a crucial point because such a common framework could support cross-fertilization among different disciplines and, importantly, could help to discover general principles underlying the embodiment of vision, action and language. However, devising a framework that is understandable by scientists with dramatically different backgrounds, who often use different terminology to indicate similar phenomena, is not trivial (Hommel & Colzato, 2010).

In the last decade some valid attempts, mainly using computational approaches, have been proposed (Arbib & Lee, 2007; Garagnani, Wennekers, & Pulvermüller, 2008; O’Reilly, 1998; Rothkopf & Ballard, 2010). Arbib and colleagues designed several models, including the FARS model (Fagg & Arbib, 1998) and various incarnations of the MNS models (MNS: Oztop & Arbib, 2002; MNS2-I: Bonaiuto, Rosta, & Arbib, 2007; MNS2-II: Bonaiuto & Arbib, 2010), that might be conducive to the intended cross-disciplinary investigation of the topic of this special issue. Two other proposals merit attention in this regard since they have started to formalize some procedures to build cross-disciplinary frameworks to investigate psychological and neuroscientific phenomena. These two methods are the brain-based devices (BBDs) approach (Fleischer & Edelman, 2009) and the computational embodied neuroscience (CEN) approach (Caligiore et al., 2010; Mannella, Mirolli, & Baldassarre, 2010; cf. Prescott, Montes-Gonzalez, Gurney, Humphries, & Redgrave, 2006, for a similar but less principled approach).

The BBDs approach and the CEN method are similar in conception. The key features of models based on these two methods are: (a) a simulated brain whose anatomy and physiology are constrained by knowledge about real brains; (b) an embodied system which operates in a real environment; (c) the comparison with data from behavioral experiments; (d) the adaptive learning of the behavior. However, unlike BBDs, the CEN approach is also guided by the further and fundamental meta-constraint of theoretical cumulativity. This idea aims at producing general models that account for an increasing number of experiments, while avoiding ad hoc models that account for only specific single experiments. In this way it could be possible to isolate general principles underlying the class of studied phenomena, thereby producing theoretical cumulativity.

To facilitate the integration among different perspectives and different methods it will also be crucial to design “system-level models”. This means that the main goal of the model should be to provide an operational hypothesis about the cerebral network of networks which underlies the investigated behavior. The system-level approach postulates that the different classes of behaviors are generated by the interplay of different subsets of components of the brain, rather than by specific components in isolation. In this way it will be possible to outline an integrated hypothesis about the system-level architectural and functional brain mechanisms which might underlie the behavior under investigation. For example, a system-level model might take into account both cortical (Rizzolatti & Arbib, 1998) and sub-cortical (Strick, Dum, & Fiez, 2009) mechanisms underlying the embodiment of language, or might facilitate the interpretation of brain imaging data (Friston, 2009). We hope that, in the future, designing theoretical and computational frameworks using multi-disciplinary approaches such as those proposed by the BBDs and CEN methods will help to provide a unified view of embodiment of vision, action, and language, highlighting all its challenging aspects and fostering further research into this exciting topic.

References

Arbib, M. A., & Lee, J. Y. (2007). Vision and action in the language-ready brain: From mirror neurons to SemRep. Lecture Notes in Computer Science, 4729, 104.
Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22, 577–660.
Bekkering, H., & Neggers, S. F. W. (2002). Visual search is modulated by action intentions. Psychological Science, 13, 370–374.
Bonaiuto, J., & Arbib, M. A. (2010). Extending the mirror neuron system model, II: What did I just do? A new role for mirror neurons. Biological Cybernetics, 102, 341–359.
Bonaiuto, J., Rosta, E., & Arbib, M. A. (2007). Extending the mirror neuron system model, I. Biological Cybernetics, 96, 9–38.
Borghi, A. M. (2004). Object concepts and action: Extracting affordances from objects parts. Acta Psychologica, 115(1), 69–96.
Borghi, A. M., & Cimatti, F. (2009). Words as tools and the problem of abstract words meanings. In N. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st Annual Conference of the Cognitive Science Society (pp. 2304–2309). Amsterdam: Cognitive Science Society.
Borghi, A. M., Glenberg, A. M., & Kaschak, M. (2004). Putting words in perspective. Memory & Cognition, 32, 863–873.
Buccino, G., Binkofski, F., Fink, G. R., Fadiga, L., Fogassi, L., Gallese, V., et al. (2001). Action observation activates premotor and parietal areas in a somatotopic manner: an fMRI study. European Journal of Neuroscience, 13, 400–404.
Buccino, G., Binkofski, F., & Riggio, L. (2004). The mirror neuron system and action recognition. Brain and Language, 89, 370–376.
Caligiore, D., Borghi, A. M., Parisi, D., & Baldassarre, G. (2010). TRoPICALS: A computational embodied neuroscience model of compatibility effects. Psychological Review, 117, 1188–1228.
Caligiore, D., Borghi, A. M., Parisi, D., Ellis, R., Cangelosi, A., & Baldassarre, G. (2012). How affordances associated with a distractor object affect compatibility effects: A study with the computational model TRoPICALS. Psychological Research (this issue).
Cangelosi, A., Hourdakis, E., & Tikhanoff, V. (2006). Language acquisition and symbol grounding transfer with neural networks and cognitive robots. The 2006 IEEE International Joint Conference on Neural Network Proceedings (pp. 1576–1582).
Cangelosi, A., & Parisi, D. (2002). Simulating the evolution of language. In A. Cangelosi & D. Parisi (Eds.), Simulating the Evolution of Language (p. xii + 356). Berlin: Springer.
Cangelosi, A., & Riga, T. (2006). An embodied model for sensorimotor grounding and grounding transfer: experiments with robots. Cognitive Science, 30, 673–689.
Chersi, F., Thill, S., Ziemke, T., & Borghi, A. M. (2010). Sentence processing: Linking language to motor chains. Frontiers in Neurorobotics, 4(4). doi:10.3389/fnbot.2010.00004.
De Vega, M., Moreno, V., & Castillo, D. (2012). The comprehension of action-related sentences may cause interference rather than facilitation on matching actions. Psychological Research (this issue).
Ellis, R., Swabey, D., Bridgeman, J., May, B., Tucker, M., & Hyne, A. (2012). Bodies and other visual objects: The dialectics of reaching toward objects. Psychological Research (this issue).
Fadiga, L., Craighero, L., Fabbri Destro, M., Finos, L., Cotillon Williams, N., Smith, A. T., et al. (2006). Language in shadow. Social Neuroscience, 1, 77–89.
Fadiga, L., Fogassi, L., Gallese, V., & Rizzolatti, G. (2000). Visuomotor neurons: Ambiguity of the discharge or ‘motor’ perception? International Journal of Psychophysiology, 35, 165–177.
Fagg, A. H., & Arbib, M. A. (1998). Modeling parietal-premotor interaction in primate control of grasping. Neural Networks, 11, 1277–1303.
Fischer, M. H., & Dahl, C. (2007). The time course of visuo-motor affordances. Experimental Brain Research, 176, 519–524.
Fischer, M. H., & Hoellen, N. (2004). Space- and object-based attention depend on motor intention. The Journal of General Psychology, 13, 365–377.
Fleischer, J. G., & Edelman, G. M. (2009). Brain-based devices: An embodied approach to linking nervous system structure and function to behavior. IEEE Robotics & Automation Magazine, 16, 33–41.
Fogassi, L., Ferrari, P. F., Gesierich, B., Rozzi, S., Chersi, F., & Rizzolatti, G. (2005). Parietal lobe: From action organization to intention understanding. Science, 308, 662–667.
Friston, K. (2009). Causal modelling and brain connectivity in functional magnetic resonance imaging. PLoS Biology, 7(2), e33.
Gallese, V. (2008). Mirror neurons and the social nature of language: The neural exploitation hypothesis. Social Neuroscience, 3, 317–333.
Garagnani, M., Wennekers, T., & Pulvermüller, F. (2008). A neuroanatomically grounded Hebbian-learning model of attention-language interactions in the human brain. European Journal of Neuroscience, 27, 492–513.
Gentilucci, M. (2003). Grasp observation influences speech production. European Journal of Neuroscience, 17, 179–184.
Gentilucci, M., Benuzzi, F., Gangitano, M., & Grimaldi, S. (2001). Grasp with hand and mouth: A kinematic study on healthy subjects. Journal of Neurophysiology, 86, 1685–1699.
Gianelli, C., Scorolli, C., & Borghi, A. M. (2012). Acting in perspective: The role of body and language as social tools. Psychological Research (this issue).
Glenberg, A. M. (2010). Embodiment as a unifying perspective for psychology. Wiley Interdisciplinary Reviews: Cognitive Science, 1, 586–596.
Glenberg, A. M., & Kaschak, M. P. (2002). Grounding language in action. Psychonomic Bulletin & Review, 9, 558–565.
Glenberg, A. M., & Robertson, D. A. (1999). Indexical understanding of instructions. Discourse Processes, 28, 1–26.
Glenberg, A. M., & Robertson, D. A. (2000). Symbol grounding and meaning: A comparison of high-dimensional and embodied theories of meaning. Journal of Memory and Language, 43, 379–401.
Grèzes, J., Tucker, M., Armony, J., Ellis, R., & Passingham, R. E. (2003). Objects automatically potentiate action: An fMRI study of implicit processing. European Journal of Neuroscience, 17, 2735–2740.
Hanlon, R. E., Brown, J. W., & Gerstman, L. J. (1990). Enhancement of naming in nonfluent aphasia through gesture. Brain and Language, 38, 298–314.
Hauk, O., Johnsrude, I., & Pulvermüller, F. (2004). Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41, 301–307.
Hommel, B., & Colzato, L. S. (2010). Games with(out) Frontiers: Towards an integrated science of human cognition. Frontiers in Psychology, 1, 2.
Iizuka, H., Marocco, D., Ando, H., & Maeda, T. (2012). An experimental study on co-evolution of categorical perception and communication system in humans. Psychological Research (this issue).
Jeannerod, M. (2007). Motor cognition: What actions tell the self. Oxford: Oxford University Press.
Johansson, R. S., Westling, G., Bäckström, A., & Flanagan, J. R. (2001). Eye-hand coordination in object manipulation. Journal of Neuroscience, 21, 6917–6932.
Johnson-Frey, S. H., Maloof, F. R., Newman-Norlund, R., Farrer, C., Inati, S., & Grafton, S. T. (2003). Actions or hand-objects interactions? Human inferior frontal cortex and action observation. Neuron, 39, 1053–1058.
Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.
Land, M. F., & Furneaux, S. (1997). The knowledge base of the oculomotor system. Philosophical Transactions of the Royal Society of London—Series B: Biological Sciences, 352(1358), 1231–1239.
Li, P., Farkas, I., & MacWhinney, B. (2004). Early lexical development in a self-organizing neural network. Neural Networks, 17, 1345–1362.
MacWhinney, B. (1998). Models of the emergence of language. Annual Review of Psychology, 49, 199–227.
Mannella, F., Mirolli, M., & Baldassarre, G. (2010). The interplay of Pavlovian and instrumental processes in devaluation experiments: A computational embodied neuroscience model tested with a simulated rat. In C. R. Tosh & G. D. Ruxton (Eds.), Modelling perception using artificial neural networks. Cambridge: Cambridge University Press.
Marino, B., Gough, P. M., Gallese, V., Riggio, L., & Buccino, G. (2012). How the motor system handles nouns: A behavioural study. Psychological Research (this issue).
Mayor, J., & Plunkett, K. (2010). A neurocomputational account of taxonomic responding and fast mapping in early word learning. Psychological Review, 117, 1–31.
O’Reilly, R. C. (1998). Six Principles for Biologically-Based Computational Models of Cortical Cognition. Trends in Cognitive Sciences, 2, 455–462.
Oztop, E., & Arbib, M. A. (2002). Schema design and implementation of the grasp-related mirror neuron system. Biological Cybernetics, 87, 116–140.
Pezzulo, G., Barsalou, L. W., Cangelosi, A., Fischer, M. H., Spivey, M., & McRae, K. (2011). The mechanics of embodiment: A dialogue on embodiment and computational modeling. Frontiers in Psychology. doi:10.3389/fpsyg.2011.00005.
Prescott, T. J., Montes-Gonzalez, F., Gurney, K., Humphries, M. D., & Redgrave, P. (2006). A robot model of the basal ganglia: Behavior and intrinsic processing. Neural Networks, 19, 31–61.
Pulvermüller, F. (2005). Brain mechanisms linking language and action. Nature Reviews Neuroscience, 6, 576–582.
Pulvermüller, F., & Fadiga, L. (2010). Active perception: Sensorimotor circuits as a cortical basis for language. Nature Reviews Neuroscience, 11, 351–360.
Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends in Neurosciences, 21, 188–194.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169–192.
Rothkopf, C. A., & Ballard, D. H. (2010). Credit assignment in multiple goal embodied visuomotor behavior. Frontiers in Psychology, 1, 173.
Sakata, H., Taira, M., Mine, S., & Murata, A. (1992). Hand-movement-related neurons of the posterior parietal cortex of the monkey: Their role in the visual guidance of hand movements. In R. Caminiti, P. B. Johnson, & Y. Burnod (Eds.), Control of arm movement in space: Neurophysiological and computational approaches (pp. 185–198). Heidelberg: Springer.
Steels, L. (2003). Evolving grounded communication for robots. Trends in Cognitive Sciences, 7, 308–312.
Steels, L., & Vogt, P. (1997). Grounding adaptive language games in robotic agents. In I. Harvey & P. Husbands (Eds.), Proceedings of the Fourth European Conference on Artificial Life (pp. 474–482). MIT Press.
Strick, P. L., Dum, R. P., & Fiez, J. A. (2009). Cerebellum and non-motor function. Annual Review of Neuroscience, 32, 413–434.
Symes, E., Tucker, M., Ellis, R., Vainio, L., & Ottoboni, G. (2008). Grasp preparation improves change detection for congruent objects. Journal of Experimental Psychology: Human Perception and Performance, 34, 854–871.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. E. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632–1634.
Tettamanti, M., Buccino, G., Saccuman, M. C., Gallese, V., Danna, M., Scifo, P., et al. (2005). Listening to action-related sentences activates fronto-parietal motor circuits. Journal of Cognitive Neuroscience, 17, 273–281.
Thill, S., Caligiore, D., Borghi, A. M., Ziemke, T., & Baldassarre, G. (2012). Integrating theories and computational models of affordance control and mirror neurons: A review and a model design. Neuroscience & Biobehavioral Reviews (Submitted).
Tsiotas, G., Borghi, A. M., & Parisi, D. (2005). Objects and affordances: An Artificial Life simulation. In B. Bara, L. Barsalou, & B. Bucciarelli (Eds.), COGSCI2005. XXVII Annual Conference of the Cognitive Science Society (pp. 2212–2217). Mahwah, NJ: Lawrence Erlbaum Associates.
Tucker, M., & Ellis, R. (1998). On the relations between seen objects and components of potential actions. Journal of Experimental Psychology: Human Perception and Performance, 24, 830–846.
Tucker, M., & Ellis, R. (2001). The potentiation of grasp types during visual object categorization. Visual Cognition, 8, 769–800.
Tucker, M., & Ellis, R. (2004). Action priming by briefly presented objects. Acta Psychologica, 116, 185–203.
Watkins, K. E., Strafella, A. P., & Paus, T. (2003). Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia, 41, 989–994.
Weiner, K. S., & Grill-Spector, K. (2010). Sparsely-distributed organization of face and limb activations in human ventral temporal cortex. NeuroImage, 52, 1559–1573.
Weiner, K. S., & Grill-Spector, K. (2011). Not one extrastriate body area: Using anatomical landmarks, hMT+, and visual field maps to parcellate limb-selective activations in human lateral occipitotemporal cortex. NeuroImage, 56, 2183–2199.
Weiner, K. S., & Grill-Spector, K. (2012). Neural representations of faces and limbs neighbor in human high-level visual cortex: Evidence for a new organization principle. Psychological Research (this issue).
Willems, R. M., & Hagoort, P. (2007). Neural evidence for the interplay between language, gesture and action: A review. Brain and Language, 101, 278–289.
Yoon, E. Y., Heinke, D., & Humphreys, G. (2002). Modelling direct perceptual constraints on action selection: The Naming and Action Model (NAM). Visual Cognition, 9, 615–661.
Zwaan, R. A. (2004). The immersed experiencer: Toward an embodied theory of language comprehension. In B. H. Ross (Ed.), Psychology of learning and motivation (Vol. 44, pp. 35–62). New York: Academic.

Acknowledgments

We would like to thank all the reviewers, whose work has greatly enhanced the quality of this special issue, and Dr. Gianluca Baldassarre for his valuable comments on this editorial. This work was supported by the EU-funded projects “IM CLeVeR: Intrinsically Motivated Cumulative Learning Versatile Robots”, contract no. FP7-IP-231722, and “VALUE: Vision, Action, and Language Unified by Embodiment”, EPSRC Grant EP/F026471.

Author information

Authors and Affiliations

Laboratory of Computational Embodied Neuroscience, Istituto di Scienze e Tecnologie della Cognizione, Consiglio Nazionale delle Ricerche (LOCEN-ISTC-CNR), Roma, Italy
Daniele Caligiore
Division of Cognitive Sciences, University of Potsdam, Potsdam, Germany
Martin H. Fischer

Corresponding author

Correspondence to Daniele Caligiore.

About this article

Cite this article

Caligiore, D., Fischer, M.H. Vision, action and language unified through embodiment. Psychological Research 77, 1–6 (2013). https://doi.org/10.1007/s00426-012-0417-0

Received: 19 January 2012
Accepted: 20 January 2012
Published: 07 February 2012
Issue Date: January 2013
DOI: https://doi.org/10.1007/s00426-012-0417-0