Naming and Request communicative actions performed with the same linguistic materials demonstrated significantly different spatio-temporal patterns of brain activation. At the earliest latencies (50–90 ms), Request-elicited brain responses were greater than those to Naming, especially in the right sensorimotor and temporo-parietal areas related to action and theory of mind processing. Naming predominantly activated the left angular gyrus within the first 200 ms, possibly reflecting an emphasis on the access to referential semantic knowledge. These initial activations were followed (200–300 ms) by the engagement of the left temporo-parietal junction, anterior cingulate, and medial prefrontal cortex, and also the right inferior frontal gyrus and bilateral posterior temporal cortex. These results agree with the previous findings showing the early involvement of both the action and theory of mind networks in Request processing and reveal an additional specific pattern interpretable as the cortical signature of Naming. They shed new light on the temporal dynamics of action and ToM-related physiological processes. Activation in the medial frontal and anterior cingulate regions, as part of the ToM network not previously detected with EEG, appeared later than the activation of the mirror neuron circuits in the fronto-parietal areas. Thus, these results suggest that action structures and intentions underlying speech acts are processed first (~100 ms) and that other aspects of theory of mind and self/other mental simulation may emerge at a second processing step (200–300 ms). These findings are discussed in more detail below.
Early Action and Intention Processing in Speech Act Comprehension
In a previous study (Egorova et al. 2013), we reported surprisingly early brain signatures of linguistic-pragmatic processes. A shortcoming of that previous work was the block presentation of speech acts and the resultant predictability of speech act types from context. High predictability of speech acts is characteristic of some but not all situations in everyday communication (e.g. naming or requesting multiple items is typical when ordering food in a restaurant but not during a dinner conversation). To overcome the restriction to predictable speech acts, the present experimental setup increased the uncertainty of the upcoming speech act types. Although, in principle, uncertainty could increase the processing demands and thus require more time for understanding the communicative function of an utterance, the present results suggest that with single-trial presentation of the stimuli embedded in a wider range of action sequences, the differences between Naming and Requesting appear even earlier than observed before. The first brain activation differences between Naming and Requests were significant here as early as 50–90 ms after the critical word onset. Activity in this time window has previously been shown to be relevant for lexical processing of spoken words and to originate predominantly in perisylvian areas (MacGregor et al. 2012). In the current study, however, the pattern of activation was different and included the right dorsolateral premotor cortex, posterior temporal cortex, angular gyrus and temporo-parietal junction for the Request condition.
Several studies identified activity in this early time window as relevant for predictive language processing. For example, somewhat similarly to the current experiment, Dambacher et al. (2009) found that the same words in identical sentence frames elicited differential ERPs at the left occipital and right frontal electrodes between 50 and 90 ms after the word onset when they appeared in highly predictable versus unpredictable contexts. Early contextual prediction effects 50–250 ms after the word onset were also reported by Van Berkum et al. (2005). Other studies on contextual prediction in language have found that predictive contexts elicit stronger activations for the predicted stimuli even before the onset of the critical words. For instance, in a study by Dikker and Pylkkänen (2012) using highly predictive contexts, increased theta-band (4–7 Hz) activity appeared in the left middle temporal cortex, occipital visual cortex and ventral medial prefrontal cortex as early as 400 ms before the word onset. DeLong et al. (2005) showed differences between high and low cloze probability words in strongly predictive contexts 200–500 ms after the onset of the article preceding the critical word. Evidence of predictive processing can be found in phonologically triggered lexical processing (cohort activation), in semantic and syntactic ambiguity resolution, in the computation of cloze probabilities, and in conversational turn-taking (Van Berkum 2010; Kutas et al. 2011). Therefore, it is possible that contextual prediction could be relevant for speech act comprehension as well, resulting in early or even pre-stimulus differences between Naming and Requesting.
In the current results, however, no pre-stimulus differences between the speech acts of Naming and Requesting were observed, as confirmed by the statistical analysis in the pre-stimulus interval following the context sentences. This was expected in the context of the present design characterised by multiple uncertainty levels, which made it impossible to predict with certainty the upcoming speech act type based on the preceding context alone, as several speech acts, including Rejection and Correction, could appear instead of the Naming/Requesting word utterances. Thus, the observed activation differences elicited by the critical speech acts cannot be adequately explained by predictive context. Rather, this early activation appears to be triggered by the critical words/speech acts per se and potentially reflects initial stages of speech act comprehension.
As soon as the critical word, either Naming or Requesting an object, appeared on the screen, speech-act-specific activation was observed almost instantly (50–90 ms) in the frontal and temporo-parietal brain areas known to support action and action sequence information processing. Although within the experimental setup both speech act types appeared in matched contextual embedding (each being preceded by a context sentence and followed by an overt action etc.), all brain areas showing speech act differences in the first time window exhibited the Request>Naming pattern. Requests, which are characterised by a more complex action sequence structure, seem to require more elaborate processing of action-related information in participants, which may be reflected in the relatively more pronounced activation in bilateral mirror neuron systems for action processing. Therefore, this early difference may index processing of the speech act type and its characteristic action sequence structure. In the context of the current cognitive and neurobiological models, these processes can be described in terms of embodied mental simulation (as in Barsalou 2010) of speech acts subserved by action perception circuits including mirror neurons with specific neuropragmatic function (Pulvermüller and Fadiga 2010; Egorova et al. 2013).
In principle, these results are consistent with recent views on the role of rapid prediction in language production and comprehension, which suggested that speakers and listeners form a “forward model” of sequences of utterances and then match the outcome of the prediction process with any sensory input (Pickering and Garrod 2013). However, we should also note a difference between this perspective and the previous pragmatic literature, on which our present approach is based. While the rapid prediction approach is based on utterances, that is, specific linguistic forms such as words, phrases and sentences, our present proposal specifies action sequence structures in terms of intentional speech acts, each of which can be realised with a range of different utterances (see “Introduction” section). For those communicative contexts where the speech act of the Partner is uncertain and its realisation in terms of a specific utterance entirely unclear, predictions in terms of intentional speech act types carried by rapidly igniting action-perception circuits for neuropragmatic processing appear to provide a suitable explanation.
The specific structures showing stronger early activation in Request compared to Naming contexts are in the action perception areas—the right dorsolateral premotor cortex and right inferior-parietal cortex—but also in additional adjacent areas—in the right posterior temporal cortex and angular gyrus. These activations are open to interpretations in terms of both action and theory of mind processing.
The right dorsolateral premotor cortex has been implicated in representing hand-related actions (Aziz-Zadeh et al. 2006) and object monitoring over space (Schubotz and von Cramon 2001). The posterior temporal cortex has been related to biological movement, with its inferior part (which is particularly active here, see Fig. 3) specifically relevant for hand movements (Pelphrey et al. 2005). These brain areas, seen to be active especially during Requesting, could be engaged in representing the action of handing over the object from the Partner to the Speaker. In addition, the activation in these areas could also index mental activity focussing on the knowledge about action sequences, as Requests are characterised by a richer range of possible actions that typically follow them. Note that the overt actions actually observed in the experiment had little or no influence on these cortical activations, as the activity in these parts of the action system was recorded in response to the word stimulus, which appeared substantially before (SOA = 1,150 ms) the display of any overtly performed action. Note also that overt actions only appeared after 50% of the trials.
The brain areas in the temporal and parietal lobes (right posterior STS, angular gyrus, and TPJ) were found here to be more active in response to the Request condition compared with Naming in the early time window. These areas are generally relevant for representing action, as well as action goals and social intentions. For example, the posterior superior temporal cortex has been implicated in supporting joint action execution (Redcay et al. 2012), prediction of intentions in the other person (Noordzij et al. 2010), perspective-taking and processing socially salient visual cues in situations that require inferences about mental states of others (David et al. 2008). The right angular gyrus has been related to action awareness (Farrer et al. 2008) and the temporo-parietal junction has been linked to visual attention, domain-general self-identification and agency (Decety and Lamm 2007). The right temporo-parietal junction, which is an area in close proximity to the parietal mirror neuron regions but is considered a part of the ToM network, has been related to mentalising (Saxe 2009). A number of meta-analyses and reviews about these areas in the right temporo-parietal cortex pointed to their specific relevance to intention recognition and social processing (Saxe 2009; Seghier et al. 2010; Seghier 2012). Some studies tried to disentangle the contribution of the different regions within the area, for example, by identifying the functional specialisation of the pSTS versus TPJ (David et al. 2008), or the angular gyrus versus intra-parietal sulcus using connectivity analysis (Uddin et al. 2010), while others have tried delineating different functions such as spatial attention versus mentalising within right TPJ using high-resolution fMRI (Scholz et al. 2009). Even with spatially precise neuroimaging methods it is difficult to map these functional areas.
The spatial resolution of MEG does not make it possible to attribute the reported rTPJ activation to the MNS or the ToM network with certainty. However, these results do indicate early involvement of the right temporo-parietal cortex in the processing of action and intention information contained in action sequences.
Lexico-Semantic Processing in Speech Act Comprehension
Following the putative early stage of action and intention recognition, the right angular gyrus and the posterior temporal activations persisted for the Request condition in the second time window (100–150 ms). At the same time, the activation in the left angular gyrus was relatively stronger for the speech act of Naming. This area has been previously reported for the processing of lexico-semantic information (Binder and Desai 2011), especially retrieval processes (Gesierich et al. 2012).
Our previous EEG experiment (Egorova et al. 2013) manipulated, in addition to the speech act type (Naming vs. Request), also the stimulus semantic category (Hand and Non-Hand-related words). In that study, the evidence for a Naming>Request activation pattern was very limited and only appeared in a subset of conditions/electrodes at ~180 ms. Importantly, at exactly the same time (175–185 ms) semantic differences (between Hand and Non-Hand-related words) were observed. In the current experiment, the Naming>Request pattern was also observed between 100 and 200 ms. This time period was previously shown to be relevant for lexical-semantic processing (Pulvermüller et al. 1995, 2009; Sereno and Rayner 2003). In the context of these previous results, the timing of the Naming-specific activation and the brain structures involved here could reflect neural correlates of lexico-semantic access crucial for establishing a link between the word and the object the word is used to refer to. It should be noted that Requests also involve some degree of referential-semantic processing. The Partner needs to understand what specifically is being requested. However, the relevance of the referential information is amplified in the case of Naming, as it is the only important semantic or pragmatic information to be processed during Naming. Therefore, greater engagement of the angular gyrus is observed in Naming, compared to Requests.
Note that the speech act of Naming mainly engaged areas in the left hemisphere, as confirmed by the Hemisphere by Speech act interaction, whereas Requests activated the right hemisphere more strongly. This relative asymmetry is consistent with the laterality findings in the existing literature on semantics and pragmatics (Zaidel et al. 2000; Holtgraves 2012). Our present data thus confirm a stronger involvement of the right hemisphere in the processing of pragmatic and social-communicative information.
Processing of Intentions and Assumptions in Speech Act Comprehension
Finally, the 200–300 ms time window was characterised by activation of the entire neuropragmatic network, including classic MNS and ToM structures, where stronger brain responses to Requests compared with Naming were found. There was increased activity to Requests in the bilateral posterior temporal cortex and in the right inferior frontal gyrus, both previously attributed to the MNS (Iacoboni et al. 2005; Aziz-Zadeh et al. 2006). The activation pattern within the MNS observed in this later time window resembles the one reported in a previous study (Iacoboni et al. 2005), which compared activations to context scenes, action scenes, and intention. The engagement of the right IFG in processing scenes in which both context and action indicated a specific action goal suggests a role for this area in binding information about action and context.
Interestingly, Iacoboni et al. (2005) tested the participants using an explicit and an implicit task: the participants were either told to make inferences about the intention of the action or not. Remarkably, their results indicated that the IFG activation was independent of the task, whereas the explicit inferencing additionally elicited pre-SMA activations for action scenes, and anterior cingulate and the ventral PFC in response to the context/intention scenes. Notably, in a previous fMRI study investigating indirect Request processing (Van Ackeren et al. 2012), the participants were explicitly told to make inferences about utterances, and the results indicated the involvement of the pre-SMA (action system) and anterior cingulate and PFC (theory of mind) for such indirect Request processing. This result could therefore be explained either by the involvement of the MNS and ToM in explicit inferencing, or in processing indirectness, or, alternatively, suggest that they constitute the brain correlate of a Request. In the current study, which does not have such confounds, the involvement of the left temporo-parietal junction, anterior cingulate and ventral prefrontal cortex in the time window of 200–300 ms indicates the important involvement of the theory of mind network in processing intentions and assumptions of the communication partners. The results reported here were obtained in the absence of an inferencing task and likely reflect implicit comprehension of Requests independent of indirectness, clearly indicating that the dynamic neuropragmatic network involved in comprehending Request speech acts encompasses both MNS and ToM areas.
Although parts of the ToM network (rTPJ) could already be engaged in the early time window, the full activation of a widespread ToM system, including the medial frontal and cingulate cortex, only appeared in the 200–300 ms period. A similar latency distinction between rTPJ and vPFC has been suggested by several ERP studies of intention and trait identification, in which the rTPJ activation was shown to appear early (around 150 ms) in both explicit and implicit tasks, whereas the vPFC activation followed later (around 300 ms) and was only present in the explicit condition (Van der Cruyssen et al. 2009; Van Overwalle and Baetens 2009). With respect to the ToM-related activations observed here, the early temporo-parietal activation could be automatic to a degree, whereas later ToM involvement, indexed by the prefrontal and anterior cingulate activations, may indicate explicitly controlled and therefore optional analysis of the higher-order intentions and mental states of the communication partners. In contrast to the early activation in the action circuits, the ToM network appears to be engaged in the processing of speech acts in a stepwise fashion emphasising their social-communicative function.
Note that any speech act is characterised by the general intention to perform a communicative action; in this regard, Naming and Requesting are similar, so the brain activation differences are unlikely to reflect this general aspect of communicative intent. However, some speech acts are characterised by more specific intentions directed towards the Partner, be it inducing an action (the intention to obtain an object from the Partner) or a state (the intention to earn approval of the Partner by naming an item correctly). As Naming seems to lack such partner-centred intentions, while Requests are characterised by the commitment to an intention to obtain the requested item and to make the Partner undertake an action to achieve this goal, the activation differences in the fronto-parietal action system appear to reflect this second, more speech-act-specific type of partner-oriented intention.
Temporal Stages of Speech Act Processing
The results of this study suggest that such an important aspect of language use as conveying the communicative function of a single-word utterance is processed very fast, with the first neurophysiological differences between speech acts appearing within 100 ms of the onset of the word, followed by the lexical-semantic processing of the word between 100 and 200 ms, and concluded by the additional processing of action information and potentially optional explicit analysis of the mental states and intentions of the communication partners between 200 and 300 ms.
In our previous EEG study on processing Naming and Requests, in which word utterances were presented in blocks of 10 per speech act type (Egorova et al. 2013), the earliest differences between the conditions were reported at 110–130 ms. Under such a paradigm, the speech act types of all 10 utterances could be computed already at the beginning of the block, thus making individual predictive processing for each specific item unnecessary. In the current experiment, the presentation was more challenging: all speech acts were presented as single items with a new speech act context introduced in every trial, forcing the computation of the speech act type anew as the sequence unfolded. Further, the predictability was additionally reduced by variability in the stimulation that followed the context sentences, with only a minority of trials representing the speech acts of Naming or Requesting. Despite these substantial differences between the designs of these two studies and the use of two different neuroimaging methods (EEG vs. MEG, statistics performed in signal vs. source space), a remarkable similarity of the time course of speech act processing was observed. In both experiments, processes between 100 and 200 ms are likely to reflect lexico-semantic access (at 175–185 ms in the block EEG design and at 100–150 ms in the single-trial MEG design). In the time windows preceding and following the potential semantic processing, Requests activated the brain more strongly than Naming (in the block design, Request dominance was seen at 110–130 and 255–350 ms, and here at 50–90 and 200–300 ms). Although the timing in the two experiments seems to be slightly shifted, the succession of processing stages is comparable in both designs.
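The comparison of the two time courses can be made concrete with a minimal sketch. It only encodes the time windows and the direction of the speech act contrast as stated in the paragraph above; the names `Window`, `stage_sequence` and `onset_shifts` are hypothetical and are not part of the analysis pipeline of either study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    start_ms: int
    end_ms: int
    dominant: str  # speech act with the stronger response in this window


# Time windows as stated in the text (block EEG design vs. single-trial MEG design).
BLOCK_EEG = [
    Window(110, 130, "Request"),
    Window(175, 185, "Naming"),
    Window(255, 350, "Request"),
]
SINGLE_TRIAL_MEG = [
    Window(50, 90, "Request"),
    Window(100, 150, "Naming"),
    Window(200, 300, "Request"),
]


def stage_sequence(windows):
    """Return the order of dominant speech acts, sorted by window onset."""
    ordered = sorted(windows, key=lambda w: w.start_ms)
    return [w.dominant for w in ordered]


def onset_shifts(later, earlier):
    """Onset difference (ms) between corresponding windows of two designs."""
    return [a.start_ms - b.start_ms for a, b in zip(later, earlier)]


same_order = stage_sequence(BLOCK_EEG) == stage_sequence(SINGLE_TRIAL_MEG)
shifts = onset_shifts(BLOCK_EEG, SINGLE_TRIAL_MEG)
print(same_order, shifts)
```

Under these assumptions, both designs yield the same Request, Naming, Request succession, while the window onsets in the block design lag those of the single-trial design by several tens of milliseconds.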
The previous EEG study, in addition to the pragmatic variables, explicitly manipulated semantic word properties, and reported parallel processing of both pragmatic and semantic information early on, both in 110–130 and 175–185 ms time windows. Note, however, that only in the later time window was the direct pragmatic contrast Naming>Request significant, consistent with the current results. Note also that due to the differences in measurement sensitivity, the frontal activations in the EEG experiment were not in focus, which limits the possibility to compare the involvement of the anterior frontal parts of the ToM network between the studies.
Several other studies investigated the time course of action intention comprehension and showed a similar temporal pattern. For example, Ortigue et al. (2010) reported several distinct stages of action intention processing using EEG, namely a stage of automatic bilateral activation in posterior areas around 100–120 ms, followed by the left posterior/inferior parietal activation for processing object semantics around 120–200 ms, and concluded by context-dependent fronto-parietal activation between 200 and 500 ms.
Thus, the three processing stages that could be tentatively proposed to form the basis of speech act comprehension irrespective of the type of speech act are: (1) action and intention comprehension, (2) semantic processing, and (3) optional reprocessing of action information and aspects of ToM concerned with explicit self/other mental state analysis. Stage 2 seems more relevant for Naming and stages 1 and 3 for Requests, as seen in the time windows and the specific loci of activations here. It remains to be investigated how other speech act types are manifest in local brain responses observable in these specific time intervals.
The data so far obtained suggest that neuropragmatic processes draw upon brain regions for action, mentalising, and social interactive knowledge processing to compute different aspects of communicative meaning (Frith 2007; Spunt et al. 2011; Spunt and Lieberman 2013). The results of the present work were obtained with visually presented stimuli. While this ensured temporal precision of the measured neurophysiological response relative to the point in time when critical words can first be recognised, future studies should investigate speech act processing using more natural spoken words and sentences, and even the interplay between auditory and visually presented information (e.g. gestures and speech) in speech act understanding.
This study focussed only on the processing of two speech act types, Naming and Requesting. Both are very common and belong to the general classes of assertives and directives, respectively (Searle 1979). With respect to the brain networks supporting other speech act types, two possibilities exist. On the one hand, it could be that all the speech acts within the broader speech act classes share the same neural networks; for example, all directives (Requests, Orders, Commands, etc.) would rely to the same degree on the action and theory of mind networks, and all assertives (Naming, Informing, Making statements) would engage the brain areas that contribute to specific types of semantic processing (Pulvermüller 2013). On the other hand, it is also possible that each speech act has its own neural signature, which allows efficient differentiation of speech act types. Both possibilities are plausible. It is therefore important for future studies to investigate the brain basis of other speech act types, such as Acknowledgements, Promises, Complaints and many others, and to identify the role of the action system in representing different speech act sequences, their varying complexity (richness of the action sequence) and the influence of motor actions (such as handing over objects) as part of the sequence structure. Similarly, it is important to understand the factors that modulate the involvement of the theory of mind networks by manipulating the relevance of social inferencing in speech act recognition.