Introduction

Humans are extraordinary social beings: we interact with various and varying groups; we bicker and play with each other; and we cooperate, trade, and deceive. Understanding and predicting the behavior of others is of crucial importance in our everyday lives, and the ability to do so is based on Theory of Mind (ToM; also termed mentalizing, cognitive perspective taking). ToM, the ability to reason about or infer others’ mental states, has been a core topic of social sciences for more than 40 years [1] and ever since has been investigated with a broad range of paradigms that make use of diverse materials. Participants in mentalizing research have read or memorized stories [2, 3], played games [4], and watched comic strips or film sequences [5, 6], all aiming to elicit thoughts about other people’s minds. Table 1 and Fig. 1 present examples for different ToM tasks. ToM has been investigated across various age groups [7, 8], in humans and in animals [9, 10], in typically developing individuals [8], and in psychopathologies [11]. As data from different paradigms accumulated, it became clear that ToM is not a monolithic ability, but rather a multifaceted construct with distinct interrelated sub-processes. As a result, the existing paradigms for ToM assessment are heterogeneous, focusing on different aspects of mentalizing, and none of them can capture the concept in its entirety [12••]. In this review, we aim to provide a brief overview of the current state and recent trends in human ToM research. Most importantly, we want to illustrate the impact a specific paradigm can have on the experimental outcome in this framework.

Table 1 Examples of ToM tasks. This table provides a by no means exhaustive overview of classically employed ToM paradigms, including short descriptions of the relevant ToM aspect, task specifications, and main fields of application
Fig. 1

Overview of ToM task categories and typical examples. a Depiction of the Sally-Anne task as an example for a false belief task (based on Baron-Cohen et al. [14]). Participants need to understand that Sally holds a false belief (that differs from their own) in order to solve the task. b Example for a rational actions task (based on Brunet et al. [5]). Selection of the correct picture requires an understanding of the depicted agent’s goal. c The EmpaToM as an example for a more naturalistic and dynamic narration understanding task. Short video clips depict fictional characters telling short autobiographic stories. The content of these narrations can be neutral or emotional, and the stories can require mentalizing or not. Participants indicate how they feel (as a measure of empathic responding) and answer multiple-choice questions requiring inferences about the mental states of the narrator (ToM condition) or factual reasoning (control condition) (based on Kanske et al. [6]). d The Samson task as an example for a visual perspective-taking task. Participants judge the number of dots seen by the avatar or from their own perspective. 1 Congruent condition (avatar and participant see the same number of dots). 2 Incongruent condition (avatar and participant see different numbers of dots) (based on Samson et al. [31]). Slower responses in the incongruent condition are taken as an indication for the tendency to represent not just one’s own but also the avatar’s perspective. All parts of this figure are original

Developing ToM

Understanding other people’s mental states is a socio-cognitive competence that develops throughout childhood. Many researchers attribute this process to the sequential emergence of multiple interrelated concepts rather than a single event [7, 13]. Nevertheless, ToM advancements can be roughly divided into three stages: early ToM, which emerges in the first months of life; basic ToM, which is typically developed around the age of 4 years; and advanced ToM, which does not evolve until 6 to 8 years [14] and keeps developing throughout adolescence [15••]. Findings from neuroimaging studies suggest a common neuronal basis across the three types of ToM in 4-to-8-year-old children, with particularly strong similarities between basic and advanced ToM [16].

Early ToM

One of the most central debates in current ToM research concerns the mentalizing skills of young infants. The development of new paradigms with more implicit measures, such as spontaneous gaze behavior, paved the way for the investigation of ToM performance in children below the age of 2 years. Some studies suggested that infants as young as 7 to 15 months can master false belief (FB) tasks when implicit paradigms are used [17, 18]. More recently, however, the generalizability of this notion has been questioned. For example, a meta-analysis revealed that infants’ correct performance in implicit FB tasks is highly influenced by the choice of paradigm [19••]. Children were more likely to pass the test when a Violation of Expectation (VOE) paradigm was implemented in the study, compared with anticipatory-looking (AL) or more interactive paradigms. In the VOE paradigm, an expectation, for instance about an agent’s behavior, is generated in an initial habituation phase, after which the child is presented with either an expected or an unexpected event. The gaze behavior of the infant serves as an indication of their inference about the agent’s mental state. This is both the benefit and the vulnerability of the paradigm. On the one hand, without any language requirements, even the youngest infants can participate in this task. On the other hand, without explicit responses, longer looking times in the test phase leave much room for interpretation; while they are typically taken as an indication of surprise about an event that is unexpected given the agent’s mental state, longer looking times could also reflect a more basic response to a novel stimulus [9, 19••]. Thus, deliberate construction of control conditions and habituation phases is necessary to prevent this potential confound, a requirement that many studies fail to satisfy [9, 20, 21]. Besides the choice of experimental paradigm, a broad range of task specifics can account for variance in the ToM performance of infants. These include the type of agent and the salience of its mental state as well as the movements of involved objects and whether or not deception was included in the task [19••].

A recent study revealed the significance of another characteristic of implicit ToM tasks. Fizke et al. [22] tracked the helping behavior of 2-to-3-year-old children in two versions of a FB task: one version included aspectuality whereas the other version of the task did not. Aspectuality denotes incompatible beliefs about an object or a person under two different aspects, for example knowing the person Clark Kent as himself versus knowing him as Superman without being aware of his private identity. Each of the two task versions used by Fizke et al. consisted of a true and a false belief condition. The toddlers reacted differently to the agent’s true versus false belief only when aspectuality was not involved in the task. This pattern was taken as an indication of conceptual deficits in infants and is in line with the finding that below the age of two, they are capable of tracking mental states and can master implicit FB tasks as long as an understanding of aspectuality or of other propositional attitudes is not necessary to pass the test [22,23,24,25].

Taken together, while spontaneous perspective taking in young infants appears to be a real phenomenon, it is highly dependent on formal and content-related aspects of the paradigm.

Basic ToM

As children grow older, direct questions can be used to examine their ToM skills. Classical investigations employing such elicited-response tasks showed that children from about 4 years of age are able to attribute mental states to others even when those states differ from their own [7]. Around this age, children acquire competence for a large variety of ToM tasks, and the high correlation between performance in these explicit first-order ToM tasks indicates the emergence of a conceptual capacity. Similarly, and in contrast to implicit paradigms, specifics of explicit FB tasks, such as characteristics of the protagonist or the type of question, appear to have no effect on performance [7]. This pattern speaks for a more tangible belief conception in children of 4 years and above, which is largely independent of FB task variations.

Whereas the reported within-task variance appears to be negligible, the content of the other’s mind has an impact on explicit ToM performance in pre-school children. Wellman and Liu [26] developed a scaled set of first-order ToM tasks and showed that understanding of different mental states in children aged 4 to 6 develops in a regular order with progressively broadening comprehension of subjectivity. Specifically, an understanding of desire and intention appears to emerge before an understanding of belief, while an understanding of hidden emotions arises much later. Findings from a recently developed auditory equivalent of the scale showed that children passed the tasks in almost the same order when auditory instead of visual material was presented, which indicates that the assessment of ToM development is modality independent [27•]. An auditory version of the scale could be especially useful for the assessment of children who show a delay in ToM development and face visual challenges, such as children with congenital blindness [28].

Burnel et al. [29•] continued on this path and designed low-verbal versions of Wellman and Liu’s tasks, with largely similar outcomes. Taken together, these findings exemplify the sequential acquisition of specific ToM skills during childhood and emphasize the importance of a broad assessment of ToM performance during the pre-school years that goes beyond false belief understanding and includes scaled task batteries.

Besides the progressive understanding of mental states, linguistic abilities have a strong influence on ToM performance. The apparent differences in the age of ToM acquisition between studies can often be explained by differences in linguistic task demands [25, 29•, 30]. Together with the notion of a close correlation between ToM and language development [31•, 32], this finding demonstrates the impact of linguistic requirements in ToM assessment, especially when working with children.

Higher-Order ToM and Advanced ToM

Along with cognitive development, children acquire the competence to pass more complex mentalizing tasks, so-called second-order ToM tasks. While first-order ToM refers to what people think about real events, second-order ToM goes one step further and encompasses what people think about other people’s thoughts. As a result, these tasks are inherently more complex and children are generally older when they first accomplish this level of mental state representation. Representations of second-order false beliefs are typically tested with the story vignettes approach by Wimmer and Perner [33]. Initial findings suggested that children pass second-order FB tasks under optimal conditions at the age of 6 or 7 years. However, when task complexity and linguistic demands were substantially reduced, even 5-year-old children showed high success rates. Further facilitative effects have been reported when adding an extra question to prompt the mental state of the agent, such as “Does John know that Mary knows where the ice-cream man is now?” [32, 34].

Higher-order ToM includes even more levels than second-order ToM, whereas advanced ToM involves complex understandings of features such as irony, metaphors, or double deceptions. These more complex forms of ToM are acquired later than second-order FB reasoning, between 8 and 13 years [35], and improve throughout adulthood [36]. Recently, some of the most widely used paradigms to investigate these forms of social reasoning, in particular the Strange Stories Task [2], have been criticized for low internal consistency [37], and a multifactorial structure of these paradigms has been suggested [15••]. Specifically, (advanced) ToM seems to be an assembly of distinct socio-cognitive competences, including trait judgements, reasoning about rational behavior, and reasoning about ambiguity [12••, 15••]. Accordingly, capturing the development of advanced ToM throughout adolescence may require a carefully selected battery of tasks that allows targeting the specific underlying socio-cognitive processes.

Mature ToM

Two core questions dominate the investigation of fully developed socio-cognitive capacities. First, fanned by the rapid technical and methodological advances in imaging research, numerous studies addressed the neuronal underpinnings of ToM. Second, inter-individual differences in ToM performance and their relation to other constructs, such as executive functions, are informative about the nature of ToM. While paradigms typically used in neuroimaging research are relatively easy and often elicit performance that is at ceiling, research on inter-individual differences requires tasks with a higher level of difficulty.

Neuronal Basis

The neuronal activation pattern that accompanies performance of ToM tasks has inspired imaging research for more than two decades. A wide range of experimental paradigms has been deployed, and consequently, findings have been heterogeneous. It is uncontested, however, that a distributed brain network is engaged during mentalizing [38, 39]. Two core regions of this network are the temporo-parietal junction bilaterally, which is most specifically engaged in reasoning about other persons’ mental states [39, 40, 41••], and the medial prefrontal cortex [39], which has been suggested to be more generally involved in processing socially and emotionally relevant information [12••]. Other regions frequently associated with the mentalizing network include the posterior cingulate cortex and parts of the precuneus, the orbitofrontal cortex, the anterior temporal lobes, and the amygdala. Recent endeavors specifically investigated neuronal activation patterns during mentalizing in relation to the task that was employed and found that activation varies with study methodology [38, 39, 41••]. A direct within-participant comparison revealed distinct neuronal activation patterns for different ToM tasks [42••], and specific features of the task, such as the mental state it taps into or whether belief reasoning refers to similar or dissimilar others, differentially engage specific regions of the ToM network [39]. As such, neuroimaging research supports the conceptualization of ToM as a multifaceted capacity with varying specifications depending on the context. Accordingly, future research should advance systematic comparisons of neuronal activation and their relation to different paradigms and task aspects [43]. This endeavor could provide valuable insights about the particular sub-processes that contribute to successful mentalizing.

ToM and Executive Functions

As with so many other challenges in life, some people are better at ToM than others, and one important role in this context is played by executive functions (EF) [44, 45]. EF is an umbrella term for cognitive processes that foster goal-directed behavior and problem-solving, such as inhibition, updating of working memory, and cognitive flexibility [46]. The strong relationship between EF and ToM and the fact that both constructs comprise a large number of processes raise the question whether ToM tasks specifically measure mentalizing or whether, and to what extent, performance in these tasks relies on other, more general capacities. For instance, the inhibition of prepotent responses, which is critical in EF tasks, and the inhibition of one’s own mental states when inferring others’ mental states in ToM tasks might be very similar inhibition processes. Indeed, neuroscientific evidence suggests that areas associated with EF are involved in mentalizing [47]. A strong relationship has been demonstrated in first-order FB tasks, whereas the evidence for effects in second-order FB reasoning is less consistent [32].

Critically, the association of the two constructs can bias findings in ToM research, particularly in groups with limited or impaired EF, for example children, older adults, or patients with schizophrenia [8, 11, 45]. A well-designed task as well as the use of adequate comparison conditions is therefore especially important in these samples. In the case of schizophrenia, a fruitful approach to tap into ToM capacities irrespective of EF is the employment of instructions that only indirectly refer to ToM, for example sorting cartoon pictures (concerning the mental states of the displayed agents) in a logical order or explaining a joke [11]. Older adults, on the other hand, could benefit from verbal tasks because vocabulary increases with age [48]. Other important methodological parameters in this context include task complexity and time constraints as well as stimulus material and the modality of presentation [49].

Recent Advances

A central characteristic shared by most FB and other ToM tasks is the binary response format. The resulting pass-or-fail interpretation, together with the fact that performance in those tasks is usually at ceiling in adolescents and adults, makes it difficult to capture variance in mental state representation. Therefore, an important recent trend has been the extension of classical paradigms with continuous measures that allow for the investigation of inter-individual variability. For example, Bradford et al. [50] combined measures of correct performance, reaction time (RT), and electroencephalography (EEG) to investigate the role of perspective shifting in a ToM task. Other recent RT-based studies demonstrate a connection between visual perspective taking and cognitive perspective taking [51, 52]. Compared with exclusively relying on correct versus incorrect answers, the incorporation of RT measurement better allows for revealing inter-individual variability.
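To make the idea of a continuous, RT-based measure concrete, the following minimal sketch computes a per-participant interference score for a congruent/incongruent design such as the visual perspective-taking task described in Fig. 1d. It is an illustration only, not the scoring procedure of any cited study: the function name, the trial format, the restriction to correct trials, and the example values are all assumptions made for this sketch.

```python
from statistics import mean


def interference_score(trials):
    """Return mean RT (ms) on incongruent trials minus mean RT on congruent trials.

    `trials` is an iterable of (condition, rt_ms, correct) tuples.
    Assumption for this sketch: only correct responses enter the RT means.
    """
    rts = {"congruent": [], "incongruent": []}
    for condition, rt_ms, correct in trials:
        if correct:
            rts[condition].append(rt_ms)
    return mean(rts["incongruent"]) - mean(rts["congruent"])


# Hypothetical trials for one participant: (condition, RT in ms, response correct?)
example_trials = [
    ("congruent", 610, True),
    ("congruent", 650, True),
    ("incongruent", 720, True),
    ("incongruent", 700, True),
    ("incongruent", 990, False),  # error trial, excluded from the RT means
]

print(interference_score(example_trials))  # 710 - 630 = 80
```

Unlike a pass-or-fail score, such a difference score varies continuously across participants even when accuracy is at ceiling.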

Another promising approach to capturing inter-individual variability in advanced ToM was introduced in the Edinburgh Social Cognition Test (ESCoT) [53]. The test employs cartoon-style dynamic interactions together with open questions that are rated based on the quality of the answer. With the dynamic stimulus material, the ESCoT also addresses another obvious yet often overlooked shortcoming of classic social cognition paradigms: their limited ecological validity. Some aspects of ToM are inherently interactive and therefore need to be studied in more complex, dynamic, and naturalistic settings. Other examples of new paradigms that incorporate this idea are the Strange Stories Film Task [54], which was based on the original stories from Happé [2], and the EmpaToM [6], which allows for a simultaneous manipulation and assessment of empathy and ToM with sufficient inter-individual variance in adults. A sample trial sequence of this video-based task is depicted in Fig. 1 (panel c), and the task is briefly described in the respective figure caption. In a recent pilot experiment, we employed eye tracking while participants performed the EmpaToM to investigate the relationship of basic gaze processes with empathic responding and ToM in a naturalistic social setting. Specifically, 41 participants (34 female, mean age 23.4 years) completed the EmpaToM on a CRT monitor while their gaze behavior was tracked with an EyeLink 1000 Desktop Mount eye tracker (SR Research Ltd., Ontario, Canada). We defined an area of interest around the eye region of the narrators in the video (80 × 230 pixels; see Fig. 2) and collected the percentage of fixations in this region and the percentage of time spent on the eyes. Owing to technical difficulties and insufficient eye data quality caused by movements, data from 30 participants were available for further analysis (27 female, mean age 21 years). Results are presented in Fig. 2 (panel b). First, we found substantial variance in the individual tendency to establish eye contact with the narrator during the video. Participants spent between 34 and 61% of the time looking at the eye region. In addition, participants who showed a higher empathy tendency spent less time overall looking at the eyes of the narrator during videos with negative valence (r = −.44, p = .015). This pattern is in line with the notion of a self-regulative role of gaze behavior in emotionally charged situations [55]. Hence, empathic participants may have downregulated their own emotions by looking away from the eye region during emotionally negative videos. Interestingly, the time participants spent looking at the eyes of the narrator (relative to other areas) during videos requiring mental state inference was marginally positively related to performance in the subsequent ToM question (r = .32, p = .085). This finding suggests that eye contact during a conversation might enhance the efficiency of mentalizing processes [56]. Given that the present results are based on only 30 participants, that effects are relatively small, and that the study is entirely correlational, further studies are certainly necessary before strong conclusions can be drawn. However, we think that our pilot study suggests that probing the relation between basic perceptual and behavioral processes on the one hand and performance in ToM tasks on the other hand can be promising.
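The two gaze measures and the correlational analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the pilot study: the fixation format, the AOI position, the orientation of the 80 × 230 pixel region (taken here as 230 wide by 80 high), and all function names are hypothetical. The t statistic is the standard conversion of a Pearson correlation for a sample of n participants and can be compared against a t distribution with n − 2 degrees of freedom.

```python
import math


def dwell_percentage(fixations, aoi):
    """Percent of total fixation time that falls inside a rectangular AOI.

    `fixations` is a list of (x, y, duration_ms) tuples;
    `aoi` is (left, top, width, height) in pixels.
    """
    left, top, width, height = aoi
    total = sum(duration for _, _, duration in fixations)
    inside = sum(
        duration
        for x, y, duration in fixations
        if left <= x < left + width and top <= y < top + height
    )
    return 100.0 * inside / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for a correlation r observed in n participants (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical AOI position; only the 80 x 230 pixel size comes from the text.
eye_aoi = (400, 200, 230, 80)
# Hypothetical fixations for one video: (x, y, duration in ms)
example_fixations = [(450, 230, 300), (900, 600, 200), (500, 250, 500)]

print(dwell_percentage(example_fixations, eye_aoi))  # 80.0
print(t_statistic(-0.44, 30))
```

With one dwell percentage per participant, `pearson_r` relates gaze behavior to a behavioral score such as ToM performance; a dedicated statistics package would normally be used to obtain the corresponding p value.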

Fig. 2

The EmpaToM. a Example for the region of interest (eye region) for one of the narrators in the EmpaToM. b Pilot findings of gaze behavior in relation to ToM. The histogram displays how much time participants spent looking at the eye region during the videos (in percent). The scatter plot shows the correlation between the relative duration participants looked at the eye region during ToM videos and performance in ToM questions (composite score integrating speed and accuracy). All parts of this figure are original

Rapid technical advances pave the way for even more naturalistic paradigms that adopt a second-person account. Live video feed, mobile eye tracking, or motion capture are promising ways to study social cognition in a more interactive and ecologically valid fashion (see Lehmann et al. [57] for a review). As virtual reality (VR) technology becomes more available, it is increasingly integrated in social cognition paradigms as well [58]. For example, in a recently developed VR task for the investigation of ToM in schizophrenia, participants run errands in a virtual shopping center [59]. The scenario involves social interactions which are complemented with multiple-choice questions requiring an interpretation of the encounter. The great opportunity of VR is the potential to bridge the gap between ecological validity and experimental control. Changes of specific variables, for example the gender of the interaction partner, can be easily implemented while keeping all other parameters constant. Moreover, VR facilitates reproducibility because, once created, scenarios can be shared across laboratories. In view of the replicability crisis, this is an opportunity of special importance.

Enhancement of Developing and Mature ToM

Even though ToM development follows a relatively consistent pattern across children, it can be promoted during childhood. In the first years of life, mental-state talk of the caregiver is related to children’s later understanding of the mind [60,61,62]. Storybook interactions with a special focus on the mental states of the character are an easy way for parents to support false belief understanding in this age group [63]. Later, during the first years of school, conversations about the mind and group discussions about mental states, which can be delivered by the teacher [64], can successfully enhance ToM skills [65,66,67]. While meta-analyses show that shorter periods of training with longer session durations seem to be more efficient, the discovery of the most effective training practices requires further research [68].

Interestingly, some studies incorporated additional outcome measures, with mixed results. For instance, training of first-order ToM can transfer onto more advanced forms of ToM [69], and a training that was mainly constructed to enhance children’s emotion understanding through conversational interventions on emotions also showed a positive effect on other social cognition aspects, such as ToM [65]. On the other hand, a storybook interaction approach intended to promote emotion understanding, social competence, and false belief understanding in pre-school children had an effect only on false belief understanding [63]. Training of an isolated feature, for example false belief understanding, cannot do justice to a multifaceted construct such as ToM. It is therefore not surprising that the increase in specific ToM skills in autistic children and adults after trainings with standardized tests often fails to transfer onto more generalized ToM measures or social competence in real life [70,71,72].

Recent research suggests that ToM performance can also be enhanced in healthy adults. A mental training protocol that targeted a rather wide range of socio-cognitive skills, such as flexible perspective taking on self and others and observing one’s own thoughts, led to increased performance in an advanced and high-level ToM task (EmpaToM, [73, 74]). The observed behavioral improvement was accompanied by changes in grey-matter volume in neuronal regions that are consistently associated with ToM [75].

The promotion of socio-cognitive capacities is of special interest in aging populations, as ToM has been found to decrease with age [76]. Fortunately, older adults benefit no less from ToM training than younger adults when a conversational approach is used [77]. Diversified ToM trainings that include practicing visual perspective taking, first- and higher-order ToM, and mentalizing in various real-life contexts seem suitable to enhance performance in different ToM measures in older adults [78, 79].

Taken together, ToM performance can be promoted throughout life, but the effects of social cognition trainings seem to critically depend on their content [70,71,72, 79]. An improvement of ToM in its entirety requires training of the whole spectrum of the concept. In this context, more true-to-life procedures are a promising avenue; 6 months after a 5-week VR-based social cognition training, autistic individuals reported increased social skills, such as maintaining a conversation and establishing relationships, in their everyday life [80].

Conclusions

In this article, we illustrate how the choice of paradigm and its characteristics shape the outcome of ToM assessment throughout all age groups. In young infants, spontaneous mentalizing skills as investigated with implicit designs largely depend on formal and content-related aspects of the task. In addition, linguistic requirements and the strong relationship between ToM and EF are critical when assessing ToM in childhood. A multiple-task battery allows a broad investigation, which enables a more comprehensive assessment of ToM capacities and helps to determine the current stage of ToM development in children [26]. In adults, behavioral observations and neuronal activation patterns exemplify the task-dependent and multifaceted nature of ToM. Similarly, while ToM performance can be promoted by training programs in both children and adults, the generalizability of training effects depends on the scope of the training, supporting the view that “you get what you give.”

Based on the findings reviewed in this article, we want to promote a multifaceted approach in the assessment of socio-cognitive competences. The application of multiple-task batteries instead of a monolithic treatment of ToM is of central importance in this context. In line with this point, we want to emphasize the significance of making deliberate and well-informed decisions about the paradigms, specific variations, and control conditions that are incorporated in research.

To achieve these objectives, further research needs to probe the precise relationship between task settings and their behavioral and neuronal outcomes in more detail. Existing meta-analyses on this issue provide a good basis [7, 12••, 19••, 41••, 76]. Systematic comparisons of different paradigms and their variations within the same population are vital for future research. Based on the notion that cultural variations exist in mentalizing [81, 82•], we believe that cross-cultural comparisons could be a fruitful addition to this new line of research. A better understanding of the nature and the evolution of ToM could contribute to a well-grounded approach to future mentalizing assessment.

Incorporating continuous measures and naturalistic stimuli is a promising way towards a more profound and comprehensive assessment of socio-cognitive capacities. This approach could be extended with a combination of diverse behavioral and physiological measures to capture the vast range of processes that contribute to and are involved in mentalizing. As an example, our pilot findings reported above suggest a relationship between basic attentional processes and advanced ToM capacity in adults: participants who spent more time looking at the eyes of narrators were somewhat better at understanding their mental states. Investigating the relationship between basic processes and ToM can pave the way for new approaches to promote mentalizing skills. Research revealed that both developing and mature ToM can be enhanced by relatively short training programs [69, 78]. Enhanced generalizability of these effects could be gained by training schedules that take the multifaceted nature of ToM into account. Furthermore, a better understanding of the exact mechanisms that drive training success is needed to further enhance the efficiency of these programs [68]. Of crucial importance in this context is a thorough investigation of the transfer effects of ToM trainings. These effects can shed light on the impact that mentalizing skills have outside of the laboratory, in terms of their contribution to enabling successful social interactions, as well as ensuring physical and mental health in everyday life.