Introduction

Social cognition refers to any process involved in the detection, elaboration, and interpretation of social information, that is, the ability to represent others' intentions, emotions, desires and beliefs, and to respond to them appropriately (Happé & Frith, 2014). These capabilities are crucial for successful communication and social interaction, with significant implications for mental health, wellbeing, and quality of life in both typical and atypical development (Adolphs, 2009; Cotter et al., 2018; Henry et al., 2016). In fact, social cognitive impairments are among the earliest and predominant clinical symptoms of many acute and chronic neurocognitive disorders (Adenzato & Poletti, 2013; Bora et al., 2015; Henry et al., 2016).

The relevance of social cognitive functioning assessment has been formally recognized in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders, where social cognition is included as one of the six core neurocognitive domains (APA, 2013). Accordingly, different social cognitive assessment measures are now available for clinical use as part of wider neurocognitive batteries. However, unlike the five traditional neurocognitive domains (learning and memory, executive function, complex attention, perceptual-motor function, and language), which are grounded on a long history of validation of neuropsychological assessment tools, studies on the psychometric characteristics and normative data of social cognition tests are relatively few. This is especially true for high-order social cognition abilities, such as Theory of Mind (ToM) or mentalizing, the ability to reason about one’s own and others’ mental states to comprehend and predict behaviour (Baglio & Marchetti, 2016; Call & Tomasello, 2008; Premack & Woodruff, 1978; Wimmer & Perner, 1983). The nature of this capability is complex and multi-component, embracing an affective and a cognitive component, referring respectively to the emotions and intentions driving behaviour. Also, the meta-representation of mental states presents different levels of attribution, such as first- and second-order inference (Kalbe et al., 2010; Shamay-Tsoory et al., 2005), which have been classically investigated with first- and second-order false belief tasks (Baron-Cohen et al., 1985; Perner & Wimmer, 1985; Wimmer & Perner, 1983).

Although a substantial number of ToM tools are presented in the literature, most of them do not assess all these components of mentalizing. For example, the aforementioned False Belief task evaluates only the recursive level of mentalizing, without considering cognitive and affective components separately. On the other hand, the Faux Pas Recognition test (Roca et al., 2014; Stone et al., 1998), as well as the Story-based Empathy task (Dodich et al., 2015), assess both the affective and cognitive components without taking into account recursive thinking (e.g., first- and second-order beliefs). Even the more ecological measures of ToM, based on multimedia stimuli, such as The Awareness of Social Inference Test (TASIT, McDonald et al., 2004) and the Movie for the Assessment of Social Cognition (MASC, Dziobek et al., 2006), focus only on cognitive and affective ToM aspects, and not on the level of recursive inference. Moreover, to our knowledge, only the Faux Pas test and the Story-based Empathy task have been provided with normative data (Delgado-Álvarez et al., 2021; Dodich et al., 2015). Furthermore, neither the Faux Pas test nor the Story-based Empathy task has been validated. To date, the most widely validated ToM test, adapted for different languages and age groups, remains a measure of affective mentalizing, the Eyes Test (Baron-Cohen et al., 2001; Fernandez-Abascal et al., 2013; Vellante et al., 2013).

Among the available ToM measures, the Yoni task (Shamay-Tsoory et al., 2007) is currently used as a complete computerized ToM measure to evaluate first- and second-order, affective and cognitive mental state attributions. The Yoni task has proved valuable in the research context. Different versions of the Yoni task have been applied in many clinical conditions: the 64-item version has been used to investigate cognitive and affective ToM in patients with localized brain lesions (Abu-Akel & Shamay-Tsoory, 2011) or schizophrenia, and in criminal offenders (Shamay-Tsoory et al., 2010); Bodden and colleagues (2010) adopted a 60-item version as a measure of mentalizing ability in patients with Parkinson’s disease and healthy subjects. Finally, in the Italian context, Rossetto and colleagues (2018) assessed ToM abilities with the translation of the Yoni task, in its 98-item version, in subjects with Mild Cognitive Impairment and Parkinson’s disease.

Compared to other social cognitive tests, the Yoni task presents many advantages. Firstly, unlike the majority of social cognition tools, it allows a comprehensive assessment of ToM, evaluating both cognitive and affective mental states, as well as first- and second-order beliefs. Also, Yoni presents simple visual stimuli in an easy presentation modality purposely designed to minimize the influence of language, memory, and executive function on the subject’s performance. This aspect is crucial considering that other social cognition measures have been reported to be influenced by the level of functioning in other neurocognitive domains (Lugnegård et al., 2013; McDonald et al., 2015; Schneider et al., 2012a, 2012b). Finally, unlike story-based tests such as the Strange Stories (Happé, 1994), which require a contextual interpretation of interactions, the simple stimuli of Yoni make it possible to exclude the effect of social norms and autobiographical memory on the test score (Zalla & Korman, 2018).

Despite the advantages of the Yoni task, some issues need to be reported. Heterogeneous versions of the task are available in the literature; neither validity (e.g., convergent and discriminant validity) nor reliability (e.g., inter-item reliability) has been explored, and no normative data have been presented. Moreover, the existing versions of the tool are too time-consuming in their current form to be included in neuropsychological screening.

Based on these premises, the present study aimed at: a) exploring the convergent and discriminant validity, and the inter-item reliability, of the Italian version of the Yoni task (Rossetto et al., 2018); b) providing normative data for the 98-item version in the Italian population; c) developing two short versions of the Yoni task from the 98-item version (Rossetto et al., 2018), balanced for its subdomains (first- and second-order, affective and cognitive components).

Methods

A secondary analysis was performed on data collected during previous research (Isernia et al., 2019, 2020; Rossetto et al., 2018) and ongoing research. The IRCCS Don Gnocchi Foundation Ethics Committee and/or the Catholic University of Sacred Heart Ethics Committee approved all the original research protocols, and data were collected in line with the Declaration of Helsinki. All participants read and signed the written informed consent before taking part in the study.

Participants

Participants' data were included in the secondary analysis if they met the following inclusion criteria:

i) age > 18; ii) absence of neurological and/or psychiatric conditions assessed by the clinical interview; iii) absence of oncological conditions; iv) a global cognitive function as assessed by MoCA or MMSE within the normal range; v) absence of auditory and/or visual disability; vi) absence of pharmacological treatment affecting cognitive functions; vii) if under pharmacological therapy, a stable treatment for at least three months before the participation in the research; viii) a high performance in the control items of ToM tasks.

All subjects were recruited at the IRCCS Don Gnocchi Foundation of Milan and the Catholic University of Sacred Heart of Milan. They were students, staff of the clinic, and volunteers who were contacted by mail and/or telephone by a researcher.

ToM assessment

The Italian adapted version of the Yoni Task (Rossetto et al., 2018) was used to test ToM abilities. This task consists of 98 items (84 mental items and 14 physical control items), each showing a face named “Yoni” and four colored pictures placed around the face, which are either objects from different semantic categories (e.g., fruit, animals, means of transport) or faces. The items are presented on a computer screen by software (e.g., E-Prime) that records response time as well as accuracy. The task requires the participant to choose the correct item to which Yoni refers based on a sentence at the top of the screen and specific cues such as Yoni’s eye gaze and/or facial expression, and/or the eye gaze/facial expression of the faces around him. Participants were instructed to choose the correct answer among the four alternatives by clicking on it with the computer mouse as fast as possible. The items differ in the complexity of the meta-representation they require, i.e., first-order (N = 24 items) or second-order (N = 60 items), and in the evaluation of affective ToM (“Yoni likes...”, N = 48 items) or cognitive ToM (“Yoni is thinking of...”, N = 36 items) (Fig. 1). The performance was rated separately for accuracy and response time. For the ToM accuracy score (ACC), only mental items were considered; each item scored 1 if correct and 0 if wrong, for a maximum total score of 84. Also, the ACC scores from each subcategory were summed to obtain four subtotals: the total of first-order items (1ORD, range 0–24), the total of second-order items (2ORD, range 0–60), the total of affective items (AFF, range 0–48), and the total of cognitive items (COG, range 0–36). For response time (seconds elapsing between the stimulus presentation and the subject’s answer), the total score (RT) was obtained by averaging the response times of correct items only. As for ACC, separate RT scores were computed for the subcategories of the scale.
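
The scoring rules described above can be illustrated with a minimal sketch. The record layout (one dictionary per item with the keys `order`, `kind`, `correct`, and `rt`) and the function name `score_yoni` are assumptions made for illustration only; they are not part of the original task software.

```python
from statistics import mean


def score_yoni(items):
    """Return ACC and RT scores computed on mental items only.

    Each item is a dict (hypothetical layout) with:
      order   -> 1 or 2 (first- or second-order)
      kind    -> "affective", "cognitive" or "physical" (control)
      correct -> True/False
      rt      -> response time in seconds
    """
    # Physical (control) items do not enter the ToM scores.
    mental = [it for it in items if it["kind"] != "physical"]

    def acc(subset):
        # One point per correct item.
        return sum(1 for it in subset if it["correct"])

    def rt(subset):
        # Mean response time over correct items only.
        times = [it["rt"] for it in subset if it["correct"]]
        return mean(times) if times else None

    subsets = {
        "TOT": mental,
        "1ORD": [it for it in mental if it["order"] == 1],
        "2ORD": [it for it in mental if it["order"] == 2],
        "AFF": [it for it in mental if it["kind"] == "affective"],
        "COG": [it for it in mental if it["kind"] == "cognitive"],
    }
    return {name: {"ACC": acc(s), "RT": rt(s)} for name, s in subsets.items()}


# Toy data, invented for illustration (not real participant responses).
demo = [
    {"order": 1, "kind": "affective", "correct": True, "rt": 2.0},
    {"order": 1, "kind": "cognitive", "correct": False, "rt": 5.0},
    {"order": 2, "kind": "affective", "correct": True, "rt": 4.0},
    {"order": 2, "kind": "cognitive", "correct": True, "rt": 6.0},
    {"order": 1, "kind": "physical", "correct": True, "rt": 1.0},
]
scores = score_yoni(demo)
```

In this toy example the wrong cognitive item contributes neither to accuracy nor to the mean response time, and the physical item is ignored altogether.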

Fig. 1 Sample of items from the Yoni task: first- and second-order, cognitive and affective mental inference and physical (control) items (Source: Rossetto et al., 2018)

The Italian version of the Reading the Mind in the Eyes test (RMET, Baron-Cohen et al., 2001; Vellante et al., 2013) was used as a measure of advanced ToM based on judging mental states from the eyes. The RMET comprises 36 items, presenting black-and-white photographs of the eye region of males and females displaying a certain basic or complex emotion. Each stimulus is presented with four forced-choice alternatives describing states of mind. The subject is invited to choose the answer that best fits the state of mind depicted in the photograph. Each item scores 0–1, for a total score that ranges 0–36. A gender recognition task (Gender Test, GT) is also performed with the same items as a control task to exclude visual disability affecting task performance. For the present study, a computerized version of the task was administered.

Statistical analysis

Jamovi 1.2.27 and IBM Statistical Package for Social Science v. 26 software were used for the statistical analysis.

Description of the participants characteristics

Summary statistics, such as means and standard deviations, were computed to report the demographic characteristics of participants. For each variable included in the analyses, the normality and modality of the distribution were considered, and parametric or non-parametric analyses were run accordingly.

Validity and Reliability of 98-item Yoni

Cronbach’s alpha and McDonald’s omega were computed to verify the internal consistency of Yoni. Inter-item reliability was also tested by split-half parallel reliability (Spearman-Brown ϱSP). Convergent and discriminant validity were tested by computing correlations (Spearman’s ρ) between Yoni and RMET, and between Yoni and the Gender Test, respectively.
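
As an illustration of the split-half approach, the following sketch repeatedly splits the items into two random halves, correlates the half totals, and applies the Spearman-Brown correction 2r / (1 + r). It is a simplified frequentist stand-in, not the procedure run in Jamovi: the function names, the number of splits, and the use of the median across splits are assumptions, and no credible interval is produced.

```python
import random
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Raises ZeroDivisionError if one half total has zero variance.
    return sxy / (sxx * syy) ** 0.5


def spearman_brown_split_half(scores, n_splits=200, seed=0):
    """Median Spearman-Brown corrected split-half reliability.

    scores: list of rows (one per participant), each a list of 0/1 item scores.
    """
    rng = random.Random(seed)
    n_items = len(scores[0])
    estimates = []
    for _ in range(n_splits):
        idx = list(range(n_items))
        rng.shuffle(idx)
        half_a, half_b = idx[: n_items // 2], idx[n_items // 2:]
        tot_a = [sum(row[i] for i in half_a) for row in scores]
        tot_b = [sum(row[i] for i in half_b) for row in scores]
        r = pearson(tot_a, tot_b)
        estimates.append(2 * r / (1 + r))  # Spearman-Brown step-up
    return median(estimates)
```

Averaging over many random splits avoids dependence on one arbitrary partition (e.g., odd versus even items), which is why the sketch reports a median rather than a single estimate.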

Standardization of the 98-item Yoni task and normative data

Simultaneous multivariate regressions were run to explore the role of demographic variables (sex, age, and years of education) as possible predictors of Yoni accuracy and response time. Then, the contribution of the predictors in the regression model was included in a formula to adjust raw scores, so as to exclude the effects of demographic variables on Yoni accuracy and response time. The specific adjustment to be added to or subtracted from the raw score based on the subject’s sex, age, and years of education was then computed to provide adjustment score tables. No adjustment was suggested for the upper and lower limits of Yoni’s accuracy raw score. Assumptions for the application of linear regression were checked graphically by means of residual distributions and homoscedasticity (histogram plot of residuals, plots of residuals vs. fitted values, Normal Q-Q plot, square root of standardized residuals vs. fitted values, standardized residuals vs. leverage) and with appropriate statistical tests (Breusch-Pagan Test and Non-constant Variance Score Test). All regression models were applied to data transformed with Blom’s rank-based inverse normal transformation to improve the normality of the variable distributions and the adequacy of the linear model approach. Finally, an adjusted total score of Yoni was computed separately for accuracy and response time, considering only mental and not physical items. For both accuracy and response time, the total score was set to range 0–1. The accuracy total score was computed by summing the adjusted scores of first-order and second-order items and dividing the sum by the total number of mental items (ACC_TOT = (first-order adjusted ToM + second-order adjusted ToM) / Ni). The total response time score was created by averaging the first- and second-order adjusted ToM response times, each expressed relative to the total available time after subtracting the minimum time necessary for the performance (RT_TOT = 1 - (((First-order adjusted ToM RT - RTmin) / RTi) + ((Second-order adjusted ToM RT - RTmin) / RTi)) / 2).
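
Two steps of this procedure can be sketched in code: Blom's transformation and a regression-based score adjustment. Blom's formula maps the rank r of each of n observations to the normal quantile of (r - 3/8) / (n + 1/4). The adjustment function below follows a common convention for regression-based norms (subtracting the demographic contribution predicted relative to the sample mean); the exact adjustment formula used in the study is not reported here, so `adjust_score`, its arguments, and the coefficient values in the usage note are hypothetical.

```python
from statistics import NormalDist


def blom_transform(values):
    """Rank-based inverse normal transform using Blom's constants."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Tied values share the average of the ranks they span.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]


def adjust_score(raw, covariates, coefs, sample_means, max_score=None):
    """Remove the demographic effect predicted by a fitted linear model.

    coefs and sample_means are assumed regression outputs keyed by
    covariate name (hypothetical; not the coefficients of the study).
    """
    if max_score is not None and raw in (0, max_score):
        return raw  # floor and ceiling raw scores are left unadjusted
    correction = sum(coefs[k] * (covariates[k] - sample_means[k]) for k in coefs)
    return raw - correction
```

For example, with an invented age coefficient of -0.1 and a sample mean age of 40, `adjust_score(50, {"age": 60}, {"age": -0.1}, {"age": 40})` adds 2 points to a raw score of 50, compensating for the lower score expected at an older age.
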
Furthermore, an additional score was proposed to detect the balance between the level of affective and cognitive ToM, separately for accuracy and response time, as follows:

$$Cognitive/Affective\;Accuracy\;index\;({CA}_{A}) = \frac{(Affective\;Adjusted\;ToM\;/\;{N}_{i}) - (Cognitive\;Adjusted\;ToM\;/\;{N}_{i})}{(Affective\;Adjusted\;ToM\;/\;{N}_{i}) + (Cognitive\;Adjusted\;ToM\;/\;{N}_{i})};$$
$$Cognitive/Affective\;Response\;Time\;index\;({CA}_{RT}) = \left((Affective\;Adjusted\;ToM\;/\;{RT}_{i}) - (Cognitive\;Adjusted\;ToM\;/\;{RT}_{i})\right) \times (-1).$$

A CA_A or CA_RT index near 0 suggested a balance between affective and cognitive ToM; a positive index indicated a higher level of affective than cognitive ToM, and a negative index a higher level of cognitive than affective ToM.
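
The composite scores defined in this section can be written as a few small functions. This is a sketch under stated assumptions: Ni is taken to be the number of items of the relevant scale (84 mental items for the total, 48 affective and 36 cognitive items for the index), and `rt_min` and `rt_avail` stand for RTmin and RTi, whose values are not given in the text. The accuracy index is read as a normalized difference, (A - C) / (A + C).

```python
def acc_total(first_adj, second_adj, n_items=84):
    """ACC_TOT: adjusted first- plus second-order accuracy over all mental items."""
    return (first_adj + second_adj) / n_items


def rt_total(first_rt_adj, second_rt_adj, rt_min, rt_avail):
    """RT_TOT: 1 minus the mean of the two rescaled adjusted response times."""
    first = (first_rt_adj - rt_min) / rt_avail
    second = (second_rt_adj - rt_min) / rt_avail
    return 1 - (first + second) / 2


def ca_accuracy(aff_adj, cog_adj, n_aff=48, n_cog=36):
    """CA_A: normalized difference between affective and cognitive accuracy.

    Undefined (division by zero) when both adjusted scores are 0.
    """
    a, c = aff_adj / n_aff, cog_adj / n_cog
    return (a - c) / (a + c)


def ca_response_time(aff_rt_adj, cog_rt_adj, rt_avail):
    """CA_RT: sign-flipped difference, so faster affective responses give a positive index."""
    return -((aff_rt_adj / rt_avail) - (cog_rt_adj / rt_avail))
```

With these definitions a subject at ceiling on both scales obtains `ca_accuracy(48, 36) == 0`, i.e., a balanced profile, while a slower cognitive than affective response time yields a positive `ca_response_time`, consistent with the interpretation given above.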

Summary statistics were computed to report Italian normative data for the 98-item version of Yoni.

Development of Italian Yoni short forms

Cronbach’s alpha, McDonald’s omega, Cronbach’s alpha if item dropped, and McDonald’s omega if item dropped were considered for all 98 items of Yoni to select the item pool for the development of the Yoni short versions. To this end, a balanced number of items for each ToM component (first-order; second-order; affective; cognitive) was considered, and only items contributing to a high Cronbach’s alpha and McDonald’s omega of the scale (alpha/omega 0.8–0.9) were included in the short versions.
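
The "alpha if item dropped" statistic used for item selection can be sketched as follows. Cronbach's alpha is computed as k / (k - 1) × (1 - Σ item variances / variance of the total score), and the scale is then re-evaluated with each item removed in turn. Function names are illustrative, McDonald's omega is not covered, and the balancing of items across ToM components is left out.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (participants) of item scores."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_dropped(scores):
    """Alpha of the scale recomputed after removing each item in turn."""
    k = len(scores[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != i] for row in scores])
        for i in range(k)
    ]
```

An item whose removal raises alpha above the full-scale value contributes little to internal consistency and is a natural candidate for exclusion from a short form.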

Sample size calculation

To assess sample size adequacy, statistical power for correlation coefficients and Cronbach’s alpha was computed for statistical tests aimed at detecting correlation and Cronbach’s alpha coefficients of 0.75 against a null coefficient of 0.6. Notably, we considered power calculations for correlations as valid also for Cronbach’s alpha coefficients, as these are distributed like the test–retest correlation of the scale scores (Heo et al., 2015). Power calculations for multivariate regression were performed assuming a medium Cohen’s f2 effect size. According to the above power calculations, the sample size used (N = 175) is sufficient to guarantee a statistical power (1-β) above 90% for all the statistical tests reported here.
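
The power calculation for the correlation test can be approximated with Fisher's z transformation. The sketch below assumes a two-sided test with α = 0.05, which is not stated in the text, and ignores the negligible rejection probability in the opposite tail; it is meant as a plausibility check rather than a reproduction of the original computation.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(rho_alt, rho_null, n, alpha=0.05):
    """Approximate power of a two-sided Fisher z-test of H0: rho = rho_null."""
    nd = NormalDist()
    # Standardized distance between the two correlations on the z scale.
    effect = abs(atanh(rho_alt) - atanh(rho_null)) * sqrt(n - 3)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(effect - z_crit)


# Coefficients and sample size taken from the text; alpha is an assumption.
power = correlation_power(0.75, 0.60, 175)
```

Under these assumptions the resulting power exceeds 0.90, in line with the adequacy statement above.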

Results

Participants

Data from 175 healthy subjects (65 males, mean age = 38.4 ± 20.6, mean years of education = 14.8 ± 3.17) were included in the analysis. Table 1 shows the characteristics of the subjects by age group.

Table 1 Characteristics of the participants for age groups

Internal consistency, inter-item reliability, convergent and discriminant validity of Yoni-98

The 98-item Yoni version showed a mean item accuracy of 0.90 ± 0.12, a high Cronbach’s α of 0.90, and a high McDonald’s ω of 0.89. The tool demonstrated high inter-item reliability (split-half parallel reliability Spearman-Brown ϱSP median = 0.90, 95% HDI = 0.85–0.93). Also, it showed good convergent validity, with statistically significant correlations with RMET (Yoni total: ϱ = 0.260, p < 0.001; 2ORD: ϱ = 0.252, p < 0.001; AFF: ϱ = 0.287, p < 0.001; COG: ϱ = 0.197, p = 0.009; 1ORD COG: ϱ = 0.170, p = 0.025; 2ORD AFF: ϱ = 0.286, p < 0.001; 2ORD COG: ϱ = 0.187, p = 0.013). Finally, good discriminant validity was supported by the absence of significant correlations with GT (Yoni total: ϱ = 0.04, p = 0.872; 1ORD: ϱ = 0.02, p = 0.797; 2ORD: ϱ = 0.01, p = 0.797; AFF: ϱ = 0.08, p = 0.312; COG: ϱ = -0.05, p = 0.512; 1ORD AFF: ϱ = 0.02, p = 0.797; 2ORD AFF: ϱ = 0.07, p = 0.361; 1ORD COG: ϱ = 0.04, p = 0.585; 2ORD COG: ϱ = -0.04, p = 0.584).

Effects of sex, age and years of education on Yoni-98

Table 2 shows the simultaneous regression models.

Table 2 Simultaneous regression models of demographic variables on 98-item Yoni

The regression models revealed age as a significant predictor of second-order, affective ToM accuracy scores and of all Yoni response time scores. Years of education was a significant predictor of all Yoni-98 accuracy scores except first-order ToM, but not of response time scores. No predictive effect of sex on Yoni scores was observed.

Normative values to convert raw scores are proposed in Table 3, and an example of a scoring sheet is provided in Table 4.

Table 3 Adjustment for age and Years of education of Yoni-98 raw scores
Table 4 Yoni-98 scoring sheet example

Adjusted Yoni scores in the Italian population

Table 5 reports summary statistics of adjusted total scores of Yoni-98.

Table 5 Italian normative data of Yoni task

Development of Yoni short versions

By selecting items that highly contributed to the internal consistency of the scale, two short versions of Yoni were created (see Table 6): a 48-item version (Yoni-48) and a 36-item version (Yoni-36). Both short versions showed high internal consistency (Yoni-48: α = 0.90, ω = 0.90; Yoni-36: α = 0.88, ω = 0.86) and a good mean item accuracy (Yoni-48: M = 0.87 ± 0.14; Yoni-36: M = 0.89 ± 0.14). Also, both versions revealed a high convergent validity, correlating with RMET (Yoni-48, total: ϱ = 0.253, p < 0.001; 2ORD: ϱ = 0.251, p < 0.001; AFF: ϱ = 0.293, p < 0.001; COG: ϱ = 0.175, p = 0.021; 2ORD AFF: ϱ = 0.295, p < 0.001; 2ORD COG: ϱ = 0.185, p = 0.014. Yoni-36, total: ϱ = 0.233, p = 0.002; 2ORD: ϱ = 0.229, p = 0.002; AFF: ϱ = 0.286, p < 0.001; COG: ϱ = 0.151, p = 0.046; 2ORD AFF: ϱ = 0.287, p < 0.001; 2ORD COG: ϱ = 0.155, p = 0.041), and a good discriminant validity, showing no significant correlation with GT (Yoni-48, total: ϱ = -0.020, p = 0.789; 1ORD: ϱ = 0.062, p = 0.414; 2ORD: ϱ = -0.028, p = 0.712; AFF: ϱ = 0.051, p = 0.503; COG: ϱ = -0.033, p = 0.662; 1ORD AFF: ϱ = 0.061, p = 0.424; 1ORD COG: ϱ = 0.071, p = 0.351; 2ORD AFF: ϱ = 0.032, p = 0.674; 2ORD COG: ϱ = -0.023, p = 0.762. Yoni-36, total: ϱ = -0.020, p = 0.772; 2ORD: ϱ = -0.039, p = 0.612; AFF: ϱ = 0.041, p = 0.586; COG: ϱ = -0.040, p = 0.601; 2ORD AFF: ϱ = 0.002, p = 0.979; 2ORD COG: ϱ = -0.025, p = 0.745).

Table 6 Item pool of Yoni-48 and Yoni-36

Table 7 depicts the number of items dedicated to each ToM component evaluated in the full and short versions.

Table 7 Structure of the Yoni versions

Discussion

Yoni constitutes a useful ToM measure to investigate neurocognitive impairment in atypical conditions. However, the absence of validation and normative data makes it less suitable for adoption in clinical practice. For this reason, our first aim was to explore the psychometric characteristics of the Italian version of the Yoni task (Yoni-98; Rossetto et al., 2018) in terms of both validity and reliability.

As expected, our results revealed the Yoni-98 score as highly valid and reliable, reporting an adequate internal consistency for both clinical and research purposes (Streiner, 2003; Tavakol & Dennick, 2011) and a significant association with the RMET, which is one of the most used advanced ToM tests (Vellante et al., 2013).

Then, to enhance the reliable application of the tool, normative data were generated, and two composite scores were proposed to detect: 1) the global level of ToM and 2) the balance between affective and cognitive ToM ability.

To our knowledge, only a few works on ToM measures reported validation (e.g., Olderbak et al., 2015; Vellante et al., 2013) as well as normative data (e.g., Delgado-Álvarez et al., 2021; Dodich et al., 2015). This is the first study reporting normative data on the Italian version of Yoni, with many implications in terms of theoretical values and practical contributions.

In particular, having normative data allows the nature of human social behaviour processes to be explored net of confounding variables such as demographic characteristics. In fact, different contributions attest to the link between demographic variables and social cognition performance (Baksh et al., 2018; Chiasson et al., 2017; Isernia et al., 2020; Rosi et al., 2016), highlighting the need to consider this association in ToM test scores.

Finally, our work provides formulas to compute the total scores of the Yoni task, both in terms of accuracy and response time. The rationale underlying the creation of these scores reflects the complex nature of ToM processes. In fact, different orders of ToM meta-representations, such as first- and second-order ToM inference, are recognized as different levels of mentalizing competence, with a specific role in ToM achievement during child development (Valle et al., 2015; Wimmer & Perner, 1983). Our data revealed no effects of sex, years of education and age on the accuracy of the first-order ToM items. The absence of an influence of demographic variables on first-order mentalizing is in line with developmental research (Flavell, 1999; Hughes & Leekam, 2004; Repacholi & Slaughter, 2004) demonstrating that this ability is already achieved at pre-school age. Instead, as expected, we found that only age was a predictor of the response time of first-order ToM items, mirroring the age-related behavioural slowing in sensorimotor tasks (Yordanova et al., 2004). Concerning second-order ToM inferences, we highlighted the predictive role of age and years of education in both accuracy and response time. This evidence supports the complexity of this ToM component, which relies on processes with a high cognitive load, with a predominance of executive functions (Sandoz et al., 2014). Based on these findings, we generated normative data separately for first- and second-order ToM inference and subsequently computed the total score.

Moreover, in line with the neuropsychological and neuroscience evidence demonstrating the co-existence of affective and cognitive ToM, with related separate brain networks and neurochemical systems (Abu-Akel & Shamay-Tsoory, 2011), new indexes have been proposed. In detail, the CA_A and CA_RT indexes were conceived to detect the balance between these two components.

This work also has practical implications. In particular, normative data are essential to include the Italian Yoni task in a battery for the assessment of neurocognitive abilities. These scores might be relevant for detecting a dissociated impairment in ToM function in the clinical setting. Indeed, several contributions have attested the dissociation between affective and cognitive components of ToM in atypical development conditions (e.g., Isernia et al., 2019, 2020; Poletti et al., 2011; Rossetto et al., 2018): while some neurodegenerative diseases, such as Alzheimer’s and Parkinson’s disease, show a specific impairment in cognitive but not affective ToM (Kemp et al., 2012; Rossetto et al., 2018), psychiatric disorders, such as schizophrenia, present the opposite trend, with a heavier impairment of affective mentalizing (Shamay-Tsoory et al., 2007). Instead, as shown by our data, a dissociation between the two ToM components does not emerge in the healthy population, which shows a CA index near 0, suggesting a balance between affective and cognitive mentalizing in typical life-span conditions.

Finally, to answer the need for quick tools for agile use during neuropsychological assessment, we developed and validated two short versions of the Yoni task from the 98-item version, Yoni-48 and Yoni-36. These short versions were also created with the purpose of balancing the number of items assessing separate ToM subdomains, such as first- and second-order, affective, and cognitive components. Focusing on the psychometric properties of these two versions, we observed high reliability and a good convergent validity in both Yoni-48 and Yoni-36. In particular, while Yoni-48 showed higher reliability (α = 0.90), suitable for the adoption of the tool in clinical practice, Yoni-36 offers an optimal balance between the ToM components assessed (8 items for each ToM domain). Both versions are able to provide a fast and complete evaluation of mentalistic abilities, essential features for the adoption of these instruments in the clinic.

Future contributions should provide evidence on the construct validity of the Yoni task by performing factor analysis. Also, normative data for the Yoni short versions in both the Italian healthy and clinical populations have to be provided in order to adopt these instruments for the assessment of ToM in populations presenting social cognition deficits, such as people with schizophrenia, autism spectrum disorder, and neurodegenerative conditions. Moreover, the sensitivity and specificity of the Yoni full and short versions should be investigated to reveal the potential diagnostic accuracy of the tool. Finally, the development of parallel forms of the Yoni task would offer the possibility to longitudinally evaluate ToM abilities in clinical practice, also in terms of rehabilitation outcomes.

This contribution is not without limitations: future works are needed to replicate these findings in a larger population, characterized by additional demographic factors, such as occupational background and social participation in daily life. Also, future studies will need to test the convergent validity with additional ToM tools, assessing different ToM components, such as both affective and cognitive mentalizing, to confirm the results of the present work.

In conclusion, for the first time, this work presented a validation of the Italian version of the Yoni task and of two short versions of the test, proposing normative data and composite score formulas. This contribution is relevant both for research purposes and for clinical practice. In fact, short versions with appropriate psychometric properties will allow a comprehensive examination of ToM abilities in a setting of neuropsychological screening where time is short, helping to plan and monitor a psychosocial rehabilitation intervention adequately (Rossetto et al., 2020).