Testing Children’s Mentalizing in Middle Childhood: Adopting the Child and Adolescent Reflective Functioning Scale with Clinical and Community Children

Beginning with Ensink’s seminal study (2015), the field entered a new era in which mentalizing could be measured in school-aged children. The goal of this work is to extend this line of research by exploring the psychometric properties of the Child Reflective Functioning Scale (CRFS), a measure applied to the Child Attachment Interview and designed to assess reflective functioning (RF) during middle childhood, within both clinical and normative groups, and to examine whether it differentiates between the two groups. Participants were 159 Italian children (age range 8–12 years, Mage = 10.66, SD = 1.83; 57% males) divided into two groups: 71 children with emotional-behavioral problems (‘clinical group’) and 88 children without emotional-behavioral problems (‘community group’). Demographic data and the Verbal Comprehension Index were collected. A two-factor model of the CRFS (self- and other-focused RF) was confirmed in both groups, indicating that these intrapersonal and interpersonal indicators of children’s RF are important domains of mentalizing abilities in middle childhood. The results revealed adequate inter-rater reliability of the CRFS. After controlling for potentially confounding demographic variables, significant differences on the CRFS scales between clinical and community groups were found. The clinical group showed lower CRFS scores than the normative group and, importantly, the self-focused RF score uniquely predicted clinical/community status. Taken together, the findings show that the CRFS is a reliable and valid measure for assessing RF in middle childhood with clinical and normative groups, contributing important information to the scientific literature on mentalizing in middle childhood.
• The CRFS is a reliable and valid measure used to assess RF in middle childhood, discriminating between clinical and community children (children with and without emotional-behavioral problems).
• Testing the psychometric properties of the CRFS in clinical and community groups, we found that the measure achieved adequate internal consistency and high inter-rater reliability, with a two-factor solution (self- and other-focused RF).
• In both the clinical and the normative samples, CRFS scores were associated with age, gender and verbal competence, while they were not associated with SES or family composition. The clinical group showed lower CRFS scores on all three dimensions (general, self- and other-focused RF) than the normative group, and only self-focused RF predicted clinical/community status in middle childhood.
• New evidence regarding the factorial invariance of the CRFS across clinical and community children suggests that the intrapersonal and interpersonal components of children's RF are reliable dimensions of mentalizing in both groups during middle childhood.
One of the most fundamentally human capacities is the tendency to look to others' minds as sources of information and meaning about the world. Mentalizing refers to the individual's ability to hold others' minds in mind (Fonagy et al., 2002; Fonagy & Target, 1997). This capacity allows individuals to perceive both the self and others in terms of mental states (e.g., needs, desires, feelings, beliefs, goals, thoughts, intentions, and motivations), thereby rendering the contents of others' minds understandable, predictable, and meaningful sources of information. Mentalizing skills are thought to originate in the context of attachment relationships (Bateman & Fonagy, 2004; Fonagy & Target, 2006): secure attachment within early child-caregiver relationships provides a necessary precondition for mentalizing to emerge as a developmentally acquired capacity that is enacted both explicitly and implicitly (Fonagy et al., 2007; Shai & Belsky, 2017). Conversely, disruptions in attachment relationships have been associated with impairments in mentalizing, which in turn confer risk for a variety of psychological disorders, such as borderline personality disorder (Bateman & Fonagy, 2004), antisocial personality disorder, eating disorders (Skårderud, 2007), and major depressive disorder (Luyten et al., 2012).
From the attachment perspective, the Reflective Functioning (RF; Fonagy et al., 1991) framework provides a methodology for assessing mentalizing, specifically the capacity to consider close relationships and the self in terms of mental states. In other words, RF is inherently a dynamic skill used within interpersonal relationships to make meaning of behavior and interactions. RF may be especially important in the context of challenging life and interpersonal circumstances (Ensink et al., 2015). Indeed, researchers have argued that RF is a protective factor in the context of life adversity, perhaps because it enables people to reflect upon their thoughts and feelings following negative life experiences, facilitating deeper processing of such adversity (Borelli et al., 2015, 2019; Ensink et al., 2015).
Considering that RF does not develop and solidify early but emerges over time according to normative developmental milestones and the particular characteristics and circumstances of the child (Fonagy & Target, 1997), it is important to have several methods of measuring RF across the lifespan. In this respect, well-validated measures to assess mentalizing exist in adulthood through the application of an elaborate coding procedure to transcripts of the Adult Attachment Interview (AAI; George et al., 1985), the Parent Development Interview (PDI; Slade, 2005), and the Insightfulness Assessment Procedure (IAP; Oppenheim & Koren-Karie, 2013). Conversely, there is a paucity of well-validated measures to assess mentalizing in childhood and adolescence. Indeed, in adolescence only the Reflective Function Questionnaire-Youth (RFQ-Y; Ha et al., 2013), a self-report measure adapted from the adult version of the RFQ, exists to measure mentalizing. This instrument assesses two types of impairments in RF, namely hypermentalization (described as mentalizing content without appropriate contextual or observable supportive data; Frith, 1994) and hypomentalization (described as mentalizing content that is completely opaque and accordingly unapproachable; Fonagy et al., 2016). In middle childhood, only the Friends and Family Attachment Interview (FFI; Steele et al., 2005) and the Child and Adolescent Reflective Functioning Scale (CRFS; Ensink et al., 2013) are suitable measures of RF.
Youth in middle childhood are actively developing cognitively, emotionally, and socially, and their narratives reflect this dynamic process in which, in contrast to adults, the verbalizations of one's own and others' states of mind are not yet fully elaborated. Building upon the increasing language capacities and emotional-cognitive changes in middle childhood (Carr, 2017), the CRFS is an age-specific RF measure capable of distinguishing between developmentally determined limitations in narrative abilities and mentalizing difficulties (Ensink et al., 2015). It is expected that when speaking about themselves and their close relationships, children can describe specific incidents that reveal something about the self, their interpersonal interactions, and their affective reactions. Generating these narrative descriptions requires that children engage in the retrieval of specific episodic memories. These episodic or autobiographical memories are expected to provide a good indicator of the child's knowledge of mental states and of both intrapersonal and interpersonal thinking.
The CRFS was adapted from the Adult Reflective Functioning Scale (Fonagy et al., 1998) and was designed for use with the Child Attachment Interview (CAI; Shmueli-Goetz et al., 2008). It investigates specific subdomains of RF, including awareness of the qualities of mental states, explicit efforts to understand the mental states underlying behavior, recognition that mental states develop in the context of developmental, psychobiological, and social processes, and mental states in relation to the interview. It permits the assessment of multiple dimensions of RF (Allen et al., 2008; Fonagy et al., 2002; Suchman et al., 2010): children provide self-descriptions as well as descriptions of their attachment relationships, which raters score to obtain indicators of mentalizing regarding the self (i.e., self-focused RF) as well as attachment figures (i.e., other-focused RF); these scores are then combined to provide a global RF indicator (i.e., global-RF). The original validation study of the CRFS, based on an American sample, supported the reliability and validity of the CRFS coding system: it demonstrated excellent intraclass correlation coefficients for global-RF, self-focused RF and other-focused RF, discriminant validity, and an association between maternal RF (measured with the PDI) and child RF (measured with the CRFS) (Ensink et al., 2015).
Curiously, as illustrated in greater detail in the Appendix, beyond this preliminary study, no previous studies have been conducted on the CRFS to establish its psychometric properties, leaving several questions unanswered in the literature. To date, the CRFS has been most frequently used to assess RF in children exposed to abuse (Ensink et al., 2015; Tessier et al., 2016) and in community samples of school-aged children (Bizzi et al., 2020a; Borelli et al., 2018; Rosso & Airaldi, 2016). Some studies using the CRFS have focused on specific clinical groups, such as children with disruptive behavior disorders and somatic symptom disorders (Bizzi et al., 2020b), adolescents with borderline traits (Sharp et al., 2020; Vanwoerden et al., 2019), and children with Type 1 diabetes (Costa-Cordella et al., 2021). And yet, somewhat surprisingly, no studies to date have examined RF with the CRFS in a combined clinical-community sample. In other words, no studies have tested whether RF scores on the CRFS discriminate between clinical and non-clinical samples. This constitutes a major gap in the literature, in that we do not yet know whether RF scores on the CRFS contribute clinically meaningful information.
Therefore, beginning with Ensink's seminal study (2015), and considering the growing body of evidence on the associations between child RF and psychopathology, the goal of this work is to extend this line of research by exploring the psychometric properties of the Child Reflective Functioning Scale (CRFS) in a different sample consisting of children with and without emotional-behavioral problems ('clinical' and 'normative' groups). Specifically, we aim: (1) to confirm the internal structure (two-factor solution) of the CRFS suggested by Ensink et al. (2015) by testing it in clinical and community groups; (2) to test the measurement invariance of the CRFS (self- and other-focused RF) across clinical and community groups; (3) to test the inter-rater reliability of two independent coders; (4) to establish the CRFS' independence from potentially confounding demographic features (age, gender, socioeconomic status, family composition) and verbal competence in both clinical and community groups; and (5) to test whether CRFS scores discriminate between clinical and community groups.
The first two aims address an open question in the research literature as to whether RF is a multidimensional construct (Ensink et al., 2015). Within the adult mentalizing literature, scholars argue that RF is multidimensional, comprising both self- and other-focused components (Suchman et al., 2010), with each of these dimensions uniquely associated with relevant constructs. For instance, self-focused RF is associated with maternal contingent behavior in substance-dependent mothers of toddlers (Suchman et al., 2010), while child-focused RF is associated with lower overcontrol in a community sample of mothers of school-aged children (Borelli et al., 2017). Although the scoring procedure assumes the CRFS to be two-dimensional, having both self- and other-focused components, this assumption has never been tested empirically in both clinical and community groups. Thus, we use confirmatory factor analysis (CFA) to confirm the two-factor solution of the CRFS and multi-group CFA to test group invariance. Further, inter-rater reliability has been demonstrated for the general RF score (Ensink et al., 2015) but not for the two dimensions of the CRFS (self- and other-focused RF), and thus is another important area in need of investigation. In addition, we contribute to the validity studies reported above by testing the associations between RF scores on the CRFS and demographics and verbal competence within both clinical and community samples. On the basis of the extant literature (Ensink et al., 2015; Rosso & Airaldi, 2016; Vanwoerden et al., 2019), we hypothesize that the CRFS assesses mentalizing independently of socio-demographic variables and verbal comprehension.
Last, based on the argument that impairments in RF confer risk for a variety of specific psychological disorders (Bizzi et al., 2020b; Costa-Cordella et al., 2021; Ensink et al., 2015, 2016, 2017; Sharp et al., 2020; Vanwoerden et al., 2019), we hypothesize that CRFS scores will be significantly lower in the clinical than in the community group, and thus will discriminate between the two.

Participants
The overall sample consisted of 159 Italian children from North-West Italy, uniformly distributed with respect to age and gender (age range 8-12 years, Mage = 10.66, SD = 1.83; 57% males); their families were approached without any material incentive. The overall sample was divided into two groups: 71 children with parent-reported emotional-behavioral problems ('clinical group') and 88 children without parent-reported emotional-behavioral problems ('community group'). The community group was recruited from urban and rural schools in Genoa (Italy), while the children in the clinical group were consecutively admitted outpatients from the Child Neuropsychiatry and Psychology Units of the Gaslini Children's Hospital (Genoa, Italy), where the study assessment supplemented the diagnostic work-up conducted by the hospital's psychologists.
The present study adopted the following inclusion criteria. For both groups: (a) children were between 8 and 12 years of age (middle childhood), and (b) used Italian as their primary language. Within the clinical group, (c) children met or exceeded the clinical cut-off for emotional-behavioral problems (total problems score t ≥ 65) on the Child Behavior Checklist 6-18 Version (CBCL; Achenbach & Rescorla, 2001), while children in the community group had scores below this clinical cut-off (total problems score t < 65). Accordingly, children in the community group did not show emotional-behavioral problems in their total problems scores (M = 49.94, SD = 8.55), whereas the clinical group's scores were higher (CBCL total problems score: M = 68.51, SD = 3.15).
Gender, age, SES (high > 36.000 €/y, moderate from 15.000 to 36.000 €/y, and low < 15.000 €/y), family composition (two-parent families, single-parent families, families with step-parents), and verbal competence of the overall sample are shown in Table 1. The groups did not differ in gender or age, but differed significantly in SES, family composition and verbal competence. As a result, we included SES, family composition and verbal competence as covariates in subsequent analyses.

Measures
Notes to Table 1. Post-hoc test: significant difference (p < 0.05) between the two groups in the following categories: yearly household income lower than 15.000 €; yearly household income greater than 36.000 €. c: 31.8% yearly household income greater than 36.000 € and 10.2% yearly household income lower than 15.000 €. d: 8.5% live in stepfamilies and 22.5% live in single-parent families. e: post-hoc test: significant difference (p < 0.05) between the two groups in the following categories: single-parent families; two-parent families. f: 7.95% live in stepfamilies and 2.3% live in single-parent families.
The Child Attachment Interview (CAI; Shmueli-Goetz et al., 2008) is a semi-structured interview used to assess the child's attachment representations of both parents. The current CAI protocol contains 19 questions (CAI revised edition VIII; Shmueli-Goetz et al., 2008), and the CAI manual recommends that coders use both video-recordings and verbatim transcripts in assigning ratings. The interview begins with a brief introduction in which the examiner explains to the child what he or she will be asked: "This is an interview about you and your family. I will ask you some questions first about you and then about your relationship with your parents. For each question I will ask you to give me some examples. This interview is not a test and there are no right or wrong answers. I just want you to tell me some things about you and your family from your point of view. The interview will last about half an hour, maybe a little longer". The questions in the CAI tap the child's self-representation, representations of his/her primary caregivers, and times of conflict, distress, illness, hurt, separation, and loss. Coders rate eleven 9-point scales and then place children into one of four best-fitting attachment classifications (secure, insecure-dismissing, insecure-preoccupied, disorganized) on the basis of the distribution of the scale scores as well as a consideration of the child's non-verbal behavior.
The CAI's psychometric properties have been demonstrated through a series of studies exploring its reliability and construct validity in both clinical and community samples across multiple cultures (e.g., Bizzi, 2019; Bizzi & Pace, 2020; Bizzi et al., 2018, 2021a, 2021b; Cavanna et al., 2018; Shmueli-Goetz et al., 2008; Venta et al., 2014). In the current study, the CAI's inter-rater reliability was excellent across all cases (ICC (3, k) = 0.946; N = 159).
The Child and Adolescent Reflective Functioning Scale (CRFS; Ensink et al., 2013) is a measure of RF for youth between the ages of 7 and 17 that was designed to be applied to CAI transcripts. The CRFS was adapted from the Adult Reflective Functioning Scale (Fonagy et al., 1998) for use with school-aged children, purporting to assess children's ability to mentalize regarding themselves and their attachment figures. The CRFS manual contains descriptions and examples of different levels and types of CRF. Children's narratives are coded on an 11-point scale (−1 to 9), descriptively anchored at six points, in terms of the degree to which children's responses reveal a propensity to consider interpersonal interactions and personal reactions in mental state terms. The levels of RF are as follows. −1: bizarre, disorganized response or avoided mentalization; 0: absence of mentalization; 1: self-description in terms of behavior, non-mental characteristics; 2: descriptions without explicit reference to mental states; 3: some vague, basic but unelaborated references to mental states; 4: recognition that the experience of negative affect may elicit responses from others, which in turn can help to soothe or regulate the affect in various ways; 5: clear description of mental states showing a solid understanding; 6: clear but intentionally communicates of mental states; 7: understanding that different people may perceive a given behavior or situation differently, often based on differing knowledge of the situation or false belief; 8: unusually nuanced understanding of reactions of self and other that also incorporates a sense of feelings and reactions changing over time; 9: sophisticated mentalization capacity (Ensink et al., 2015). Examples of the CAI questions analyzed for CRFS coding are reported in Table 2. To obtain a general indicator of children's RF (CRF-G), we calculated the mean RF of all the coded responses. To obtain an indicator of self-focused RF (CRF-S), the mean RF for the four items eliciting self-descriptions and the child's reactions in response to upsetting events was used. Furthermore, an indicator of other-focused RF (CRF-O) was calculated as the mean RF on the nine questions regarding the child's relationships with their parents and a description of the parents' reactions when they get upset or when they argue. The CRFS' psychometric properties have been demonstrated by Ensink's studies (Ensink, 2004; Ensink et al., 2015). In this study, the scale alpha was 0.94, and item-total correlations ranged from 0.57 to 0.79, confirming that the CRFS could be used as a good indicator of RF.
The Wechsler Intelligence Scale for Children-IV (WISC-IV; Wechsler, 2003) is a well-established and validated measure of the intellectual abilities of children aged 6-16. In this study, we employed the Verbal Comprehension Index (VCI) to measure verbal competence; the VCI is the sum of weighted points in the similarities, vocabulary and comprehension subtests. In the similarities subtest, the child is required to explain in what ways two words that refer to common objects or concepts are similar (e.g. "In what ways are apples and bananas alike?"); in the vocabulary subtest, the child verbally defines a set of words (e.g. "What is a hat?"); and in the comprehension subtest, the child responds verbally to questions about solutions to everyday problems, expressing an understanding of rules and concepts (e.g. "What are the reasons for turning off the lights when no one uses them?"). In the Italian version (Orsini et al., 2012), Cronbach's α is 0.96 for the VCI, with subtest values ranging from 0.69 (comprehension) to 0.94 (vocabulary). In this study, Cronbach's α was 0.80 for the VCI.
The Child Behavior Checklist (CBCL 6-18; Achenbach & Rescorla, 2001) is a parent-report questionnaire used to assess emotional and behavioral problems in children aged 6-18 years. It comprises 112 items, each scored on a 3-point scale ranging from 0 to 2. The CBCL 6-18 provides a score for Total Problems and three main index scores: Internalizing problems (including anxious/depressed, withdrawn/depressed and somatic complaints), Externalizing problems (including rule-breaking behavior and aggressive behavior), and Other problems (including social problems, thought problems, and attention problems). The CBCL has good psychometric properties (Achenbach & Rescorla, 2001), and the Italian version was validated by Frigerio et al. (2004), displaying good validity and internal consistency, cut-offs comparable to those of the American population, and inter-rater agreement similar to the values reported by Achenbach. In our study, Cronbach's α was 0.92 for the Total Problems score.
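The CRFS scoring rule described in this section (CRF-G as the mean of all coded responses, CRF-S as the mean of the four self-focused items, CRF-O as the mean of the nine other-focused items) can be sketched as follows. This is a minimal illustration rather than the authors' scoring procedure: the item labels and the ratings are hypothetical, since the text does not list the individual item names.

```python
from statistics import mean

# Hypothetical item labels: the text specifies four self-focused and nine
# other-focused items but does not name them.
SELF_ITEMS = [f"self_{i}" for i in range(1, 5)]
OTHER_ITEMS = [f"other_{i}" for i in range(1, 10)]


def score_crfs(ratings):
    """Return CRF-G, CRF-S and CRF-O for one child.

    `ratings` maps an item label to its RF rating on the -1..9 scale.
    """
    for item, value in ratings.items():
        if not -1 <= value <= 9:
            raise ValueError(f"rating out of range for {item}: {value}")
    return {
        "CRF-G": mean(ratings.values()),                 # all coded responses
        "CRF-S": mean(ratings[i] for i in SELF_ITEMS),   # four self items
        "CRF-O": mean(ratings[i] for i in OTHER_ITEMS),  # nine other items
    }


# Made-up ratings, for illustration only.
example = {**{item: 4 for item in SELF_ITEMS}, **{item: 3 for item in OTHER_ITEMS}}
scores = score_crfs(example)
```

Because CRF-G averages over all thirteen responses, it is weighted toward the other-focused items, which outnumber the self-focused ones.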

Procedure
The study was approved by the Ethics Committee of the Gaslini Children's Hospital (Genoa, Italy) for the clinical group and by the Ethics Committee of the Department of Educational Sciences (University of Genoa, Italy) for the community group, in accordance with the Declaration of Helsinki. Data were collected in the last two years. Parents were provided with a written document describing the procedures and purpose of the study; all parents and their children provided informed consent and assent, respectively, and were informed that they could decline to participate in any part of the study. Only two participants from the community group elected not to participate because they were too busy at the time of data collection. Therefore, the final sample (N = 159) represented 98% of those eligible. The assessment (individual sessions lasting around 60 min) was conducted in a private room at the hospital for the clinical group and at participants' homes for the community group, by a psychologist/researcher who had previously been trained in the administration of the CAI (directly by one of the authors for two consecutive days) and the WISC-IV. In a separate room, parents were asked to complete a sociodemographic questionnaire and the CBCL (for the clinical group, the CBCL had previously been administered by the hospital psychologist).
In this study, CRFS coding was carried out by the first two authors, who were supervised by and/or received consultation from the developer of the CRFS (the last author). These coders were naïve to all participant characteristics and did not administer the interviews.
The analyses were carried out with Statistical Package for the Social Sciences (SPSS) and JASP, an open-source statistics program that allows both classical and Bayesian analyses including CFA.

Statistical Data Analyses
An a priori statistical power analysis was conducted using G*Power software (Faul et al., 2009). For a difference between two independent means (two groups), with effect size = 0.5, significance level (α) = 0.05, and power (1-β) = 0.90, 70 observations per group were needed to test the psychometric properties of the CRFS. Our aims were then assessed in five steps. First, we examined the hypothesized two-factor structure of the CRFS with confirmatory factor analysis (CFA). Second, we used multi-group CFA to investigate the factorial invariance of the CRFS in clinical and community groups, testing for configural, metric, scalar and strict measurement invariance. We evaluated model fit using cutoff values generally taken to indicate a good fit (Hu & Bentler, 1999): chi-square/df ratio (≤ 2 acceptable), comparative fit index (CFI over 0.90 acceptable), Tucker-Lewis index (TLI over 0.90 acceptable), root mean square error of approximation (RMSEA between 0.05 and 0.08 acceptable) and standardized root mean squared residual (SRMR < 0.08 acceptable). Competing factor solutions were compared using the chi-square difference test and a drop in CFI greater than 0.005 (Chen, 2007). Third, the reliability of the CRFS was estimated with Cronbach's alpha coefficient and the mean inter-item correlation. Fourth, analyses of variance, t-tests and Pearson correlations were used to test the associations between the CRFS, demographic features and verbal competence. Fifth, analyses of covariance and logistic regression analyses were used to evaluate the ability of the CRFS to discriminate between clinical and community groups.
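The sample-size figure from the power analysis can be approximated by hand. The sketch below uses the standard normal-approximation formula for comparing two independent means, n = 2(z_α + z_β)² / d², rather than the exact noncentral-t computation that G*Power performs, so it can come out slightly below the G*Power result. The `one_sided` flag is an assumption made for illustration; the text does not state whether the test was one- or two-tailed.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.90, one_sided=True):
    """Approximate n per group for comparing two independent means.

    Normal approximation: n = 2 * (z_alpha + z_beta)^2 / d^2.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_sided else z(1 - alpha / 2)
    z_beta = z(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


n_one_sided = n_per_group(0.5)                   # 69 under this approximation
n_two_sided = n_per_group(0.5, one_sided=False)  # 85 under this approximation
```

Under these assumptions, the one-sided approximation lands close to the 70 observations per group reported, whereas a two-sided test would call for a larger sample.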

Confirming the Internal Structure and Measurement Invariance of CRFS
Initially, we explored the items and score characteristics of the CRFS (see Table 3), demonstrating their adequate distributional characteristics.
To assess Aim One, a confirmatory factor analysis (CFA) was conducted on the clinical and community groups combined to test the two-factor solution (self- and other-focused RF) initially proposed in the realm of parental RF by Suchman et al. (2010) and introduced in the realm of children's mentalizing by Ensink et al. (2015). There were no missing values. Table 4 presents the different models. First, we tested the proposed two-factor model (Model 1), which yielded a satisfactory factor solution. However, fit improved significantly when error correlations were allowed between questions with similar wording/content (2b-2c; 6-7), whose modification index values were higher than 10 (Byrne, 2016), resulting in Model 1a. All the standardized coefficients for the two-factor model were statistically significant (p < 0.001) (see Table 6). Second, we tested a one-factor model (Model 2), which showed a worse fit than the two-factor model (χ²(3) = 40.2, p < 0.001, ΔCFI = 0.04).
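The fit criteria listed in the analysis plan can be expressed as a small checklist. The sketch below is illustrative only: the function name and the input values are hypothetical and are not taken from Table 4, and the RMSEA rule is simplified to an upper bound of 0.08.

```python
def acceptable_fit(chi2, df, cfi, tli, rmsea, srmr):
    """Return one pass/fail flag per fit index, using the stated cutoffs."""
    return {
        "chi2/df": chi2 / df <= 2,   # chi-square/df ratio <= 2
        "CFI": cfi > 0.90,
        "TLI": tli > 0.90,
        "RMSEA": rmsea <= 0.08,      # upper bound of the 0.05-0.08 band
        "SRMR": srmr < 0.08,
    }


# Hypothetical fit values, not the study's results.
flags = acceptable_fit(chi2=110.0, df=62, cfi=0.95, tli=0.94, rmsea=0.07, srmr=0.09)
failed = [name for name, ok in flags.items() if not ok]
```

Reporting the flags index by index mirrors how the text describes models that fit well on most indices but not on one of them.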
To assess Aim Two, the measurement invariance of Model 1 between the clinical (N = 71) and community (N = 88) groups was tested with a multi-group CFA using increasingly restrictive models (see Table 5). First, we tested configural invariance (Model 3); here, a good model fit indicates that the two-factor model is associated with the same items in both groups. Second, we tested metric invariance (Model 4); here we obtained a good model fit, except for the SRMR value, but a worse fit than Model 3 (χ²(13) = 23.35, p < 0.05, ΔCFI = 0.01) because factor loadings were not equal across groups. Moreover, all items had substantial and statistically significant loadings on their respective factors in both groups. Third, we tested scalar invariance (Model 5), obtaining a poor model fit that was significantly worse than that of Model 4 (χ²(11) = 29.24, p < 0.001, ΔCFI = 0.02) because of the different group means for the clinical and community groups. In total, our findings confirm the two-factor solution of the CRFS proposed by Ensink et al. (2015) and indicate factorial invariance across clinical and community groups.
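The nested-model comparisons reported above combine a chi-square test with the ΔCFI criterion. The sketch below shows one way to apply that decision rule using only the standard library; it treats the reported chi-square values as difference statistics, which is an assumption based on the analysis plan, and the function names are illustrative. The chi-square tail probability is computed with the recurrence for the regularized upper incomplete gamma function, which is exact for integer degrees of freedom.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df >= 1")
    y = x / 2.0
    # Closed-form starting points: df = 2 (a = 1) or df = 1 (a = 0.5).
    if df % 2 == 0:
        a, q = 1.0, exp(-y)
    else:
        a, q = 0.5, erfc(sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y^a * e^(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * exp(-y) / gamma(a + 1.0)
        a += 1.0
    return q


def more_restricted_is_worse(delta_chi2, delta_df, delta_cfi,
                             alpha=0.05, cfi_cut=0.005):
    """Return (p, flag): flag is True if both criteria indicate worse fit."""
    p = chi2_sf(delta_chi2, delta_df)
    return p, (p < alpha and delta_cfi > cfi_cut)


# (chi-square, df, drop in CFI) as reported in the text.
comparisons = {
    "one-factor vs two-factor": (40.2, 3, 0.04),
    "metric vs configural": (23.35, 13, 0.01),
    "scalar vs metric": (29.24, 11, 0.02),
}
results = {name: more_restricted_is_worse(*vals) for name, vals in comparisons.items()}
```

With the reported values, all three comparisons are flagged, which matches the description of each more restricted model as fitting worse than its predecessor.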
Internal consistency reliability in the total sample for the subscales of the two-factor solution was acceptable for both self-focused RF (α = 0.75, mean inter-item correlation = 0.44) and other-focused RF (α = 0.90, mean inter-item correlation = 0.51). Similar results were obtained in the clinical group (self-focused RF: α = 0.67, mean inter-item correlation = 0.34; other-focused RF: α = 0.89, mean inter-item correlation = 0.48) and in the community group (self-focused RF: α = 0.66, mean inter-item correlation = 0.34; other-focused RF: α = 0.85, mean inter-item correlation = 0.40). Internal consistency could not be improved by the deletion of any item. Thus, Aims One and Two were supported.
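The relation between the two reliability statistics reported above can be made explicit. For k items with mean inter-item correlation r, the standardized alpha is k·r / (1 + (k − 1)·r). The sketch below applies this formula to the reported mean inter-item correlations, assuming four self-focused and nine other-focused items as described in the Measures section. Standardized alpha is not identical to the raw Cronbach's alpha that the authors presumably report, so small discrepancies are expected.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from k items and mean inter-item r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# (number of items, mean inter-item correlation, reported alpha).
# Item counts are taken from the scoring description (4 self, 9 other).
reported = {
    "total, self-focused": (4, 0.44, 0.75),
    "total, other-focused": (9, 0.51, 0.90),
    "clinical, self-focused": (4, 0.34, 0.67),
    "clinical, other-focused": (9, 0.48, 0.89),
    "community, self-focused": (4, 0.34, 0.66),
    "community, other-focused": (9, 0.40, 0.85),
}

# Difference between the formula-based value and the reported alpha.
gaps = {
    name: standardized_alpha(k, r) - alpha
    for name, (k, r, alpha) in reported.items()
}
```

The formula also shows why the other-focused scale reaches a higher alpha than the self-focused scale at comparable inter-item correlations: it has more than twice as many items.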

Evaluating Inter-Rater Reliability of CRFS
To assess Aim Three, we tested the inter-rater reliability of the CRFS scores (self-focused RF and other-focused RF) and of their composite indicator (global-RF) using two independent coders (first and second author, supervised by the last author) who were blind to the clinical status of participants. In line with Ensink et al. (2015), inter-rater reliability (self-focused RF: ICC (3, k) = 0.98; other-focused RF: ICC (3, k) = 0.98; global-RF: ICC (3, k) = 0.98) was calculated on 30% of the interviews. These values indicate "excellent" inter-rater reliability for all RF dimensions according to Cicchetti (1994). In sum, Aim Three was supported, demonstrating adequate inter-rater reliability.
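For readers unfamiliar with the ICC (3, k) statistic reported above, the sketch below computes it from a ratings matrix using the two-way ANOVA mean squares, ICC(3,k) = (MS_rows − MS_error) / MS_rows, and attaches the qualitative label from Cicchetti's (1994) guidelines. The ratings are made up for illustration and are not the study's data.

```python
def icc_3k(ratings):
    """ICC(3,k): two-way mixed effects, consistency, average of k raters.

    `ratings` is a list of rows, one row per interview and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows


def cicchetti_label(icc):
    """Qualitative label following Cicchetti's (1994) guidelines."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical ratings from two coders on six interviews.
two_coders = [[3, 3], [4, 5], [2, 2], [6, 6], [5, 4], [1, 1]]
icc = icc_3k(two_coders)
label = cicchetti_label(icc)
```

Because ICC (3, k) is a consistency coefficient, a constant offset between the two coders does not lower it; only disagreement in the ordering or spacing of scores does.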

Association of CRFS with Demographic Features and Verbal Competence
To address Aim Four, we tested the relations of RF with demographic features and verbal competence through Pearson's r correlations, t tests for independent samples, and analyses of variance. First, we tested the association between RF and age in the total sample. Here we found a positive association with all RF scales (self-focused RF: r = 0.20, p < 0.05; other-focused RF: r = 0.25, p < 0.01; global-RF: r = 0.25, p < 0.01). Specifically, self-focused RF (r = 0.34, p < 0.01), other-focused RF (r = 0.34, p < 0.01) and global-RF (r = 0.37, p < 0.01) were positively associated with age in the community group. Only other-focused RF (r = 0.24, p < 0.05) was positively associated with age in the clinical group, whilst no associations were found with self-focused RF (r = 0.10, p > 0.05) and global-RF (r = 0.22, p > 0.05).
Table note: Model 1 = the baseline two-factor model; Model 1a = the final two-factor model adding error covariances between items 2b-2c and 6-7; Model 2 = the one-factor model; Model 3 = configural measurement invariance; Model 4 = metric measurement invariance; Model 5 = scalar measurement invariance.
Second, we tested the association between RF and verbal competence in the total sample, finding a positive association with all RF scales (self-focused RF: r = 0.35, p < 0.001; other-focused RF: r = 0.41, p < 0.001; global-RF: r = 0.41, p < 0.001). Specifically, self-focused RF (r = 0.27, p < 0.05), other-focused RF (r = 0.38, p < 0.01) and global-RF (r = 0.37, p < 0.01) were positively associated with verbal competence in the clinical group. In contrast, there was no association between RF and verbal competence in the community group (self-focused RF: r = 0.11, p > 0.05; other-focused RF: r = 0.15, p > 0.05; global-RF: r = 0.16, p > 0.05).
Third, we tested the association between RF and gender in the total sample, finding a significant association with all RF scales. Specifically, females had higher scores than males on all RF scales in the community group, and on other-focused and global-RF in the clinical group (see Table 6). Fourth, we tested the association between RF and SES in the total sample, finding a significant association with all RF scales, such that a yearly household income greater than 36.000 € was associated with higher RF scores. Interestingly, no association was found between RF and SES within either the clinical or the community group considered alone (see Table 7).
Next, we conducted logistic regression analyses to assess whether CRFS capacity (i.e., self- and other-focused RF) discriminated between the two groups. Since analyses showed that SES, family composition, and verbal competence differed between the two groups, these variables were entered into the regression models as control variables. In the subsequent analyses, for each additional unit of self-focused RF, the odds of being categorized as a member of the clinical group decreased by 72% (B (SE) = −1.28 (0.38), OR = 0.28, 95% CI = 0.13, 0.56, p < 0.001). Other-focused RF, however, did not predict clinical/community group membership (B (SE) = −0.12 (0.30), OR = 0.89, 95% CI = 0.48, 1.61, p = 0.70).
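The link between the reported coefficients, odds ratios, and the "72%" figure can be made explicit with a short worked example. The sketch below assumes only the standard logistic regression identity OR = exp(B); the helper names `odds_ratio` and `percent_change_in_odds` are hypothetical, and the B values are those reported above.

```python
import math


def odds_ratio(b):
    """Odds ratio implied by a logistic regression coefficient B."""
    return math.exp(b)


def percent_change_in_odds(b):
    """Percent change in the odds per one-unit increase in the predictor."""
    return (math.exp(b) - 1.0) * 100.0


# Coefficients as reported in the text (outcome: clinical group membership).
for label, b in [("self-focused RF", -1.28), ("other-focused RF", -0.12)]:
    print(f"{label}: OR = {odds_ratio(b):.2f}, "
          f"change in odds = {percent_change_in_odds(b):.0f}%")
# self-focused RF: OR = 0.28, change in odds = -72%
# other-focused RF: OR = 0.89, change in odds = -11%
```

The 11% reduction for other-focused RF is the point estimate only; as reported above, its confidence interval includes 1 and the effect is not statistically significant.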

Discussion
Over the past two decades, RF has emerged as an important transdiagnostic marker of psychological health, with significant ties to attachment security, emotion regulation, and clinical status (Katznelson, 2014). RF studies have mainly focused on parental and adult RF, whereas children's RF is often neglected in the literature. Consequently, valid and reliable measures of RF for school-aged children have been slow to emerge, slowing progress in the field. The CRFS was developed to fill this need, and researchers rushed to use the measure with children and adolescents with specific problems (Bizzi et al., 2020b; Costa-Cordella et al., 2021; Ensink et al., 2015, 2016, 2017; Sharp et al., 2020; Vanwoerden et al., 2019). However, a particular focus on the psychometric properties of the CRFS was lacking, particularly when it comes to clinical samples. Our study aims to fill this gap by testing several psychometric properties of the CRFS in children with and without emotional-behavioral problems. Consistent with the seminal study by Ensink et al. (2015), our findings indicate adequate internal consistency of the CRFS as well as high inter-rater reliability. Additionally, these findings confirm the two-factor solution of the CRFS (self- and other-focused RF) suggested, but not verified, by Ensink's preliminary study.
Table note: Family composition: single-parent families, two-parent families. SES: 1 = yearly household income lower than 15.000 €; 2 = yearly household income between 15.000 and 36.000 €; 3 = yearly household income greater than 36.000 €. *p < 0.05. **p < 0.01. ***p < 0.001. a Post-hoc test: significant differences (p < 0.05) between groups in the following categories: yearly household income lower than 15.000 € vs. yearly household income greater than 36.000 €; yearly household income between 15.000 and 36.000 € = yearly household income greater than 36.000 €.
Moreover, this is the first study to add new evidence regarding the factorial invariance of the CRFS, i.e., the factor structure (configural invariance) of the measure is similar for clinical and community children, suggesting that the intrapersonal and interpersonal components of children's RF comprise internally consistent, reliable dimensions of mentalizing in both clinical and community children during middle childhood. This work constitutes an important contribution to the literature, adding to the body of work suggesting that RF can be conceptualized in terms of different components. Such a conceptualization, initially offered within the parental RF literature by Suchman et al. (2010), has advanced the field considerably, helping refine predictions.
Further, the analyses on demographic features and verbal competence found that CRFS scores were associated with age and gender in both groups, and with verbal competence in the clinical group, while they were independent of SES and family composition. More specifically, all CRFS scales were associated with age within the community group, while only other-focused RF was associated with age in the clinical group. Associations with children's age support the idea that RF competencies are developmental acquisitions related to the increase in the child's meta-cognitive skills, as suggested by prior work (Vanwoerden et al., 2019). Indeed, as children reach middle childhood, their social network grows and includes more interpersonal experiences with individuals outside their family environment, which may further enrich social-cognitive abilities and consequently amplify their ability to reflect on their own minds and others' minds in a relational context (Borelli et al., 2016). Similarly, our study revealed gender differences; specifically, girls within the community group had higher RF than boys on all scales. However, in the clinical group, girls exhibited higher RF than boys only on other-focused and global-RF. This finding is consistent with the idea that girls in this age range are better able than boys to understand others' thoughts and emotions (Bosacki & Astington, 1999). In addition, this mentalizing profile fits with the idea that females exhibit better social and cognitive functioning, have better insight into their own mental states, and are generally more empathic than males (Baron-Cohen, 2003). Further, although mental state talk becomes a specific competence that, across development, departs from general language skills and contextual characteristics (Hughes et al., 2006), we found an association between verbal competence and CRFS scores within the clinical group.
This supports the idea that language plays a crucial role in the development of mentalizing because language is typically the tool used to represent and communicate mental states (Vanwoerden et al., 2019). However, future research should aim to replicate these findings and collect verbal competence measures alongside the CRFS to explore the potential impact that verbal competence may have on children's mentalizing in different populations.
Controlling for confounding demographic variables, our findings provide evidence that the CRFS reliably discriminates between clinical and community children. The differences in RF scores between the two groups may suggest that the development of interpersonal problems, social information processing deficits, isolation, or aggression problems is due to difficulties in perceiving both the self and others in terms of mental states (Midgley et al., 2017). In other words, this pattern suggests that poor RF may lead to emotional-behavioral problems, and that mentalizing deficits may constitute a transdiagnostic vulnerability factor for child psychopathology. This insight could be of great use to clinicians in their case conceptualizations, as mentalizing may be an important treatment target. However, further investigations in this area are needed.
In addition, these findings underscore the importance of conceptualizing RF as a multidimensional construct (Allen et al., 2008; Borelli et al., 2017; Fonagy et al., 2002; Suchman et al., 2010). Furthermore, the predictive power of only self-focused RF suggests that in middle childhood, RF regarding others' mental states may not be mature enough to play a central role in child psychopathology. In other words, this suggests that its role in emotional development becomes more apparent during adolescence, when social and peer relationships are more complex, and not earlier (Blakemore, 2017). Importantly, these findings are consistent with other work: in another study of RF in middle childhood (measured using an interview following a stressor), only self-focused RF was related to maternal overcontrol (Borelli et al., 2017).
Considering the paucity of mentalizing measures for school-aged children, our findings are of paramount importance in that they permit the field to expand on previous studies focused on the role of mentalizing in the development of borderline personality pathology during adolescence (Sharp et al., 2020; Vanwoerden et al., 2019) as well as in specific clinical conditions (Bizzi et al., 2020b; Costa-Cordella et al., 2021) or abuse contexts (Ensink et al., 2015) in middle childhood. The capacity of the CRFS to distinguish clinical from community children in middle childhood gives clinicians and researchers a measure that is useful in the assessment process, in the design of interventions for populations at risk, and in the evaluation of child psychotherapy. Given the growing interest in the role of RF in the process of psychological treatments for children and their families (Midgley et al., 2017), the development of psychometrically sound measures like the CRFS will help to advance understanding of mentalizing processes as potential risk and protective factors included in increasingly complex etiopathogenetic models. In addition, the use of the CRFS requires relatively less investment in training, transcribing, and coding the interview than is needed to rate attachment. It is likely to be a worthwhile investment for clinicians and researchers given the lack of alternative measures. Furthermore, many child researchers have already been trained to use the CAI to assess attachment in school-aged children and could use the CRFS to examine the specific contribution of mentalization capacities (Ensink et al., 2015).
Despite the notable strengths of this study, there are several limitations that should be considered before drawing conclusions from these findings. The sample size is relatively small, although adequate for testing the hypotheses of this study. Assignment to the clinical or normative group was based only on CBCL scores; therefore, future research should include a multi-method assessment with teacher reports and clinical interviews. The focus on a heterogeneous clinical group is another limitation of this study; studies including specific clinical samples (e.g., a sample of children with major depressive disorder only) will be an important complement to the current study in order to elaborate upon the relationships between mentalizing capacities and specific psychological difficulties. A longitudinal perspective (test-retest analysis) is lacking; research should investigate the stability of this measure to strengthen the evidence for the psychometric properties of this tool. Nonetheless, this study adds to the evidence that the CRFS is a valid method to assess mentalizing during middle childhood in clinical and community samples.
Author Contributions F.B. contributed to the study conception and design. Material preparation, data collection and analysis were performed by F.B. and S.C. The first draft of the manuscript was written by F.B. and S.C., and J.L.B. commented on previous versions of the manuscript. All authors read and approved the final manuscript. The English was revised by the native English-speaking author (J.L.B.).
Funding Open access funding provided by Università degli Studi di Genova within the CRUI-CARE Agreement.

Compliance with Ethical Standards
Conflict of Interest The authors declare no competing interests.
Ethical Approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee of the Gaslini Children's Hospital (Genoa, Italy) for the clinical group and the Ethics Committee of the Department of Educational Sciences (University of Genoa, Italy) for the community group, and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed Consent Informed consent was obtained from all individual participants included in the study. Parents completed consent forms, and children completed assent forms. The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.