Background

Callous-unemotional (CU) traits are defined as affective-social deficits that characterize an extreme form of aggressive-dissocial behavior, marked by a pronounced inclination towards violence [1]. CU traits are predominantly examined in the context of psychopathy and are part of the concept of psychopathy in adulthood, but can occur as early as preschool age [2, 3]. Children and adolescents with CU traits are characterized by a deficit in empathy and guilt, a tendency to manipulate others for personal benefit, and the expression of superficial emotions [4]. Specifically, CU traits identify a subgroup of children and adolescents characterized by persistent and severe conduct problems [4, 5]. These traits capture the affective dimension of psychopathy and function alongside other dimensions of psychopathic traits [1, 6]. Research has highlighted the significance of CU traits in relation to developmental outcomes such as conduct problems [7, 8], impulsivity [9], aggressive behavior, and delinquency [10, 11].

The Inventory of Callous-Unemotional Traits (ICU)

To assess callous-unemotional traits, the Inventory of Callous-Unemotional Traits (ICU) [12] is commonly used in research studies [e.g., 13,14,15,16]. The questionnaire was designed to measure the callous and unemotional components of psychopathy. According to the ICU [12], callous-unemotional traits can be defined as a combination of three dimensions: (1) callousness, which involves a lack of concern or remorse for others; (2) uncaring, which refers to a lack of interest in others or one’s own performance; and (3) unemotionality, which involves a difficulty in expressing emotions [15]. Research conducted in various age groups and countries has also revealed alternative factor structures [e.g., 13, 17, 18], but the three-factor model has been consistently confirmed for different age groups (preschool: Ezpeleta et al. [2]; elementary school: Waller et al. [19]; adolescents: Pihet et al. [20], Roose et al. [21]), and methodologically operationalized to provide the best model fit in various samples [15, 22]. The meta-analysis by Cardinale and Marsh [22] shows satisfactory pooled Cronbach’s alpha values for the three subscales (callousness: \(\bar{\alpha} = 0.75\), uncaring: \(\bar{\alpha} = 0.80\), unemotional: \(\bar{\alpha} = 0.71\)) and the total ICU (\(\bar{\alpha} = 0.83\)). Additionally, their analysis points to the high validity of the ICU [22]. The three-factor model emerges as the model with the best fit for self-reported CU traits [15, 22] as well as parent- or caregiver-reported CU traits [2, 19].

The ICU is the only known measure with a three-factor structure that focuses solely on CU traits in children and adolescents. The ICU comes in five versions: ICU-Parent, ICU-Teacher, ICU-Youth, ICU-Parent Preschool and ICU-Teacher Preschool. Despite these variations, each version consists of the same 24 items (some items are worded slightly differently across versions while maintaining the same meaning) and is divided into the three subscales measuring callousness, uncaring, and unemotional traits [12].

Callous-unemotional traits in different age groups

Studies indicate that callous-unemotional traits are heritable, with estimates ranging from 36 to 67% [23]. Candidate gene studies link CU traits to the serotonin and oxytocin systems, and epigenetic changes in these genes are also associated with CU traits (see Moore et al. [23] for review). CU traits develop in early childhood (see Frick et al. [14] for review) and are moderately stable [4, 24]. Beyond biological factors, studies have identified environmental factors such as parenting and attachment as crucial contributors to the development of callous-unemotional traits [25,26,27]. These environmental factors may be linked with the rise and stability of callous-unemotional traits [25].

During preschool age, children are still developing social-emotional competencies and learning how to regulate their emotions. Kimonis et al. [28] showed that children under six years of age who score high on the ICU show poor recognition of facial expressions, are less attentive towards distress cues, and are more likely to be antisocial, aggressive, and high on other psychopathy dimensions. However, preschoolers may also show some behaviors that resemble callous-unemotional traits, like a lack of remorse or guilt. These behaviors are typically not indicative of a stable personality trait and may be related to age-appropriate cognitive and emotional development. It is important to recognize that personality can evolve over time as a function of development [18]. Nevertheless, Longman et al. [8] identified a large effect size for the relationship between conduct problem severity and callous-unemotional traits in early childhood.

In middle childhood, by elementary school age, children are better able to regulate their emotions and display more stable antisocial behavior. Even though antisocial behavior is less common in middle childhood than in adolescence [29, 30], some children may still exhibit callous-unemotional traits. These children may be at greater risk of developing conduct disorder [14].

During adolescence, the likelihood of engaging in risky and delinquent behavior increases [4, 31, 32]. Adolescents with callous-unemotional traits may display an increased tolerance for risk and may be less responsive to punishment [33]. Pihet et al. [20] note that ICU validation studies yielded different results regarding age differences between early and late adolescents: in the study by Essau et al. [15], 15- to 16-year-olds showed higher ICU scores compared to 13- to 14-year-olds and 17- to 18-year-olds. Ciucci et al. [13] identified higher ICU scores for eighth graders compared to sixth graders (with an overall age range from 10 to 16 years), while White et al. [34] did not find age differences for detained male adolescents, and Pihet et al. [20] did not find age differences in a community and institutionalized (youth welfare or juvenile justice institutions) sample of adolescents.

Current study

Our study aims to investigate the measurement invariance of the Inventory of Callous-Unemotional Traits (ICU) from preschool to late adolescence. By assessing measurement invariance, we aim to determine if the ICU can reliably measure CU traits across these age groups. Group validity analysis will assess whether the ICU consistently measures as intended across different age groups, ensuring fair assessments [35]. We specifically examine the ICU’s three-factor structure, as previous research and the author [12] suggest its suitability. Analyzing this structure across different age groups will help determine its validity and reliability throughout childhood and adolescence. We anticipate that the structure can be replicated in a German sample.

Method

Participants and procedure

The data stem from larger projects led by the authors with the central purpose of systematically analyzing emotional, social, and behavioral development in children and adolescents. The ICU was assessed as part of a larger battery of tests. We report only those instruments and data relevant to the current research questions. The data presented are quantitative cross-sectional data.

Informed consent was obtained from the relevant daycare center management, the school board, and the principals of the participating schools. To recruit the sample, northern German schools and daycare centers were contacted or called and informed about the study. If the schools and daycare centers were interested in participating, information flyers and consent forms were distributed to the participating children and adolescents, guardians, and preschool teachers. In addition, a positive vote from the relevant Institutional Review Board was available. The data were collected between 2016 and 2021. All participants and their guardians were informed about the study, the voluntary nature of participation, and the confidentiality of their data and gave their active written consent to participate. Participants (children and teachers) were told that they could withdraw from the study at any time or skip any questions that they did not want to answer. No incentives were offered for participation.

A total of N = 2368 children and adolescents (51.5% male) with an average age of M = 11.76 years (SD = 3.72, Min = 5, Max = 19) took part in the study. Table 1 shows the demographic variables by age group.

Table 1 Demographic variables

Instruments

To measure callous-unemotional traits in childhood and adolescence, we utilized the German version of the Inventory of Callous-Unemotional Traits (ICU; [12]; German version by Essau et al. [15]). The ICU consists of 24 items assessing callousness (e.g., not caring to hurt someone; 11 items), uncaring (e.g., trying to do the best (reverse scoring); 8 items), and unemotionality (e.g., not showing emotions; 5 items). Previous research in both international and German samples has shown a three-factor structure (callousness, uncaring, unemotional) [e.g., 4, 15, 22, 36]. The items of the ICU were rated on a four-point scale ranging from (1) not at all true to (4) definitely true, with items requiring reverse scoring being recoded. Thus, higher scores on each dimension indicate higher levels of callousness, uncaring, and unemotionality. It is important to note that we assessed callous-unemotional traits through different sources of assessment, including preschool teacher-report and self-report measures. Therefore, slightly different versions of the ICU were used, and some items were formulated slightly differently while maintaining the same meaning. For preschool children in group one (N = 498), an external report from preschool teachers was chosen to assess CU traits, while for children and adolescents in the other three age groups (middle childhood, early and late adolescence), a self-report was used. Studies on callous-unemotional traits in middle childhood incorporate self-reports [e.g., 13] and other-reports [e.g., 30]. We assume that older children give more valid answers when they respond to the items themselves.
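The recoding step described above can be sketched as follows. This is a minimal illustration in Python, not the authors' procedure: the set of reverse-keyed item numbers is hypothetical (the text does not list them), and averaging the available items is an illustrative scoring choice only, since the analyses themselves handle missing values within the latent model.

```python
# Four-point response scale: (1) not at all true ... (4) definitely true.
SCALE_MIN, SCALE_MAX = 1, 4

# HYPOTHETICAL item numbers for illustration; the text does not say
# which ICU items are reverse-keyed.
REVERSE_KEYED = {2, 5}


def recode(item_number, response):
    """Return the scored value for one response, or None if it is missing."""
    if response is None:
        return None
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response {response} is outside the 1-4 scale")
    if item_number in REVERSE_KEYED:
        # On a 1-4 scale this maps 1->4, 2->3, 3->2, 4->1.
        return SCALE_MIN + SCALE_MAX - response
    return response


def subscale_mean(responses, items):
    """Mean of the available scored items of one subscale.

    responses maps item number -> raw response (or None when missing).
    Returns None when every item of the subscale is missing.
    """
    scored = [recode(i, responses.get(i)) for i in items]
    present = [s for s in scored if s is not None]
    return sum(present) / len(present) if present else None
```

After recoding, a higher value on every item points in the same direction, so a higher subscale score indicates a higher level of the trait.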

Data analytic procedure

Confirmatory factor analysis

In the statistical analyses, the factor structure of the ICU is first examined for each age group. The subsamples comprise N = 498 for preschool aged children, N = 631 for middle childhood, N = 646 for early adolescence, and N = 593 for late adolescence. Confirmatory factor analyses are therefore performed for each of the four groups individually (preschool age, middle childhood, early adolescence, and late adolescence). For the confirmatory testing, the model structure of the ICU is examined as specified by Frick [12]. A three-factor model with the correlated factors callousness, uncaring and unemotional is assumed. Due to its statistical weakness (low factor loading), item 10, “I do not let my feelings control me”, is excluded from the analyses, as in previous research [e.g., 2]. Error correlations are allowed for items with similar content statements (15*23, 4*17, 8r*17, 11*20, 3*23, 16*17, 1*19, 8*24, 12*6, 12*22, 6*19, 11*3, 8*21, 6*22, 21*17, 5*18, 8*5, 4*16, 8*16, 12*14, 2*9, 2*8, 2*23, 18*20, 2*24, 8r*1, 17*1, 21*16, 3*15, 13*14, 9*24, 23*24, 11*24, 4*8, 13*16, 19*22, 20*23, 20*15, 3*20, 9*6, 11*12, 9*15, 1*12, 9*5, 12*19, 17*24, 8*9, 3*16, 5*22, 9*21, 6*7, 8*19, 8*15, 7*24, 1*22). The Root Mean Square Error of Approximation (RMSEA), the Comparative Fit Index (CFI), and the Tucker-Lewis Index (TLI) are used to evaluate the model quality. RMSEA values < 0.08 and CFI and TLI values > 0.90 represent a good model fit [37]. In addition, χ² values are reported; however, the sensitivity of χ² to larger samples (N > 200) needs to be considered [38]. The Standardized Root Mean Squared Residual (SRMR) is not calculated due to missing values. Cases with completely missing values are excluded from the analyses in advance (N = 124). Individual missing values are estimated within the model estimation using Full Information Maximum Likelihood (FIML; [39, 40]). Individual missing data exist for less than 9% of the sample.
FIML provides reliable estimates even if the data is not normally distributed [41] or when estimating ordinal data [42]. To reduce the probability of alpha-errors, p-values are FDR-corrected (false discovery rate; [43]), which represents a liberal method in multiple testing in structural equation modeling [44]. The reliability was assessed using Cronbach’s alpha for the best fitting model for the different age groups.
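The FDR correction mentioned above can be illustrated with a short Python sketch. It assumes the Benjamini-Hochberg step-up procedure, the most widely used FDR method; the text does not name the specific variant that was applied, so this is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    ASSUMPTION: the FDR correction is the BH step-up procedure; the text
    only states that p-values were FDR-corrected.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted
```

A parameter is then treated as significant when its adjusted p-value falls below the chosen alpha level, for example 0.05.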

Measurement invariance

Confirmatory multi-group factor analyses (MGCFA) are used to test for measurement invariance (MI) in the ICU for all four age groups. The sample size of the four groups for measurement invariance testing is between N = 498 for preschool children and N = 646 for early adolescents, which exceeds the minimum sample size of N = 200 per group and is therefore sufficiently large [45]. According to Koh and Zumbo [45], different group sizes up to a ratio of 200:800 are not an obstacle for the test of measurement invariance. If the measurement model shows an acceptable fit in all groups, MI is tested at three different levels successively: configural, metric, and scalar [46]. The levels of MI are distinguished by varying model restrictions, which depend on whether factor loadings, intercepts, and residual variances are equivalent across groups [47]. To achieve configural measurement invariance (the first level of MI), only the factor structure must be equivalent in all groups, which means that the construct of CU traits assessed with the ICU has a similar structure across all four groups. The next level, metric measurement invariance, additionally requires identical factor loadings across groups, implying that each observed indicator (item) relates to the latent construct with similar strength. To achieve scalar invariance (the third level of MI), the factor structure must be similar across groups, and factor loadings and intercepts are constrained to be equal. Achieving scalar MI indicates that an identical observed score refers to a similar true score across groups [47]. In order to subsequently interpret group means, the residual variances must also be identical [47]. To decide whether measurement invariance is present for the ICU across the age groups, changes in Root Mean Squared Error of Approximation (∆RMSEA) along with changes in Comparative Fit Index (∆CFI) and Tucker-Lewis Index (∆TLI) are observed.
According to Chen [48], a model represents the data structure equally well in the different groups if the CFI does not decrease by more than 0.01 and the RMSEA does not increase by more than 0.015 from the configural to the metric model and from the metric model to the scalar model, respectively (and from the scalar to the residual invariance model). Full Information Maximum Likelihood was used for model estimation [39, 40]. All analyses are conducted using Stata 18.
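The stepwise decision rule described above can be sketched in Python as follows. This is an illustrative reading rather than the authors' code: it assumes that a step counts as invariant only when both the CFI and the RMSEA criteria are met (a conservative reading), that the configural model has already shown acceptable fit, and that fit indices are compared at three decimals. All function and variable names are hypothetical.

```python
CFI_DROP_MAX = 0.010    # CFI may decrease by at most 0.01
RMSEA_RISE_MAX = 0.015  # RMSEA may increase by at most 0.015


def step_holds(less_constrained, more_constrained):
    """Check one step, e.g. configural -> metric.

    Each argument is a dict with the keys "cfi" and "rmsea".
    ASSUMPTION: both criteria must be satisfied for the step to hold.
    """
    # Round to three decimals, the usual reporting precision, so that
    # floating-point noise does not flip a borderline comparison.
    cfi_drop = round(less_constrained["cfi"] - more_constrained["cfi"], 3)
    rmsea_rise = round(more_constrained["rmsea"] - less_constrained["rmsea"], 3)
    return cfi_drop <= CFI_DROP_MAX and rmsea_rise <= RMSEA_RISE_MAX


def highest_invariance_level(fits):
    """Return the name of the most restrictive model that is supported.

    fits is an ordered list of (level_name, fit_dict) pairs starting with
    the configural model, which is assumed to fit acceptably.
    """
    supported = fits[0][0]
    for (_, previous), (name, current) in zip(fits, fits[1:]):
        if not step_holds(previous, current):
            break
        supported = name
    return supported


# HYPOTHETICAL fit values for illustration only (not taken from Table 5).
example = [
    ("configural", {"cfi": 0.930, "rmsea": 0.040}),
    ("metric", {"cfi": 0.905, "rmsea": 0.045}),
    ("scalar", {"cfi": 0.800, "rmsea": 0.070}),
]
print(highest_invariance_level(example))
```

With these made-up values the CFI drops by 0.025 at the first step, so the function stops at the configural level even though the RMSEA change alone would be acceptable.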

Results

Prior to confirmatory testing of the ICU structure and measurement invariance, descriptive statistics and intercorrelations are calculated and presented in Table 2 and Table S1. Item intercorrelations within each proposed factor are significant (with the exception of the correlation between item 2 and item 8).

Table 2 Descriptive Statistics of the ICU items for each age group

Confirmatory factor analysis

For confirmatory testing of the ICU model structure, a three-factor model with the latent factors callousness, uncaring, and unemotional is tested. The goal is to find a model that represents the ICU structure equally well in all four groups (preschool age, middle childhood, early adolescence and late adolescence). Figure 1 illustrates the measurement model, which shows a good model fit across groups (Table 3). Table 4 summarizes the factor loadings and intercepts for each age group. All indicator items load significantly on their respective factor (with one exception for item 22 in middle childhood). Reliability was assessed using Cronbach’s alpha for the different age groups. The values were adequate and are presented in Table 3.
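For readers unfamiliar with the reliability coefficient reported here, the following Python sketch computes Cronbach’s alpha from raw item scores using the standard formula \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)\), where \(k\) is the number of items, \(s_i^2\) the variance of item \(i\), and \(s_T^2\) the variance of the total score. The function name is hypothetical, and the sketch assumes complete, already reverse-coded data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows: one list per respondent containing k scored item responses.
    ASSUMPTION: no missing values and reverse-keyed items already recoded.
    """
    if len(rows) < 2:
        raise ValueError("at least two respondents are required")
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of every item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

When all items rise and fall together across respondents, the total-score variance is large relative to the summed item variances and alpha approaches 1.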

Fig. 1 Measurement model of the ICU. Note: Item labels in Table 2; r = reverse-coded items; error correlations are not displayed

Table 3 Model fit indices of the ICU measurement model for each age group
Table 4 Factor loadings and intercepts of the measurement model of the ICU for each age group

Measurement invariance

The model fit indices allow the assumption of configural measurement invariance of the ICU across the age groups (RMSEA < 0.08; CFI and TLI > 0.90). In comparing the configural and the metric model, the RMSEA suggests metric invariance (∆RMSEA ≤ 0.015), while the CFI and TLI indicate only configural invariance (∆CFI ≥ 0.010; ∆TLI ≥ 0.010). Because the indices disagree, we follow a conservative decision rule and retain the configural model [48]. Scalar invariance is nevertheless analyzed in the next step. The model fit comparison shows clearly worse RMSEA (∆RMSEA ≥ 0.015) as well as worse CFI and TLI values (∆CFI ≥ 0.010; ∆TLI ≥ 0.010), so the results do not confirm scalar measurement invariance. Complete measurement invariance of the ICU across the four age groups could therefore not be achieved; the results indicate only configural measurement invariance. The values of the measurement invariance testing are presented in Table 5.

Table 5 Measurement invariance of the ICU across age groups

Discussion

Our study’s aim was to assess whether the ICU [12] measures CU traits consistently across different age groups from preschool to late adolescence. The results indicate configural measurement invariance. Accordingly, the construct of CU traits measured with the ICU has a similar structure across age groups. However, the latent and manifest variables have different meanings in the groups, and the parameters and mean values differ [47]. The results suggest that individual items are understood differently by different age groups. In other words, items are interpreted differently by preschool teachers, children in middle childhood, and younger and older adolescents. The results indicate that the ICU cannot be interpreted uniformly for children and adolescents of different age groups. The intercepts of the items of the unemotional factor in particular show large differences between the age groups (e.g., “not showing emotions; item 6”). The intercepts of some items of the callousness and uncaring factors also differ strongly among the age groups. For example, there are differences in the understanding of the item “feeling bad or guilty when doing something wrong; item 5” of the uncaring factor. Even though studies highlighted moderate stability for CU traits [4, 24], children may differ in the developmental precursor skills that are associated with CU traits. It is possible that younger children lack competencies on a cognitive or social-emotional level. Item 5, for example, describes empathy or the ability to take the perspective of others, which may not be fully developed in younger children. Factor analytic studies for children (e.g., in middle childhood, Hawes et al. [5], and in preschoolers, Zumbach et al. [18]) found a best fitting factor structure that excluded most unemotional items. Similarly, Kimonis et al. [49] discuss that the unemotional item set may need refining for young children.
The situation is similar, for example, for the callousness factor item “concerned about the feelings of others; item 8”. Different meanings of items may be due to different stages of the development of children in different age groups, from preschool age to late adolescence. Possibly, the items may reflect developmental phenomena in younger children [cf. 18], transitioning to expressions of CU traits in older adolescence.

Implications

As just indicated, according to our findings, the ICU cannot be interpreted identically for children and adolescents of different age groups. This highlights the need for a more differentiated assessment. Frick et al. [14] already provided indications that the instrument for capturing CU traits is not yet fully exhaustive. Our results show that the ICU exhibits similarly good reliabilities across age groups and that the factor structure can also be replicated. However, we need to better understand CU traits, especially at young ages, by following and considering developmental trajectories from early preschool age to adolescence. So far, the use of the same instrument for all samples, from early childhood to adolescence, has assumed that all children and adolescents have the same developmental prerequisites for understanding and answering the items presented.

Many items in the ICU refer to competencies that are closely linked to social-emotional and partly cognitive development. However, these competencies are far from fully developed in childhood, so difficulties can arise in differentiating between CU traits and social-emotional developmental deficits. For example, items focusing on remorse, such as apologizing when hurting others, or attempting to make amends, are closely linked to feelings of shame and guilt. Shame and guilt are described as intrapersonal emotions and are considered complex emotions that are formed in childhood [50, 51]. Children are not even able to verbally distinguish between guilt and shame until they are about 10–11 years old [52]. Younger children may still have difficulties experiencing these emotions or are only just beginning to experience them.

In this context, cognitive development also plays an important role [53, 54]. Shame and guilt can only be felt if children understand that they may have evoked negative emotions in others. Therefore, it is difficult to assess whether items such as “not feeling remorseful when doing something wrong; item 18” or “feeling bad or guilty when doing something wrong; item 5” really capture CU traits in younger samples. By adolescence, children are able to anticipate feelings such as guilt and shame [55]. At this age, it becomes possible to distinguish whether a behavior is exhibited or not because a child is developmentally unable to do so or because CU traits are actually present.

Similar difficulties are evident with ICU items intended to capture not showing emotions, such as “not showing emotions to others; item 6.” In middle childhood, children increasingly prefer mental strategies for emotion regulation, such as distancing, to regulate their anger [56]. This may offer an alternative explanation for high scores on the corresponding items that does not involve CU traits. Younger children may still be in the process of developing emotional competence in general and therefore may show deficits in responding to the items.

This study points out possible misinterpretation of CU traits in younger children if developmental factors are not taken into account. The incorrect classification of normative developmental stages of children as pathological can lead to inappropriate interventions. The findings underscore the importance of considering age-related differences in emotional and cognitive development to avoid unintentional pathologization of typical behaviors. Early intervention strategies should therefore acknowledge the dynamic nature of CU traits during early childhood. By incorporating insights from developmental psychology, assessments can better account for age-specific variations in emotional and cognitive development. Before drawing conclusions on children’s CU traits, the emotional and cognitive precursor abilities of children should be examined.

Therefore, our study highlights the need for a differentiated instrument to capture CU traits that is able to distinguish CU behaviors from deviating developmental steps, especially in young samples.

Limitations and further research

In addition to our new findings, our study’s limitations also need to be mentioned. For the present study, it should be noted that the wording of the ICU items differed slightly between the preschool teacher-reports and the self-reports for children in middle childhood and adolescents. This is accompanied by a possible deviation between the raters. Whereas in middle childhood and adolescence, the children themselves were the raters, for preschool-aged children, their teachers were asked to rate the ICU. At this point, it should be noted that preschool children are not yet able to answer the ICU questions independently. While research on callous-unemotional traits in middle childhood is also incorporating self-reports [e.g., 13, 57,58,59], the question remains regarding children’s comprehension of the underlying concepts. However, we assume that older children give more valid answers when they respond to the items themselves. Separately testing the factor structures for rater-based and self-report versions might introduce methodological variability, potentially diverting focus from the study’s primary aim: exploring age-related differences in interpreting CU traits. Our overarching goal is to examine measurement invariance of the ICU across the entire span of childhood and adolescence. However, as no self-report version of the ICU is available for preschool children, the only way to achieve the study objective was to use a combination of rater-based and self-report assessments. Additionally, Wang et al. [60] identified cross-informant (self-report, parent-report, and teacher-report) invariance. Moreover, younger children may possess limited introspective capacities, so their assessment often relies more on external observations by parents or teachers. In contrast, adolescents typically demonstrate a more nuanced understanding of their own internal emotional experiences.
However, to ensure a more precise examination of the measurement invariance of the ICU, it is pertinent to utilize consistent rater sources. Further studies could use teacher and/or parent reports for all age groups.

The research on the ICU instrument is ambiguous. Existing studies have confirmed different factor structures of the ICU for varying samples [5, 13, 16]. For our study’s approach, we have chosen the three-factor solution because it is suggested by the author of the ICU [12]. In selecting the three-factor model as the primary focus, the study aims to enhance the overall comparability of findings across different age groups [e.g., 2, 15]. There are a variety of alternative models (e.g., a two-factor model including a callousness and an uncaring factor following the procedure of Hawes et al. [5], or a two-factor model with the factors callous-unemotional and empathic-prosocial derived from Willoughby et al. [17]). However, testing alternative models for each age group could introduce complexity and hinder the ability to draw meaningful cross-age comparisons. By adhering to the established structure, the research strives for a unified framework that facilitates a more comprehensive understanding of callous-unemotional traits across the developmental spectrum. In further research, however, alternative ICU models could certainly also be tested for measurement invariance.

In addition, we allowed error terms to correlate, which must be considered a limitation. However, items that have similar meanings or refer to similar concepts are more likely to have correlated residuals. In such cases, modeling correlated errors can help to better reflect the factor structure and reduce potential biases [61].

Conclusion

In conclusion, our study assessed the ICU across different age groups (preschool, middle childhood, early, and late adolescence) and identified consistent structural patterns but varying interpretations of individual items. This highlights the need for a more differentiated assessment, as items could be interpreted differently across the age groups. The study challenges the assumption of uniform developmental prerequisites for understanding ICU items. Our findings underscore the difficulty in distinguishing callous-unemotional traits from normal developmental stages, especially in younger children. Nevertheless, future research is needed and should explore measurement invariance for alternative ICU models to enhance understanding and measurement across diverse populations and developmental stages.