Introduction

The preschool period is characterized by rapid development, making it an important window for shaping healthy social, emotional, and cognitive functioning across the lifespan, for identifying and preventing mental health problems, and for intervening before problems evolve into disorders (Dougherty et al., 2015; Pool & Hourcade, 2011; Poulou, 2015; Wakschlag et al., 2019; Zaim & Harrison, 2020). Worldwide prevalence estimates indicate that one in five children experiences mental health problems and that 13% of children meet the criteria for a psychiatric disorder (Belfer, 2008; Polanczyk et al., 2015). Comorbidity is also common: one third of children aged 1 to 7 years who meet the criteria for a psychiatric disorder also meet the criteria for at least one additional psychiatric disorder (Vasileva et al., 2021). For some children the problems are transient, whereas for others they are more enduring, and these children may struggle to return to a normal developmental trajectory (Blok et al., 2022; Powell et al., 2006). Approximately half of preschoolers with a psychiatric diagnosis still meet the criteria for a psychiatric disorder in middle childhood or early adolescence (Finsaas et al., 2018a). Moreover, psychopathology in early childhood predicts long-lasting deleterious effects on children’s functioning, even in children with subthreshold psychiatric problems (Finsaas et al., 2018b).

Mental health problems in preschool children remain under-identified, under-referred, and under-treated, even though their prevalence is similar to that in older children (Bufferd et al., 2012; Egger & Angold, 2006; Horwitz et al., 2003, 2007; Wichstrøm et al., 2012). Identifying preschool children with mental health problems can be challenging because of the rapid development that occurs in this period (Keenan et al., 1998): a behavior may be appropriate at one age or in one context and inappropriate at another age or in another context. Many children attend early childhood education and care (ECEC) centers, where they spend prolonged periods with adults outside the family; the ECEC context is therefore a promising arena for early identification of, and intervention for, mental health problems. Attention should thus be directed toward ECEC professionals’ perceptions of problem behaviors among children (Poulou, 2015). However, when ECEC professionals try to identify problem behaviors without standardized instruments, misclassification can occur. Such misclassification may prevent children who need support from getting help or unnecessarily burden children who do not need help (Stensen et al., 2022a). Developing psychometrically valid identification procedures and instruments is therefore essential to ensure that children in need of help receive it and to support their healthy development (Feeney-Kettler et al., 2010; Ivanova et al., 2011).

The Caregiver-Teacher Report Form (C-TRF; Achenbach & Rescorla, 2000) is a frequently used instrument for gathering information on psychopathology in preschool children from caregivers other than parents, such as ECEC professionals, and is applied by both researchers and clinicians. The C-TRF and its parent-report counterpart, the Child Behavior Checklist, have shown promising psychometric properties and are often used as “gold standards” against which other assessment or screening instruments are tested (Lavigne et al., 2016). The six C-TRF factors (Emotionally reactive, Anxious/depressed, Withdrawn, Somatic complaints, Attention problems, and Aggressive behavior) have received psychometric support regarding factorial validity and internal consistency in several countries outside the United States, where the instrument was developed (Ivanova et al., 2011). However, factorial problems have also been reported for the Emotionally reactive and Anxious/depressed factors because of the high correlation between them, and the Somatic complaints factor has shown suboptimal psychometric properties (Ivanova et al., 2011; Stensen et al., 2022b).

Measurement invariance (MI) is an important psychometric property of any instrument and a prerequisite for meaningful comparisons of latent means between groups, such as groups defined by children’s sex and age (Putnick & Bornstein, 2016; van de Schoot et al., 2012). MI concerns whether raters interpret the items in the same way across groups. If raters attribute different meanings to the same items in different groups, the factors themselves take on different meanings, which jeopardizes the validity of latent mean comparisons. For instance, studies using the C-TRF have reported that boys are scored significantly higher than girls on externalizing problems (Attention problems and Aggressive behavior), and that younger preschool children are scored significantly higher than older preschool children on both internalizing (Emotionally reactive, Anxious/depressed, Withdrawn, Somatic complaints) and externalizing problems (Marković et al., 2016; Rescorla et al., 2012). However, without information on the instrument’s MI, it is uncertain whether the C-TRF measures the same latent factors for children of different sex and age or whether it measures somewhat different factors because of informants’ interpretations of the items, and hence whether latent means can be appropriately compared. For example, if younger preschool children receive higher Aggressive behavior scores than older preschool children because ECEC professionals are more inclined to interpret other behavioral expressions, such as frustration and anger, as aggressive behavior in younger children, the latent means are not directly comparable, because aggressive behavior then holds different meanings depending on children’s age. Consequently, knowing which items are invariant and which are non-invariant is important for an instrument’s interpretability and validity.

The MI of the C-TRF has not yet been investigated. The aim of the current study was therefore to examine the MI of the C-TRF in the context of Norwegian early childhood education and care, which may inform how ECEC professionals interpret the items included in the instrument. Specifically, because previous studies have reported significant sex and age differences in C-TRF scores (Marković et al., 2016; Rescorla et al., 2012), and because MI is necessary for interpreting such mean differences, the current study examined the MI of the C-TRF for (1) girls vs. boys and (2) children younger than three years vs. children three years or older. As knowledge about the psychometric properties of assessment instruments applied in the Nordic countries is scarce or lacking, especially for very young children (Peltonen et al., 2022), efforts to inform researchers and clinicians about the applicability and accuracy of instruments are important.

Methods

This study is based on combined baseline data from two ECEC projects conducted in central and southeastern Norway. Data from the project Children in Central Norway were collected from 2012 to 2014, and data from Thrive by Three were collected in 2018.

Procedure and Participants

Children in Central Norway

The parents of children enrolled in ECEC centers in three municipalities in central Norway received information about the project via letters and parent meetings before the project commenced. The letters gave parents the option to enroll their child either by returning a signed consent form to their ECEC center or by consenting digitally, using their unique invitation code, in the project’s online survey. Parental consent allowed the ECEC professionals who knew the child best to complete the online survey about the child. ECEC professionals provided written consent using their own unique invitation code for the online survey. A total of 1,631 (77%) of the invited parents consented to enroll their child in the project, and 169 ECEC professionals reported on 1,430 children between 1 and 6 years of age (mean age = 44 months; 51% boys).

Thrive by Three

Data were obtained from Thrive by Three for 1,474 children (mean age = 21 months; 51% boys) in 184 units/groups from 78 ECEC centers in seven municipalities or city districts in central and southeastern Norway. An ECEC professional within the unit/group who knew the child best completed the C-TRF. When the Thrive by Three sample was combined with the Children in Central Norway sample, the total sample for this study included 2,904 children (mean age = 33 months; 51% boys) and 353 ECEC professionals. On average, each ECEC professional reported on 8.2 children. In both projects, participation was voluntary, and consent could be withdrawn without reprisal until the participation registry was anonymized.

Measurements

The C-TRF

ECEC professionals completed the C-TRF (Achenbach & Rescorla, 2000), which contains 99 items and one open-ended item describing symptoms of mental health problems in children aged 1.5 to 5 years. Each item has three response options: 0 = “not true (as far as you know)”, 1 = “somewhat or sometimes true”, and 2 = “very true or often true”. The item scores can be summed to obtain a Total problem score ranging from 0 to 200, where a higher score indicates a higher symptom load. The C-TRF contains the following six latent factors: Emotionally reactive (seven items), Anxious/depressed (eight items), Withdrawn (10 items), Somatic complaints (seven items), Attention problems (nine items), and Aggressive behavior (25 items). The 66 items constituting these six factors were the focus of the current study, whereas the remaining 34 items, which constitute Other problems, were excluded because they do not form a latent factor in the C-TRF.
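The following Python sketch makes the scoring arithmetic concrete by summing 0/1/2 ratings into scale scores and a Total problem score. It assumes 100 rated items (the 99 items plus the open-ended item), which gives the 0 to 200 range stated above. The item positions assigned to each factor are hypothetical placeholders. They were chosen only so that the factor sizes match the text (7 + 8 + 10 + 7 + 9 + 25 = 66 items, leaving 34 for Other problems). In practice, the published scoring key in Achenbach and Rescorla (2000) should be used instead.

```python
from typing import Dict, List, Sequence

# Factor sizes as stated in the text. The positions assigned to each factor
# below are HYPOTHETICAL placeholders, not the published C-TRF scoring key.
FACTOR_SIZES: Dict[str, int] = {
    "Emotionally reactive": 7,
    "Anxious/depressed": 8,
    "Withdrawn": 10,
    "Somatic complaints": 7,
    "Attention problems": 9,
    "Aggressive behavior": 25,
}
N_ITEMS = 100  # assumption: 99 fixed items + 1 open-ended item, each rated 0-2


def build_hypothetical_key() -> Dict[str, List[int]]:
    """Assign consecutive item positions to each factor (illustration only)."""
    key: Dict[str, List[int]] = {}
    start = 0
    for factor, size in FACTOR_SIZES.items():
        key[factor] = list(range(start, start + size))
        start += size
    # Items that do not load on a latent factor are grouped as "Other problems".
    key["Other problems"] = list(range(start, N_ITEMS))
    return key


def score(responses: Sequence[int], key: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum raw ratings per scale and over all items (Total problem score)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(responses)}")
    if any(r not in (0, 1, 2) for r in responses):
        raise ValueError("ratings must be 0, 1, or 2")
    scores = {scale: sum(responses[i] for i in items) for scale, items in key.items()}
    scores["Total problems"] = sum(responses)
    return scores


if __name__ == "__main__":
    key = build_hypothetical_key()
    print(sum(FACTOR_SIZES.values()))                   # items on the six factors
    print(len(key["Other problems"]))                   # remaining items
    print(score([2] * N_ITEMS, key)["Total problems"])  # maximum total score
```

Running the script prints 66, 34, and 200. These are the number of factor items, the number of Other problems items, and the maximum Total problem score.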

Statistical Analyses

Children were dichotomized by age into a younger group (younger than 36 months) and an older group (36 months and older) to reflect the organizational structure of ECEC centers in Norway. First, because children were nested within ECEC professionals, the internal consistency of the six C-TRF factors in the full sample was estimated with the multilevel omega (ω) coefficient. Unless reliability is identical at each level of analysis, reporting only single-level estimates does not reflect a scale’s actual internal consistency (Geldhof et al., 2014). The ω coefficient was preferred over Cronbach’s alpha because alpha rests on rather strict assumptions, such as tau-equivalence and normally distributed scores, and can yield biased estimates when these are violated (Dunn et al., 2014; McNeish, 2018; Peters, 2014; Sijtsma, 2009; Yang & Green, 2011). The multilevel ω with a 95% confidence interval was computed with the R package “multilevelTools” (Wiley, 2022) in RStudio. Multilevel ω estimates are interpreted in the same way as alpha, with estimates ≥ 0.70 considered to indicate satisfactory internal consistency (Taber, 2018).
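The sketch below illustrates what ω summarizes. It computes McDonald's omega for a single factor from its loadings λ and residual variances θ, assuming the factor variance is fixed at 1: ω = (Σλ)² / ((Σλ)² + Σθ). In the multilevel approach (Geldhof et al., 2014), this ratio is computed separately from the within-level and between-level parameters of a two-level factor model. The parameter values below are invented. The sketch is not the multilevelTools implementation and does not fit the model or produce confidence intervals.

```python
from typing import Sequence


def omega(loadings: Sequence[float], residual_variances: Sequence[float]) -> float:
    """McDonald's omega for one factor whose variance is fixed at 1.

    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances)
    """
    if len(loadings) != len(residual_variances):
        raise ValueError("need one residual variance per loading")
    if any(t < 0 for t in residual_variances):
        raise ValueError("residual variances must be non-negative")
    common = sum(loadings) ** 2
    return common / (common + sum(residual_variances))


if __name__ == "__main__":
    # Hypothetical level-specific estimates for a 3-item factor (NOT study data).
    within = omega([0.40, 0.35, 0.30], [0.80, 0.85, 0.90])
    between = omega([0.50, 0.45, 0.40], [0.02, 0.03, 0.04])
    print(f"within-level omega:  {within:.2f}")
    print(f"between-level omega: {between:.2f}")
```

With these invented values, the between-level ω (about 0.95) is far higher than the within-level ω (about 0.30). A single-level estimate would conceal this kind of discrepancy between levels.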

We then investigated the MI of the C-TRF using a series of multilevel multigroup confirmatory factor analyses (MGCFAs) grouped by (1) girls vs. boys and (2) children younger than three years vs. children three years or older. MGCFAs are used to determine whether respondents attribute the same meaning to the latent factors and whether means and scores can be interpreted similarly across groups (van de Schoot et al., 2012). This is done by inspecting model fit indices as increasingly restrictive equality constraints are added to a hierarchical sequence of models, ranging from configural invariance through metric (weak) invariance to scalar (strong) invariance:

1. Configural invariance (equal factor structure across groups).
2. Metric invariance (equal factor loadings across groups).
3. Scalar invariance (equal thresholds across groups, as the variables are ordered categorical).

Before examining the MI of the C-TRF, 10 items were removed from the analyses because Mplus warned that the variance/covariance matrix was not positive definite. This warning can arise from out-of-range parameters (e.g., a negative variance or residual variance for a latent variable, or a correlation ≥ 1.0 between two latent variables) or from a linear dependency among more than two latent variables (Muthén & Muthén, 2017) (Fig. 1). Nine items were excluded because of empty cells in the bivariate frequency tables when these items were crossed with certain other variables, which led to correlations of 1. This problem concerned all items of the Somatic complaints factor and the items “cruel to animals” (Aggressive behavior) and “nervous movements or twitching” (Emotionally reactive). In addition, the item “worries” (Emotionally reactive) loaded on several factors and was excluded from further analyses. When the analyses were rerun without these 10 items, the non-positive definite warning was resolved.

Fig. 1 Test model for multigroup confirmatory factor analyses

Note: An item overview can be found in Achenbach and Rescorla (2000)

Step 1 in testing measurement invariance involved fitting the adapted five-factor C-TRF baseline model across children’s sex and then across age, with all parameters free to vary (configural invariance). Step 2 tested a model in which the factor loadings were constrained to be equal between groups while the thresholds could vary freely (metric invariance). Step 3 tested a model in which both loadings and thresholds were constrained to be equal between groups (scalar invariance). Configural invariance was established if the five-factor model showed a good fit across the groups tested. Metric invariance was established if the more constrained model still fit well compared with the configural baseline model, and scalar invariance was established if the even more constrained model still fit well compared with the metric model (Hirschfeld & von Brachel, 2014).

Model fit was evaluated by inspecting the root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker-Lewis index (TLI). RMSEA values ≤ 0.05 indicate a good fit, and values between 0.05 and 0.10 indicate an acceptable fit (MacCallum et al., 1996). For the CFI and TLI, values ≥ 0.95 are commonly used to indicate a good fit (Hu & Bentler, 1999); however, Browne and Cudeck (1993) argue that these thresholds are too strict and recommend > 0.90 for a good fit and 0.80 to 0.90 for an acceptable fit. To evaluate invariance, Cheung and Rensvold (2002) recommended that a CFI reduction of ≤ 0.01 when constraints are added indicates that the null hypothesis of invariance should not be rejected. The CFI difference between models was preferred over chi-square (χ2) statistics as an indicator of invariance because it is less sensitive to sample size and more sensitive to lack of invariance (Meade et al., 2008). Multilevel MGCFAs were performed in Mplus v.8.4 (Muthén & Muthén, 2017) using the weighted least squares mean- and variance-adjusted (WLSMV) estimator to account for the non-normal distribution of the data. The WLSMV estimator is appropriate for ordered categorical data and produces accurate parameter estimates (DiStefano & Morgan, 2014).
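The decision rules above can be written down as small functions. The Python sketch below encodes the RMSEA, CFI/TLI, and ΔCFI cut-offs as they are stated in this section. The function names are illustrative, and the example CFI values are hypothetical rather than results from this study. A small tolerance is added to the ΔCFI comparison so that a difference of exactly 0.01 is not rejected because of floating-point rounding.

```python
def rmsea_fit(rmsea: float) -> str:
    """RMSEA cut-offs as described in the text (MacCallum et al., 1996)."""
    if rmsea <= 0.05:
        return "good"
    if rmsea <= 0.10:
        return "acceptable"
    return "poor"


def incremental_fit(index: float, strict: bool = False) -> str:
    """Classify a CFI or TLI value.

    strict=True:  >= 0.95 is good (Hu & Bentler, 1999).
    strict=False: > 0.90 is good, 0.80-0.90 acceptable (thresholds as cited in the text).
    """
    if strict:
        return "good" if index >= 0.95 else "not good"
    if index > 0.90:
        return "good"
    if index >= 0.80:
        return "acceptable"
    return "poor"


def invariance_retained(cfi_less_constrained: float,
                        cfi_more_constrained: float,
                        cutoff: float = 0.01) -> bool:
    """Cheung & Rensvold (2002): retain invariance if the CFI drops by <= cutoff."""
    # Small tolerance guards against floating-point error at the boundary.
    return cfi_less_constrained - cfi_more_constrained <= cutoff + 1e-12


if __name__ == "__main__":
    # Hypothetical CFI values for the configural, metric, and scalar models.
    cfi = {"configural": 0.952, "metric": 0.949, "scalar": 0.944}
    print(rmsea_fit(0.045), incremental_fit(0.897))
    print(invariance_retained(cfi["configural"], cfi["metric"]))  # metric step
    print(invariance_retained(cfi["metric"], cfi["scalar"]))      # scalar step
```

For example, incremental_fit(0.897) returns "acceptable". This matches how the TLI of 0.897 for the sex-grouped model is classified in the Results.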

Lastly, if scalar invariance was not found, the modification indices (χ2) were inspected to locate non-invariant items. The constraints for the non-invariant items were then relaxed one by one, starting with the item with the greatest expected parameter change (EPC), to determine whether this improved model fit. If the resulting partially constrained scalar model showed a CFI reduction of ≤ 0.01 compared with the metric model, partial scalar invariance was established. No data were missing.
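The stepwise relaxation of threshold constraints can be sketched as a loop. In the Python sketch below, refit is a hypothetical callback that stands in for re-estimating the partially constrained scalar model in SEM software such as Mplus. Given the items whose thresholds are freed, it returns the model's CFI and the EPCs of the items that are still constrained. The loop frees the item with the largest absolute EPC and refits. It stops once the CFI reduction relative to the metric model is ≤ 0.01, or returns None if no constrained items remain. The toy refit function uses invented CFI gains and EPCs purely to exercise the logic.

```python
from typing import Callable, Dict, List, Optional, Tuple

# HYPOTHETICAL interface: given the freed items, return the CFI of the partially
# constrained scalar model and the EPCs of the items that are still constrained.
Refit = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def partial_scalar_search(metric_cfi: float, refit: Refit,
                          cutoff: float = 0.01) -> Optional[List[str]]:
    """Free constraints one at a time by largest |EPC| until delta-CFI <= cutoff."""
    freed: List[str] = []
    while True:
        cfi, epc = refit(freed)
        if metric_cfi - cfi <= cutoff + 1e-12:
            return freed  # empty list = full scalar invariance; otherwise partial
        candidates = {item: v for item, v in epc.items() if item not in freed}
        if not candidates:
            return None  # nothing left to free: partial invariance not reached
        # EPCs come from the latest refit, so the ranking is updated at every step.
        freed.append(max(candidates, key=lambda item: abs(candidates[item])))


if __name__ == "__main__":
    # Toy numbers only: each freed item raises the CFI by an invented amount.
    gain = {"item_a": 0.012, "item_b": 0.006, "item_c": 0.001}
    epc = {"item_a": 0.5, "item_b": -0.8, "item_c": 0.1}

    def toy_refit(freed: List[str]) -> Tuple[float, Dict[str, float]]:
        cfi = 0.930 + sum(gain[i] for i in freed)
        return cfi, {i: e for i, e in epc.items() if i not in freed}

    print(partial_scalar_search(metric_cfi=0.950, refit=toy_refit))  # ['item_b', 'item_a']
```

Because the callback returns fresh EPCs after each refit, the sketch follows the practice of re-inspecting modification indices after every relaxed constraint rather than relying on a single initial ranking.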

Results

Internal consistency estimates for the C-TRF factors were high at the between level (ECEC professionals), whereas more variation was found at the within level (Table 1). Emotionally reactive almost reached the threshold of ≥ 0.70 for satisfactory internal consistency, and the upper limit of the confidence interval for Anxious/depressed exceeded this threshold. Withdrawn, Attention problems, and Aggressive behavior were above the 0.70 threshold, with the highest estimate found for Aggressive behavior (0.92 at the within level). The lowest internal consistency was found at the within level for Somatic complaints (0.15).

Table 1 The multilevel omega coefficient with 95% confidence intervals for the factors of the Caregiver-Teacher Report Form, with all 66 items included

Results from the multilevel MGCFAs using the adapted five-factor structure indicated good model fit (Table 2). The exception was the TLI for the model grouped by children’s sex (0.897), which fell within the acceptable range, slightly below the > 0.90 threshold for a good fit. The measurement invariance analyses showed full scalar invariance of the C-TRF for both sex and age (Table 3): adding constraints did not reduce the CFI by more than 0.01, indicating that the latent means can be meaningfully compared. Because full scalar invariance was found, no further search for non-invariant items was necessary.

Table 2 Baseline (configural) model fit indices grouped by the children’s sex and age
Table 3 Measurement invariance indices of the Caregiver-Teacher Report Form using the comparative fit index estimates grouped by children’s sex and age

Discussion

The current study aimed to examine the MI of the C-TRF across children’s sex and age, an important psychometric property for meaningful comparison of latent means across groups. Previous studies using this instrument have reported significant age and sex differences (Marković et al., 2016; Rescorla et al., 2012); the findings from the current study can therefore inform whether C-TRF mean scores reflect actual differences or different perceptions of the same items across groups. The results indicate that the Somatic complaints factor is problematic both in terms of internal consistency and in terms of out-of-range parameters caused by many empty cells when its items are crossed with other items. Consequently, this factor, along with three other items, was excluded. The adapted five-factor model showed full scalar invariance across both children’s sex and age. Thus, ECEC professionals attributed the same meaning to these latent factors regardless of children’s sex and age, and the latent means were comparable across these groups.

The internal consistency of the six latent factors in the current sample was similar to the alpha and omega coefficients reported previously (Rescorla et al., 2012; Stensen et al., 2022b). That said, the lower mean age of children in the current study may have affected the internal consistency of Somatic complaints compared with these studies. The estimates in Table 1 show a large discrepancy between levels for this factor (omega coefficient of 0.15 at the within level and 0.96 at the between level). This suggests that its applicability is highly questionable and that removing it from the instrument, or from aggregated scores used for research and clinical purposes, should be considered. Given the limited capacity of younger children to communicate, ECEC professionals may have difficulty distinguishing somatic problems with psychogenic causes from actual illness. For instance, an ECEC professional may attribute the inability of a child with a stomachache to sit still to attention problems rather than to somatic problems. In addition to the Somatic complaints factor, three items were problematic. As with Somatic complaints, ECEC professionals may have difficulty capturing very young children’s concerns because of these children’s limited communicative capacity. The item “worries” loaded on more than one latent factor, and a previous study (Rescorla et al., 2012) has shown that the Emotionally reactive and Anxious/depressed factors are highly correlated; ECEC professionals may therefore misattribute this type of behavior. The item “cruel to animals” may be problematic because of its low prevalence and because ECEC professionals have few opportunities to observe such behavior, as animals (e.g., pets) are rarely present in the ECEC setting. It may therefore be advisable to ask parents about this type of behavior, as it may be more prevalent in the home environment. The last item, “nervous movements or twitching”, may also be difficult for ECEC professionals to capture, as children’s motor skills develop rapidly in the preschool period. ECEC professionals may attribute this type of behavior to normal motor development or to motor problems rather than to internalizing problems.

The adapted five-factor model showed full scalar invariance across both children’s sex and age. This indicates that the C-TRF factors capture the same meaning for children younger than three years and for children three years or older, even though some behaviors are considered more normative than others at a given age. ECEC professionals may operate with different thresholds for what they perceive as normative behavior depending on children’s age and sex (Stensen et al., 2021). The findings of the current study suggest that the sex and age differences reported with the C-TRF (Marković et al., 2016; Rescorla et al., 2012) may reflect actual sex and age differences as perceived by ECEC professionals, as the latent factors carried the same meaning across the groups tested. This supports the applicability of the C-TRF for assessing psychopathology in preschool children, as both reliability and validity are important for researchers and clinicians to interpret information accurately. However, the Somatic complaints factor and the three other problematic items warrant attention and should be excluded or used cautiously, because they may jeopardize the integrity of the instrument. If these 10 items are retained for clinical or scientific purposes, we suggest assigning them to the category “Other problems” because of their factorial issues. In this way, the excluded items can still contribute to the C-TRF Total problem score.

Strengths and Limitations

The current study included two large community samples that covered the full age range of children in ECEC centers. In particular, the inclusion of very young children was a major strength, as psychometric information on assessment instruments for this age group is scarce or lacking. This study adds to the knowledge of the applicability of the C-TRF in the Norwegian ECEC context, and the findings can be used for cross-cultural comparison. One possible limitation was the reliance on one ECEC professional per child cluster. Even though the ECEC professional who knew the child best was instructed to complete the C-TRF, we cannot know whether this actually occurred. Children generally interact with several adults during the day in an ECEC center, so adults’ perceptions of a child’s behavior within the same cluster may vary. ECEC professionals who perceive their relationship with a child as conflictual have been reported to rate the child as having more problem behaviors than the child actually has (Berg-Nielsen et al., 2012). However, an inter-rater approach was not possible in the current study.

Conclusion

This study adds to the knowledge of the MI of the C-TRF across children’s sex and age, which has not been examined previously. The results support the applicability of the C-TRF for gathering information on psychopathology in preschool children from ECEC professionals as informants. However, researchers and clinicians should be aware of the Somatic complaints factor and the three other problematic items, as they may jeopardize the integrity of the instrument if included.