Abstract
Rising rates of mental health challenges among youths have become a significant concern following the COVID-19 pandemic. Although strong evidence supports the implementation of universal screening as a preventative approach to address unmet mental health concerns, the research is less clear surrounding the use of such data in decision-making processes when significant discrepancies between informants (e.g., students and teachers) exist. The purpose of the study was twofold. First, the study aimed to determine the degree of rater concordance between teachers and students on students’ internalizing concerns. The second objective was to determine whether concordance on internalizing behaviors differs across ages/grades and whether this differentially impacts distal (i.e., academic) outcomes. Results indicated that teachers and students demonstrated limited agreement on ratings of internalizing behaviors. However, when students and teachers agreed, higher (i.e., more positive) emotional behavior ratings were linked to higher reading/math performance. Furthermore, patterns of informant dis/agreement and relationships between internalizing concerns and academic outcomes were similar across grade levels. Implications and areas for future research are discussed.
Over the last ten years, an increasing number of youth have reported mental health challenges (Twenge et al., 2019), with a significant rise in rates since the onset of the COVID-19 pandemic (Radhakrishnan et al., 2022). Schools are a consistent setting for youth, allowing for early identification of mental health symptoms and thus early intervention services. However, the overall number of youth with mental health needs has strained existing school-based supports (Weir, 2020), with many students not receiving needed or timely treatment (Anderson et al., 2019). Without a comprehensive and universal prevention system in place, many youth may have unmet mental health needs, resulting in significant negative short- and long-term outcomes (Perfect et al., 2016). These challenges are particularly pronounced for adolescents with internalizing concerns, who are less likely to be identified for supports than youth with externalizing behavior concerns (Kalberg et al., 2011).
Over 15% of adolescents will display symptoms of internalizing concerns (e.g., withdrawal, depression, and anxiety), the most commonly reported class of mental health problems (Mojtabai et al., 2016). Many mental health prevalence studies (e.g., Merikangas et al., 2010) utilize data from assessment tools designed to identify psychopathology or presenting symptoms, aligned with a traditional psychological deficit focus. The shift to a modern conceptualization of mental health needs under the Dual Factor Model of Mental Health (Suldo & Shaffer, 2008), which recognizes both the manifestation of psychopathology and the absence of prosocial skills, suggests a substantially larger number of youth with needs. For example, in a sample of at-risk youth, nearly 30% reported low psychopathology and low well-being (i.e., “Vulnerable”; Suldo & Shaffer, 2008) and would be unlikely to be identified with traditional assessment methods.
Given the substantial numbers of students in need, there is an increased focus on prevention and early intervention for internalizing concerns in the setting in which youth have the most consistent access to critical support, schools (Collins et al., 2019). Schools have increasingly adopted multi-tiered systems of support (MTSS), focusing on prevention science and an integrated system of care, resulting in a treatment continuum based on student needs (Evans et al., 2013). For an integrated MTSS to be successful, effective tools and decision-making processes are necessary to identify pre-symptomatology and facilitate early intervention (Lyon et al., 2016). Universal screening, from a dual-factor perspective, offers an essential advancement toward realizing the potential of a preventative system of care (von der Embse et al., 2022).
Multi-Informant Universal Screening
Best practice guidance (see Romer et al., 2020) has emerged to inform how universal screening is implemented and used to inform service delivery. Research-based guidance is essential as there has been a substantial increase in the number of schools using screening (Bruhn et al., 2014; Auerbach et al., 2019). For example, recent research has identified the stability of risk to inform the number of screenings (Dever et al., 2018; Jenkins et al., 2021) and the unique contributions of information from both teachers and students across grades (von der Embse et al., 2019). In other areas, research is notably lacking. There exists a disconnect between best practice guidance for the use of multi-informant assessment (Achenbach et al., 1987) and rater discrepancies that have been identified as potentially problematic to effective decision-making (Youngstrom et al., 2000), including within and across internalizing concerns (Baumgartner et al., 2020). As such, research is needed to inform both the theoretical conceptualization of multi-informant decision-making and practical guidance for educators and clinicians alike to improve the identification and treatment of internalizing concerns.
Research is needed to inform three areas of decision-making: (1) who is identified, (2) what type of need is determined, and (3) what type of treatment is necessary (De Los Reyes & Kazdin, 2004). Regarding who is identified, youth behavior is context-dependent such that a child may display significant internalizing concerns (e.g., withdrawal) in one classroom while demonstrating age-appropriate social-emotional coping in another setting. This is particularly challenging in later adolescence, when rates of anxiety and depression increase, symptomatology varies substantially, and adults have inconsistent access to youth to identify changes in emotional functioning (e.g., students in later grades have multiple teachers; Miller et al., 2014; van der Ende et al., 2012). Given these sources of variability, the need for multiple raters is pronounced when identifying internalizing concerns (Navarro et al., 2020). The increase in frequency and intensity of internalizing concerns during the middle school years suggests the need to incorporate student self-report, yet rater discrepancies are expected to be even more pronounced because middle and high school teachers have less access to students than do elementary teachers. Best practice guidance indicates the need to incorporate teacher and student ratings (von der Embse & De Los Reyes, 2022), with the preferred rater for student social-emotional and behavioral concerns being the teacher in kindergarten through fifth grade (elementary), the student self-report in high school, and a combination during the middle school years (Romer et al., 2020). However, consensus on how to use discrepant ratings has yet to emerge.
Multiple models have been proposed to reconcile rater differences while acknowledging that such differences are, in some instances, both expected and desired (De Los Reyes, 2013). These models generally fall within two domains: person-centered or variable-centered approaches. A person-centered approach utilizes data aggregation techniques such as latent profile analysis (LPA) or latent class analysis (LCA) to identify profiles of respondents (Herman et al., 2018). Research using these techniques has identified unique profiles of rater groups (e.g., student-low risk and teacher-low risk, student-high risk and teacher-low risk) that differentially predict distal outcomes (von der Embse et al., 2021). In contrast, Makol and colleagues (2020) used a variable-centered, trait score approach, modeling informant ratings with the expectation of variability depending on rating context. This method shifts aggregation from multiple items to multiple rater scores. Results indicated superior predictability beyond a single composite score (Makol et al., 2020).
Recently, von der Embse and colleagues (2019) utilized a tri-factor model with multi-informant data. The tri-factor model has the benefits of both a variable-centered approach (e.g., score aggregation) and a person-centered approach, allowing for the separation of variance from raters, constructs (e.g., internalizing, externalizing concern), and items (Bauer et al., 2013). Results indicated differences among raters based upon demographic characteristics of the ratee and domain (internalizing; von der Embse et al., 2019). However, there was limited evidence to support how differences would influence applied decision-making necessary to inform who is at risk, in what domain, and what treatment is needed (De Los Reyes & Kazdin, 2004). Future research would be required to confirm the trifactor model structure and incorporate the heterogeneity of latent classes (e.g., differences across ratees and constructs).
Kim and von der Embse (2020) extended this work through a trifactor mixture model of multi-informant universal screening. Specifically, the proposed trifactor model was examined across targets and subgroups to determine if different models would be necessary. For example, the researchers examined whether the informant’s perspective varied depending on a specific population. Results indicated changes to the model (and class membership) when predictors of latent factors were incorporated. Thus, the trifactor mixture model could be useful for multi-informant decision-making by (1) delineating dis/agreement on latent factors, including common and informant perspectives, (2) determining whether incongruence between informants is based upon a fundamental difference in how the raters interpret and respond to items, and (3) identifying predictors of dis/concordance. However, additional research is necessary to extend this work by determining how the degree of difference in ratings (i.e., concordance) may be important to identifying risk. In other words, the degree of difference in ratings between student and teacher may indicate either measurement problems or unique perspectives that would inform identification and treatment decisions. Determining predictors and interpretation of dis/concordance is particularly important within the internalizing domain, as rater differences there are often more pronounced (Miller et al., 2014; van der Ende et al., 2012).
Purpose of the Present Study
Increasing mental health concerns have heightened the need for consistent guidance on aggregating potentially disparate data for internalizing problems. Previous studies have focused on inconsistencies among classes of raters (Herman et al., 2018). However, given extant research demonstrating discrepancies in ratings of internalizing behaviors between teachers, parents, and students (De Los Reyes et al., 2015), it is imperative to establish the degree of concordance between raters to inform intervention and data-based decision making. The present study addresses the following research questions: (1) What is the agreement between student and teacher raters on internalizing concerns (i.e., emotional behaviors)? (2) Does agreement on internalizing behaviors change across ages/grades, and does this differentially impact distal outcomes?
Method
Sample
Researchers partnered with FastBridge Learning (FBL; www.fastbridge.org), a national assessment company, to extract a de-identified national sample. Schools with at least 80% of students rated on the Social, Academic, and Emotional Behavior Risk Screener (SAEBRS) were identified for inclusion, such that the tool was used across the school rather than with limited or targeted populations. All data were deemed exempt by the University Institutional Review Board. The final dataset comprised 24,094 student responses, of which less than half (47.0%; n = 11,332) were female. The sample included students across all grade levels from kindergarten through twelfth grade, with an average student age of 11.07 years (SD = 2.61). The largest proportion of students were White (40.3%; n = 9,714), followed by African American (16.7%; n = 4,026) and Hispanic/Latino (6.9%; n = 1,657) students. The remaining students did not report race and/or ethnicity (30.5%; n = 7,335). A small proportion of students (4.6%; n = 1,116) received special education services relative to the national average (15%; National Center for Education Statistics, 2020). The dataset did not include identifying information on state or location; thus, the national representativeness of the sample is unknown.
Measures
Social, Academic, and Emotional Behavior Risk Screener (SAEBRS)
The SAEBRS (Kilgus & von der Embse, 2014), used in the present study, is a universal screening tool that measures risk for social, academic, and emotional behaviors from a Dual Factor Model of Mental Health; items include indicators of problem behaviors as well as prosocial skills. The SAEBRS is a 19-item teacher-completed rating scale, and the mySAEBRS is a 20-item student-completed rating scale (von der Embse et al., 2017); both use a 4-point Likert-type rating. The SAEBRS and mySAEBRS include three behavior subscales, Social (6 items on the SAEBRS, 7 items on the mySAEBRS), Academic (6 items), and Emotional (7 items), as well as a total behavior scale. Items reflect both positive skills and problem behaviors (reverse coded), such that higher scores are indicative of better functioning. The Emotional behavior subscale (item key words listed in Table 1) includes items that measure internalizing behavior problems as well as social-emotional skills. The Academic behavior subscale includes items that reflect academic enabler skills (DiPerna, 2006) as well as behaviors that prevent students from benefiting from a high-quality instructional environment. The Social behavior subscale includes items that assess social skills as well as externalizing behavior problems. Nearly twenty published studies support the psychometric properties of the SAEBRS (e.g., Kilgus et al., 2015, 2016; von der Embse et al., 2016). The emotional behavior (EB) subscale was of primary interest in the present investigation, given the variability in ratings of internalizing concerns across raters. For the present sample, Cronbach’s alpha and composite reliability omega for the EB subscales were 0.61 and 0.63, respectively, for student ratings and 0.83 and 0.84 for teacher ratings.
Reading and Math
Previous research suggests that poor emotional and behavioral functioning can result in poor academic outcomes (Eklund et al., 2017). Adaptive math (aMath) and adaptive reading (aReading) scores were evaluated to measure student math and reading performance. The aMath and aReading measures are computerized adaptive assessments based on item response theory (IRT). The aMath assessment is a test of broad math proficiency for students in kindergarten through 12th grade; the difficulty of items and the proportion of items in each category are based on student performance throughout the assessment. The FastBridge Learning technical manual reports good reliability (0.96), validity (0.66–0.84), sensitivity (0.84–0.95), and specificity (0.80–1.00) for students in first through fifth grade (Christ et al., 2014). The aReading assessment is a test of broad reading proficiency for students in kindergarten through 12th grade, with items assessing phonological and phonemic awareness, vocabulary, and reading comprehension. The technical manual reports strong internal consistency (0.95), strong criterion-related validity (0.71–0.76), and specificity (0.71–0.93) (Christ et al., 2014).
Data Analysis
Data analysis proceeded in two phases. First, a trifactor model was used to examine the degree of agreement between students and teachers in emotional behavior ratings and the predictability of agreement and disagreement on the outcomes (reading and math achievement). For the degree of agreement, we utilized the explained common variance estimated from the trifactor model. To address the second research aim, a multigroup trifactor analysis was employed: findings from prior measurement invariance studies were first confirmed by replicating the trifactor model across grades, and, based on the model results, the predictability of rater agreement on a distal outcome across grades was evaluated. Lastly, polynomial regression and response surface analysis were used as supplementary analyses to assess the relationship between agreement and the distal outcomes.
Trifactor Model
A trifactor model was proposed specifically for multiple informant data to decompose the variance into common variance between informants, specific variances unique to each informant, and item-specific variances (Bauer et al., 2013; von der Embse et al., 2019). The trifactor model utilizes a confirmatory factor model in which three distinct types of factors are specified with the constraints described below. The model specification in this study was based on prior research (von der Embse et al., 2019). The trifactor model for emotional behaviors is illustrated in Fig. 1. The unidimensionality of the observed variables was assumed. All emotional behavior variables from students and teachers loaded on the common factor. Because the common factor (emotional behavior, or EB) represents agreement between informants, the factor loadings were constrained equal between informants for each variable (e.g., the factor loading of item 1 for students = the factor loading of item 1 for teachers). Two unique factors were specified to represent student and teacher unique perspectives on student emotional behaviors. Because the student and teacher unique factors were assumed to be dissimilar, the student perspective factor loadings were allowed to differ from those of the teacher perspective. Finally, although each item was intended to measure the construct of emotional behaviors, there could be unexplained item-specific effects; these are modeled as item-specific factors. Item-specific factor loadings were constrained equal between the two items in each pair (e.g., item 1 of students and item 1 of teachers). It was expected that item-specific factor loadings would be small and close to zero if the measure were well developed. Finally, measurement errors were estimated as usual in confirmatory factor analysis.
Indices to Evaluate Agreement and Disagreement
In the trifactor model, all three types of factors (common, informant unique, and item-specific) are orthogonal. This feature allows researchers to decompose the total variance of the observed variables (student scores and teacher scores in tandem) into three uncorrelated sources and evaluate each factor’s unique contribution. Based on this feature, several indices can be utilized to assess the degree of agreement and disagreement between informants. We adopted statistical indices derived from bifactor models (Reise et al., 2012; Rodriguez et al., 2016). For example, the Explained Common Variance (ECV) is used to estimate the degree of unidimensionality (the degree to which scores represent a single factor rather than multiple factors) in the bifactor model (Reise et al., 2010, 2013; Ten Berge & Sočan, 2004). In this study, ECV is used to evaluate the degree of agreement between informants (common factor variance as agreement variance).
$$ECV=\frac{\sum {\lambda }_{i,EB}^{2}}{\sum {\lambda }_{i,EB}^{2}+\sum {\lambda }_{i,Stu}^{2}+\sum {\lambda }_{i,Tea}^{2}}$$
where \({\lambda }_{i}\) denotes the standardized factor loading of item i for the corresponding factor (EB for the common factor of Emotional Behaviors, Stu for the student unique perspective factor, and Tea for the teacher unique perspective factor). Of note, ECV is a model-based index (Reise et al., 2013; Rodriguez et al., 2016) in that it utilizes the standardized factor loadings estimated from the bifactor or trifactor model; thus, misspecification in the trifactor model will bias the computed index (Rodriguez et al., 2016). Also, ECV is interpretable as agreement only when measurement invariance between informants holds for the common factor, so that the common factor is meaningful as agreement. In addition to ECV, when researchers intend to use composite scores of combined teacher and student ratings, the viability of such composites can be assessed with indices such as omega hierarchical (\({\omega }_{H}\); McDonald, 1999) along with omega for the unique perspective factors (Reise et al., 2013).
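Both indices can be computed directly from standardized loadings. The sketch below uses entirely hypothetical loadings (not the study's estimates) and, consistent with the expectation that item-specific loadings are near zero, omits the item-specific factors from the omega computation:

```python
import numpy as np

def ecv(l_common, l_student, l_teacher):
    """Explained Common Variance: proportion of common variance attributable
    to the common (agreement) factor (Reise et al., 2013)."""
    num = np.sum(l_common ** 2)
    return num / (num + np.sum(l_student ** 2) + np.sum(l_teacher ** 2))

def omega_h(l_common, l_student, l_teacher, errors):
    """Omega hierarchical for the combined student+teacher composite:
    proportion of composite variance due to the common factor
    (McDonald, 1999). Item-specific factors are omitted here, consistent
    with the model's expectation that their loadings are near zero."""
    num = np.sum(l_common) ** 2
    den = (num + np.sum(l_student) ** 2 + np.sum(l_teacher) ** 2
           + np.sum(errors))
    return num / den

# Hypothetical standardized loadings for 7 item pairs (illustrative only).
l_eb  = np.array([.25, .30, .20, .28, .22, .26, .24])  # common factor
l_stu = np.array([.45, .50, .40, .48, .42, .46, .44])  # student unique
l_tea = np.array([.60, .65, .55, .62, .58, .63, .59])  # teacher unique

# The common factor loads on all 14 observed variables (7 per informant).
l_common = np.concatenate([l_eb, l_eb])
# Standardized error variances per item (1 minus the explained variance).
errors = np.concatenate([1 - l_eb**2 - l_stu**2, 1 - l_eb**2 - l_tea**2])

agreement_ecv = ecv(l_common, l_stu, l_tea)
composite_omega_h = omega_h(l_common, l_stu, l_tea, errors)
```

With these illustrative loadings, most common variance sits on the informant unique factors, so both the ECV and the composite's \({\omega }_{H}\) come out low, mirroring the interpretive logic of the indices.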
Trifactor Model with Outcomes
After the degree of dis/agreement was estimated, an outcome variable (reading or math scores) was added to the trifactor model. Specifically, the common factor representing agreement and the two informant unique factors representing disagreement between informants served as predictors of reading and math achievement scaled scores; each outcome was tested separately. The regression coefficients of the common factor and the informant unique factors indicate how strongly emotional behaviors are related to reading (or math) achievement when informants agree about EB and when informants disagree and utilize their unique perspectives, respectively. The predictability of EB under agreement and disagreement was also evaluated with r². Because the three predictors are orthogonal, relative predictability can be evaluated by comparing r² values.
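The reason orthogonality licenses this comparison is that uncorrelated predictors contribute non-overlapping variance, so their individual r² values sum to the model R². A small simulation illustrating the property (all coefficients hypothetical, loosely echoing the reported pattern in which the common and teacher factors predict more strongly than the student factor):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Three uncorrelated standardized predictors, standing in for the common,
# student unique, and teacher unique factor scores.
common, student, teacher = rng.standard_normal((3, n))
# Outcome generated with hypothetical structural coefficients plus noise.
y = 0.23 * common + 0.11 * student + 0.21 * teacher + rng.standard_normal(n)

def r2(x, y):
    """Squared Pearson correlation."""
    return np.corrcoef(x, y)[0, 1] ** 2

# Per-factor contributions from simple (bivariate) regressions.
parts = [r2(f, y) for f in (common, student, teacher)]

# Model R-square from the joint regression on all three factors.
X = np.column_stack([common, student, teacher])
beta, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
total = r2(X @ beta, y)
```

Up to sampling noise, `sum(parts)` matches `total`; with correlated predictors this additivity breaks down, which is why the decomposition is a distinctive advantage of the trifactor parameterization.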
Multigroup Trifactor Model with Outcomes
To address the second research question, the potential variability of dis/agreement on EB across grades and the resulting predictability of the outcome variables by grade, a multigroup trifactor analysis was conducted with grade as the grouping variable. Equality of factor loadings across grades would indicate no difference in EB dis/agreement across grades. Because measurement invariance across grades was tested and established in previous research (von der Embse et al., 2019), measurement invariance is presented only briefly in the results section. The multigroup trifactor model was then extended with reading or math as an outcome of the three factors under measurement invariance across grades. Next, the effects (regression coefficients) of the common factor and the two informant unique factors on reading or math achievement were estimated for each grade. To determine whether these effects differed across grades, a model with the effects constrained equal across grades (assuming no difference) was compared with a model in which the effects were freely estimated. To this end, a likelihood ratio test was conducted, in which statistical significance indicates differences across groups. Information criteria (ICs) such as the AIC, BIC, and sample-size-adjusted BIC were also compared, with smaller values favoring a model, and changes in model fit between models were evaluated (see below for the model fit criteria). Grades 10, 11, and 12 were collapsed into a single group given their relatively small samples.
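Because MLR chi-square values are scaled, nested models cannot be compared by simple chi-square subtraction; the Satorra-Bentler scaled difference is required. A minimal sketch of that computation (the numeric inputs are placeholders, not the study's estimates; the closed-form survival function assumes an even difference in degrees of freedom):

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function for even df (closed-form series)."""
    m = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= m / k
        total += term
    return math.exp(-m) * total

def sb_scaled_diff(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler scaled chi-square difference for nested MLR models.
    Model 0 is the more restrictive (constrained) model; model 1 is the
    relaxed model. t = robust chi-square, df = degrees of freedom,
    c = scaling correction factor reported by the software."""
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)  # difference-test scaling factor
    trd = (t0 * c0 - t1 * c1) / cd            # scaled difference statistic
    ddf = df0 - df1
    return trd, ddf, chi2_sf_even_df(trd, ddf)

# Placeholder values for a constrained-vs-relaxed comparison.
trd, ddf, p = sb_scaled_diff(t0=1500.0, df0=120, c0=1.20,
                             t1=1400.0, df1=100, c1=1.25)
```

A significant scaled difference favors the relaxed model on the likelihood ratio test alone, which is why, as noted above, the ICs and changes in approximate fit indices are weighed alongside it.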
Commonly used fit criteria were applied to evaluate model fit (Hu & Bentler, 1999): chi-square goodness-of-fit p-value ≥ 0.05, comparative fit index (CFI) ≥ 0.90, root mean square error of approximation (RMSEA) ≤ 0.08, and standardized root mean square residual (SRMR) ≤ 0.08. Robust maximum likelihood (MLR) estimation was used. Mplus version 7.4 was used for data analysis, and R 4.0.3 and SPSS version 26 were used for data management, missing data analysis, and descriptive statistics. The response surface plot was generated in Excel using the file available from the Jeffrey Edwards website.
Supplementary Analysis: Polynomial Regression With Response Surface Plot
Although the trifactor model is advantageous for multiple informant data analysis, because the common and informant unique factors are completely separated into orthogonal sources of score variance, it is difficult to evaluate how the degree of agreement (agreeing more or less) predicts the outcome. Polynomial regression (e.g., Edwards & Cable, 2009) was therefore used to model the degree of agreement and disagreement as predictors of the outcome. The polynomial regression model was built with teacher EB scores, student EB scores, and their squared and interaction terms as predictors of reading (and math) achievement, as shown below.
$$Reading={\beta }_{0}+{\beta }_{1}{EB}_{S}+{\beta }_{2}{EB}_{T}+{\beta }_{3}{EB}_{S}^{2}+{\beta }_{4}{EB}_{S}{EB}_{T}+{\beta }_{5}{EB}_{T}^{2}+\epsilon$$
where the subscripts S and T represent students and teachers, respectively, \({\beta }_{0}\) is the intercept, and \({\beta }_{1}\) through \({\beta }_{5}\) are regression coefficients. The response surface plot based on the polynomial regression model allows researchers to visually inspect the relations of agreement and disagreement with the outcome. For example, if the outcome becomes higher as informants agree more and is maximized when the informants completely agree, it is expected that \(\left({\beta }_{3}-{\beta }_{4}+{\beta }_{5}\right)<0\), and that the intercept and slope of the first principal axis of the response surface are 0 and 1, respectively. For details and demonstrations of polynomial regression and response surface analysis, refer to Edwards and Cable (2009). Polynomial regressions were estimated with maximum likelihood, and confidence intervals were created with 5,000 bootstrap draws.
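The polynomial model can be fit with ordinary least squares, and the response surface quantities of interest, the slope along the line of agreement (student rating = teacher rating) and the curvature along the line of disagreement (student rating = negative teacher rating), are simple combinations of the coefficients. A sketch on simulated data (all generating coefficients hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
eb_s = rng.standard_normal(n)  # student EB ratings (centered)
eb_t = rng.standard_normal(n)  # teacher EB ratings (centered)
# Simulated reading outcome with hypothetical generating coefficients.
reading = (0.2 * eb_s + 0.2 * eb_t
           + 0.01 * eb_s**2 - 0.02 * eb_s * eb_t + 0.01 * eb_t**2
           + rng.standard_normal(n))

# Design matrix: intercept, EB_S, EB_T, EB_S^2, EB_S*EB_T, EB_T^2
X = np.column_stack([np.ones(n), eb_s, eb_t,
                     eb_s**2, eb_s * eb_t, eb_t**2])
b = np.linalg.lstsq(X, reading, rcond=None)[0]

agree_slope = b[1] + b[2]           # slope along the agreement line
disagree_curv = b[3] - b[4] + b[5]  # curvature along the disagreement line
```

In practice the coefficient combinations are tested with bootstrapped confidence intervals, as described above, rather than interpreted from point estimates alone.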
Missing Data
It should be noted that there were missing data in the teacher ratings of student emotional behaviors, reading scores, and math scores (missing rates = 40%, 39%, and 31%, respectively). Although the assumption of missing completely at random (MCAR) was rejected based on Little’s MCAR test, χ2(174) = 2121.30, p < .001, the missingness in each variable was essentially uncorrelated with the other study variables (r near zero in most cases; maximum = 0.14). The full information maximum likelihood (FIML) estimation method was used under the assumption of ignorable missingness. However, the supplementary analysis using polynomial regression was conducted with a subsample of complete cases (n = 8,843 for reading) because FIML is not applied to exogenous variables in Mplus.
Results
Descriptive Statistics
Descriptive statistics and correlations between variables are presented in Tables 1, 2 and 3. There was no discernible deviation from normality across variables based on skewness and kurtosis. The correlations among the teacher rating variables were generally higher (0.27 to 0.63) than those among the student rating variables (-0.06 to 0.44). The correlations of teacher variables with reading (0.12 to 0.29) and math (0.11 to 0.29) were also higher than those of student variables (0.02 to 0.16 with reading; 0.02 to 0.12 with math). The correlations of teacher EB composite scores with reading and math were 0.22 and 0.23, respectively; those of student EB composite scores were 0.11 and 0.08, respectively.
The Trifactor Model and the Degree of Agreement
The fit of the trifactor model was overall acceptable: χ2(65) = 2765.221, p < .001; RMSEA = 0.041, 90% CI [0.040, 0.043]; CFI = 0.945; SRMR = 0.051. The standardized factor loadings are presented in Fig. 1. The computed ECV, the degree of agreement, was very low (0.12), indicating that students and teachers have limited agreement on internalizing behaviors. The EB common factor explained only 12% of the common variance; the remaining 88% was explained by the student unique factor (32%) and the teacher unique factor (57%). The proportion of common variance due to the common factor was thus very small compared to the informant unique factors. The EB ratings of students and teachers mostly reflect each informant’s own perspective on EB, and either rating alone may not be sufficient for a complete understanding of student internalizing concerns. If composite scores combining student and teacher ratings were used for emotional behaviors, the composite scores would hardly reflect the common factor (\({\omega }_{H}\) = 0.06 for the common factor).
The Predictability of Agreement and Disagreement on Reading and Math
To assess the predictability of dis/agreement on student reading (or math) scores, the trifactor model was fit with reading as an outcome. That is, the effect of the common factor on reading was estimated for the predictability of agreement, and the effects of the teacher and student unique factors on reading were estimated for the predictability of disagreement. The trifactor model showed reasonable fit with reading as an outcome: χ2(76) = 3146.128, p < .001; RMSEA = 0.041, 90% CI [0.040, 0.042]; CFI = 0.940; SRMR = 0.052; and with math: χ2(76) = 3146.128, p < .001; RMSEA = 0.041, 90% CI [0.040, 0.042]; CFI = 0.940; SRMR = 0.052. The standardized regression coefficients are shown in Table 4.
All three factors were positively associated with reading and math. When students and teachers agreed, higher (i.e., more positive) emotional behaviors were related to higher reading/math performance. Students rated higher on emotional behaviors from the student and teacher unique perspectives also showed higher reading performance. However, the predictability of the student unique factor was lower than those of the common and teacher unique factors. Overall, emotional behaviors explained 11% of the variance in reading and math scores; the common and teacher unique factors explained 5% each, and the student unique factor explained only 1% or less.
The Predictability of Agreement and Disagreement on Reading by Grade
First, the degrees of dis/agreement were evaluated for equality across grades by testing the equality of factor loadings across grades (i.e., metric invariance) in the multigroup trifactor model. As demonstrated in von der Embse et al. (2019), metric invariance was satisfied. The fit of the metric invariance model was acceptable: χ2(1144) = 5403.656, p < .001; RMSEA = 0.045, 90% CI [0.044, 0.046]; CFI = 0.916; SRMR = 0.070. Also, fit did not deteriorate much relative to the configural invariance model, which further supported metric invariance: Δχ2(Δdf = 264) = 738.136, p < .001; ΔRMSEA = -0.003 (indicating the metric invariance model fit even better); ΔCFI = 0.009; ΔSRMR = 0.008. Thus, the degrees of dis/agreement on emotional behaviors were not different across grades. Because von der Embse et al. (2019) found that one EB item intercept was not invariant across grades, that item intercept was allowed to vary across grades (partial scalar invariance model: χ2(1237) = 5911.384, p < .001; RMSEA = 0.045, 90% CI [0.044, 0.046]; CFI = 0.908; SRMR = 0.071), and we proceeded to build the multigroup trifactor models with reading and math as outcomes, respectively. The predictability of dis/agreement on the outcome was compared across grades by comparing a model with the regression coefficients constrained equal and a model with those coefficients freely estimated. The results of these two models are shown in Table 5.
For both reading and math, there was no strong evidence for the relaxed model in which the regression coefficients were allowed to differ across grade levels. Although the Satorra-Bentler scaled chi-square difference was statistically significant (p < .001) and the AIC supported the relaxed model, the other indices favored the constrained model in which all regression coefficients were constrained equal across grades. The AIC is well known for its over-sensitivity to trivial differences when the sample size is sufficiently large (Akaike, 1974). Thus, the constrained model was selected, giving more weight to the BIC and saBIC. In addition, even though 27 parameters were constrained equal across groups, the deterioration of model fit was negligible (no or near-zero changes in CFI, RMSEA, and SRMR). Taken together, the predictability of dis/agreement on reading and math did not differ across grades. In the constrained model, the regression coefficients of the common, student unique, and teacher unique factors were 0.23, 0.11, and 0.21, respectively, for reading, and 0.24, 0.06, and 0.32, respectively, for math across all grades, which confirmed the findings of the single-group trifactor model shown in Table 4.
Supplementary Analysis Using Polynomial Regression
With the subsample with complete data, polynomial regression was utilized with reading (or math) as the outcome and polynomial terms as predictors (student and teacher scores of EB and their quadratic and interaction terms) to assess how the degree of agreement or disagreement was related to reading (or math) performance. Because the results were similar for reading and math, only the reading results are presented. Based on the estimated polynomial regression coefficients presented in Table 6, a response surface plot was created (Fig. 2), and results are interpreted with reference to it. First, as illustrated by the almost flat surface, student reading did not improve as teachers and students agreed more about EB: β3 − β4 + β5 = 0.01, 95% CI [−0.02, 0.03], which was not statistically significant; Slope = 5.14, 95% CI [2.86, 23.70]. Although the trifactor model showed that students and teachers mostly disagreed about EB (very low ECV), when they agreed, EB linearly predicted reading performance, as illustrated in the upper right panel (EB S = EB T) of Fig. 2: the higher the EB, the higher the reading performance (β1 + β2 = 0.37, 95% CI [0.34, 0.40]). When they disagreed (EB S = −EB T), reading performance tended to be curvilinearly lower as students rated EB higher and curvilinearly higher as teachers rated it higher (see the lower right panel). The R-square of the polynomial regression model was 0.08.
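The response surface quantities above derive from a quadratic regression of reading on student (S) and teacher (T) EB ratings, y = b0 + b1·S + b2·T + b3·S² + b4·S·T + b5·T². A sketch of how the slope and curvature along the agreement line (S = T) and disagreement line (S = −T) are formed from the coefficients; the coefficient values below are hypothetical, chosen only so that the two reported combinations (β1 + β2 = 0.37; β3 − β4 + β5 = 0.01) are reproduced, and are not the Table 6 estimates:

```python
def surface(b, s, t):
    """Predicted outcome on the quadratic response surface, with
    s = student EB rating and t = teacher EB rating (both centered)."""
    b0, b1, b2, b3, b4, b5 = b
    return b0 + b1 * s + b2 * t + b3 * s**2 + b4 * s * t + b5 * t**2

def line_tests(b):
    """Slope and curvature along the agreement (s = t) and
    disagreement (s = -t) lines of the surface."""
    _, b1, b2, b3, b4, b5 = b
    return {
        "agree_slope":    b1 + b2,
        "agree_curv":     b3 + b4 + b5,
        "disagree_slope": b1 - b2,
        "disagree_curv":  b3 - b4 + b5,
    }

# Hypothetical coefficients (b0..b5), not the estimated values.
b = (0.0, 0.20, 0.17, 0.03, 0.04, 0.02)
```

Substituting s = t (or s = −t) into the surface equation shows why these particular coefficient combinations summarize the two lines.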
Discussion
The first purpose of this paper was to examine the degree of agreement between informants regarding ratings of internalizing concerns. The ECV statistic resulting from the trifactor model indicated low agreement between student and teacher informants, as indicated by the low proportion of variance due to the common factor (which represented informant agreement within the trifactor model). A more notable proportion of the variance was attributed to the informant unique factors (which represented informant disagreement in the model), suggesting ratings of student internalizing concerns are largely driven by informant perception.
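The ECV statistic referenced here is the share of modeled common variance captured by the agreement (common) factor, computed from squared standardized loadings. A minimal sketch assuming a trifactor model with one common factor and two informant-unique factors (the loading values are illustrative only, not the estimates from this study):

```python
def ecv(common, student, teacher):
    """Explained common variance of the common (agreement) factor:
    the sum of its squared loadings over the sum of squared loadings
    on all three factors."""
    ss = lambda xs: sum(x * x for x in xs)
    total = ss(common) + ss(student) + ss(teacher)
    return ss(common) / total

# Illustrative loadings for a 7-item scale rated by both informants.
common_l  = [0.3] * 7   # agreement (common) factor
student_l = [0.6] * 7   # student-unique factor
teacher_l = [0.6] * 7   # teacher-unique factor
```

With these illustrative values the common factor accounts for only about 11% of the modeled variance, the kind of low ECV that signals ratings driven largely by informant-unique perception.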
Results from this investigation are not conclusive in the context of clinical utility. It could be that both teacher and student ratings are necessary to achieve a complete understanding of a student’s internalizing symptoms. Students offer information regarding their internal states, as well as their perception of the external manifestations of their internalizing concerns. Meanwhile, the teacher perhaps provides a more accurate perception of the student’s external manifestations, grounded in more objective, third-party observations of the student’s behavior. In contrast, it could be that one informant (e.g., the teacher) affords unique but uninformative data regarding student internalizing concerns. However, experience, stress, preconceived notions, and implicit biases influence teacher ratings of student social-emotional and behavioral needs, thus calling into question which rater offers more reliable information (Splett et al., 2018). Unfortunately, the current study does not provide a means of discerning whether informant-unique information is accurate or informative. In addition, some internalizing behaviors (e.g., anxiety) could be positively associated with academic performance. To this end, future research should employ external, objective indicators of student internalizing concerns to determine the extent to which each informant’s perspective contributes to an understanding of those concerns. Of course, the field remains somewhat divided on what such objective indicators should be, as most measures of internalizing concerns are grounded in rater perception, while observation-based tools can miss crucial elements of emotional functioning. This suggests the need for basic research like this study, along with measurement work that informs the creation of accurate and objective tools.
It should also be noted that the measurement invariance findings suggested the pattern of informant dis/agreement was equal across grades. This result is compelling given documented developmental differences in internalizing concerns. Young children are more likely to externalize their emotionality (Leadbeater & Homel, 2015), potentially allowing teachers to better perceive child emotions, thereby promoting greater informant agreement. This agreement might lessen over time as emotions move more inward for older children and adolescents, making them less observable to teachers (Zee & Moritz Rudasill, 2021). Yet, the current findings would suggest that the pattern of dis/agreement remains constant, suggesting students and teachers have unique information to provide across all grades, and thus that both informants should be considered when evaluating internalizing concerns.
The second purpose of this study was to examine whether informant concordance changed across grades and whether this differentially impacted distal outcomes. There were two particularly notable findings in relation to this purpose. First, the measurement invariance findings suggested the pattern of informant dis/agreement was equal across grades. This finding is significant given developmental differences in internalizing concerns. As noted above, young children are more likely than their older peers to externalize their emotionality, potentially allowing teachers to better perceive their internalizing concerns, thus promoting informant agreement. One might then expect the agreement to lessen over time as student emotions move more inward for older children and adolescents. Yet, the current findings would suggest that the pattern of dis/agreement remains constant, suggesting students and teachers have unique information to provide at all ages. Second, multigroup trifactor analyses indicated that the common factor was positively associated with reading and math scores, showing that agreement between teachers and students predicted student academic performance. This was further supported by the polynomial regression, which demonstrated that when teachers and students agreed, emotional behavior positively predicted reading and math performance. Yet, the rater-specific factors showed a slightly different pattern. Though the teacher unique factor was related to math and reading scores (explaining 5% of the variance in these scores), the student unique factor was only weakly related (explaining 1% of the variance in math and reading scores). These findings could suggest a teacher’s unique perceptions of student emotional functioning are in part related to students’ academic performance.
This would align with previous research, which has demonstrated that teachers perceive student frustration and anxiety related to academic tasks (e.g., high-stakes tests; Lyneham et al., 2008), as well as enjoyment and motivation related to academics (Dicke et al., 2012).
Taken together, findings suggested the relationship between emotional behavior and academic outcomes was consistent across grades and could hold implications regarding the nature of emotionality and its relationship with academic functioning. That is, student emotions, whether perceptions of them are shared between informants or unique to one, do not become more or less tied to academics across grades. Of course, there could be exceptions to this rule, with higher-achieving students experiencing greater academic-related emotions over time in accordance with increased academic demands (Suldo & Shaunessy-Dedrick, 2013). Results from the present study offer several potentially important benefits for research and practice. First, given consistency in agreement across grades, students from all grade levels, including young children (kindergarten through third grade), may offer informative ratings of emotional (or internalizing) behaviors. Second, teacher ratings demonstrated a relatively stronger relation to distal academic outcomes than student self-ratings. In contrast to prior suggestions that students should be the preferred informant in later (high school) grades (Romer et al., 2020), results from this study suggest that even middle and high school teachers, who have relatively limited access to students, can offer potentially important information about student emotional functioning. Thus, when school teams are considering how to implement universal screening across various grades and settings, both student and teacher raters could offer important information for decision-making.
Limitations and Future Directions
Certain limitations to this study should be noted, as they present opportunities for future research in this area. First, though the national dataset under examination was quite large, it contained sparse demographic data for students. Though the current findings could be broadly generalizable, it is largely unknown to whom and where the findings generalize. Thus, it is imperative for future research to consider datasets with more complete data, thereby yielding a better understanding of result relevance. Second, the examined outcomes were limited to academic functioning. Future research should consider a broader range of outcomes, including other ecologically valid school-based variables, which are of frequent interest to educators (e.g., office discipline referrals, attendance, nurse visits). Third, we employed a single indicator of internalizing concerns that was founded upon seven items, which were completed by students and teachers. Future research should consider longer and more varied indicators of emotional functioning, while also considering caregiver perspectives. Fourth, the current dataset was cross-sectional, amassed at a single time point. Subsequent studies ought to examine longitudinal datasets, which would permit the examination of the change in patterns of dis/agreement over time, as well as whether changes in emotional functioning and dis/agreement predict academic outcomes.
Conclusion
This study highlighted the expected, relatively low levels of agreement between teacher and student reports of internalizing concerns. However, when raters did agree, higher and more positive ratings of emotional behavior were indicative of later, higher reading and math performance. These relationships held across grade levels. These results highlight the promise of multi-informant decision making to inform intervention supports and ultimately improve student outcomes.
References
Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987). Child/adolescent behavioral and emotional problems: implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101(2), 213–232. https://doi.org/10.1037/0033-2909.101.2.213.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. https://doi.org/10.1109/TAC.1974.1100705.
Anderson, J. K., Ford, T., Soneson, E., Coon, J. T., Humphrey, A., Rogers, M., Moore, D., Jones, P. B., Clarke, E., & Howarth, E. (2019). A systematic review of effectiveness and cost-effectiveness of school-based identification of children and young people at risk of, or currently experiencing mental health difficulties. Psychological Medicine, 49(1), 9–19. https://doi.org/10.1017/S0033291718002490.
Auerbach, E. R., Chafouleas, S. M., & Briesch, A. M. (2019). State-level guidance on school-based screening for social, emotional, and behavioral risk: A follow-up study. School Mental Health, 11(1), 141–147. https://doi.org/10.1007/s12310-018-9278-z.
Bauer, D. J., Howard, A. L., Baldasaro, R. E., Curran, P. J., Hussong, A. M., Chassin, L., & Zucker, R. A. (2013). A trifactor model for integrating ratings across multiple informants. Psychological methods, 18(4), 475–493. https://doi.org/10.1037/a0032475.
Baumgartner, N., Häberling, I., Emery, S., Strumberger, M., Nalani, K., Erb, S., Bachmann, S., Wöckel, L., Müller-Knapp, U., Rhiner, B., Contin-Waldvogel, B., Schmeck, K., Walitza, S., & Berger, G. (2020). When parents and children disagree: informant discrepancies in reports of depressive symptoms in clinical interviews. Journal of Affective Disorders, 272, 223–230. https://doi.org/10.1016/j.jad.2020.04.008.
Bruhn, A. L., Woods-Groves, S., & Huddle, S. (2014). A preliminary investigation of emotional and behavioral screening practices in K-12 schools. Education and Treatment of Children, 37(4), 611–634. https://doi.org/10.1353/etc.2014.0039.
Christ, T. J., Aranas, Y., Johnson, L., Kember, J., Kilgus, S., Kiss, A., McCarthy Trentman, A., Monaghen, B., Nelson, G., Nelson, P., Newell, K., Van Norman, E., White, M., & Windram, H. (2014). Formative Assessment System for Teachers Technical Manual. Minneapolis, MN: FastBridge Learning.
Collins, T. A., Dart, E. H., & Arora, P. G. (2019). Addressing the internalizing behavior of students in schools: Applications of the MTSS model. School Mental Health, 11(2), 191–193.
De Los Reyes, A. (2013). Strategic objectives for improving understanding of informant discrepancies in developmental psychopathology research. Development and Psychopathology, 25(3), 669–682. https://doi.org/10.1017/S0954579413000096.
De Los Reyes, A., Augenstein, T. M., Wang, M., Thomas, S. A., Drabick, D. A. G., Burgers, D. E., & Rabinowitz, J. (2015). The validity of the multi-informant approach to assessing child and adolescent mental health. Psychological Bulletin, 141(4), 858–900.
De Los Reyes, A., & Kazdin, A. E. (2004). Measuring informant discrepancies in clinical child research. Psychological Assessment, 16(3), 330–334. https://doi.org/10.1037/1040-3590.16.3.330.
DiPerna, J. C. (2006). Academic enablers and student achievement: Implications for assessment and intervention services in the schools. Psychology in the Schools, 43(1), 7–17. https://doi.org/10.1002/pits.20125.
Dever, B. V., Dowdy, E., & DiStefano, C. (2018). Examining the stability, accuracy, and predictive validity of behavioral–emotional screening scores across time to inform repeated screening procedures. School Psychology Review, 47(4), 360–371. https://doi.org/10.17105/SPR-2017-0092.V47-4.
Dicke, A. L., Lüdtke, O., Trautwein, U., Nagy, G., & Nagy, N. (2012). Judging students’ achievement goal orientations: are teacher ratings accurate? Learning and Individual Differences, 22(6), 844–849. https://doi.org/10.1016/j.lindif.2012.04.004.
Edwards, J. R., & Cable, D. M. (2009). The value of value congruence. Journal of Applied Psychology, 94(3), 654–677. https://doi.org/10.1037/a0014891.
Eklund, K., Kilgus, S. P., von der Embse, N., Beardmore, M., & Tanner, N. (2017). Use of universal screening scores to predict distal academic and behavioral outcomes: A multi-level approach. Psychological Assessment, 29, 486–499. https://doi.org/10.1037/pas0000355.
Evans, S. W., Simonsen, B., & Dolan, G. (2013). School-level practices. In L. Eber, S. Barrett, & M. Weist (Eds.), Advancing education effectiveness: interconnecting school mental health and school-wide positive behavioral interventions and supports (pp. 86–107). Eugene, OR: University of Oregon, College of Education.
Herman, K. C., Cohen, D., Reinke, W. M., Ostrander, R., Burrell, L., McFarlane, E., & Duggan, A. K. (2018). Using latent profile and transition analyses to understand patterns of informant ratings of child depressive symptoms. Journal of School Psychology, 69, 84–99. https://doi.org/10.1016/j.jsp.2018.05.004.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118.
Jenkins, A., Thoman, S., Wang, Y., von der Embse, N., Kilgus, S., & Reynolds, F. (2021). Identifying type and evaluating stability of behavioral risk. Psychology in the Schools, 58(11), 2225–2237. https://doi.org/10.1002/pits.22587.
Kalberg, J. R., Lane, K. L., Driscoll, S., & Wehby, J. (2011). Systematic screening for emotional and behavioral disorders at the high school level: A formidable and necessary task. Remedial and Special Education, 32(6), 506–520. https://doi.org/10.1177/0741932510362508.
Kilgus, S. P., Bowman, N. A., Christ, T. J., & Taylor, C. N. (2017). Predicting academics via behavior within an elementary sample: An evaluation of the Social, Academic, and Emotional Behavior Risk Screener (SAEBRS). Psychology in the Schools, 54(3), 246–260. https://doi.org/10.1002/pits.21995.
Kilgus, S. P., Eklund, K., von der Embse, N. P., Taylor, C. N., & Sims, W. A. (2016). Psychometric defensibility of the Social, Academic, and Emotional Behavior Risk Screener (SAEBRS) teacher rating scale and multiple gating procedure within elementary and middle school samples. Journal of School Psychology, 58, 21–39. https://doi.org/10.1016/j.jsp.2016.07.001.
Kilgus, S. P., Sims, W. A., von der Embse, N. P., & Riley-Tillman, T. C. (2015). Confirmation of models for interpretation and use of the Social and Academic Behavior Risk Screener (SABRS). School Psychology Quarterly, 30(3), 335–352. https://doi.org/10.1037/spq0000087.
Kilgus, S. P., Sims, W. A., von der Embse, N. P., & Taylor, C. N. (2016). Technical adequacy of the Social, Academic, and Emotional Behavior Risk Screener in an elementary sample. Assessment for Effective Intervention, 42(1), 46–59. https://doi.org/10.1177/1534508415623269.
Kilgus, S. P., & von der Embse, N. P. (2014). Social, Academic, and Emotional Behavior Risk Screener (SAEBRS). Unpublished technical manual.
Kim, E. S., & von der Embse, N. (2020). Combined approach to multi-informant data using latent factors and latent classes: Trifactor mixture model. Educational and Psychological Measurement. https://doi.org/10.1177/0013164420973722.
Leadbeater, B. J., & Homel, J. (2015). Irritable and defiant sub-dimensions of ODD: their stability and prediction of internalizing symptoms and conduct problems from adolescence to young adulthood. Journal of Abnormal Child Psychology, 43(3), 407–421. https://doi.org/10.1007/s10802-014-9908-3.
Lyneham, H. J., Street, A. K., Abbott, M. J., & Rapee, R. M. (2008). Psychometric properties of the school anxiety scale—teacher report (SAS-TR). Journal of Anxiety Disorders, 22(2), 292–300. https://doi.org/10.1016/j.janxdis.2007.02.001.
Lyon, A. R., Whitaker, K., French, W. P., Richardson, L. P., Wasse, J. K., & McCauley, E. (2016). Collaborative care in schools: enhancing integration and impact in youth mental health. Advances in School Mental Health Promotion, 9(3–4), 148–168. https://doi.org/10.1080/1754730X.2016.1215928.
Makol, B. A., Youngstrom, E. A., Racz, S. J., Qasmieh, N., Glenn, L. E., & De Los Reyes, A. (2020). Integrating multiple informants’ reports: How conceptual and measurement models may address long-standing problems in clinical decision-making. Clinical Psychological Science, 8(6), 953–970. https://doi.org/10.1177/2167702620924439.
McDonald, R. P. (1999). Test theory: a unified treatment. Mahwah, NJ: Erlbaum.
Merikangas, K. R., He, J., Burstein, M., Swanson, S. A., Avenevoli, S., Cui, L., Benjet, C., Georgiades, K., & Swendsen, J. (2010). Lifetime prevalence of mental disorders in U.S. adolescents: Results from the National Comorbidity Survey Replication–Adolescent Supplement (NCS-A). Journal of the American Academy of Child & Adolescent Psychiatry, 49(10), 980–989. https://doi.org/10.1016/j.jaac.2010.05.017.
Miller, L. D., Martinez, Y. J., Shumka, E., & Baker, H. (2014). Multiple informant agreement of child, parent, and teacher ratings of child anxiety within community samples. Canadian Journal Of Psychiatry, 59(1), 34–39. https://doi.org/10.1177/070674371405900107.
Mojtabai, R., Olfson, M., & Han, B. (2016). National trends in the prevalence and treatment of depression in adolescents and young adults. Pediatrics, 138(6), e20161878. https://doi.org/10.1542/peds.2016-1878.
Navarro, M. C., Orri, M., Nagin, D., Tremblay, R. E., Oncioiu, S. I., Ahun, M. N., Melchior, M., van der Waerden, J., Galéra, C., & Côté, S. M. (2020). Adolescent internalizing symptoms: the importance of multi-informant assessments in childhood. Journal of Affective Disorders, 266, 702–709. https://doi.org/10.1016/j.jad.2020.01.106.
Perfect, M. M., Turley, M. R., Carlson, J. S., Yohanna, J., & Saint Gilles, M. P. (2016). School-related outcomes of traumatic event exposure and traumatic stress symptoms in students: A systematic review of research from 1990 to 2015. School Mental Health, 8(1), 7–43. https://doi.org/10.1007/s12310-016-9175-2.
Radhakrishnan, L., Leeb, R. T., Bitsko, R. H., Carey, K., Gates, A., Holland, K. M., Hartnett, K. P., Kite-Powell, A., DeVies, J., Smith, A. R., van Santen, K. L., Crossen, S., Sheppard, M., Wotiz, S., Lane, R. I., Njai, R., Johnson, A. G., Winn, A., Kirking, H. L., & Anderson, K. N. (2022). Pediatric emergency department visits associated with mental health conditions before and during the COVID-19 pandemic — United States, January 2019–January 2022. MMWR Morbidity and Mortality Weekly Report, 71(8), 319–324. https://doi.org/10.15585/mmwr.mm7108e2.
Reise, S. P., Moore, T. M., & Haviland, M. G. (2010). Bifactor models and rotations: Exploring the extent to which multidimensional data yield univocal scale scores. Journal of Personality Assessment, 92(6), 544–559. https://doi.org/10.1080/00223891.2010.496477.
Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47(5), 667–696. https://doi.org/10.1080/00273171.2012.715555.
Reise, S. P., Scheines, R., Widaman, K. F., & Haviland, M. G. (2013). Multidimensionality and structural coefficient bias in structural equation modeling: A bifactor perspective. Educational and Psychological Measurement, 73(1), 5–26. https://doi.org/10.1177/0013164412449831.
Rodriguez, A., Reise, S. P., & Haviland, M. G. (2016). Evaluating bifactor models: calculating and interpreting statistical indices. Psychological Methods, 21(2), 137–150. https://doi.org/10.1037/met0000045.
Romer, N., Eklund, K., Kilgus, S., Perales, K., Splett, J. W., Suldo, S., & Wheeler, D. (2020). Best practices in social, emotional, and behavioral screening: An implementation guide (Version 2.0). Retrieved from smhcollaborative.org/universalscreening
Splett, J. W., Smith-Millman, M., Raborn, A., Brann, K., Flaspohler, P., & Maras, M. (2018). Student, teacher, and classroom predictors of between-teacher variance of students’ teacher-rated behavior. School Psychology Quarterly, 33(3), 460–468. https://doi.org/10.1037/spq0000241.
Suldo, S. M., & Shaffer, E. J. (2008). Looking beyond psychopathology: The dual-factor model of mental health in youth. School Psychology Review, 37(1), 52–68. https://doi.org/10.1080/02796015.2008.12087908.
Suldo, S. M., & Shaunessy-Dedrick, E. (2013). The psychosocial functioning of high school students in academically rigorous programs. Psychology in the Schools, 50, 823–843. https://doi.org/10.1002/pits.21708.
Ten Berge, J. M., & Sočan, G. (2004). The greatest lower bound to the reliability of a test and the hypothesis of unidimensionality. Psychometrika, 69(4), 613–625. https://doi.org/10.1007/BF02289858.
Twenge, J. M., Cooper, A. B., Joiner, T. E., Duffy, M. E., & Binau, S. G. (2019). Age, period, and cohort trends in mood disorder indicators and suicide-related outcomes in a nationally representative dataset, 2005–2017. Journal of Abnormal Psychology, 128(3), 185–199. https://doi.org/10.1037/abn0000410.
van der Ende, J., Verhulst, F. C., & Tiemeier, H. (2012). Agreement of informants on emotional and behavioral problems from childhood to adulthood. Psychological Assessment, 24(2), 293–300. https://doi.org/10.1037/a0025500.
von der Embse, N., Eklund, K., & Kilgus, S. (2022). Conducting behavioral and social-emotional assessments in MTSS: Screen to intervene. Routledge. https://doi.org/10.4324/9780429355790.
von der Embse, N. P., Kilgus, S. P., Iaccarino, S., & Levi-Nielsen, S. (2017). Screening for student mental health risk: Diagnostic accuracy, measurement invariance, and predictive validity of the Social, Academic, and Emotional Behavior Risk Screener–Student Rating Scale (SAEBRS-SRS). School Mental Health, 9, 273–283. https://doi.org/10.1007/s12310-017-9214-7.
von der Embse, N., Kim, E., Jenkins, A., Sanchez, A., Kilgus, S. P., & Eklund, K. (2021). Profiles of rater dis/agreement within universal screening in predicting distal outcomes. Journal of Psychopathology and Behavioral Assessment, 43(3), 632–645. https://doi.org/10.1007/s10862-021-09869-0.
von der Embse, N., Kim, E. S., Kilgus, S., Dedrick, R., & Sanchez, A. (2019). Multi-informant universal screening: evaluation of rater, item, and construct variance using a trifactor model. Journal of School Psychology, 77, 52–66. https://doi.org/10.1016/j.jsp.2019.09.005.
von der Embse, N. P., Pendergast, L. L., Kilgus, S. P., & Eklund, K. R. (2016). Evaluating the applied use of a mental health screener: Structural validity of the Social, Academic, and Emotional Behavior Risk Screener. Psychological Assessment, 28(10), 1265–1275. https://doi.org/10.1037/pas0000253.
Weir, K. (2020, September). Safeguarding student mental health. Monitor on Psychology, 51(6). http://www.apa.org/monitor/2020/09/safeguarding-mental-health
Youngstrom, E., Loeber, R., & Stouthamer-Loeber, M. (2000). Patterns and correlates of agreement between parent, teacher, and male adolescent ratings of externalizing and internalizing problems. Journal of Consulting and Clinical Psychology, 68(6), 1038–1050. https://doi.org/10.1037/0022-006X.68.6.1038.
Zee, M., & Moritz Rudasill, K. (2021). Catching sight of children with internalizing symptoms in upper elementary classrooms. Journal of School Psychology, 87, 1–17. https://doi.org/10.1016/j.jsp.2021.05.002.
Acknowledgements
None.
Funding
The authors declare no funding received for this manuscript or study.
Author information
Contributions
Dr. Nate von der Embse designed the study and coordinated the writing of the manuscript. Dr. Eunsook Kim conducted the analyses. Thomas Koza and Dorie Ross edited the manuscript and wrote the reference section. Dr. Steve Kilgus drafted the discussion section. All authors contributed to and have approved the final manuscript.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
von der Embse, N., Kim, E., Ross, D. et al. Multi-informant Assessment of Internalizing Concerns: Rater Concordance and Implications for Decision-Making. J Psychopathol Behav Assess 45, 234–246 (2023). https://doi.org/10.1007/s10862-023-10026-y