Executive functioning (EF) is a multi-component set of cognitive processes that encompasses the planning and execution of goal-directed behavior over time (Best et al., 2009). EF is strongly linked with academic functioning in children over time (e.g., Best et al., 2011; Clark et al., 2010), both in students with and without specific learning disabilities (LD; Best et al., 2009) and in students with and without ADHD (Rinsky & Hinshaw, 2011). Organization, time management, and planning (OTMP) skills are behavioral manifestations of EF that are highly related to academic functioning (Barkley, 1997; Harrier & DeOrnellas, 2005). OTMP challenges may manifest in numerous ways at school, such as forgetting assignments, misplacing books or supplies, and losing track of time while completing a task. OTMP challenges are widespread among students, regardless of diagnostic presentation. For example, there is evidence that approximately 50% of students with ADHD and 30% of students with LD have OTMP deficits (Best et al., 2009), and as many as 15% of students with no identified disabilities have OTMP problems (Abikoff & Gallagher, 2009).

Impairment related to OTMP skills deficits begins in elementary school and often worsens as children proceed through school (Abikoff & Gallagher, 2009; Power et al., 2006). OTMP deficits also affect academic functioning in later grades: OTMP deficits in the middle grades predict lower GPA in high school (Langberg et al., 2011). Given the prevalence of OTMP challenges and the link between organizational challenges and academic performance, it is important to intervene to remediate these deficits and improve child outcomes. Remediation is also warranted because OTMP difficulties contribute to child-parent conflict (Abikoff et al., 2013).

A number of programs have been shown to be successful in remediating OTMP challenges (Bikic et al., 2017), including Organizational Skills Training (OST; Abikoff et al., 2013), Homework, Organization, and Planning Skills (HOPS; Langberg et al., 2018), and the Challenging Horizons Program (CHP; DuPaul et al., 2021; Evans et al., 2016). Evaluations of these programs have demonstrated reductions in OTMP deficits, homework problems, and in some cases academic underachievement among elementary, middle school, and high school students, suggesting that OTMP skills are responsive to intervention and an appropriate target for skills training interventions. To identify students in need of interventions for OTMP deficits and quantify the success of interventions in reducing these problems, it is important to have psychometrically sound tools to accurately assess and monitor OTMP skills.

Assessment of Executive Functioning and OTMP Skills

EF is commonly measured via performance-based tests and informant interviews. Performance-based tests (e.g., Wisconsin Card Sort Task, Stroop Test, Tower Test; for review and meta-analysis, see Toplak et al., 2013) measure EF skills, such as task switching, inhibition, planning, or working memory, using neurocognitive tasks administered under timed, standardized conditions. Although these performance-based measures may be useful in identifying deficits in neurocognitive processing, a criticism of these tests is that they lack ecological validity by failing to reflect impairment in daily life (Barkley & Fisher, 2011; Kamradt et al., 2014). EF may be measured by interviews with parents or teachers, but these methods generally are not standardized or norm-referenced.

EF can also be measured through behavior rating scales that rely on informant report. Multiple rating scales are available to assess EF. For example, the Barkley Deficits in Executive Function Scale—Children and Adolescents (BDEFS-CA; Barkley, 2012) is a comprehensive parent-report measure of EF for children and adolescents aged 6–17. The BDEFS-CA has five subscales: Self-Management to Time, Self-Organization/Problem Solving, Self-Restraint, Self-Motivation, and Self-Regulation of Emotion. The Self-Management to Time and Self-Organization/Problem Solving subscales, in particular, assess some aspects of OTMP skills, but no subscale on this form assesses organization of materials. In addition, the Behavior Rating Inventory of Executive Function, Second Edition (BRIEF-2; Gioia et al., 2000) is a commonly used measure of EF with parent, teacher, and child self-report rating forms, which assesses behavioral, emotional, and cognitive regulation. Several subscales of the BRIEF-2 assess OTMP skills, including Plan/Organize, a 10-item scale on both the parent and teacher forms, and Organization of Materials, a 5-item scale included only on the teacher form. None of the subscales measures time management skills.

Another measure, the Adolescent Academic Problems Checklist (AAPC; Sibley et al., 2014), was developed for students in middle and high school (ages 11–17) based on the concerns of parents and teachers of adolescents with ADHD. A factor analysis of the AAPC identified an academic skills factor and a behavior problems factor. Although many of the items on the academic skills factor pertain to OTMP skills, the AAPC was not designed as a comprehensive OTMP skills assessment measure. Moreover, because the AAPC was developed for adolescents with ADHD, it is not known how it would perform with younger children and those without ADHD. Finally, measures of homework performance, such as the Homework Problem Checklist (HPC; Anesko et al., 1987) and the Homework Performance Questionnaire (HPQ; Power et al., 2007), include items related to OTMP skill deficits, but these measures were designed primarily to examine functional outcomes of these skill deficits rather than to assess the deficits directly.

The Children’s Organizational Skills Scale (COSS; Abikoff & Gallagher, 2009) is a multi-informant (i.e., parent, teacher, child) measure of OTMP skills developed for children ages 8–13. The COSS is unique in its focus on assessing an extensive range of OTMP skills as manifested in real-world conditions at school and home. Due to the importance of OTMP skills for academic success and the broad range of OTMP skills assessed by the COSS, this measure has potential benefits for identifying students with OTMP skills deficits as young as age 8 and evaluating the effectiveness of interventions designed to improve these skills. Indeed, the COSS has been demonstrated in numerous studies to be sensitive to the effects of interventions targeting OTMP deficits (e.g., Abikoff et al., 2013; Evans et al., 2016; Langberg et al., 2018).

Research on the COSS is limited. The technical manual (Abikoff & Gallagher, 2009) describes initial research on the factor structure. Based on a series of exploratory and confirmatory factor analyses (EFAs, CFAs), the test developers identified a three-factor model consisting of 26 items for the COSS-P and 28 items for the COSS-T. During the confirmatory factor analytic stage of COSS development, parceling, or creating composites of items, was used to account for items expected to share variance. Parceling may improve model fit by reducing multicollinearity and the skewness/kurtosis of individual items, as well as by improving internal consistency (Plummer, 2000). However, parceling can inflate fit statistics when important assumptions have not been met. To our knowledge, only one additional study of the COSS has been published (Molitor et al., 2017). That study examined the factor structure of the 26 COSS-P items identified by the test developers in a sample of middle school students with ADHD (N = 619) and found that a bifactor solution, that is, a general factor and three subfactors, provided acceptable model fit. Of note, a well-fitting factor solution was derived without resorting to the parceling of items.

Although the study by Molitor et al. (2017) represents a clear step forward in understanding the factor structure of the COSS, it has several limitations worth noting. Molitor and colleagues examined only the parent-report version of the COSS in a middle school sample; it is not known whether the bifactor model they identified would apply to the teacher-report version or to elementary students. Additionally, this study examined only students diagnosed with ADHD. This is a significant limitation because many children without ADHD have OTMP challenges (Best et al., 2009) and might benefit from an intervention to improve OTMP skills. As such, there is a need to know how the COSS functions in students with OTMP challenges who may not be diagnosed with ADHD or classified as special education students.

Purpose of Investigation

The purpose of this investigation was to evaluate the psychometric properties of the COSS-Parent Form (COSS-P) and COSS-Teacher Form (COSS-T) in two studies. The first study was designed to evaluate the factor structure and discriminant validity of the parent and teacher versions of the COSS using a large sample of students in grades 2 to 8 derived from a non-clinical population and a clinical population. In addition, Study 1 further examined sex differences in COSS scores, which were identified in prior research (Abikoff & Gallagher, 2009). The second study examined the factor structure and convergent validity of both versions of the COSS in a referred sample of children in grades 3 to 5 identified as having OTMP deficits by teacher nomination.

Study 1

Method

Study 1 had three aims. Aim 1 focused on exploring the factor structure of the parent and teacher versions of the COSS; Aim 2 sought to confirm the best-fitting model among options identified in Aim 1; and Aim 3 investigated the discriminant validity of the COSS, specifically with regard to its ability to differentiate cases with ADHD from a comparison group. Separate, equal subgroups of the sample were used to examine Aims 1, 2, and 3. This study was not preregistered.

Participants

Data sets used in this study were derived from the separate non-clinical and clinical samples that were used to standardize and examine the validity of the COSS (Abikoff & Gallagher, 2009); the sample included 1,139 children rated on the COSS-T and 1,155 children rated on the COSS-P. The non-clinical samples included 66% (n = 762) of the parent-rated students and 77% (n = 877) of the teacher-rated students. The data sets did not indicate whether teachers and parents rated more than one child. Data were collected between April 2004 and May 2007 across 19 different states in the U.S. and 2 provinces of Canada (Alberta, Ontario). The non-clinical sample of children was collected to resemble the U.S. population according to 2000 U.S. Census Bureau data with regard to race and ethnicity. The total sample included a relatively equal distribution of children across three age groups (8–9, 10–11, 12–13 years). Mean age was 10.5 (SD = 1.7) in the parent-rated sample and 10.8 (SD = 1.7) in the teacher-rated sample. In the parent sample, 81% of students lived in the U.S. and 19% were from Canada. Of those living in the U.S., 35% were from the Northeast, 39% from the South, 16% from the Midwest, and 10% from the West. In the teacher sample, 73% of students lived in the U.S. and 23% were from Canada. Of those living in the U.S., 43% were from the Northeast, 35% were from the South, 15% were from the Midwest, and 7% were from the West. Demographic characteristics of the combined non-clinical and clinical samples rated by parents and teachers are reported in Table 1. Only parents who were able to complete measures in English were included in the samples. As indicated, males were somewhat overrepresented in the parent sample (56.1%) but less so in the teacher sample (51.9%); 29% of the parent sample and 16% of the teacher sample reported diagnoses of ADHD made by a qualified professional, confirmed by record review. Race and ethnicity were not reported independently; respondents could select only one race category, which could include a multi-racial option.

Table 1 Demographic characteristics of non-clinical and clinical sample (study 1)

In the clinical sample rated by parents, the mean age was 10.6 (SD = 1.7) and the mean grade level was 5.2 (SD = 1.8); 27% were female and 73% were male. In this sample, 81% had ADHD alone; 6% had ADHD and one or more other diagnoses; and 13% had non-ADHD disorders. In the clinical sample rated by teachers, the mean age was 10.8 (SD = 1.7) and the mean grade level was 5.0 (SD = 2.4); 53.2% were female and 46.8% were male. In this sample, 65% had ADHD alone; 6% had ADHD and one or more other conditions; 26% had non-ADHD diagnoses; and 3% were unspecified.

Measures

COSS—Parent and Teacher versions: The COSS (Abikoff & Gallagher, 2009) includes parent and teacher versions, as well as a child-report version, and is used to assess OTMP functioning at school and home. The 66-item COSS-P and 42-item COSS-T were administered to informants in this study. In addition to items assessing OTMP skills, each version includes interference items (8 for COSS-P, 4 for COSS-T) that assess the extent to which difficulties with OTMP interfere with a child’s functioning. Items are rated using a 4-point Likert scale (1 = Hardly ever or never to 4 = Just about all of the time). The technical manual of the COSS (Abikoff & Gallagher, 2009) indicates that a series of factor analyses identified a three-factor solution consisting of 26 items on the COSS-P and 28 items on the COSS-T. The three factors are similar across the two versions, although item content varies. The factors include: Task Planning (i.e., ability to meet deadlines and outline steps to complete tasks); Organized Actions (i.e., competent use of aids, such as calendars and assignment records, to promote organization); and Memory/Materials Management (i.e., tracking assignments, recalling due dates, and managing supplies).

Data Analytic Plan

The combined non-clinical and clinical samples of teacher and parent ratings were randomly divided into three equal and independent subsamples using SPSS v. 25.0. Descriptive analyses were conducted across age, grade, and assigned sex to confirm equivalency across subsamples and to identify non-normality in the data. Skewness and kurtosis values for each item on the parent and teacher forms fell within acceptable limits of ±2 and ±7, respectively (Kim, 2013).
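The random split and normality screening were conducted in SPSS; the following Python sketch illustrates the same two steps under stated assumptions (the data frame, the list of item columns, and the seed are hypothetical, not the authors' syntax).

```python
# Illustrative sketch of (a) a random three-way split of the sample and
# (b) screening item-level skewness/kurtosis against the |2| and |7| limits.
import pandas as pd
from scipy import stats

def split_three(df: pd.DataFrame, seed: int = 2024) -> list:
    """Randomly divide a sample into three roughly equal, independent subsamples."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    k = len(shuffled) // 3
    return [shuffled.iloc[:k], shuffled.iloc[k:2 * k], shuffled.iloc[2 * k:]]

def screen_items(df: pd.DataFrame, item_cols: list) -> pd.DataFrame:
    """Flag items whose skewness exceeds |2| or excess kurtosis exceeds |7| (Kim, 2013)."""
    summary = pd.DataFrame({
        "skew": df[item_cols].apply(stats.skew),
        "kurtosis": df[item_cols].apply(stats.kurtosis),  # Fisher (excess) kurtosis
    })
    summary["acceptable"] = (summary["skew"].abs() <= 2) & (summary["kurtosis"].abs() <= 7)
    return summary
```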

Aim 1: Exploratory Factor Analysis (EFA). Using one of the subsamples from the combined non-clinical and clinical COSS-T and COSS-P samples (n = 385 and n = 383, respectively), initial EFAs were conducted in SPSS v. 25.0 and MPlus v. 8.0 (Muthén & Muthén, 2017). Initial EFAs were conducted with all of the COSS items assessing OTMP skills. Principal axis factoring with promax rotation in SPSS was selected instead of principal components analysis to better identify latent structures (Briggs & MacCallum, 2003). Velicer’s minimum average partial (MAP) test (Velicer, 1976), parallel analysis (Horn, 1965), scree plots, and eigenvalues greater than 1 were used to determine the number of factors for rotation. To explore bifactor solutions, additional EFAs were conducted in MPlus using an oblique bifactor rotation (BI-GEOMIN; Muthén & Muthén, 2017) with the weighted least squares mean- and variance-adjusted (WLSMV) estimator (Beauducel & Herzberg, 2006). Factors composed of three or more items and with Cronbach’s alpha greater than 0.70 were retained, and item loadings were interpreted if pattern coefficients were greater than 0.40.
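Parallel analysis retains only those factors whose observed eigenvalues exceed the eigenvalues expected from random data of the same dimensions. The sketch below is a minimal, generic implementation in Python (not the SPSS/MPlus routines used in the study), assuming a complete item-level data frame named items; the number of simulations and the 95th-percentile criterion are common defaults rather than values taken from the study.

```python
# Minimal sketch of Horn's parallel analysis for deciding how many factors to retain.
import numpy as np
import pandas as pd

def parallel_analysis(items: pd.DataFrame, n_sims: int = 1000,
                      percentile: int = 95, seed: int = 1) -> int:
    """Return the number of factors whose eigenvalues exceed random-data eigenvalues."""
    n, p = items.shape
    obs_eigs = np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))[::-1]  # observed, descending
    rng = np.random.default_rng(seed)
    sim_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        random_data = rng.normal(size=(n, p))                              # random data, same shape
        sim_eigs[i] = np.linalg.eigvalsh(np.corrcoef(random_data, rowvar=False))[::-1]
    threshold = np.percentile(sim_eigs, percentile, axis=0)                # e.g., 95th percentile
    return int(np.sum(obs_eigs > threshold))
```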

Because a goal of this study was to examine a shorter form of the COSS-P and COSS-T, item reduction techniques were implemented. Items were removed from analyses based on (a) low pattern coefficients, (b) low communalities, (c) poor reliability, and (d) theory and prior research. Findings from these preliminary analyses essentially mirrored those described in the technical manual (Abikoff & Gallagher, 2009). Therefore, additional exploratory factor analyses of the COSS-P were conducted with the 26 items from the COSS-P identified in initial factor analytic work (Abikoff & Gallagher, 2009) and confirmed in a more recent study by Molitor and colleagues (Molitor et al., 2017). Additional exploratory factor analyses of the COSS-T followed a similar plan by examining the 28 items from the COSS-T identified in initial factor analytic work (Abikoff & Gallagher, 2009).
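As a hedged illustration of the item-reduction screens listed above, the sketch below computes Cronbach's alpha for a candidate factor and flags items with low pattern coefficients or communalities. The 0.40 loading cutoff comes from the text; the 0.30 communality cutoff and all variable names are assumptions introduced for illustration.

```python
# Sketch of item-reduction screens: internal consistency plus loading/communality cutoffs.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of items (one column per item)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def flag_weak_items(loadings: pd.Series, communalities: pd.Series,
                    min_loading: float = 0.40, min_communality: float = 0.30) -> list:
    """Return item names falling below the loading or communality cutoffs."""
    weak = (loadings.abs() < min_loading) | (communalities < min_communality)
    return list(loadings.index[weak])
```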

Aim 2: Confirmatory Factor Analysis (CFA). CFAs were conducted in MPlus (Muthén & Muthén, 2017) with a second independent subsample derived from the combined total sample of parents (n = 385) and teachers (n = 380) to test models identified in the exploratory factor analytic stage using 26 items from the COSS-P and 28 items from the COSS-T. Model fit was determined a priori based on established criteria: RMSEA < 0.08; CFI and TLI > 0.90 (Chen et al., 2008; Hu & Bentler, 1999; Kline, 2010). The analyses used the weighted least squares mean- and variance-adjusted (WLSMV) estimator because of its utility with ordinal data (Beauducel & Herzberg, 2006; Bowen & Masa, 2015). Design-driven correlated residuals were allowed for items with highly similar wording (Cole et al., 2007). The loading of one item on each factor was fixed to 1.0 to set the factor metric; all other parameters were freely estimated. In addition, omega and omega hierarchical coefficients were computed to determine the internal reliability of factors in a multi-dimensional model (Watkins, 2017). An omega coefficient > 0.70 generally indicates an acceptable level of factor reliability, although this statistic does not differentiate the precision of the total factor score from that of the subfactors. Omega hierarchical coefficients distinguish the reliable variance of the total factor from that of the subfactors; omega hierarchical coefficients greater than 0.50 indicate an acceptable degree of reliability, and values greater than 0.75 are preferred (Reise, 2012; Watkins, 2017). Relatively low omega hierarchical values (< 0.50) for subfactors compared to the general factor preclude the meaningful interpretation of the subfactors as separate, unambiguous indicators of functioning (Watkins, 2017). Although both omega and omega hierarchical coefficients are reported in this study, greater emphasis is placed on interpreting omega hierarchical values.
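For readers less familiar with these indices, the sketch below shows how omega, omega hierarchical, and explained common variance can be computed from a standardized bifactor loading matrix using the standard formulas (e.g., Rodriguez et al., 2016). It is illustrative only and does not reproduce the authors' MPlus output; subscale omega hierarchical values are obtained analogously by restricting the sums to the items of a given subfactor.

```python
# Sketch: reliability and explained common variance from standardized bifactor loadings.
# loadings has one row per item and columns [general, subfactor_1, ..., subfactor_k].
import numpy as np

def bifactor_reliability(loadings: np.ndarray) -> dict:
    general = loadings[:, 0]
    subs = loadings[:, 1:]
    error = 1.0 - (loadings ** 2).sum(axis=1)            # unique variance per item (standardized)
    common = general.sum() ** 2 + (subs.sum(axis=0) ** 2).sum()
    total_var = common + error.sum()
    return {
        "omega_total": common / total_var,               # reliability of the total score
        "omega_hierarchical": general.sum() ** 2 / total_var,  # reliable variance due to g only
        "ecv": (general ** 2).sum() / (loadings ** 2).sum(),   # share of common variance due to g
    }
```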

Aim 3: Discriminant Validity and Subgroup Analysis. The third participant subsample (COSS-T n = 380; COSS-P n = 385) derived from the total non-clinical and clinical sample was used to investigate the relationship between COSS scores and the child’s diagnostic status. Receiver operating characteristic (ROC) curve analyses evaluated the ability of the COSS total score to discriminate between children in the clinical sample who met criteria for ADHD (n = 110 for the parent sample; n = 61 for the teacher sample) and those in the comparison group, which consisted primarily of children in the non-clinical sample (n = 272 for parents; n = 313 for teachers) along with a limited number of children in the clinical sample who did not have ADHD. The area under the curve (AUC) is commonly used to quantify how accurately a measure discriminates between clinical and non-clinical cases. The coordinates of the curve are generated as a function of the measure’s sensitivity and 1 − specificity.
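The ROC logic described here can be illustrated with a short scikit-learn sketch; the study's analyses were conducted with standard statistical software, and the column names (coss_total, adhd) are hypothetical.

```python
# Sketch of an ROC analysis: AUC plus the sensitivity / 1 - specificity coordinates.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def coss_discrimination(coss_total: np.ndarray, adhd: np.ndarray) -> dict:
    """adhd is coded 1 for ADHD cases and 0 for comparison cases."""
    auc = roc_auc_score(adhd, coss_total)               # higher COSS scores = more OTMP problems
    fpr, tpr, thresholds = roc_curve(adhd, coss_total)  # fpr = 1 - specificity, tpr = sensitivity
    return {"auc": auc, "sensitivity": tpr, "one_minus_specificity": fpr, "thresholds": thresholds}
```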

Additionally, scores on salient factors of the COSS were evaluated using ANOVA to determine if there were differences on these factors as a function of child sex and grade level (grades 2–5 vs. grades 6–8). Grade level was examined categorically to determine potential differences between students at the elementary vs. middle school level. Analyses were conducted initially for children in the non-clinical samples (n = 726 in the parent sample; n = 779 in the teacher sample) to estimate likely values in the general population. The analyses were repeated for children in the clinical samples (n = 399 in parent sample; n = 360 in teacher sample). Finally, to evaluate the relationship between parent and teacher ratings of OTMP skills, bivariate correlations between COSS-P and COSS-T scores on the general factor in both the non-clinical and clinical samples were calculated.
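A minimal sketch of the 2 (sex) by 2 (grade band) ANOVA with partial eta-squared is shown below, using statsmodels; the data frame and the column names (coss_general, sex, grade_band) are assumptions for illustration rather than the authors' code.

```python
# Sketch of a two-way ANOVA (sex x grade band) with partial eta-squared effect sizes.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def sex_by_grade_anova(df: pd.DataFrame, outcome: str = "coss_general") -> pd.DataFrame:
    model = smf.ols(f"{outcome} ~ C(sex) * C(grade_band)", data=df).fit()
    table = anova_lm(model, typ=2)                      # Type II sums of squares
    ss_error = table.loc["Residual", "sum_sq"]
    # Partial eta-squared = SS_effect / (SS_effect + SS_error); ignore the Residual row.
    table["partial_eta_sq"] = table["sum_sq"] / (table["sum_sq"] + ss_error)
    return table
```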

Results

Aim 1: Exploration of the COSS Factor Structure

COSS-P: The results of Bartlett’s Test of Sphericity (Bartlett, 1951) indicated the correlation matrix was not random (p < 0.001), and the Kaiser–Meyer–Olkin statistic was 0.98, well above the minimum needed for factor analysis. Therefore, the data were deemed appropriate for factor analysis. Exploratory analyses suggested that a 4- or 3-factor solution was superior to solutions extracting more factors. However, factor loadings indicated that items loaded on multiple factors, suggesting the presence of a general factor. As such, a bifactor solution was examined using MPlus, which allowed items to load on a general factor, incorporating all of the shared variance of the construct, as well as on subfactors.

Examination of bifactor solutions yielded two potentially acceptable models with a general factor and three or four subfactors. Subsequent analyses revealed that the solution with four subfactors was not acceptable because the alpha coefficient for the fourth factor was 0.43, well below the recommended minimal level of 0.70 (Brown, 2002). As such, the bifactor model with a general factor and three subfactors appeared to be an acceptable solution, and this model was examined in relation to a standard three-factor model and one-factor model in subsequent confirmatory analyses. (See Supplementary Table 1 for the results of the EFA of COSS-P items using a bifactor model with three subfactors.)

COSS-T: The results of Bartlett’s Test of Sphericity (Bartlett, 1951) indicated the correlation matrix was not random (p < 0.001), and the Kaiser–Meyer–Olkin statistic was 0.96, well above the minimum needed for factor analysis. Therefore, the data were deemed appropriate for factor analysis. As with the parent form, items demonstrated considerable cross-loadings on multiple factors, indicating a large amount of shared variance across factors and suggesting that a bifactor solution may provide a superior fit. Exploratory factor analysis using the BI-GEOMIN rotation (Muthén & Muthén, 2017) indicated that a general factor with three subfactors may exhibit good model fit and should be tested at the CFA stage. As with the COSS-P, the alternative models tested at the CFA stage included a standard three-factor solution and a one-factor solution in addition to a bifactor model with a general factor and three subfactors. (See Supplementary Table 2 for the results of the EFA of COSS-T items using a bifactor model with three subfactors.)
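For reference, the factorability checks reported above (Bartlett's test of sphericity and the overall Kaiser–Meyer–Olkin statistic) can be computed directly from their textbook formulas; the sketch below is illustrative and is not the SPSS procedure used in the study.

```python
# Sketch of Bartlett's test of sphericity and the overall KMO statistic.
import numpy as np
from scipy import stats

def bartlett_sphericity(items: np.ndarray) -> tuple:
    """Chi-square test that the correlation matrix is an identity matrix."""
    n, p = items.shape
    R = np.corrcoef(items, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)

def kmo_overall(items: np.ndarray) -> float:
    """Squared correlations relative to squared correlations plus squared partial correlations."""
    R = np.corrcoef(items, rowvar=False)
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    partial = -inv_R / d                                  # matrix of partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)                 # off-diagonal mask
    return (R[off] ** 2).sum() / ((R[off] ** 2).sum() + (partial[off] ** 2).sum())
```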

Aim 2: Confirmation of Best-Fitting Model

Using a second subsample of the combined non-clinical and clinical sample for parents and teachers (n = 385 and 380, respectively), model testing for the parent and teacher forms was conducted. For each form, three models were tested: a bifactor model with a general factor and three subfactors, a one-factor model, and a three-factor model. Items proposed for each factor were based on items identified by the test developers (Abikoff & Gallagher, 2009) and investigated by Molitor et al. (2017). Results are presented in Table 2.

Table 2 Fit statistics for confirmatory factor analyses of the COSS-P and COSS-T

COSS-P: Findings revealed that, for the COSS-P, the superior solution was the bifactor model with three subfactors (RMSEA = 0.057 [0.051–0.061]; CFI = 0.978; TLI = 0.974). The explained common variance (Rodriguez et al., 2016) for the general factor was 0.72, indicating the general factor accounted for over 70% of the common variance of the measure. Explained common variances were 0.11, 0.13, and 0.04 for subfactors 1–3, respectively. Omega coefficients for the general factor and subfactors all exceeded 0.85. The omega hierarchical coefficient was high for the general factor (0.87) and acceptable for subfactor 1 (0.57), but low for subfactors 2 and 3 (0.27 and 0.05, respectively), indicating the weakness of these latter two factors. In addition, the pattern coefficients on subfactor 3 generally were low, confirming the weakness of this subfactor. The first subfactor included items related to Task Planning; the second subfactor included items related to Organized Actions; and the third subfactor comprised items related to Memory/Materials Management. See Table 3 for pattern coefficients for the bifactor model.

Table 3 Pattern coefficients for the COSS-P bifactor model derived from confirmatory factor analysis in study 1

COSS-T: The best-fitting model was the bifactor solution with three subfactors, which met criteria for acceptability with an RMSEA = 0.052 [0.047–0.058], CFI = 0.988, and TLI = 0.991. The explained common variance for the general factor was 0.77, and explained common variances for subfactors 1–3 were 0.11, 0.02, and 0.10, respectively. Omega coefficients for the general factor and subfactors all exceeded 0.85. The omega hierarchical coefficient was high for the general factor (0.90) and subfactor 1 (0.78) but low for subfactors 2 and 3 (0.05, 0.19, respectively), indicating the weakness of these latter two factors. In addition, pattern coefficients on subfactor 2 were generally low, confirming the weakness of this factor. The subfactors of the COSS-T also included items pertaining to Task Planning, Organized Actions, and Memory/Materials Management, respectively. See Table 4 for pattern coefficients for the bifactor model.

Table 4 Pattern coefficients for the COSS-T bifactor model derived from confirmatory factor analysis in study 1

Aim 3: Ability of COSS to Differentiate Children with ADHD from Comparison Group

Using the third subsample derived from the combined non-clinical and clinical sample, ROC analyses evaluated the ability of the total score of the COSS to discriminate between children with ADHD and those in the comparison group. The AUC values for the COSS-P and COSS-T were significantly different from the null AUC (0.50) for all of the comparisons. Both the COSS-P and COSS-T Total Scores demonstrated acceptable ability to differentiate the ADHD and comparison groups, indicating a strong association between level of OTMP difficulties and ADHD diagnostic status (COSS-P AUC = 0.84 [CI = 0.80–0.89]; COSS-T AUC = 0.85 [CI = 0.80–0.90]). ROC analyses were conducted again with the total parent and teacher samples to increase sample size, and AUC values were essentially the same.

Subgroup Differences as a Function of Child Sex and Age Group

Scores on the general factor and the subfactors were evaluated using ANOVA to determine if there were differences on these factors in the non-clinical and clinical samples as a function of child sex and grade level (grades 2–5 vs. grades 6–8). Results for the general factor and Task Planning subfactor are reported below. Because the Organized Actions and Memory/Materials Management subfactors were weak, results for these subfactors are reported in Supplemental Material 3.

COSS-P: In the non-clinical sample there were differences on the general factor as a function of sex (F[1, 722] = 61.85, p < 0.001, partial eta2 = 0.079) and grade level (F[1, 722] = 13.30, p < 0.001, partial eta2 = 0.018), but not the interaction of sex and grade level. The magnitude of differences was medium for sex (females M = 1.84 [SD = 0.52], males M = 2.16 [SD = 0.54]) and small for grade level (Grades 2–5 M = 2.07 [SD = 0.52]; Grades 6–8 M = 1.93 [SD = 0.52]). In the clinical sample, there was a difference on the general factor of the COSS-P as a function of sex (p < 0.001), but not grade level or the interaction of sex and grade level. On the Task Planning subfactor of the COSS-P, there were differences as a function of sex (F[1, 722] = 26.37, p < 0.001, partial eta2 = 0.035) and grade level (F[1, 722] = 8.90, p < 0.01, partial eta2 = 0.012), but not the interaction of sex and grade level. The magnitude of the difference was small for sex (females M = 1.61 [SD = 0.64], males M = 1.8 [SD = 0.67]) and small for grade level (Grades 2–5 M = 1.81 [SD = 0.64], Grades 6–8 M = 1.66 [SD = 0.64]). In the clinical sample, there were no differences on Task Planning as a function of sex, grade level, or the interaction of these variables.

COSS-T: In the non-clinical sample there were differences on the general factor as a function of sex (F[1, 784] = 102.80, p < 0.001, partial eta2 = 0.116), but not grade level or the interaction of sex and grade level. The magnitude of differences was medium to large for sex (females M = 1.68 [SD = 0.53], males M = 2.07 [SD = 0.54]). In the clinical sample, there was a difference on the general factor of the COSS-T as a function of sex (p < 0.001), but not grade level or the interaction of sex and grade level. On the Task Planning subfactor of the COSS-T, there were differences as a function of sex (F[1, 784] = 30.70, p < 0.001, partial eta2 = 0.038), but not grade level or the interaction of sex and grade level. The magnitude of the difference was small for sex (females M = 1.54 [SD = 0.63], males M = 1.78 [SD = 0.62]). In the clinical sample, there were no differences on Task Planning as a function of sex, grade level, or the interaction of these variables.

Cross-Informant Correlations on the COSS

The total sample included 396 students who were rated by their parents and teachers; 263 in the non-clinical sample and 133 in the clinical sample. In the clinical sample, 116 (87.2%) students had a diagnosis of ADHD. Cross-informant correlations for scores on the general factor were 0.59 for students in the non-clinical sample and 0.51 for those in the clinical sample.

Study 2

Method

Study 2 had two aims. Aim 1 sought to confirm whether the best-fitting factor structure identified in Study 1 was also a good fit in a referred sample of students with organizational difficulties contributing to academic impairment; Aim 2 examined the convergent validity of the COSS in relation to other measures of homework and academic performance in the referred sample. This study was conducted in the context of a preregistered randomized controlled trial; see https://clinicaltrials.gov/study/NCT03443323.

Participants

The sample for this study comprised children in grades 3 to 5 referred by teachers for organization, time management, and planning (OTMP) challenges as part of a cluster randomized controlled trial (RCT) examining the effectiveness of the Organizational Skills Training program (Abikoff et al., 2013) adapted for schools and implemented by school professionals. Students attended 22 schools in the metropolitan area of a large city in the northeast region of the U.S. Children were nominated for study participation by teachers if their OTMP deficits interfered with their academic performance, based on teacher endorsement of at least one of four interference items on the COSS-T. Only caregivers able to complete measures in English were included in this study. Once parents/guardians provided informed consent and the child provided assent, parents and teachers completed the appropriate version of the COSS as a baseline assessment of the child’s skills; baseline measures from the RCT were used in the present study. Only one parent in each family rated the child. Students were rated by 76 third- to fifth-grade teachers across 22 schools between 2017 and 2022. On average, each teacher rated 2.4 students (SD = 1.5). Enrolled participants were 184 children (65.8% boys) with a mean grade level of 3.89 (SD = 0.76) and a mean age of 9.66 years (SD = 0.88). Additional demographic characteristics are presented in Table 5. Respondents were asked to report race and ethnicity separately and were given the option to report more than one race.

Table 5 Demographic characteristics of referred sample (study 2)

Measures

Demographics: Caregivers completed questionnaires to assess demographic characteristics of the family and child, including child assigned sex, child race and ethnicity, child grade, and caregiver level of education. To characterize family level of advantage, the Area Deprivation Index (ADI; Kind & Buckingham, 2018), a geo-mapping tool composed of multiple indicators (e.g., education, employment, housing quality), was used. ADI scores were converted to percentile ranks, with higher values indicating a greater level of resource deprivation.

COSS: The parent and teacher versions of the Children’s Organizational Skills Scale (COSS; Abikoff & Gallagher, 2009) were used in this study to assess OTMP functioning at school and home. Items from the COSS-P and COSS-T for the general factor and subfactors identified in Study 1 were used in analyses.

Homework Problem Checklist: (HPC; Anesko et al., 1987). The HPC is a 20-item parent-report measure that assesses student homework performance. Parents are asked to indicate the frequency of occurrence of a range of homework behaviors on a four-point scale (0 = never to 3 = very often). The psychometric properties of this instrument have been evaluated extensively, and evidence of factorial and convergent validity has been demonstrated (Power et al., 2006). The HPC has been shown to be sensitive to treatment effects (e.g., Abikoff et al., 2013) and demonstrated high internal consistency in the current sample (alpha = 0.93). The mean item value for the total score was used in the analyses.

Homework Performance Questionnaire—Teacher Version: (HPQ-T; Power et al., 2007). The HPQ-T assesses students’ homework behavior during the past 4 weeks. Teachers are requested to rate the percentage of time each homework behavior occurs, or the percentage of work completed, on a five-point scale (0 = 0–39% to 4 = 90–100%). The seven-item Student Self-Regulation factor was used in the analyses. This factor has been supported through factor analysis and demonstrates high internal consistency in the current sample (alpha = 0.91). The mean item score on this factor was used in analyses.

Academic Proficiency Scale: (APS; Abikoff et al., 2013). The APS is a teacher-report measure designed to assess child proficiency in six academic subjects relative to standard expectations (1 = Well below standard expected at this time of year; 3 = At standard; 5 = Well above standard). The mean item score of ratings across six academic subjects is the unit of analysis. Previous research has demonstrated adequate internal consistency and evidence of sensitivity to the effects of organizational skills training (Abikoff et al., 2013). In the current sample, the alpha coefficient was 0.91.

Academic Competence Evaluation Scales: (ACES; DiPerna & Elliott, 2000). The ACES is a teacher-report scale that assesses the academic competence of students in kindergarten through grade 12. The Reading/Language Arts and Math subscales of this measure were used. Alpha coefficients and test–retest correlations for these subscales have been shown to be above 0.90. Scores on these subscales were combined to create a mean academic competence score. The internal consistency of this composite in the current sample was high (alpha = 0.93).

Data Analytic Plan

The bifactor model identified in Study 1 with a non-clinical and clinical sample was tested in a sample of children referred by teachers using the analytic strategy for CFA described for Study 1. Convergent validity of the COSS-T and COSS-P was examined via bivariate correlations among mean item scores for scales derived from the following measures: COSS-P general factor and Task Planning subfactor, COSS-T general factor and Task Planning subfactor, HPC, HPQ, APS, and ACES.
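A minimal sketch of the convergent-validity correlation matrix is shown below, assuming a data frame with one row per child and hypothetical column names for the mean item scores of each scale.

```python
# Sketch: bivariate (Pearson) correlations among mean item scores across measures.
import pandas as pd

SCALES = ["coss_p_general", "coss_p_task_planning",
          "coss_t_general", "coss_t_task_planning",
          "hpc_total", "hpq_self_regulation", "aps_mean", "aces_mean"]

def convergent_validity(df: pd.DataFrame) -> pd.DataFrame:
    """Return the correlation matrix among the scale scores (pairwise complete by default)."""
    return df[SCALES].corr(method="pearson")
```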

Results

Aim 1: Confirmation of Factor Structure

The bifactor solution with three subfactors was tested in the sample of referred students with OTMP deficits. The COSS-P showed strong model fit (RMSEA = 0.057 [0.051–0.063], CFI = 0.978, TLI = 0.974). The COSS-T also yielded acceptable model fit (RMSEA = 0.069 [0.061–0.078], CFI = 0.928, TLI = 0.915).

For the COSS-P, explained common variance values for the general factor and subfactors 1 to 3 (i.e., Task Planning, Organized Actions, and Memory/Materials Management) were 0.51, 0.09, 0.33, and 0.08, respectively. Omega coefficients for the COSS-P all exceeded 0.85. The omega hierarchical value for the general factor was 0.72 (acceptable), and values for subfactors 1 to 3 were 0.32 (low), 0.74 (acceptable), and 0.14 (low), respectively. For the COSS-T, explained common variance values for the general factor and subfactors 1 to 3 were 0.66, 0.12, 0.17, and 0.05, respectively. Omega coefficients for the COSS-T all exceeded 0.85; the omega hierarchical value for the general factor was 0.85 (high), and values for subfactors 1 to 3 were 0.50 (acceptable), 0.32 (low), and 0.14 (low), respectively.

Aim 2: Convergent Validity

Because analyses in Studies 1 and 2 generally indicated that the COSS general factor and Task Planning subfactor were strong or acceptable factors, these factors were included in analyses examining intercorrelations among measures. The COSS-P general factor and Task Planning subfactor were significantly correlated (r = 0.80, p < 0.001), as were the COSS-T general factor and Task Planning subfactor (r = 0.64, p < 0.001). Cross-informant correlations for the general factor and Task Planning subfactor between the COSS-T and COSS-P were minimal and not significant (0.07 and −0.01, respectively). Correlations between COSS scores and scores on other measures demonstrated a pattern of higher within-informant than cross-informant associations. For example, scores on the COSS-P general factor and Task Planning subfactor were significantly correlated (p < 0.001) with scores on the parent-rated HPC (0.75 and 0.71, respectively). Similarly, correlations between the COSS-T general factor and Task Planning subfactor and scores on other teacher-rated measures were all significant (range = −0.17 to −0.52, p < 0.001). In contrast, cross-informant correlations between the COSS-P general factor and teacher-rated measures were negligible to low (range = −0.13 to 0.16), as were the correlations between the parent-rated HPC and the COSS-T general factor and Task Planning subfactor (0.07 and 0.03, respectively). Notably, however, the COSS-P Task Planning subfactor was significantly correlated with the teacher-rated ACES (r = −0.23, p < 0.01) and APS (r = −0.23, p < 0.01). For the complete correlation matrix, see Table 6.

Table 6 Convergent validity correlation table (study 2)

Discussion

The studies conducted in this investigation were designed to examine the factor structure and validity of the COSS, which has been commonly used to identify children with OTMP difficulties and to evaluate the effectiveness of interventions to improve OTMP skills. In Study 1, models with an acceptable level of fit were identified for both the COSS-P and COSS-T in a large sample of students in grades 2 through 8 drawn from non-clinical and clinical populations. The best-fitting model for each measure was a bifactor solution with a general factor and three subfactors (Task Planning, Organized Actions, and Memory/Materials Management). The level of explained common variance for the general factor was relatively high for the COSS-P and COSS-T, indicating that the general factor was robust. In contrast, explained common variances for the subfactors were relatively low, indicating that each subfactor accounted for a low proportion of systematic variance not explained by the general factor. In addition, the omega hierarchical coefficient (i.e., internal reliability specific to each factor in the context of a bifactor model) was high for the general factor and acceptable for subfactor 1 (Task Planning), but low for subfactor 2 (Organized Actions) and subfactor 3 (Memory/Materials Management). Although the factor analyses identified subfactors similar to the three-dimensional model described in the COSS manual (Abikoff & Gallagher, 2009), there are questions about the psychometric integrity of subfactors 2 and 3. As such, Study 1 provided strong evidence of the factorial integrity of the general factor consisting of 26 items for the COSS-P and 28 items for the COSS-T, as well as evidence supporting the psychometric value of subfactor 1 (Task Planning).

The results of Study 2 confirmed that the bifactor model identified in Study 1 had acceptable fit in a sample of students in grades 3 to 5 referred for OTMP deficits. There was evidence of a robust general factor on the COSS-P and COSS-T. The findings from Study 2 also suggested that subfactor 1 of the COSS-T (Task Planning) and subfactor 2 of the COSS-P (Organized Actions) may provide relatively precise estimates of these dimensions among elementary students referred for OTMP deficits.

Overall, the findings of both Studies 1 and 2 corroborate the results of Molitor et al. (2017) who demonstrated the viability of a highly similar model (bifactor model with three subfactors) with middle school students identified as having ADHD using the COSS-P. Further, the findings of Studies 1 and 2 extend the research of Molitor and colleagues by demonstrating the goodness of fit of this bifactor solution with the COSS-T as well as the COSS-P. In addition, the results extend the research of Molitor and colleagues by demonstrating that this bifactor solution is applicable to elementary school students referred for OTMP deficits (Study 2) as well as students in non-clinical and clinical samples in elementary and middle school (Study 1).

The emergence of a highly robust general factor is consistent with the conceptualization of OTMP skills as the behavioral manifestation of an integrated set of executive functions (Pennington & Ozonoff, 1996). Further, the presence of a strong general factor aligns with clinical observations that student completion of independent activities, such as homework assignments, requires the integration of numerous OTMP skills, such as recording assignments, managing the materials needed to complete assignments, designing a work-completion plan, organizing the approach to the work, initiating and sustaining execution of the plan, completing the work, and organizing materials to submit the final product (Langberg et al., 2018).

Further support for the validity of the general factor was revealed by ROC analyses examining the ability of scores on the general factor to discriminate children with ADHD, a disorder commonly associated with executive functioning deficits (Rinsky & Hinshaw, 2011), from the comparison group. The general factor of both the COSS-P and COSS-T demonstrated AUC values > 0.80, suggesting these measures were able to discriminate children with ADHD from those in a non-clinical sample with a relatively high degree of accuracy (Youngstrom, 2014).

Two issues may account for the variable cross-informant correlations on the COSS. First, the degree of communication between parents and teachers at the time measures are completed may contribute to variations in parent-teacher agreement (de Nijs et al., 2004); greater collaboration between parents and teachers would be expected to yield higher correlations between parent- and teacher-rated forms. In Study 2, referrals were generated solely by teachers, and there was no requirement that parents agree the child had OTMP skills deficits. Thus, even though teachers viewed the students’ OTMP skills as problematic, teachers and parents did not necessarily communicate with one another before completing COSS forms, decreasing the likelihood of agreement between teachers’ and parents’ ratings. In contrast, for the clinical subgroup in Study 1, it is presumed there was some collaboration between parents and teachers in the assessment process, thereby improving the level of agreement between these informants. Second, for the general sample of children included in Study 1, greater parent-teacher agreement might be expected because the sample included a high percentage of children who likely had few, if any, OTMP problems at school or home.

Results from Study 2 provided evidence that the COSS assesses dimensions of functioning related to homework and academic performance, with a notable degree of shared informant variance. Within-informant correlations were markedly higher than cross-informant correlations, a pattern consistent with findings in child and adolescent multi-informant studies (De Los Reyes et al., 2015). Parent-rated total scores and Task Planning scores were significantly and strongly correlated with scores on the parent-rated HPC, reflecting associations among child behaviors observed at home. Teacher-rated total scores and Task Planning scores also had significant correlations (low to moderate range) with teacher-rated academic competence, again reflecting behaviors observed in the same setting.

In contrast, the correlations between the COSS-P and teacher ratings of homework and academic functioning were very low. Similarly, although scores on the COSS-T had low to moderate correlations with teacher-rated measures of homework, they had negligible and non-significant correlations with the parent-rated HPC. However, two cross-informant correlations were higher: correlations of −0.23 were identified between parent ratings of Task Planning and both teacher-rated measures of academic competence, suggesting that some OTMP behaviors observed by parents were associated with teacher views of student academic functioning. Overall, the correlation analyses affirmed the value of the total score and Task Planning score of both versions of the COSS and identified a pattern of meaningful within-informant correlations with measures of homework and academic performance.

Implications for Research and Clinical Practice

The findings of this study have numerous implications for research and clinical practice involving the assessment of behavioral manifestations of executive functioning deficits (i.e., OTMP deficits). First, because the findings indicated that each measure has a robust, internally reliable general factor, the total score has substantial value as an overall index of OTMP deficits (norms available upon request). This factor should be given priority in screening, psychoeducational assessment, and outcome evaluation. Second, given evidence of the psychometric integrity of subfactor 1 (Task Planning), especially when using the COSS-T, this subscale appears suitable for screening, psychoeducational assessment, and intervention outcome assessment. Third, given concerns about the factorial integrity of subfactors 2 and 3 (Organized Actions and Memory/Materials Management, respectively), it is questionable whether these subscales should be used for screening or outcome evaluation. There may be clinical value in examining scores on subfactors 2 and 3 to analyze the pattern of OTMP deficits for a particular child, but this practice should be used with caution, and clinical decisions should not be based primarily on subfactor scores. Fourth, the substantial variations in cross-informant correlations on the total scores of the COSS-P and COSS-T found across samples indicate that research is needed to identify variables contributing to these differences and that a comprehensive assessment of OTMP deficits requires the use of both versions of the COSS. Fifth, because sex differences were identified on the COSS-P and COSS-T in the general sample (Study 1), as also reported by Abikoff and Gallagher (2009), it is essential to use sex-based norms and to account for sex in statistical models when using these measures. The finding of sex differences in OTMP skills deficits is consistent with widely reported sex differences (male > female) in the prevalence of ADHD in general and clinically referred samples (Barkley, 2014). In future research, it may be useful to explore whether briefer versions of the COSS-P and COSS-T can be identified and validated. The availability of brief measures would improve the efficiency of screening for OTMP deficits and monitoring student progress in resolving OTMP deficits over time.

Limitations

The findings of this investigation need to be considered in the context of the following limitations. First, although a large portion of the sample used in Study 1 was derived from a general population with an attempt to reflect 2000 U.S. Census distributions, the Study 1 sample deviated from the general U.S. population in several ways. The Study 1 sample consisted of children from both the U.S. (81%) and Canada; children from the Midwest and Western regions of the U.S. were underrepresented; and female children were underrepresented (44% for COSS-P; 48% for COSS-T). In addition, although the racial composition of the sample was diverse, Black children were underrepresented (10.4%). Further, the sample included a relatively high percentage of children with a diagnosis of ADHD (29% for COSS-P; 16% for COSS-T). Second, the teacher-referred sample included in Study 2 was drawn from a large metropolitan area in the northeastern U.S. Although the sample was derived from schools that were intentionally recruited to reflect the racial/ethnic diversity of the region, children of Asian descent and those of American Indian or Alaska Native descent were underrepresented. Third, data were not collected about the gender identity of child participants, and only parents able to complete measures in English were included in the studies. These limitations should be addressed in future research.

Conclusion

These studies contribute to the research literature by demonstrating the structural integrity of a general factor for both teacher and parent versions of the COSS in a non-clinical and clinical sample of children in grades 2 through 8 derived mostly from the U.S. In addition, this investigation demonstrated the integrity of a general factor for children in grades 3 through 5 referred by teachers for OTMP difficulties. The results indicated that the COSS-P and COSS-T assess multiple dimensions of OTMP deficits, but the internal reliability of specific subscales in relation to the total scale score was generally weak with the exception of Task Planning. As a result, evidence to support the scoring and interpretation of the Organized Actions and Memory/Materials Management subscales generally was lacking. Correlations between parent and teacher ratings on corresponding factors of the COSS varied substantially across samples, highlighting the need for research to identify variables contributing to cross-informant differences on OTMP skills deficits and other clinical constructs and indicating that a comprehensive assessment of student OTMP deficits requires the use of both the COSS-P and COSS-T. Further, the findings corroborated an earlier report (Abikoff & Gallagher, 2009) of meaningful sex differences (males > females) for the total score on both versions of the COSS, indicating that the use of sex-based norms is required. Research examining the validity of briefer measures of OTMP skills deficits, including a short form of the COSS, could prove useful for screening and monitoring progress.