Introduction

Evidence from neurobiological and behavioral research demonstrates a close interplay among sensory, motor, cognitive, and emotional functioning in children (Davis et al., 2009; Diamond, 2000; Hadders-Algra, 2016; Mancini et al., 2018). For example, deficits in sensorimotor functioning are a core feature across different developmental conditions (Fournier et al., 2010; Piek & Dyck, 2004; Shum & Pang, 2009) and have been linked to attentional difficulties (Hazen et al., 2014; Konicarova et al., 2014), lower cognitive abilities (Davis et al., 2009; Piek & Dyck, 2004), poorer academic performance (Davis et al., 2009; Macdonald et al., 2018), and atypical brain functional connectivity (Kim et al., 2017; McLeod et al., 2014). A large body of research shows that functional outcomes and brain connectivity can be positively altered by training and practice over time, including through sensorimotor exercises, computer-based training, and repetition of specific tasks (Bediou et al., 2018; Klingberg et al., 2005; Oei & Patterson, 2013; Posner et al., 2015; Qian et al., 2018; Sánchez-Pérez et al., 2019; Tang et al., 2007; Thorell et al., 2009), as well as through behavioral therapies and lifestyle interventions related to nutrition and physical activity (Alderman et al., 2017; Christiansen et al., 2019; Esteban-Cornejo et al., 2021; Jirout et al., 2019). Given these collective findings, training programs may play a critical role in supporting development and improving functional outcomes in children with developmental issues, especially programs that comprehensively target and integrate multiple interrelated areas of development.

One such multimodal training program (the Brain Balance® program) aims to address developmental issues through a nonpharmacologic approach involving integrative activities at a regular frequency and duration, including sensory engagement, motor skills development, cognitive exercises, nutritional guidance, and academic training, along with complementary home-based exercises. In recent studies, three months of participation in the Brain Balance program improved cognitive performance and mental well-being in children and adolescents (aged 4–17 years) who tested below age-appropriate levels for developmental and cognitive functioning prior to program participation (Jackson & Robertson, 2020; Jackson & Wild, 2021). Outcomes of participants in this program can be measured using existing instruments that have been shown to be valid and reliable for screening and evaluating children. However, many of these instruments focus on measuring a single domain of developmental functioning, such as motor skills (Croce et al., 2001; Di Fabio & Foudriat, 1996), behavior (Lau et al., 2021; Perry et al., 2021), academic performance (January & Ardoin, 2015; Thomas & January, 2021), or social functioning (Fink et al., 2013; Matson et al., 2013). The concept of comprehensive developmental screening, a thorough process in which providers collaborate with parents to monitor all domains of a young child’s development, is not new and has long been practiced in pediatrics (“Developmental Surveillance and Screening of Infants and Young Children,” 2001), albeit often using a combination of various screening tools (Glascoe, 2003; Robins et al., 2014; Squires et al., 1995) or tools designed for narrow age ranges, such as infants and toddlers (Gollenberg et al., 2010; Robins et al., 2014; Squires et al., 1995). Few validated instruments comprehensively cover multiple developmental domains across a range of ages, especially for the purposes of monitoring and assessing functional outcomes before and after developmental interventions, behavioral therapies, or training programs. Even for children receiving a single-modality intervention (for example, a motor skills intervention alone), ongoing monitoring and assessment of outcomes in multiple developmental domains may be warranted because of the interrelated nature of development (Davis et al., 2009; Diamond, 2000; Hadders-Algra, 2016; Mancini et al., 2018). Such developmental monitoring would be more feasible with a single comprehensive measurement tool.

The purpose of this study is to improve and validate a multidomain developmental survey already in use within the Brain Balance program. The original survey was designed by an expert review panel at Brain Balance and consisted of 98 questions in the domains of motor skills, attention, emotional functioning, behavior, socialization, reading, and academic performance. Questions from these domains were selected based on: 1) past observations of pre-enrollment Brain Balance paperwork indicating that these particular domains are the ones most commonly cited by parents as concerns; and 2) a large body of previous work showing that achievement in these domains is associated with positive developmental outcomes in school-aged children and adolescents (Adolph & Hoch, 2020; Ash et al., 2017; Cooper et al., 2014; Graziano et al., 2007; Liew et al., 2018; Macdonald et al., 2018; McClelland et al., 2000). As a routine part of the Brain Balance program’s assessment process, the survey is administered to parents of all students (ages 4–18 years) who enroll in the program in order to obtain parent-reported measures of children’s functioning before and after program participation. Recent efforts toward large-scale data collection from these parental surveys have yielded sizable samples of children and adolescents who participated in the program from 2017 to 2020. Using these data, the present study describes the refinement of the Brain Balance multidomain developmental survey (BB-MDS) through exploratory models, validates the refined survey’s factor structure, and demonstrates the survey’s measurement invariance across age and gender.

Methods

Participants and inclusion criteria

The BB-MDS was administered to parents of 47,571 children and adolescents (68.5% male; age range = 4–18 years) who participated in the Brain Balance program between 2017 and 2020. Participants were drawn from 115 Brain Balance center locations across various regions of the United States. Prior to enrolling in the program, prospective participants were assessed at Brain Balance centers by trained technicians. To be eligible for enrollment in the Brain Balance program, participants had to be 4 to 18 years old, have no known genetic disorders, and demonstrate developmental readiness for the program, including the capacity to engage with instructors, follow one-step directions, attempt the tasks requested, and continue working throughout the assessment’s duration. Participants also had to test below age-appropriate levels on widely used, validated functional tests in the following categories: fine motor skills, as assessed by the Purdue Peg Board (Squillace et al., 2015); body coordination, timing, and strength, as assessed by the Presidential Fitness Test (George et al., 2011); interaural asymmetry, as assessed by the dichotic listening test (Westerhausen & Kompus, 2018); and visual reading fluency, as assessed by the Visagraph Reading Plus® tool (“Silent Word Reading Fluency & Temporal Vision Processing Differences Between Good and Poor Readers,” JBO, 17(6), n.d.); as well as proprioception, balance, and vestibular function; auditory and visual processing; and eye coordination and movements. Children who met the inclusion criteria were then enrolled in the Brain Balance program, which has previously been described in detail (Jackson & Robertson, 2020; Jackson & Wild, 2021).

Measure

The BB-MDS was administered to parents of all participants prior to the initial assessment and again as part of the post-program assessment. The original survey was designed by an expert review panel at Brain Balance using standard survey design methods (Jones et al., 2013) and consisted of 205 questions spanning a range of commonly expressed parental concerns in the areas of motor skills, attention, emotional functioning, behavior, socialization, reading, sensory processing, academic performance, and medical issues. To minimize survey length and redundancy in questions, the survey was first shortened to 115 questions. Items were then omitted if they were redundant (i.e., multiple items asking similar questions) or outside the scope of the developmental domains surveyed (e.g., medical concerns such as allergies and ear infections), which resulted in a final pool of 98 items. The survey was originally administered in a paper format and later transitioned to a computer-based format; it used numeric rating scales based on those used in previous studies of teacher-reported behavioral ratings and self-reported pain intensity ratings (Farrar et al., 2010; Volpe et al., 2011). Parents were instructed to rate each statement in the survey on a numeric scale of 0 to 10 (0 = not observed/does not apply; 10 = frequently observed).

Data analysis

Data screening and preparation

Participant data were included if ≥ 80% of the BB-MDS was completed. Variables were screened for large correlations that might indicate redundancy in item content (here, a threshold of Pearson r ≥ 0.65 was used, as it implies that roughly 40% of the variance between items is shared). A zero-order (bivariate) correlation matrix showed that several item pairs met this threshold. Items that a) had lower face validity than their counterpart or b) had multiple large correlations with other items were subsequently dropped. The dataset was then randomly partitioned into a training sample (75%) and a validation sample (25%), stratified by year, gender, and grade. This allowed us to refine the BB-MDS on the training sample through exploratory models and to examine its goodness-of-fit on unseen data.
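As a concrete illustration of this screening and partitioning procedure, the following R sketch applies the same rules to a hypothetical data frame (here called bbmds, with item columns item_1–item_98 and year, gender, and grade columns; all object names are assumed for illustration, and the original analysis pipeline may have differed):

```r
# Minimal sketch of the screening and partitioning steps (illustrative only).
library(dplyr)

# Keep respondents who completed at least 80% of the items
item_cols <- grep("^item_", names(bbmds), value = TRUE)
complete_enough <- rowMeans(!is.na(bbmds[, item_cols])) >= 0.80
bbmds <- bbmds[complete_enough, ]

# Flag item pairs with r >= .65 (roughly 40% shared variance) for manual review
r <- cor(bbmds[, item_cols], use = "pairwise.complete.obs")
high_pairs <- which(abs(r) >= 0.65 & upper.tri(r), arr.ind = TRUE)

# Approximate 75/25 split stratified by year, gender, and grade
set.seed(2021)
bbmds <- bbmds %>%
  group_by(year, gender, grade) %>%
  mutate(sample = ifelse(runif(n()) < 0.75, "training", "validation")) %>%
  ungroup()
```

The flagged pairs would then be reviewed so that the item with lower face validity, or with multiple large correlations, could be dropped before partitioning.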

Scale refinement

Exploratory Factor Analysis (EFA) was conducted on the training sample to identify latent factors. All EFAs were conducted in Mplus version 8.1.5 (Muthén & Muthén, 2017) using Robust Maximum Likelihood (MLR) estimation, which is robust to non-normally distributed data. Additionally, MLR tolerates data that are missing at random through Full Information Maximum Likelihood (FIML), a gold-standard approach to missing data estimation (Enders, 2010). CF-Equamax rotation was used, as it aims to simplify both variable and factor complexity (Browne, 2001). The oblique version of Equamax was chosen because factors were presumed to be correlated with one another, which is common in social sciences research (Osborne et al., 2008). The number of factors was determined by a combination of a variant of the Sequential χ² Model Test (SMT) and the Hull method (Lorenzo-Seva et al., 2011). These two methods were used because they: 1) both seek the simplest factor solution that retains good model fit; and 2) work well in tandem with one another in recovering the true number of latent factors (Auerswald & Moshagen, 2019).

Briefly, in the SMT approach, EFAs with increasing numbers of factors are fit, and the smallest number of factors for which the model χ² is non-significant (p > 0.05) is retained. Because the model χ² is sensitive to sample size (i.e., in large samples, virtually all factor solutions will be statistically significant by model χ² tests), we elected to use the variant of this approach outlined by Preacher et al. (2013). This variant utilizes the Root Mean Square Error of Approximation (RMSEA), a well-known index of goodness-of-fit, where values ≤ 0.05 indicate “good” fit (Hu & Bentler, 1999). Here, the suggested number of factors is that of the first model in which the lower bound of the 90% confidence interval of the RMSEA falls below 0.05 (Preacher et al., 2013).
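For illustration only, the loop below sketches this SMT variant in R using the psych package as a rough stand-in for the Mplus analyses reported here (the study itself used Mplus with MLR and oblique CF-Equamax; psych::fa with maximum likelihood and an oblimin rotation is substituted, and train_items is an assumed data frame of training-sample item responses):

```r
# Sketch of the SMT variant: fit EFAs with increasing numbers of factors and
# retain the first solution whose RMSEA 90% CI lower bound falls below .05.
library(psych)
library(GPArotation)  # required by psych::fa for oblique rotations

n_suggested <- NA
for (k in 1:8) {
  efa <- fa(train_items, nfactors = k, fm = "ml", rotate = "oblimin")
  if (efa$RMSEA["lower"] < 0.05) {  # lower bound of the 90% CI
    n_suggested <- k
    break
  }
}
n_suggested
```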

The Hull method aims to find a factor solution that optimizes the balance between goodness-of-fit and model complexity (Lorenzo-Seva et al., 2011). It starts by performing Parallel Analysis (Horn, 1965), a Monte Carlo simulation procedure (here, with 1,000 simulations of the raw data), to identify the maximum possible number of factors; the total number of factor solutions tested is this maximum plus one, in order to cover the full range of possible solutions. For each candidate solution \(i\), a goodness-of-fit value \(f_i\) and the model degrees of freedom \(df_i\) (an index of model complexity) are computed. Based on these parameters, a scree test value is calculated for each solution as \(st_i = \frac{(f_i - f_{i-1})/(df_i - df_{i-1})}{(f_{i+1} - f_i)/(df_{i+1} - df_i)}\). The solution with the largest \(st\) is suggested as the optimal number of factors. Here, the RMSEA and the Comparative Fit Index (CFI; where values ≥ 0.95 indicate “good” model fit; Hu & Bentler, 1999) were used as goodness-of-fit indices. The Hull method was implemented via the CHull package (Wilderjans et al., 2013) in R (R Core Team, 2020).
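As a sketch, the scree-test values can be computed directly from the vectors of fit values and degrees of freedom of the candidate solutions; the helper below is illustrative (it is not part of the CHull package), and the input values are placeholders rather than results from this study:

```r
# Hull scree-test value st for each interior candidate solution, given
# goodness-of-fit values (f) and degrees of freedom (df) ordered by the
# number of factors; the solution with the largest st is retained.
hull_st <- function(f, df) {
  st <- rep(NA_real_, length(f))
  for (i in 2:(length(f) - 1)) {
    st[i] <- ((f[i] - f[i - 1]) / (df[i] - df[i - 1])) /
             ((f[i + 1] - f[i]) / (df[i + 1] - df[i]))
  }
  st
}

# Placeholder example with made-up CFI and df values for 1- to 5-factor solutions
hull_st(f = c(0.80, 0.90, 0.95, 0.955, 0.958), df = c(434, 404, 375, 347, 320))
```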

Once an optimal factor solution was identified, items were removed if their highest standardized factor loading was ≤ 0.50. This conservative cut-off was chosen to ensure high correlations among all items within a subscale and to shorten the BB-MDS while retaining the items most representative of a latent factor. Items were also removed if they cross-loaded onto more than one factor (conservatively defined here as a difference of less than |0.10| between an item’s two highest loadings). Once items were removed, the remaining items were subjected to the same process outlined above. This was done iteratively until all items loaded clearly onto a single factor.
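A minimal sketch of this item-screening rule, applied to a standardized loading matrix from one EFA iteration, is given below. The function and matrix names are illustrative, and the cross-loading rule is interpreted here as a gap of less than 0.10 between an item’s two highest absolute loadings:

```r
# Identify items to drop from a standardized loading matrix 'L' (items x factors),
# assumed to have item names as row names.
drop_items <- function(L, min_load = 0.50, min_gap = 0.10) {
  # Two largest absolute loadings per item
  top2 <- t(apply(abs(L), 1, function(x) sort(x, decreasing = TRUE)[1:2]))
  weak      <- top2[, 1] <= min_load              # highest loading too small
  crossload <- (top2[, 1] - top2[, 2]) < min_gap  # loads onto more than one factor
  rownames(L)[weak | crossload]
}

# Usage (illustrative): drop_items(loadings_matrix), then re-fit the EFA on the
# remaining items and repeat until no items are flagged.
```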

Validation of factor structure

Once a factor solution was identified in the training sample, the validation sample was used to examine how well the factor solution fit previously unseen data. For this, an Exploratory Structural Equation Model (ESEM) with targeted rotation was used, in which items were specified a priori to load onto specific factors. ESEM was used in lieu of Confirmatory Factor Analysis (CFA) because it allows items to have small loadings on non-target factors, whereas CFA traditionally constrains all cross-loadings to zero. Because of this, CFA often fails to support instruments that are otherwise well-established, provides biased and misleading estimates of model fit, and often leads to counterproductive strategies to overcome its limitations (Marsh et al., 2014). Model fit for the ESEM was evaluated through the RMSEA and CFI, as well as the Standardized Root Mean Square Residual (SRMR; where values < 0.08 indicate “good” fit; Hu & Bentler, 1999). In addition, we examined whether factor loadings were similar between the training and validation samples (each freely estimated through EFA) using Tucker’s congruence coefficient, where values ≥ 0.95 indicate that factor loadings are equal between samples (Lorenzo-Seva & ten Berge, 2006). The congruence coefficient was computed with the “psych” package in R (Revelle, 2021).
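The ESEMs themselves were fit in Mplus; the congruence check can be sketched with the psych package as below, assuming load_train and load_valid hold the item-by-factor loading matrices from the freely estimated six-factor EFAs in each sample (names are illustrative):

```r
library(psych)

# Congruence between corresponding factors across samples; values >= .95 are
# taken to indicate that the loadings are equivalent across samples.
cc <- factor.congruence(load_train, load_valid)
diag(cc)  # per-factor congruence, assuming factors are in the same order
```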

Measurement invariance

Following validation of the factor structure through ESEM, we tested the measurement invariance of the BB-MDS, which assesses the psychometric equivalence of a construct across groups (Putnick & Bornstein, 2016). Invariance was tested across reported gender (male vs. female) and age range (pre-adolescent vs. adolescent, where pre-adolescent was defined as < 10 years old and adolescent as ≥ 10 years old). Evaluating measurement invariance typically consists of sequentially assessing configural invariance (whether the factor structure fits well across groups), metric invariance (whether factor loadings are equivalent across groups), and scalar invariance (whether item intercepts are equivalent across groups), using χ² difference tests in which each step is compared with the previous one. Because the χ² difference test is sensitive to negligible differences when sample size is large (Putnick & Bornstein, 2016), we focused on change in model fit instead. We evaluated measurement invariance through \(\Delta\)RMSEA, \(\Delta\)CFI, and \(\Delta\)SRMR, as these indices are commonly used for this purpose (Putnick & Bornstein, 2016). We considered an increase in RMSEA of no more than 0.015, a decrease in CFI of no more than 0.010, and an increase in SRMR of no more than 0.015 between successive models to indicate measurement invariance (Chen, 2007). It is important to note that the use of changes in model fit to evaluate measurement invariance is well-established for CFA, but not for ESEM (Marsh et al., 2013). Because of this, we also tested whether factor loadings (from freely estimated EFAs) were equivalent across groups using Tucker’s congruence coefficient.
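As an illustration of how these criteria are applied, the snippet below computes the fit-index changes between successive invariance models and checks them against the thresholds used here; the fit values are placeholders, not results from this study:

```r
# Placeholder fit indices for the configural, metric, and scalar models
fits <- data.frame(
  model = c("configural", "metric", "scalar"),
  rmsea = c(0.041, 0.042, 0.043),
  cfi   = c(0.954, 0.950, 0.946),
  srmr  = c(0.018, 0.021, 0.023)
)

# Change in fit from one invariance step to the next
deltas <- data.frame(
  step    = paste(fits$model[-1], "vs.", fits$model[-nrow(fits)]),
  d_rmsea = diff(fits$rmsea),
  d_cfi   = diff(fits$cfi),
  d_srmr  = diff(fits$srmr)
)

# Chen's (2007) criteria as applied in this study
deltas$invariant <- with(deltas, d_rmsea <= 0.015 & d_cfi >= -0.010 & d_srmr <= 0.015)
deltas
```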

Results

Data analysis originally focused on Brain Balance center (BBC) participants from 2017–2019. There were 49,364 participants who provided data; of these, 37,648 (76.2%) completed 80% or more of the BB-MDS. Of these participants, 28,254 (75%) were randomly allocated to the training sample and the remaining 9,394 (25%) to the validation sample. Data on an additional 9,923 participants (including participants from 2020) became available later and served as a separate validation sample. Thus, data were analyzed on a total of 47,571 participants. BBC participants were mostly male (68.50%), and the average age was 9.64 years (standard deviation = 3.29, range = 4 to 18). Across all samples, the lowest covariance coverage for a pair of items was 96%. According to Monte Carlo simulations, this amount of missing data can be handled appropriately through FIML (Enders & Bandalos, 2001).

Exploratory factor analysis

Prior to analysis, a zero-order correlation matrix identified 26 item pairs with large correlations; three items were removed for having large correlations with several other items, and eight items were removed for lower face validity. Following the iterative EFA procedure outlined above, 56 items were discarded, resulting in a 31-item scale. The SMT variant and the Hull method (using both RMSEA and CFI) agreed that a six-factor solution fit the data best (see Table 1 and Fig. 1 for details). Each of the items clearly loaded onto one factor only (see Table 2 for a list of the items, their factor loadings, and the factor to which they were assigned). The model exhibited good fit to the data (RMSEA = 0.041, CFI = 0.954, SRMR = 0.018). The six factors were labeled “Negative Emotionality,” “Reading/Writing Problems,” “Academic Disengagement,” “Hyperactive-Disruptive,” “Motor/Coordination Problems,” and “Social Communication Problems.”

Table 1 Goodness-of-fit Indices for Exploratory Factor Analyses (N = 28,254)
Fig. 1

Hull method for root mean square error of approximation and comparative fit index. The red marker indicates the suggested number of factors per the scree test. Numbers of factors are shown for simplicity; degrees of freedom were used in the scree-test calculations

Table 2 Results of Exploratory Factor Analysis with CF-Equamax Rotation on training sample (N = 28,254)

Validation of factor structure

The ESEM with targeted rotation was applied to both validation samples. Model fit was strong in the first validation sample (n = 9,394; RMSEA = 0.041, CFI = 0.955, SRMR = 0.018), and all items had loadings > 0.50 on their assigned factor. Similarly strong model fit was observed in the second validation sample (n = 9,923; RMSEA = 0.040, CFI = 0.956, SRMR = 0.018), and all items again had loadings > 0.50 on their assigned factor. Furthermore, Tucker’s coefficient of congruence was 1.00 across all factors between the training sample and both validation samples (via EFA with six factors), suggesting that the factor solution identified in the training sample is replicable across similar samples.

Because the factor structure demonstrated strong goodness-of-fit in both validation samples, we re-fit the ESEM with targeted rotation to the full sample. All items, their factor loadings, and the factor to which they were assigned can be found in Table 3. On the full sample, RMSEA = 0.041, CFI = 0.954, and SRMR = 0.018. Factors had small to large associations with one another: specifically, negative emotionality was moderately associated with the academic disengagement, hyperactive-disruptive, and social communication problems subscales; reading/writing problems and academic disengagement were moderately associated; hyperactive-disruptive and social communication problems were moderately associated; and there was a large association between motor/coordination problems and social communication problems (see Table 4 for the full correlation matrix). McDonald’s Omega (\(\omega\); McDonald, 1999) was computed for each of the factors to estimate internal reliability, where values ≥ 0.80 indicate “good” reliability. This coefficient is considered preferable to the traditional Cronbach’s Alpha (\(\alpha\)) because coefficient \(\alpha\) assumes all items have equal factor loadings, whereas coefficient \(\omega\) acknowledges that some items represent the factor better (i.e., have a higher factor loading) than others. For negative emotionality, \(\omega\) = 0.787; reading/writing problems, \(\omega\) = 0.849; academic disengagement, \(\omega\) = 0.841; hyperactive-disruptive, \(\omega\) = 0.817; motor/coordination problems, \(\omega\) = 0.849; and social communication problems, \(\omega\) = 0.830. Altogether, the subscales of the revised BB-MDS show internal reliability at or near the “good” threshold.
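For reference, coefficient \(\omega\) for a single subscale can be computed directly from its standardized factor loadings, as in the short R sketch below (the loadings shown are illustrative values, not BB-MDS estimates):

```r
# McDonald's omega (total) for one congeneric subscale, given standardized
# loadings lambda; with standardized items, uniqueness = 1 - lambda^2.
omega_total <- function(lambda) {
  sum(lambda)^2 / (sum(lambda)^2 + sum(1 - lambda^2))
}

omega_total(c(0.72, 0.68, 0.75, 0.70, 0.66))  # illustrative five-item subscale
```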

Table 3 Factor loadings from Exploratory Structural Equation Modeling with targeted rotation on full sample (N = 47,571)
Table 4 Inter-factor correlations with full sample (N = 47,571)

To examine test–retest reliability, we used participants who completed the BB-MDS and then re-tested within 7 days without any intervention in between. There were 121 participants between the ages of 4 and 18 years (M = 9.14; SD = 3.33) who provided test–retest data. Test–retest reliability coefficients (per Pearson correlations) were high for each of the subscales: negative emotionality (r = 0.83, p < 0.001); reading/writing problems (r = 0.85, p < 0.001); academic disengagement (r = 0.82, p < 0.001); hyperactive/disruptive (r = 0.83, p < 0.001); motor/coordination problems (r = 0.88, p < 0.001); and social communication problems (r = 0.75, p < 0.001).

Factor structure and measurement invariance across groups

Freely estimated EFAs (using CF-Equamax rotation) showed strong model fit across gender and adolescent status (RMSEA ranged from 0.039 to 0.041; CFI ranged from 0.952 to 0.960; SRMR ranged from 0.018 to 0.019). Tucker’s coefficient of congruence was ≥ 0.97, which suggests that loadings were equivalent across reported gender and adolescent status. Tests of measurement invariance can be found in Table 5. Both \(\Delta\)RMSEA and \(\Delta\)SRMR suggest that the BB-MDS is invariant across gender and adolescent status, while \(\Delta\)CFI was slightly over the threshold outlined by Chen (2007) for the adolescent-status comparison. Given the agreement between \(\Delta\)RMSEA and \(\Delta\)SRMR, in tandem with the high coefficients of congruence from the freely estimated EFAs, the convergence of evidence supports the conclusion that the BB-MDS is equivalent across gender and adolescent status.

Table 5 Tests of measurement invariance across reported gender and adolescent status

Discussion

The Brain Balance program uses a survey to obtain parent-reported measures of children’s functioning in multiple developmental domains before and after program participation. Data collected from this parent-reported survey over the past several years allowed the present psychometric analysis of large samples of children and adolescents (47,571 in total) who participated in the program. In this analysis, the original survey was refined and shortened; the resulting factor structure performed well on two validation samples and appears to be equivalent across age and gender. The results provide evidence for several strong measurement properties of the BB-MDS.

Although findings on the mental health and cognitive outcomes of Brain Balance program participants have been published only recently (Jackson & Robertson, 2020; Jackson & Wild, 2021), the first center-based Brain Balance program opened in 2007. At that time, there was an immediate need for Brain Balance professionals to be able to provide measurable data to parents on their children’s pre-program baseline functioning and post-program progress. The program’s original parental survey was developed to meet this need and was designed to capture the potential impact of the training program on participants’ functioning in a wide range of developmental domains commonly cited by parents as concerns, yielding a 98-question survey. In this study, exploratory factor analysis refined the scale to 31 questions while retaining the breadth of developmental areas measured. The analysis indicated a six-factor structure of the BB-MDS, which was shown to be replicable across two validation samples and exhibited strong goodness-of-fit. The BB-MDS also demonstrated test–retest reliability within a time window during which participants had not undergone any intervention. Tests of measurement invariance further suggest that the BB-MDS is equivalent across reported gender and age range (younger than 10 years vs. 10 years and older). Overall, the BB-MDS appears to have strong measurement properties, including a validated factor structure, internal reliability, test–retest reliability, and measurement invariance across gender and age, for the parent-reported assessment of six domains of developmental functioning in children and adolescents who participated in the Brain Balance program.

Although the data used in the present analysis were collected entirely from in-center Brain Balance locations, the survey may also be useful to practitioners outside of the Brain Balance program. Individually, each of the six factors in the survey has received substantial attention in the literature as critical for achieving healthy developmental outcomes (Adolph & Hoch, 2020; Ash et al., 2017; Cooper et al., 2014; Graziano et al., 2007; Liew et al., 2018; Macdonald et al., 2018; McClelland et al., 2000). The six-factor structure of the refined BB-MDS, with only 31 questions, offers an advantage over the original 98-question survey in brevity and time to completion for parents, without sacrificing the number of developmental domains covered. Although lengthier, more detailed measures of child developmental health status remain necessary for diagnostic purposes, the combination of the six factors in a single survey tool could provide a simple, valid, and cost-effective way for practitioners to reduce the number of standardized assessments required to identify and monitor parent-reported developmental delays or challenges across a range of ages, from preschool-aged children to high school–aged adolescents. Further research would need to be conducted outside of the Brain Balance program to determine the feasibility and potential applicability of the BB-MDS in other settings, such as community-based programs, schools, or clinics.

Limitations

There are several limitations to this study. First, the current investigation focused exclusively on the factor structure and internal reliability of the BB-MDS in a clinical population. Although the factor structure of the BB-MDS was well-supported in this sample, it is unclear how well it would fit in a non-clinical sample. Other important components of test validity, such as convergent, discriminant, and criterion validity, could not be assessed; we are currently collecting data to examine convergent and discriminant validity. In addition, the BB-MDS does not capture attention and cognitive processes, an important component of development that we are currently working to assess. Future studies are needed to characterize the psychometric properties of the survey across different settings and to compare its performance to existing developmental assessment tools.

Conclusions

Monitoring multiple interrelated domains of development in children with developmental challenges, followed by the receipt of appropriate interventions, can ultimately help increase the proportion of children who reach their developmental potential. To support such monitoring, well-performing validated tools are needed for struggling children and adolescents who require intervention, especially tools that gauge multiple developmental domains and can be used across a range of ages. The BB-MDS showed good internal reliability, test–retest reliability, a factor structure with strong goodness-of-fit, and evidence of equivalence across gender and age.