Introduction

Understanding when and under what circumstances mental health problems develop across childhood and adolescence can inform strategies for prevention and treatment. Gaining this knowledge requires longitudinal research into the timing and the form or shape of change (Ployhart & Vandenberg, 2010), and into the factors influencing the progression of mental health problems. For children with neurodevelopmental disabilities (NDD), who often display higher levels of mental health problems than their non-disabled peers (Oeseburg et al., 2010, 2011; van Steensel et al., 2011), this knowledge is of even greater importance. The purpose of the present study is to provide a systematic review of longitudinal studies of mental health problems in children with NDD. In this study, we focused on child-onset neurodevelopmental or sensory disabilities, and used the word ‘child’ to refer to the paediatric period, defined as those aged < 19 years (Sawyer et al., 2019). NDDs include, but are not restricted to, the diagnoses classified as NDDs in the Diagnostic and Statistical Manual of Mental Disorders, DSM-5 (American Psychiatric Association, 2013) or as mental, behavioural or neurodevelopmental disorders in the International Classification of Diseases, ICD-11 (World Health Organization, 2020). Example diagnoses include intellectual disability (ID), autism spectrum disorder (ASD), and attention deficit hyperactivity disorder (ADHD). According to the ICD, NDDs are characterised by a “clinically significant disturbance in an individual’s cognition, emotional regulation, or behaviour…usually associated with distress or impairment in personal, family, social, educational, occupational or other important areas of functioning” (World Health Organization, 2020, p. 1). However, there are several other diagnoses listed in other parts of the diagnostic manuals that arguably also meet this definition. For example, cerebral palsy (CP) and spina bifida (SB) are primarily associated with motor disorders, but also frequently present with cognitive, emotional, and behavioural difficulties (Morris et al., 2013; Rosenbaum et al., 2007).

A mental illness diagnosis is not conceptually equivalent to an NDD. However, for some child-onset disability groups, there has been long-standing recognition of an increased risk of mental health problems or illness, for example, anxiety disorders in ADHD (van Steensel et al., 2011) or depression in children with ASD (Gotham et al., 2015), while there has been less focus on this aspect of health for others. For example, the presence of mental illness has not commonly been explored in children with CP, although mental health problems, such as distress, loneliness and other psychosocial issues, are known concerns (Dickinson et al., 2007; Power et al., 2018). One reason for keeping a broad approach to the included population in the current review, and not focusing on one diagnosis at a time, is to reduce the risk of overlooking general factors influencing mental health problems in children across diagnoses. In non-disabled children, the evidence suggests distinct trajectories for different forms of emotional and behavioural problems during various parts of the developmental period. Physical aggression, for example, seems to follow a bell-shaped curve in the first years of life for many children, with increases from 12 to 36–48 months, followed by a decrease in the subsequent years (Alink et al., 2006; Girard et al., 2014; Tremblay et al., 2004). Non-suicidal self-injury, on the other hand, typically appears in early adolescence (Cipriano et al., 2017; Nock, 2010) and decreases in late adolescence (Moran et al., 2012). For internalizing problems, such as anxiety and depression, the general tendency appears to be increasing levels in adolescence (Costello et al., 2011), which may be primarily driven by the higher prevalence in girls (Costello et al., 2003).

Studies investigating factors influencing or predicting the development of mental health problems throughout childhood in non-disabled children have focused on a broad range of factors, including child factors, such as sex or temperament, parental factors, such as parental mental health problems or parenting style, and broader environmental factors, such as socio-economic status or bullying (see for example Basu & Banerjee, 2020; Carneiro et al., 2016; Goodman et al., 2011; Moore et al., 2017; Peverill et al., 2021; Rose et al., 2018). While it is reasonable to assume these factors also play a role in the mental health trajectories of children with NDD, it is important to explore whether the presence of an NDD influences the timing of onset, or shape of change over time, or indeed whether there are additional important variables that impact mental health problem outcomes in this population. Participation in important life situations (Imms et al., 2017) is one potentially modifiable variable that may have a greater influence on the mental health of those with NDD than on children without disability, because children with NDD are known to experience significant participation restrictions (Chan et al., 2005; King et al., 2013; Shabat et al., 2021; Shattuck et al., 2011).

Careful definitions and separation of constructs are essential when discussing outcomes such as mental health problems in children with NDDs. In particular, a distinction between mental illness, mental health problems, and mental health is needed (Granlund et al., 2021). In this review, mental illness is defined as a condition meeting the threshold for diagnosis (e.g., depression, anxiety disorder, bipolar disorder, post-traumatic stress disorder). A mental health problem is defined more broadly as encompassing mental illness, but also includes problems of stress or distress that do not meet the diagnostic criteria for illness (Granlund et al., 2021). In the current review, we applied a broad definition of mental health problems, incorporating both internalizing and externalizing problems. Including externalizing problems is important since they have been shown to predict internalizing problems later in the developmental period (Mesman et al., 2001; Wang et al., 2018). Traditionally, in diagnostic manuals, mental health has been defined as the lack of mental health problems. In the last decade, there has been a shift towards a dual continua model in which mental health and mental health problems are seen as separate but related phenomena (Keyes et al., 2002). In the dual continua model, mental health is defined as wellbeing, using a broad definition including emotional, psychological, and social wellbeing. Based on the dual continua model, this study primarily aimed to identify and synthesise the evidence from longitudinal studies of mental health problems of children with NDDs. In addition, we aimed to identify pre-disposing or ameliorating factors that influence outcomes. Further, the study sought to identify which children with NDDs appear to be most at risk of developing mental health problems. Knowledge gained will provide useful information for preventative or protective strategies, along with treatment planning and individual support. The review question was: For children with NDD, what are the longitudinal trajectories of mental health and/or mental health problems, and which factors moderate or mediate the development?

Materials and Methods

Design

This was a systematic review, and the findings are reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Page et al., 2021). The review was prospectively registered on the PROSPERO International Prospective Register of Systematic Reviews in October 2019 (CRD42019142312).

Search Strategy

Searches were conducted in September 2019 and updated in June 2021. Six databases were searched: PubMed, Web of Science, Scopus, PsycINFO, ERIC, and PsycARTICLES. Search terms were developed for the following key concepts contained in the research question: (i) the population of interest was children aged ≤ 19 years; (ii) diagnosed with any NDD; (iii) the outcomes studied were mental health problems; and (iv) the research designs were longitudinal. A sample search strategy is included in Supplementary Table 1. The search strategy was applied with the following limits: publication type to include peer-reviewed journal articles, written in English, and published after 1990.

Study Selection

We sought studies of children and adolescents with NDDs who were followed longitudinally for at least two years and assessed on at least three occasions using outcome measures tracking mental health problems. The criterion of at least a two-year follow-up was used to ensure a wide enough time to identify changes in mental health problems with commonly used screening instruments. The criterion of at least three time points was used to allow for a potential in-depth analysis of the shape of trajectories, since two time points only allow for depicting change as a straight line. Inclusion and exclusion criteria were established to screen for eligible articles as described in Table 1.

Table 1 Inclusion and exclusion criteria for study selection

The search results were downloaded with full bibliographic information, and combined into one data source. Duplicates were removed with the R package revtools (Westgate, 2019). Titles and abstracts were screened using the inclusion/exclusion criteria. Reviewers (n = 22 due to the volume of records) received an Excel worksheet with a record (study/document) identification number and title/abstract for their share of documents to screen. The selection at the title and abstract screening stage was made independently by two reviewers. Any document selected by at least one reviewer at this stage of screening was included in the full text screening stage. Full text screening was undertaken by various pairs of reviewers, blinded to each other’s ratings. Disagreements about inclusion/exclusion between reviewers were resolved by a third reviewer.
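The two decision rules described above (advance a record to full-text screening if at least one of two reviewers selects it; resolve full-text disagreements through a third reviewer) can be sketched as follows. This is a minimal illustration in Python, not the workflow the authors used (screening was done in Excel worksheets and de-duplication with the R package revtools); the function names, record IDs, and votes are hypothetical.

```python
def select_for_full_text(votes_a, votes_b):
    """Return IDs of records that at least one of two reviewers selected.

    votes_a, votes_b: dicts mapping record ID -> True (include) / False (exclude).
    """
    record_ids = set(votes_a) | set(votes_b)
    return sorted(
        rid for rid in record_ids
        if votes_a.get(rid, False) or votes_b.get(rid, False)
    )


def full_text_decision(vote_a, vote_b, third_vote=None):
    """Resolve a full-text decision; a third reviewer settles disagreements."""
    if vote_a == vote_b:
        return vote_a
    if third_vote is None:
        raise ValueError("reviewers disagree: a third reviewer's vote is required")
    return third_vote


# Hypothetical record IDs and votes, for illustration only.
reviewer_a = {"R001": True, "R002": False, "R003": False}
reviewer_b = {"R001": True, "R002": True, "R003": False}
print(select_for_full_text(reviewer_a, reviewer_b))
```

The inclusive rule at the title and abstract stage favours sensitivity: a record is dropped only when both reviewers exclude it, at the cost of more full texts to read.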

Data Extraction

A standardized Excel spreadsheet was developed for data extraction. Extracted information included: study identifiers (authors, year); study setting; study purpose; participant characteristics (diagnosis, age, sex); study methodology (design, frequency, and timing of measurement, analysis approach); main and secondary mental health problem outcomes (measures, respondent/s, data); reported moderators and mediators (e.g., child’s biological sex, socio-economic factors, parent factors); authors’ summary of results and conclusions. Pairs of reviewers extracted data independently, and discrepancies were identified and resolved through discussion (with a third reviewer when necessary). Missing data required for the assessment of relevant studies or data synthesis was requested from the study authors. Where possible, outcome data were extracted from each included study for each mental health problem measure at each time point collected. In cases where the outcome data were visualized in graphical figures, but not presented in numbers or obtained from authors, means for each timepoint were extracted using the metaDigitise package (Pick et al., 2018).

Evaluation of the Risk of Bias

Two reviewers (MI, CI) independently assessed each included article for risk of bias, using the Critical Appraisal Skills Programme checklist for cohort studies [CASP; Critical Appraisal Skills Programme (2018)]. Disagreements were discussed and consensus decisions were taken with the involvement of a third reviewer (HD) when necessary. Some adaptations were made to the CASP based on the specific aims and nature of the current review. First, since the question of precision of results was not possible to answer coherently across a methodologically diverse set of studies, reviewers rated all studies reporting some measure of variability positively (with a “yes”). Second, the question about whether results could be applied to the local population was answered as to whether the findings could be generalized to diverse settings since the reviewers in the current study resided in different local settings (e.g., Australia, Sweden). Third, additional criteria were added to some questions to enhance the consistency of responses by raters (see results for further description).

Data analysis

The analyses were conducted in R (R Core Team, 2022) using RStudio (RStudio Team, 2020) and the manuscript was formatted using the papaja package for R (Aust & Barth, 2022). Given the nature of the research question and the expected heterogeneity of included studies’ methods and data, a meta-analysis was not appropriate. Therefore, a narrative synthesis, guided by Popay et al. (2006), was undertaken to address the primary focus of the review. First, the volume (number of studies, participants, and participant groups) and quality (risk of bias) of the evidence were summarised. Findings were then summarised and described as follows: (i) the longitudinal mental health problem outcomes for those with an NDD, including populations studied, time course (i.e. length of follow-up), and identification of outcomes measured and results; (ii) studies reporting data collected using the same outcome measures were graphed longitudinally to aid in the interpretation of findings; and (iii) where there was evidence of contributing factors to mental health problem outcomes, those factors were identified and described along with strength and direction of relationships with mental health outcomes. Figures were created with the ggplot2 [longitudinal trajectories; Wickham (2016)] and robvis [risk of bias assessment; McGuinness & Higgins (2019)] R packages. Mental health problem trajectories were plotted for outcomes used in more than two studies.

The narrative synthesis was used to consider patterns in outcomes along with variations across populations and settings/situations and provide guidance about at-risk groups. Groups of studies with similar populations, and/or outcomes, and/or time courses were considered together in the narrative synthesis where there were data to support this approach.

Results

Study Selection

The searches resulted in a total of 94,662 records: 80,481 from the original search in 2019 and 14,181 from the updated search in 2021. In addition, reference lists of systematic reviews identified in the searches were manually checked, which resulted in 208 more records (that were then checked for duplicates and screened). Of all records, only 49 publications were included. Figure 1 displays the flow of records through the review and summarises the primary reasons for exclusion. Because our review question focused on longitudinal outcomes, it was common that individual studies were reported within multiple publications over time; at least 18 of the included publications had some degree of overlap in participants with 1–3 of the other included publications.

Fig. 1 PRISMA flow diagram (Page et al., 2021) depicting the flow of information in the study including records identified, included, and excluded at the different phases of the study

Study Characteristics

Table 2 presents the characteristics of the 49 included publications. The largest number of publications came from the United States (n = 23), followed by the United Kingdom (n = 8), Canada (n = 7), Australia (n = 6), the Netherlands (n = 5), and one each from Switzerland, Israel, and Germany. Three studies were published in the 1990s and 13 since 2019 (26.5% of those included).

Table 2 Characteristics of the included studies

The most commonly studied populations were children with ADHD and ASD, represented in 12 publications each. Sample sizes ranged from 10 to 722 (total included participants 9,446). Age at baseline ranged from ≤ 1 to 12.3 years, with four publications reporting outcomes starting in the infancy-toddler period, 18 in the preschool period, 18 in middle childhood, and 17 in adolescence (total numbers ≥ 49, as some publications reported different outcomes at different time points). The follow-up time ranged from 2 to 16.8 years, with occasions of assessment ranging from 3 to 17 times.

The most commonly used assessment tools were the Child Behavior Checklist (CBCL, n = 13 publications) and the Strengths and Difficulties Questionnaire (SDQ, n = 8 publications). Most outcomes were collected as proxy-report from parents, and on occasion, teachers. Self-reported outcomes were collected in five studies. In studies reporting results of diagnostic interviews, the informant was either a parent (in two studies) or the child (three), but the information was interpreted by professionals.

Risk of bias

The summary of the risk of bias assessment is displayed in Fig. 2. Almost all studies had some risk of bias, with the most common risks across studies associated with lack of information about the accuracy of measurement of outcomes, identification of confounders, controlling for confounders in analyses, and completeness of follow-up (i.e., loss to follow up was common). “Addressing a clearly focused issue” was the domain where the risk of bias was lowest. The full list of risk of bias assessments for each domain and included study can be found in Supplementary Fig. 1.

Fig. 2 Summary of the CASP Checklist Risk of Bias Assessments for the included studies (red reflects “high” risk of bias, yellow “can’t tell”, and green “low”)

Longitudinal Trajectories of Mental Health Problems

The main purpose, results, and implications of the included studies, as reported by the original study authors, can be found in Supplementary Table 2. Table 3 displays truncated longitudinal outcomes (first, second, third, and last occasion of assessment) for selected mental health problem scales. Data are organized by the mental health problem scale for ease of comparison across studies and populations. The total number of outcomes displayed is 118, which is lower than the sum of all reported outcomes across the included studies. When multiple scales with different degrees of specificity were reported for the same instrument in a study, only the broadest was included. For example, if the total difficulties scale of the SDQ was reported together with the four specific subscales that constitute it, only the total difficulties scale was included. Studies were also excluded from the table when it was not possible to calculate the trend. For 45.76% (n = 54) of these outcomes, the last data point remained within ± 10% of the baseline measure. In 31.36% of outcomes (n = 37), the mental health problem was reduced by more than 10% from the first to the last time point in the trajectory. In 22.88% (n = 27) of the outcomes, mental health problems increased by 10% or more from the first to the last time point. However, the trend pattern differed when comparing scales measuring aspects of internalizing mental health problems to scales measuring aspects of externalizing mental health problems. In internalizing scales, an upward trend was seen in 31.71% (n = 13) of the selected scales when comparing the first and last time points, a stable trend in 46.34% (n = 19), and a downward trend in 21.95% (n = 9). In externalizing scales, the trend was increasing in 11.43% (n = 4), stable in 42.86% (n = 15), and decreasing in 45.71% (n = 16). A more comprehensive list of outcomes (n = 148), arranged alphabetically by study and outcome, can be found in Supplementary Table 3.
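The first-to-last comparison used above can be sketched as a small classification rule. The following Python snippet is an illustration only, not the authors' analysis code (the review's analyses were run in R). It assumes that change is expressed relative to the baseline score and that a change of exactly 10% counts as stable; the text does not settle how that boundary case was handled. The function names and the example scores are hypothetical.

```python
from collections import Counter


def classify_trend(first, last, threshold=0.10):
    """Label a trajectory by its relative change from first to last time point.

    Assumption: a change of exactly +/- threshold is treated as "stable".
    """
    if first == 0:
        raise ValueError("relative change is undefined for a baseline of zero")
    change = (last - first) / abs(first)
    if change > threshold:
        return "increasing"
    if change < -threshold:
        return "decreasing"
    return "stable"


def summarise_trends(pairs):
    """Return {label: (count, percentage)} for (first, last) score pairs."""
    counts = Counter(classify_trend(first, last) for first, last in pairs)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}


# Hypothetical (first, last) scores, for illustration only.
example_pairs = [(10, 12), (10, 8), (10, 10.5), (20, 20)]
print(summarise_trends(example_pairs))
```

Because only the first and last points enter the rule, a trajectory that rises and then returns to baseline is labelled stable, which is one reason the full trajectories are also plotted.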

Table 3 Selected mental health problem outcomes of included studies across the first three and the last time points sorted by outcome

Figure 3 displays the variation in longitudinal trajectories measured using the conduct and emotional problems subscales of the SDQ reported in six publications involving five different diagnostic samples. When measured using the SDQ, the conduct problem trajectories of children with ADHD were at a higher level than those of children with other NDDs such as ASD, specific language impairments (SLI), and specific speech and language disorders (SSLD). The trajectories of children with language-related diagnoses (SLI and SSLD) were found at the opposite end of the scale, starting and ending at a lower level. The trend was decreasing (> 10%) in six of the eight trajectories when comparing the first and last data points, stable in one, and increasing (> 10%) in one. For SDQ-measured emotional problems, there was an increasing (> 10%) trend for four of the included trajectories when comparing the first and last data points, a decreasing (> 10%) trend for three, and a stable trend for one.

Fig. 3 Longitudinal trajectories for Conduct and Emotional Problems Subscales of the SDQ (panel 1 = Conduct problem subscale; panel 2 = Emotional problems subscale)

Figure 4 displays the variation in longitudinal trajectories measured using the internalizing and externalizing broadband scales of the CBCL reported from 10 publications (nine studies) involving different diagnostic samples. For those studies reporting externalizing problems’ raw scores, three of six trajectories ended at a level 10% or more below the starting point, two remained relatively stable, and one ended at a higher level. Two groups of participants (SLD and CP) followed trajectories starting and remaining at lower levels than the trajectories reported in studies involving children with ADHD, ASD, and developmental delay. The externalizing trajectories reported as T-scores (T-scores are relative to the norm data for the test) showed a stable development in six out of seven studies, indicating that the children with different NDDs kept their position relative to the norm group over time in most cases. Raw score internalizing trajectories had an increasing (> 10%) trend when comparing the first and the last data point in three cases, a decreasing (> 10%) in two cases, and stable in one. T-score trajectories indicated a 10% or more decrease in relation to the norm group in one case and stable trajectories in six cases. Visualizations of additional SDQ and CBCL scale scores are found in Supplementary Figs. 2, 3 and 4.
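Why a T-score trajectory can stay flat while the corresponding raw scores fall is easiest to see with the generic linear T-score, which rescales a raw score to a mean of 50 and a standard deviation of 10 in the norm group. The sketch below is in Python and uses that generic formula as an assumption; published instruments such as the CBCL rely on their own norm tables and may normalise or truncate scores, so this is not the CBCL scoring procedure. All norm values and raw scores are hypothetical.

```python
def linear_t_score(raw, norm_mean, norm_sd):
    """Generic linear T-score: 50 + 10 * z, with z taken against the norm group."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


# Hypothetical values: the child's raw score drops from 20 to 16 (a 20% decrease),
# but the norm group's mean and spread shrink as well, so the relative position holds.
t_first = linear_t_score(raw=20, norm_mean=10, norm_sd=5)
t_last = linear_t_score(raw=16, norm_mean=8, norm_sd=4)
print(t_first, t_last)
```

In this made-up case the raw-score trajectory would count as decreasing under a 10% rule, while the T-score trajectory is unchanged, which is the sense in which a stable T-score means a child kept their position relative to the norm group.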

Fig. 4 Longitudinal trajectories of Internalizing and Externalizing Broad-Band Scales measured using the CBCL (panel 1 = Internalising T-score; panel 2 = Internalising raw score; panel 3 = Externalising T-score; panel 4 = Externalising raw score)

Variables That Influence Trajectories

There was considerable variation in the approach taken to investigate the role of mediators, moderators, or predictors of mental health problem outcomes across the studies. Twenty-seven studies did not assess any influences on the trajectory. Other studies investigated the effect of one or more variables (range 1 to 13). Table 4 displays the variables investigated and those where changes were found to be statistically significant within studies. Variables identified as significantly influencing trajectories (or outcomes at specific time points) suggest the importance of relationships (i.e., quality of parental relationships; peer relationships); severity of symptoms and/or baseline mental health problems; child characteristics (i.e., biological sex, intellectual impairment, child’s communication or language skills); parental health and the socio-economic resources of the family.

Table 4 Investigated and significant associations (predictors, moderators, mediators, and correlations) between the longitudinal trajectories of mental health and other variables in the included studies

Discussion

Synthesis of Findings

The 49 publications reported in this review suggest a growing interest in longitudinal mental health outcomes for those with NDD. The considerable heterogeneity in the populations studied, concepts and scales applied, the length of follow-up, and other methodological aspects risks making any summary of the results simplistic. Nonetheless, some patterns stand out in answer to our research question. First, mental health problems in children with NDD are often stable over time. In almost half of the selected longitudinal outcomes in the present review, change was smaller than 10% when comparing the first and last data points. Second, when a change occurred, it was more likely to be in the form of an upward trajectory for internalizing problems, mirroring findings in children with typical development (Costello et al., 2011), and a downward trajectory for many outcomes that could be described as externalizing behaviours. Third, outcomes measured using the two most commonly applied scales (CBCL and SDQ) show that children with ADHD and ASD tend to start at higher levels of internalizing and externalizing problems than other diagnostic groups.

There was considerable overlap of studies across the included 49 publications because they were longitudinal designs and because of researchers’ interest in disseminating new knowledge as it is learned. This overlap needs to be considered in relation to the overall volume of research available on the topic. The population included in the research is also important to consider: it was more common that children with ADHD or ASD were the populations of interest than those with primary physical disorders, such as CP or SB. One possible explanation for this difference is that it mirrors the prevalence of different NDDs in the population (McIntyre et al., 2022; Polanczyk et al., 2007; Zeidan et al., 2022); another is that ADHD and ASD, unlike CP and SB, are classified as mental disorders. The scarcity of studies about children with CP and SB could also be due to fewer problems in these groups, making them less motivating to investigate. However, this might be an artefact of clinician and researcher focus on the primary movement and sensory disorders, with a relatively more recent awareness of mental health problems in this group and a subsequent lack of longitudinal studies to track impact (Downs et al., 2018; Whitney et al., 2019a, b). In addition, most studies have focused on middle childhood (ages 5 to 15 years), with only a few studies beginning to follow children in earlier childhood or later adolescence. Longitudinal follow-up across the childhood years requires valid, reliable measures that cross the age groups.

Summary of Confidence in the Available Evidence

Confidence in the findings of this review is influenced by the volume, quality, and consistency of the evidence available. The inclusion of 49 studies involving 9,446 participants indicates a reasonable volume of evidence when considering NDD overall. We used the CASP tool for cohort studies to evaluate the quality of the included longitudinal studies with 10 of 14 individual items reported in Fig. 2 and Supplementary Fig. 1. We excluded item 3, as none of the included studies was interventional, and item 8, ‘precision’, as interpretation across the range of methods was problematic. Items 7 (results) and 12 (implications) are reported in Supplementary Table 2 but are not included in the risk of bias summary. Of the included studies, only one (Peverill et al., 2019) rated positively on all ten risk of bias domains. The remaining studies carry some risk of bias (bias scores ranged from 1 to 6 per study). The most common concerns related to incomplete follow-up and a lack of control for potentially confounding factors on outcomes.

Incomplete follow-up is common in longitudinal research, and the longer the desired follow-up, the more difficult it is to maintain the sample’s engagement. Strategies for sustaining participant involvement are important to build trust in the findings, as the direction of influence from loss to follow-up may be difficult to predict. We included studies with at least three occasions of measurement over two or more years as the minimum requirement for a longitudinal study; thus, we considered the risk of bias related to the length of follow-up in this review to be low. Without at least three measures it is not possible to understand the shape of change (Ployhart & Vandenberg, 2010), and with only three data points (as occurred in 26 [53%] studies), peaks or troughs in measurement are difficult to interpret. The findings of this review suggest that repeated follow-up of mental health problems should continue for longer than two years to obtain accurate outcomes and not miss a critical time point for the trajectory, something that is important for our long-term goal of informing future interventions.

Another serious concern in longitudinal research is the inadequate consideration of potentially confounding factors. In this review, only 22 (45%) studies identified, and analytically controlled for or explored, potentially confounding or explanatory factors. It is unlikely that all children will experience the same pattern of outcome. Identifying factors that could be modified (e.g., parental stress) and sub-groups who may require greater support (e.g., those with co-morbid diagnoses) can only be done if there are well-considered hypotheses that are tested over time.

Variables Explored as Predictors, Moderators, or Mediators in Comparison to the Literature on Children Without Disabilities

Few of the included studies reported information on specific variables mediating or moderating the direction of trajectories over time. A more common design was to include baseline variables predicting outcomes at one or more time points or trajectory group membership. No study investigated the influence of the child’s participation on mental health problem outcomes. Many of the factors found to be associated with the longitudinal trajectories of mental health problems in children with NDDs in the present study are similar or identical to factors identified in studies with typically developing children. For example, family socio-economic status has repeatedly been linked to mental health problems in typically developing children (Peverill et al., 2021). Among the studies included in the current review, a significant association was identified between baseline family income and child conduct disorder and oppositional behaviour in children with ADHD (Lahey et al., 2016), parental education and emotional problems in children with ASD (Stringer et al., 2020), and family income at baseline and heightened levels of externalizing and internalizing problems in children with ASD (Vaillancourt et al., 2017). Female biological sex was also associated with higher rates of internalizing symptoms in ASD (Vaillancourt et al., 2017) and ADHD (Lahey et al., 2016), as expected from findings in typically developing populations (Costello et al., 2003). Another example is maternal depressive symptoms, which have been linked to internalizing and externalizing symptoms in typically developing children in earlier research (Goodman et al., 2011), and to children with developmental disabilities in the current review (Hauser-Cram & Woodman, 2016).

Some factors identified in the present study are more closely related to the NDD itself, e.g., the severity of insistence on sameness or other autistic traits in children with ASD, the presence of comorbid conditions (ID and/or ASD), and other aspects of child functioning (e.g., communicative functioning and/or adaptive behaviour). Such child-related factors are of interest when it comes to identifying sub-groups where the risk for mental health problems is higher, but they may be relatively static and thus not ideal as targets for interventions at the individual level. However, other identified factors, such as parenting stress, aspects of the child-parent relationship, or peer relationships, could be more dynamic and therefore feasible as targets for intervention. In addition to identifying single or multiple factors influencing mental health problem outcomes in children with NDDs, knowledge about the cumulative effects of risk and protective factors is needed. Wille et al. (2008) highlight the importance of identifying cumulative risks, as these increase the rates of mental health problems, but it is also important to note that the effects of the risks are moderated by increasing child, family, and social resources.

Measures and Respondents

The two most frequently applied scales in the present review, SDQ and CBCL, are both instruments primarily aimed at screening for emotional and behavioural problems. As such, they do not clearly distinguish between problems that could be primarily related to, or part of, an NDD (such as hyperactivity) and theoretically separate issues, e.g., anxiety or depression. For more specific subscales, such as those capturing emotional difficulties, the risk of confusion may be negligible. However, when applying the broad-band internalizing and externalizing scales, as was often the case in the studies included in this review, there is a risk of confusion between NDD-related difficulties and comorbid problems; for example, communication problems and the peer-problem scale in the SDQ. Consequently, it is not always possible to determine to what degree a change in a longitudinal trajectory reflects a change in a core NDD-related difficulty and/or a separate emotional problem. Items that confound NDD characteristics with mental health problems may also lead to evidence of stronger stability in the problems assessed, which may partly explain the stability seen in the current review. The results of the current study indicate that this problem of conceptual clarity is present in the field but do not clearly describe its extent. Future research will need to further investigate the extent of this conceptual overlap and whether the measures taken to address it are sufficient. Notably, neither SDQ nor CBCL was specifically developed for use with children with an NDD, and their application in these groups has been debated. For example, internal consistency is only modest for children with ID (similar to children without ID) in the SDQ (Emerson, 2005), and in those with ID, the factor structure may deviate from the expected three- or five-factor solution (Haynes et al., 2013).

The present review demonstrates that the assessment of mental health problems in children with an NDD is still highly dependent on parent ratings. Solely measuring profoundly subjective experiences from the perspective of another person is, however, not without problems. Correlations between teacher, parent and child-rated mental health problems are often weak (De Los Reyes et al., 2015). A similar pattern can also be seen in the few studies with multi-informant ratings included in the present review. In the field of quality of life measurement in individuals with ID, an international panel of experts stated some 20 years ago that “proxy measurement […] is not valid as an indication of a person’s perception of his or her life” (Schalock et al., 2002, p. 462), adding that it should always be clearly stated when measurement reflects the perspective of a person other than the participant for whom the outcome is measured. Even though young age and/or level of cognitive impairment set limits for when self-rating is possible, such factors do not explain the lack of a child perspective in many of the studies included in the current review. The consequence of over-relying on parent observations is that overt behaviours visible in some contexts (e.g., home) are over-emphasized in comparison to covert behaviours and other contexts (e.g., among peers).

Strengths and Limitations

The heterogeneity of the results in the present study is a logical consequence of our broad inclusion criteria. For example, the broad definition of mental health problems led to the inclusion of outcomes such as school suspension and absences (Wei et al., 2014). Although such outcomes may not be considered mental health problems in a narrower sense, previous research has demonstrated that children who refuse to go to school have a higher risk of psychiatric disorders than other children (Egger et al., 2003). While it might be appealing to limit the scope of this review to a narrower definition of mental health problems (e.g., only internalizing problems), fewer diagnostic groups, or parts of the developmental period, this would also entail the risk of missing patterns that span these elements. The present study deliberately sought to identify such patterns on an overarching level. The low precision of the searches (included records divided by the total number of records retrieved) indicates that a narrower search strategy could have been used, but also implies that we can be relatively sure that most relevant records have been found. Because of our broad interest, we also retrieved a very large number of records from the searches. To address this volume of work, we involved a large team of reviewers. There is some risk of inconsistency in the selection processes because of the large team; however, varied pairs of reviewers undertook these steps, following written guidelines, and discrepancies were discussed and resolved.

Another limitation is that only studies written in English were included in the review. There is a risk that studies written in other languages and cultural contexts could be systematically different to some degree, thereby introducing some level of bias in the results.

Assessing risk of bias in these studies was limited by the lack of an empirically supported tool specifically for longitudinal research. To increase the consistency in our approach, we developed additional explanatory criteria, and only two authors independently undertook this aspect. Even so, most papers required discussion of at least some criteria to come to consensus decisions.

Our approach to synthesizing the findings may be critiqued for the decision to include only the first three and last data points to interpret the overall direction of the trajectory for each paper. While the figures provide additional information for two measures, this approach did not allow us to analyze the shape of trajectories over time. Of course, all data points in longitudinal research provide key information about how development changes over time and need to be considered to avoid assumptions of linearity or a simplified form of trajectory. By including only four data points, the additional information contained in 21 of the included studies (where data were collected between five and 17 times) is glossed over. Those with a particular interest are of course able to review the original studies.
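The limitation described above can be illustrated with a minimal sketch. The scores below are hypothetical and do not come from any included study. The rule used to label the direction (comparing first and last retained points against an arbitrary tolerance) is an assumption made purely for illustration and is not the classification procedure used in this review.

```python
from typing import List, Sequence


def first_three_and_last(scores: Sequence[float]) -> List[float]:
    """Keep the first three data points and the final one.

    Series with fewer than four points are returned unchanged.
    """
    if len(scores) < 4:
        return list(scores)
    return list(scores[:3]) + [scores[-1]]


def overall_direction(scores: Sequence[float], tolerance: float = 0.5) -> str:
    """Label the change from the first to the last point.

    The tolerance is an arbitrary, illustrative threshold (an assumption).
    """
    change = scores[-1] - scores[0]
    if change > tolerance:
        return "increasing"
    if change < -tolerance:
        return "decreasing"
    return "stable"


# Hypothetical problem scores at eight waves: a mid-study rise that later resolves.
hypothetical_scores = [10.0, 10.2, 10.1, 13.0, 15.5, 14.0, 11.0, 10.3]

reduced = first_three_and_last(hypothetical_scores)
print(reduced)                     # the four retained points
print(overall_direction(reduced))  # label based on the retained points only
print(max(hypothetical_scores))    # the peak at wave five is not among the retained points
```

In this hypothetical series the retained points suggest a flat trajectory, although the full series contains a pronounced temporary increase, which is the kind of non-linear shape that a four-point summary cannot show.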

Implications of the Findings for Practice and Research

It is clearly important to conduct longitudinal outcomes research, particularly through the childhood years and into adulthood, to understand the course of important developmental outcomes. The variation in approaches to longitudinal studies included in this review suggests that encouraging researchers to publish their longitudinal study protocols would be of benefit to the field. Protocol publications require research teams to be explicit, and thoughtful, about their theories, hypotheses, and methods, and they benefit from peer review. In addition, future systematic reviews can take advantage of published protocols when interpreting interim publications.

Our review highlights the need for more careful consideration of measures of mental health problems to ensure that the chosen scales do not confound the NDD diagnosis with additional mental health concerns. Self-report of subjectively experienced phenomena is needed, and this requires us to design and validate measures that support children and adolescents with varying communicative and cognitive support needs to report their experiences.

Since many of the factors found to be associated with the mental health problem trajectories were the same as for typically developing children, it is reasonable to think that many interventions designed for typically developing children targeting these factors would be relevant for children with NDD as well. In that case, the question could be more a matter of the cognitive and physical accessibility of those treatments than of the fundamental theoretical assumptions of the interventions. It is, however, possible that such interventions could become more effective by adding components specifically targeting risk factors that are unique to children with NDD or factors for which exposure is higher than for typically developing children. Parental stress and parental mental health are two examples of modifiable risk factors that were identified in our review. Levels of these risk factors are likely to be higher for children with NDD than for typically developing children, but they are also identified in the literature on typically developing children (Lee, 2013; Miodrag & Hodapp, 2010). Incorporating elements aimed at reducing these issues, or reducing the overall number of risk factors (Wille et al., 2008), could be hypothesized to lead to effects on mental health in the children as well, based on the findings in the present review. The Rusk et al. (2018) synergistic change model takes a dynamic systems-based approach to address sustained changes in mental health/wellbeing, suggesting that lasting change (improved mental health) is more likely if reinforcing (synergistic) changes occur across multiple domains of positive functioning: goals and habits; emotions; attention and awareness; virtues and relationships; and comprehension and coping. It is also important to note that there may be additional factors that are uniquely or unusually strongly associated with mental health problems in children with NDD that research to date has failed to identify, such as participation in important everyday activities. Knowledge of general and NDD-unique risk and protective factors is needed to leverage positive change.

Conclusions

We conducted a systematic review of longitudinal trajectories of mental health problems in children with NDDs. Our findings suggest that the most commonly used tools are screening measures of problems and that trajectories are predominantly stable or show a reduction in problems. However, there is an important lack of self-report in the available data and an over-emphasis on following the mental health problem outcomes of children with particular NDD diagnoses. The measures used are not specifically designed to assess mental health problems in children with NDDs. Some expressions of mental health issues can be confounded with characteristics of impairment, e.g., hyperactivity, communication problems, and peer problems.

The factors found to be associated with the identified mental health problems were similar to factors found in typically developing children. Many of these factors are static and difficult to change on an individual level. To inform the design of targeted interventions, evidence over time about the emergence and resolution of modifiable risk factors is needed.