Planning Skills in Autism Spectrum Disorder Across the Lifespan: A Meta-analysis and Meta-regression

Individuals with an autism spectrum disorder (ASD) are thought to encounter planning difficulties, but experimental research regarding the mastery of planning in ASD is inconsistent. By means of a meta-analysis of 50 planning studies with a combined sample size of 1755 individuals with and 1642 without ASD, we aim to determine whether planning difficulties do exist and which factors contribute to them. Planning problems were evident in individuals with ASD (Hedges' g = 0.52), even when taking publication bias into account (Hedges' g = 0.37). Neither age, nor task-type, nor IQ reduced the observed heterogeneity, suggesting that these were not crucial moderators within the current meta-analysis. However, while we showed that individuals with ASD encounter planning difficulties, the bias towards publishing positive findings restricts strong conclusions regarding the role of potential moderators.


Introduction
Planning is defined as choosing and implementing a strategy in new or routine situations in which a sequence of adolescence, which typically goes hand in hand with age-related improvement in planning ability (Best et al. 2009), with a peak around young adulthood (Anderson et al. 2001; for a meta-analysis see Romine and Reynolds 2005). This developmental pattern is also experienced in daily life by typically developing individuals and reported by their caregivers (Huizinga et al. 2006; Huizinga and Smidts 2011). Little is known, however, about the development of planning ability in people with ASD. With respect to planning tasks, some studies find age-related improvements from childhood to adolescence (e.g. Happé et al. 2006; Pellicano 2010), whereas others find no gains during this transition (e.g., Goldberg et al. 2005; Van Eylen et al. 2015). However, it has been argued (e.g. Luna 2007; Ozonoff and McEvoy 1994) that people with ASD follow a different developmental trajectory with respect to planning than typically developing people, and, thus, age may explain variability across studies in comparing these groups on planning performance. In sum, the substantial development within the frontal striatal network, together with the possible differences in developmental trajectories of planning ability in people with and without ASD, stresses the importance of taking the role of age into account when studying planning.
Secondly, the variety of tasks and dependent measures that are reported may partly explain the heterogeneity in findings of planning performance among people with ASD (Kenworthy 2008; Sergeant et al. 2002). For example, it is suggested that people with ASD perform worse on the standard human-administered neuropsychological tasks (e.g. the Tower of London; Lopez et al. 2005) than on their computer-administered variants (e.g. the CANTAB Stockings of Cambridge subtest; for a review see Kenworthy 2008). This conclusion is, however, tentative, as another study did not find a difference in performance between human and computerized administration of the Tower of London task among people with ASD (Williams and Jarrold 2013). This inconsistency in findings, combined with the plethora of planning tasks available, raises the question of which of these tasks is most suitable and robust in its findings with regard to people with and without ASD.
Thirdly, variability in intellectual ability (IQ) is posed as a possible moderator of planning performance among people with ASD (Hill 2004; Kenworthy 2008). Some studies show that group differences between ASD and TD on planning measures are more prominent at lower IQ levels (e.g. Hughes et al. 1994). Also, IQ is sometimes found to be more strongly related to performance on cognitive measures in people with ASD than in TD individuals (Brunsdon et al. 2015). However, to date, no systematic review has investigated the role of IQ in planning performance among people with ASD as compared to TD people.
Based on the above, it seems imperative to systematically review the literature on planning ability and articulate the magnitude of the supposed planning deficits in ASD across the lifespan. Furthermore, it seems valuable to investigate other sources of inconsistencies such as the variety of tasks and dependent measures that are reported and the range of intelligence across groups. To this end, this study provides the first comprehensive quantitative review of, to the best of our knowledge, all studies of planning performance in ASD that fall within our inclusion criteria. By means of a meta-analysis and meta-regression, we aim (1) to present the magnitude of possible planning performance deficits in ASD; (2) to describe potential developmental changes in planning performance across the lifespan; (3) to conceptualize which of the several planning measures is most consistent (i.e., robust) in its findings when comparing people with and without ASD; and (4) to investigate whether intelligence levels have an effect on the observed findings when comparing people with and without ASD on planning performance.

Literature Search Strategy
In May and November 2015, a systematic literature search was performed using the online databases PsycINFO, Web of Science, and PubMed. PsycINFO was chosen because it is most frequently used within the behavioral and social sciences and indexes many psychology journals. Web of Science was selected because of its interdisciplinary nature and the high quality of the indexed journals. Finally, given that ASD is seen as a psychiatric disorder and highly comorbid with various medical conditions, PubMed was included to cover the medical journals.¹ The search was done with the following terms of interest related to ASD (autism; autistic disorder; pervasive developmental disorder; Asperger; ASD; PDD-NOS) combined with terms related to planning (planning; executive function; Tower; Tower of London (ToL); Tower of Hanoi (ToH); Stockings of Cambridge (SoC); Behavioral Assessment of the Dysexecutive Syndrome (BADS); Mazes; CANTAB; WISC; NEPSY; D-KEFS; BRIEF). Reference lists of selected papers were also checked in search of relevant studies.
¹ PubMed is one of the biggest and most widely used medical databases and largely indexes psychiatry.

Eligibility Criteria
Studies were only included if they met the following eligibility criteria: (1) ASD participants were the population being studied and they met diagnostic criteria according to the DSM-III, DSM-III-R, DSM-IV, DSM-IV-TR, DSM-5, or ICD-10 (defined by clinical diagnosis, autism questionnaires, interviews or observation schedules; please see Table 1 for details); (2) a typically developing (TD) comparison group was included; (3) experimental or clinical neuropsychological planning tasks were used;² (4) studies provided outcome data sufficient and suitable for the calculation of effect sizes, either in the published study or upon request; (5) articles presented original data; (6) studies were written in English and published in a peer-reviewed journal between 2003 and November 2015. Preceding studies on planning performance in ASD were included based on the reviews by Hill (2004) and Sergeant et al. (2002) if they met our eligibility criteria.

Study Selection
Titles and abstracts of retrieved records were screened for eligibility. Studies were excluded if they clearly did not meet our inclusion criteria. After this initial screening, the full texts of the remaining records were screened for eligibility. The corresponding authors of articles that did not report sufficient data for the calculation of effect sizes and/or the moderator analysis were contacted to try to retrieve the missing data, as well as any unpublished data on the subject. None of the replies included such unpublished data. Studies that fulfilled the criteria (either immediately or after receiving additional data from the corresponding authors) were included in the meta-analysis. An independent researcher checked the full-text screening and the extracted data of the selected studies. Any disagreements between the first author and this researcher were discussed and resolved with a third assessor. See Fig. 1 for a flow diagram of the search results.
² Note that we chose not to include studies using the Trail Making Test (Reitan and Wolfson 1985), which was reported on in the last qualitative review of Hill (2004). Rather than a pure measure of planning, it assesses a number of different functions related to mental flexibility (Crowe 1998; Delis et al. 2001). In addition, tasks were not included if they did not assess the cognitive ability of thinking ahead, such as motor planning tasks, or tasks that were not commonly known in the planning literature and of which we, therefore, did not know whether they validly assessed planning. For example, one of the tasks that we did not include was the Question Discrimination and Plan Construction task used in Alderson-Day (2011), as this task was not used in any other ASD planning study and is not widespread in the planning literature.

Data Collection Process
This study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analysis Protocol (PRISMA-P) flow diagram and checklist (Moher et al. 2015). The literature search generated 4618 hits; an additional nine articles were screened for eligibility from the reviews by Hill (2004) and Sergeant et al. (2002). Based on titles and abstracts, the number of articles was narrowed down to 162 studies. After full-text screening, 106 studies did not meet inclusion criteria according to the first author and an independent researcher. Reasons for excluding studies were the absence of an ASD group (n = 6) or TD comparison group (n = 23), no assessment of an experimental or clinical neuropsychological planning task (n = 68), the non-experimental nature of the study (e.g. a review or case report; n = 5), or publication outside an English-language peer-reviewed journal (n = 4).
Of the 56 studies that met inclusion criteria, 7 studies reported insufficient information to calculate the effect size (Booth et al. 2003; Just et al. 2007; Lin et al. 2013; McLean et al. 2014; Olivar-Parra et al. 2011; Ruta et al. 2010; Sinzig et al. 2008). Corresponding authors were contacted and one provided the requested information (Sinzig et al. 2008). Therefore, 50 studies were included in our meta-analysis. This resulted in a combined sample size of 1755 participants with ASD and 1642 TD comparison individuals (see Table 1). Twenty-six studies were conducted with childhood samples (mean age ≤12 years), 11 studies with adolescent samples (mean age 13-18 years), and 13 studies with adult samples (mean age >18 years). All the study information listed in Table 1 was first recorded by the first author and then verified by an independent researcher.
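The study flow reported above can be verified with simple arithmetic. The short sketch below only re-adds the counts given in the text; the function and variable names are illustrative and not part of the original analysis.

```python
# Counts copied from the text; names are hypothetical labels for illustration.
exclusions = {
    "no ASD group": 6,
    "no TD comparison group": 23,
    "no planning task": 68,
    "non-experimental study": 5,
    "not English-language peer-reviewed": 4,
}
age_groups = {"children": 26, "adolescents": 11, "adults": 13}


def study_flow(full_text_screened, exclusions, insufficient_data, data_recovered):
    """Return (eligible, included) study counts for a screening flow."""
    eligible = full_text_screened - sum(exclusions.values())
    included = eligible - insufficient_data + data_recovered
    return eligible, included


eligible, included = study_flow(162, exclusions, insufficient_data=7, data_recovered=1)
print(eligible, included, sum(age_groups.values()))
```

The counts are internally consistent: 162 screened full texts minus 106 exclusions leaves 56, and 56 minus the 6 studies whose data could not be retrieved leaves the 50 included studies, which also equals the sum of the three age groups.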

Dependent Variables
We recorded the dependent measure for each task. It is important to note that despite the use of similar tasks, the studies differed considerably in the reported dependent measure. In addition, the majority of studies reported more than one dependent measure for the task of interest. Therefore, we selected the measure that best reflected planning and was most commonly reported among the included studies. If this measure was not reported, we requested these data from the corresponding author or, if not available upon request, selected the next measure most demonstrative of planning. When two or more dependent variables were considered to reflect planning equally, we tried to reduce heterogeneity by selecting the variable most frequently reported in other included articles. The selection of dependent measures was made before effect sizes were calculated to minimize experimenter bias. Eight studies presented multiple planning tasks. To prevent dependency in our data and extra weight being assigned to these studies in the meta-analysis, we chose to combine these effect sizes within the same study into one effect size per study (Borenstein et al. 2009), using an earlier reported inter-test correlation (range 0.41-0.63). If this correlation was not available, we used an inter-test correlation of 0.7 as the tasks are supposed to all measure planning ability (rule of thumb in meta-analysis, see Borenstein et al. 2009). See Table 1 for the dependent measure that was selected per task.³ For each continuous outcome, a standardized mean difference (Hedges' g; Hedges and Olkin 1985) was calculated: the difference between the mean score of the ASD group and TD group divided by the pooled standard deviation per planning measure in each study (see Table 1). This effect size is widely used, easily interpretable and can be calculated from t-test statistics (Borenstein et al. 2009; Turner and Bernard 2006). Effect sizes were interpreted as follows: g = 0.2-0.5 is small; g = 0.5-0.8 is medium; g > 0.8 is large. Thus, a smaller Hedges' g stands for a smaller distinction between the ASD and TD group. A positive effect size indicates poorer performance by the ASD group as compared to the TD group, whereas a negative effect size indicates that the ASD group outperformed the TD group.
³ A rerun of our meta-analysis in which we set the inter-test correlation to r = .41 for the studies of which the inter-test correlation was unknown gave the same main outcome of a significant medium positive effect size of 0.52.
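As an illustration of the two computations described above, the following sketch computes Hedges' g with its sampling variance and combines several correlated effect sizes from one study into a single composite, following the standard formulas in Borenstein et al. (2009). It is a minimal sketch and not the code used for the analysis; the function names and the `higher_is_worse` flag (used to orient the sign so that positive values mean poorer ASD performance) are assumptions made here.

```python
import math


def hedges_g(mean_asd, sd_asd, n_asd, mean_td, sd_td, n_td, higher_is_worse=True):
    """Return (g, var_g): bias-corrected standardized mean difference."""
    df = n_asd + n_td - 2
    s_pooled = math.sqrt(((n_asd - 1) * sd_asd ** 2 + (n_td - 1) * sd_td ** 2) / df)
    d = (mean_asd - mean_td) / s_pooled
    if not higher_is_worse:
        d = -d  # flip so that positive always means poorer ASD performance
    var_d = (n_asd + n_td) / (n_asd * n_td) + d ** 2 / (2 * (n_asd + n_td))
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    return j * d, j ** 2 * var_d


def combine_within_study(effects, variances, r=0.7):
    """Average m correlated effect sizes from one study into one composite.

    r is the assumed inter-test correlation (0.7 is the default named in the text).
    """
    m = len(effects)
    composite = sum(effects) / m
    total = sum(variances)
    for i in range(m):
        for k in range(m):
            if i != k:
                total += r * math.sqrt(variances[i] * variances[k])
    return composite, total / m ** 2


g, var_g = hedges_g(10, 2, 20, 8, 2, 20)
print(round(g, 3), round(var_g, 3))
print(combine_within_study([0.4, 0.6], [0.04, 0.04], r=0.7))
```

A higher assumed correlation yields a larger composite variance, so treating tasks as strongly related is the more conservative choice with respect to the weight a multi-task study receives.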

Data Analysis
The data were analyzed using the Metafor package for R (Viechtbauer 2010). Variability among the true effects was expected due to differences in methods and sample characteristics between studies. In order to account for this within- and between-study variation, a random effects model was chosen. In this procedure, the effect size is corrected for sample size of each individual study before the weighted average effect size of planning performance across studies is calculated. A significant degree of between-study variation would imply heterogeneity between studies, driven by additional factors other than planning ability. Therefore, the test of homogeneity of effects was performed (Q statistic). Since this test does not quantify the amount of between-study variation, we also estimated the amount of residual heterogeneity (τ²) and ratio of true to total variance (I²). The I² is interpreted as the proportion of the observed variability in a set of effect sizes that reflects real differences among true effects (Borenstein et al. 2009). Next, random-effects meta-regression techniques with restricted maximum likelihood estimation were applied to determine possible moderating effects of age. Age was indexed as the mean age of the ASD participants. IQ was explored using the same technique. For task-type, a subgroup analysis was performed to compare the mean effect for different subgroups of studies using the same type of planning task [Tower; BADS (BADS Zoo Map test, BADS Key Search test, BADS Six Elements test and Mazes); CANTAB (SoC)]. The effect of each moderator was tested separately.
Fig. 1 Flow diagram: meta-analysis of planning performance in people with ASD. Six additional studies were excluded from the synthesis because they provided insufficient data to estimate effect sizes after contacting the corresponding authors
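To make the quantities named above concrete, the sketch below pools effect sizes under a random effects model and reports Q, τ², and I². Note the assumption: it uses the closed-form DerSimonian-Laird estimator of τ² rather than the estimator applied in the actual analysis, so it illustrates the logic and will not reproduce the reported values exactly. The function name is hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # percent
    return {
        "estimate": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "z": pooled / se,
        "Q": q,
        "df": df,
        "tau2": tau2,
        "I2": i2,
    }


result = random_effects_dl([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print({key: result[key] for key in ("estimate", "Q", "tau2", "I2")})
```

In this toy input the three studies are equally precise, so the pooled estimate is simply their mean, while τ² and I² show how much of the spread exceeds what sampling error alone would produce.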
The presence of publication bias was assessed with a funnel plot, a regression test for funnel plot asymmetry, and the trim and fill method (Duval and Tweedie 2000). The fail-safe N analysis (Rosenthal 1979) was performed to indicate the robustness of the overall effect.
Fig. 2 Forest plot indicating effect sizes (Hedges' g) and 95% confidence intervals for each study effect included in the meta-analysis. Positive effect sizes indicate worse planning performance in the ASD group as compared to the TD group while negative effect sizes indicate that the ASD group outperformed the TD group

Overall Results of Planning Performance in ASD versus TD
The random effects meta-analysis showed a significant medium positive effect size (Hedges' g) of 0.52 (95% CI 0.39-0.66; range −0.53 to 2.27), indicating that individuals with ASD perform worse on planning tasks as compared to TD controls (z = 7.57, p < .0001). As expected, there was significant heterogeneity in effect sizes across planning studies (Q(49) = 161.7, p < .0001; τ² = 0.16; I² = 71.43%). The forest plot in Fig. 2 depicts the summary effect and individual effect sizes of planning performance by ASD as compared to TD.

Outliers and Publication Bias
To investigate the presence of influential data points or outliers, we visually inspected the forest plot (Fig. 2) and calculated Cook's distance. Cook's distance was below one for all effect sizes which suggests that there were no outliers. In addition, a leave-one-out analysis showed that leaving any study out of the meta-analysis would not change the overall results. Finally, a QQ-plot confirmed that there is a normal distribution of effect sizes. None of these methods thus revealed any outliers.
A regression test for funnel plot asymmetry was significant (z = 2.66, p = .008), and therefore suggested the presence of publication bias. This was confirmed by the trim and fill method (Duval and Tweedie 2000), which estimated that 11 unpublished studies were missing on the left side of the funnel plot (see Fig. 3). Inclusion of these missing studies would decrease the overall summary medium effect size of 0.52 to a small effect size of 0.37 (95% CI 0.21-0.51). Rosenthal's fail-safe N analysis demonstrated the robustness of the overall effect (3401 null findings are needed to nullify the overall significant effect). Hence, the observed overall effect size is still of relevance, but we must consider a moderate impact of publication bias in the research of planning performance in people with ASD as compared to TD individuals.
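The logic of the fail-safe N can be sketched as follows, using Rosenthal's (1979) formula N = (ΣZ)² / z_α² − k, where Z is each study's standard normal deviate and k the number of studies. This is an illustrative sketch with made-up inputs, not a reproduction of the reported value; deriving Z as effect size divided by its standard error and rounding the result up are assumptions made here.

```python
import math
from statistics import NormalDist


def rosenthal_fail_safe_n(effects, variances, alpha=0.05):
    """Number of unpublished null results needed to bring the combined
    one-tailed p-value up to alpha (Rosenthal 1979)."""
    z_values = [y / math.sqrt(v) for y, v in zip(effects, variances)]
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = .05
    k = len(z_values)
    n = sum(z_values) ** 2 / z_crit ** 2 - k
    return max(0, math.ceil(n))


# Four hypothetical studies, each with g = 0.5 and variance 0.04 (Z = 2.5)
print(rosenthal_fail_safe_n([0.5] * 4, [0.04] * 4))
```

Because the formula depends on the summed Z values, a literature with many moderately significant studies yields a large fail-safe N even when the pooled effect itself is modest, which is why it is reported alongside, rather than instead of, the trim and fill adjustment.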

Age
Age did not significantly moderate the effect sizes across planning studies (Q_AGE = 2.89, p = .09), and heterogeneity in effect sizes remained significant (Q(48) = 152.65, p < .0001; τ² = 0.15; I² = 70.30%). Based on the discussion in non-experimental research whether an increase in planning difficulties in ASD can be found around adolescence (Van den Bergh et al. 2014 versus Rosenthal 2013), we also inspected a quadratic relationship. We inserted age as a centered quadratic predictor, and found no support for a quadratic association between age and planning performance (Q_AGE² = 2.62, p = .11). Furthermore, heterogeneity between studies remained significant (Q(48) = 156.11, p < .0001; τ² = 0.15; I² = 70.74%). However, in one study the mean age of the ASD participants deviated far from the grand mean age of ASD participants (Geurts and Vissers 2012). Visual inspection of the corresponding boxplot showed that this study was indeed an outlier. Excluding this study did, however, not alter our age-related findings as age was still not a relevant moderator (linear: Q_AGE = 0.72,

Task-type
The studies were classified according to the following type of tasks: BADS (n = 13), CANTAB (n = 7) or Tower (n = 28). Two studies did not fall in any category and were, therefore, excluded from the moderator analysis (Brunsdon et al. 2015; Taddei and Contena 2013). Task-type was not a significant moderator of effect sizes across planning studies (Q_TASK = 0.10, p = .95) and heterogeneity between studies remained significant (Q(45) = 138.73, p < .0001; τ² = 0.14; I² = 66.38%).

IQ
Forty out of 50 studies included estimates of total IQ. For IQ, no outliers were detected. IQ did not significantly moderate the effect sizes across these studies (Q_IQ = 2.56, p = .11) and heterogeneity in effect sizes remained significant (Q(38) = 94.41, p < .0001; τ² = 0.09; I² = 59.92%).
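The moderator tests for a single continuous predictor such as mean age or IQ can be illustrated with a weighted least-squares meta-regression. The sketch below is a simplification: it takes the residual between-study variance `tau2` as a given input instead of estimating it, and all names and example data are hypothetical. With one moderator, the omnibus Q statistic has one degree of freedom and equals the squared z-value of the slope, so its p-value can be read from the standard normal distribution.

```python
import math
from statistics import NormalDist


def p_from_q_df1(q):
    """p-value of a chi-square statistic with 1 degree of freedom."""
    return 2 * (1 - NormalDist().cdf(math.sqrt(q)))


def moderator_test(effects, variances, moderator, tau2=0.0):
    """Weighted regression of effect size on one moderator.

    Weights are 1 / (sampling variance + tau2); tau2 is supplied by the
    caller here, which is a simplifying assumption of this sketch.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx
    se_slope = math.sqrt(1.0 / sxx)
    q_mod = (slope / se_slope) ** 2
    return {
        "intercept": y_bar - slope * x_bar,
        "slope": slope,
        "se_slope": se_slope,
        "Q_mod": q_mod,
        "p": p_from_q_df1(q_mod),
    }


# Hypothetical data: effect sizes rising by 0.02 per year of mean age
fit = moderator_test([0.3, 0.5, 0.7, 0.9], [0.04] * 4, [10, 20, 30, 40])
print(round(fit["slope"], 3), round(fit["Q_mod"], 2), round(fit["p"], 3))
print(round(p_from_q_df1(2.89), 2), round(p_from_q_df1(2.56), 2))
```

The last line converts the reported one-degree-of-freedom Q values for age (2.89) and IQ (2.56) into p-values, which agree with the reported p = .09 and p = .11.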

Discussion
The aim of the present meta-analysis was to systematically and quantitatively review the experimental literature on planning performance in ASD to examine whether people with ASD encounter difficulties with this skill. In line with non-experimental research, the meta-analysis revealed that people with ASD indeed show poorer planning performance as compared to typically developing (TD) individuals. This difference was moderate in size and consistent across the lifespan, various types of planning tasks, and different intelligence levels. However, please note that examination of publication bias indicated that there may be missing studies with negative effect sizes (i.e., individuals with ASD outperforming people without ASD with respect to planning) in our meta-analysis. Hence, the true effect size might be smaller, but planning deficits do still seem to exist in people with ASD.
As suggested in one of the last narrative reviews on planning performance (Hill 2004), we investigated whether age influenced performance on planning tasks. Despite a rather broad age range across studies (5-64 years of age) and the inclusion of 50 studies, age did not moderate the variability in findings across studies that compared people with ASD to TD individuals on planning ability. More specifically, people with ASD seem to have persistent planning deficits throughout their life, unable to attain the performance level of TD individuals. This is in line with previous reports on planning (e.g. O'Hearn et al. 2008), and suggests that the developmental trajectory of people with ASD runs parallel to, but below, the trajectory of TD individuals. To date, only one prospective longitudinal study has examined this trajectory in young children with ASD (4-7.3 years) and found age-related gains in executive functioning (including planning) (Pellicano 2010). Moreover, studies focusing on middle-aged and older people with ASD were rather scarce in the current meta-analysis. Hence, longitudinal studies across the whole lifespan are needed to test whether a parallel pattern can be replicated, and to improve our understanding of the developmental trajectory of planning skills in ASD.
In addition, we found that several planning measures seem to be equally consistent in their findings when comparing people with and without ASD on planning performance. Even though the measures differ from each other in, for example, difficulty level, instruction, and structure of the task, they all yield medium effect sizes; that is, all find a moderate deficit with regard to planning ability in people with ASD as compared to people without ASD. This suggests that, contrary to what some previous reports claimed (e.g. Kenworthy 2008; Sergeant et al. 2002), the task-type cannot explain discrepant findings in the literature. Hence, when focusing on the most commonly used tasks, the type of planning task does not seem to be crucial when assessing planning abilities in ASD.
Finally, the variability in effect sizes across studies could not be explained by intelligence level. While IQ is strongly related to general executive functioning (e.g. Dang et al. 2014; Friedman et al. 2006), and therefore planning ability, it does not impact the difference in planning performance between those with and without ASD. Our findings should, however, be interpreted with caution, as the number of included studies in the moderator analysis was smaller than in the overall meta-analysis due to missing IQ estimates in ten studies. Moreover, the majority of the included studies and five of the studies that missed exact IQ estimates only assessed people within the normal intelligence range (IQ > 70). Therefore, our finding may not generalize to lower ranges of IQ. Previous reports show that the effect of intellectual ability on ASD outcome is more pronounced in groups of individuals with ASD with IQs within the lower ranges (Matson and Horovitz 2010; McGovern and Sigman 2005). As IQ is strongly related to executive functioning, findings on planning tasks of these individuals are hard to interpret due to their restricted cognitive abilities. Poor performance on measures of planning in individuals with a low IQ may be, at least in large part, attributable to their below average IQ rather than a planning deficit per se. This is in line with the statement by Hill and Bird (2006) that executive function difficulties that are related causally to ASD are most likely to be found in their most pure forms in individuals with ASD with a higher IQ. Although it should be stressed that it is a limitation that we could not investigate these lower ranges of intellectual ability, the fact that our meta-analysis mostly included individuals with ASD within the normal IQ range strengthens the finding that there is indeed a planning deficit in people with ASD as compared to TD individuals.
Although a difference was observed between people with ASD and TD individuals with respect to planning ability, there was a large amount of heterogeneity in these differences across studies that we could not explain by our pre-specified factors. This complicates the interpretation of our findings as it suggests that there must be additional factors that influence the difference in planning performance among people with and without ASD. Five potential candidates come to mind that may moderate planning ability in ASD. First, severity of ASD symptoms may affect planning performance. Across studies included in this meta-analysis, all DSM-IV-TR (APA 2000) subtypes of ASD (Asperger, PDD-NOS or autism) were assigned to one overall ASD category (in line with DSM-5; APA 2013) due to missing specification of this information within the studies. However, some previous reports suggest that EF difficulties increase as symptoms of ASD are more severe. For example, Bölte et al. (2011) found that greater planning difficulties were associated with higher scores for stereotypic, ritualized behavior and interests on the ADI-R and ADOS. These symptoms thus may specify the extent of planning difficulties, and, therefore, ASD symptomatology might even be more interesting to investigate as a moderator of planning ability. Unfortunately, we were unable to do this in the current study, as information on ASD symptomatology was not sufficiently reported in the included studies and the studies that did report on ASD symptoms used a variety of measures, which complicates a moderator analysis. We therefore recommend that future studies test the relationship between ASD symptomatology and planning ability in people with ASD.
Second, comorbid psychopathology may influence performance on lab-based planning measures in people with ASD. People with ASD have higher rates of psychiatric comorbidity than people without ASD; 69% of people with ASD as opposed to 40% of typically developing people meet criteria for another psychiatric disorder at least once in their life (Buck et al. 2014; Kessler et al. 2005). Psychiatric disorders other than ASD are also related to poorer cognitive functioning (e.g. McDermot and Ebmeier 2009), and thus, it may be that the higher incidence of psychiatric comorbidity in ASD partly explains why planning is worse in those with ASD as compared to people without ASD. Further study is needed to determine the potential impact of comorbid psychiatric disorders on planning performance in people with ASD.
Third, the use of psychotropic medication may affect planning performance in people with ASD. It is well known that the majority of people with ASD use some type of psychotropic medication (Esbensen et al. 2009), especially those with comorbid psychiatric disorders (Coury et al. 2012), and that the use of this medication can have adverse effects on cognitive performance (Agay et al. 2010; Amado-Boccara et al. 1995; Linssen et al. 2014). We, therefore, recommend including measures of these factors in future studies on planning performance in people with ASD in order to further explain the heterogeneity across planning studies.
Fourth, related to the impact of IQ, mental age might also be an informative factor for moderation as two individuals with the same IQ may be functioning on different developmental levels (i.e., have a different mental age). Mental age might, therefore, capture the individuals' level of intellectual functioning better than IQ tasks. However, we could not explore the impact of mental age on the variability in effect sizes in our meta-analysis as only five studies reported on mental age. This number is insufficient to make any valid statements concerning moderation of planning deficits in ASD by mental age, but this moderator should be investigated in future studies.
Fifth, the choice of comparison group may impact meta-analytic findings. For example, using a comparison group of unaffected siblings of individuals with ASD or a specific clinical group will lead to different, possibly smaller, effect sizes than using a TD comparison group. However, as only one of the included studies had unaffected siblings as the comparison group (Bölte et al. 2011), it is unlikely that this affected our results.
An additional important avenue of future research that cannot be covered within a meta-analysis is the investigation of individual differences in planning ability. Previous studies show that individual differences in planning ability exist in both people with and without ASD (e.g. Brunsdon et al. 2015; Hill and Bird 2006; Hughes et al. 1994; Wallace et al. 2016). Focusing on individual differences instead of group comparisons can help to determine whether specific subgroups within the ASD group exist with respect to planning performance.
Despite the limitations in relation to the unexplained heterogeneity and the publication bias, the observed planning difficulties of people with ASD as compared to typically developing individuals underline that there might be room for improvement with respect to the planning abilities in people with ASD. As planning is so key to our daily life, interventions aimed at improving this skill might be helpful for people with ASD. The meta-analysis further suggests that it is important that null findings (and counterintuitive findings) are published, as only then can we determine which of the currently studied factors (i.e., moderators) influencing planning abilities can indeed be fully dismissed. Nonetheless, we also need to investigate additional factors that could explain heterogeneity in effects to help unravel the planning deficit among individuals with ASD.