Cohorts and Descriptive Analysis
Figure 1 and Table 1 display the flow chart of key populations and their characteristics, respectively. The majority of participants completed all items (n = 4415; 88.3%). This ‘completers’ group was used as the main analysis group. Around two-thirds of completers (66.4%) completed the AIM in the vertical layout.
Respondents with a complete AIM were mainly mothers (92.7%), with a mean (SD) age of 38.74 (7.20) years. All 50 states of the US were represented, as well as some overseas territories. Children with ASD had a mean (SD) age of 9.01 (3.90) years and were mainly male (79.9%). Almost a quarter of children (23.1%) attended full-time special education school, while 45.2% spent between 60% and 100% of school time with typically developing peers. Of those with SCQ available, 83.5% were verbal (according to item 1 of the SCQ). The only qualitatively notable difference between caregivers who used the vertical rather than the horizontal format was their slightly younger mean age (37.7 vs. 40.8 years). Furthermore, there were no notable differences between completers, non-completers, and those who had SCQ and/or RBS-R data available for linkage.
The median time to complete the AIM was 7.08 min [IQR 5.53–9.82]. The median time was just over one minute faster for completers on the horizontal format (median [IQR] 6.28 min [4.90–8.63]) than on the vertical format (median [IQR] 7.47 min [5.97–10.45]). A minority (4.1% in both vertical and horizontal format) took over one hour to complete all questions.
Item Level Analysis
Full item-level analyses are summarized in supplementary Table S1. Responses to most items were approximately normally distributed. None of the items had a ceiling effect, but 5 had a floor effect, which was defined by a median response of 1. These items were: Q3 “lined things up” [impact only; repetitive behavior domain]; Q5 “used hand over hand” [frequency and impact; communication domain]; and Q27 “used made-up or private language” [frequency and impact; communication domain].
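The floor-effect rule described above (a median response of 1) can be illustrated with a minimal sketch. It assumes a 1 to 5 response scale and a symmetric ceiling rule (median equal to the highest category); the text only defines the floor rule, so the ceiling rule is an assumption. The function name, item names, and responses are hypothetical and invented for illustration.

```python
from statistics import median


def flag_floor_ceiling(responses, lowest=1, highest=5):
    """Classify one item from its responses, ignoring missing values (None).

    Assumptions (not confirmed by the source): responses run from 1 to 5,
    and a ceiling effect mirrors the floor rule (median == highest).
    """
    observed = [r for r in responses if r is not None]
    m = median(observed)
    if m == lowest:
        return "floor"
    if m == highest:
        return "ceiling"
    return "none"


# Hypothetical responses, invented purely for illustration.
items = {
    "item_a_impact": [1, 1, 1, 2, 1, 3, None, 1],
    "item_b_frequency": [3, 4, 2, 5, 3, 4, 3, None],
}
flags = {name: flag_floor_ceiling(values) for name, values in items.items()}
print(flags)
```

Missing responses are dropped before the median is taken, in line with the item-level summaries being reported disregarding missing values.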
Disregarding missing values, the item with the highest (most severe) mean score (3.90) was Q38 “engaged in chit-chat” [frequency; social reciprocity domain]. Furthermore, the five highest scoring items were all frequency questions, and only three of the 20 highest scoring items (mean ≥ 3.02) were impact questions. Only two of the 20 lowest scoring items (mean ≤ 2.28) were frequency questions. Mean scores for each item were not systematically higher or lower in the vertical or the horizontal layout.
Overall, there was very little missing data on an item-by-item basis. Some questions had as few as 10 missed responses across the whole sample (0.20%). Q36 “showed interest in others” [impact] was most frequently missed, but still only by 76 participants (1.52%). All items were more often missing on the horizontal format, although the highest rate of missing data in this layout was only 2.46% (Q36 impact). In general, impact questions were more commonly missing than frequency questions.
Internal Consistency
Cronbach’s alpha for the total AIM score was 0.96, well above the pre-specified threshold of 0.7 for good internal consistency. Frequency items and impact items also showed high internal consistency (0.96 and 0.95, respectively), as did each of the individual domains (from 0.79 for Social Reciprocity to 0.91 for Communication). The median (IQR) of all inter-item correlations was r = 0.15 (0.22–0.30) and only the correlation between frequency and impact scores for Q6 “problems with speech” was higher than 0.90. These results indicate little item redundancy.
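For readers unfamiliar with the statistic, the standard formula for Cronbach’s alpha can be sketched as follows. This is a minimal illustration on complete cases, not the analysis code used for the AIM; the function name and the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data invented for illustration: 5 respondents, 3 items.
toy = [[1, 2, 1], [2, 3, 2], [3, 3, 4], [4, 5, 4], [5, 5, 5]]
alpha = cronbach_alpha(toy)
print(round(alpha, 3), alpha > 0.7)
```

The final comparison mirrors the pre-specified threshold of 0.7 mentioned above.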
All domains were positively and moderately inter-related according to Spearman’s rank coefficient (Table 2). The weakest relationship was between Repetitive Behavior and Social Reciprocity (0.39). The strongest relationship was between Repetitive Behavior and Atypical Behavior (0.67). Domain correlations were very similar with both Spearman and Pearson correlation methods, indicating that relationships between domain scores were linear.
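The comparison of Spearman and Pearson coefficients can be sketched as below: Spearman’s coefficient is Pearson’s coefficient applied to ranks, so the two agree closely when a relationship is roughly linear and diverge when it is monotonic but curved. This is an illustrative sketch with hypothetical function names and invented data, not the code used in the analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: a monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman(x, y), pearson(x, y))
```

In this invented example the Spearman coefficient is higher than the Pearson coefficient because the relationship is curved; near-identical values from both methods, as reported for the AIM domains, are consistent with linear relationships.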
Table 2 AIM Inter-domain Spearman rank correlations
Convergent Validity
The total AIM score showed good convergent validity with the total SCQ score (r = 0.55, Table 3). Each individual AIM domain was also positively correlated (r ≥ 0.34) with the total SCQ score. As hypothesized, the SCQ Reciprocal Social Interaction domain had the highest correlations with the AIM Social Reciprocity (0.48) and Peer Interaction (0.45) domains. Also as expected, the SCQ Repetition/Stereotyped Behavior domain had the strongest relationship with the AIM domains of Repetitive Behavior (0.48) and Atypical Behavior (0.34). However, none of the SCQ-AIM domain-domain relationships met the threshold of 0.5 and, specifically against our expectations, the SCQ Communication domain was least correlated with the AIM Communication domain (0.18). In sensitivity analyses, this correlation rose to 0.34 in verbal children and 0.25 in non-verbal children. When restricting to children aged 4 to 5 years, the correlation was 0.19.
Table 3 Convergent Validity (Pearson’s correlations) between AIM Domains and SCQ and RBS-R Domains
The RBS-R total score had a strong positive correlation with the total AIM score (0.64). It also had good correlation (≥ 0.30) with all AIM domains and with the frequency and impact scores. Furthermore, for the RBS-R and AIM, all domain-domain correlations were positive, and were strongest (between 0.51 and 0.74) in the 4 pre-hypothesized cases. Results for both SCQ and RBS-R remained stable when restricting the analysis population to those children who were exactly the same age (in years) at the time of SCQ/RBS-R and AIM (as opposed to within 1 year, as per the main analyses; see Table S3).
Factor Analysis
Table 4 provides a detailed comparison of the proposed factors (Mazurek et al. 2018) and the factors found in our confirmatory analysis. The Communication domain was replicated perfectly in our data. The 6 items proposed for this domain all loaded highest on the third factor produced by our data, and no other item loaded highest on this factor. Other well-pronounced and well-reproduced latent concepts were Repetitive Behavior and Social Reciprocity. All items proposed for these domains loaded highest on factor 1 and factor 2 in our data, respectively. The only additional item with its highest loading on factor 2 was Q32 “had positive response to approach”, which was supposed to be part of the Peer Interaction domain. However, Q32 also had a high loading on factor 4, and the only items loading highest on factor 4 were the other 3 of the 4 items representing the Peer Interaction domain. Hence Peer Interaction was also well reproduced as a latent variable. Finally, 3 of the 6 items expected to load together to form the Atypical Behavior domain did indeed load together on a distinct fifth factor. The other 3 items, however, loaded highest on factor 1, showing some similarity with the Repetitive Behavior concept. The first 3 factors collectively explained 37.1% of total variance in the data. Five factors explained 48.4%.
Table 4 Factor Analysis and Specified Domains of the AIM
Known-Group Analysis
For participants who completed all items, the mean (median) total AIM score was 220.8 (219). In general, frequency items received higher scores than impact items [119.9 (120) vs. 100.9 (99)]. Mean (median) scores for the five domains were: Repetitive Behavior, 41.3 (40); Communication, 30.7 (28); Atypical Behavior, 34.8 (35); Social Reciprocity, 27.1 (27); Peer Interaction, 22.9 (23). All of the above summary scores were approximately normally distributed.
Mean scores for total AIM, frequency, impact and all domains increased monotonically from high IQ to low IQ. These associations between low IQ and greater ASD severity were statistically significant in ANOVA (p < 0.01 in all domains). AIM scores were similar between those in full-time special education and those who spent less than 30% of school time with typically developing peers. Otherwise, AIM scores increased with a higher proportion of special-education activity, and all differences were statistically significant (p < 0.01).
Other ‘known-groups’ were binary-categorized. Both the total AIM score (Fig. 2) and the impact score (supplementary Figure S2) were able to differentiate between all pre-defined known-groups (p < 0.01). All such associations were directionally as expected, with higher scores in the group expected to have more severe ASD. The largest difference in mean total AIM score was between non-verbal and verbal children (257 vs. 214, respectively). The frequency score also differentiated between all known-groups (p < 0.01) except for children with or without another psychiatric comorbidity (p = 0.41, Figure S1). Mean scores for the Communication (Fig. 3) and Peer Interaction (Figure S6) domains were significantly different (p < 0.01) between levels of all 9 pre-defined known-groups. The Repetitive Behavior, Social Reciprocity and Atypical Behavior domains also significantly (p < 0.01) distinguished between levels of 8, 8 and 7 of the 9 known-groups, respectively (see supplementary Figures S3-S5). None of the results for known-groups were altered by adjusting for age, i.e. p-values always remained in the same category (≥ 0.05, between 0.01 and 0.05, or < 0.01). Results for a total AIM score based on only 29 items were very similar to those based on all 41 items.
Clinically Important Response Estimates
For the total AIM score, the CIR estimate ranged from 3.30% to 8.25% (Table 5). This corresponded to a change of between 10.8 and 27.1 points on the raw scale (Table S2). The CIR estimate ranged from 3.21% to 8.04% for the frequency score and from 3.74% to 9.34% for the impact score. Of the domains, Social Reciprocity had the least variability and hence the smallest CIR estimates (3.67% to 9.16%). All other domains had CIR estimates ranging between 4.20% and 4.96% at the lower end, and between 10.49% and 12.41% at the upper end.
Table 5 Estimates for Clinically Important Responses of the AIM scores, overall and by age and IQ group (rescaled scores 0–100)
The largest change in variability across strata was seen for the Communication domain across IQ levels. CIR estimates decreased monotonically from low to high IQ (11.69% for IQ < 70, 7.78% for IQ > 100; upper estimates). This corresponded to a difference of 5.6 to 3.7 points on the raw scale (on which a maximum change of 48 points is possible). This example aside, the data had stable variance across IQ and age ranges: estimates of variability were generally only slightly higher in the groups with the smallest sample sizes (IQ < 70 and age 15–17 years). Generally, variance was slightly smaller among children of similar IQ than among children of similar age.
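The conversion between rescaled (0 to 100) CIR estimates and raw-scale points reported above can be reproduced with simple arithmetic. The sketch below assumes that each item contributes a frequency and an impact score on a 1 to 5 scale; this is an inference, but it matches the stated maximum change of 48 points for the 6-item Communication domain and the raw values quoted for the 41-item total score. Function names are hypothetical.

```python
def scale_range(n_items, scores_per_item=2, lowest=1, highest=5):
    """Maximum possible change on the raw scale.

    Assumption: each item has `scores_per_item` scores (frequency and
    impact), each rated from `lowest` to `highest`.
    """
    return n_items * scores_per_item * (highest - lowest)


def percent_to_raw(percent, raw_range):
    """Convert a change on the rescaled 0-100 scale to raw-scale points."""
    return percent / 100 * raw_range


total_range = scale_range(41)         # all 41 items
communication_range = scale_range(6)  # 6 Communication items

# Total AIM score: lower and upper CIR estimates from the text.
print(round(percent_to_raw(3.30, total_range), 1),
      round(percent_to_raw(8.25, total_range), 1))
# Communication domain, upper estimates for IQ > 100 and IQ < 70.
print(round(percent_to_raw(7.78, communication_range), 1),
      round(percent_to_raw(11.69, communication_range), 1))
```

Under these assumptions the total raw range is 328 points and the Communication range is 48 points, which reproduces the raw-scale figures quoted in the text.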