Exploring Sensory Subgroups in Typical Development and Autism Spectrum Development Using Factor Mixture Modelling

This study uses factor mixture modelling of the Short Sensory Profile (SSP) at two time points to describe subgroups of young autistic and typically-developing children. This approach allows separate SSP subscales to influence overall SSP performance differentially across subgroups. Three subgroups were described, one including almost all typically-developing participants plus many autistic participants. SSP performance of a second, largely-autistic subgroup was predominantly shaped by a subscale indexing behaviours of low energy/weakness. Finally, the third subgroup, again largely autistic, contained participants with low (or more “atypical”) SSP scores across most subscales. In this subgroup, autistic participants exhibited large P1 amplitudes to loud sounds. Autistic participants in subgroups with more atypical SSP scores had higher anxiety and more sleep disturbances. Supplementary Information The online version contains supplementary material available at 10.1007/s10803-021-05256-6.


Introduction
Sensory processing in Autism Spectrum Development (ASD) has historically received little attention from clinicians and researchers, and indeed sensory processing differences were only added to the DSM criteria for autism in 2013 (American Psychiatric Association). However, research attention to the sensory features of autism has dramatically increased in recent years (Ben-Sasson et al., 2019), and this research emphasizes the importance of this domain in the lives of many autistic people. Furthermore, external sensory inputs play a central role in negative sensory experiences in autism (see Mostafa, 2008; Madriaga, 2010), and society's role in exposing individuals to some of these inputs should not be ignored in favour of a sole focus on sensory processing within autistic people themselves. There are, admittedly, positive aspects to sensory processing in autism, such as experiences of sensory pleasure and enhanced performance on some perceptual tasks (Mottron, 2019). Nevertheless, sensory experiences can be a cause of distress for many autistic people (Belek, 2018). Reports of atypical sensory processing in autism have been linked to anxiety (Mazurek et al., 2013; Neil et al., 2016; Uljarević et al., 2016), sleep problems (Hohn, de Veld et al., 2019; Mazurek et al., 2019), gastrointestinal problems (Mazurek et al., 2013), adaptive functioning (Ausderau et al., 2016; Williams et al., 2018a, b), and participation in activities. Moreover, sensory over-responsivity may divert autistic people's attention away from social information (Green et al., 2018). Perhaps most importantly, not only have autistic sensory issues been associated with existing measures of quality of life (Lin & Huang, 2019), but they have been described as being a factor of quality of life in autism in and of themselves (McConachie et al., 2019).

Heterogeneity of Sensory Features
However, patterns of sensory processing in autism are highly heterogeneous (Uljarević et al., 2017). Importantly, if sensory processing patterns differ across individuals, then dissimilar individuals may require different interventions or accommodations to ameliorate sensory-related challenges.
In an attempt to address the obstacle posed by this sensory heterogeneity, researchers have published numerous studies using clustering and mixture model analyses to define and characterize meaningful subgroups of autistic people with different patterns of sensory symptoms. Overall, these studies suggest that there is a division between an autistic phenotype with more typical sensory features and a phenotype with larger, more global sensory differences, while results regarding other distinctions (e.g., between hypo-responsiveness and hyper-responsiveness, between sensory modalities) are more inconsistent across studies (DeBoth & Reynolds, 2017). Along with factors such as the use of different measures across studies, it seems reasonable to suppose that variability in the age of participants might have contributed to some of these ambiguities; sensory subgroups in some studies differ in chronological age (Lane et al., 2014; Liss et al., 2006; Tomchek et al., 2018). Unfortunately, most studies of sensory heterogeneity in ASD are limited by their cross-sectional nature; some autistic individuals may not be diagnosed until relatively late, potentially preventing their inclusion in younger parts of samples and confounding age differences. Relatively few longitudinal studies have explored sensory subgroups in ASD (cf. Ausderau et al., 2014, 2016; Dwyer et al., 2020a). The factor mixture modelling approach that is used in the present longitudinal study allows subgroups to vary in levels of latent variables over time.

Dimensional and Categorical Models
Theoretically speaking, factor mixture modelling partially addresses another limitation of prior studies using clustering and mixture models to explore patterns of sensory processing in autism: their implicit assumption that patterns of heterogeneity can be adequately described through defining categorical subgroups, rather than through dimensional scores. Whereas conventional latent class and latent profile mixture models assume that observed variables are independent within each subgroup or class, factor mixture models describe patterns of covariation among variables (Lubke & Muthén, 2005). In a sense, factor mixture modelling offers some degree of synthesis between the categorical and dimensional approaches to heterogeneity (Clark et al., 2013). Factor mixture modelling describes different patterns by assigning individuals to different classes on the basis of probabilities, but individuals might still vary in their levels of the continuous latent variables onto which observed variables load.
That said, the factor mixture model may not entirely resolve the theoretical difficulties involved in subgrouping with potentially continuous data. Notably, as argued by Fushing and McAssey (2010), the question of exactly how many categorical subgroups exist in a particular dataset-that is, the question of what number of classes is "optimal"-may be ill-posed unless subgroups are well-separated and nonoverlapping. Notably, such a non-overlapping structure might make these subgroups visually obvious, eliminating the need for mixture modeling. Thus, the reader should bear in mind that the classes defined in the present study may not have a discrete existence as the sole subgroups that may be validly used to describe the present dataset. There might be other subgrouping solutions, with either fewer or more classes, that could also provide meaningful information about patterns in these data. In other words, mixture model classes might be best conceptualized as a descriptive tool for illustrating dimensional heterogeneity at a given level, rather than as a tool for uncovering discrete categorical entities. The fit indices offered by mixture models (see Nylund et al., 2007) could be seen as a means of identifying particularly informative solutions, as well as of reducing researcher degrees of freedom.

Multimodal Measurement
Another obstacle to research on heterogeneity of sensory behaviour in ASD is that posed by the improper reliance on any single type of measurement. Any specific measure indexing individual differences in sensory processing may have limitations or may converge poorly with other types of measures, and for this reason, researchers have called for multimodal investigations of sensory processing heterogeneity in ASD (Uljarević et al., 2017). For example, caregiver-report questionnaires have been criticized on the grounds that caregivers lack direct insight into the internal sensory experiences of autistic people and can therefore only report on external behaviours, which might be misleading (Grandin & Panek, 2014). This is concerning, as most studies describing sensory subgroups in autism have relied on caregiver-report questionnaires (see DeBoth & Reynolds, 2017), with exceptions including a study based on a self-report questionnaire (Elwin et al., 2017) and one based primarily on auditory event-related potentials (ERPs; Dwyer et al., 2020b).
Various studies suggest that continuous associations might exist between sensory questionnaire scores and neurophysiological responses such as ERPs and event-related fields (e.g., Aoki et al., 2019; Carter Leno et al., 2018; Donkers et al., 2015, 2019; Hudac et al., 2018). Indeed, although the subgroups found by Dwyer et al. (2020b) were defined on the basis of the strength of their ERPs, they were later compared on auditory subscales of a caregiver-report questionnaire, the Short Sensory Profile (SSP), which provided multimodal evidence that relatively strong brain responses to loud sounds in ASD were linked to behavioural auditory distractibility/filtering problems. The present study sample is drawn from the same project-the Autism Phenome Project (APP), a longitudinal study at the UC Davis MIND Institute-as Dwyer et al. (2020b), but it adopts essentially the opposite approach: through factor mixture modelling, subgroups are defined on the basis of covariation between SSP subscales and latent factors, after which their ERPs can be compared. In this approach, the SSP-including non-auditory subscales-dictates the formation of subgroups, but the ERPs remain available to provide a complementary multimodal perspective.

Present Study
The present study uses factor mixture modelling to define different subgroups of participants from the APP in terms of sensory processing features from the SSP: in different models, not only factor loadings but also both longitudinal patterns of change and patterns of covariation between time points are allowed to differ across subgroups. Following the approach taken by Little et al. (2017) and by Dwyer et al. (2020b), both autistic and typically-developing participants are included in the same mixture model. This allows results from both groups to be more clearly placed in context against one another than in studies describing subgroups in ASD alone, or in studies where non-equivalent subgroups are separately defined in ASD and TD. We made the following predictions:
1. That a large majority of typically-developing participants would be assigned, on the basis of posterior probabilities, to a single class, constituting a typical pattern of covariation between subscales and overall sensory processing as reflected in the latent factor. We expected that some, but not all, autistic participants would also be assigned to this class on the basis of their posterior probabilities.
2. That other autistic participants would be assigned to one or more classes showing a pattern of factor loadings differing from the aforementioned typical class. We expected very few typically-developing participants to be assigned to this class/these classes.
3. That autistic participants in classes with atypical patterns of factor loadings would show higher anxiety, poorer sleep, and lower levels of adaptive functioning than autistic participants in the class with typical factor loadings.
4. That autistic participants in any classes where high levels of auditory distractibility and noise distress make major contributions to overall sensory features would show stronger brain responses to loud sounds, consistent with Dwyer et al. (2020a), and that these participants would also show high levels of sensory sensitivity in other modalities.

Methods
The study was approved by the UC Davis Institutional Review Board and all procedures were in accordance with the Declaration of Helsinki.

Participants
Autistic participants in the present study met criteria for a pervasive developmental disorder (based on DSM-IV and Collaborative Programs of Excellence in Autism Network criteria) and passed cut-off scores on the ADOS-G (Lord et al., 2000) and, for either Social or Communication subscales, on the ADI-R (Lord et al., 1994). Further details regarding the APP and participant recruitment can be found in previous publications (e.g., Libero et al., 2016; Nordahl et al., 2011). As part of the APP, caregiver-reports of sensory behaviours on the SSP were collected at two time points an average of 2.80 years apart (range 1.15-5.31 years). APP participants were included in the present study if an SSP form was returned at either APP Time 1 (ages 2 through 4) or APP Time 3 (ages 4 through 9) with complete data (i.e., no items missing) on at least one subscale. Note that between Times 1 and 3, participants returned for a Time 2 visit, but neither SSP nor ERP data were collected at Time 2. A total of 285 participants from the APP were included in the present analysis: 190 ASD (160 male, 30 female) and 95 TD (64 male, 31 female). This included 172 autistic and 87 typically-developing participants with Time 1 SSP data, as well as 87 autistic and 55 typically-developing participants with Time 3 SSP data. Data were available at both time points from 116 participants (69 ASD, 47 TD). Further information regarding participants is provided in Table 1.

Short Sensory Profile (SSP)
The Short Sensory Profile (SSP; McIntosh et al., 1999) is a 38-item caregiver-report questionnaire that has been widely used in research on sensory features in ASD (see, e.g., Hand et al., 2017; Lane et al., 2014; Tomchek et al., 2015; Uljarević et al., 2016). Higher scores reflect relatively "typical" sensory behaviours, whereas lower scores are indicative of more "atypical" sensory behaviours, but the SSP has a contested subscale structure. McIntosh et al. (1999) defined seven SSP subscales, while Tomchek et al. (2014) defined a total of six SSP subscales. More recently, Williams et al. (2018a, b) found both of these previous solutions had an unacceptable fit, and they defined nine subscales: Low Energy/Weakness (LEW), Taste/Smell Sensitivity (TSS), Hyperactivity/Inattention (HYI), Tactile Sensitivity (TS), Movement Sensitivity (MS), Auditory Distractibility (AD), Hypo-responsiveness to Speech (HRS), Visual Sensitivity (VS), and Noise Distress (ND). As described below, the present study uses the solution offered by Z. J. Williams and colleagues. In the present study, the SSP was collected at both Time 1 and Time 3.

Mullen Scales of Early Learning (MSEL)
The

Differential Ability Scales (DAS)
The Differential Ability Scales, Second Edition (DAS-II; Elliott, 2007) are a standardized assessment of cognitive ability for children aged 2-17 years; in the APP, this measure was collected at Time 3. The standardized General Conceptual Ability (GCA) score was used as an overall index of cognitive ability in the present study. Note that discrepancies between MSEL DQ and DAS GCA at Times 1 and 3 should be interpreted with caution, as prior research indicates DAS GCA and MSEL DQ are not on the same scale; DAS GCA scores are often higher than MSEL DQ (Farmer et al., 2016).

Vineland Adaptive Behaviour Scales (VABS)
The Vineland Adaptive Behavior Scales, Second Edition (VABS-II; Sparrow et al., 2005) are rating scales intended to assess the adaptive functioning of individuals with developmental disabilities in their natural environments. At Times 1 and 3 in the APP, the VABS was collected as a caregiver-report questionnaire. The VABS yields a standardized composite adaptive behaviour score, which was employed in the present study, as well as standardized scores for communication, daily living skills, socialization, and motor skills.

Child Behavior Checklist (CBCL)
The Child Behavior Checklist (CBCL; Achenbach & Rescorla, 2000) is a caregiver-report questionnaire which aims to measure problematic internalizing and externalizing behaviours. The CBCL's DSM-oriented anxiety T-score was used to index anxiety in this study. At Time 1, this anxiety score was derived from the preschool-age version of the CBCL. At Time 3, the preschool-age CBCL was obtained from 72 autistic participants and the school-age CBCL from 15 autistic participants in the present study.

Children's Sleep Habits Questionnaire (CSHQ)
The Children's Sleep Habits Questionnaire (CSHQ; Owens et al., 2000) is a 45-item questionnaire which asks parents to report the weekly frequency of problematic sleep behaviours. Although the measure has subscales indexing bedtime resistance, sleep onset delay, sleep duration, sleep anxiety, night wakings, parasomnias, sleep disordered breathing, and daytime sleepiness, the composite total sleep disturbances score was used in the present study. Complete data were available from 143 autistic participants at Time 1 and 80 at Time 3.

EEG Task and Processing
EEG data were collected at Time 1 in the APP, when participants were aged 2 through 4 years. The EEG task and processing procedures employed in the present study have been described in greater detail elsewhere (De Meo-Monteil et al., 2019; Dwyer et al., 2020b, 2021). Briefly, while participants were seated in a caregiver's lap watching a video of their choice, approximately 1200 complex tones (50 ms each, combining multiple frequencies) of four intensities (50, 60, 70, and 80 dB SPL) were presented binaurally using headphones at a random interstimulus interval of 1-2 s. EEG was sampled at 1000 Hz using a 61-channel electrode cap (easycap.de). Offline in BESA 5.2 (besa.de), data were average-referenced and subjected to a low-cut filter of 0.4 Hz (12 dB/octave). Epochs (−200 ms to +900 ms) were extracted and bad channels, trials with extreme amplitudes, and trials with mechanical artefacts were removed. In order to eliminate putatively non-neural signals (e.g., muscle tension and eye movements), remaining trials were submitted to second-order blind source identification (SOBI; Belouchrani et al., 1997) independent components analysis using a semi-automatic artifact removal tool (SMART; see Saggar et al., 2012 for details). Trials were then averaged for each participant and condition and entered into Cartool (Brunet et al., 2011), which was used to interpolate data from excluded channels using a three-dimensional spline (Perrin et al., 1987), further filter the data (second-order Butterworth, 12 dB/octave; 40 Hz high-cut, 60 Hz notch), and apply baseline correction (using the 100 ms immediately prior to stimulus onset). In the ASD group from the present study, usable ERP data were obtained from 115 participants at Time 1.
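For readers who wish to approximate the final filtering and baseline-correction steps outside Cartool, the following is a minimal sketch in Python using SciPy. It is not the pipeline used in the study: the function name, the notch quality factor (Q = 30), and the use of zero-phase filtering via `filtfilt` (which doubles the effective filter order) are all assumptions made for illustration, and Cartool's own implementation may differ.

```python
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt

FS = 1000.0  # sampling rate in Hz, as reported in the text


def filter_and_baseline(erp, times_ms, fs=FS):
    """Low-pass, notch-filter and baseline-correct one averaged waveform.

    erp      : 1-D array of voltages for a single channel and condition
    times_ms : 1-D array of sample times in ms relative to stimulus onset
    """
    erp = np.asarray(erp, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)

    # Second-order Butterworth high-cut (low-pass) at 40 Hz.
    b_lp, a_lp = butter(2, 40.0, btype="low", fs=fs)
    # 60 Hz notch; Q = 30 is an illustrative assumption, not a reported value.
    b_n, a_n = iirnotch(60.0, Q=30.0, fs=fs)

    # Zero-phase filtering (assumption): avoids shifting component latencies.
    out = filtfilt(b_lp, a_lp, erp)
    out = filtfilt(b_n, a_n, out)

    # Baseline correction using the 100 ms immediately before stimulus onset.
    baseline = out[(times_ms >= -100) & (times_ms < 0)].mean()
    return out - baseline
```

Applied to an epoch spanning −200 ms to +900 ms at 1000 Hz, the function returns a waveform of the same length whose mean over the pre-stimulus baseline is zero.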

Confirmatory Factor Analysis (CFA)
Given the existence of multiple SSP factor solutions (McIntosh et al., 1999; Tomchek et al., 2014), of which that described by Z. J. Williams et al. (2018a, b) appeared to have the greatest precision, we chose to proceed with a confirmatory factor analysis (CFA) to determine whether the Williams et al. solution would be acceptable for use in the present study. Mplus version 8.2 (Muthén & Muthén, 1998/2017) was used to estimate separate CFA models at each time point, collapsing across groups. A robust, diagonally-weighted least squares estimator ("WLSMV") was employed. Items were defined as ordered categorical variables. Model fit was measured using Root Mean Square Error of Approximation (RMSEA), Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), and Standardized Root Mean Square Residual (SRMR). Hu and Bentler's (1999) criteria (acceptable RMSEA < 0.06, CFI/TLI > 0.95, and SRMR < 0.08) were used to evaluate model fit.

Factor Mixture Model (FMM)
As CFA showed acceptable fit (see results section below), we proceeded to define a factor mixture model in Mplus version 8.2 using the nine SSP subscales proposed by Z. J. Williams et al. (2018a, b) as observed variables. In the overall model, at each time point, a latent factor was defined loading on all nine SSP subscales. This overall latent factor is not intended to suggest that the SSP is unidimensional; it was defined in order to explore how different subscales contribute to the individual's overall level of sensory behaviours and whether these contributions vary across classes of individuals. To simplify the model, factor loadings were constrained to be equal across the two time-points. Covariances between scores on each SSP subscale at Time 1 and the same subscale at Time 3 were estimated, as were covariances between the latent factors at Times 1 and 3. Variances of each latent factor were fixed to one. The mean of the latent factor at Time 1 was fixed to one, and the means of subscales were constrained to be equal across the two timepoints, thereby forcing any longitudinal change in sensory processing scores to be expressed through the estimated mean of the latent factor at Time 3.
The classes defined in the factor mixture model were allowed to vary from the overall model in the loadings of each SSP subscale onto the latent factor (although these loadings were still constrained to be equal across timepoints). Furthermore, in some models, classes were also allowed to vary in the covariance among the latent factors at each time-point (representing the degree to which overall Time 1 sensory features predicted Time 3 sensory features) and/or in the mean of the latent factor at Time 3 (representing overall longitudinal changes in levels of total sensory features).
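The model specification described above can be summarized compactly. The notation below is our own shorthand, introduced for illustration only, and is not the exact Mplus parameterization: i indexes participants, j the nine subscales, t the two time points, and k the latent classes; all symbols are assumptions.

```latex
\begin{align}
y_{ijt} \mid (c_i = k) &= \nu_j + \lambda_{jk}\,\eta_{it} + \varepsilon_{ijt},
  \qquad j = 1,\dots,9,\quad t \in \{1, 3\},\\
\operatorname{Var}(\eta_{it}) &= 1, \qquad
\operatorname{Cov}(\eta_{i1}, \eta_{i3}) = \psi_{k}, \qquad
\operatorname{E}(\eta_{i3}) = \alpha_{k},\\
\operatorname{Cov}(\varepsilon_{ij1}, \varepsilon_{ij3}) &= \theta_{j},\\
f(\mathbf{y}_i) &= \sum_{k=1}^{K} \pi_k\, f_k(\mathbf{y}_i), \qquad
P(c_i = k \mid \mathbf{y}_i) =
  \frac{\pi_k\, f_k(\mathbf{y}_i)}{\sum_{m=1}^{K} \pi_m\, f_m(\mathbf{y}_i)}.
\end{align}
```

Here the subscale intercepts ν_j and the class-specific loadings λ_jk do not carry a time index because they are constrained to be equal across time points, and the mean of the Time 1 factor is fixed rather than estimated. The factor covariance ψ_k and Time 3 factor mean α_k carry a class index only in those models where they are allowed to vary across classes; otherwise they are common to all classes. The final expression is the posterior probability used to assign participants to classes.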
Fit indices were used to evaluate and compare models (Nylund et al., 2007). These included information criteria: Akaike's Information Criterion (AIC), Bayesian Information Criterion (BIC), and sample size-adjusted BIC (SABIC). These information criteria endeavour to maximize both the parsimony and explanatory power of models; smaller values indicate better model fit. Furthermore, statistical tests were used in an effort to determine whether model fit significantly improved with the addition of new classes; these tests were the Lo-Mendell-Rubin (LMR) test, the Vuong-LMR (VLMR) test, and the bootstrap likelihood ratio test (BLRT). Through bootstrapping, the BLRT, unlike the LMR and VLMR tests, empirically estimates the shape of the distribution of differences between models. Finally, entropy was considered; entropy measures the separation of classes, with more separated classes-reflected by higher values-suggesting more powerful models (Celeux & Soromenho, 1996).
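To make these indices concrete, the sketch below computes the three information criteria and the relative entropy statistic from quantities that mixture modelling software typically reports (the log-likelihood, the number of free parameters, the sample size, and the matrix of posterior class probabilities). The function names are hypothetical, and the formulas are the standard textbook definitions, which we assume, but have not verified, match the values printed by Mplus.

```python
import numpy as np


def information_criteria(log_lik, n_params, n_obs):
    """Return AIC, BIC and sample size-adjusted BIC; smaller is better."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * np.log(n_obs)
    # SABIC replaces n with (n + 2) / 24 in the penalty term.
    sabic = -2.0 * log_lik + n_params * np.log((n_obs + 2.0) / 24.0)
    return {"AIC": aic, "BIC": bic, "SABIC": sabic}


def relative_entropy(post):
    """Relative entropy in [0, 1]; values near 1 indicate well-separated classes.

    post : (n_participants, n_classes) array of posterior probabilities.
    """
    post = np.asarray(post, dtype=float)
    n, k = post.shape
    if k < 2:
        raise ValueError("entropy is undefined for a one-class model")
    p = np.clip(post, 1e-12, 1.0)  # guard against log(0)
    uncertainty = -(p * np.log(p)).sum()
    return 1.0 - uncertainty / (n * np.log(k))
```

For example, `information_criteria(-100.0, 5, 285)` gives an AIC of 210, and a matrix of posterior probabilities in which every participant has probability 1 for a single class yields an entropy of approximately 1.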

Comparisons of Classes
After model selection, exploratory follow-up analyses were conducted to determine whether autistic participants assigned to separate classes differed in chronological age, cognitive ability, adaptive behaviour, anxiety, and scores on each SSP subscale at Time 1 or Time 3. If participants who lacked SSP data at one time point had other relevant scores at that time point, these data were included in the analysis. Kruskal-Wallis tests were used for omnibus effects and Wilcoxon-Mann-Whitney tests with Bonferroni-Holm corrections were used to determine significance of post hoc comparisons, which are reported using the effect size δ (Cliff, 1993) with a 95% asymmetric confidence interval from the orddom R package (Rogmann, 2013). Values of δ can range from −1 to +1, with each extreme value representing a complete lack of overlap between the ordinal data. In contrast, Cliff's δ values approaching 0 suggest classes are indistinguishable on a given dependent variable. η² effect sizes are also reported for Kruskal-Wallis tests (http://www.psychometrica.de/effect_size.html).
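As an illustration of the two quantities central to these comparisons, the following sketch computes Cliff's δ as a point estimate and applies the Bonferroni-Holm step-down adjustment to a set of p-values. This is a Python approximation written for exposition, with hypothetical function names; the analyses reported here used the orddom R package, whose asymmetric confidence intervals are not reproduced by this code.

```python
import numpy as np


def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x[:, None] - y[None, :]  # every pairwise difference
    return (np.sum(diff > 0) - np.sum(diff < 0)) / diff.size


def holm_adjust(pvals):
    """Bonferroni-Holm adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Multiply the smallest p by m, the next by m - 1, and so on,
    # then enforce monotonicity with a running maximum.
    stepped = (m - np.arange(m)) * p[order]
    adjusted_sorted = np.minimum(np.maximum.accumulate(stepped), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted
```

Two groups with no overlap, such as `cliffs_delta([1, 2, 3], [4, 5, 6])`, give δ = −1, whereas identical groups give δ = 0.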

ERP Analysis
ASD participants with usable ERP data at Time 1 were compared across classes using mixed ANOVAs with P1 amplitude and latency as the dependent variables and stimulus intensity and hemisphere as within-subject factors. The approach used to quantify the P1 component in these ERPs has been discussed in greater detail in prior research involving the present dataset (Dwyer et al., 2021). Essentially, in order to avoid confounds with individual differences in topography or cap positioning, the P1 was measured separately in each condition and hemisphere over the fronto-central electrode at which the individual exhibited their largest positive voltage, as well as immediately adjacent electrodes; electrodes outside the pre-defined fronto-central region (see Fig. 1) were not included in the analysis. In each condition, the P1 time window was defined as the area ± 50 ms on either side of the greatest positive peak in any channel in the grand-averaged data from both diagnostic groups in the full APP ERP dataset, which was 70-170 ms (50 dB), 59-159 ms (60 dB), 44-144 ms (70 dB), and 41-141 ms (80 dB). Within this time window, latency was measured as 50% fractional area latency (see Luck, 2014): that is, latency was defined as the point at which the area above and below the curve within the aforementioned time window was equalized on either side of the latency estimate. Amplitude was measured as area amplitude within the P1 time window.
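The fractional area latency measure can be illustrated with a short sketch. The code below is an assumption-laden approximation rather than the procedure actually used: the function name is hypothetical, only positive-going voltage is counted towards the area (one common convention, not confirmed here), and the area is integrated with the trapezoidal rule.

```python
import numpy as np


def p1_area_measures(erp, times_ms, window, fraction=0.5):
    """Return (area, fractional-area latency in ms) within a time window.

    erp      : 1-D array of voltages for one waveform
    times_ms : 1-D array of sample times in ms relative to stimulus onset
    window   : (start_ms, end_ms), e.g. (41, 141) for the 80 dB condition
    """
    erp = np.asarray(erp, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    mask = (times_ms >= window[0]) & (times_ms <= window[1])
    t = times_ms[mask]
    # Assumption: only positive voltage contributes to the P1 area.
    v = np.clip(erp[mask], 0.0, None)

    # Cumulative trapezoidal area from the start of the window.
    steps = 0.5 * (v[1:] + v[:-1]) * np.diff(t)
    cumulative = np.concatenate([[0.0], np.cumsum(steps)])
    total = cumulative[-1]
    if total <= 0:
        return 0.0, float("nan")  # no positive area, so latency is undefined

    # Latency is the time at which the chosen fraction of the area is reached.
    latency = float(np.interp(fraction * total, cumulative, t))
    return float(total), latency
```

For a symmetric triangular deflection centred at 91 ms within the 41-141 ms window, the 50% fractional area latency returned is 91 ms, since half of the area lies on either side of the peak.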

Confirmatory Factor Analysis (CFA)
At both time points, all CFA fit indices examined were within acceptable ranges, supporting the SSP factors/subscales defined by Z. J. Williams et al. (2018a, b), at least with regard to the data from autistic and typically-developing children in the present study (Table 2).

Fit Indices and Selection of FMM
As can be seen in Fig. 2, fit indices generally suggested that there was no substantial improvement to fit from allowing covariances of factors or the mean of F2 to vary across classes, over and above allowing factor loadings to vary across classes. For this reason, all models except those where only the factor loadings were allowed to vary across classes were discarded.
However, fit indices offered somewhat unclear feedback regarding the "optimal" number of classes when only factor loadings were allowed to vary across classes (Fig. 2). Smaller values of AIC, BIC, and SABIC are thought to indicate better model fit, and both AIC and SABIC continuously improved as the number of classes was increased from one to seven. On the other hand, entropy became poorer as the number of classes was increased from two to three (higher entropy values suggest better fit). The BLRT indicated improvements in fit as the number of classes was increased from one to two, p < 0.0001, and from two to three, p < 0.0001, but the LMR and VLMR tests found no improvement in fit even from one to two classes, with each yielding p = 0.26. Meanwhile, BIC improved substantially when the number of classes was increased from one to two, and there was a further modest improvement of BIC from two to three classes, but BIC afterwards plateaued. Overall, the three-class solution may represent somewhat of an intermediate position between indices favouring a small number of classes (LMR, VLMR) and those favouring a large number (AIC, SABIC). Furthermore, the three-class solution appeared to be clinically and practically meaningful. Therefore, the three-class solution was selected. However, as discussed previously, multiple solutions with varying numbers of classes might each independently convey meaningful information about dimensional variability in these data. We have therefore presented two- and four-class solutions for reference in supplementary materials.

Description of Classes
In the three-class solution, the first class was characterized by an extremely high loading of the Low Energy/Weakness (LEW) subscale onto the latent factor (Table 3), so that, in this class, participants' overall level of reported sensory behaviour as reflected in the latent factor was highly influenced by raw scores on the LEW subscale (Supplementary Fig. 1). Given the importance of the LEW subscale in shaping the latent overall factor in class 1-LEW, and in order to better understand this subscale, associations between the LEW subscale and other variables were explored in supplementary materials (Supplementary Tables 5 and 6).
The second class of participants had more balanced loadings, and it is thus referred to as the Generalized Positive Loadings (GPL) class. These loadings appeared to reflect these participants' problematic reported sensory features and behaviours across a number of different subscales. These were most notably Taste/Smell Sensitivity (TSS), Hyperactivity/Inattention (HYI), Tactile Sensitivity (TS), Movement Sensitivity (MS) and Auditory Distractibility (AD).
Finally, the third class was characterized by negative loadings of the subscales onto the latent factor, and it is referred to as the Negative Loadings (NL) class. These loadings appeared to reflect the presence of relatively few unusual or problematic sensory features, with notable negative loadings on HYI and TSS.
Average posterior probabilities of individuals' class assignments are given in Table 4. The lowest average posterior probability was 89%, suggesting class assignments were, for the most part, quite confident.
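The way such a table of average posterior probabilities can be assembled is sketched below. The function name and input format are hypothetical; the sketch assumes that each participant is assigned to the class with the highest posterior probability (modal assignment) and arranges the output with assigned classes as columns and posterior probabilities as rows.

```python
import numpy as np


def average_posterior_table(post):
    """Modal class assignment and average posterior probabilities.

    post : (n_participants, n_classes) array of posterior probabilities.
    Returns (assigned, table), where table[r, c] is the mean posterior
    probability of class r among participants assigned to class c.
    """
    post = np.asarray(post, dtype=float)
    assigned = post.argmax(axis=1)  # most likely class per participant
    k = post.shape[1]
    table = np.full((k, k), np.nan)  # NaN marks classes with no members
    for c in range(k):
        members = post[assigned == c]
        if len(members) > 0:
            table[:, c] = members.mean(axis=0)
    return assigned, table
```

The diagonal of the returned table gives the average confidence of assignment for each class, and each populated column sums to one.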

Classes and Diagnostic Groups
The 1-LEW and 2-GPL classes were overwhelmingly dominated by autistic participants, while the 3-NL class was a mixed group of typically-developing and autistic participants (Table 5). A Fisher's exact test indicated that the proportions of participants from each diagnostic group were not homogeneous across classes, p < 0.0001. Interestingly, a Fisher's test comparing participants from the four largest combinations of class and diagnostic group (i.e., excluding TD participants in classes 1 and 2) found a lack of homogeneity in the proportions of participants with and without Time 3 SSP data, p = 0.006.

Classes and Raw SSP Scores
As very few typically-developing participants were assigned to classes 1-LEW and 2-GPL (Table 6), analyses comparing classes on the basis of raw SSP scores were carried out within the ASD group. Participants assigned to separate classes differed in the raw scores they obtained on a number of SSP subscales at both Time 1 (Table 7) and Time 3 (Table 8), confirming that differences in factor loadings translated into differences in actual scores. Visualizations of participant raw scores and trajectories on each subscale are available in supplementary materials (Supplementary Figs. 1-9), as are longitudinal comparisons of SSP subscores at Times 1 and 3 within each class and group (Supplementary Tables 7-10).
[Figure caption fragment] Here, entropy appears to favour a two-class solution and BIC a three-class solution, while other fit indices appear to suggest continued improvements through to seven classes.
In addition, Wilcoxon-Mann-Whitney tests were used to determine whether autistic and typically-developing participants assigned to class 3-NL significantly differed in their raw SSP scores at Time 1 and also at Time 3. As shown in Table 9, autistic participants in this class had lower (more atypical) scores than their typically-developing counterparts on almost all subscales.
Table 4 Posterior probabilities of assignment to each class, as averaged across all participants in each class. Columns define actual class assignments, while rows refer to the posterior probabilities of assignment within each class for participants in a given column. That is, participants who are actually assigned to Class 1-LEW have on average a 92.09% posterior probability of assignment to said class, along with a 3.45% posterior probability of assignment to Class 2-GPL and a 4.46% posterior probability of assignment to Class 3-NL.
In addition, however, a two-way interaction between class and intensity attained significance, F(6, 336) = 2.38, p = 0.03, η²G = 0.01 (Supplementary Fig. 10), and this interaction remained significant after Greenhouse-Geisser correction for sphericity violation, GGε = 0.88, p = 0.04. Due to the presence of this interaction, the original omnibus main effects of class and intensity should be regarded with caution. Follow-up ANOVAs comparing pairs of classes at each intensity level indicated that ERP amplitudes in autistic participants, after Bonferroni-Holm correction for twelve comparisons, were only significantly higher in class 2-GPL than class 1-LEW, and than class 3-NL, in the 80 dB condition (Table 11).
Fig. 4 Boxplots comparing P1 ERP latencies across autistic participants assigned to different classes, overlaid by individual participants' latency values. A significant three-way ANOVA interaction of stimulus intensity (loudness), hemisphere, and mixture model class was observed. Uncorrected tests suggested P1 latencies might be shorter over the right hemisphere in class 1-LEW in the 80 dB condition as well as in class 2-GPL in the 70 dB condition, but these effects were modest and did not survive correction for multiple comparisons.
Fig. 5 Boxplots comparing P1 ERP amplitudes across autistic participants assigned to different classes, overlaid by individual participants' amplitude values. Note that P1 amplitudes in the 80 dB condition remained significantly larger in 2-GPL than other classes after correction for multiple comparisons and after removal of the outlying participant in class 2-GPL (viz., the participant with amplitude > 6 μV).

Discussion
The present study found clear evidence of meaningful heterogeneity of SSP scores within autism. A total of three subgroups were described, each with different loadings of SSP subscales onto latent factors. The significance and meaningfulness of these subgroups is emphasized by findings that cognitive ability, anxiety, sleep quality, and auditory event-related potential amplitudes differed across autistic participants assigned to separate classes. Notably, allowing levels of the latent factor to vary over time did not appear to improve model fit, in contrast to results obtained by Dwyer et al. (2020a) in the same sample. This may reflect the very limited number of participants with atypical trajectories observed by Dwyer et al., coupled with the greater complexity of the factor mixture model used in the present study.

Class 3-NL
As predicted by the first hypothesis, almost all participants in the TD group were assigned to a single class: namely, class 3-NL. The first hypothesis accurately predicted that many autistic participants would be assigned to this class as well, although the fact that the majority of autistic participants would be assigned to the class was not necessarily anticipated. However, it would not be appropriate to say that autistic participants in class 3-NL had typical sensory processing, or that class 3-NL is defined by typical sensory processing per se. Indeed, autistic participants in class 3-NL were found to have significantly lower (i.e., more atypical/problematic) scores than typically-developing participants on every SSP subscale except for the one indexing movement sensitivity at one or both of the time points from the present study. The autistic participants in class 3-NL did have higher SSP scores on many subscales than autistic participants in other classes, but their SSP scores clearly differed from those of most TD participants. It might be more accurate to say that SSP subscales of autistic participants assigned to class 3-NL covaried with the latent factors in a manner more similar to the bulk of typically-developing participants than to autistic participants in other classes. The overall level of sensory processing indexed by the latent factor itself could still be atypical in some participants from class 3-NL.

Class 1-LEW
The second hypothesis predicted that many autistic participants would be assigned to other classes. This too was supported. Two other classes besides class 3-NL were described: class 1-LEW and class 2-GPL. Factor loadings indicate that the LEW subscale of the SSP exercised an outsized influence over the overall SSP performance of participants in class 1-LEW, and indeed, autistic participants in class 1-LEW showed markedly lower raw SSP LEW scores than autistic participants in other classes. Its dominant role in this subgroup emphasizes the importance of understanding the LEW subscale, as does the fact that the LEW subscale is found not only in the factors defined by Williams et al. (2018a, b), but also in the subscales presented by McIntosh et al. (1999) and Tomchek et al. (2014). Prior studies also suggest the LEW subscale may have a large role in sensory heterogeneity in ASD (Hand et al., 2017; Lane et al., 2014).
Exploratory analyses presented in supplementary materials do note some associations between LEW scores and other SSP subscales, which might imply that the LEW subscale is partly related to individual differences in overall sensory processing. However, correlations between the LEW subscale and the SSP were generally fairly modest, such that it may not fully or even primarily index sensory processing. Furthermore, the LEW scale does not seem to primarily measure sleep; notably, autistic participants in the LEW class had better sleep quality than those in class 2-GPL.
Another possibility may be that the LEW scale taps into hypotonia and/or physical inactivity. Analyses suggested that lower levels of energy and higher levels of weakness on the SSP were associated with worse fine and gross motor adaptive skills (as measured on the VABS). Although prior research did not find significant correlations between LEW scores and motor performance in ASD (Tomchek et al., 2015), other work suggests physical activity levels and motor skills may be independent in autistic preschoolers (Ketcheson et al., 2018), which raises intriguing questions regarding what factors might drive individual differences in physical activity in these children. Further research may be necessary to explore that question, as well as whether the LEW scale might be more associated with autistic children's actual levels of exercise and mobility than with their motor skills.
Interestingly, low (more atypical/problematic) LEW scores were also related to increased verbal and nonverbal cognitive abilities at Time 1, and indeed, autistic participants in class 1-LEW were observed to have higher MSEL DQ at Time 1 than participants in other classes. Tomchek et al. (2015) might be read as implying that sedentary behaviour in ASD could be linked to better opportunities for language learning. 3 If this is the case, it is notable that LEW scores in the present sample were far lower (i.e., more "atypical") in class 1-LEW autistic participants than typically-developing participants; the LEW phenotype could therefore represent a protective developmental mechanism in ASD that is not evident in TD. However, the possible benefits of low energy/weakness should not be over-emphasized, as low energy and high weakness were also related to internalizing problems. It is also noteworthy that cognitive ability effects in the present study were not significant at Time 3, though it is possible that this could partly reflect the more limited range of the DAS compared to the MSEL. Moreover, in prior research, autistic toddlers with hypotonia have been reported by parents to have lower quality of life, not only in the physical domain but also psychosocially (Lopez-Espejo et al., 2021). Autistic children with hypotonia also have more autistic characteristics than those without (Lopez-Espejo et al., 2021), and although not all autistic behaviours are weaknesses, current measures of autistic behaviour are based on a pathology paradigm and accordingly do focus on areas of challenge (Timini et al., 2019). Thus, evidence regarding whether hypotonia and low energy/weakness are protective or problematic seems mixed.
Although the LEW subscale is emphasized in discussion of class 1-LEW due to the sheer extent of its influence, it should be borne in mind that loadings on other subscales were positive, in contrast to the negative loadings in class 3-NL. In keeping with this, autistic participants in class 1-LEW had significantly lower raw scores than those in class 3-NL on tactile sensitivity, movement sensitivity, and auditory distractibility at both time points, as well as visual sensitivity and noise distress at Time 1 only. Thus, class 1-LEW is not solely defined by the LEW subscale.

Class 2-GPL
However, it appears that class 2-GPL might be relatively more influenced by these other subscales. Class 2-GPL is defined primarily by generalized positive loadings across SSP subscales, with comparatively balanced loadings suggesting no single subscale is largely responsible for this pattern. The LEW subscale aside, autistic participants in class 2-GPL showed the lowest raw SSP scores. Scores were even lower than those in class 1-LEW on the tactile and movement sensitivity subscales (at both time points); the auditory distractibility, hypo-responsiveness to speech, and noise distress subscales (at Time 1); and the hyperactivity/inattention subscale (at Time 3).
This phenotype of hyper-reactivity, hyper-activity, and at least social/linguistic hypo-responsiveness was linked to P1 auditory ERPs, providing evidence of multimodal convergence of caregiver-report questionnaire patterns with a neurophysiological response. Specifically, among ASD participants, main effects and interactions could be taken to suggest generally higher P1 amplitudes and particularly high amplitudes in the 80 dB condition in class 2-GPL. Admittedly, the main effect should be interpreted with caution due to the interaction, and only one of nine main effects comparing autistic participants in different classes achieved significance outside the 80 dB condition even before multiple comparison correction. However, the interaction effect involving the 80 dB condition appears quite consistent with prior associations observed in the present sample between relative 80 dB response strength in the P1 latency range and auditory distractibility and noise distress (Dwyer et al., 2020b). A novel contribution of the present study is the finding that these patterns appear linked to atypical sensory processing in other modalities besides hearing alone, perhaps due to central nervous system influences on sensory processing (such as attention or excitation-inhibition balances) that might hold across modalities.

Anxiety and Sleep
Autistic participants in classes 1-LEW and 2-GPL (that is, those with relatively low SSP scores, suggesting more unusual or problematic sensory processing) were found to have significantly more anxiety and more sleep disturbances than autistic children in class 3-NL at Time 1. Differences between classes were not significant at Time 3. As there was still a strong trend towards a between-class difference in anxiety at Time 3, it is not clear that developmental changes removed between-class differences; attrition between Time 1 and Time 3 may simply have reduced power to detect such effects. In any case, the findings at Time 1 appear broadly consistent with prior research linking sensory processing to sleep (Tzischinsky et al., 2018) and anxiety (Mazurek et al., 2013; Neil et al., 2016; Uljarević et al., 2016) in ASD. Prior longitudinal research (Green et al., 2012) suggests that sensory processing differences might have contributed towards the development of anxiety in these individuals. In particular, sensory sensitivities could cause individuals to fear environments or stimuli that might evoke sensory distress. However, it does not seem impossible that the association could also be bidirectional; anxious vigilance might make distressing sensory stimuli more salient. Qualitative research does note that anxiety exacerbates sensory challenges in autism (Landon et al., 2016). Meanwhile, it seems plausible that sensory processing differences could drive later sleep disturbances, but further analyses would be necessary to empirically explore this. Overall, despite the lack of differences in adaptive functioning between autistic participants in different classes, the effects of anxiety and sleep (both of which are relevant to quality of life in ASD; see Adams et al., 2019; Deserno et al., 2019; Smith et al., 2019) emphasize the importance of sensory processing in the lives of autistic people.

Limitations
The present study has a number of strengths. These notably include its use of a large, well-characterized longitudinal sample, as well as its inclusion of a neurophysiological measure (namely, P1 auditory event-related potentials) to complement caregiver reports of sensory behaviours. However, it is not without limitations.
One limitation is the loss of some participants between Time 1 and Time 3 of the study. Although some level of attrition is only to be expected in a longitudinal study, it is noteworthy that autistic participants in class 3-NL seemed to be less likely to be retained at Time 3. It is not impossible that the factor loadings characteristic of class 3-NL were relatively uncommon in ASD at Time 3, such that the fit of the classes may have differed across time points and groups.
Although the present study sample is large, the computational demands of the factor mixture modelling approach used here prevented us from splitting the sample or using resampling to evaluate the stability of the classes. This makes it difficult to claim that the classes described in the present study exist as clear categorical groups that could be reliably replicated by future studies using similar methods; however, we view these classes primarily as a descriptive technique for exploring data patterns and variability. We remain open to the possibility that other subgrouping solutions might illuminate different patterns and variability.
Another limitation of the present study is the lack of multimodal measures indexing non-auditory domains of sensory processing, such as touch or vision. Although auditory neurophysiological hyper-reactivity to loud sounds converged in the present study with a caregiver-reported phenotype of sensory sensitivity, as well as hyporesponsiveness to speech and hyperactivity/inattention, the present study cannot establish whether and how caregiver reports of sensory behaviours from the present study would have converged with neurophysiological measures outside the auditory modality. In addition, the present study relies on only a single caregiver-report questionnaire, the SSP, and thus lacks the "parallel validation" that could be offered by including other similar measures (Agelink van Rentergem et al., 2021). We also lack other relevant types of sensory measure, such as perceptual acuity measures.
Furthermore, although the present study indicated that autistic participants' levels of anxiety and sleep differed across classes, suggesting associations between these variables and sensory processing, the present study does not resolve the directionality of these associations. Additionally, the present study does not determine whether specific SSP subscales accounted for a relatively larger degree of variance in anxiety and sleep than other subscales. Thus, further research will be necessary to explore associations between sensory processing scores and the variables of anxiety and sleep.
One final limitation of the present study is that the SSP subscale solution proposed by Williams et al. (2018a, b) and employed here has not been tested for measurement invariance across samples of autistic and non-autistic children. However, the factor mixture modelling approach used in this study arguably is in some ways related to questions of measurement invariance, insofar as it examines inter-individual differences in how the separate SSP factors converge with overall SSP performance. Indeed, whereas measurement invariance analyses may in practice be used to justify the exclusion or elimination of particular subscales that might, due to their very differential function across groups, be of considerable substantive interest, the present study's approach simply highlights subgroups of individuals whose SSP scores on a particular subscale contribute to overall SSP performance in a manner disproportionate to other subgroups.

Summary
The present study suggests that there are multiple different patterns of relative contributions of different SSP subscales towards overall SSP performance in ASD, as indicated by differences in factor loadings across classes. The present study also finds a single pattern (that of class 3-NL) that is characteristic of most typically-developing individuals. The largest single group of autistic participants also exhibited this pattern, as reflected by their membership in class 3-NL, although autistic participants in this class still had lower mean SSP scores on most subscales than their typically-developing counterparts. Other autistic participants were sorted into two classes characterized by different patterns of factor loadings. In each class, these loadings corresponded to even more atypical SSP subscores on various subscales. However, while one of these classes (viz., 2-GPL) was characterized by generally atypical sensory processing, factor loadings suggested that overall SSP performance in class 1-LEW was heavily shaped by the SSP's LEW subscale, though raw scores on other subscales were low as well. The finding that LEW subscores can so heavily influence overall SSP performance suggests that clinicians and researchers should exercise caution in interpreting the SSP and closely examine not only overall scores but also scores on each subscale. Unfortunately, the meaning of the LEW subscale is somewhat unclear; low LEW scores in ASD were weakly to moderately associated with low scores on other SSP subscales, with poor motor skills, with high internalizing, and with high cognitive abilities.
The present study also provides evidence that caregiver-reported sensory processing on the SSP converges with auditory event-related potential amplitudes in ASD. Specifically, neural hyper-reactivity to loud sounds was observed in class 2-GPL, which was characterized by low SSP scores, including high levels of sensory sensitivity in various sensory modalities such as touch and hearing. Thus, neural hyper-reactivity to loud sounds appeared to be associated not only with hyperacusis but also sensory processing in other modalities. Broadly speaking, the present study's finding of convergence between neural and caregiver-report measures furthers progress towards understanding the relationships among different types of sensory processing measures in ASD.
Finally, the present study found increased anxiety and sleep disturbances among autistic participants in classes 1-LEW and 2-GPL, both characterized by more positive factor loadings and lower SSP subscores, relative to class 3-NL. Given the importance of anxiety and sleep in the daily experiences of individuals, this result emphasizes the importance of understanding the heterogeneous sensory phenotypes and experiences of autistic people.