Background

Differences in behavioral responses to sensory inputs from the environment have been associated with autism spectrum disorder (hereafter “autism”; see Footnote 1) since the first clinical descriptions of the condition [6, 7]. Sensory phenotypes are present across multiple modalities (e.g., auditory, visual, tactile) and include differences in sensory reactivity and modulation, multisensory integration, and certain aspects of perception [8,9,10,11,12,13,14]. With regard to sensory reactivity, these features are frequently parsed into three specific behavioral “response patterns”: hyperreactivity (HYPER; i.e., excessive and/or defensive reactions to stimuli that most individuals find innocuous), hyporeactivity (HYPO; i.e., diminished or absent responses to sensory stimuli that most individuals would respond to), and sensory seeking (SEEK; i.e., unusually strong fascination with or craving of sensory stimulation, often accompanied by repeatedly seeking out specific sensory inputs [15,16,17]). Notably, these response patterns are not mutually exclusive, and many individuals express behaviors characteristic of multiple sensory response patterns, even within the same modality [9, 18, 19]. Sensory reactivity differences are extremely common in autistic individuals: the point prevalence of a child displaying differences in any of the three response patterns (i.e., HYPER, HYPO, or SEEK in any modality) was recently estimated to be 74% using large-scale population-based data from the Autism and Developmental Disabilities Monitoring Network [20], and 70.9–88.3% of autistic youth in two large samples (from the United States and Australia, respectively) were determined to have sensory reactivity differences of at least “mild” severity [21, 22].

Although sensory reactivity differences are prevalent in many childhood-onset neurodevelopmental and neuropsychiatric conditions (e.g., attention deficit hyperactivity disorder [ADHD], anxiety, obsessive–compulsive disorder, Tourette syndrome, Williams syndrome [23,24,25,26,27,28]), and all of these clinical groups can be differentiated from neurotypical controls in terms of sensory reactivity differences (see also [29]), a recent meta-analysis suggests that autistic individuals demonstrate higher average levels of HYPER (with findings mixed and inconclusive for HYPO and SEEK) when compared to individuals with other clinical conditions [30]. Moreover, many qualitative and quantitative studies have linked specific sensory features of autism to functional impairment, reduced activity participation, and lower quality of life (e.g. [31,32,33,34,35,36,37,38,39,40,41]), further emphasizing the importance of research into the sensory aspects of the autism phenotype. However, it is worth noting that not all sensory features of autism are inherently impairing or pathological, and some (particularly within the SEEK domain) are viewed positively by autistic people themselves [42,43,44,45,46,47,48,49,50].

Although recognition of the sensory features of autism has grown noticeably in recent years [30], relatively little published research in this area has evaluated structural relationships between different domains of sensory reactivity or tested the validity of existing theoretical subdimensions to describe this aspect of the autism phenotype (e.g. [51,52,53,54,55,56]). The majority of studies examining sensory features in autism have utilized caregiver-report questionnaires such as the Sensory Profile (SP [57, 58]), the Sensory Experiences Questionnaire (SEQ [19, 59, 60]), and the Sensory Processing-3 Dimensions: Inventory (SP-3D:I [61, 62]), which contain a mixture of HYPER, HYPO, and SEEK items split among sensory modalities. Though combinations of all three response patterns and the five classical sensory modalities (vision, audition, olfaction, gustation, and touch) are typically represented on most sensory reactivity questionnaires, the number of items tapping each subconstruct can vary substantially, and additional sensory modalities (e.g., vestibular sense, facets of somatosensation such as pain/temperature and proprioception, interoception) may or may not be included as well. Notably, these measures are most often scored by generating supra-modal (i.e., combining multiple sensory modalities) HYPER, HYPO, and SEEK “response pattern scores” that aggregate items within a single response pattern across all assessed sensory modalities.

Although the aforementioned supra-modal constructs are consistent with the major conceptual models of sensory features [63,64,65], empirical support for the practice of combining responses to stimuli across multiple sensory modalities into a single “overall response pattern [HYPER, HYPO, or SEEK]” construct (as would be operationalized by a total score on all HYPER items, for instance; see [66]) is relatively limited. When examining the factor structures of existing sensory questionnaires, models that consider supra-modal response pattern factors in isolation tend to be inadequate, demonstrating very poor overall fit to empirical data (e.g. [67]). Thus, in order to successfully explain the factor structure of the HYPER, HYPO, and SEEK constructs, previous studies have needed to utilize more complex models that include not only supra-modal response pattern factors but also modality-specific response pattern factors that account for the additional shared (co)variance between items within a given sensory modality (e.g. [51, 54, 68]). Notably, these models represent bifactor structures (i.e., two-level structures with orthogonal “general” and “specific” factors contributing to each item response), with variance attributable to both supra-modal constructs (e.g., HYPER, HYPO, SEEK) and modality-specific constructs (e.g., Vision, Audition, Olfaction) [69, 70]. Given this division of variance between levels, summed supra-modal response pattern scores may only be clearly interpretable as measures of HYPER, HYPO, or SEEK if the supra-modal factor is much stronger than the modality-specific factors [70,71,72,73]. However, there has been a dearth of psychometric work using bifactor indices to examine the interpretability of these supra-modal sensory constructs in the autistic population to date (though see [68]); thus, their construct validity in this population remains unclear.

In contrast to studying HYPER, HYPO, and SEEK at the supra-modal level, a minority of studies (e.g. [27, 74,75,76,77,78,79,80,81]) have investigated these sensory constructs in a modality-specific manner by calculating response pattern scores that are limited to a single sensory modality (e.g., “Auditory HYPER,” which reflects the sum score of only the HYPER items within the Auditory modality). As psychophysical and neural measures of sensory function (e.g., detection thresholds, psychometric function parameters, evoked potential amplitudes) are frequently limited to a single sensory modality, some researchers theorize that the modality-specific subconstructs represented by these measures will correlate more strongly with sensory reactivity measures that are limited to that same sensory modality than with measures collapsed across modalities (e.g., visual evoked potential amplitudes may be expected to correlate more strongly with a measure of Visual HYPER than with general HYPER). To our knowledge, studies to date have not formally tested these hypotheses to determine whether modality-specific response pattern scores demonstrate any empirical advantages over conceptually broader supra-modal response pattern scores when correlated with psychophysical or neurophysiological measures of sensory function.

Determining the most appropriate “level of analysis” (supra-modal versus modality-specific versus some combination of the two) for these sensory constructs has major implications for other areas of sensory autism research, as this decision will impact whether modality-specific or supra-modal sensory constructs are assessed by diagnostic/phenotyping instruments, targeted by clinical interventions, correlated with other individual differences, explained with neuroscientific or psychological models (e.g., multiple forms of sensory reactivity having a shared underlying mechanism or cause versus separate mechanisms), and even incorporated into the diagnostic criteria for autism. Thus, additional research is needed to more conclusively determine whether sensory reactivity differences in autism are most appropriately studied at the level of a response pattern score (HYPER, HYPO, or SEEK) combined across modalities (e.g. [30, 66, 82,83,84,85]), at the level of modality-specific response pattern scores (e.g. [75, 77,78,79,80, 86]), or some combination of the two (e.g., interpreting both types of scores; favoring one level of analysis at different points in a study based on the research question or the specific construct(s) being studied).

Purpose

To address this critical gap in research on sensory features in autism, the present study sought to quantitatively investigate the latent structure of caregiver-reported sensory features across a large and heterogeneous group of autistic children. By pooling data from multiple independent research groups and the National Database for Autism Research (NDAR [87]), we compiled a cohort of several thousand autistic children to be analyzed within the methodological framework of integrative data analysis (IDA [88, 89]). The IDA approach has recently gained popularity within autism research, going beyond small sample studies to yield insights about the latent structure of core and associated autism features [56, 90,91,92,93,94,95], the psychometric properties of widely-used measures [56, 92, 96,97,98,99,100,101], and the associations between autism features and other related clinical and demographic variables [93, 99, 102,103,104]. However, many of these studies have not explicitly quantified the degree to which effects of interest vary across pooled datasets (i.e., effect size heterogeneity), arguably a major strength of IDA methodology (see [103] for a notable exception). Utilizing modern psychometric techniques such as item response theory (IRT [105, 106]) and bifactor modeling [69, 70, 107], the current IDA sought to rigorously evaluate the latent structure of sensory features in autism across multiple measures. Our specific aims were to:

  • derive psychometrically sound metrics of HYPER, HYPO, and SEEK within modalities;

  • derive psychometrically sound supra-modal metrics of HYPER, HYPO, and SEEK;

  • use bifactor models and indices to evaluate whether modality-specific HYPER, HYPO, and SEEK metrics (e.g., Auditory HYPER, Tactile HYPO, Visual SEEK) provide added-value to the supra-modal metrics; and

  • estimate meta-analytic associations between psychometrically derived sensory constructs and other clinical and demographic variables (as well as the degree of heterogeneity in those associations) in the autistic population.

Methods

Participants

Data used in the current investigation were obtained from nine separate research groups within the Autism Sensory Research Consortium (https://tinyurl.com/ASRCoverview): University of North Carolina (n = 104), Vanderbilt University Medical Center 1 [PI: CJC] (n = 181), University of California San Francisco (n = 35), Syracuse University (n = 55), University of California Los Angeles (n = 67), Thomas Jefferson University (n = 93), University of Reading (n = 37), Kennedy Krieger Institute (n = 47), and Vanderbilt University Medical Center 2 [PI: TGW/MTW] (n = 114). Although no systematic review of the broader literature was undertaken, the included cases represented the full population of individual-participant data (including unpublished data) gathered by researchers within the Autism Sensory Research Consortium that could legally be shared with the first author’s institution. These data were further pooled with (a) all eligible non-overlapping data available in NDAR (n = 741 [15 collections; NDAR study 1160]), (b) data from a large online cohort including participants drawn from the Kennedy Krieger Institute-based Interactive Autism Network (IAN [108]) and various statewide and local autism advocacy groups (referred to as the University of North Carolina Sensory Experiences Project [SEP] sample [PI: GTB]; n = 1285 [21]), and (c) data from individuals recruited from Simons Powering Autism Research for Knowledge (SPARK [109]) research match (project number RM0035_Woynaroski; n = 1107), resulting in a total of 3866 unique participants across all data sources (see Additional file 1: Table S1 for individual sample demographics). In accordance with IRB approved protocols for each primary study, informed consent was obtained from parents or legal guardians of each participant, and when relevant, assent was obtained from participants as well at the time of data collection. 
The institutional review board at Vanderbilt University Medical Center approved the secondary analysis of pooled data from these studies.

All participants included in the current study were between the ages of 3 years 0 months and 18 years 0 months and had clinical diagnoses of autism spectrum disorder according to DSM-5 criteria or equivalent DSM-IV-TR diagnoses [15, 110]. Notably, we chose to restrict our analyses to autistic children as a way of protecting against both non-normal latent trait distributions and differential item functioning according to autism diagnostic status [111, 112]. An additional criterion for inclusion in the current study was the accessibility of cross-sectional item-level data from one of the following sensory questionnaires: the Sensory Profile 1 (SP1 [57]), the Short Sensory Profile 1 (SSP1 [113]), the Sensory Profile 2 (SP2 [58]), the Short Sensory Profile 2 (SSP2 [58, 114]), the Sensory Experiences Questionnaire, version 2.1 (SEQ-2.1 [59, 115]), or the Sensory Experiences Questionnaire, version 3.0 (SEQ-3.0 [51, 60]). Although other caregiver questionnaires such as the SP-3D:I were originally considered for inclusion in analyses as well, they were ultimately not included due to a very small number of individuals in our dataset (< 7% of the sample) with usable data on these measures. Broader inclusion/exclusion criteria for participation in contributing studies varied across samples; no participants represented in extant datasets were excluded from the current study due to any additional clinical characteristics (e.g., language level, cognitive skills/IQ), demographic factors (with the exception of chronological age < 3 or > 18), co-occurring medical/psychiatric conditions, or receipt of specific interventions/services.

Constructs and measures

All participants in the current study had usable data on one or more of the primary study questionnaires, including the SP1/SSP1, SP2/SSP2, SEQ-2.1, or SEQ-3.0, and these measures were used to operationalize the sensory (sub)constructs of interest (see Additional file 1: Supplemental Methods for additional details regarding the chosen sensory questionnaires and their use in the autistic population). In the current study, analogous items on the SP1 and SP2 (and their short forms) as well as analogous items on the SEQ-2.1 and SEQ-3.0 were combined into single items for the purpose of cross-dataset analysis. Notably, as the SP1/SSP1 are scored in the opposite direction of the remaining questionnaires, these measures were reverse-scored (i.e., such that scores of “5” represent more frequent behaviors) in order to keep item-level scoring consistent between all items in the study.
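As an illustration of the reverse-scoring step, the sketch below flips a response on a bounded Likert-type scale so that higher values indicate more frequent behaviors. The 1–5 response range, the function name reverse_score, and the example responses are assumptions made for illustration; they are not taken from the study's own analysis code.

```python
def reverse_score(response, low=1, high=5):
    """Flip a response on a bounded integer scale (assumed 1-5 here)."""
    if not low <= response <= high:
        raise ValueError(f"response {response} is outside [{low}, {high}]")
    # The lowest category maps to the highest and vice versa.
    return low + high - response


# Hypothetical responses to one item scored in the opposite direction.
raw_responses = [1, 2, 5, 3]
harmonized = [reverse_score(x) for x in raw_responses]
print(harmonized)
```

Keeping the bounds as parameters means the same helper could be reused if a questionnaire used a different response range.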

In addition to measures of sensory features, we also examined a number of putative demographic and clinical correlates, including age, sex at birth, cognitive ability, adaptive behavior, core autism features, and co-occurring psychiatric symptoms. Cognitive ability was assessed using verbal, nonverbal, and full-scale intelligence quotients (VIQ, NVIQ, and FSIQ, respectively [derived from many instruments]), their developmental quotient (DQ) analogs, and a binary indicator of intellectual disability status (FSIQ < 70 or NVIQ < 70 if no FSIQ available). Adaptive behavior was measured via summary scores (Communication [COM] domain, Daily Living Skills [DLS] domain, Socialization [SOC] domain, and Adaptive Behavior Composite [ABC]) from the Vineland Adaptive Behavior Scales (VABS [116]), including the first, second, and third editions of the measure. Core autism features were assessed using the Social Responsiveness Scale-School Age (SRS [117, 118]) total raw score, as well as the Repetitive Behavior Scale-Revised (RBS-R [119]) repetitive sensory motor (RSM), self-injurious behavior (SIB), and “ritualistic/sameness/compulsive” behavior (RSC) subscales (the latter being the sum of the RBS-R ritualistic/sameness and compulsive subscales due to a high intercorrelation between the two in the current sample; see Additional file 1: Supplemental Methods). Co-occurring psychiatric symptoms (based on multiple measures) were summarized using the trait domains of “internalizing symptoms” (INT), “externalizing symptoms” (EXT), and “total psychiatric symptoms” (TOT), as well as features of ADHD (ADHD). See Additional file 1: Supplemental Methods for additional information on the measures and scores used in the current study.

Sensory item selection

Before statistical analyses were undertaken, items from the four primary sensory questionnaires (SP1/SSP1, SP2/SSP2, SEQ-2.1, SEQ-3.0) were first subjected to a qualitative review by the first author to remove items that were multisensory in nature (e.g., SEQ-3.0 item 95: “How often does your child avoid playing with toys or novel objects that make a lot of noise and light up at the same time?”) or appeared to assess non-sensory behaviors (e.g., SP1 item 48: “Has difficulty paying attention.”). The remaining items were then sorted into modality × response pattern “sensory subconstructs” (e.g., Auditory HYPO) by the first author (ZJW), and this sorting process was reviewed iteratively by four additional experts (authors RS, GTB, CJC, and TGW) until a consensus classification was reached. See Fig. 1 and Additional file 1: Supplemental Methods for additional information on this process. A full list of items and their theoretical classifications can be found in Additional file 1: Table S2.

Fig. 1 Schematic diagram with overview of study methodology

Data analysis

Sensory subconstruct refinement and empirical item removal

The first aim of this study was to develop psychometrically sound indicators of each sensory subconstruct (i.e., the combination of modality and response pattern, e.g., Visual HYPO, Auditory HYPER, Tactile SEEK). In order to develop these single-subconstruct scales, we started with all items theoretically classified as belonging to that subconstruct and empirically removed items until the resulting set of items conformed to a unidimensional structure, as defined below. In doing this, we aimed to generate a set of well-differentiated sensory constructs that could then be fit to a bifactor model (with a general factor representing the supra-modal response pattern and specific factors for each modality-specific subconstruct), allowing us to test the hierarchical structure of each sensory response pattern without poorly related items inflating estimates of general factor saturation.

For the subconstruct refinement portion of the study, we opted to use data only from the subset of autistic children who provided data on both (a) one version of the SP and (b) one version of the SEQ (n = 930). However, as relatively few individuals in this pre-defined group of 930 had completed the SP2 or SSP2 (n = 26), we expanded this exploratory sample to encompass all other children in the full sample who had completed any version of the SP2 (n = 267). This resulted in a final sample of 1197 individuals, which we refer to as the “calibration sample”.

All data analyses were conducted in the R statistical computing environment, version 4.2.0 [120]. Subscale item refinement was conducted in the calibration sample using an iterative process based on hierarchical item clustering with the ICLUST algorithm [121, 122], as implemented in the psych R package [123] (see Additional file 1: Supplemental Methods for additional details). ICLUST analysis was conducted on 18 total dimensions representing 6 HYPER subconstructs (Auditory, Visual, Tactile, Olfactory, Gustatory, Vestibular/Proprioceptive [Movement]), 5 HYPO subconstructs (Auditory, Visual, Tactile/Somatosensory, Olfactory, Gustatory), and 7 SEEK subconstructs (Auditory, Visual, Tactile, Olfactory, Gustatory, Oral Tactile, and Vestibular/Proprioceptive [Movement]). Within each dimension, the ICLUST process was repeated iteratively, and items were removed on each iteration until the cluster solution stabilized. Once no more items met the criteria for removal, we evaluated the resulting scale for unidimensionality and reliability by fitting it to a graded response model (GRM [124]; a type of IRT model) and assessing that model for global fit and composite score reliability (see Additional file 1: Supplemental Methods for model specifications and specific psychometric criteria used to judge fit). GRMs were fit to 6/6 HYPER subconstructs, 3/5 HYPO subconstructs, and 7/7 SEEK subconstructs, with Olfactory and Gustatory HYPO being excluded because each contained fewer than the requisite three items for a unidimensional GRM. Poor GRM model fit or low reliability was followed up with examination of local misfit and iterative item removal with the goal of further improving model fit. In cases where a single-subconstruct scale demonstrated inadequate unidimensionality and/or insufficient reliability that could not be corrected by further item removal, that construct was deemed not sufficiently measurable with a subscale, and the subconstruct of interest was then operationalized using the single item from the available pool deemed most global or nonspecific by the first author (see Footnote 2). This process was repeated for each subconstruct until there was at least one unidimensional scale or single item available to assess all subconstructs of interest for the current investigation.
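For readers less familiar with the graded response model, the following minimal sketch computes the probability of each ordered response category for a single item under the standard logistic form of the model. The function name, the discrimination value, and the threshold values are hypothetical and chosen only for illustration; the study's actual model specifications are given in Additional file 1.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a logistic graded response model.

    theta      -- latent trait level of the respondent
    a          -- item discrimination (slope)
    thresholds -- increasing category thresholds b_1 < ... < b_(K-1)
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical five-category item evaluated at an average trait level.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

Four thresholds yield five category probabilities that sum to one, matching a five-point frequency rating.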

Bifactor modeling of sensory constructs

To further assess the validity of the supra-modal HYPER, HYPO, and SEEK constructs, we additionally examined the item-level latent structure for each response pattern using confirmatory bifactor IRT models. Approximate simple structure (i.e., items loading on expected modality factors) was first confirmed for each response pattern using exploratory graph analysis (EGA [125, 126]; see Additional file 1: Supplemental Methods for technical details), and once this structure was confirmed, item-level data were fit to a bifactor GRM [127, 128] (3 total models). Model fit was assessed with the same criteria for adequate fit as used for the unidimensional GRMs (see Additional file 1: Supplemental Methods).

After confirming the adequate fit of each bifactor GRM, bifactor model-based indices [71, 72, 129] were calculated to determine the appropriateness of interpreting supra-modal response pattern scores. Bifactor indices that were examined included coefficient omega total (ωT; model-based total score reliability), coefficient omega subscale (ωS; model-based subscale score reliability), coefficient omega hierarchical (ωH; proportion of variance in total score accounted for by the general factor), coefficient omega hierarchical subscale (ωHS; proportion of variance in subscale score accounted for by the specific factor after accounting for the general factor), explained common variance of the general factor (ECVG), and explained common variance of the specific factor for each subscale (ECVSS). We defined bifactor structures with ωH ≥ 0.80 [71] or the combination of ωH ≥ 0.70 and ECVG ≥ 0.60 [130] as demonstrating evidence of a strong general factor and, thus, a valid and interpretable higher-order construct [131].
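To make the indices above concrete, the sketch below computes them from standardized loadings using the commonly used formulas for a bifactor model with orthogonal factors, in which each item loads on the general factor and exactly one specific factor. The function names and the loadings are hypothetical, and the study itself derived these indices from bifactor GRMs fit in R, so the sketch illustrates the definitions rather than reproducing the authors' computation.

```python
def bifactor_indices(general, specific, groups):
    """Omega and ECV indices from standardized bifactor loadings.

    general  -- general-factor loading of each item
    specific -- loading of each item on its own specific factor
    groups   -- label of the specific factor each item belongs to
    """
    # Item uniqueness, assuming standardized items and orthogonal factors.
    unique = [1.0 - g ** 2 - s ** 2 for g, s in zip(general, specific)]
    labels = sorted(set(groups))
    members = {lab: [i for i, grp in enumerate(groups) if grp == lab] for lab in labels}

    gen_var = sum(general) ** 2
    spec_var = {lab: sum(specific[i] for i in members[lab]) ** 2 for lab in labels}
    total_var = gen_var + sum(spec_var.values()) + sum(unique)
    gen_sq = sum(g ** 2 for g in general)
    spec_sq = sum(s ** 2 for s in specific)

    result = {
        "omega_T": (gen_var + sum(spec_var.values())) / total_var,
        "omega_H": gen_var / total_var,
        "ECV_G": gen_sq / (gen_sq + spec_sq),
        "subscales": {},
    }
    for lab in labels:
        items = members[lab]
        g_var = sum(general[i] for i in items) ** 2
        sub_total = g_var + spec_var[lab] + sum(unique[i] for i in items)
        g_sq = sum(general[i] ** 2 for i in items)
        s_sq = sum(specific[i] ** 2 for i in items)
        result["subscales"][lab] = {
            "omega_S": (g_var + spec_var[lab]) / sub_total,
            # Subscale variance attributable to the specific factor alone.
            "omega_HS": spec_var[lab] / sub_total,
            "ECV_SS": s_sq / (g_sq + s_sq),
        }
    return result


def strong_general_factor(omega_h, ecv_g):
    """Decision rule stated in the text for a strong general factor."""
    return omega_h >= 0.80 or (omega_h >= 0.70 and ecv_g >= 0.60)


# Hypothetical six-item example with two three-item specific factors.
indices = bifactor_indices([0.6] * 6, [0.4] * 6, ["A", "A", "A", "B", "B", "B"])
print(round(indices["omega_H"], 3), round(indices["ECV_G"], 3))
```

In this toy example ωH is roughly 0.69, so the general factor would fall just short of the thresholds used in the current study.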

A third aim of the study was to determine whether modality-specific subconstruct scores provide substantial “added value” over the supra-modal response pattern scores in characterizing the sensory features of autism. In the current study, we specifically addressed this question using recently proposed psychometric decision rules based on combinations of bifactor model-based statistics [129]. Modality-specific subconstruct scores (i.e., the set of items representing one subconstruct such as Tactile SEEK) with low reliability (ωS < 0.70) were considered to have sufficient added value with ωHS ≥ 0.25 or ECVSS ≥ 0.45; sensory subconstructs with high reliability (ωS ≥ 0.70) were considered to have sufficient added value with ωHS ≥ 0.20 or ECVSS ≥ 0.30 [129]. With these analyses, we could determine whether the HYPER, HYPO, and SEEK constructs should be interpreted (a) only at the supra-modal level (e.g., with consideration of HYPER as a unitary construct), (b) only for individual modalities (e.g., with consideration of Auditory HYPER and Tactile HYPER as distinct modality-specific subconstructs), or (c) as a combination of the two (analogous to interpretation of FSIQ as well as VIQ/NVIQ on an intelligence test).
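The added-value decision rule described above can be written as a short function. The thresholds are those stated in the text; the function name, argument names, and example values are hypothetical.

```python
def has_added_value(omega_s, omega_hs, ecv_ss):
    """Apply the added-value rule for one modality-specific subconstruct score."""
    if omega_s < 0.70:
        # Low-reliability subscales must clear the stricter thresholds.
        return omega_hs >= 0.25 or ecv_ss >= 0.45
    # High-reliability subscales use the more lenient thresholds.
    return omega_hs >= 0.20 or ecv_ss >= 0.30


# Hypothetical subscale values, not results from the study.
print(has_added_value(omega_s=0.76, omega_hs=0.24, ecv_ss=0.31))  # reliable subscale
print(has_added_value(omega_s=0.65, omega_hs=0.20, ecv_ss=0.40))  # less reliable subscale
```

Because either ωHS or ECVSS can satisfy the rule, a subscale with little unique reliable variance can still qualify if most of its common variance is modality-specific.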

Demographic and clinical correlates of sensory reactivity

Modeling procedures

The fourth aim of this study was to evaluate correlations between sensory reactivity and demographic and clinical characteristics. Based on the previous bifactor analyses, latent factor scores were calculated for all interpretable modality-specific subconstructs and supra-modal constructs using a plausible value framework [132, 133] (10 plausible values per individual) and then examined using random-effects IDA models [88], which use random effects to account for the heterogeneity of effect sizes among the different study samples. Correlates examined in these models included chronological age, sex (female versus male), cognitive scores (VIQ/VDQ, NVIQ/NVDQ, and FSIQ/FSDQ), intellectual disability status (which was defined as FSIQ < 70 [or NVIQ < 70 in cases where FSIQ was missing], excluding DQ scores), psychiatric symptom scores (internalizing symptoms, externalizing symptoms, total psychiatric symptoms, ADHD features), VABS scores (COM, DLS, SOC, ABC), and core autism features (SRS total score, RBS-R scores [RSM, SIB, and RSC]). Models were not fit for combinations of sensory outcomes and putative correlates with fewer than 100 observed cases. The random-effects IDA models were specified as Bayesian hierarchical linear models with a standardized subconstruct score regressed on the correlate of interest; random intercept and slope terms were also added to account for between-study mean differences in the outcome and effect sizes. Weakly informative priors were placed on all model parameters. These models were fit using the brms R package [134, 135]. Additional file 1: Table S3 contains additional details regarding IDA model and prior specification, including computational specifications.
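One way to write the general form of such a random-intercept, random-slope model for participant i in study j is sketched below. The notation is introduced here for illustration only, and the Gaussian residual and multivariate normal random effects are assumptions; the likelihood and priors actually used are documented in Additional file 1: Table S3.

```latex
y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,x_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
  \sim \mathcal{N}\!\left(\mathbf{0}, \boldsymbol{\Sigma}\right),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here y is the standardized subconstruct score, x is the correlate of interest, β1 is the pooled slope, and the study-level deviations u0j and u1j capture between-study differences in outcome means and effect sizes, respectively.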

To summarize the strength of each IDA-based association, standardized effect sizes (Cohen’s d for categorical predictors and the linear correlation r for continuous predictors) were calculated based on the standardized regression slopes, and posterior distributions were summarized using the median and 95% highest-density credible interval (CrI [136]). Effect size posterior distributions were tested against interval null hypotheses that the effects were too small to be practically significant (d = [−0.2, 0.2] and r = [−0.1, 0.1], respectively [137]). It is important to note that such interval null-hypothesis tests have been found to demonstrate substantially lower false-positive rates than traditional frequentist tests of a point-null hypothesis [138], thereby guarding against errors relevant to the large number of hypothesis tests in the current study (see also [139, 140] for Bayesian perspectives on multiple testing). These interval null hypotheses were assessed using a Bayesian hypothesis testing procedure based on the region of practical equivalence (ROPE [136]) and the ROPE Bayes factor (BFROPE [141,142,143]). From the posterior distribution of effect sizes, we calculated the following indices: (a) PROPE, the posterior probability that the null hypothesis is true (i.e., the summary effect size is too small to be practically meaningful), and (b) log(BFROPE), a measure of evidence that the summary effect size falls within versus outside the ROPE. Values of log(BFROPE) greater than 1.1 and 2.3 (i.e., log(3) and log(10)), respectively, provide moderate and strong evidence that the true effect size lies outside the ROPE (i.e., evidence that the effect is large enough to be practically meaningful), and log(BFROPE) values less than −1.1 and −2.3, respectively, provide moderate and strong evidence that the effect size lies within the ROPE (i.e., that the effect is practically equivalent to zero). If the log(BFROPE) value lies between −1.1 and 1.1, the evidence for or against the interval null hypothesis is deemed inconclusive [144]. These Bayesian indices were calculated using the bayestestR R package [145].
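The two ROPE-based summaries can be illustrated with a short sketch. The first function estimates the posterior probability of the interval null as the share of posterior draws falling inside the ROPE, and the second maps a log Bayes factor onto the evidence categories described above. The function names, the posterior draws, and the wording of the labels are hypothetical; this is not the bayestestR implementation used in the study.

```python
import math


def prob_in_rope(draws, low, high):
    """Share of posterior draws that fall inside the ROPE [low, high]."""
    return sum(1 for x in draws if low <= x <= high) / len(draws)


def interpret_log_bf_rope(log_bf):
    """Label the evidence using the log(3) and log(10) cutoffs from the text."""
    moderate, strong = math.log(3), math.log(10)
    if log_bf > strong:
        return "strong evidence that the effect lies outside the ROPE"
    if log_bf > moderate:
        return "moderate evidence that the effect lies outside the ROPE"
    if log_bf < -strong:
        return "strong evidence that the effect lies within the ROPE"
    if log_bf < -moderate:
        return "moderate evidence that the effect lies within the ROPE"
    return "inconclusive"


# Hypothetical posterior draws of a correlation, tested against r = [-0.1, 0.1].
draws = [-0.05, 0.02, 0.08, 0.15, 0.30]
p_rope = prob_in_rope(draws, -0.1, 0.1)
print(p_rope, interpret_log_bf_rope(1.5))
```

In practice the posterior would contain thousands of draws rather than five; the tiny example is used only so the arithmetic can be checked by hand.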

From each IDA model, we also calculated several heterogeneity indices, including τ² (the variance of the random slope parameter in standard deviation units), I² (the percentage of variance in the slope parameter due to between-study heterogeneity), and the intraclass correlation coefficient (ICC; the proportion of total variance accounted for by both the random slope and intercept terms). Lastly, we calculated the 95% prediction interval of r or d [146, 147], which includes the range of values likely to be sampled from a new study of similar size to the ones included in the current analyses.

Results

Demographic and clinical characteristics of the sample are displayed in Table 1, and characteristics of each contributing sample can be found in Additional file 1: Table S1. Children in the combined sample had a mean age of 8.41 (SD = 3.36) years and were predominantly male (79.5%) and White/Caucasian (75.5%; 84.4% of those with non-missing data). The mean full-scale IQ/DQ of children with available data (n = 1028) was 92.1 (SD = 24.5), with FSIQ/DQ scores ranging from 12.0 to 153.0 (FSIQ: M ± SD [min–max] = 98.98 ± 19.59 [32.0–153.0]; FSDQ: M ± SD [min–max] = 59.08 ± 18.45 [12.0–124.0]).

Table 1 Participant demographics and broader characteristics for calibration sample and full sample

Sensory subconstruct refinement

Results of the scale refinement process for each subconstruct are presented in Table 2. Within the HYPER domain, items were originally sorted into six modalities (Auditory, Visual, Tactile/Somatosensory, Olfactory, Gustatory, and Vestibular/Proprioceptive [Movement]), each of which produced a unidimensional subconstruct scale that met our a priori requirements for reliability and model fit. Notably, during the scale refinement process, adequately fitting bifactor structures were also found for both Auditory HYPER (7 items, 1 specific factor; Additional file 1: Table S4) and Tactile HYPER (13 items, 4 specific factors; Additional file 1: Table S5), meeting all our a priori psychometric requirements aside from unidimensionality. For these two sensory subconstructs, unidimensional scales (4 items each) were utilized in the cross-modality structural analysis of HYPER, although subsequent single-modality models were built using the more reliable general factor scores from these bifactor models, as the latter included a broader item pool that was inclusive of more information about the sensory subconstruct of interest.

Table 2 Results of scale refinement process for each single-modality sensory subconstruct

Within the HYPO domain, items were originally sorted into five modalities (Auditory, Visual, Tactile/Somatosensory, Olfactory, and Gustatory), although two of these modalities (Olfactory and Gustatory) ultimately contained fewer than the minimum of three items needed for a subconstruct scale. Thus, these two subconstructs were operationalized with their full item pools: two items in the case of Olfactory HYPO (SEQ-3.0 item 69: “How often does your child seem to be unaware of strong or unpleasant smells that most other people notice?”; SP1 item 125: “Does not seem to smell strong odors.”) and one item in the case of Gustatory HYPO (SEQ-3.0 item 74: “How often does your child have trouble distinguishing between different types of tastes or flavors?”; see Footnote 3). From the Auditory HYPO items, a three-item composite met our criteria for an adequate subconstruct scale; however, after examining the content of the items (Additional file 1: Table S2), it was noted that these three items measured the subconstruct of “Speech HYPO” (i.e., HYPO to speech in particular, rather than auditory stimuli more generally). Thus, to ensure that reactivity to both speech and non-speech auditory stimuli was captured in the analysis of HYPO constructs, an additional single item (SEQ-2.1 item 4/SEQ-3.0 item 4: “How often does your child ignore or tune-out loud noises?”) was included to capture non-speech Auditory HYPO. Similarly, although a reliable three-item scale could be derived from the Tactile HYPO items, this scale only contained items assessing HYPO to pain and temperature. Thus, an additional single item (SP1 item 46/SP2 item 26: “Doesn't seem to notice when face or hands are messy”) was chosen as the single indicator for Tactile HYPO unrelated to pain or temperature. Lastly, although the Visual HYPO scale contained three items and fit a unidimensional model adequately (SRMR = 0.024), the scale demonstrated subpar reliability (ω = 0.656, ρxx = 0.631) and, thus, was not retained. We therefore used a single item (SEQ-2.1 item 10/SEQ-3.0 items 22/23: “Is your child slow to notice new objects or toys in the room, or slow to look at objects that are placed or held near him/her?”) to capture Visual HYPO in our structural analyses.

Within the SEEK domain, items were originally sorted into seven modalities (Auditory, Visual, Tactile/Somatosensory, Olfactory, Gustatory, Oral Tactile, and Vestibular/Proprioceptive [Movement]). Notably, based on discussions during the item sorting, the SEEK response pattern encompassed the additional modality of Oral Tactile, which contained items from the SP/SEQ “Taste/Smell” sections that specifically describe a child mouthing or licking nonfood objects without necessarily seeking out the taste of those objects. Of the SEEK subconstructs, unidimensional scales were derived from four (Visual, Tactile, Oral Tactile, and Vestibular/Proprioceptive [Movement]). A three-item Auditory SEEK scale was found to demonstrate adequate fit (SRMR = 0.010) but subpar reliability (ω = 0.623, ρxx = 0.729); thus, it was replaced with a single-item indicator of Auditory SEEK (SEQ-2.1 item 36a/SEQ-3.0 item 7: “How often does your child seem fascinated with sounds?”). Additionally, three-item Olfactory and Gustatory SEEK scales demonstrated inadequate model fit (Olfactory: SRMR = 0.045; Gustatory: SRMR = 0.044); therefore, these constructs were replaced with single items as well (Olfactory: SEQ-2.1 item 36c/SEQ-3.0 item 67: “How often does your child seem fascinated with particular smells?”; Gustatory: SEQ-3.0 item 64: “How often does your child crave foods with a strong taste or flavor (such as spicy, sour, or bitter foods)?”).

Structural evaluation of supra-modal sensory constructs

HYPER

The item-level latent structure of the 23 HYPER items was initially examined using EGA, with the community structure recreating the six hypothesized modality-specific factors. A bifactor GRM with specific factors for each modality demonstrated largely adequate global fit (M2*(138) = 218.4, TLIM2 = 0.976, RMSEAM2 = 0.034, SRMR = 0.053); thus, the bifactor coefficients from this model were interpreted to determine the strength of the general HYPER factor.

Factor loadings and bifactor coefficients from the HYPER model are presented in Table 3 and Fig. 2. Coefficient omega total indicated that the overall HYPER sum score was highly reliable (ωT = 0.988), and coefficient omega hierarchical indicated that the general factor saturation was sufficient to justify interpretation of the total score (ωH = 0.800). However, despite the adequate ωH value, the general factor explained a relatively low proportion of common variance (ECVG = 0.436, i.e., 43.6%), suggesting that modality-specific subconstruct factors were responsible for a slight majority (56.4%) of reliable common variance in HYPER items (see also ECVG and ECVSS values in Table 3 for the proportions of common variance in each subconstruct accounted for by general and specific factors, respectively). Moreover, based on the guidelines of Dueber and Toland [129], five of six modality-specific HYPER subconstructs (all except Tactile/Somatosensory) demonstrated added value beyond that provided by the total HYPER score (i.e., ωS > 0.80 and ECVSS ≥ 0.30; Table 3). Thus, we concluded that both the HYPER total score and subconstruct scores were interpretable. Latent scores for both the HYPER general factor (calculated from the bifactor GRM) and modality-specific HYPER subconstruct factors (calculated from unidimensional GRMs or from bifactor GRMs in the case of Auditory and Tactile HYPER) were, therefore, examined in relation to demographic and clinical correlates.
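As a rough illustration of how the bifactor coefficients discussed above relate to factor loadings, the following sketch computes ωT, ωH, and ECVG from fully standardized loadings using the standard linear-model formulas. It is not the procedure used in this study: the loadings are invented toy values, the function name `bifactor_indices` is hypothetical, and the coefficients reported here were derived from graded response models, for which the computation may differ.

```python
from collections import defaultdict


def bifactor_indices(general, specific, groups):
    """Compute omega total, omega hierarchical, and ECV (general).

    general  -- standardized general-factor loading for each item
    specific -- standardized specific-factor loading for each item
                (0.0 for items that load only on the general factor)
    groups   -- label of the specific factor each item belongs to
    """
    # Variance in the sum score attributable to the general factor.
    var_general = sum(general) ** 2

    # Variance attributable to the specific factors: square the summed
    # loadings within each factor, then add the results across factors.
    loading_sums = defaultdict(float)
    for lam, label in zip(specific, groups):
        loading_sums[label] += lam
    var_specific = sum(total ** 2 for total in loading_sums.values())

    # Unique (residual) variance of each standardized item.
    var_unique = sum(1.0 - g ** 2 - s ** 2 for g, s in zip(general, specific))

    var_total = var_general + var_specific + var_unique
    ss_general = sum(g ** 2 for g in general)
    ss_specific = sum(s ** 2 for s in specific)

    return {
        "omega_total": (var_general + var_specific) / var_total,
        "omega_hierarchical": var_general / var_total,
        "ecv_general": ss_general / (ss_general + ss_specific),
    }


# Toy example (invented values): four items, two specific factors.
toy = bifactor_indices(
    general=[0.6, 0.6, 0.6, 0.6],
    specific=[0.5, 0.5, 0.5, 0.5],
    groups=["A", "A", "B", "B"],
)
print({name: round(value, 3) for name, value in toy.items()})
```

In this toy case, ωH is noticeably lower than ωT because the specific factors carry a substantial share of the common variance, which is the kind of pattern that motivates examining subconstruct scores alongside a total score.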

Table 3 Fully standardized factor loadings and bifactor coefficients for hyperreactivity (HYPER) items
Fig. 2 Path diagram of final bifactor model for the hyperreactivity (HYPER) response pattern

HYPO

The item-level latent structure of the 12 HYPO items was first examined using EGA, with EGA results indicating a three-factor solution with Speech HYPO items in one community, Pain/Temperature HYPO items in another, and all remaining items in the third community. Although the pair of Olfactory HYPO items did not form their own community, weighted topological overlap between the two items was high (wTO = 0.353 [148]), supporting the inclusion of a modality-specific factor to accommodate their local dependency. The 12 items were then fit to a bifactor GRM with one general factor and three specific factors (Speech HYPO, Pain/Temperature HYPO, and Olfactory HYPO); all subconstructs represented by single items were specified to load only onto the general factor.

Factor loadings and bifactor coefficients for the HYPO model are presented in Table 4 and Fig. 3. This model demonstrated adequate fit on all indices other than SRMR (M2*(6) = 5.10, TLIM2 = 1.015, RMSEAM2 = 0, SRMR = 0.062) and high total score reliability (ωT = 0.940). However, additional bifactor coefficients did not support the interpretation of the general HYPO factor, which fell below both a priori guidelines for general factor saturation (ωH = 0.653) and proportion of explained common variance (ECVG = 0.398; see also ECVG and ECVSS values in Table 4 for the proportions of common variance in each subscale accounted for by general and specific factors, respectively). Moreover, both the Speech HYPO (ωS = 0.927, ωHS = 0.722, ECVSS = 0.793) and Pain/Temperature HYPO (ωS = 0.846, ωHS = 0.564, ECVSS = 0.698) subconstructs demonstrated large proportions of specific-factor variance, indicating substantial added value over a general HYPO score. Thus, as HYPO subconstruct scores but not the supra-modal HYPO score met our guidelines for interpretability, only the Speech HYPO and Pain/Temperature HYPO latent trait scores (calculated from unidimensional GRMs) were examined in our analysis of clinical/demographic correlates. Notably, despite its bifactor indices demonstrating sufficient “added value” above the general factor, the Olfactory HYPO score was not calculated because it contained only two items (and was therefore insufficient for a unidimensional model).

Table 4 Fully standardized factor loadings and bifactor coefficients for hyporeactivity (HYPO) items
Fig. 3 Path diagram of final bifactor model for the hyporeactivity (HYPO) response pattern

SEEK

To examine the latent structure of the SEEK constructs, we first conducted an EGA on the 21 SEEK indicators, finding that they clustered into four expected communities (Visual SEEK, Tactile SEEK, Oral Tactile SEEK, and Movement SEEK); the three single-item indicators were spread among the other communities, with the Auditory SEEK item clustering with Visual SEEK items and the remaining two (Olfactory and Gustatory SEEK) clustering with Tactile SEEK. We then fit a bifactor model in which the 21 SEEK indicators loaded onto their respective modalities and the 3 single items loaded only onto the general factor, although this initial model demonstrated subpar fit to the data based on several indices (M2*(108) = 226.5, TLIM2 = 0.930, RMSEAM2 = 0.046, SRMR = 0.067). In response to this subpar fit, we then chose to remove the three single-item indicators and fit a revised bifactor model with only the four multi-item SEEK constructs (18 items). Factor loadings and bifactor coefficients for the revised SEEK model are presented in Table 5 and Fig. 4. This 18-item bifactor model fit the data adequately for all indices except SRMR (M2*(63) = 103.8, TLIM2 = 0.970, RMSEAM2 = 0.036, SRMR = 0.058), and total score reliability was very high (ωT = 0.970). Notably, the general factor saturation of this model was at the a priori threshold for interpretability (ωH = 0.800), and while the explained common variance was still relatively low (ECVG = 0.533; see also ECVG and ECVSS values in Table 5 for the proportions of common variance in each subscale accounted for by general and specific factors, respectively), this suggested that a composite SEEK score could potentially be interpretable.
However, as we excluded three modality-specific subconstructs that fit poorly in the original bifactor model, the omega hierarchical estimate from the 18-item bifactor structure likely overestimated the true general factor saturation of a scale consisting of all seven modalities (i.e., the construct captured by the supra-modal score on the SP or SEQ SEEK composite). Thus, as this overestimate did not exceed the a priori-specified threshold for general factor interpretability, we chose to not interpret the SEEK general factor score. Supporting our decision to interpret SEEK at the single-modality level only, additional bifactor indices suggested that all four modality-specific SEEK subscale scores demonstrated added value over the total score (Visual: ωS = 0.875, ωHS = 0.290, ECVSS = 0.347; Tactile: ωS = 0.753, ωHS = 0.257, ECVSS = 0.380; Oral: ωS = 0.935, ωHS = 0.680, ECVSS = 0.740; Movement: ωS = 0.839, ωHS = 0.238, ECVSS = 0.335). Thus, only the latent trait scores for the four SEEK subconstructs (calculated from unidimensional GRMs) were examined in our analysis of clinical/demographic correlates.

Table 5 Fully standardized factor loadings and bifactor coefficients for sensory seeking (SEEK) items (excluding single-item modality indicators)
Fig. 4 Path diagram of final bifactor model for the sensory seeking (SEEK) response pattern

Demographic and clinical correlates

Modeling

Figure 5 displays a summary of all bivariate IDA model-based effect sizes estimating relations between latent (plausible value) sensory scores and identified clinical and demographic correlates (i.e., r for continuous variables and d for binary variables). Additional model-based statistics, including effect size credible intervals, posterior probabilities of the interval null hypothesis (PROPE), log(BFROPE) values, predictive intervals, and heterogeneity estimates are presented in Additional file 1: Tables S6, S7. Of 203 correlations examined (with interval null hypothesis tests used to control the type I error rate and reduce the overall false-discovery rate [138]), 22 (10.8%) demonstrated strong evidence for a practically significant effect (log(BFROPE) > 2.3), 28 (13.8%) demonstrated moderate evidence for a practically significant effect (1.1 < log(BFROPE) < 2.3), 44 (21.7%) demonstrated strong evidence for a trivially small effect (log(BFROPE) < − 2.3), and 43 (21.2%) demonstrated moderate evidence for a trivially small effect (− 2.3 < log(BFROPE) < − 1.1). The remaining 66 correlations (32.5%) provided inconclusive evidence for or against the presence of a practically significant effect (− 1.1 < log(BFROPE) < 1.1). Additionally, of the 26 examined Cohen’s d values, one (3.8%) demonstrated moderate evidence for a practically significant effect (1.1 < log(BFROPE) < 2.3), 13 (50.0%) demonstrated strong evidence for a trivially small effect (log(BFROPE) < − 2.3), nine (34.6%) demonstrated moderate evidence for a trivially small effect (− 2.3 < log(BFROPE) < − 1.1), and three (11.5%) were inconclusive (− 1.1 < log(BFROPE) < 1.1). Heterogeneity varied greatly between models (ICC range 0.036–0.850), with random effects typically accounting for approximately 6–14% of total score variance (Additional file 1: Tables S6, S7).
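The evidence categories used above can be expressed as a simple lookup on log(BFROPE). The sketch below is illustrative only: the function name `classify_evidence` is hypothetical, the example values are not taken from the study, and the handling of values that fall exactly on a cut-off is an assumption, since the text gives only strict inequalities. The cut-offs of 1.1 and 2.3 are approximately ln(3) and ln(10), i.e., Bayes factors of about 3 and 10.

```python
import math

MODERATE = 1.1  # roughly ln(3)
STRONG = 2.3    # roughly ln(10)


def classify_evidence(log_bf_rope: float) -> str:
    """Map a log ROPE Bayes factor onto the evidence categories in the text."""
    if log_bf_rope > STRONG:
        return "strong evidence for a practically significant effect"
    if log_bf_rope > MODERATE:
        return "moderate evidence for a practically significant effect"
    if log_bf_rope >= -MODERATE:
        # Assumption: values exactly at +/-1.1 are treated as inconclusive.
        return "inconclusive"
    if log_bf_rope >= -STRONG:
        return "moderate evidence for a trivially small effect"
    return "strong evidence for a trivially small effect"


# Hypothetical log(BF_ROPE) values, not taken from the study.
for value in (3.0, 1.5, 0.2, -1.8, -4.0):
    print(f"{value:+.1f}: {classify_evidence(value)}")

# The cut-offs sit close to Bayes factors of 3 and 10 on the natural-log scale.
print(round(math.log(3), 2), round(math.log(10), 2))
```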

Fig. 5 Meta-analytic standardized effect sizes for associations between sensory subconstructs and putative demographic and clinical correlates. All coefficients represent meta-analytic summary effects from random-effects integrative data analysis models. Associations with continuous predictors are quantified using correlation coefficients (r), and associations with categorical predictors are quantified using standardized mean differences (d). Values were tested against an interval null hypothesis of r = [− .1, .1] or d = [− 0.2, 0.2] using a region of practical equivalence (ROPE) Bayes Factor (BFROPE). Effects with log(BFROPE) > 1.1 (significant evidence against the interval null) are presented in bold, whereas effects with log(BFROPE) < − 1.1 (significant evidence in favor of null) are presented in italics. Cells with values of “—” were not examined due to prohibitively small sample sizes (n < 100 across all studies). IQ/DQ, Intelligence/Developmental Quotient; VABS, Vineland Adaptive Behavior Scales (Sparrow 2011); ABC, Adaptive Behavior Composite from VABS; SRS, Social Responsiveness Scale (Constantino and Gruber 2012); RBS-R, Repetitive Behavior Scale-Revised (Bodfish et al. 2000); RSM, repetitive sensory-motor (stereotypy); SIB, self-injurious behavior; RSC, ritualistic, sameness, and compulsive behavior

Correlations with demographic variables

Caregiver-reported sensory reactivity demonstrated relatively few significant correlations with demographic variables (i.e., age, sex), and almost all log(BFROPE) values for these models moderately or strongly favored the null hypothesis of a trivially small effect. However, younger age was significantly associated with a higher degree of SEEK in the Oral Tactile (r = − 0.164, CrI95% [− 0.248, − 0.091], ICC = 0.035 [0.006, 0.087]) and Vestibular/Proprioceptive [Movement] (r = − 0.187, CrI95% [− 0.274, − 0.100], ICC = 0.079 [0.035, 0.147]) modalities, though these effects were small in magnitude.

Correlations with cognition and adaptive functioning

No sensory variable demonstrated practically significant associations with VIQ, NVIQ, or FSIQ when cognitive abilities were measured continuously.Footnote 4 However, moderate group differences were found between individuals with and without intellectual disability for Oral Tactile SEEK (d = 0.422, CrI95% [0.073, 0.768], ICC = 0.093 [0.010, 0.317]), such that individuals with intellectual disability were reported to have higher scores on this construct (e.g., more frequent mouthing of objects). Adaptive skills, as measured by multiple VABS subscales, demonstrated small yet practically significant negative associations with Pain/Temperature HYPO (VABS SOC: r = − 0.182, CrI95% [− 0.267, − 0.091], ICC = 0.157 [0.045, 0.332]; VABS ABC: r = − 0.199, CrI95% [− 0.310, − 0.082], ICC = 0.173 [0.052, 0.362]) and Oral Tactile SEEK (VABS DLS: r = − 0.187, CrI95% [− 0.291, − 0.070], ICC = 0.067 [0.007, 0.193]).

Correlations with core autism features and psychiatric symptoms

Although not associated with cognitive or adaptive behavior scores to a practically meaningful degree, most subconstructs in the HYPER response pattern displayed small-to-moderate and practically significant associations with the RSC domain of the RBS-R (range of practically significant rs = 0.197–0.358). Additional domains of core autism features (SRS, RBS-R RSM/SIB) were also significantly associated with General HYPER, as well as several of the modality-specific HYPER subconstructs derived from their respective unidimensional scales. Several non-HYPER domains of sensory reactivity, namely Speech HYPO and Visual/Tactile/Movement SEEK, also demonstrated practically significant associations with one or more aspects of core autism features; these relations ranged from small to moderate in magnitude (Fig. 5; Additional file 1: Table S6). ADHD symptoms demonstrated small and inconclusive positive associations with all HYPER domains, although this symptom cluster did predict Speech HYPO (r = 0.297, CrI95% [0.135, 0.449]) and three of the SEEK domains (Tactile: r = 0.218, CrI95% [0.077, 0.360]; Oral Tactile: r = 0.285, CrI95% [0.156, 0.402]; Movement: r = 0.229, CrI95% [0.078, 0.370]) to a practically meaningful extent. Relations with internalizing psychopathology were similarly selective, with the only practically meaningful associations being with General HYPER (r = 0.218, CrI95% [0.082, 0.347], ICC = 0.140 [0.041, 0.292]), Taste HYPER (r = 0.178, CrI95% [0.086, 0.266], ICC = 0.071 [0.005, 0.203]), and Speech HYPO (r = 0.286, CrI95% [0.125, 0.453], ICC = 0.148 [0.054, 0.285]). Externalizing and total psychopathology scores were more broadly associated with sensory constructs across all three response patterns (range of practically significant rs = 0.179–0.329).
Notably, the majority of non-significant correlations between sensory reactivity and core autism features or psychiatric symptoms had BFROPE values that were inconclusive rather than demonstrative of trivially small (i.e., practically insignificant) associations. Based on current evidence, many of these inconclusive correlations were most likely to have true population values in the “small but practically significant” range of r = [0.1, 0.2].

Discussion

Despite the recent elevation of sensory reactivity differences to the status of a diagnostic criterion for autism [15, 16], there has been relatively little empirical work examining the underlying latent structure of these core sensory features within the autistic population [see also 51]. By analyzing caregiver-reported sensory reactivity differences in a heterogeneous cross-sectional sample of nearly 4000 autistic children, the current study sought to investigate the hierarchical structure of sensory hyperreactivity (HYPER), hyporeactivity (HYPO), and sensory seeking (SEEK) across the full spectrum of children captured under the label of autism spectrum disorder. Utilizing modern psychometric techniques, we developed structural models of HYPER, HYPO, and SEEK in individual sensory modalities, subsequently testing whether each construct is most appropriately studied at the level of a single supra-modal sensory response pattern (e.g., an overall SEEK score) or separately for each modality within the sensory response patterns (e.g., separate scores for Visual SEEK, Auditory SEEK, and Tactile SEEK). Of the three sensory response patterns included within current autism diagnostic criteria [15, 16], only HYPER demonstrated unambiguous evidence of an interpretable supra-modal construct, whereas supra-modal HYPO scores as currently operationalized were found to have limited construct validity. The evidence for supra-modal SEEK scores was more ambiguous, as we were unable to generate an adequately-fitting bifactor model of SEEK that included all relevant sensory modalities, but once modalities measured with only single items (i.e., Auditory, Olfactory, and Gustatory) were removed, the model containing the four remaining modalities (Visual, Tactile, Oral Tactile, and Movement) demonstrated an adequately fitting latent structure and acceptable general factor saturation.
Although our findings did not conclusively support the construct validity of a SEEK composite that includes all seven standard modalities (i.e., those operationalized by the SP or SEQ SEEK response pattern scores), the more limited “General SEEK” construct described here (consisting of only Visual, Tactile, Oral Tactile, and Movement items) may be a useful supra-modal aspect of the sensory autism phenotype if replicated in future studies. Additionally, irrespective of the construct validity of supra-modal scores, nearly all modality-specific sensory subconstructs demonstrated added value over and above their respective response pattern scores, indicating that modality-specific HYPER, HYPO, and SEEK scores (e.g., Visual HYPER, Visual HYPO) are able to explain additional individual differences in sensory reactivity to a greater degree than a single supra-modal HYPER, HYPO, or SEEK score. These findings have implications for researchers interested in characterizing, explaining, or intervening on sensory reactivity in autistic individuals, as they suggest that some of the supra-modal response pattern scores commonly used in these areas may have previously unrealized psychometric limitations. HYPER, HYPO, or SEEK scores that are limited to one modality (i.e., single-modality subconstruct scores) could potentially prove advantageous in some contexts, although additional research is necessary to determine the extent to which these measures demonstrate broader construct validity and practical utility over established supra-modal HYPER, HYPO, or SEEK scores.

The current study also provided a wealth of information about the measurement of sensory reactivity in autistic youth based on caregiver-report questionnaires. Using the most frequently employed caregiver-report sensory measures in the autism literature (the SP and SEQ), we attempted to generate unidimensional scales to operationalize each combination of modality × response pattern as its own unique sensory subconstruct. However, based on a priori psychometric criteria, we were unable to generate acceptable unidimensional scales for three of five HYPO modalities (Visual, Olfactory, Gustatory) and three of seven SEEK modalities (Auditory, Olfactory, Gustatory), necessitating the use of single-item indicators of these subconstructs (and one doublet) in later structural analyses. Moreover, within the HYPO domain, the two subconstructs that did produce sufficiently reliable scales reflected fairly specific subsets of the total item pool (e.g., HYPO to pain and temperature rather than all somatosensory stimuli), suggesting that they did not fully operationalize the “Auditory HYPO” and “Tactile HYPO” constructs that we had originally intended to measure. Thus, despite broadband sensory reactivity measures such as the SP, SEQ, and SP-3D:I typically including HYPER, HYPO, and SEEK items for each sensory modality, a sizable minority of “modality × response pattern” subconstructs demonstrated inadequate construct validity within this large autistic sample. In the current study, it is unclear whether this finding stems from instrument-specific measurement issues (i.e., inadequate construct coverage within the specific questionnaires from which items were drawn) as opposed to more general issues regarding the theoretical definition of the construct or its ability to be reliably operationalized as a set of observer-reported questionnaire items. 
As both sets of issues are likely to contribute in different cases, we suggest some sensory subconstructs (particularly within the HYPO and SEEK domains) are (a) underrepresented in existing questionnaires (i.e., more questions are needed), (b) poorly defined (e.g., Visual HYPO items have unclear relations with specific aspects of visual perception), (c) difficult for caregivers to report on reliably (e.g., few indicators of Gustatory SEEK are present in most children, limiting the pool of potential items available to capture this construct), and/or (d) of potentially limited theoretical relevance when predicting clinical outcomes (e.g., Olfactory HYPO may be unlikely to substantially influence the expression of other core features of autism). Future work should attempt to evaluate which, if any, of these poorly operationalized sensory subconstructs are relevant to autism research and clinical practice and if so, how they can be reliably measured.

As the majority of work examining sensory constructs in both autism and other clinical populations has utilized supra-modal scores from the SP, SEQ, SP-3D:I, or similar measures (e.g. [29, 30, 82, 149, 150]), our findings signal the need for sensory autism research to broaden the ways in which sensory reactivity differences are characterized, potentially shifting away from the field’s nearly exclusive reliance upon supra-modal HYPER, HYPO, and SEEK scores for this purpose. Single-modality measures of HYPER, HYPO, and/or SEEK represent a viable alternative method of assessing these constructs and may be particularly useful when substantive research hypotheses include associations with other sensory constructs in a single sensory modality (e.g., tactile detection thresholds, visual evoked potentials; see [151] for a recent example). Moreover, there is a great need to develop more comprehensive measures of modality-specific HYPER, HYPO, and SEEK subconstructs, either by expanding upon the item banks employed in the current study, adapting questionnaires used in other fields (e.g. [152]), or developing and validating entirely novel measures (e.g. [86]). By more densely sampling each subconstruct of interest, these measures could ostensibly increase the reliability, validity, conceptual breadth, and clinical utility of modality-specific response pattern scales compared to the short and relatively general item pools currently included in longer broadband sensory measures. Importantly, we are not suggesting that researchers entirely abandon the study of supra-modal sensory constructs—particularly HYPER, for which we have found some empirical support for the supra-modal response pattern—as there is certainly value in the investigation of these higher-order constructs as well. 
Rather, in future studies where both supra-modal and modality-specific subconstruct scores could feasibly be interpreted, we strongly recommend that researchers use contextual factors to determine which “level of analysis” is most appropriate or informative to answer the substantive research question(s) at hand.

For researchers who do choose to characterize sensory reactivity at the single-modality subconstruct level going forward (e.g., examining only Auditory HYPER or Auditory HYPO in the context of an auditory neuroscience study), it is notable that the “level of analysis” chosen to study a problem will likely frame the ways in which sensory features of autism are conceptualized and studied more broadly as an aspect of autism’s heterogeneity (e.g. [10, 153, 154]). In particular, when developing clinical interventions for sensory reactivity in autism, a focus on modality-specific sensory subconstruct outcomes may motivate clinicians and researchers to investigate the efficacy of intervention modalities that are more focused on specific subconstructs rather than sensory reactivity in general (e.g., the use of sound generators to treat hyperacusis, a specific type of Auditory HYPER [155]). Personalized interventions that seek to assess an autistic child’s specific pattern of sensory reactivity differences and ameliorate challenges associated with each domain could also be assessed within this framework, using modality-specific assessments of each sensory response pattern (e.g., an outcome measure specifically focused on Visual HYPER or Pain/Temperature HYPO) to monitor the effectiveness of each putative “active ingredient” of the intervention. A shift in measurement practices will also allow researchers to associate these single-modality behavioral subconstructs with psychophysical and/or neurophysiological measures within a given modality (e.g. [75, 76]), informing theories of the neurocognitive underpinnings of certain types of sensory reactivity in autism (e.g. [41, 156, 157]). 
Though we do not claim a single-modality perspective to be advantageous in all cases or for all research questions (particularly for those focused on “real-world” multisensory contexts), we believe that a greater diversity of theoretical approaches and frameworks within sensory autism research is needed to make optimal progress towards improving the lives of autistic people within this line of work.

In addition to our examination of the latent structure of sensory reactivity and assessment of evidence to support each “level of analysis,” we also employed random-effects integrative data analysis to estimate the meta-analytic associations between all sufficiently interpretable sensory subconstructs (i.e., General HYPER, six HYPER subconstructs, two HYPO subconstructs, and four SEEK subconstructs) and relevant demographic and clinical correlates (i.e., age, sex, cognitive abilities, adaptive functioning, core autism features, and co-occurring psychiatric symptoms). In general, the majority of associations were modest in size (with a few exceptions, e.g., large correlations between RBS-R RSM/Movement SEEK and RBS-R RSC/General HYPER), and Bayes factor tests indicated that many of the observed effects (particularly cross-sectional associations with age, sex, adaptive functioning, and cognitive abilities) were small enough to be practically equivalent to zero. Notably, none of the assessed sensory variables were significantly associated with sex or cognitive abilities as continuously quantified, although intellectual disability status was associated with moderately higher levels of Oral Tactile SEEK (i.e., seeking out the sensations of non-food objects in one’s mouth, not necessarily for consumption). Significant negative associations were also observed between certain HYPO/SEEK scores and adaptive behavior scores (with Pain/Temperature HYPO demonstrating the most consistent associations); however, these effects were relatively small in magnitude (|r| values ≤ 0.199).

In line with the classification of sensory reactivity differences as a core diagnostic criterion for autism classified under restricted/repetitive behaviors and interests, most sensory subconstructs correlated moderately with one or more of the RBS-R subscales (i.e., RSM [lower-order repetitive behaviors] and/or RSC [higher-order repetitive behaviors]). Notably, the largest summary effect observed in the current study was the correlation between Movement SEEK and the RBS-R RSM subscale, although this was likely driven to some extent by overlapping item content (e.g., both SEQ 2.1 Item 27/SEQ 3.0 Item 76 and RBS-R item 4 contain jumping and spinning in circles as exemplars). Nevertheless, multiple RBS-R subscales demonstrated practically meaningful positive correlations with the majority of sensory constructs considered in the current study. Associated psychiatric symptoms also demonstrated small to moderate correlations with many HYPER subconstructs and several modality-specific HYPO and SEEK subconstructs, suggesting that outside of other core autism features, sensory reactivity (particularly HYPER) is most robustly related to transdiagnostic psychiatric symptomatology, particularly in the externalizing domain. Notably, as the sensory subconstructs of Tactile and Movement SEEK demonstrated practically significant positive correlations with externalizing symptoms and features of ADHD but not internalizing symptoms, these two domains may be reflective of an underlying liability for dysregulated or impulsive behavior (see also [81]). Multiple domains of SEEK also showed negative correlations with age, potentially suggesting that these traits decrease over time as children develop increased capacity to regulate their motor impulses with age (e.g. [158, 159]). 
Although cross-sectional correlations such as those explored in the current study are insufficient to determine causal relationships between sensory reactivity and other clinical constructs [160], the present findings can nevertheless be useful in generating hypotheses for future targeted investigations of the causal interplay between sensory constructs and other core/associated features of autism.

Despite only two HYPO subconstructs (Speech HYPO and Pain/Temperature HYPO) being considered within the analysis of meta-analytic correlates, it is notable that these two domains of sensory reactivity diverged strongly in terms of their correlations with non-sensory variables. Speech HYPO demonstrated practically significant positive correlations with all core autism features (i.e., SRS, RBS-R subscales) and domains of psychiatric symptoms, whereas none of these domains were associated with Pain/Temperature HYPO. Notably, a child not responding to their name or other speech stimuli is frequently considered a core feature of autism outside of the sensory domain, conceptualized as a failure to orient attention to socially salient stimuli (e.g. [161,162,163]). Thus, it is notable that observed Speech HYPO could feasibly be present in the absence of underlying differences in sensory reactivity (e.g., due to differences in broader social or attentional processes). Future studies, particularly those that include multi-method assessments of both social-communicative and sensory factors, may be necessary to determine whether the underlying causes of Speech HYPO are indeed sensory in nature, thereby investigating the appropriateness of classifying this subconstruct as a sensory reactivity difference.

The Pain/Temperature HYPO subconstruct was significantly negatively associated with multiple domains of the VABS, though no correlations with core autism features or psychiatric symptoms reached the threshold for practical significance. There was also a modestly increased level of Pain/Temperature HYPO in autistic individuals with a categorical label of intellectual disability, although this result fell slightly below our threshold for practical significance. Though these results seemingly indicate that insensitivity to pain and temperature covaries with reduced adaptive functioning in the autistic population, we strongly caution against overinterpretation of these findings due to the substantial limitations of quantifying response to pain in autism based solely on reports from caregivers [164, 165]. Although a co-occurring diagnosis of intellectual disability or more significant impairments in adaptive behavior may be more common in individuals with additional rare neurological conditions that truly include insensitivity to pain as a symptom (e.g., congenital insensitivity to pain with anhidrosis [166]), it is also quite possible that proxy reporters such as caregivers underestimate the pain or discomfort of autistic children who are not able to communicate their internal states in typical ways [165]. 
With the recent development of methods to better capture the internal pain experiences of autistic individuals with intellectual disability and/or limited language [167], additional work is greatly needed to determine whether caregiver reports of seeming insensitivity to pain correspond with self-reports of pain experience in this population. Such work would provide more conclusive evidence for or against the claim that autistic individuals with more significant adaptive/functional impairments are truly less sensitive to pain and temperature than autistic individuals who are more cognitively and adaptively able (vs. this difference being driven by atypical communication of pain or distress).

Overall, the findings of the current study with regard to studied HYPO subconstructs suggest that Speech HYPO and Pain/Temperature HYPO represent theoretically distinct aspects of the autism phenotype with completely non-overlapping significant correlates and divergent future directions relevant to construct validation. Therefore, for applied researchers hoping to investigate these aspects of the autism phenotype using caregiver-report questionnaires, we strongly recommend that these two HYPO subconstructs in particular be studied at the single-modality level, as the nuanced associations between modality-specific variables and external correlates may be obscured by the use of supra-modal HYPO scores that combine subconstructs into a single variable when assessing individual differences. Notably, it is currently unclear whether these HYPO subconstructs demonstrate equally divergent patterns of external correlations when measured using other techniques (e.g., clinician observation [84, 85]), and this remains an important avenue for future research.

Strengths and limitations

The current study had a number of strengths, including its very large sample size, representation of autistic children and adolescents across a wide range of ages and developmental levels, sensory phenotyping of many combinations of modalities and response patterns with widely used caregiver-report measures, and state-of-the-art statistical approaches that allowed for the pooling of partially overlapping sensory item scores and evaluation of between-dataset effect heterogeneity. However, it was not without limitations. Most notably, the studies that comprised the dataset utilized vastly different methods; each had substantially different inclusion/exclusion criteria, geographic locations, and assessment batteries. To allow for maximal pooling of similar data across studies, we combined measures of the same construct (e.g., different versions of the same questionnaire, standard scores on different measures of an ostensibly similar construct such as FSIQ or internalizing symptoms) into single variables, potentially introducing additional heterogeneity due to noninvariance between the different measures or measure versions. For sensory constructs, this pooling was also done at the item level to allow for different versions of the same measure (i.e., SP1 and SP2, SEQ-2.1 and SEQ-3.0) to be calibrated on the same latent scale using IRT. Though many items on different questionnaire versions were extremely similar, version-specific changes in anchor wording, item stems, or order effects could theoretically have resulted in noninvariance of the homologous items, again increasing overall heterogeneity. Nevertheless, the random-effects IDA model utilized in the current study allowed for the heterogeneity of each effect to be quantified (i.e., using ICCs and prediction intervals), thereby helping to contextualize both the population summary effect and the range of possible effects observable under different study conditions. 
The measurement of many sensory subconstructs was also a limitation, as despite the large initial item pool, a number of subconstructs had relatively few initial indicators and were, therefore, difficult or impossible to form into viable unidimensional scales from the start. For constructs that could not be operationalized in the current study using a psychometrically adequate unidimensional scale, we opted to use an ad-hoc single-item indicator (or in one case, a doublet) such that these subconstructs would still remain in each supra-modal bifactor model. However, it is unclear whether the use of single-item indicators partially contributed to the psychometric inadequacy of the higher-order HYPO and SEEK constructs, and future studies in which all modality-specific subconstructs are adequately captured are necessary to rule out poor subconstruct measurement as a potential cause of supra-modal construct invalidity for sensory response pattern scores.
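
As a rough illustration of how a prediction interval contextualizes a summary effect, the following Python sketch pools hypothetical correlations from several datasets. It is not the IDA model used in the current study: it substitutes a simple DerSimonian-Laird random-effects estimator on Fisher z-transformed correlations, reports I² (a heterogeneity proportion related to, but not the same as, the ICCs described above), and uses a normal rather than t-based critical value for the intervals. The function name, inputs, and example values are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def random_effects_summary(correlations, sample_sizes, level=0.95):
    """Pool correlations with a DerSimonian-Laird random-effects model.

    Illustrative stand-in only; not the IDA model from the study.
    """
    if len(correlations) != len(sample_sizes) or len(correlations) < 2:
        raise ValueError("need matching inputs from at least two datasets")

    # Fisher z-transform; sampling variance of z is approximately 1 / (n - 3).
    z = [math.atanh(r) for r in correlations]
    v = [1.0 / (n - 3) for n in sample_sizes]
    w = [1.0 / vi for vi in v]

    # Fixed-effect mean and Cochran's Q (weighted squared deviations).
    fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - fixed) ** 2 for wi, zi in zip(w, z))
    df = len(z) - 1

    # Between-dataset variance (tau^2), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights, summary effect, and its squared standard error.
    w_star = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * zi for wi, zi in zip(w_star, z)) / sum(w_star)
    se2 = 1.0 / sum(w_star)

    crit = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation
    ci_half = crit * math.sqrt(se2)           # uncertainty of the mean effect
    pi_half = crit * math.sqrt(tau2 + se2)    # adds between-dataset spread

    return {
        "summary_r": math.tanh(mu),
        "tau2": tau2,
        "i2": max(0.0, (q - df) / q) if q > 0 else 0.0,
        "ci": (math.tanh(mu - ci_half), math.tanh(mu + ci_half)),
        "pi": (math.tanh(mu - pi_half), math.tanh(mu + pi_half)),
    }


# Hypothetical correlations from three datasets of 200 children each.
result = random_effects_summary([0.10, 0.50, 0.30], [200, 200, 200])
print(result)
```

With these made-up inputs, the prediction interval is wider than the confidence interval because it adds the between-dataset variance to the uncertainty of the summary effect, which is the sense in which it describes the range of effects plausibly observable under different study conditions.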

Another major limitation of the current study was the fact that multisensory items were removed from the questionnaires before psychometric analyses were undertaken. Although this choice greatly simplified the bifactor models and their computation due to the lack of specific-factor cross-loadings, it is notable that “real-world” sensory experiences are inherently multisensory in nature [168]. By removing items containing multiple sensory modalities, we may have inadvertently excluded a number of relevant sensory behaviors in real-world contexts from the measurement models, limiting the content validity of the supra-modal constructs operationalized by the general HYPER, HYPO, and SEEK factors. Though it remains unknown whether these items would have been retained in our models based on psychometric criteria or excluded due to misfit, future studies are warranted to investigate the utility and properties of sensory reactivity bifactor (or indeed more complex hierarchically structured) models that include multisensory items in addition to single-modality subconstructs.

As an additional limitation, the questionnaires used in the current study were all based on caregiver report of a child’s behavior; even in cases where autistic individuals were capable of self-reporting on their own sensory experiences (or provided such data), this information was not included in the current investigation. As sensory percepts are fundamentally subjective experiences, reports solely based on the observations of untrained proxy reporters (i.e., caregivers) may be capturing only the most extreme and/or distressing sensory reactivity differences, potentially also introducing confounding according to the child’s language or communication ability (see also [169]). Moreover, it is quite possible that our conclusions regarding the inadequacy of supra-modal HYPO and SEEK scores (and/or the adequacy of supra-modal HYPER scores) are limited to caregiver-reported sensory measures, and additional work is needed to test the appropriateness of such scores using other measurement methods, including self-report (e.g. [170, 171]), clinician-rated behavioral observation (e.g. [84, 85, 172, 173]), and parent/caregiver interview (e.g. [84]) tools. Ideally, further studies of autistic youth and adults capable of self-report should attempt to utilize multimodal sensory measurements that include both self- and informant-reports simultaneously (e.g., [174]; see also [84] for a measure combining clinician observation with caregiver interview), thereby allowing both an individual’s internal experience and observed behavior to contribute to their ratings of sensory reactivity. 
For the subpopulation of autistic individuals who cannot reliably report on their own experiences (e.g., very young children, individuals with severe/profound intellectual disabilities, many of the individuals labelled with so-called “profound autism” [175, 176]), multimodal measures of sensory features remain just as important, despite self-reports being inaccessible, and we strongly encourage researchers to consider alternative ways to augment parent or caregiver-reported sensory questionnaires when examining sensory differences in this particular segment of the autistic population (e.g., [neuro]physiologic measures, behavioral observations, clinician ratings, parent/caregiver interviews, cognitively accessible psychophysical tasks [177,178,179,180,181]).

With regard to the statistical limitations of the study, it is worth noting that all associations between caregiver-reported sensory reactivity differences and clinical/demographic variables were estimated using models that did not control for other relevant demographic or clinical variables (e.g., age, sex, IQ/DQ, or language level). Thus, the meta-analytic correlation estimates in our current study may overestimate the strength of hypothesized effects due to the presence of (often substantial) residual confounding [160]. On the other hand, the current study only examined unconditional, linear associations between variables; therefore, it is also possible that the strength of any conditional and/or nonlinear relationship was underestimated. Future work should attempt to quantify such effects of various sensory predictors on relevant clinical outcomes over and above other potentially confounding variables (e.g. [103]). Lastly, it is notable that the current investigation relied solely on cross-sectional data, limiting our ability to draw conclusions regarding potential developmental trends in sensory features or the predictive validity of sensory reactivity for other relevant outcomes. Although some studies have begun to demonstrate the predictive utility of sensory reactivity in autistic children and other populations such as infants at elevated likelihood of developing autism (e.g. [182,183,184,185,186,187]), these studies have largely used supra-modal response pattern scores; therefore, additional large-scale, longitudinal studies are necessary to determine which single-modality sensory subconstructs (or combinations thereof) can be utilized as clinically relevant predictors of core and frequently co-occurring features of autism.
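
To make the point about confounding concrete, the first-order partial correlation shows how a bivariate correlation between a sensory score x and an outcome y changes once a single covariate z (e.g., age) is held constant. The Python sketch below implements that standard formula; the function name and the input correlations are hypothetical and are not estimates from the current study.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("x or y is perfectly correlated with the covariate")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical case 1: the covariate relates to both variables in the same
# direction, so part of the unadjusted correlation reflects shared variance
# with the covariate.
print(partial_correlation(r_xy=0.30, r_xz=-0.30, r_yz=-0.20))

# Hypothetical case 2: the covariate relates to the two variables in opposite
# directions, so the unadjusted correlation understates the adjusted one.
print(partial_correlation(r_xy=0.30, r_xz=0.30, r_yz=-0.20))
```

In the first case the adjusted correlation is smaller than the unadjusted value of 0.30, and in the second it is larger, mirroring the possibility that unconditional estimates either overstate or understate conditional associations.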

Conclusion

The past decade has seen a substantial rise in the number of studies examining the sensory aspects of autism [30], but to date, relatively little published work has examined the latent structure or construct validity of proposed sensory (sub)constructs, particularly those that span multiple sensory modalities. By compiling a large, cross-site dataset of richly phenotyped autistic children, we conducted an integrative data analysis that specifically investigated the hierarchical structure of the three canonical sensory “response patterns” (i.e., HYPER, HYPO, and SEEK). Although much research to date has focused on the examination of response patterns that span multiple modalities, the current study demonstrates that some of these supra-modal construct scores (in particular those purported to tap hyporeactivity [HYPO] and to a lesser extent, sensory seeking [SEEK]) are contaminated to a substantial degree by modality-specific variance, making these supra-modal scores difficult to interpret when such variance is not explicitly partialed out (e.g., in the context of a latent variable model). Depending upon the nature of the research question (e.g., if assessing sensory correlates within the same modality or a mechanism of change that is likely to work at the level of a single sensory modality rather than the supra-modal level), modality-specific subconstruct scores may be preferable, or at least represent a viable alternative to supra-modal scores for characterizing individual differences in sensory reactivity in autistic children and adolescents. However, additional research is needed to further develop modality-specific sensory measures beyond the limited subsets of items available in broadband inventories currently in use. 
We therefore recommend that applied researchers studying the sensory aspects of autism tailor the sensory reactivity construct(s) they hope to measure to their specific research questions, rather than exclusively and uncritically relying on supra-modal response pattern scores.

Using integrative data analysis models, we also examined meta-analytic bivariate associations between single-modality sensory subconstructs and various other clinical outcomes, with other measures of core autism features (e.g., subscales of the RBS-R) and psychiatric symptoms demonstrating particularly strong relations with most aspects of sensory reactivity. Although the empirically derived sensory subconstruct measures in the current study correlated meaningfully with other clinical outcomes, there remains a great need to expand existing measures and/or to develop novel measures that sample each modality-specific subconstruct (and when relevant, multiple distinct aspects or subdimensions of that subconstruct) in greater detail, as well as to explicitly investigate caregiver-reported reactivity in multisensory contexts that were not tested in the current study. Notably, the field of sensory research remains ripe for cross-disciplinary collaboration between clinical and behavioral scientists, occupational therapists, psychologists, neuroscientists, and autistic individuals themselves (e.g. [188,189,190,191,192]), as a synthesis of clinical, behavioral, neuroscientific, and lived experience perspectives on sensory reactivity within and across modalities is likely to produce valid and useful assessments of specific aspects of the autism phenotype, their underlying psychological and neural mechanisms, and their unique clinical correlates. Basic and applied research into the sensory features of autism has immense potential to improve the lives of many autistic individuals across the lifespan, but in order to realize this potential, systematic efforts must be made to rigorously define all sensory constructs of interest and develop psychometrically sound measures of such constructs for use in both research and clinical practice.