Introduction

Autism spectrum disorder (ASD) is an early-onset neurodevelopmental disorder characterized by impairment in social communication/interaction and restricted repetitive behaviors (American Psychiatric Association, 2013). The core symptoms of ASD are known to manifest in very early childhood and persist throughout development. Current epidemiologic estimates in the US indicate that ASD occurs in 1 out of 44 children (Maenner, 2021) and causes significant burden for individuals, their families, and society (Baxter et al., 2015; Kohane et al., 2012; Lavelle et al., 2014; Marsack-Topolewski & Church, 2019; Ou et al., 2015; Taneja et al., 2017).

Given the importance of early intervention to reduce lifelong impacts of ASD, accurate screening is a necessary first step. Screening measures typically involve parents and/or teachers to identify children with ASD concerns. Common screening measures include the Modified Checklist for Autism in Toddlers-Revised with Follow Up, Social Responsiveness Scale-2, Autism Spectrum Rating Scales, and Social Communication Questionnaire. More comprehensive diagnostic measures consist of parental interview, such as the Autism Diagnostic Interview–Revised and the Monteiro Interview Guidelines for Diagnosing the Autism Spectrum, and direct behavioral observation, such as the Autism Diagnostic Observation Schedule-2 and the Childhood Autism Rating Scale-2 (Constantino & Gruber, 2012; Goldstein & Naglieri, 2009; Le Couteur et al., 2003; Lord et al., 2012; Lord & Rutter, 2003; Monteiro & Stegall, 2018; Robins et al., 2014; Schopler et al., 2010). As the prevalence of ASD has significantly increased over the past 10 years with the revision of diagnostic criteria in the Diagnostic and Statistical Manual of Mental Disorders-5th edition (DSM-5), which identifies broader phenotypes of autism, there has been greater need for screening measures that can effectively identify children with ASD concerns (American Psychiatric Association, 2013; Maenner, 2021; Maenner et al., 2014).

The Autism Spectrum Rating Scales is a relatively understudied instrument that was developed to screen children with ASD concerns (Goldstein & Naglieri, 2009). Since its first publication in 2009, when it was developed within the framework of the DSM-IV, it has undergone two modifications: (1) in 2012, it was altered to accommodate non-verbal children; (2) in 2014, it was updated to match the DSM-5 criteria. It has 2 age-versions (age 2–5 years and 6–18 years) and 2 informant-versions (parent and teacher rating). The Autism Spectrum Rating Scales manual indicates that the initial psychometric study included 2560 ratings with 40 males and 40 females at each age, representative of the US population across several demographic variables. The initial psychometric study, as shown in Table 1, compared the ASD sample with the general population sample, and the Autism Spectrum Rating Scales 6–18 years Parent Report (ASRS) yielded excellent sensitivity and specificity (both > 90%). The test-retest reliability (rated over a 2- to 4-week interval) and the inter-rater reliability were also excellent (correlation coefficients ranging 0.87–0.9 and 0.83–0.92, respectively). Convergent validity was investigated by comparing the ASRS with the Gilliam Autism Rating Scale-2nd edition (GARS), the Gilliam Asperger’s Disorder Scale (GADS), and the clinician-administered Childhood Autism Rating Scale (CARS) (Gilliam, 2001, 2006; Schopler et al., 2010). The ASRS Total T-score had moderate correlations with the GARS Autism Index and the GADS Asperger’s Disorder Quotient (mean corrected correlation coefficient 0.59 and 0.55, respectively). The correlation between the ASRS Total T-score and the CARS total score was somewhat lower (mean corrected correlation coefficient 0.43).

Table 1 Studies with Autism Spectrum Rating Scales 6–18 years parent report

After the initial psychometric study by the developers, there were three replication studies. The Modified Chinese ASRS (MC-ASRS) was developed for the Chinese population and demonstrated good sensitivity (94%) and favorable specificity (82%) with an excellent area under the curve (AUC, 0.95) in receiver operating characteristic (ROC) analysis when comparing the ASD sample with the general population sample (Zhou et al., 2017). However, when comparing ASD vs. intellectual disability (ID), the MC-ASRS showed lower sensitivity (77%) and poor specificity (52%) with a suboptimal AUC (0.71) (Li et al., 2018). The most recent ASRS study, conducted in the US with 139 children with ASD and 283 children with non-ASD clinical disorders, indicated high sensitivity (91.4%) but very poor specificity (16%) and AUC (0.60) (Camodeca, 2019). In summary, ASRS studies comparing ASD vs. the general population demonstrated good validity; however, the ASRS requires further validation in clinically referred samples comparing ASD vs. non-ASD clinical disorders.

The COVID-19 pandemic has created both challenges and opportunities for evaluating ASD with a hybrid model of in-person and telehealth visits. The main challenge is that few standardized observational measures are available for use in telehealth, even though direct behavior observation of children is recognized as a crucial component of the ASD diagnostic process (Ellison et al., 2021; Falkmer et al., 2013; Huerta & Lord, 2012). At the same time, telehealth evaluations during the pandemic have led clinicians to rethink the ASD diagnostic process and develop alternative models that place more weight on parental information about children’s behaviors (Conti et al., 2020; Ellison et al., 2021; Jang et al., 2022; Matthews et al., 2021; Reisinger et al., 2022; Wagner et al., 2021, 2022; Zwaigenbaum et al., 2021). Currently, there is no ASRS replication study in the setting of a hybrid model of telehealth and in-person diagnostic evaluation during the pandemic, which is an important gap for clinicians who evaluate children with ASD concerns. The pandemic has caused multiple psychological and social complications in the population, including social isolation, loss of direct interpersonal interactions essential for normal development, increasing unemployment and poverty, and increasing rates of mental health problems in children and adults (Bzdok & Dunbar, 2022; Giesbrecht et al., 2022; Han et al., 2020; Kwong et al., 2021; Meherali et al., 2021; Salmon et al., 2022). These factors may have made diagnosing ASD more challenging because the pandemic has influenced the nature of case presentations, though ASRS administration has remained the same before and after the pandemic.

Given the limited amount of research on the ASRS (Table 1), there is a need to replicate previous findings with an independent sample in the setting of a telehealth and in-person hybrid model of diagnostic evaluation. To fill this gap in the literature, the current study aims (1) to assess the validity of the ASRS (sensitivity, specificity, positive predictive value, negative predictive value, accuracy, AUC) in a clinically referred sample of children during the pandemic, and (2) to compare its performance across demographics, child characteristics, and treatment/intervention services within the ASD group.

Method

Participants

Data for this cross-sectional study were obtained from 490 children (347 with ASD, 143 without ASD) at a Mid-Atlantic urban tertiary ASD-specialty center between July 2020 and March 2022. The inclusion criteria were: (1) age between 6 and 17 years, (2) completion of ASRS (6–18 years Parent Report) at most 6 months before the diagnostic evaluation, (3) completion of a comprehensive diagnostic evaluation by a physician or psychologist, (4) determination of clinical diagnosis (ASD vs. No ASD), and (5) parental consent to participate in the clinical research registry approved by the local Institutional Review Board, allowing their child’s de-identified information in the medical record to be used for research. The registry’s consent rate for this study was 82%; details on the registry were reported elsewhere (Kalb et al., 2019).

Measures

Sociodemographics

Demographic information about the child was obtained from the medical record and electronic scheduling system. This includes child age (at ASRS administration), sex (male vs. female), race/ethnicity (classified as White, Black, Hispanic, Asian, Multi-Racial, Other/Unknown), insurance type (medical assistance, private insurance, military insurance), and the highest level of education attained by the child’s primary caretaker (less than high school, high school diploma, Associates or trade school, Bachelors, Graduate). To avoid issues with multicollinearity, insurance type and primary caregiver education were combined into a single variable, ‘social capital’, as follows. When the child had private or military insurance and the primary caregiver had a graduate level of education, this was determined to be high social capital. Low social capital corresponded to an undergraduate or lower level of education, in combination with medical assistance for the child. Other combinations were labeled as moderate social capital. A similar variable has been used previously (Azad et al., 2019).
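The rule for combining insurance type and caregiver education can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the study's actual code; the function name and the string labels for each category are hypothetical encodings chosen for this example.

```python
# Hypothetical string encodings of the categories listed in the text.
PRIVATE_OR_MILITARY = {"private", "military"}
UNDERGRADUATE_OR_LOWER = {
    "less_than_high_school",
    "high_school",
    "associates_or_trade",
    "bachelors",
}


def social_capital(insurance: str, education: str) -> str:
    """Combine insurance type and caregiver education into a three-level variable."""
    # High: private or military insurance AND graduate-level education.
    if insurance in PRIVATE_OR_MILITARY and education == "graduate":
        return "high"
    # Low: medical assistance AND undergraduate or lower education.
    if insurance == "medical_assistance" and education in UNDERGRADUATE_OR_LOWER:
        return "low"
    # Every other combination is labeled moderate.
    return "moderate"
```

Under this encoding, a child with private insurance whose caregiver holds a bachelor's degree falls into the moderate category, because neither the high nor the low rule applies.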

ASRS

The ASRS (6–18 years Parent Report) has 71 items, is scored on a 5-point Likert scale, and yields two classification T-score scales (Total, DSM-5), three diagnostic T-score subscales (Social Communication, Unusual Behavior, Self-regulation), and eight treatment T-score subscales (Peer Socialization, Adult Socialization, Social/Emotional Reciprocity, Atypical Language, Stereotypy, Behavioral Rigidity, Sensory Sensitivity, Attention). All T-scores were age normed based on the standardization sample. Details about the ASRS were presented above, and further information can be found in the manual (Goldstein & Naglieri, 2009).

Child Healthcare and Educational Services

Information about the child’s current involvement in therapies and interventions was obtained by parent survey during the clinic’s intake process. Developmental therapies targeting ASD symptoms or developmental delays (early intervention, speech therapy, occupational therapy, physical therapy, social skills training) were grouped together in a single dichotomous variable. Similarly, a second dichotomous variable was created to describe psychological interventions (behavioral therapy, individual counseling, family therapy, and/or academic behavioral intervention plans). A third dichotomous variable describes if the child is prescribed any psychiatric medication, and a fourth dichotomous variable indicates if the child has an individual education plan (IEP).

Child Characteristics

Information about previous ASD diagnoses was also collected at intake, including educational classification of autism, which is distinguished from clinical diagnoses made outside of the school setting. Given the ASRS has a separate scoring procedure for children who cannot speak or speak infrequently (Goldstein & Naglieri, 2009), the information on child’s expressive language ability was obtained via parent report at the time of ASRS administration. A child was classified as nonverbal (“cannot speak or speaks infrequently”) when the parent indicated that the child did not use phrases or more complex sentences, or when the parent indicated that the child used alternate forms of communication (picture exchange communication system, speech generating device, and/or sign language).

Procedure

The ASRS was completed by parents of children aged 6–17 years at most six months before the diagnostic evaluation. The median and mean times between the ASRS administration and the diagnostic evaluation were 47 days and 64 days, respectively. The diagnostic evaluation was conducted by physicians (38%) or psychologists (62%), often coupled with speech-language pathologists, who specialize in neurodevelopmental and mental disorders, with emphasis on ASD. The diagnosing physician or psychologist developed a clinical diagnosis (ASD vs. No ASD) based on DSM-5 criteria, using parent-reported symptoms, medical and developmental history, ASRS results, test results of speech-language pathologists when available, and direct behavior observations of children. Due to the COVID-19 pandemic, assessments were conducted via telehealth or in-person with personal protective equipment, and the Autism Diagnostic Observation Schedule-2 (ADOS-2) could not be administered consistently and validly. Even though some portions of the ADOS-2 were used in the telehealth assessments and a modified ADOS-2 was conducted with personal protective equipment at the in-person visits, it is not valid to derive an algorithm score or ADOS-2 classification in these situations, and thus ADOS-2 results are not included in the current study.

Analysis

Descriptive statistics were used to characterize sociodemographics of the sample, whereas standard bivariate techniques (Pearson chi-squared, Mann-Whitney U test) compared differences across ASD and Non-ASD groups. Variables examined included race/ethnicity, sex, age at ASRS administration, telehealth vs. in-person ASD evaluation, social capital, previous ASD diagnosis, parent-reported verbal ability, and current participation in the four types of intervention described above. To investigate the ability of the ASRS to distinguish between ASD and Non-ASD groups, a variety of statistics were employed. These included t-tests and corresponding effect sizes. Cohen’s d was used as the effect size indicator and classified as small (0.2–0.5), medium (0.5–0.8) or large (> 0.8) (Cohen, 1988). ROC analyses were then carried out for all ASRS scales. Area under the ROC curve (AUC) and its significance (Mann-Whitney U test) were determined for each scale. AUC can be interpreted as no discrimination (< 0.5), poor (0.50–0.69), acceptable (0.70–0.79), excellent (0.80–0.89), or outstanding (≥ 0.90) (Hosmer et al., 2013). For diagnostic scales (Total T-score and DSM-5 T-score), sensitivity, specificity, positive and negative predictive value, and accuracy were calculated with various cutoff values.
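The quantities named above can be illustrated with a small, self-contained Python sketch. It is not the study's R code (which used ROCR and effsize); the function names and the toy scores are hypothetical. The sketch assumes a pooled-standard-deviation form of Cohen's d and treats a score at or above the cutoff as a positive screen. The AUC is computed through its Mann-Whitney interpretation: the share of ASD/non-ASD pairs in which the ASD case has the higher score, with ties counted as one half.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def auc(pos_scores, neg_scores):
    """AUC as U / (n_pos * n_neg): fraction of pairs where the positive case scores higher."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute one half
    return wins / (len(pos_scores) * len(neg_scores))


def cutoff_metrics(pos_scores, neg_scores, cutoff):
    """Validity statistics when a T-score >= cutoff counts as a positive screen (assumption)."""
    tp = sum(s >= cutoff for s in pos_scores)
    fn = len(pos_scores) - tp
    fp = sum(s >= cutoff for s in neg_scores)
    tn = len(neg_scores) - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical toy T-scores, not study data.
asd_scores = [72, 68, 65, 80, 59, 70]
non_asd_scores = [66, 61, 74, 58, 63, 69]

print(cohens_d(asd_scores, non_asd_scores))
print(auc(asd_scores, non_asd_scores))
print(cutoff_metrics(asd_scores, non_asd_scores, cutoff=65))
```

Repeating `cutoff_metrics` over a range of cutoffs yields the kind of sensitivity/specificity trade-off table described for the Total and DSM-5 T-scores.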

An analysis of the performance of the DSM-5 T-score in the ASD group was then performed to examine variables associated with agreement of the DSM-5 scale (at cutoff T = 65) with ASD diagnosis (false negatives vs. true positives). Sex, race/ethnicity, social capital, previous ASD diagnosis, and the four groups of therapies/interventions were used as indicators. Since verbal ability and age at ASRS administration are part of the ASRS scoring algorithm, they were not considered. First, bivariate associations between each variable and agreement were examined using chi-square tests. Variables that were significant at the 0.05 level were then used as independent variables in a logistic regression. All analyses were completed using R (v4.1.1) (R Core Team, 2021) and its standard data processing and statistical packages, including ROCR (v1.0-11) (Sing et al., 2005) and effsize (v0.8.1) (Torchiano & Torchiano, 2020).
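The bivariate screening step can be sketched for the simplest case, a dichotomous indicator cross-tabulated against agreement (true positive vs. false negative). The sketch below is a Python illustration rather than the study's R code; the function name and the counts are hypothetical, no continuity correction is applied, and indicators with more than two levels (such as race/ethnicity) would need the general r × c form of the test.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = indicator present/absent, columns = true positive/false negative.
chi2, p_value = chi_square_2x2(10, 20, 20, 10)
keep_for_regression = p_value < 0.05  # screening rule at the 0.05 level
print(chi2, p_value, keep_for_regression)
```

Only indicators that pass this screen would be carried forward as independent variables in the logistic regression.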

Results

Descriptive and Bivariate Analysis of ASD Status and Characterizing Variables

The sample consisted of 490 children with a completed ASRS, including 143 (29%) without ASD and 347 (71%) with ASD. Telehealth visits accounted for 55% of the diagnostic evaluations in the sample. As shown in Table 2, the sample was predominantly male (71%) with ages between 6 and 17 (mean 10.4, standard deviation 3.3). About half (52%) of the sample was White non-Hispanic (20% Black, 15% multiracial, 7% Asian, 4% Hispanic, 2% other/unknown) and about a third (29%) had low social capital as defined above (39% moderate, 32% high). A significantly (p = 0.004) higher proportion of males (75%) received a diagnosis of ASD, compared to females (61%). A clinical ASD diagnosis was also significantly associated with having a previous ASD diagnosis (p < 0.001), lower parent-reported verbal ability (p = 0.04), non-White race/ethnicity (p = 0.03), current receipt of developmental therapies/interventions (p < 0.001), and an individual education plan (p < 0.001). There was no significant association between ASD status and psychological/psychiatric treatment, social capital, age at time of ASRS administration, or telehealth/in person evaluation (all p > 0.05).

Table 2 Bivariate analysis of ASD and characterizing variables

Effect Size and ROC Analysis of ASRS Scales

While both the ASRS Total T-score and DSM-5 T-score were significantly elevated in the ASD group (both p < 0.05), the effect sizes were small (Cohen’s d 0.25 and 0.35, respectively) and AUCs demonstrated poor discrimination between groups (AUC 0.57 and 0.60, respectively). The ASRS diagnostic subscales had similar performance, except for the Self-regulation subscale, for which the group difference was not significant. Details can be found in Table 3. Information for the ASRS treatment scales can be found in Supplementary Table 1. Frequency distributions for ASD and non-ASD groups can be found in Fig. 1 for the ASRS Total T-score and DSM-5 T-score. Notably, ASRS scores across groups overlapped substantially, indicating poor discrimination. Sensitivity, specificity, positive and negative predictive values, and accuracy for different cut-off points are reported for the ASRS Total T-score in Table 4, and the ASRS DSM-5 T-score in Table 5. An optimal cutoff could not be chosen for either scale, as sensitivity and specificity could not be jointly optimized. ROC curves for the ASRS Total and DSM-5 T-scores can be found in Fig. 2. Specificity was the greatest concern, which can also be seen in the low negative predictive values. However, given this is a clinical sample of ASD with a 70% diagnostic rate, even sensitivity was not optimal. On the DSM-5 scale 7% of ASD cases, and on the Total scale 12% of ASD cases, would not meet even the “slightly elevated” cut-off values (T-score 60).

Table 3 ASD group differences in ASRS diagnostic scales and subscales
Fig. 1 Frequency distribution

Table 4 ROC analysis for ASRS total T-score

Analysis of DSM-5 T-Score Sensitivity Across Characterizing Variables

To understand the factors related to misclassification, demographic, previous ASD diagnosis, and treatment/intervention variables were compared across False Negatives (N = 111) and True Positives (N = 234) in Table 6. Sensitivity differed significantly across categories of race/ethnicity and social capital (p < 0.05). There were no significant associations between sensitivity and previous ASD diagnosis or current treatments/interventions. The logistic regression model revealed that high social capital (compared to moderate social capital) was associated with a decreased likelihood of a true positive, and multiracial race/ethnicity (compared to White non-Hispanic) was associated with an increased likelihood of a true positive. See Table 7 for model coefficients.
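As a quick arithmetic check, the overall sensitivity of the DSM-5 T-score at cutoff 65 follows directly from the two counts reported above. The short Python sketch below only restates those counts; the variable names are illustrative.

```python
# Counts reported in the text for the ASD group at DSM-5 T-score cutoff 65.
true_positives = 234
false_negatives = 111

# Sensitivity is the share of diagnosed ASD cases that screened positive.
sensitivity = true_positives / (true_positives + false_negatives)
false_negative_rate = 1 - sensitivity

print(f"sensitivity = {sensitivity:.3f}")                  # 0.678
print(f"false negative rate = {false_negative_rate:.3f}")  # 0.322
```

In other words, roughly one in three children who received a clinical ASD diagnosis scored below the cutoff on this scale.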

Table 5 ROC analysis for ASRS DSM-5 T-score
Fig. 2 ROC curves

Table 6 DSM-5 T-score subgroup sensitivity at cutoff 65
Table 7 Multivariate analysis of DSM-5 T-score sensitivity across subgroups

Discussion

The current study found that the ASRS did not perform well as a valid measure for detecting ASD in children referred to a specialty ASD clinic. This replicates prior findings while employing a larger, more diverse sample. Specifically, we found that both the ASRS Total T-score and DSM-5 T-score demonstrated poor AUC at or below 0.6, and there were no optimal cut-off scores that could discriminate between the ASD and non-ASD groups. Since this is a high-risk sample of ASD (70% prevalence), any cut-off that raised sensitivity to an acceptable level (90% or above) led to a very high rate of false positives (80% or more). The current finding is most consistent with that of Camodeca (2019), who showed that maximizing sensitivity led to poor specificity combined with overall low accuracy and AUC when comparing an ASD group vs. a non-ASD clinical group.
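The link between sensitivity, specificity, prevalence, and predictive values can be made concrete with a short Python sketch. The inputs are illustrative round numbers taken from the thresholds mentioned above (90% sensitivity, 70% prevalence) and from reading an 80% false positive rate as 20% specificity, which is an assumption; they are not values from the study's tables. The function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    tp = sensitivity * prevalence              # true positives per unit of sample
    fn = (1 - sensitivity) * prevalence        # false negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Illustrative inputs only: 90% sensitivity, 20% specificity, 70% prevalence.
ppv, npv = predictive_values(0.90, 0.20, 0.70)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With these inputs the positive predictive value is barely above the 70% base rate, while the negative predictive value falls below 50%, so under such conditions a negative screen would be wrong more often than right.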

While prior ASRS studies comparing an ASD group vs. a community group showed good clinical utility for identifying ASD in the general population (Goldstein & Naglieri, 2009; Zhou et al., 2017), the current study (after the pandemic) and Camodeca’s study (before the pandemic) indicate that the ASRS has limited clinical utility for differentiating ASD from other clinical disorders. In parallel, false positives and poor AUC in other ASD screening measures applied to clinical samples have also been documented in the literature. For example, the specificity of the Social Responsiveness Scale-2 (SRS-2) was as low as 8% and the AUC of the Social Communication Questionnaire (SCQ) was as low as 0.67 (Aldridge et al., 2012; Capriola-Hall et al., 2021; Chesnut et al., 2017; Cholemkery et al., 2014).

While ASD screening measures aim to identify children at risk of ASD in the population, diagnostic measures (e.g. ADI-R, ADOS-2) aim to assist in confirming ASD and differentiating it from other mental and developmental disorders. The ADOS-2 is often regarded as a gold standard in ASD diagnostic evaluation (Falkmer et al., 2013; Harstad et al., 2015). Recent ADOS-2 replication studies with the largest sample size (N = 3556) in the literature indicated that the ADOS-2, when both the lower and higher thresholds were used with clinical interpretation, yielded high sensitivity (95%+) and favorable specificity (63–73%) (Hong et al., 2021, 2022). Another replication study with a large sample size (N = 2158) showed high sensitivity (85.4–100%) and specificity (80.4–96.8%) (Kim et al., 2022). Even though false positives occur when using the ADOS-2, it provides much clearer diagnostic direction than screening measures. Using screening measures in the primary care setting and diagnostic measures at ASD specialty clinics, as a tiered system of care delivery, would yield a more efficient and accurate diagnostic process.

In our study, though ASRS Total T-score and DSM-5 T-score were statistically different between ASD group vs. non-ASD clinical group, the effect size was small. This finding indicates that similar levels of ASD symptoms were reported in both the ASD and non-ASD clinical groups in our study. ASD traits in mental and developmental disorders have been documented in the literature and it is well known that diverse mental and developmental disorders are associated with increased ASD traits, including intellectual disability, attention deficit hyperactivity disorder, bipolar disorder, depressive disorder, personality disorder, anxiety disorder, obsessive compulsive disorder, eating disorder, etc. (Abu-Akel et al., 2017; Domes et al., 2016; Dudas et al., 2017; Freeth et al., 2013; Griffiths et al., 2017; Grzadzinski et al., 2011; Hoekstra et al., 2009; Postorino et al., 2017). Considering overlapping features in ASD and other mental/developmental disorders, clinicians need to combine all clinical information to determine an ASD diagnosis, including screening measures, observational measures, parental interview, and medical and developmental history.

Using a cut-off value of 65, the current study also found a difference in true positives vs. false negatives in the ASD group based on race/ethnicity and social capital. Being multiracial was associated with a lower false negative rate, and having high social capital was associated with a higher false negative rate. Given limited knowledge on social factors influencing ASRS measurement bias, there are multiple potential explanations. First, it is possible that there was a referral/selection bias for diagnostic evaluations. Multiracial children may be less likely to be referred unless they display high symptoms (causing inclusion of more severe cases) and children with high social capital may be more likely to be referred (causing inclusion of less severe cases), which may have contributed to this finding. Second, it is possible that service utilization differed for multiracial children (less service utilization) and children with high social capital (more service utilization), which led to differing severity of ASD symptoms. This topic requires more research, ideally with a longitudinal design that combines parental questionnaires and observational measures.

The current study should be interpreted in light of its strengths and weaknesses. For strengths, the study sample was fairly large and diverse in race/ethnicity (48% non-White) and social capital (29% with low social capital), representing the real world and improving generalizability of the findings. The sample also included confirmed clinical diagnoses. For limitations, use of the clinical registry could introduce selection/referral bias. Physicians and psychologists who determined diagnosis were not blind to ASRS results and it is possible that they were biased by ASRS results in the diagnostic determination process. However, this would likely lead to more alignment between the clinician and the ASRS, so our results likely overestimate agreement. We also could not implement the ADOS-2 systematically and validly for ASD diagnosis due to the pandemic. Our practice of not using ADOS-2 scores due to required variations in administration (e.g., clinician wearing a mask; administration virtually) reflects current clinical practice at ASD-specialty centers during the pandemic. Other important developmental information (e.g. intellectual quotient, language level, adaptive functioning), which would have better characterized the sample and may have influenced validity statistics with stratification, was not available. In the literature, it has been well noted that intellectual disability and/or language disorder diagnoses are associated with increased autistic traits and false positives in ASD evaluations (Andreou et al., 2022; Hoekstra et al., 2009; Leyfer et al., 2008; Li et al., 2018). Though the ASRS has two scoring algorithms for fluently speaking children vs. nonverbal children, language level information would have brought more insight on how the ASRS performs. The time interval between the ASRS administration and diagnostic determination in this study was set at up to 6 months due to the waitlist at our center. While it would be ideal to shorten the time interval to minimize potential behavioral changes that could reduce agreement between the ASRS and diagnostic determination, this would have significantly reduced our sample size, power, and generalizability. Despite these limitations, the current study adds important knowledge on the ASRS and ASD screening measures for clinicians who evaluate ASD concerns.