Autism spectrum disorder (ASD) is characterized by impairments in social relatedness, communication and restricted patterns of behavior and interests (APA 2000). The variability of symptom expression in ASD is striking, a fact which has contributed to the continuous revisions and broadening of definitions and diagnostic criteria since Kanner’s (1943) original descriptions. Symptoms vary depending on level of cognitive functioning, verbal ability and age, amongst other things (Volkmar et al. 1997). The current consensus is that these differences are due to variations in severity rather than distinct subtypes (Gilchrist et al. 2001; Howlin 2003). This is reflected in the proposed revisions for DSM-V, where the diagnosis Asperger’s disorder is subsumed under the diagnosis Autism Spectrum Disorder.

In Britain, the prevalence of undiagnosed ASD in adults has been estimated at 1% (Brugha et al. 2009). This indicates that undiagnosed ASD is a major public health issue in England, and most likely in other countries as well. Since most cases of ASD are diagnosed in childhood, much effort has been put into developing strategies for early detection. Nevertheless, many individuals with ASD and milder impairments go through childhood and adolescence without receiving a diagnosis. In a large British study, Barnard et al. (2001) found that 29% of individuals with high functioning autism and 46% of those with Asperger’s disorder had not received this diagnosis until late adolescence or adulthood, which is in line with experiences in Sweden (Rydén and Bejerot 2008). One reason for this is probably the relatively recent inclusion of Asperger’s disorder in the diagnostic manuals DSM-IV and ICD-10. Other reasons may be that high intelligence and verbal ability can compensate for, and camouflage, other impairments, or that existing difficulties are mistaken for expressions of other psychiatric or psychosocial problems (Gillberg 2002). Also, other psychiatric disorders and symptoms often coexist with ASD (Bejerot and Wetterberg 2008). Impairments may become more pronounced in the transition to adulthood, when demands on self-reliance and the ability to structure one’s life increase, while social skills become even more important for academic as well as occupational achievement (Hendricks and Wehman 2009; Tantam 1991). It is therefore important to be able to diagnose ASD in adults.

The symptoms and impairments of ASD manifest themselves differently with increasing age. Several cross-sectional and longitudinal studies suggest a trend towards general symptom abatement in adolescence and adulthood (Billstedt et al. 2007; Esbensen et al. 2009; Fecteau et al. 2003; Seltzer et al. 2003; Seltzer et al. 2004; Shattuck et al. 2007). There is some evidence of a risk of excluding older, high functioning individuals when standard diagnostic instruments and algorithms are utilized (Boelte and Poustka 2000; Fecteau et al. 2003; Lord et al. 1994).

There are several specific difficulties in diagnosing autism spectrum disorders in adults. Significant others who can provide information about childhood symptoms, which is required for both the ADI-R (Lord et al. 1994) and the DISCO (Leekam et al. 2002), may be absent. Also, many subjects may not want to involve their parents in the diagnostic procedure. Presently there are only two validated, self-administered scales that purport to measure autistic symptoms in adults. One is the Autism Spectrum Quotient (AQ, Baron-Cohen et al. 2001), a research and screening instrument described in detail below. The other is the Ritvo Autism and Asperger Diagnostic Scale (RAADS, Ritvo et al. 2008).

The RAADS was developed (Ritvo et al. 2008) to accommodate the need for diagnostic tools specifically tailored for adults with ASD. The objective of the present study is to evaluate the Swedish version of the RAADS-R (a modified version of the RAADS) with respect to internal consistency, test re-test reliability, diagnostic accuracy and concurrent validity.

Methods

Participants

The total sample comprised 272 adult subjects aged 19–75. Two groups of participants were recruited: 75 with ASD (the ASD group) and 197 without ASD (comparison cases). See Table 1 for their sex ratio and age distribution. Subjects with ASD were recruited among patients diagnosed at the Neuropsychiatric unit, Northern Stockholm Psychiatry (n = 17) or at a specialized unit in Lund (n = 6). In addition, 52 subjects with ASD who were participants in various research projects at Northern Stockholm Psychiatry were included. All subjects with ASD were examined by an experienced clinician and their diagnosis was confirmed by either the Autism Diagnostic Observation Schedule-Generic (ADOS-G, Lord et al. 2000) (in Stockholm) or the Diagnostic Interview for Social and Communication Disorders (DISCO) (in Lund). Administration of these instruments requires extensive training. Seventy-three subjects were diagnosed with Asperger’s disorder, and 2 with PDD-NOS (atypical autism). The standardized assessment of ASD in the Neuropsychiatric units in Stockholm and Lund includes intelligence testing with WAIS. All included subjects had an IQ above 70; in other words, no one fulfilled the diagnostic criteria for intellectual disability. All subjects had received their initial ASD diagnosis in adolescence or adulthood.

Table 1 Sex ratio and age by group

The comparison cases consisted of 61 doctors and medical students, 69 university students from three campuses in Sweden, and 60 subjects who comprised comparison cases in the research studies mentioned above. In addition, 7 psychiatric patients who were assessed for ASD but did not meet criteria were included. Of these subjects, 6 met criteria for other psychiatric disorders (schizotypal personality disorder, ADHD, social anxiety disorder, depression, bipolar disorder, and delusional disorder). Among the comparison cases another 6 subjects reported that they had a psychiatric diagnosis (depression, social anxiety disorder, generalized anxiety disorder, personality disorder NOS, and obsessive compulsive disorder). The study was approved by the Regional Ethics Committee in Stockholm, and informed consent was obtained from all participants.

Measures

Two measures were used in the study: RAADS-R (Ritvo et al. 2010) and the AQ (Baron-Cohen et al. 2001). The RAADS-R is a revised version of the Ritvo Autism and Asperger Diagnostic Scale (RAADS), a self rating scale developed by Ritvo et al. (2008) to serve as an aid in the diagnosis of ASD in adults of normal intelligence. Items were formulated from DSM-IV and ICD-10 criteria for autism, which were operationalized to match the symptom expression in adults based on the authors’ clinical experience. The original RAADS encompassed 78 items which were divided into three subscales to assess functioning in the domains of social interaction, language/communication and sensory motor/stereotypies (Ritvo et al. 2008). Following the 2008 pilot study, some alterations were made to the scale. Revisions included elimination of three items to improve internal consistency, adding more items on circumscribed interests, and some modifications to the subscales, like splitting up the sensory motor/stereotypies subscale into two separate scales.

Presently, the RAADS-R encompasses 80 items which are divided into four domains to assess functioning in: (a) social interaction, (b) language, (c) circumscribed interests and (d) sensory motor symptoms. Each item is formulated as a statement from the patient’s point of view (e.g. “I often don’t know how to act in social situations”). Seventeen items are reversed in order to avoid response bias and to elicit information about skills or preferences acquired throughout the life span (e.g. “I like to have close friends”) (Ritvo et al. 2008). See “Appendix” for the full content of the various subscales. The statements are answered on a four-point Likert scale with the qualitative alternatives “never true”, “true only when I was young (before the age of 16)”, “true only now” and “true now and when I was young”. The 63 “positively worded” statements are scored from 0 to 3, so that the longer a symptom has been present the more points it yields, and the 17 reversed statements (marked with an * in the “Appendix”) are scored in the reverse order. Higher scores are indicative of ASD in all subscales. The original RAADS pilot study (Ritvo et al. 2008) yielded promising results. In a sample comprising 37 subjects with autistic disorder or Asperger’s disorder, 41 subjects with no psychiatric condition and 16 subjects with various psychiatric disorders outside the autism spectrum, RAADS demonstrated perfect sensitivity and specificity, as all subjects with an ASD obtained scores of 77 or higher whereas all subjects without an ASD scored at or below 64. Internal consistencies for the three subscales ranged from poor (α = 0.60) to good (α = 0.84). In a recent multi-center study the RAADS-R also demonstrated excellent diagnostic accuracy as well as improved internal consistency (Ritvo et al. 2010).
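The scoring rule described above can be illustrated with a short sketch. It assumes that the four response alternatives map to 0, 1, 2 and 3 points in the order in which they are listed in the text, and it uses made-up item numbers; the function names are hypothetical, and the published scoring key and the items marked with an * in the “Appendix” should be consulted before any real use.

```python
# Sketch of the RAADS-R item scoring rule described in the text.
# ASSUMPTION: the response alternatives map to 0..3 in the order listed;
# check the published scoring key before relying on this mapping.
RESPONSE_POINTS = {
    "never true": 0,
    "true only when I was young": 1,
    "true only now": 2,
    "true now and when I was young": 3,
}


def score_item(response, reversed_item=False):
    """Return the points (0-3) for one item; reversed items are flipped."""
    points = RESPONSE_POINTS[response]
    return 3 - points if reversed_item else points


def total_score(responses, reversed_items):
    """Sum the item scores of one respondent.

    responses: dict mapping item number -> response text.
    reversed_items: set of reverse-scored item numbers (hypothetical here;
    the real set is the one marked with * in the Appendix).
    """
    return sum(
        score_item(answer, item in reversed_items)
        for item, answer in responses.items()
    )


# Toy example with three made-up item numbers; item 3 is treated as reversed.
example = {
    1: "true now and when I was young",  # 3 points
    2: "never true",                     # 0 points
    3: "never true",                     # reversed: 3 - 0 = 3 points
}
print(total_score(example, reversed_items={3}))  # prints 6
```

With 80 items scored 0 to 3, the total score under this rule can range from 0 to 240.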

The RAADS-R was translated into Swedish by Susanne Bejerot, M.D., with assistance of Dr Lena Nylander. It was back translated by a bilingual translator, after which it was compared to the original and no modifications were deemed necessary.

The AQ (Baron-Cohen et al. 2001) was designed as a brief, self-administered questionnaire purporting to measure the degree to which any adult with normal intelligence has “autistic traits”. The rationale underlying the scale is the assumption that autism lies at the upper end of a spectrum of traits which are normally distributed in the population (Baron-Cohen et al. 2001). The AQ comprises 50 items, divided into five domains: (a) social skill, (b) communication, (c) attention switching, (d) attention to detail, and (e) imagination. The questions are answered on a 4-point Likert scale. For half the questions, “definitely disagree” and “slightly disagree” are scored as 0 and “slightly agree” and “definitely agree” are scored as 1; the remaining questions are reversely worded and scored.
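The binary AQ scoring described above can be sketched in the same way. The function name and the flag marking reversely worded questions are hypothetical; which questions are reversely worded is defined by the published AQ key and is not reproduced here.

```python
# Sketch of the binary AQ item scoring described in the text.
AGREE = {"slightly agree", "definitely agree"}
DISAGREE = {"slightly disagree", "definitely disagree"}


def score_aq_item(response, reverse_worded=False):
    """Return 0 or 1 for one AQ item.

    For regularly worded items any "agree" response scores 1.
    For reversely worded items any "disagree" response scores 1.
    """
    if response not in AGREE | DISAGREE:
        raise ValueError(f"unknown response: {response!r}")
    agrees = response in AGREE
    # The item scores 1 when agreement and wording direction differ.
    return int(agrees != reverse_worded)


print(score_aq_item("slightly agree"))                        # prints 1
print(score_aq_item("definitely disagree"))                   # prints 0
print(score_aq_item("slightly disagree", reverse_worded=True))  # prints 1
```

Because the strength of agreement is discarded, each of the 50 items contributes at most one point to the AQ total.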

The AQ has been evaluated both as a research instrument and as a screening instrument. In a pilot study, Baron-Cohen et al. (2001) found that AQ scores produced the hypothesized group differences between subjects with and without ASD, between students of science versus humanities, and between men and women in the general population. The AQ has also been found to have screening properties (Hoekstra et al. 2008; Woodbury-Smith et al. 2005); however, in one study it did not differentiate between patients with mild ASD and patients with other psychiatric conditions (Ketelaars et al. 2007). The internal consistencies of the subscales have ranged from poor to fair in different studies (Austin 2005; Baron-Cohen et al. 2001; Hoekstra et al. 2008; Hurst et al. 2007). Several studies examining the factor structure of the AQ have found that a two- or three-factor solution fitted the data better than the five originally proposed domains (Austin 2005; Hoekstra et al. 2008; Hurst et al. 2007). Swedish participants were administered a translated version of the AQ which has not been validated.

Procedure

All participants completed RAADS-R. A subset of 39 ASD patients and 49 comparison cases completed AQ as well. If the subject did not understand a question, an investigator was available to offer clarification. All personal data were coded and all data analyses were made in SPSS, version 17. A response rate of at least 80% of the questions was required for inclusion in the study. This led to the exclusion of two subjects, both female ASD patients. For the remaining data, isolated missing scores were replaced with the individual mean.
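The inclusion rule and the handling of missing scores can be sketched as follows. This is an illustration only: it assumes that “the individual mean” is the mean of the respondent's own answered items across the whole scale, which the text does not specify, and the function name is hypothetical.

```python
# Sketch of the 80% response-rate rule and individual-mean replacement.
# ASSUMPTION: the individual mean is taken over all items the respondent
# answered; the text does not say whether it was computed per subscale.
def prepare_responses(item_scores, min_response_rate=0.80):
    """Return the scores with gaps filled, or None if the subject is excluded.

    item_scores: list of item scores, with None for unanswered items.
    """
    answered = [s for s in item_scores if s is not None]
    if not answered or len(answered) / len(item_scores) < min_response_rate:
        return None  # too few answers: excluded from the study
    person_mean = sum(answered) / len(answered)
    return [person_mean if s is None else s for s in item_scores]


# Toy examples with five items instead of 80.
print(prepare_responses([3, None, 1, 2, 2]))     # prints [3, 2.0, 1, 2, 2]
print(prepare_responses([3, None, None, 2, 2]))  # prints None
```

In the first toy example four of five items (80%) are answered, so the subject is kept and the gap is filled with the mean of the four answered items.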

Results

Internal Consistency and Test–Retest Reliability

The internal consistency was assessed separately in the ASD group and in the comparison cases. Cronbach’s coefficient alpha for the total scale was estimated at 0.92 in the ASD group and at 0.94 in the comparison cases. Internal consistencies for the four subscales (ASD group/comparison cases) were: social interaction α = 0.87/0.89, language α = 0.58/0.22, circumscribed interests α = 0.73/0.73, and sensory motor α = 0.81/0.77. Item 2 in the language subscale (“I often use words and phrases from movies and television in conversations”) had a negative corrected item-total correlation, and removing it increased alpha for this subscale to 0.70/0.40 (ASD group/comparison cases).
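For readers unfamiliar with the statistic, the sketch below computes Cronbach's alpha from a respondents-by-items score matrix using the usual variance formula, together with the “alpha if item deleted” value that underlies the observation about item 2. The analyses in the study were run in SPSS; this code, its function names and its toy data are illustrative only.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows, index):
    """Alpha recomputed after dropping the item at position `index`."""
    reduced = [[v for j, v in enumerate(row) if j != index] for row in rows]
    return cronbach_alpha(reduced)


# Made-up scores: four respondents, three items scored 0-3.
toy = [
    [0, 1, 1],
    [1, 2, 2],
    [2, 2, 3],
    [3, 3, 3],
]
print(round(cronbach_alpha(toy), 2))            # prints 0.95
print(round(alpha_if_item_deleted(toy, 0), 2))  # alpha without the first item
```

An item with a negative corrected item-total correlation lowers the variance of the total relative to the summed item variances, which is why removing such an item raises alpha.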

Test–retest reliability was assessed in a subset of 12 subjects with ASD who had completed RAADS-R on two separate occasions at an interval of 3–6 months. The total scores on the two occasions were strongly and positively correlated (r = 0.80, p = 0.002). Strong and significant correlations were also obtained for three of the subscale scores: social interaction (r = 0.76, p = 0.004), circumscribed interests (r = 0.73, p = 0.002) and sensory motor (r = 0.84, p = 0.001). For the language subscale the correlation was not statistically significant (r = 0.43, p = 0.161).

Correlation with the Autism-Spectrum Quotient

The degree of agreement between RAADS-R and AQ was assessed by comparing 35 subjects with ASD and 49 comparison cases. Correlation analyses between RAADS-R and AQ total and subscale scores were performed separately in the comparison cases and in the ASD subjects. In the ASD group there was a strong, positive correlation between RAADS-R and AQ (see Table 2).

Table 2 Correlations (Pearson’s r) between RAADS-R and AQ total and domain scores in 35 subjects with ASD

In the comparison cases, Spearman’s rank order coefficient was computed as the variables were not normally distributed. The correlation between AQ and RAADS-R total scores was strong (ρ = 0.70, p < 0.0001). The RAADS-R subscales social interaction, circumscribed interests, and sensory motor all had moderate to strong correlations with the AQ total score (ρ = 0.51–0.72, all p < 0.0001). The language subscale, however, was not significantly correlated with the AQ total or any of the AQ subscale scores (ρ = 0.06–0.17, all p > 0.05).

Distribution of Scores

The distribution of scores is shown in Fig. 1a–e. As evident from the histograms, the scores of the comparison cases had strong positive skewness (2.17) and were markedly peaked (kurtosis = 6.78). The scores of the ASD group did not depart significantly from normality (skewness = 0.02, kurtosis = −0.33). The median, minimum, and maximum scores of the two groups are shown in Table 3.

Fig. 1 Total RAADS-R and domain scores in the comparison cases group and ASD group. a Total RAADS-R. b Subscale: Social interaction. c Subscale: Language. d Subscale: Circumscribed interests. e Subscale: Sensory motor

Table 3 Median, minimum and maximum RAADS-R total and domain scores in the ASD group (N = 75) and the Comparison cases (N = 197)

Group and Sex Differences

To explore group differences ANOVAs were performed comparing RAADS-R total and subscale scores by Diagnosis (ASD versus comparison cases) and Sex. Five subjects were excluded from the analysis due to missing information on sex. Mean total and subscale RAADS-R scores for the ASD and comparison cases are shown in Table 4, together with the results of the ANOVAs. As indicated, main effects of Diagnosis were found across all tests, the ASD subjects scoring higher than the comparison cases on the full scale as well as all four subscales. There was no main effect of Sex on the Total score, but there was a significant two-way interaction between Diagnosis and Sex. In the comparison cases the males obtained higher total scores than females, whereas in the ASD group females obtained higher total scores than males. T-tests revealed that these sex differences were not significant when assessed either in the comparison cases (t = 1.193, df = 137, p = 0.166) or in the ASD subjects (t = −1.847, df = 69, p = 0.069).

Table 4 Mean RAADS-R total and domain scores and standard deviations for males and females in the comparison cases versus ASD group

On the social interaction and circumscribed interests subscales there were no significant main effects of Sex, nor any Diagnosis × Sex interaction. On the language subscale there was no main effect of Sex, but there was a significant Diagnosis × Sex interaction: comparison case males scored higher than comparison case females (t = 2.370, df = 194, p = 0.019), whereas females in the ASD group scored somewhat higher than males, though this difference did not reach significance (t = −1.801, df = 69, p = 0.076). Lastly, on the sensory motor subscale women generally obtained higher scores than men. There was also a significant Diagnosis × Sex interaction. T-tests showed that women in the ASD group, but not in the comparison cases, obtained significantly higher scores than men (t = −3.769, df = 69, p < 0.0001).

Eight comparison cases and 3 subjects in the ASD group had scores that deviated markedly from the mean (at least 2 standard deviations). If these outliers were excluded from the analysis, the patterns of which Diagnosis and Sex differences were significant did not change.

Ability to Differentiate Between ASD and Comparison Subjects

In order to further examine the ability of RAADS-R to distinguish between the two groups, a ROC-graph was generated. The area under the ROC curve (AUC) was estimated at 0.96 (Std. err. 0.012, 95% CI 0.94–0.98), indicating high overall accuracy. This means that the probability of a randomly selected subject with ASD scoring higher than a randomly selected subject with no ASD was approximately 96% in this sample. Table 5 shows the sensitivity and specificity of RAADS-R total score at various cut-offs between 50 and 100. If sensitivity and specificity are given equal priority, a cut-off of 72 achieved the best compromise, with sensitivity 0.907 and specificity 0.929.
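The two quantities reported above can be illustrated with a small sketch. The AUC is computed directly from its probabilistic interpretation (the proportion of ASD/comparison pairs in which the ASD subject scores higher, with ties counted as one half), and sensitivity and specificity are computed at a given cut-off. The scores are made up, the function names are hypothetical, and the convention that scores at or above the cut-off count as positive is an assumption not stated in the text.

```python
# Sketch of AUC (pairwise interpretation) and sensitivity/specificity.
def auc(asd_scores, comparison_scores):
    """Proportion of (ASD, comparison) pairs where the ASD score is higher."""
    wins = 0.0
    for a in asd_scores:
        for c in comparison_scores:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5  # ties count as half a win
    return wins / (len(asd_scores) * len(comparison_scores))


def sensitivity_specificity(asd_scores, comparison_scores, cutoff):
    """ASSUMPTION: a total score at or above the cut-off is screen positive."""
    sensitivity = sum(s >= cutoff for s in asd_scores) / len(asd_scores)
    specificity = sum(s < cutoff for s in comparison_scores) / len(comparison_scores)
    return sensitivity, specificity


# Made-up total scores, not data from the study.
asd = [60, 90, 120, 150]
comparison = [10, 30, 50, 80]
print(auc(asd, comparison))                          # prints 0.9375
print(sensitivity_specificity(asd, comparison, 72))  # prints (0.75, 0.75)
```

Under this reading, the reported sensitivity of 0.907 and specificity of 0.929 would be consistent with 68 of the 75 ASD subjects scoring at or above the cut-off and 183 of the 197 comparison cases scoring below it.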

Table 5 Sensitivity and specificity of RAADS-R at various cut-off scores (N = 272)

Discussion

The present study evaluates the psychometric properties of the Swedish version of the RAADS-R. The results indicate that RAADS-R is reliable, has good diagnostic validity and thus can be a useful aid in the diagnostic assessment of ASD in adults.

Internal Consistency and Test Re-test Reliability

The internal consistency was fair or good for three of the subscales: social interaction, circumscribed interests and sensory motor. The language subscale, however, demonstrated poor internal consistency as measured with Cronbach’s alpha. Part of the explanation is probably that this subscale consists of only 7 items, as Cronbach’s alpha is a function of both the intercorrelation among items and scale length (Nunnally 1978), but it could also be a result of cultural differences between the English-speaking world and Sweden. For example, one item (“I often use words and phrases from movies and television in conversations”) was negatively correlated with the scale, implying that this item does not work the way it was intended, and that modification or removal of this item should be considered in the Swedish version. Preliminary estimates of the test–retest reliability of the total score and three of the domain scores were very promising, again with the exception of the language subscale. However, these results should be interpreted with caution as the sample size for the test–retest analysis was very small.
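The dependence of alpha on scale length can be made concrete with the standardized form of the coefficient, which expresses alpha in terms of the number of items k and the mean inter-item correlation. The sketch below uses this standardized formula, which assumes equal item variances, rather than the raw-score alpha reported in the Results; the mean correlation of 0.2 and the 20-item comparison scale are arbitrary illustrative values, not estimates from the study.

```python
# Sketch: standardized alpha as a function of scale length.
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha: k * r / (1 + (k - 1) * r)."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Same hypothetical mean inter-item correlation, different scale lengths.
for k in (7, 20):
    print(k, round(standardized_alpha(k, 0.2), 2))
# prints:
# 7 0.64
# 20 0.83
```

With the same average inter-item correlation, a 7-item scale therefore yields a noticeably lower alpha than a longer scale, which is the scale-length effect referred to above.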

Diagnostic Validity

As expected, the ASD group obtained significantly higher scores than the comparison cases group on the total RAADS-R score as well as all four domain scores, indicating that the Swedish RAADS-R captures symptoms, characteristics and experiences that are relevant to the differentiation of patients with ASD from neurotypical subjects. Ritvo et al. (2010) suggest a cut-off of 66 for differentiating between patients with and without autism spectrum disorders, based on a study including nine centers in four English-speaking countries. In the present study specificity could be somewhat increased, while maintaining the same level of sensitivity, if the cut-off was set somewhat higher, at 72. Although lower than in the Ritvo et al. (2008) study, the levels of sensitivity and specificity obtained in the present study must be considered good for a self-rating instrument. Self-rating thus seems to be a viable method of assessing impairments in adults of normal intelligence with ASD. This is supported by previous studies which have also found that these individuals generally have insight into, and are able to reliably report on, their own difficulties and way of functioning (Baron-Cohen et al. 2001; Cederlund 2007; Ritvo et al. 2008; Woodbury-Smith et al. 2005).

However, the overlapping by nine percent in each group serves as a reminder that self ratings are not exact or perfect. Moreover, some individuals with severe forms of ASD tend to lack sufficient insight, and for this reason give normal responses. This underscores the need for complementary basic instruments for systematic observations, such as the High functioning Autism Asperger Scale (HAGS) (Bejerot et al. 2001). It is also recommended that a clinician be present during the completion of RAADS-R in order to clarify any confusion and to assess the reliability of the patients’ responses (Ritvo et al. 2008).

As previously noted, 8 comparison cases (4 males and 4 females) obtained very high RAADS-R scores. Four of these had undergone neuropsychiatric assessments and 3 were diagnosed with social anxiety disorder, ADHD, bipolar disorder, and schizotypal personality disorder. Another individual demonstrated many characteristics of ASD, and was assessed as having fulfilled diagnostic criteria in childhood; at present, however, he had no significant impairments. The fact that 3 out of the 12 subjects with a psychiatric disorder other than ASD obtained very high scores could be considered a problematic result. This underscores the importance of examining whether RAADS-R can differentiate between ASD and other psychiatric conditions, which may have overlapping symptoms. In addition, self-rating instruments alone are not sufficient in ambiguous cases; here clinical interviews are crucial to obtaining an accurate diagnosis.

Concurrent Validity

The overall strong and positive correlations between RAADS-R and AQ support the concurrent validity of the two instruments, although correlations provide only a rough estimate of the similarities and differences between them. The correlations between the domain scores are difficult to interpret, as no factor analysis was performed on either instrument in the present study and previous factor analytic studies have suggested that the internal structure of the AQ does not fit the suggested domains (Austin 2005; Hoekstra et al. 2008; Hurst et al. 2007). The language and sensory motor domains within the RAADS-R had relatively modest correlations with the total AQ score compared to the social interaction and circumscribed interests domain scores. For the language subscale this might be due to poor internal consistency. The sensory motor subscale, however, showed good internal consistency and produced large group differences, implying that this subscale measures something unique that is not included in the AQ conceptualization of autism. This makes sense given the content of the two scales, as the AQ does not include questions on abnormal responses to sensory stimuli but stresses cognitive factors.

Sex Differences

A trend for females with ASD to score slightly higher than males with ASD is indicated, and this pattern was either the reverse or not replicated in the comparison cases. The higher scores in females with ASD could have several explanations: females may have greater insight into their symptoms than males; they may exaggerate their symptoms more; they may in fact have more symptoms than the males; female ASD might be more difficult to detect, thus the ones that are detected may have more extreme symptoms; or it could simply be a Type I Error. The sex difference in the comparison cases supports the male brain theory for autism, i.e. that males in general have more “autistic traits” than females in the normal population. Females with ASD scored higher than males on the sensory motor subscale. This may point towards a true sex difference in the symptomatology. Perhaps, in the future, these traits could serve as markers in genetic studies. In the comparison cases men scored slightly higher than women on the language subscale, but due to various problems with this subscale, one should be cautious with interpretations at this stage.

Limitations and Suggestions for Further Research

Some methodological limitations should be noted. First, the participants of the two groups compared in the study were not matched with respect to gender, age or intelligence. The age distributions of the two groups were roughly similar and all participants were in the range of intelligence above intellectual disability (i.e. IQ > 70). However, the ratio of females to males was greater in the comparison cases than in the ASD group. The fact that women were in the majority in the comparison cases might have led to a slight overestimation of the specificity of the RAADS-R, as comparison case females obtained slightly lower mean scores than comparison case males, although this difference was not statistically significant. Furthermore, not all of the subjects in the comparison case group were seen in person by the investigators or screened for psychiatric disorders. This is true for 29 of the students as well as for the 61 doctors and medical students. A few of these subjects did obtain remarkably high scores on the RAADS-R. As it was completed anonymously, the investigators were not able to examine those who were high scorers.

A third limitation is that the scores in the comparison cases were not normally distributed (which is to be expected with this type of instrument), so the use of parametric statistics is somewhat questionable. ANOVAs have proven to be rather robust against a deviation from normality, as long as it is not caused by outliers (Tabachnick and Fidell 2007). In an attempt to compensate for this, the analyses were also performed with outliers removed, and results were identical with the exception of larger effect sizes overall. However, it is possible that the non-normality of the distribution may have affected the results in some way; the sex effects are probably the most vulnerable, as the differences in means between men and women are much smaller than the differences between the ASD group and the comparison cases.

Finally, future studies are needed to assess the sensitivity and specificity of the RAADS-R for subjects with other specific DSM diagnoses such as OCD, Social Anxiety Disorder, severe personality disorder and schizophrenia. It should be noted that different cut-off limits may be optimal with other comparison populations.

Conclusion

The results of the present study indicate that the Swedish RAADS-R is a reliable and valid instrument that can be a useful tool for clinicians when assessing the possibility of ASD in adults. This self-administered rating scale is easy to administer and user-friendly, properties that make it both valuable and cost-effective. Three of the subscales have adequate psychometric properties, with the language subscale being the weakest for reasons discussed.