Introduction

Autism spectrum disorder (ASD) is characterized by persistent deficits in social communication and interaction, alongside repetitive, stereotyped behavior and restricted interests (APA, 2013). The number of adults diagnosed with ASD has increased dramatically in the past decade, and ASD now accounts for a large burden on health care (Fombonne 2009; Keyes et al. 2012). Prevalence estimates vary greatly, but global prevalence is approximately 1% (Elsabbagh et al. 2012), with 1.8% in men and 0.2% in women (Brugha et al. 2011). Autistic traits have moderate to high heritability, are highly stable, and are distributed on a continuum in the general population, with ASD at one extreme of the distribution (Hoekstra et al. 2007; Robinson et al. 2011).

Studying autistic traits can give further insight into how they relate to mental processes (Kuo et al. 2014), individual differences (Rivet and Matson 2011), and psychiatric disorders such as anxiety and depression (Rosbrook and Whittingham 2010). Screening for autistic traits in the general population may be helpful in epidemiological research because it can provide the sample sizes needed to investigate relationships between autism phenotype severity and theoretically important factors. Furthermore, studies of autistic traits in general population samples can serve as ‘analogue studies’ for ASD: such samples are larger and more easily accessible, which allows more complex statistical analyses to be conducted (e.g. Jackson and Dritschel 2016; Kunihira et al. 2006).

Among the many screening tools developed to quantify autistic traits, the Autism-Spectrum Quotient (AQ; Baron-Cohen et al. 2001a) has probably been the most commonly used over the past decade. The AQ has been used to screen clinical samples (Woodbury-Smith et al. 2005) and to predict performance on cognitive tasks (Stewart et al. 2009), social cognition (Baron-Cohen et al. 2001b), spontaneous facial mimicry (Hermans et al. 2009), gaze preference for social and non-social stimuli (Bayliss and Tipper 2005), and auditory speech perception (Stewart and Ota 2008).

The AQ is a self-administered questionnaire for measuring the degree to which adults with normal intelligence show autistic traits. It consists of 50 questions, 10 for each of five domains relevant to autistic traits (social skill, attention switching, attention to detail, communication, and imagination). Adequate test–retest reliability has been shown for the AQ (Baron-Cohen et al. 2001a), and AQ sum scores are normally distributed in the general population (Hurst et al. 2007). Cross-cultural equivalence in Dutch and Japanese samples has also been shown (Hoekstra et al. 2008; Kurita et al. 2005; Wakabayashi et al. 2006).

However, some aspects of the AQ remain questionable. Baron-Cohen et al. (2001a) originally proposed a unidimensional structure of the AQ based on descriptive item analysis and the distribution of sum scores across ASD and non-ASD groups. The sum score is by far the most commonly used AQ result, yet Baron-Cohen et al. (2001a) found adequate internal consistency (defined as Cronbach’s alpha above 0.70; Nunnally and Bernstein 1994) in only one of the five autistic trait domains of the AQ. A low Cronbach’s alpha indicates weak correlations among the items in a scale, which suggests a departure from unidimensionality. The low internal consistency of the AQ has been extensively replicated (e.g., Austin 2005; Hoekstra et al. 2008; Hurst et al. 2007; Kloosterman et al. 2011; Stewart and Austin 2009).
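As a concrete reference for the 0.70 criterion, the sketch below computes Cronbach’s alpha from a respondents-by-items score matrix with the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name and the toy scores are illustrative assumptions, not data from this study.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Population variances are used throughout; the ratio is unaffected as
    long as the same definition is applied to items and totals.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var / total_var)


# Four hypothetical respondents answering three items on a 1-4 scale.
toy = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 3, 4]]
alpha = cronbach_alpha(toy)  # 0.875 for these invented scores
```

Items that rise and fall together across respondents push alpha towards 1; weakly related items pull it towards 0.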

To date, studies using more advanced statistical methods, such as factor analysis, have suggested that the AQ may consist of five (Kloosterman et al. 2011; Lau et al. 2013), four (Stewart and Austin 2009), three (Austin 2005; Hurst et al. 2007), or two (Hoekstra et al. 2008) dimensions. The two-factor model (in fact two higher-order factors and four primary factors) was confirmed in a validation of a 28-item short form of the AQ (Hoekstra et al. 2011). Thus, the unidimensional structure assumed by Baron-Cohen et al. (2001a) has not been replicated.

A common feature of previous studies is that the psychometric analyses were based mostly on non-ASD samples. The choice of mainly student samples may be reasonable, given that the AQ targets autistic traits in the general population. However, both the usefulness of the AQ and the theoretical basis of an autistic trait continuum require that the properties of the AQ be similar among people with and without ASD.

Another common feature of these studies is that they applied classical test theory techniques, such as principal component analysis, exploratory factor analysis, or confirmatory factor analysis. As shown by Gorsuch (1997), factor analysis of ordinal data that are treated as interval data can produce spurious factors. In addition, item distributions may differ from each other, so items tend to load on the same factor as other items with similar distributions. This can lead to erroneous conclusions about the scale, especially when a sum score, as in the AQ, is used to define the degree of an underlying trait. Consistent with this, Stewart and Austin (2009) noted that their initial exploratory factor analysis suggested a large number of poorly defined factors. These numerous factors may therefore reflect distributional properties rather than the underlying construct being measured. We therefore take a different approach in the present study and examine the dimensionality of the AQ using Rasch analysis.

Rasch models (Rasch 1960) are commonly applied in the development and validation of unidimensional scales with interval properties based on frequency questions or Likert items. They calibrate the observed test values against the underlying latent property (Linacre 1994). Rasch analysis can thus determine the degree to which the AQ items accurately characterize autistic traits. Rasch models also allow analysis of whether an instrument meets the requirement of invariance, for instance whether the scale works in a similar manner among men and women with and without ASD. Finally, the Rasch model provides a way to validate the interval properties of a scale. An advantage of Rasch analysis is that it makes no assumptions about the distribution of the latent property, whereas classical test theory techniques require normally distributed latent variables. Hence, the aim of the study was to test the scale properties of the Swedish AQ using Rasch analysis.

Methods

Participants

Two samples, an ASD group and a non-ASD group, were recruited for this study. The ASD group was recruited from the Centre for Adult Habilitation, Region Örebro County, Sweden. A total of 401 adults diagnosed with ASD and without intellectual impairment (i.e., IQ > 70) were invited to participate, and 130 of them volunteered (68 men and 62 women, aged 18–62 years, mean = 29.3 years, SD = 9.9). No age difference was found between participants and non-participants; however, the participation rate was significantly lower among men (28%) than among women (40%) (χ2 = 6.25, p < 0.05).

The non-ASD group consisted of 219 university students recruited from various departments at Örebro University (93 men and 126 women, aged 18–55 years, mean = 23.8 years, SD = 5.7). None of them reported having an ASD diagnosis. No age difference was found between men and women (t(217) = 0.68, p = 0.50), and the sex ratio of the sample was similar to that of the university (i.e., 60% women).

The ASD and non-ASD groups differed in sex and age. The ASD group included a significantly higher proportion of men than the non-ASD group (χ2 = 9.43, p < 0.01) and was on average older than the non-ASD group (t(347) = 9.06, p < 0.001).

The Autism-Spectrum Quotient

The Autism-Spectrum Quotient (AQ; Baron-Cohen et al. 2001a) is a 50-item self-report questionnaire for measuring the degree to which an adult with normal intelligence has the traits associated with the autistic spectrum. The items, which are listed in Table 3, assess five domains (10 items per domain): social skill, attention switching, attention to detail, communication, and imagination. All items are scored on a four-point rating scale ranging from 1 = definitely agree to 4 = definitely disagree. Scoring is reversed (4 = definitely agree to 1 = definitely disagree) for the items on which an “agree” response indicates an autistic trait. The following items were reversed: 2, 4, 5, 6, 7, 9, 12, 13, 16, 18, 19, 20, 21, 22, 23, 26, 33, 35, 39, 41, 42, 43, 45, and 46. All item scores are summed; the AQ sum score can thus range from 50 (the lowest extreme of the autistic trait continuum) to 200 (the highest extreme of the autistic trait continuum).
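The scoring rule above can be written as a short function. This is a minimal sketch assuming raw answers are coded 1 = definitely agree to 4 = definitely disagree. The reversed-item list is the one given above. The domain-to-item assignment is an assumption taken from the standard AQ key (Baron-Cohen et al. 2001a) and should be checked against the original instrument before use. The names `score_aq`, `REVERSED`, and `DOMAINS` are hypothetical.

```python
# Items scored in reverse (4 = definitely agree ... 1 = definitely disagree),
# i.e. items on which agreeing indicates an autistic trait.
REVERSED = {2, 4, 5, 6, 7, 9, 12, 13, 16, 18, 19, 20, 21, 22, 23,
            26, 33, 35, 39, 41, 42, 43, 45, 46}

# Assumed domain key (standard AQ; verify against Baron-Cohen et al. 2001a).
DOMAINS = {
    "social_skill": [1, 11, 13, 15, 22, 36, 44, 45, 47, 48],
    "attention_switching": [2, 4, 10, 16, 25, 32, 34, 37, 43, 46],
    "attention_to_detail": [5, 6, 9, 12, 19, 23, 28, 29, 30, 49],
    "communication": [7, 17, 18, 26, 27, 31, 33, 35, 38, 39],
    "imagination": [3, 8, 14, 20, 21, 24, 40, 41, 42, 50],
}


def score_aq(responses):
    """Score one respondent.

    responses: dict mapping item number (1-50) to the raw answer, coded
    1 = definitely agree, 2 = slightly agree,
    3 = slightly disagree, 4 = definitely disagree.
    Returns (sum score in 50-200, dict of domain scores in 10-40).
    """
    if sorted(responses) != list(range(1, 51)):
        raise ValueError("expected answers to all items 1-50")
    scored = {}
    for item, raw in responses.items():
        if raw not in (1, 2, 3, 4):
            raise ValueError(f"item {item}: answer must be 1-4, got {raw}")
        # Reversing a 1-4 scale maps 1<->4 and 2<->3.
        scored[item] = 5 - raw if item in REVERSED else raw
    domain_scores = {name: sum(scored[i] for i in items)
                     for name, items in DOMAINS.items()}
    return sum(scored.values()), domain_scores
```

For example, a respondent answering “definitely agree” to every item receives 4 points on each of the 24 reversed items and 1 point on each of the other 26, giving a sum score of 122.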

The AQ was translated into Swedish with permission from Professor Simon Baron-Cohen. Two professional translators performed the translation independently. The two translations were compared, and the few minor discrepancies that emerged, which concerned choices of synonymous words or sentence structure, were discussed with the translators. A third professional translator then back-translated the Swedish version into English to confirm equivalence with the original. Hence, the Swedish version of the AQ is linguistically similar to the English original. The Swedish translation is available from the first author.

Procedure

The adults with ASD received the study information, the study consent form, the AQ questionnaire, and a prepaid envelope by post. The students (non-ASD group) were informed verbally about the study and completed the AQ questionnaire during lectures. No course credit was received.

Data Analysis

IBM SPSS Statistics version 22 (IBM Corp, Armonk, NY) was used to summarize participant characteristics and to evaluate group differences using t-tests and χ2 tests. A p value below 0.05 was regarded as significant. The rank-ordered AQ scores were analyzed using the Rasch rating scale model in Winsteps 3.81.0 (Linacre 2014). A detailed explanation of Rasch models is given elsewhere (Engelhard 2013). In brief, Rasch analysis converts rank-ordered data into interval logit measures, giving each person and each item a logit measure. Logit stands for log-odds unit, and logits form an equal-interval linear scale. The logit scale is unaffected by variations in the distribution of measures and is independent of the particular items included in a test or the particular sample of people (Wright 1993). Thus, an ‘AQ person measure’ represents the degree to which a person shows autistic traits (the higher the logits, the higher the degree of autistic traits). An ‘AQ item measure’ represents how difficult a particular item is to endorse at a given degree of autistic traits (the higher the logits, the more difficult to endorse). Rasch analysis enables researchers to identify whether any items are misleading and whether the rating categories have been used as intended by the instrument developer.
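To make “logit” and the rating scale model concrete, the sketch below computes, for one person and one item, the probability of each AQ category under the Andrich rating scale model, in which all items share the same thresholds: the log-odds of choosing category k rather than k − 1 equals theta − delta − tau_k, where theta is the person measure, delta the item measure, and tau_k the k-th threshold. The threshold values used in the tests are invented for illustration and are not the estimates from this study. Winsteps estimates theta, delta, and the thresholds jointly, which this sketch does not attempt, and the function names are hypothetical.

```python
import math


def rsm_probabilities(theta, delta, taus):
    """Category probabilities under the Andrich rating scale model.

    theta: person measure (logits); delta: item measure (logits);
    taus: thresholds shared by all items (len(taus) + 1 categories).
    The log-odds of category k versus k - 1 is theta - delta - taus[k - 1].
    """
    # Cumulative log-numerators; the lowest category has 0 by convention.
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + (theta - delta - tau))
    top = max(log_num)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(theta, delta, taus, lowest=1):
    """Model-expected rating, with categories labelled lowest, lowest + 1, ..."""
    probs = rsm_probabilities(theta, delta, taus)
    return sum((lowest + k) * p for k, p in enumerate(probs))


def most_likely_category(theta, delta, taus, lowest=1):
    """Modal category, i.e. the highest curve in a category probability plot."""
    probs = rsm_probabilities(theta, delta, taus)
    return lowest + max(range(len(probs)), key=probs.__getitem__)
```

With invented thresholds (−1, 0, 1) and theta equal to delta, the four probabilities are symmetric and the expected rating is 2.5. When theta − delta equals a threshold, the two adjacent categories are equally likely, which is where neighbouring curves cross in a category probability plot.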

Rating Categories

The four rating categories were examined according to four criteria (Linacre 2002): (i) there should be at least 10 responses in each rating category, (ii) the average AQ person measure should increase with the category, that is, be lower in a category representing low AQ than in one representing high AQ, (iii) the transition points between adjacent categories (thresholds) should increase with the level of the underlying autistic trait, and (iv) the category outfit mean square should be less than 2.0. The rating scale graphs generated by Winsteps were used to examine the ordering of thresholds and how the rating categories were positioned along the latent variable.
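The four category criteria can be checked mechanically once per-category summary statistics are available (Winsteps reports comparable statistics in its rating-scale summary output). The sketch below assumes those summaries have already been extracted; the `Category` record, its field names, and the example values are hypothetical and are not the results in Table 2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Category:
    count: int                  # responses observed in this category
    average_measure: float      # mean person measure of those responses (logits)
    threshold: Optional[float]  # threshold into this category; None for the lowest
    outfit_mnsq: float          # category outfit mean square


def check_rating_scale(cats: List[Category]) -> dict:
    """Apply criteria (i)-(iv) to categories ordered from lowest to highest."""
    thresholds = [c.threshold for c in cats[1:]]
    if any(t is None for t in thresholds):
        raise ValueError("every category above the lowest needs a threshold")
    pairs = list(zip(cats, cats[1:]))
    return {
        "i_at_least_10_responses": all(c.count >= 10 for c in cats),
        "ii_average_measures_increase": all(
            lo.average_measure < hi.average_measure for lo, hi in pairs),
        "iii_thresholds_ordered": all(
            a < b for a, b in zip(thresholds, thresholds[1:])),
        "iv_outfit_below_2": all(c.outfit_mnsq < 2.0 for c in cats),
    }


# Invented summaries for a four-category scale (1 = definitely agree ... 4).
example = [
    Category(900, -1.2, None, 1.1),
    Category(700, -0.4, -1.0, 0.9),
    Category(650, 0.3, 0.1, 0.9),
    Category(800, 1.0, 0.9, 1.0),
]
result = check_rating_scale(example)  # all four criteria are True here
```

Keeping each criterion as a separate flag makes it easy to see which one fails, for example disordered thresholds alongside adequate category counts.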

Item Properties

Point–measure correlations, local item independence, and fit statistics were used to examine the item properties. The point–measure correlation of each item reports the relationship between the sample’s responses to the item and the sample’s measures on the whole instrument. All items are expected to correlate positively in the direction of the latent variable; items showing negative correlations are considered invalid. Local item independence was assessed by examining whether responses to each item were unrelated to responses to any other item once trait level was controlled; that is, endorsing one item should not affect the probability of endorsing the others. Violation of local item independence may bias parameter estimates. A residual correlation of at least 0.7 between two items (i.e., approximately 50% common variance) was set as the criterion for item dependency (Linacre 2009). Fit statistics detect the extent to which the observed response pattern matches the pattern expected by the model. In this study, an item was considered misfitting if its infit and outfit mean squares were greater than 1.50.
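The item statistics above can be computed from the model’s category probabilities. The sketch assumes those probabilities are already available for each person’s response to an item (for example from a fitted rating scale model). Outfit is the unweighted mean of squared standardized residuals, infit is the information-weighted version, and the point–measure and residual correlations are ordinary Pearson correlations. The function names are hypothetical, and `is_misfit` reads the criterion in the text literally as requiring both mean squares to exceed 1.50.

```python
import math


def moments(probs, lowest=1):
    """Expected rating E and model variance W of one response."""
    e = sum((lowest + k) * p for k, p in enumerate(probs))
    w = sum((lowest + k - e) ** 2 * p for k, p in enumerate(probs))
    return e, w


def item_fit(observed, probs_per_person, lowest=1):
    """Infit and outfit mean squares for one item across persons."""
    sq_resid, variances, z_sq = [], [], []
    for x, probs in zip(observed, probs_per_person):
        e, w = moments(probs, lowest)
        sq_resid.append((x - e) ** 2)
        variances.append(w)
        z_sq.append((x - e) ** 2 / w)  # squared standardized residual
    outfit = sum(z_sq) / len(z_sq)          # sensitive to outlying responses
    infit = sum(sq_resid) / sum(variances)  # weighted by response information
    return infit, outfit


def standardized_residuals(observed, probs_per_person, lowest=1):
    """(x - E) / sqrt(W) per person; input for item residual correlations."""
    out = []
    for x, probs in zip(observed, probs_per_person):
        e, w = moments(probs, lowest)
        out.append((x - e) / math.sqrt(w))
    return out


def pearson(a, b):
    """Pearson correlation; used for point-measure and residual correlations."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def is_misfit(infit, outfit, cutoff=1.5):
    # Literal reading of the criterion: both mean squares above the cutoff.
    return infit > cutoff and outfit > cutoff
```

In this sketch, the point–measure correlation of an item would be `pearson(item_scores, person_measures)`, and a pair of items would be flagged as locally dependent when `pearson` of their standardized residuals reaches 0.7.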

Differential Item Functioning

Differential item functioning (DIF) analysis was used to examine whether an item performed differently for the ASD group than for the non-ASD group. In this study, item DIF was considered present if the difference in an item’s measure between the two groups was 0.5 logits or more and was significant (p < 0.05) in a t test (Karami 2012).
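The DIF rule combines a size criterion with a significance test. A minimal sketch, assuming the item has been calibrated separately in each group together with its standard error: the DIF contrast is the difference between the two item measures, and the test statistic divides it by the joint standard error. Winsteps evaluates this statistic against a t distribution with estimated degrees of freedom; for brevity the sketch substitutes a two-sided normal approximation, which is close only for large samples. The function name and all numbers are hypothetical.

```python
import math


def dif_flag(measure_a, se_a, measure_b, se_b, min_contrast=0.5, alpha=0.05):
    """Flag DIF for one item calibrated separately in groups A and B.

    Returns (contrast in logits, test statistic, two-sided p, flagged).
    """
    contrast = measure_a - measure_b
    joint_se = math.sqrt(se_a ** 2 + se_b ** 2)
    stat = contrast / joint_se
    # Two-sided p-value from the standard normal (large-sample approximation
    # standing in for the t distribution).
    p = math.erfc(abs(stat) / math.sqrt(2))
    flagged = abs(contrast) >= min_contrast and p < alpha
    return contrast, stat, p, flagged
```

Both conditions matter: with small standard errors, a contrast of 0.3 logits can be highly significant yet still falls below the 0.5-logit size criterion and is not flagged.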

Scale Reliability

Scale reliability was evaluated in terms of person reliability, an index similar to Cronbach’s alpha that ranges from 0 to 1: coefficients above 0.70 are considered the minimum for group use and coefficients above 0.85 the minimum for individual use (Tennant and Conaghan 2007).
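Person reliability is closely tied to the separation index G reported alongside it: R = G²/(1 + G²), and the number of statistically distinct strata is commonly estimated as (4G + 1)/3. The sketch below is a simplified version of these standard Rasch formulas that computes both indices from measures and their standard errors. It is not Winsteps’ exact implementation, which for example distinguishes “model” and “real” error variance, and the function names are hypothetical.

```python
import math
from statistics import pvariance


def separation_and_reliability(measures, standard_errors):
    """Separation index G and reliability R from measures and their SEs."""
    observed_var = pvariance(measures)
    error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    true_var = max(observed_var - error_var, 0.0)  # error-adjusted variance
    reliability = true_var / observed_var
    separation = math.sqrt(true_var / error_var)   # true SD / RMSE
    return separation, reliability


def reliability_from_separation(g):
    return g ** 2 / (1 + g ** 2)


def strata(g):
    """Number of statistically distinct levels, (4G + 1) / 3."""
    return (4 * g + 1) / 3
```

Applied to the separation values given in the Results, G = 2.52 corresponds to R ≈ 0.86 and about 3.7 strata, and G = 7.29 corresponds to R ≈ 0.98 and about 10 strata.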

Unidimensionality

Principal components analysis of the residuals was used to examine whether the five AQ domains measure different dimensions or together measure one dimension. We used two criteria: at least 50% of the total variance should be explained by the first latent variable, and any additional factor should explain less than 5% of the remaining variance after removal of the first latent variable (Linacre 2009).
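The second criterion concerns the largest component left in the residuals. A simplified sketch, assuming a matrix of standardized residuals is already available with one list per item: it forms the inter-item correlation matrix of the residuals, finds its largest eigenvalue (the “first contrast”, in item units) by power iteration, and expresses it as a share of the residual variance, that is, of the number of items. Winsteps reports the first contrast relative to the total raw variance, so its percentages will not match this sketch exactly. The functions and toy data are hypothetical.

```python
import math


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))


def residual_correlations(residuals_by_item):
    """Inter-item correlation matrix; each inner list is one item's residuals."""
    k = len(residuals_by_item)
    return [[1.0 if i == j else pearson(residuals_by_item[i], residuals_by_item[j])
             for j in range(k)] for i in range(k)]


def largest_eigenvalue(matrix, iterations=1000):
    """Power iteration for a symmetric positive semidefinite matrix."""
    k = len(matrix)
    v = [1.0 / (i + 1) for i in range(k)]  # non-uniform start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient


def first_contrast(residuals_by_item):
    """Eigenvalue of the first residual contrast and its share of residual variance."""
    corr = residual_correlations(residuals_by_item)
    eigenvalue = largest_eigenvalue(corr)
    return eigenvalue, eigenvalue / len(corr)
```

If the residuals were pure noise, every eigenvalue would be close to 1; a cluster of items sharing residual variance shows up as one eigenvalue well above 1, which is the pattern the 5% criterion is designed to catch.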

Targeting

We explored the potential use of the AQ in a clinical population by examining, in the person–item map, how well item difficulty (not too easy, not too hard) was targeted to the individuals’ trait levels. The map places person and item measures on the same scale, which shows whether the AQ has enough items to discriminate between people with different levels of autistic traits. The range of item difficulty is expected to match the range of autistic trait levels in the ASD group. Because item measures are centered at zero, a mean person measure around zero indicates that the items are well targeted to the people in the sample (Tennant and Conaghan 2007).
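Targeting can be summarized numerically as well as read off the person–item map. A minimal sketch, assuming person and item measures in logits are available: it reports the offset between the mean person measure and the mean item measure (near zero when the items are well targeted) and the share of persons falling outside the range of item difficulties. The function name and example values are hypothetical.

```python
from statistics import fmean


def targeting_summary(person_measures, item_measures):
    """Summarize how well item difficulties cover the persons' trait levels."""
    easiest, hardest = min(item_measures), max(item_measures)
    n = len(person_measures)
    return {
        # Positive offset: persons sit above the items on average.
        "mean_offset": fmean(person_measures) - fmean(item_measures),
        # Persons with trait levels below the easiest-to-endorse item.
        "share_below_easiest": sum(m < easiest for m in person_measures) / n,
        # Persons with trait levels above the hardest-to-endorse item.
        "share_above_hardest": sum(m > hardest for m in person_measures) / n,
    }
```

A sample clustered at the low end of the trait continuum, as described for the non-ASD group, would show a negative offset and a noticeable share of persons below the easiest item.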

Sensitivity and Specificity

The sensitivity and specificity of the AQ as a screening tool for ASD were evaluated using receiver operating characteristic (ROC) curves and the area under the curve (AUC), calculated for the full AQ scale and the five AQ domains. The Youden index (Youden 1950), J = sensitivity + specificity − 1, was used to find the optimal cut-off scores; the cut-off that maximizes J is the point at which the tangent to the ROC curve is parallel to the chance line. This index has been used in the development of diagnostic assessments for ASD (Cohen et al. 2010) and is regarded as one of the most stringent statistical methods for identifying a cut-off or threshold in diagnostic measures.
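The Youden criterion is straightforward to apply to raw sum scores. The sketch below assumes that higher scores indicate ASD and that a respondent screens positive when the score is at or above the cut-off. It scans every observed score as a candidate cut-off, keeps the one that maximizes J, and computes the AUC as the probability that a randomly chosen case outscores a randomly chosen non-case (ties counted as one half), which equals the area under the empirical ROC curve. The function names and scores are invented for illustration; this is not the SPSS procedure used in the study.

```python
def youden_cutoff(cases, controls):
    """Return (cutoff, sensitivity, specificity, J) maximizing J.

    A respondent screens positive when score >= cutoff.
    """
    best = None
    for c in sorted(set(cases) | set(controls)):
        sens = sum(s >= c for s in cases) / len(cases)
        spec = sum(s < c for s in controls) / len(controls)
        j = sens + spec - 1
        if best is None or j > best[3]:  # ties keep the lower cut-off
            best = (c, sens, spec, j)
    return best


def auc(cases, controls):
    """Probability that a random case outscores a random control (ties = 1/2)."""
    wins = 0.0
    for a in cases:
        for b in controls:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(cases) * len(controls))


def predictive_values(cases, controls, cutoff):
    """Positive and negative predictive values at a cut-off."""
    tp = sum(s >= cutoff for s in cases)
    fp = sum(s >= cutoff for s in controls)
    tn, fn = len(controls) - fp, len(cases) - tp
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv
```

Predictive values computed this way reflect the proportion of cases in the sample rather than the population prevalence, so they do not transfer directly to settings where ASD is rarer.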

Results

Person Measures

The mean AQ person measure for the ASD group was significantly higher than that for the non-ASD group (t(347) = 15.02, p < 0.01; Table 1). No significant differences between men and women were found in either group.

Table 1 Mean AQ sum scores and mean person measures

Rating Categories

Both groups fulfilled the rating scale criteria (Table 2). That is, there were more than 10 responses in each rating category, the average AQ person measure increased with the rating category, thresholds were ordered, and the category outfit mean square was below 2.0. The probability that an individual with a given autistic trait level will select a response category is shown in Fig. 1. For any given point along the x-axis (representing the autistic trait continuum), the category most likely to be chosen by an individual is the one whose curve has the highest probability. In an optimally functioning scale, each category is the most likely choice over an interval of similar width on the scale, which the AQ demonstrated.

Table 2 Summary statistics for the four AQ rating scale categories
Fig. 1

The category probability curves of the AQ, illustrating the range over which each of the four categories is most likely to be chosen. Boundaries occur at points along the scale where the category most likely to be chosen changes from one to the next. The curves labelled 1, 2, 3, and 4 represent the four rating categories, from 1 = definitely agree to 4 = definitely disagree. The x-axis is AQ person measure minus AQ item measure in logits. (AQ Autism-Spectrum Quotient, logit log-odds unit)

Item Properties

Point–Measure Correlations

Three items, 29 “I am not very good at remembering phone numbers”, 30 “I don’t usually notice small changes in a situation, or a person’s appearance”, and 49 “I am not very good at remembering people’s date of birth”, had point–measure correlations below zero (−0.02, −0.18, and −0.01, respectively). The negative correlations suggest that people who scored higher on these items had lower, rather than the expected higher, autistic trait levels.

Local Item Independence

All items showed standardized residual correlations below 0.7. The greatest standardized residual correlations were between items 17 and 38 (0.58) and between items 44 and 47 (0.58).

Fit Statistics

As shown in Table 3, the logit measures of the 50 items ranged from 0.99, most difficult to endorse (item 9 “I am fascinated by dates”), to −1.11, easiest to endorse (item 30 “I don’t usually notice small changes in a situation, or a person’s appearance”), both in the domain Attention to detail. Five items were misfitting: item 21 in Imagination and items 9, 29, 30, and 49 in Attention to detail.

Table 3 Item raw scores, logit measures and fit statistics for the Autism-Spectrum Quotient

Item DIF

Five items showed DIF between the ASD and non-ASD groups: items 13 (−1.09 logits), 22 (−0.93 logits), and 44 (0.79 logits) in the domain Social skill, item 14 (0.81 logits) in Imagination, and item 19 (−0.91 logits) in Attention to detail. Note that items 13, 19, and 22 are reverse scored. At identical levels of autistic traits, items 13 “I would rather go to a library than a party”, 19 “I am fascinated by numbers”, and 22 “I find it hard to make new friends” were thus more likely to be endorsed by those in the ASD group than by those in the non-ASD group, whereas items 14 “I find making up stories easy” and 44 “I enjoy social occasions” were more likely to be endorsed by those in the non-ASD group.

Unidimensionality

The principal components analysis of residuals for all 50 items showed that the AQ did not fulfill the unidimensionality criteria. The raw variance explained by the measures (criterion: above 50%) was 26.2%, and the unexplained variance in the first contrast (criterion: below 5%) was 7.7%. The items formed three clusters, and we repeated the analyses for each cluster. Only one cluster fulfilled both criteria, with a raw variance explained by the measures of 52.8% and an unexplained variance in the first contrast of 2.7%. This cluster consisted of 12 items: 11, 13, 22, 44, and 47 in the domain Social skill; 10, 32, 34, and 46 in Attention switching; and 17, 26, and 38 in Communication.

Targeting

Overall, the AQ items were well targeted, with person and item means close to each other (mean person measure −0.31 logits). Targeting was excellent in the ASD group, with a mean measure of 0.04 (Fig. 2), and acceptable in the non-ASD group, with a mean measure of −0.61 (Fig. 3).

Fig. 2

Person–item map for the ASD group: AQ person measures in relation to AQ item measures in logits. (M mean, S 1 standard deviation (SD) from the mean, T 2 SD from the mean, AQ Autism-Spectrum Quotient, ASD autism spectrum disorder)

Fig. 3

Person–item map for the non-ASD group: AQ person measures in relation to AQ item measures in logits. (M mean, S 1 standard deviation (SD) from the mean, T 2 SD from the mean, AQ Autism-Spectrum Quotient, ASD autism spectrum disorder)

Scale Reliability

Person separation was 2.52 and person reliability was 0.86. Item separation was 7.29 and item reliability was 0.98. There was a strong correlation between AQ logits and AQ sum score (r = 0.998, p < 0.001).

Sensitivity and Specificity

The ROC curves for the full AQ scale and the five AQ domains are shown in Fig. 4. The AUC was significantly above chance for all domains; sensitivity varied between 48 and 75%, and specificity between 66 and 93% (Table 4). A raw sum score of 118 was identified as the optimal screening cut-off score for the full AQ scale, where Youden’s index was 0.65 (95% CI 0.55–0.71). The correct classification rate was 85.2% (95% CI 80.6–88.1%), the positive predictive value was 0.85 (95% CI 0.79–0.91), and the negative predictive value was 0.85 (95% CI 0.82–0.87). The AQ sum score distributions for ASD and non-ASD participants are shown in Fig. 5.

Fig. 4

Receiver operating characteristic (ROC) curves illustrating the ability of the full AQ scale and AQ domains to identify ASD cases at alternative cut-off points. Note that a perfect measure would have an area under the curve of 1.0, whereas a measure with no diagnostic value would have an area of 0.5, with the ROC curve lying on the diagonal. (AQ Autism-Spectrum Quotient, ASD autism spectrum disorder)

Table 4 Area under the curve (AUC), sensitivity, and specificity for the Youden index based cut-off scores for the full AQ scale and five AQ domains
Fig. 5

AQ sum scores in ASD and non-ASD participants (AQ Autism-Spectrum Quotient, ASD autism spectrum disorder)

Discussion

The study tested the scale properties of the Swedish AQ using the Rasch rating scale model, with mixed results: several scale properties were good to excellent whereas others were poor. On the one hand, the AQ fulfilled the rating scale criteria, had minimal DIF, adequate item properties, adequate item and person separation and reliability, and excellent targeting for the ASD group; on the other hand, the AQ did not meet the criteria for a unidimensional scale.

In regard to item properties, five items were misfitting and thus did not fit the model: item 21 in the domain Imagination and items 9, 29, 30, and 49 in Attention to detail. Three of these items (29, 30, and 49) had negative point–measure correlations, meaning that their scoring orientation was opposite to the orientation of the latent variable (the degree of autistic traits). Possible reasons for negative point–measure correlations include person-specific knowledge, guessing, or reverse scoring. Notably, all three items are negatively worded and were scored higher by the non-ASD group than by the ASD group, suggesting that they do not measure autistic traits and need revision. This is in line with previous studies finding low or negative domain loadings for these items (Austin 2005; Hoekstra et al. 2008; Hurst et al. 2007; Stewart and Austin 2009). It should be noted that, in developing the AQ, Baron-Cohen et al. (2001a) found that items 29 and 30 were scored higher by controls than by adults with Asperger’s syndrome or high-functioning autism, but nevertheless retained them in order to reduce the group differences.

No item pair was locally dependent, although item residuals were moderately correlated between “I enjoy social chit-chat” (item 17) and “I am good at social chit-chat” (item 38), and between “I enjoy social occasions” (item 44) and “I enjoy meeting new people” (item 47). In both pairs, the items are similar in meaning. Even if they fit the model, highly similarly worded items inflate the items’ correlation with the total score while providing no unique information about the respondent. In the presence of local dependency, it is recommended that one of the similar items be excluded as redundant.

Five of the 50 items showed DIF: three from the Social skill domain, one from the Imagination domain, and one from the Attention to detail domain. Interestingly, the DIF indicated that these items exaggerated the group differences in the expected direction. People with ASD are expected to be less socially skilled and imaginative and more attentive to detail than those without ASD, and these items highlight these group differences more distinctly than the other AQ items. Absence of DIF is crucial for an adequate scale (Tennant and Conaghan 2007). However, given that this bias only overestimates differences in the expected direction, that only five of the 50 items showed DIF, and that all but one of these DIF contrasts were below 1 logit, the AQ items appear, for all practical purposes, to be adequate for people with as well as without ASD.

The AQ items were well targeted to the individuals with ASD. However, as shown in the person–item maps, most of the non-ASD respondents clustered at the lower end of the measures, indicating a low position on the autistic continuum, while many of the items were concentrated at the higher end of the continuum. This suggests that the set of AQ items is less appropriate for measuring the degree of autistic traits in the non-ASD group. The result is reasonable given that the AQ was developed to screen adults with Asperger’s syndrome or high-functioning autism, who are more likely to endorse many of the items. During piloting of the AQ, Baron-Cohen et al. (2001a) excluded items (except items 29 and 30) if non-ASD people selected ‘definitely disagree’ or ‘slightly disagree’ more often than did people with Asperger’s syndrome or high-functioning autism. Consequently, non-ASD respondents are less likely to endorse AQ items, which results in poorer targeting.

The Rasch analysis supported most of the AQ scaling properties but failed to support Baron-Cohen et al.’s (2001a) assumption that the AQ measures a single latent variable, namely, the degree of autistic traits. This result is in line with previous research using factor analysis (Austin 2005; Hoekstra et al. 2008; Hurst et al. 2007; Stewart and Austin 2009) and Mokken scaling (Stewart et al. 2015). The hypothesized single latent variable is consistent neither with the multidimensional nature of ASD, as expressed in the Diagnostic and Statistical Manual of Mental Disorders, DSM-5 (American Psychiatric Association 2013), nor with the fact that Baron-Cohen et al. (2001a) selected the AQ items from the domains in the “triad” of autistic symptoms. The use of a single AQ sum score may therefore not adequately express the multifaceted nature of ASD.

By reducing the AQ to 12 items from the Social skill, Attention switching, and Communication domains, we were able to meet both criteria for unidimensionality. Intriguingly, nine of these items (11, 13, 17, 22, 26, 34, 38, 44, and 47) are among the ten items that passed the Mokken scaling test in people with ASD (Stewart et al. 2015). Hoekstra et al. (2008), using confirmatory factor analysis, found that the AQ consisted of two second-order factors, one of them including Social skill, Attention switching, and Communication. Using different evaluation methods, we thus converged on a similar conclusion: the AQ measures more than one latent variable and contains more items than are needed to measure a unidimensional autistic trait. Despite this, a majority of empirical studies use the AQ sum score as the sole measure of an autistic tendency. If the AQ measures a set of (somewhat related) constructs, what exactly does an AQ sum score mean, and what consequences does this have for our understanding of autism?

According to the psychometric literature, if the assumption of unidimensionality is violated, any statistical analysis based on it would be misleading. Specifically, estimates of the latent variables and item parameters will generally be biased because of model misspecification, which in turn leads to incorrect decisions on subsequent statistical analysis, such as testing group differences and correlations between latent variables (e.g., Horton et al. 2013).

It should be noted that unidimensionality is a relative matter. The judgment of whether a scale is sufficiently unidimensional should ultimately come from outside the data and be driven by the purpose of measurement, clinical, and theoretical considerations (Andrich 1988; Cano et al. 2011; Rasch 1960).

A pragmatic way to salvage a situation like this would be to treat the AQ sum score as an index, in other words, a formative latent variable (see Simonetto (2012) for an overview). A formative latent variable is defined by a number of non-interchangeable composite indicators, such as income, education, and occupation for socioeconomic status, or weight and height for body mass index. Consequently, a formative latent variable does not exist at a deeper conceptual level than its defining composite indicators (Law et al. 1998). Following this path, the AQ sum score would lose content validity and serve merely as an observable outcome and predictor variable.

To what extent, then, can the AQ predict the presence of ASD? The person reliability and separation indices of the AQ were adequate, as were the item reliability and separation indices. The AQ has the potential to distinguish three groups of people (low, average, and high degree of autistic traits) and reaches the level of precision required for both group and individual use (Tennant and Conaghan 2007). The AQ may also be able to separate more than ten levels of item difficulty, which supports its item difficulty hierarchy, in other words, its construct validity. The AQ sum score differentiated well between the ASD group and the non-ASD group. The AUC was higher than that found in similar populations in Britain (e.g. Woodbury-Smith et al. 2005) but lower than that reported in the Netherlands (Wouters and Spek 2011) or Australia (Broadbent et al. 2013). Regarding the AQ domains, the ROC analysis indicated that Social skill, Attention switching, and Communication had adequate AUC (above 80%), whereas the AUC of Imagination was fair and the AUC of Attention to detail, although above chance, was poor (below 60%). This is in line with the large proportion (40%) of misfitting items in this domain and with previous studies showing that Attention to detail is the poorest AQ domain for differentiating people with and without ASD diagnoses (Allison et al. 2012; Wouters and Spek 2011).

The AQ logits and sum scores obtained for each individual were highly correlated (r = 0.998), suggesting that summed raw scores adequately reflected true differences along the autistic trait continuum that the AQ quantifies. However, it should be borne in mind that the raw-score-to-logit conversion obtained here is only justified for samples whose characteristics are similar to those of the present study. Consequently, Rasch analyses are needed before the AQ is used in other populations.

Limitations

Although this study provides an important contribution to our understanding of the AQ and the assessment of autistic traits in people with and without ASD, there are a number of limitations that warrant discussion. First, the groups were not matched for sex and age. The participants in the non-ASD group were younger and included a larger proportion of women than the ASD group. Despite sex and age differences, the DIF analyses showed few discrepancies between the ASD and non-ASD groups. Consistent with previous research, there was no difference between mean AQ sum scores of men and women with ASD (Baron-Cohen et al. 2001a, 2006; Hoekstra et al. 2008).

Moreover, the sample size fulfilled the requirement of stable calibration for Rasch analysis, but the subgroups for DIF analysis were too small (see Linacre 2013) to draw definitive conclusions regarding whether, for example, sex- or age-related DIF was present in the items in either the ASD or the non-ASD group. Therefore, any conclusions regarding sex or age differences between groups should be interpreted with caution.

Furthermore, some of the ASD participants attached comments to their questionnaires noting that it was somewhat challenging for them to complete so many questions. It is reasonable to assume that some people with ASD, regardless of their motivation to complete the questionnaire, may have lacked the ability to do so. Although all people with ASD registered in the county were invited to participate, the results are only generalizable to those able to complete the AQ questionnaire. This may have less impact on estimated AQ scale properties, because the reported level of autistic traits as quantified by the AQ is probably an underestimation of the true level in the ASD population. In addition, the non-ASD sample completed the AQ anonymously, which meant that we could not verify whether any of them had an ASD diagnosis or would fall within that category.

Conclusions

Our findings suggest that several measurement properties of the AQ were good and that it had adequate sensitivity and specificity to distinguish people with ASD from those without ASD, though the AQ sum score did not perform better than the Social skill domain alone. Nevertheless, the AQ cannot be described as a unidimensional measurement of the degree to which adults with normal intelligence show autistic traits. Thus, the AQ sum score is probably best regarded as an index. The complementary Rasch analysis showed that the 50-item AQ could be reduced to a 12-item subset with little loss in explanatory power. Following replication on a new sample, this subset of AQ items has the potential to efficiently measure the degree to which adults with and without ASD show autistic traits.