Screening for Autism Spectrum Disorder with the Achenbach System of Empirically Based Assessment Scales

Integrating a screening possibility for Autism Spectrum Disorder (ASD) within the widely applied Achenbach System of Empirically Based Assessment (ASEBA) scales could be of great value for daily clinical practice. The present study explored the ability of the school-aged Child Behavior Checklist (CBCL) and the Teacher’s Report Form (TRF) to screen for ASD in children and adolescents (aged 6 to 18 years) within a mixed clinically referred sample. Different screening variants were compared for the CBCL, TRF, and the combination of CBCL and TRF: the separate withdrawn/depressed, social problems, and thought problems syndrome scales; combinations of these syndrome scales; and special ASD scales composed of relevant individual items. Analyses were performed for: youth with a DSM-IV-based clinical diagnosis; youth for whom the clinical DSM-IV diagnosis was confirmed by a standardized assessment (the Autism Diagnostic Interview-Revised); and youth with a DSM-IV-based clinical diagnosis of ASD who also met the DSM-5 criteria. The results clearly demonstrated that the special ASEBA-based scales, in particular when completed by the parents, were most predictive of ASD. The results also indicate that following the initial screen with these ASEBA scales, further thorough diagnostic assessment is necessary to definitively establish whether young people really suffer from ASD.

Autism spectrum disorder (ASD) is a pervasive developmental disorder characterized by persistent deficits in social interaction and communication in combination with restricted, obsessive, or repetitive patterns of behavior (American Psychiatric Association 2000). Due to the polythetic criteria used in psychiatric classification systems, ASD is quite heterogeneous in terms of symptom composition and severity. Youth with severe forms of ASD as described in the DSM-5 or the classical Autistic Disorder according to the DSM-IV can be identified rather well. However, children and adolescents with milder forms of ASD like Pervasive Developmental Disorder-Not Otherwise Specified (PDD-NOS) or high-functioning children with ASD are more difficult to recognize. In fact, high-functioning youth with ASD are often diagnosed relatively late, that is, in elementary or even secondary school (Daniels and Mandell 2014), or the ASD may not be recognized at all (Kim et al. 2011). When these young people are referred to mental health care, they often display a wide range of secondary complaints like anxiety, depression, and behavioral problems (Simonoff et al. 2008). They are often 'diagnostic puzzles' for clinicians and thorough diagnostic assessment is required. The Autism Diagnostic Interview-Revised (ADI-R; Rutter et al. 2003) and Autism Diagnostic Observation Schedule (ADOS-2; Lord et al. 2012) are helpful standardized instruments to support the diagnostic process, but the administration of these measures is quite time-consuming, labor-intensive, and thus relatively expensive (Howlin and Asgharian 1999). Easy-to-administer screening instruments that indicate the possible presence of ASD in school-aged youth would be welcome as they enable clinicians to quickly identify those at risk for ASD who require an in-depth examination and to spare others redundant or irrelevant diagnostic procedures.
Several instruments have been developed that can be used to screen for ASD in school-aged children and adolescents, including the Social Responsiveness Scale (SRS; Constantino and Gruber 2005), the Social Communication Questionnaire (SCQ; Berument et al. 1999), the Children's Communication Checklist (CCC; Bishop 1998), and the Autism-Spectrum Quotient (Auyeung et al. 2008; Baron-Cohen et al. 2006). Although these ASD screening instruments are psychometrically sound and available, they are not regularly applied in daily clinical practice, probably because there is little room during the intake procedure for administering extra assessment instruments with a special focus on only one type of psychopathology. However, in many countries (including The Netherlands) clinics routinely apply the Achenbach System of Empirically Based Assessment (ASEBA; original version: Achenbach 1991; revised version: Achenbach and Rescorla 2001) scales as a broad screen to get an initial impression of the behavioral, emotional, and social problems (as well as competencies) of clinically referred children and adolescents. The ASEBA scales make use of multiple informants: parents complete the Child Behavior Checklist (CBCL), teachers the Teacher's Report Form (TRF), while children themselves from 11 years onwards fill in the Youth Self Report (YSR). Interestingly, it has been suggested that the ASEBA scales might also be valuable for the screening of ASD. Routine scoring by means of these scales might increase clinicians' awareness of ASD and might lead to better detection of this disorder and indication of appropriate interventions. It is worth noting that the preschool version of the Child Behavior Checklist (CBCL 1.5-5; Achenbach and Rescorla 2000) already contains a DSM-oriented scale of pervasive developmental problems that can be effectively used to screen for ASD (e.g., Havdahl et al. 2016; Myers et al. 2014; Muratori et al. 2011; Sikora et al. 2008), but in the school-aged version of the Achenbach scales (for children and adolescents of 6-18 years) a standard subscale for assessing the symptomatology of ASD has not been included. However, both the original and the revised ASEBA scales contain various items (e.g., "Strange behavior"; "Doesn't get along with other kids"; "Would rather be alone than with others") and syndrome scales (i.e., withdrawn/depressed, thought problems, social problems) that are highly relevant for ASD. Thus, it is not surprising that researchers have begun to evaluate individual syndrome scales, combinations of these scales, and specific sets of items of the ASEBA scales as potential screeners to detect children and adolescents who might be suffering from this type of psychopathology.
Most studies have relied on the parent version of the Achenbach scales, the CBCL, and made use of the existing syndrome scales to explore whether these are statistically different for youth with and without ASD. A first attempt was made by Bölte and colleagues (Bölte et al. 1999), who compared school-aged CBCL subscale scores of youths aged 4 to 18 years who were diagnosed with autistic disorder and a control group of youth with a mix of other psychiatric problems. It was found that the ASD youth scored higher on social problems, thought problems, and attention problems but lower on somatic complaints as compared to youth in the clinical control group. Adopting a similar approach in a small Brazilian sample of clinic-referred children aged 4 to 11 years, Duarte et al. (2003) noted that only the thought problems subscale of the CBCL differentiated between youth with ASD and youth with other psychiatric disorders. In a further study by Hoffmann and colleagues (Hoffmann et al. 2016), the screening utility of the CBCL syndrome scales was explored in an outpatient sample of high-functioning (IQ > 70) 4- to 18-year-old children and adolescents with ASD. The researchers made comparisons between an ASD only group, an ASD plus comorbid ADHD group, and control groups of children and adolescents with only ADHD or with internalizing disorders only. Only the social problems subscale was found to differentiate between both ASD groups and the control groups. Biederman et al. (2010) took the research on this topic one step further by not only examining whether individual CBCL syndrome scales differentiated youth with (high-functioning) ASD from clinically referred youth without this condition (the mean age in both groups was 11 years), but also by looking at the combination of scales.
Logistic regression analyses revealed that the withdrawn/depressed, social problems, and thought problems syndrome scales were significant predictors of an ASD diagnosis, but Biederman et al. (2010) also demonstrated that an ASD profile consisting of the combination of these three subscales had the best potential to discriminate between youth with and without ASD. In an attempt to replicate these results, Havdahl et al. (2016) found that children with ASD aged 6-13 years scored significantly higher on the CBCL syndrome scales withdrawn/depressed, social problems, and thought problems than children in a mixed clinical control group. However, these authors also critically noted that the capacity of these subscales to predict the diagnosis of ASD was relatively weak. Within this sample, the combination of the withdrawn/depressed and thought problems scales (which the researchers named the WTP profile) turned out to have, relatively speaking, the best discriminative power.
Using a clinical sample of 4- to 18-year-old children and adolescents from Singapore, Ooi and colleagues (Ooi et al. 2011) also examined the utility of the school-aged CBCL as a screener for ASD. In line with the previous findings, the withdrawn/depressed and thought problems syndrome scales differentiated between youth with ASD and youth in other clinical comparison groups (i.e., attention-deficit/hyperactivity disorder (ADHD)-inattentive type, ADHD-hyperactive-impulsive or combined subtype, and a referred but undiagnosed group). The social problems subscale had less discriminative power and only differentiated the ASD group from youth with ADHD-inattentive type and the referred but undiagnosed group. Interestingly, the researchers also constructed a special ASD scale that only contained individual CBCL items that discriminated between the ASD group and the comparison groups. The results showed that this ASD scale had better discriminative power than each of the three original syndrome scales.
A final study that seems relevant to discuss within this context was conducted by So et al. (2013), who also constructed a special ASD scale based on ASEBA items for which scores statistically differed between youth with ASD and clinically referred youth with other psychiatric conditions and typically developing children and adolescents. Strong points of this study were that (a) the selection of items and the subsequent testing of the screening potential of the scale were performed in different samples (ages 6 to 18 years); and (b) not only parents were used as informants (by completing the CBCL), but also teachers (who filled in the TRF). Teacher information might make a valuable contribution to the detection of children and adolescents at risk for ASD as teachers have the opportunity to observe youth's functioning within the school setting and especially during interaction with peers. Thus, So et al. (2013) investigated the discriminative power of a CBCL-based, a TRF-based, and a combined CBCL/TRF-based ASD scale. The results demonstrated that both the CBCL- and the TRF-based ASD scales were able to discriminate between youth with ASD and youth with internalizing and externalizing disorders, referred children and adolescents without a diagnosis, and the total clinical control group. In addition, results showed that combining the parent and teacher information (the CBCL/TRF-based ASD scale) improved the chance of correctly identifying children and adolescents with ASD.
Altogether, research exploring the school-aged ASEBA scales as a potential screener for ASD seems to indicate that: (1) although the original syndrome scales can be used for this purpose, there are indications that it is better to construct and employ a special scale consisting of a number of selected items (Ooi et al. 2011; So et al. 2013); and (2) screening based on multiple informants (i.e., parents and teachers) may have better predictive value than screening based on only one informant (So et al. 2013), which is also well in line with the general recommendation regarding the assessment of child and adolescent psychopathology (De Los Reyes and Kazdin 2005). Given the potential benefits and utility of an ASD screener that is incorporated in a widely used assessment instrument such as the ASEBA scales, and given the small number of studies conducted so far, it seemed worthwhile to further investigate the utility of the CBCL and TRF as a screen for ASD.
With this in mind, the present investigation was designed to examine how the school-aged ASEBA scales can best be used for this purpose. In a mixed sample of clinically referred children and adolescents, we compared various ways of using this instrument for the identification of youth with ASD. A number of screening variants were evaluated, which relied on: (1) the separate withdrawn/depressed, social problems, and thought problems syndrome scales; (2) the ASD profile of Biederman et al. (2010), which combines these three syndrome scales; (3) the WTP scale as proposed by Havdahl et al. (2016), which only combines the withdrawn/depressed and thought problems syndrome scales; (4) the special ASD scale as constructed by Ooi et al. (2011); and (5) the special ASD scale as developed by So et al. (2013; see Table 1 for an overview of the scales and the corresponding items). This research builds on previous studies but also extends this work in two important ways: (1) a consistent multi-informant approach was taken, implying that we tested each screening method not only relying on parent-report (CBCL) but also on teacher-report (TRF) data; (2) different definitions of ASD were considered, because the capacity to discriminate youth with ASD from other clinically referred youth may also critically depend on how ASD is defined. Currently, we employ the diagnostic criteria as defined in the DSM-5 (American Psychiatric Association 2013), whereas all the previous studies on the screening potential of the school-aged ASEBA scales used DSM-IV-based criteria to classify ASD. Further, although there is considerable consensus that standardized assessments such as the ADI-R and the ADOS-2 increase the diagnostic accuracy of ASD (Falkmer et al. 2013), their use in clinical practice as well as research is not widespread (Lord 2010) and indeed the earlier studies evaluating the ASEBA scales as a screen often did not make use of a standardized instrument to confirm the diagnosis of ASD.
Therefore, in order to make a good comparison with the previous empirical work, we first of all tested the potential of the ASEBA instrument to screen for youth with a DSM-IV-based clinical diagnosis of ASD without the confirmation of a standardized assessment instrument. Subsequently, we examined the screening potential of the ASEBA scales for DSM-IV and DSM-5 diagnoses of ASD that were also confirmed by the ADI-R (Rutter et al. 2003). It was hypothesized that, irrespective of how the ASD diagnosis was defined, (1) the special ASD scales constructed by Ooi et al. (2011) and So et al. (2013) would be better in screening for ASD than the existing syndrome scales; and (2) screening based on multiple informants (i.e., the combination of CBCL and TRF data) would have better predictive value than screening based on only one informant.

Participants
The total sample, which in the further course of this paper will be referred to as the clinical DSM-IV diagnosis group, consisted of 132 participants of whom 75 were diagnosed with ASD and 57 were not diagnosed with ASD but with other DSM-IV diagnoses (clinical control group; see left columns of Table 2). In the ASD group, 71% were diagnosed with PDD-NOS, 19% with Asperger's Disorder, and 11% with Autistic Disorder. The majority of the participants in the clinical control group had ADHD of the combined (47%) or inattentive (23%) subtype as primary diagnosis. In both groups, comparable comorbidity rates were found [χ²(1) = 1.50, p = .22]. In the ASD group, 84% of the children were diagnosed with at least one comorbid diagnosis and 43% even had 2 or more comorbid diagnoses. In the clinical control group, these percentages were 75% and 39%, respectively. The sample mainly contained participants with a Caucasian ethnic background (>90%). There were more boys than girls (79% versus 21%) and the mean age was 11.63 years (SD = 3.32), with a range of 6 to 18 years. IQ scores were only available for 67.4% of the participants and varied between 71 and 135 with an average IQ of 100.30 (SD = 14.58). There were no significant differences between the ASD and clinical control groups with regard to age [F(1, 132) = .14, p = .71], gender distribution [χ²(1) = 2.82, p = .09], and IQ [F(1, 89) = 2.95, p = .09].
To test the screening potential of the ASEBA scales under more stringent diagnostic conditions, we also made comparisons between ASD and clinical control participants for whom the diagnostic status (i.e., ASD or clinical control) had been confirmed by the ADI-R. Participants for whom there was no consensus between the clinical DSM diagnosis and the outcome of the ADI-R were discarded. In case the ADI-R was scored according to the DSM-IV criteria, 35 participants were removed, leaving 48 participants in the ADI-R supported DSM-IV diagnosis ASD group versus 49 participants in the clinical control group. When scoring the ADI-R by means of the DSM-5 criteria, 29 participants were discarded, leaving 57 participants in the ADI-R supported DSM-5 diagnosis ASD group versus 46 participants in the clinical control group. Irrespective of the DSM criteria that were used, the ADI-R supported ASD and clinical control groups did not differ in terms of age, gender distribution, and IQ (see Table 2).
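The selection of the ADI-R supported groups amounts to a simple agreement filter. A minimal Python sketch of that step follows, assuming hypothetical field names (`clinical_asd`, `adi_r_asd`) and invented toy cases rather than the study data:

```python
def adi_r_supported(cases):
    """Keep only cases where the clinical label and the ADI-R outcome agree."""
    return [c for c in cases if c["clinical_asd"] == c["adi_r_asd"]]


# Invented toy cases (not study data); True means "ASD", False means "no ASD".
toy_cases = [
    {"id": 1, "clinical_asd": True, "adi_r_asd": True},    # kept, ASD group
    {"id": 2, "clinical_asd": True, "adi_r_asd": False},   # discarded
    {"id": 3, "clinical_asd": False, "adi_r_asd": False},  # kept, control group
    {"id": 4, "clinical_asd": False, "adi_r_asd": True},   # discarded
]

kept = adi_r_supported(toy_cases)
asd_group = [c for c in kept if c["clinical_asd"]]
control_group = [c for c in kept if not c["clinical_asd"]]
```

The group sizes reported above are consistent with this kind of filter: 132 − 35 = 97 = 48 + 49 for the DSM-IV scoring, and 132 − 29 = 103 = 57 + 46 for the DSM-5 scoring.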

Procedure
The sample consisted of children and adolescents who were referred to a specialized outpatient mental health center in The Netherlands that provides diagnostic and treatment facilities for children and adolescents with a broad range of psychiatric and psychosocial problems. All participants were subjected to an extensive diagnostic procedure according to the longitudinal-expert-all data (LEAD) principle (Spitzer 1983). This means that diagnoses were made, and revised when necessary, by a multidisciplinary team that included licensed psychiatrists and psychologists, using information from multiple sources (i.e., child, parent, teachers) and multiple assessments (i.e., intake interviews, psychiatric examination, psychological assessment, and observations) during various stages of the clinical process. The CBCL and TRF were routinely completed as part of the intake assessment in order to screen for relevant problem areas, but these instruments were not decisive in the diagnostic process.
J Psychopathol Behav Assess (2020) 42:25-37
Table 1 An overview of the ASEBA, CBCL, and TRF scales/items used to screen for ASD. Scales covered: ASD profile (Biederman et al. 2010); WTP scale (Havdahl et al. 2016); Withdrawn/depressed; Thought problems; Social problems; ASD scale (Ooi et al. 2011); ASD scale (So et al. 2013). Items listed include "There is little that he/she enjoys" and "Cannot get his/her mind off certain thoughts; obsessions".
Most of the data (66%) were collected by examining existing case records. Cases referred to the outpatient center between 2011 and 2016 were included if the ADI-R had been administered during the diagnostic process, and CBCL and/or TRF data were available. Note that for these cases the ADI-R had already been administered during the diagnostic process because the presence of ASD was hypothesized. To further enlarge the ASD and control groups, additional parents of youth with ASD (who for some reason had not been tested with the ADI-R during the diagnostic process, for example because ASD had already been diagnosed in the past by another mental health institution) or ADHD were actively recruited for the study (between July 2014 and February 2016). Thus, in these cases the ADI-R was completed solely for the purpose of this study, which is why these parents received a small incentive (a voucher) for their participation. The study was approved by the local ethical review committee and participation only occurred after informed consent had been given.

Assessments
As noted in the introduction, the school-aged versions of the CBCL and TRF of the ASEBA instrument (Achenbach and Rescorla 2001) are widely employed standardized scales to assess competences and problems in children and adolescents aged 6 to 18 years. The problems part of these scales contains 113 items referring to behavioral, social, and emotional symptoms, for which parents and teachers are asked to rate the applicability to the child during the past period (parents: 6 months; teachers: 2 months) using a 3-point Likert scale (0 = not true, 1 = somewhat or sometimes true, and 2 = very true or often true). The psychometric properties of the CBCL are well-established in clinical and typically developing youths (Achenbach and Rescorla 2001; Bérubé and Achenbach 2010), and for children and adolescents with ASD (Pandolfi et al. 2012). In the present study, three separate syndrome scales were used as they are considered particularly relevant for ASD: withdrawn/depressed (CBCL α = .76; TRF α = .68), thought problems (CBCL α = .69; TRF α = .68), and social problems (CBCL α = .77; TRF α = .68). Two combinations of these existing subscales were also calculated: the ASD profile (Biederman et al. 2010) and the WTP scale (Havdahl et al. 2016).
The ADI-R (Rutter et al. 2003; Dutch version by De Jonge and De Bildt 2007) is a comprehensive, semi-structured diagnostic interview that is used to assess the typical features of ASD. The interview consists of 93 items referring to youth's developmental history and current behaviors in three domains: (1) language and communication, (2) reciprocal social interactions, and (3) restricted, repetitive, and stereotyped behaviors and interests. The ADI-R was administered to the parents of the children and adolescents by experienced psychologists who had been trained in conducting this interview. Parents' responses were scored by the interviewer with a score of 0 indicating the absence of a symptom, or a score of 1, 2, or 3 signaling the presence and specifying the severity of a symptom.
The psychometric properties of the ADI-R are well-established (Lord et al. 1994). To verify the presence or absence of a DSM-IV diagnosis of ASD, the diagnostic algorithm of the ADI-R consisting of 42 items was applied. In order to confirm a DSM-IV ASD diagnosis, participants had to meet the cut-off for the social domain and either the communication or the repetitive domain; or meet the criteria of the social or communication domain and have a required minimal number of symptoms related to the criteria of other domains (Risi et al. 2006). To confirm the DSM-5 diagnosis, the ADI-R was used following the scoring guideline as described by Huerta and colleagues (Huerta et al. 2012), which implies that participants have to score positively on at least one symptom of all three criteria referring to persistent deficits in social communication and social interaction, and at least one symptom for two out of four criteria regarding restricted repetitive patterns of behavior, interests, or activities.
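The DSM-5 decision rule described above can be written as a small check. The following Python sketch is only an illustration: the criterion labels (`A1` to `A3`, `B1` to `B4`) are hypothetical shorthand for the three social criteria and the four restricted/repetitive criteria, and the mapping from ADI-R items to criteria (Huerta et al. 2012) is not reproduced here.

```python
from typing import Mapping

# Hypothetical shorthand labels, not taken from the paper.
SOCIAL_CRITERIA = ("A1", "A2", "A3")
RRB_CRITERIA = ("B1", "B2", "B3", "B4")


def meets_dsm5_rule(symptom_counts: Mapping[str, int]) -> bool:
    """Apply the rule as stated in the text.

    symptom_counts maps a criterion label to the number of ADI-R symptoms
    scored positive under that criterion (missing labels count as 0).
    """
    # At least one positive symptom under every social criterion.
    social_ok = all(symptom_counts.get(c, 0) >= 1 for c in SOCIAL_CRITERIA)
    # At least one positive symptom under two or more of the four RRB criteria.
    rrb_met = sum(symptom_counts.get(c, 0) >= 1 for c in RRB_CRITERIA)
    return social_ok and rrb_met >= 2
```

For example, a case with positive symptoms under all three social criteria but under only one of the four restricted/repetitive criteria would not be confirmed under this rule.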

Data Analysis
In order to explore which of the five screening variants of the ASEBA is most suitable to predict the presence of ASD, a series of logistic regression analyses was conducted. In these analyses, raw CBCL and TRF scores were the predictors, while ASD diagnosis as obtained in three ways (i.e., clinical DSM-IV diagnosis, ADI-R supported DSM-IV diagnosis, and ADI-R supported DSM-5 diagnosis) was the dependent variable. For each outcome variable, separate analyses were conducted for the 5 screening variants (i.e., separate syndrome scales, the ASD profile, the WTP scale, and the two special ASD scales) and for the 3 informants (parent, teacher, and combined). Because we employed raw ASEBA scores, age and gender were included as covariates in the analyses.
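To make the structure of these models concrete, here is a minimal pure-Python sketch of a logistic regression with a scale score, age, and gender as predictors. It is not the authors' analysis code: the fitting routine (plain gradient ascent), the function names, and the toy data are all assumptions chosen for illustration, and a statistics package would normally be used instead. The odds ratio for a predictor is the exponential of its slope.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_slopes(X, y, lr=0.05, n_iter=5000):
    """Fit a logistic regression by gradient ascent on the mean log-likelihood.

    X is a list of rows (one value per predictor), y a list of 0/1 labels.
    Predictors are mean-centered for stability, which leaves the slopes
    unchanged. Returns the slopes only (one per predictor).
    """
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    Xc = [[row[j] - means[j] for j in range(k)] for row in X]
    intercept, slopes = 0.0, [0.0] * k
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * k
        for row, target in zip(Xc, y):
            p = sigmoid(intercept + sum(b * x for b, x in zip(slopes, row)))
            err = target - p
            g0 += err
            for j in range(k):
                g[j] += err * row[j]
        intercept += lr * g0 / n
        slopes = [b + lr * gj / n for b, gj in zip(slopes, g)]
    return slopes


# Invented toy data (not study data): raw scale score, age in years, male (1/0).
TOY_X = [
    [9, 8, 1], [7, 12, 1], [11, 15, 0], [5, 10, 1], [8, 7, 1], [4, 14, 0],  # ASD
    [3, 9, 1], [5, 13, 1], [2, 16, 0], [6, 11, 1], [1, 6, 1], [7, 12, 0],   # control
]
TOY_Y = [1] * 6 + [0] * 6

slopes = fit_logistic_slopes(TOY_X, TOY_Y)
odds_ratios = [math.exp(b) for b in slopes]  # OR per one-unit increase
```

In this sketch `odds_ratios[0]` is the odds ratio per additional raw scale point, adjusted for age and gender, which is the quantity reported as OR in the results tables.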
In order to examine and compare the ability of the ASEBA screening variants to correctly classify children with and without ASD, Area Under the Curve (AUC) scores were calculated. The AUC is the area under the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) for all possible cut-off points. The AUC ranges between 0.5 (chance level) and 1.0 (perfect classification). AUC scores between 0.50 and 0.69 can be considered as poor, scores between 0.70 and 0.79 as fair, scores between 0.80 and 0.89 as good, and scores of 0.90 and higher as excellent (Ferdinand 2008).
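The AUC can also be read as the probability that a randomly chosen youth with ASD scores higher on the screening scale than a randomly chosen control, with ties counted as one half. The Python sketch below computes it that way and applies the qualitative bands given above; the function names and the toy scores are invented for illustration and are not study data.

```python
def auc(pos_scores, neg_scores):
    """AUC as the share of (ASD, control) pairs in which the ASD case scores
    higher, counting ties as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auc_label(value):
    """Qualitative bands as described in the text (Ferdinand 2008)."""
    if value >= 0.90:
        return "excellent"
    if value >= 0.80:
        return "good"
    if value >= 0.70:
        return "fair"
    return "poor"


# Invented toy scale scores (not study data).
asd_scores = [9, 7, 11, 5, 8, 4]
control_scores = [3, 5, 2, 6, 1, 7]

toy_auc = auc(asd_scores, control_scores)  # 30 of 36 pairs, about 0.83
toy_label = auc_label(toy_auc)             # "good"
```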
For the best ASD screening variants, optimal cut-off points were established. These cut-off points were based on a statistically optimal balance between sensitivity and specificity. For each scale and corresponding cut-off point, the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated. In order to test the hypothesized added value of combining CBCL and TRF scores (as compared to the use of only one informant), the DeLong et al. (1988) test for pairwise comparison of ROC curves was applied.
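The four screening statistics follow directly from the 2 × 2 table at a given cut-off. The sketch below computes them in Python and picks a cut-off by maximizing sensitivity + specificity − 1 (Youden's index). That selection rule is an assumption made here for illustration, since the text only states that a balance between sensitivity and specificity was sought; the function names and toy data are likewise invented.

```python
def ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def screen_metrics(scores, labels, cutoff):
    """Sensitivity, specificity, PPV and NPV when 'score >= cutoff' screens positive."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < cutoff and l == 1)
    tn = sum(1 for s, l in zip(scores, labels) if s < cutoff and l == 0)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def best_cutoff(scores, labels):
    """Cut-off maximizing Youden's index (assumed criterion, see lead-in)."""
    def youden(c):
        m = screen_metrics(scores, labels, c)
        return m["sensitivity"] + m["specificity"] - 1.0
    return max(sorted(set(scores)), key=youden)


# Invented toy data (not study data): 7 ASD cases followed by 6 controls.
toy_scores = [9, 7, 11, 6, 8, 4, 6, 3, 5, 2, 6, 1, 4]
toy_labels = [1] * 7 + [0] * 6

cutoff = best_cutoff(toy_scores, toy_labels)
metrics = screen_metrics(toy_scores, toy_labels, cutoff)
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the proportion of ASD cases in the sample in which they are computed.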

Predicting the Clinical DSM-IV Diagnosis
CBCL
As can be seen in the upper panel of Table 3, the results showed that the social problems and the thought problems syndrome scales on their own did not discriminate between the ASD and clinical control group. The withdrawn/depressed scale, the ASD profile (Biederman et al. 2010) and the WTP scale (Havdahl et al. 2016) did show statistically significant effects, but the OR's ranging from 1.07 to 1.22 were fairly small and the associated AUC scores (0.67-0.68) were relatively low. The results for the ASD scale of Ooi and colleagues (Ooi et al. 2011) and the ASD scale of So and colleagues (So et al. 2013) were more convincing, with OR's being 1.38 and 1.68, respectively, and fair AUC scores of 0.74 and 0.78.

TRF
The analyses performed on the TRF revealed that neither the separate syndrome scales nor their combinations yielded significant results (see middle panel of Table 3). Again, So et al.'s (2013) and Ooi et al.'s (2011) ASD scales did significantly discriminate between youth with and without ASD, although the OR's of 1.20 and 1.26 were quite modest and the AUC scores of 0.67 and 0.66 should be interpreted as poor.

CBCL & TRF
When combining the CBCL and TRF, only the WTP scale (Havdahl et al. 2016) and the ASD scales of Ooi et al. (2011) and So et al. (2013) were found to significantly discriminate between youth with and without ASD (see lower panel of Table 3). When looking at the OR's, the two specially developed ASD scales seemed to perform better than the WTP profile, although the AUC scores for all three models were in the fair range.

Predicting the ADI-R Supported DSM-IV Diagnosis
CBCL
The analyses for the ADI-R supported DSM-IV diagnosis revealed statistically significant effects for all scales except for thought problems (see upper panel of Table 4). The OR's varied from 1.10 to 1.62 and AUC scores ranged between 0.67 and 0.83. In line with the previously described results, the ASD scales of Ooi et al. (2011) and So et al. (2013) produced the highest OR's and AUC scores well above 0.80.

TRF
The analyses using TRF scores as predictor of ADI-R supported DSM-IV diagnosis involved a smaller number of participants and were therefore subject to power problems. This might explain why both the logistic regression analyses and the ROC analyses yielded no significant effects (see middle panel of Table 4). Thus, none of the proposed TRF scales was able to predict an ADI-R supported DSM-IV diagnosis.

CBCL & TRF
Despite the small sample size, the combined CBCL and TRF ASD scales of Ooi et al. (2011) and So et al. (2013) significantly discriminated between children with ASD and the clinical comparison group. The OR's were respectively 1.29 and 1.31 and the accompanying AUC scores 0.77 and 0.78 (see lower panel of Table 4). Other scales did not significantly distinguish between youth with and without ASD.

Predicting the ADI-R Supported DSM-5 Diagnosis
CBCL
The logistic regression analyses testing the various CBCL scales as predictors of an ADI-R supported DSM-5 diagnosis of ASD (see upper panel of Table 5) revealed that, with the exception of the thought problems syndrome scale, all scales differentiated between youth with and without ASD. However, note that the OR's were quite divergent with values ranging from 1.11 to 1.75, and an inspection of the AUC scores indicated that the ASD scale of So et al. (2013) had the best predictive value with an AUC score of 0.86, closely followed by the ASD scale of Ooi et al. (2011), which had an AUC score of 0.83.

TRF
Only the ASD scale of Ooi et al. (2011) was able to significantly predict the ADI-R supported DSM-5 diagnosis, with an OR of 1.54 and a fair AUC score of 0.75 (see middle panel of Table 5).

CBCL & TRF
The results of the logistic regression analyses using the combined CBCL and TRF scales as predictors again showed that the ASD scales of Ooi et al. (2011) and So et al. (2013) significantly discriminated between youth with and without ASD (see lower panel of Table 5). OR's were 1.44 and 1.57, respectively, and the corresponding AUC scores of 0.81 and 0.86 were good. Individual syndrome scales as well as their combinations did not show significant effects.

Additional Examination of the Best Screening Scales
Overall, the results demonstrated that the ASD scales of Ooi et al. (2011) and So et al. (2013) were best in discriminating between ASD and clinical control youth, and thus had most predictive power for the diagnosis of this disorder (as established in various ways). In order to further examine the screening potential of these two scales, we determined the optimal cut-off points and explored their sensitivity, specificity, positive predictive value, and negative predictive value. The optimal cut-off points and the relevant statistics are presented in Table 6. As can be seen, results were highly comparable for both ASD scales. That is, their predictive validity was best when the CBCL data were used. For the ASD scale of Ooi et al. (2011), the CBCL was capable of correctly identifying 73 to 83% of the children with ASD (sensitivity) and 69 to 72% of the children without the disorder (specificity). In addition, 71 to 75% of the children and adolescents meeting the cut-off score on the CBCL were indeed diagnosed with ASD (PPV) and 65 to 79% of the children and adolescents who did not meet the cut-off indeed did not have the disorder (NPV). For the ASD scale of So et al. (2013) similar figures were found. Thus, when employing the CBCL, 72 to 76% of the children and adolescents with ASD were correctly identified (sensitivity) and 65 to 71% of those without ASD were identified as not having the disorder (specificity). Further, 71 to 76% of the youth meeting the cut-off score were indeed diagnosed with ASD (PPV), while 64 to 75% of the youth who did not meet the cut-off score indeed were not diagnosed with the disorder (NPV). At first sight, the Ooi et al. (2011) and So et al. (2013) scales based on the TRF or the combined CBCL and TRF data appeared less useful than the ASD scales based on the CBCL. As can be seen in Table 6, in particular the specificity and NPV decreased when the TRF or combined CBCL/TRF data were employed.
However, it should be noted that DeLong et al. (1988)

Discussion
A screening option for ASD within the widely applied ASEBA scales CBCL and TRF could be of great value for daily clinical practice as this might facilitate the diagnostic process without additional costs, time, and effort. In the present study, five different screening variants were compared using CBCL, TRF, and combined CBCL and TRF data within a clinically referred sample. When using the CBCL as a screen for ASD, the results demonstrated that the special ASD scales of Ooi et al. (2011) and So et al. (2013) had the best potential to discriminate children with ASD (as diagnosed in various ways) from children without ASD, with odds ratios and AUC values being clearly superior to those found for screens based on individual CBCL syndrome scales or combinations of these scales. A similar conclusion held for the TRF: again the special ASD scales of Ooi et al. (2011) and So et al. (2013) had the best screening potential, although it should be noted that odds ratios and AUC values were somewhat lower than those obtained for the CBCL. Note that this pattern of findings was robust and not dependent on the way ASD had been defined (i.e., clinical DSM-IV diagnosis or ADI-R supported DSM-IV or DSM-5 diagnosis). Altogether, these findings confirm our first hypothesis that the special ASD scales as constructed by Ooi et al. (2011) and So et al. (2013) are better in screening for ASD than the existing syndrome scales of the ASEBA instrument.
In passing, it should be mentioned that our data suggest that parents (CBCL) are better informants than teachers (TRF) when diagnosing youth with ASD. In a way, this makes sense as parents can observe their child in a wide range of situations and, often via communication with the teacher, are also well aware of the child's behavior and functioning at school. In contrast, the teacher's information is mainly based on what can be observed at school; he/she has little information about the child's behaviors in other situations. Meanwhile, it may well be that the methodology of the present study was more 'in favor' of parents than of teachers, because the former were obviously more involved in the diagnostic process (e.g., administration of the ADI-R) than the latter. When using the combined CBCL and TRF data, the special ASD scales again produced the best outcomes. However, no evidence was obtained for our second hypothesis that the employment of combined CBCL and TRF data would enhance the screening potential as compared to using ASEBA scale data of only one informant. This is in contrast with So et al. (2013), who found that the predictive power for detecting ASD significantly increased when taking the TRF into account. Note, however, that even in the So et al. (2013) study the additional value of including the TRF in the screening procedure was quite limited. Depending on the comparison group used, the combined CBCL- and TRF-based ASD scale yielded a 1-8% increase in the accuracy of predicting ASD as compared to the ASD scale based on the CBCL or TRF alone.
It should be mentioned that the ASD scales of Ooi et al. (2011) and So et al. (2013) display considerable overlap in terms of item content. The six common items of the CBCL/TRF that are included in both versions appear to cover delayed development/communication (i.e., "Acts too young for his/her age" and "Speech problem"), social problems (i.e., "Would rather be alone than with others" and "Withdrawn, doesn't get involved with others"), and awkward/stereotyped behaviors (i.e., "Repeats certain acts over and over; compulsions" and "Strange behavior"), which nicely capture the key symptoms of ASD (American Psychiatric Association 2000). Not surprisingly, the screening potential of both scales appeared to be highly comparable.
The predictive value of the ASD scales of Ooi et al. (2011) and So et al. (2013) is comparable to that of the standard syndrome and DSM-oriented scales of the Achenbach scales for detecting other types of child and adolescent psychopathology such as anxiety disorders, depression, ADHD, and disruptive behavior disorders (Ebesutani et al. 2010; Ferdinand 2008). However, in absolute terms, the screening potential of these ASD scales appears limited. That is, the sensitivity, specificity, PPV, and NPV percentages were clearly inferior to those obtained for specific screening instruments such as the SCQ, SRS, and CCC that can be employed for detecting ASD in youth with IQ > 70 (Auyeung et al. 2008; Charman et al. 2007). Nevertheless, given their quite good sensitivity and PPV percentages, the ASEBA-based ASD scales as proposed by Ooi et al. (2011) and So et al. (2013), especially when parents are used as informants, seem to be useful as a first step in the diagnostic process. Subsequently, the more specific instruments could be administered, before launching more elaborate diagnostic tests such as the ADI-R and the ADOS-2.
Table 6 Sensitivity, specificity, PPV, and NPV (rounded off) percentages for the optimal cut-off points found for the ASEBA screening scales as proposed by Ooi et al. (2011) and So et al. (2013) for identifying ASD (as defined by clinical DSM-IV diagnosis, ADI-R supported DSM-IV diagnosis, and ADI-R supported DSM-5 diagnosis)
The present study suffers from a number of limitations. First, the sample size was relatively small and this was especially true for the teacher report data, which may have resulted in limited power for testing the predictive value for ASD by means of the Achenbach scales. Second, the present study made use of a convenience sample of children and adolescents who were referred to a regular outpatient treatment facility. This implied that most cases were "diagnostic puzzles" for which the ASD picture was not particularly clear. In other words, most youth who were ultimately diagnosed with ASD were not showing symptoms on the extreme end of the spectrum, whereas at least part of the youth in the clinical control group showed some ASD-like characteristics. In addition, most youth in the control group had a primary diagnosis of ADHD, a developmental disorder showing considerable comorbidity and overlap in symptoms and problems with ASD (Ronald et al. 2014; Taurines et al. 2012). In other words, the composition of this sample probably made it more difficult to distinguish between youth with and without ASD, and as such the screening potential of the ASEBA scales might be even better than the current findings suggest. Third and finally, the reliability of the special ASD scales of Ooi et al. (2011) and So et al. (2013) appeared to be quite modest.
That is, previous studies documented Cronbach's alphas in the .56 to .78 range (Ooi et al. 2011; So et al. 2013), whereas in the present investigation internal consistency coefficients between .57 and .61 were found. This may well have to do with the fact that ASD symptoms are quite heterogeneous (Masi et al. 2017): they include social-communicative difficulties as well as restrictive and repetitive patterns of behavior, and individual children and adolescents with this condition can be quite different in terms of symptomatology. Moreover, as noted above, the present sample included few severe cases of ASD, which may also have undermined the emergence of more substantial correlations among items, a prerequisite for finding a high internal consistency coefficient for the special ASD scales. In spite of these shortcomings, the present study replicated the finding that the ASEBA-based scales as proposed by Ooi et al. (2011) and So et al. (2013), especially when completed by the parent, are valuable screens for ASD in a clinically referred sample. These special ASD scales turned out to be better predictors of ASD than the existing syndrome scales and their combinations, and this was independent of the way the diagnosis was established (i.e., clinical versus ADI-R supported diagnosis, DSM-IV versus DSM-5). Clearly, more research with large, representative, and well-described clinical and non-clinical samples is needed to definitively evaluate the utility of the CBCL and TRF as screening instruments for ASD. More precisely, based on the results of the present study, it is clear that future research should concentrate on the two special ASD scales of Ooi et al. (2011) and So et al. (2013). In addition, a major point of interest for future research is determining adequate cut-off scores. There is a trade-off between the sensitivity and the specificity of a screen.
Increasing the sensitivity, and thereby decreasing the number of false negatives, always comes at the expense of the specificity, that is, at the cost of more false positives. The optimal cut-off score is highly dependent on the purpose and context of the instrument. Within a referred sample, high sensitivity would be preferred when the ASEBA is used as an initial screen. However, such a strategy seems less useful when screening the general population as it may result in the detection (and referral) of many false positives.
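The point about false positives in the general population follows from Bayes' theorem: with sensitivity and specificity held fixed, PPV falls sharply as the base rate of the disorder drops. The following Python sketch illustrates this. The sensitivity of .75 and specificity of .70 are rounded illustrative values in the range reported for the CBCL-based scales, whereas the two prevalence figures (50% in a referred sample, 1% in the general population) and the function name `predictive_values` are assumptions chosen for the example, not estimates from the present study.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a screen applied at a given base rate."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY = 0.75  # illustrative, rounded
SPECIFICITY = 0.70  # illustrative, rounded

# Hypothetical base rates: referred sample versus general population.
ppv_referred, npv_referred = predictive_values(SENSITIVITY, SPECIFICITY, 0.50)
ppv_general, npv_general = predictive_values(SENSITIVITY, SPECIFICITY, 0.01)

print(f"referred sample:    PPV={ppv_referred:.3f}, NPV={npv_referred:.3f}")
print(f"general population: PPV={ppv_general:.3f}, NPV={npv_general:.3f}")
```

Under these assumed values, most positive screens in the referred sample concern youth with ASD, whereas in the low-prevalence setting the large majority of positive screens are false positives, even though sensitivity and specificity are unchanged.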
Although additional research is needed, the results have clear clinical implications as they underscore the potential of an ASD subscale included in the Achenbach scales. This ASD subscale could be a good, easy-to-administer initial screen for detecting this type of psychopathology. Because the ASEBA instrument is already widely used, no extra costs are incurred. The results also indicate that this is only an initial screen and that further diagnostic evaluation (i.e., ADI-R, ADOS-2) is needed to definitively establish the diagnosis. Meeting the cut-off score should alert clinicians to the possibility of ASD and encourage them to include ASD in their differential diagnostic considerations and in their decision making with regard to further diagnostic procedures and/or referral strategy.