In the past few decades, many studies have documented behavioural manifestations of ASD during the first 2 years of life [1–5]. Nevertheless, there is still a large delay between the first parental concerns, the first consultation, and the age at which the diagnosis is made [6–8]. Early identification and subsequent intervention lead to a better prognosis for the child. Intervention may prevent secondary developmental disturbances [9–11] and reduce family stress [6, 12] and societal costs [13–15]. Thus, there is a need to develop methods and instruments for early identification of ASD.

The first attempt to develop a prospective screening instrument for ASD was made in Europe by Baron-Cohen and his colleagues in the UK with the Checklist for Autism in Toddlers (CHAT; [16]). In the more than 20 years that have elapsed since the CHAT was presented, much progress has been made, with more than 20 ASD screening instruments currently available internationally (Table 1). It remains to be seen, however, whether current screening instruments fulfil the criteria for large-scale implementation [41]. Although a number of studies have shown that early ASD screening is feasible, there are still several issues to be addressed. Experts have noted that few screening instruments are well evaluated and that it is important for both clinical and research purposes to collect more structured, in-depth information on existing screening procedures [42].

Table 1 ASD screening tools

Novel screening instruments have been developed in Europe over the past decade, including the Early Screening of Autistic Traits in The Netherlands (ESAT; [22, 43]) and the Checklist for Early Signs of Developmental Disorders in Belgium (CESDD; [24]). Screening instruments have also been translated, culturally adapted and tested in countries other than those where they were originally developed, e.g. the Modified-Checklist for Autism in Toddlers (M-CHAT; [19]) in Spain [44] and in Sweden [27]. Other European countries, such as France, Italy and Finland, are currently engaged in evaluating other screening procedures for which results are still to be published.

To date there has been little exchange of information among researchers across Europe regarding the details of the screening procedures used and the difficulties encountered during screening. There are very few studies that report on rigorous direct comparisons of different screening procedures in similar circumstances [45, 46]. Rather than developing new screening instruments, a careful look at previous and ongoing ASD screening programmes in Europe might instead provide key insight for improving current and future screening procedures. Examination of the same screening procedures in different samples and contexts may be a good way of identifying strengths and weaknesses. In addition, evaluating the effectiveness of different adaptations of existing screening procedures may contribute to identifying the factors that influence screening outcomes.

The COST Action ‘Enhancing the Scientific Study of Early Autism’ (ESSEA) has brought together a group of European researchers who use screening instruments to identify ASD prospectively at an early age [47]. One of the aims of this collaboration is to identify which screening instruments perform best in a given context. Current health care, social and educational systems across Europe vary greatly in terms of expertise and capacity to identify children with ASD at a young age, often leading to marginalisation and disparities between social classes on the mean age of diagnosis [48, 49]. The positive effects of early screening to reduce racial/ethnic and socio-economic status inequalities in age of first diagnosis are promising [50] although these effects have to be further explored [51]. Indeed, there are no European ASD screening guidelines. Even within individual countries, societal, demographic and service factors might affect how screening works, and yet these factors do not tend to be well described in studies. The purpose of this paper is thus to describe the procedures used in ASD screening studies conducted across Europe, and to summarise the respective factors and methodological issues which might have influenced the results of the different studies.

Current situation of ASD screening studies in Europe

To obtain a complete picture of the status of ASD screening in Europe, we used a two-pronged search process (see Fig. 1). First, a search of the scientific literature was made covering the PubMed and PsycINFO databases and using the following search terms: ‘autism’ OR ‘autism spectrum disorder’ AND ‘screening’ OR ‘identification’ OR ‘detection’, with “1992–2012 Pub-date” and “English language” as advanced filters. This search retrieved over 700 citations. Screening the titles, authors and abstracts of these citations to discard any study that had not been undertaken in Europe yielded a net total of 16 papers. When reviewing these papers, the following additional selection criteria were applied for their final inclusion: (a) design: population based; (b) participants: children under the age of 4 years at first screening and with no prior diagnosis of developmental delay (no school-age tool); and (c) gold-standard diagnostic procedure: DSM-IV-TR criteria for pervasive developmental disorders (PDDs), also known as autism spectrum disorders (ASDs) [52] and the autism diagnostic observation schedule (ADOS; [24]). The reference lists of all relevant studies were checked to identify any additional publications. Using these selection criteria, papers reporting screening at school age, as in Finland [53, 54] and the UK [55–60], were excluded. Similarly excluded were the study conducted in Ireland [61], because it did not use the DSM-IV as standard diagnostic procedure, and the study undertaken by Allison et al. [20], because it was not population based. Eight studies reporting 15 screening procedures for young pre-schoolers with ASD in Europe were retained for review.

Fig. 1

Search strategy for ASD screening studies in Europe. The letter a indicates that a new literature review and a further consultation of ESSEA COST members were carried out just before March 2014; no new ASD screening studies in Europe, beyond those already included, had been published or communicated to the main authors

Secondly, researchers within the ESSEA COST action network were approached to ascertain whether there might be any other ongoing, as yet unpublished screening programmes. As a result, a further three screening procedures were identified in France, Italy and Finland, and preliminary data were incorporated into this review, leading to a total of 18 different screening procedures. Where published studies failed to provide data on sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV), these measures were estimated from the data, if available (to be taken with caution since different protocol adaptations are used). In addition, all main authors were asked to provide clarification regarding the procedures and results of their studies, as well as verification of the information to be included in this paper. An overview can be found in Table 2.

Table 2 Overview of European screening studies

This table shows information on the number of completed and ongoing ASD screening studies across Europe. Over 70,000 children have been screened in Europe to date. Nine of the 28 European Union Member States (32 %) have conducted or are conducting ongoing ASD screening studies (although some were one-off research studies, as in the UK). Italy and Spain are the only Southern European countries which have reported any ASD screening experience (ongoing health surveillance programmes in both cases). Belgium is the only country where the screening study was set in child day-care centres rather than in primary care. Five countries have used or are using the M-CHAT as their screening instrument of choice (sometimes together with another ASD screening tool). A contemporary map of Europe with the information compiled through the ESSEA COST network in 2012 is depicted in Fig. 2.

Fig. 2

Map of the status of ASD screening studies in Europe in 2012–2013

Through the ESSEA COST network, we also gathered first-hand information about ASD screening in Norway. The Autism Birth Cohort (ABC), a sub-study of the Norwegian Mother and Child Cohort Study (MoBa), has included several ASD checklists on the 18-month questionnaire, i.e. M-CHAT, ESAT and the Non-Verbal Communication Checklist (NVCC) (Schjolberg, submitted). At age 36 months, the 40-item Social Communication Questionnaire (SCQ) has been used to screen for ASD in the complete MoBa cohort (N ~ 100,000). Screen-positive children underwent a full-day diagnostic evaluation using ADI-R and ADOS. The entire MoBa cohort is followed up at 8 years with the complete SCQ, enabling researchers to examine ASD symptom patterns from an early age to 8 years. Linkage to the Norwegian National Patient Registry (NPR) makes it possible to identify false negatives from the early screening. This study represents the largest sample of children screened for ASD in Europe (approximately 100,000), though it is not an ASD screening programme per se and indices on the screening tools are not yet published. The study is described in Stoltenberg et al. [65], and the relationship between screen positivity at 36 months and subsequent ASD diagnosis at assessment is being prepared for publication (Bresnahan et al., in prep.). Beuker et al. [66] have examined whether ASD symptoms in 18-month-old children fit the 3-factor structure, as described in DSM-IV. Characteristics of M-CHAT at 18 months compared to later diagnostic status based on clinical assessment or NPR (ASD vs non-ASD) are in preparation for publication (Stenberg et al., in review).

A second reading of the full text of the selected papers was completed by the main authors of this paper (PGP and AH). Study methodologies were thoroughly reviewed to identify differences among screening procedures, as well as the main factors that might influence screening programme results. As a result, a list was drawn up containing ten critical factors to be considered when assessing screening studies. To contextualise these factors, additional information from both European and non-European studies was included, where appropriate.

Factors to be considered when evaluating screening studies

The ten factors to be borne in mind when assessing screening studies are: (1) broad-based analysis of validity indices; (2) prevalence rates and PPV interpretation; (3) age of screening; (4) level of functioning and autism severity; (5) selection and formulation of items; (6) cut-off criteria; (7) protocol adherence; (8) informants; (9) parental non-compliance rate; and (10) setting characteristics: organisation of services, as shown in Table 3. Each of these methodological issues will now be addressed in turn.

Table 3 Factors to be considered when evaluating screening studies

Broad-based analysis of validity indices

Studies report several parameters that assess the efficacy of screening instruments. Sensitivity and specificity are often considered the most important criteria of validity. A major challenge, however, is the interpretation of these values. Although interpretation is facilitated by the establishment of quantitative criteria, with values of 0.70 or higher being acceptable for developmental disorders [67], a more comprehensive approach to interpreting these parameters is called for. A trade-off between sensitivity and specificity is common. A screening procedure with a high sensitivity will often have a high false-positive rate, thereby lowering its specificity. Screening methods with a high specificity will usually sacrifice sensitivity by increasing the false-negative rate. This is also demonstrated by some of the screening procedures in Table 2. For instance, the CHAT-1 + CHAT-2 (second administration of CHAT after a high-risk result in CHAT-1) has an excellent specificity of 1.00 combined with a very poor sensitivity of 0.21 [62]. It has been suggested that sensitivity is the measure of greatest concern [68, 69]. The drawback of many false negatives (low sensitivity) is that many children who will go on to develop ASD are missed. This precludes early diagnosis and early initiation of treatment and family support for such children and their families. On the other hand, a low specificity also has negative implications. False positive cases are evaluated through costly assessment procedures, not to mention the possible stigmatisation of the child and the additional family stress caused by falsely alarming parents [70]. These consequences resulting from an erroneous positive identification could be considered as negative side effects of a screening programme with insufficient specificity. However, when interpreting the false-positive rate, it is crucial to consider the proportion of false-positive cases that have another developmental delay or disorder. Dietz et al. 
[43] reported that 25 % of all ESAT false-positive cases had a language disorder, and 18 % of the false-positive cases were diagnosed with intellectual disability. These findings raise questions as to whether screening procedures should target ASD specifically or developmental disorders and delays in general [70]. Instead of immediately rejecting a screening procedure with a high false-positive rate, a more in-depth look may indicate that the screening procedure is helpful in detecting children who benefit from further diagnostic assessment and treatment at an early age. The number of false-positive cases with another developmental disorder justifies the need to examine developmental trajectories, to gain insight into which early signs are specific to ASD [71]. Performing screening through a two-stage process before any diagnostic referral (which is characteristic of most procedures in Table 2) may help to reduce the false-positive rate and thereby limit the above-mentioned possible side effects of screening.
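
The four validity indices discussed in this section all derive from the same two-by-two table of screening result against diagnostic outcome. The following minimal Python sketch shows the arithmetic; the class and function names are illustrative, and the counts are invented for demonstration only and are not taken from any study in Table 2.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ScreeningCounts:
    """Two-by-two table of screening result against final diagnosis."""
    tp: int  # screen positive, ASD diagnosed
    fp: int  # screen positive, no ASD diagnosis
    fn: int  # screen negative, ASD diagnosed later
    tn: int  # screen negative, no ASD diagnosis


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # Return None when an index is undefined (empty row or column).
    return numerator / denominator if denominator else None


def validity_indices(c: ScreeningCounts) -> Dict[str, Optional[float]]:
    """Compute sensitivity, specificity, PPV and NPV from the counts."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


# Hypothetical counts, chosen only to illustrate the calculation.
example = ScreeningCounts(tp=30, fp=70, fn=20, tn=9880)
indices = validity_indices(example)

for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

In this invented example specificity is close to 1 while the PPV is only 0.30, because the 70 false positives outnumber the 30 true positives when cases are rare. This is one reason why no single index should be read in isolation.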

Prevalence rates and PPV interpretation

A high degree of variation in ASD prevalence has been reported. Age, diagnostic criteria and region have been found to be associated with ASD prevalence rates [72]. Although the PPV is often considered to be the most useful information for the clinician [73], its value depends on the prevalence rate in the population screened. This might explain why the PPV was lower in the Spanish M-CHAT study than in other M-CHAT studies [44]. The frequency of ASD cases observed in the Spanish study (0.92 % in Stage 1; and 0.29 % in Stage 2 based only on a general population sample) was much lower than that reported by other M-CHAT studies (e.g. 2.7 % in Kamio et al. [74], 2.66 % in Robins et al. [19], 3.03 % in Kleinman et al. [75] and 2 % in Pandey et al. [76]), in which most ASD cases came from their referred early intervention sample rather than from their general paediatric practices. These considerations highlight the importance of knowing the prevalence of ASD in the population targeted for screening, instead of relying on the PPV reported by another study with another prevalence rate in, say, a different age range [77]. One method for calculating the validity of a screening instrument which takes into account ASD prevalence is Bayes' theorem. According to this theorem, the chance of a disease being truly present depends on both the prevalence of the disease and the properties of the test, essentially the likelihood ratio [77, 78]. Rather than using the prevalence in a specific sample, e.g. by examining the clinicians’ records within that specific context, as recommended by Camp [73], some authors have instead used prevalence rates drawn from a different sample to estimate validity properties [73]. For instance, Groen et al. [78] used the prevalence rates reported by Baird et al. [62] to evaluate the validity of the ESAT. 
When clinicians use these numbers to support their choice of the ESAT, it should be borne in mind that there might be a difference in prevalence rates between populations. This underscores the importance of clarity as regards the prevalence rate used in validity studies and the usefulness of pre-test odds and likelihood ratios. Since prevalence of autism in the general, unselected population is very low [79], Groen et al. [78] suggest that one possibility of increasing the post-test odds is to increase the pre-test odds by applying screening instruments solely to selected children who are either found to have a deviant developmental path in routine developmental surveillance, or found to have high-risk status by other means.
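
The dependence of the post-test probability on prevalence can be made concrete with the odds form of Bayes' theorem: post-test odds equal pre-test odds multiplied by the positive likelihood ratio. The sketch below is illustrative only. The function names are hypothetical, the sensitivity and specificity values are assumed rather than taken from any instrument, and the two prevalence values are round figures of roughly the order of the lowest and highest sample frequencies quoted above.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    if not 0.0 <= specificity < 1.0:
        # specificity == 1 would give an infinite likelihood ratio
        raise ValueError("specificity must lie in [0, 1)")
    return sensitivity / (1.0 - specificity)


def post_test_probability(prevalence: float,
                          sensitivity: float,
                          specificity: float) -> float:
    """Probability of ASD given a positive screen (equivalent to the PPV)."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    pre_test_odds = prevalence / (1.0 - prevalence)
    post_test_odds = pre_test_odds * positive_likelihood_ratio(
        sensitivity, specificity
    )
    return post_test_odds / (1.0 + post_test_odds)


# Assumed test properties, for illustration only.
SENSITIVITY = 0.85
SPECIFICITY = 0.95

# Same hypothetical instrument, two different screened populations.
ppv_low_prevalence = post_test_probability(0.003, SENSITIVITY, SPECIFICITY)
ppv_high_prevalence = post_test_probability(0.03, SENSITIVITY, SPECIFICITY)

print(f"PPV at 0.3 % prevalence: {ppv_low_prevalence:.3f}")
print(f"PPV at 3 % prevalence:   {ppv_high_prevalence:.3f}")
```

With identical test properties, the probability that a screen-positive child has ASD is several times higher in the higher-prevalence population, which is the mechanism behind the suggestion to raise the pre-test odds by screening selected children.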

Age of screening

Several studies show that parents of children who later develop ASD have concerns within the first 2 years of life: De Giacomo and Fombonne [80] report an average age of 19 months, while Chawarska et al. [81] report an age of 14 months. In the presence of intellectual disability, an older sibling, concerns about medical problems, or a delay in developmental milestones, the age of parental concern was lower [80]. Yet, detecting ASD at a very early age is not without considerable difficulties, since it may be difficult to differentiate ASD from other developmental disorders [82], or even to differentiate ASD from typical development [83]. For example, repetitive behaviours are also present in young children with typical development [84]. Moreover, many behaviours that capture joint attention skills, such as gaze monitoring and protodeclarative pointing, develop gradually from age 9 to 18 months in typically developing children [85], and are only a clear clinical sign when they have not appeared after the age of 18 months. Difficulties in differentiating ASD from other developmental disorders at a very early age are consistent both with findings from the ESAT screening at 14 months, which resulted in a high number of false positives, though none of these children had typical development [43], and with the CESDD study [24]. Dereu et al. [86] report many false positives, specifically in the younger age group. Moreover, the false-negative rate might also be higher at a young age due either to late onset of ASD or to the fact that about 30 % of children show regression after a period of typical development [81]. It is also plausible that milder variants of ASD, and children with a higher level of cognitive development, could be missed at a young age [43]. Thus, when interpreting validity indices, it is important to consider the age of the sample during screening and diagnostics.

Level of functioning and autism severity

Since ASDs are associated with a broad range of intellectual and language skills that change over time, level of functioning and autism severity are important factors to consider when evaluating screening methods. Children who were not identified by the CHAT but were later diagnosed with ASD were found to be higher functioning in a variety of areas and were rated as less severe on autism assessment measures [87]. A study by Kleinman et al. [75] showed that this was similar for M-CHAT, with false-negative cases being higher functioning than positive M-CHAT ASD cases. The SCQ showed better discriminative validity in toddlers with intellectual disability than in those without intellectual disability, and also showed that IQ significantly predicted SCQ scores [88]. This may reflect the fact that higher functioning toddlers with ASD are more difficult to distinguish from their high-risk, non-spectrum peers than are low functioning toddlers. Since screening instruments are intended for broad use, an effect of IQ is a problem. In a different study, Oosterling et al. [88] reported that, after a screening procedure with the ESAT, about 75–85 % of the children referred before 36 months with narrowly defined autism had intellectual disability. Difficulties in screening for ASD in young children, and difficulties with diagnostic discrimination in high-risk children in particular, are issues that are not necessarily specific to the screening tool, especially with regard to specificity, but rather to the IQ or risk status of the children [88]. Hence, clarity regarding the characteristics of the sample used is very important when interpreting the psychometric properties of the instruments under investigation.

Selection and formulation of items

ASD screening procedures vary in the items included to identify children at risk. Social-communicative impairments are considered to be central to ASD [52] and are therefore always part of screening procedures. The item ‘lack of following joint attention’ was indeed one of the items that was most effective in distinguishing ASD from non-ASD cases when using the CESDD [88], and the CHAT mainly consists of items on initiating and following joint attention [16]. Social-communicative items in the ESAT, including ‘shows interest in people’, ‘smiles directly’ and ‘reacts when spoken to’ also discriminated best between children with and without ASD [43]. Even so, many studies have shown that screening procedures which focus exclusively on social-communicative impairments might overlook other early signs of ASD. In a familial, high-risk sibling sample, Zwaigenbaum et al. [4] showed that early behavioural markers for ASD include atypical markers in visual tracking, disengagement of visual attention and sensory-oriented behaviours. Gillberg [89] reported that ‘does not play like other children’ was among the three most discriminating items and further suggested that abnormal perceptual responses are important for identification of ASD. Other studies have supported the existence of abnormalities in play and sensory-motor behaviours at an early age [3, 90]. The results of these studies have broadened the focus of screening instruments for ASD, and this has been effective. Among the items with the highest odds ratio in the CESDD study were ‘lack of symbolic play’ and ‘unusual sensory behaviour’ [24]. In addition to the CESDD, many other screening instruments (ESAT and M-CHAT) have included items focused on play and sensory-motor behaviours. Baird et al. [62] suggest that specifically the combination of failing joint attention and pretend play at 18 months indicates risk of developing ASD. 
The fact that sensory and motor items have not been included in all screening tools might be due to the fact that parents do not mention these items spontaneously. However, when parents have been questioned about these items specifically, they report having noticed such abnormalities from an early age [22]. At a young age it might be useful to take play-related behaviours into account, while at an older age, impairments in social interaction and communication might become more specific behavioural markers for ASD. In the ESAT study some items, such as ‘gaze following’, had a relatively high proportion of negative answers for children younger than 12 months because this trait is still developing in the first year of life [43]. For instance, the First Year Inventory (FYI) [23] developed to assess behaviours in 12-month-old infants and the ESAT [43] developed for 14-month-old infants include more play-related and sensory-motor behaviours than does the SCQ [91], which was originally developed for individuals aged 4 years and over. On the other hand, the SCQ includes items such as ‘pronoun reversal’, ‘verbal rituals’ and ‘no friends’, which are more appropriate for somewhat older children. Differences in the formulation of items might also affect the responses. The CESDD, for example, includes the item ‘lack of showing objects to others to indicate interest’, which was recognised in 64.52 % of children with ASD. In contrast, the item ‘absence of showing’ in the ESAT and M-CHAT was recognised in only 26.67 and 28.57 % of children with ASD, respectively, while ‘no showing’ in the SCQ was recognised in only 13.04 % [86]. Baird et al. [62] also point to the fact that in the CHAT parents were asked to report whether their child had ‘ever’ produced certain behaviour, while if they had been asked if their children had only ‘rarely’ produced such behaviours, the instrument’s sensitivity might have been higher, though at the cost of its PPV and specificity.

Cut-off criteria

Instead of continuing to develop new screening methods for ASD, a more elaborate evaluation of current screening methods might be helpful. One way of achieving this is to explore different criteria within the same screening procedure, using different cut-off scores for different purposes and populations. Comparing the validity indices of the CESDD in combination with an SCQ cut-off of 11 to those of the CESDD in combination with an SCQ cut-off of 15 demonstrated that lowering the SCQ cut-off to 11 improved sensitivity from 0.42 to 0.70 while maintaining good specificity (Dereu et al., unpublished data). Oosterling et al. [63] also explored different criteria of the SCQ (cut-off 11 vs. 15) and the CHAT (high or high + medium risk considered positive). This study showed that, whereas sensitivity was higher for the SCQ cut-off of 11 as found in Wiggins et al. [92], specificity was higher for the SCQ cut-off of 15. In the case of CHAT validity, the high-risk + medium-risk criterion improved sensitivity considerably (from 0.18 to 0.48) while keeping specificity high, i.e. 0.99 for the high-risk criterion and 0.87 for the high-risk + medium-risk criterion. In addition, Scambler et al. [87] described how a slight change in CHAT criteria to allow parents to endorse either of two critical items, improved CHAT sensitivity by 20 % while maintaining specificity of 100 % in a group of children with developmental disabilities. In the Spanish M-CHAT study, false-positive cases were found to be reduced if the M-CHAT was only deemed to be positive after five [44] as opposed to three failed items [19].
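
The effect of moving a cut-off within the same instrument can be illustrated with a small Python sketch. The scores and diagnoses below are invented for demonstration and do not come from any of the studies cited; the function name and the convention that a score at or above the cut-off counts as screen positive are assumptions of the example.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(
    records: Iterable[Tuple[int, bool]], cutoff: int
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= cutoff is screen positive.

    Each record is (questionnaire score, True if later diagnosed with ASD).
    """
    tp = fp = fn = tn = 0
    for score, has_asd in records:
        positive = score >= cutoff
        if has_asd and positive:
            tp += 1
        elif has_asd:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Invented sample: six children with ASD, ten without.
sample = (
    [(s, True) for s in (9, 12, 13, 16, 18, 22)]
    + [(s, False) for s in (2, 3, 4, 5, 7, 8, 10, 12, 14, 16)]
)

results = {cutoff: sensitivity_specificity(sample, cutoff) for cutoff in (11, 15)}

for cutoff, (sens, spec) in results.items():
    print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this invented sample the lower cut-off identifies more of the children with ASD at the price of more false positives, the same direction of trade-off as reported for the SCQ cut-offs of 11 and 15 above. The size of the effect in real data depends on the score distributions in the population screened.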

Protocol adherence

Another factor that may cause variation in screening results is the fact that the same screening procedure is often implemented in different ways. Administration is not consistent across different studies. Researchers and clinicians adapt the original protocol of the screening procedure to their own needs and circumstances. The M-CHAT, for instance, comprises a 23-item yes/no parent report and a follow-up telephone interview. This interview was added to the initial M-CHAT protocol to reduce the number of false positives [19]. Kleinman et al. [75] found that by adding a telephone interview to the screening procedure, the PPV was improved from 0.36 to 0.74. This was especially important in the low-risk general population. Both Nygren et al. and Canal-Bedia et al. [27, 44] indicate that the interview is necessary because items are sometimes misunderstood. Although adding the phone interview proved effective, it should be noted that some researchers have adapted this procedure. Dereu et al. [86] did not include the telephone interview, so that positive screens on the M-CHAT were based exclusively on parent report. This may have affected the PPV, which was 0.29 for the procedure consisting of the CESDD with the M-CHAT but without the telephone interview. In some cases, however, it may be more effective to forgo the interview. Where children fail seven or more items in the initial M-CHAT screening, a follow-up interview may not be necessary [93]. Such children can be immediately referred for further evaluation. An alternative way of conducting the follow-up interview is to be seen in Spain, where the M-CHAT interview is computer-based and performed directly by the paediatrician after a positive result, by asking the parents about the failed items, an approach that facilitates administration of the follow-up process [64]; another option is to implement the M-CHAT entirely in electronic format [94]. 
Another example of alternative administration can be found in the study by Oosterling et al. [63]: instead of using the CHAT as a separate instrument, items from the SCQ and CSBS-DP were combined to represent CHAT items, which probably influenced the results. When implementing a study protocol, adherence and deviation should be balanced, bearing in mind the specific purpose and resources of the study. It should also be noted that a revised version of the M-CHAT (M-CHAT-R/F; [95]), with an algorithm based on three risk levels, has recently been published and recommended for primary care settings.

Informants and training

The information extracted from the studies reviewed shows that many different informants are used in ASD screening. Filipek et al. [96] noted that parents are often correct in their concerns about their child’s development. Although parents may not be as accurate when it comes to specific ASD deficits, they are almost always accurate in detecting a developmental problem [67]. Since parental checklists, such as the M-CHAT, are easy to administer, they are often used for screening purposes. Yet, parents may not know exactly what skills to expect at a certain age and are not able to compare their child with peers [86]. Furthermore, parents may also over- or under-report problems in their child. In the ESAT study [43], ASD experts evaluated children’s behaviour more negatively than did their parents, to the extent that 3 out of 18 children diagnosed with ASD would have scored below threshold on the 14-item ESAT if only parent rating had been used. Accordingly, parental information should be combined with observations by a professional, such as a physician. Physicians, and paediatricians in particular, possess knowledge about typical child development [88, 97] and are able to compare the behaviour of the child to that of his/her peers. It should be noted, however, that physicians have to base their clinical judgment on a brief observation of the child and a short conversation with the parents. Moreover, the behaviour of the child when examined by the physician or another clinician may not represent the child’s typical behaviour in a natural context. To avoid the problems posed by relying only on parents’ or physicians’ reports, child care workers might also be very useful as informants, since they can compare the behaviour and development of the child directly with that of other children and are educated in typical development. 
In addition, children may behave more typically in a child care setting than at a medical practice, since children often visit child care on a regular basis [24].

Other authors have also suggested the possible contribution of child care workers to ASD screening in young children [98]. In the UK, the NICE guidelines recommend training professionals in early signs of ASD at pre-school and school ages [99]. It is important to understand that training physicians and professionals in recognising early signs of ASD might make a crucial difference in the results of screening. The DIANE Project in The Netherlands [88] is a good example of health care professional training, in which small groups of primary care workers attended a compulsory course of interactive training sessions. The main part of the training sessions included a review of early signs of autism and all ESAT items, illustrated by video clips showing children with abnormal or absent behaviour and others showing typically developing children, to clarify what could be expected of a young child at a certain age. In general, the results of this controlled study support the conclusion that the availability of an early identification tool, coupled with training for primary care workers in the early signs of ASD and their ongoing involvement in a screening programme, can lead to earlier detection, referral and diagnosis of ASD. Lack of training could lead to disagreement over ‘cookbook’ guidelines, unfamiliarity with screening instruments and procedures, as well as inconsistent knowledge of ASD and fear of positive results among primary care providers [88].

Parental non-compliance rate

Parental non-compliance is a major problem in many screening studies. It is, therefore, imperative to examine the differences between parents who are compliant and non-compliant with the screening instrument and to provide explanations for non-compliance. Firstly, parents are known to be more inclined to participate in cases where the atypical development of their child is more apparent. Screening scores have been shown to be higher in the children of compliant parents than in those of parents who declined further assessment [43]. Secondly, children of compliant parents were somewhat older at the time when their parents completed the questionnaire [86]. This may be related to the above factor. Parents may not comply because they do not have any concerns about the development of their child at very young ages, or alternatively, because the symptoms may not yet be apparent at this stage [43]. A possible solution could be to ask parents again the following year, when their child is slightly older, which may serve to increase the response rate. Dereu et al. [24] suggest that a more personal approach might improve parental compliance. This might explain why the response rate was lower for returning parent questionnaires than for further developmental assessment [24]. Another way to facilitate compliance might be to limit the number of assessments requiring parents to come in person to the university or health centre with their child. In the study by Dietz et al. [43], the effort of undergoing a minimum of two, but preferably five, examinations at the department was an important obstacle to participation. Dereu et al. [24] also report that parents did not wish to subject their child to the burden of assessments, and for some parents it was just not feasible to come to the university. Socio-economic and ethno-cultural factors may also have an effect on compliance, e.g. Reznick et al. [23] report that African-American families and less-educated parents more often refuse to participate. One reason for this might be fear of the stigma that some cultural groups attach to receiving a diagnosis [100].

Setting characteristics: organisation of services

A screening procedure cannot be implemented without taking the setting characteristics into account. The presence of a preventive health system, such as the well-baby clinics in The Netherlands and the well-baby check-up programme in Spain, offers the opportunity to screen at a population level as opposed to screening high-risk children alone [43, 44]. A further advantage of such a system is the high attendance rate, often related to compulsory vaccinations. Even where such a system is available, it is still relevant to examine whether the system is available to all residents and whether it covers families from all socio-economic and ethno-cultural groups. Canal-Bedia et al. [44] also note the need for coordination between the health system and early intervention units in Spain. Needless to say, when implementing a screening procedure, post-screening services in the form of diagnostic assessment and intervention programmes should also be made available. Coordination with such services is also crucial for identifying possible false-negative cases [64]. Another factor to be considered is that there might be many differences in physician training and education in the respective countries. This is something that should be assessed when implementing a screening procedure which relies on physicians as informants. In addition, when choosing the CESDD as a screening procedure, it is important to bear in mind that this instrument might not be as effective in countries where few children attend child care facilities, either because of the expense involved or because only a minority of women work. Child care in such countries might also be provided by the extended family instead of professional child care workers. In these cases it might be better to choose another procedure, since the CESDD’s advantages (i.e. the ability of child care workers to compare the child’s development to that of peers) are not applicable.

Other methodological concerns about ASD screening studies

A major issue in studies that evaluate the validity of ASD screening procedures is that not all children were followed up. In particular, information on screen-negative cases is missing in many screening studies in Table 2. Some studies have attempted to ‘solve’ this problem by calculating the sensitivity and specificity based on general prevalence rates, e.g. Groen et al. [78] calculated validity indices for several screening instruments, using ASD prevalence numbers reported by Baird et al. [62]. As mentioned earlier, however, the prevalence rates of the populations studied may differ, particularly as prevalence estimates are age dependent, since some children might not clearly manifest the full range of ASD symptoms until social demands outstrip capacity, as recognised by the new DSM-5 diagnostic criteria [101]. Oosterling et al.’s study [63] reported sensitivity and specificity based on the percentage of children who had already been the focus of some concern about ASD, a very specific group: true validity indices cannot be ascertained in this case. Future studies should devote more effort to the follow-up of screen-negative cases to calculate the true validity indices in that specific sample, though it should be noted that following up such cases could be expensive since a majority may prove to be genuinely screen negative [44]. On the other hand, it is plausible that some screen-negative cases will receive a diagnosis. Higher functioning children, children with less severe autism, and children who exhibit regression have a high probability of being missed in screening procedures [96]. Extending the inclusion criteria by, say, also including children who fail language items may improve estimates of validity indices by detecting false-negative cases (Dereu et al. [24]). It is likewise important to continue monitoring screen-positive cases, to establish the validity of the screening procedure in terms of a clinical diagnosis over a longer period of time. For screening studies it is critical that the follow-up of children be envisaged in advance. This idea has also been supported in a recent study examining over twenty different ASD screening programmes in the USA. One of its main conclusions is the importance of methodological rigour and the quality of measures in screening studies [51]. In the CHAT study, half of the children in the medium-risk group were not further evaluated due to lack of resources [62].
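The dependence of prevalence-based validity indices on the assumed prevalence can be made concrete with a short calculation. The following is a minimal sketch, not the method used in any of the studies cited: the function name `estimated_validity` and all counts are hypothetical and chosen only for illustration. It assumes that screen-positive children received a diagnostic assessment, that screen-negative children were not followed up, and that the number of missed cases is therefore inferred from an externally assumed prevalence.

```python
def estimated_validity(n_screened, n_screen_positive, n_true_positive,
                       assumed_prevalence):
    """Estimate validity indices when screen-negatives were not followed up.

    The number of missed (false-negative) cases is not observed. It is
    inferred as: expected cases under the assumed prevalence minus the
    cases actually confirmed among screen-positive children.
    """
    expected_cases = assumed_prevalence * n_screened
    # Inferred missed cases cannot be negative.
    est_false_negative = max(expected_cases - n_true_positive, 0.0)
    false_positive = n_screen_positive - n_true_positive
    est_true_negative = (n_screened - n_screen_positive) - est_false_negative

    detected_plus_missed = n_true_positive + est_false_negative
    sensitivity = (n_true_positive / detected_plus_missed
                   if detected_plus_missed > 0 else float("nan"))
    specificity = est_true_negative / (est_true_negative + false_positive)
    # PPV uses observed counts only, so it does not depend on the assumption.
    ppv = n_true_positive / n_screen_positive
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv}


# Hypothetical study: 10,000 children screened, 100 screen positive,
# 30 of those confirmed with ASD. Only the assumed prevalence varies.
for prevalence in (0.006, 0.010):
    result = estimated_validity(10_000, 100, 30, prevalence)
    print(prevalence, {name: round(value, 3) for name, value in result.items()})
```

With identical observed counts, the estimated sensitivity changes substantially depending on which prevalence figure is borrowed, whereas the positive predictive value, which rests on observed cases only, does not change. This is why indices derived in this way should be read as estimates conditional on the prevalence assumption rather than as properties of the instrument.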

In addition, future studies should be designed in such a way that makes it possible to examine the influence of sample-specific factors on screening results. Thus, a sample should include different age, socio-economic and ethno-cultural groups. Similarly, the study population should preferably include children across the whole range of intellectual functioning. Although this was done in the ESAT studies (Dietz et al. [43]), the original CHAT study excluded children with a clear developmental delay (Baird et al. [62]). Some studies did examine the influence of sample-specific factors on sensitivity and specificity, by examining the screening results for specific age, IQ and diagnostic groups [62, 68]. In general, the sample size should also be large enough to ensure that the validity indices of a screening method can be reliably calculated.
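One way to see why sample size matters for validity indices is to look at the confidence interval around an estimated sensitivity. The sketch below uses the Wilson score interval for a binomial proportion, which is a standard choice but is our assumption here and not necessarily the method used in the studies reviewed. The function name `wilson_interval` and the counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical example: the same observed sensitivity (about 0.83),
# based on 18 diagnosed children versus 180 diagnosed children.
for detected, diagnosed in ((15, 18), (150, 180)):
    low, high = wilson_interval(detected, diagnosed)
    print(f"{detected}/{diagnosed}: {low:.2f} to {high:.2f}")
```

The point estimate is the same in both cases, but the interval based on 18 diagnosed children is far wider than the one based on 180. Because the number of children with ASD in a population sample is small, sensitivity is usually the index most affected by limited sample size.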

Conclusions and implications for future research

The aim of this review was to provide an overview of the screening procedures that have been evaluated in research studies across Europe, and the issues and methodological concerns associated with these. Currently, only the screening procedure with M-CHAT in Spain is still being used in routine practice. The other screening instruments that have been evaluated in research studies, such as the ESAT and the CESDD, are available for use by professionals but are not part of routine practice.

We trust that this analysis will not only inform the drafting of recommendations for early identification of ASD, but will also prove especially important to European countries with no experience in ASD screening when it comes to making the correct choices about how to implement a screening programme in a specific setting.

Although there is consensus on the importance of early detection from both a research and clinical point of view, choosing a screening procedure that fits a certain context may still be difficult. This choice has to be based on arguments beyond validity indices. As this review has shown, findings regarding screening should be interpreted with caution. It is critical that clinicians understand how to interpret data from published studies [102]. It should be noted that screening outcomes are influenced by several factors. Therefore, a more expansive and balanced way of evaluating screening methods, which takes into account all the factors that may influence the results of the screening, is recommended. In addition, methodological issues should also be considered. The fact that screen-negative cases are not followed up in many studies may have distorted screening outcomes. It is important to identify missed cases. This may be done through longitudinal population studies which screen children from an early age until an age at which ASD is likely to be detected or is, at least, likely to be detected with a second measurement at a later age [75]. However, due to parental non-compliance and limited resources, this is often difficult to achieve [62, 75]. Screening information should be carefully communicated to parents [102]. The need for motivational strategies to ensure that families participate longitudinally and follow up on treatment recommendations has also been highlighted in recently published manuscripts. These support the use of rigorous methodology and the evaluation of further variables when screening, such as rates of referral and uptake of services, which have rarely been documented in screening studies [51, 103].

In the USA, the M-CHAT-R/F has been shown to be an effective tool for screening low-risk toddlers, reducing the age of diagnosis by 2 years [95]. These findings open up new possibilities that could be assessed with a view to widespread ASD screening in Europe. Recent recommendations from the American Academy of Child and Adolescent Psychiatry (AACAP) maintain support for ASD screening in young children and, in some instances, in older children as well [104]. Concrete suggestions are also now available on how to conduct cluster randomised trials of early ASD screening [105].

Our review has attempted to analyse the current situation of early detection of ASD in Europe. Although the issues surrounding screening are relevant for any screening procedure to be implemented in Europe and beyond, greater in-depth knowledge of inter-country differences is still required. The diversity in government policy, health care, educational, and social-care settings and cultures across Europe means that screening procedures cannot be fully standardised. Joint efforts to screen populations in lower-income countries, which usually gain access to intervention services later, should be prioritised. For instance, a preventive care system with a high attendance, such as the well-baby clinic, may not be available in every European country, making it more difficult to implement routine developmental surveillance. Thus, implementation of routine screening for ASD and/or other developmental disorders may require a reorganisation of the health care system in many countries. Screening is only effective for clinical purposes when diagnostic centres and interventions are also available.

A detailed characterisation of the samples of participants in the different screening studies, taking into account important variables such as ethnicity and socio-economic status, is needed if further conclusions are to be drawn. Additionally, a pooled data analysis of the items shared by the different screening instruments used in the European context is expected to yield interesting results (Maganto, in prep).

At the moment, as part of this ESSEA-COST Action, one of the four working groups (WG3: testing how well screening instruments work in prospectively identifying cases [47]) is carrying out an ongoing survey whose main goal is to compare the current status of early developmental surveillance across the 28 Member States of the European Union. Thus far, over 17 countries have responded, including at least two different informants per country. The information collected will not only show how ASD detection and diagnosis are approached in each country, but will also provide objective data for calculating screening programme performance indicators in those countries where a system for early detection of autism exists or has existed, as compared to those where no such system is or has ever been in place.

To date, a wealth of ASD screening procedures is available in Europe. While knowledge is shared through international publications and conferences, collaborations, such as the ESSEA COST Action Network, contribute to sharing knowledge among researchers and clinicians in a more direct way. Future challenges for this network lie in raising awareness about early signs of ASD among parents, child care professionals and physicians across Europe, evaluating and adapting the use of current screening procedures for different countries, providing an accessible platform for sharing knowledge and resources among European researchers and clinicians, and, most importantly, improving developmental outcomes for children with ASD and their families. Notwithstanding encouraging experiences, there is still much to be done.