Autistic disorder is the most severe form of a group of autism spectrum disorders (ASDs) characterized by impairments in social interaction, deficits in verbal and non-verbal communication, restricted interests, and repetitive behaviors [1]. With a prevalence of 1 in 110 children, ASDs are among the most common forms of severe developmental disability [2]. The average recurrence risk of autism in siblings of affected children is approximately 10% [3]. This rate is much higher than the prevalence of ASDs in the general population, but lower than would be expected for a highly penetrant mutation in a Mendelian disorder [4].

The inheritance pattern of autism in most families is complex and not compatible with simple Mendelian inheritance [5, 6]. There is significant interest in the early identification of infants at higher risk for autism because studies have shown that early intervention leads to significantly improved long-term outcomes for the whole family [7, 8]. Several common variants localized in biological and positional (that is, under known linkage peaks) candidate genes have been associated with autism and some have been replicated in independent studies [9]. Further support for these associations comes from genes for which, in addition to autism-associated common variants, rare mutations and/or copy number variations (CNVs) have been shown to contribute to the disease, and/or for which gene-disrupted mice exhibited autism-like traits. These genes include CNTNAP2 [10-13], RELN [14-19] and GABRB3 [20-23].

When taken individually, the risk of autism associated with variants remains modest, but Carayol et al. [24] recently showed that the accumulation of multiple risk alleles markedly increases the risk of autism in siblings of children who have been diagnosed with autism. They proposed a genetic score (GS) that, compared with studying polymorphisms individually, improves the identification of subgroups of individuals at greater risk of autism [24]. In the case of autism, tools for genetic risk assessment are highly desirable to complement available behavioral assessments.

Another important characteristic of autism is the sex difference, which shows in several ways. First, the male to female ratio is 4.5:1 [2]. Second, intellectual disability, a key clinical dimension associated with outcome, is more frequent in females than males [25]. Third, the risk of epilepsy is 18 times higher in females than males [26]. This sex difference may partly be explained by sex-specific risk alleles or genes with different expression or activity based on sex [27, 28].

In the present study we propose to improve the genetic risk score model developed by Carayol et al. [24] in two ways: by adding SNPs filtered for their relative importance through an internal validation process, and by developing separate sex-specific genetic risk scores for males and females. Both steps used a first sample of families with children with autism (exploratory sample). The ability of these scores to better identify siblings of children with autism who are at high risk of autism was then evaluated and replicated in an independent second sample of autism families (replication sample).


The study design involved two independent family samples. The first sample (the 'exploratory' sample) consisted of 480 families from the Autism Genetic Resource Exchange (AGRE) repository with at least 1 sibling diagnosed with a 'strict' definition of autism according to the Autism Diagnostic Interview-Revised (ADI-R) and no unaffected siblings. A total of 844 affected siblings including 664 males and 179 females met the diagnostic criteria for 'strict' autism. Minimizing phenotypic heterogeneity can improve study power [29]. Shao et al. [30] demonstrated that the use of a homogeneous phenotype increases the power of linkage studies in autism. Linkage signals have been observed in studies in which the samples were stratified according to specific phenotypes such as sex [28, 31, 32], delayed onset of phrase speech [30, 33, 34], and severe obsessive-compulsive behaviors [35]. Two genome-wide association studies using overlapping samples of children with autism identified two different common variants in CNTNAP2, a gene localized in the 7q34-7q36 region linked to language disability in autism [36]; one SNP has been associated with autism through the use of the quantitative trait 'age at first word' [10] and the other using a qualitative strict autism diagnosis [11]. Similarly, a recent genome-wide association study (GWAS) [37] reported the strongest association with autism in MACROD2 using the strict autism diagnosis. Therefore, as in Shao et al. [30], we studied individuals with strict autism rather than the heterogeneous broad autism spectrum disorder phenotype. The second sample (the 'replication' sample) included 187 families consisting of the 2 parents, at least 1 child with autism and 1 unaffected sibling from a sample collection at the University of Pennsylvania. This replication sample comprised 351 children with autism (291 males and 60 females) meeting the same strict definition of the disease and 90 unaffected children (39 males and 51 females). Ethnicity was self-reported by parents as Caucasian, Asian, Hispanic or Latino, Black or African American, Native Hawaiian or other Pacific Islander, or of mixed ethnicity. Caucasians represented the major ethnicity, with more than two-thirds of families in each sample.

Ten autism susceptibility genes were selected for this study. Four of them (PITX1, EN2, SLC25A12 and ATP2B2) have been previously demonstrated to have a predictive ability and were used in a genetic score-based model [24]. Genes shown to be statistically associated with autism in at least one study using the AGRE collection, even at the nominal level, and for which additional data support their implication in autism, were also included. Six genes fulfilled the statistical association condition, four of which were replicated in one or more independent studies: HOXA1 [38, 39], GRIK2 [40-42], ITGB3 [43-46] and CNTNAP2 [10, 11]; one gene, MARK1, was found to be significantly overexpressed in brain from individuals with autism compared to unaffected individuals [47]; and the last gene, JARID2, was chosen since one SNP, rs7766973, displays the strongest association with autism (P = 6.8 × 10⁻⁷ [48]) among the three GWAS performed on AGRE family data [37, 42, 48]. Table 1 lists the genes selected for the study and the associated SNPs with their deleterious alleles and corresponding frequencies.

Table 1 Risk allele frequency (defined as the allele associated with autism)

All parents and children from the exploratory sample were genotyped for these ten markers. Only SNPs that were selected for further investigation were genotyped in the replication sample. Genotyping was performed using TaqMan allele discrimination assays (Applied Biosystems, Foster City, CA, USA). Reactions were run in 384-well plates with 5 ng genomic DNA, 0.075 μl of 20 × SNP TaqMan Assay mix, 1.5 μl of TaqMan Universal PCR Master Mix and 1.425 μl of dH2O in each well. PCR was performed at 95°C for 10 min, followed by 50 cycles at 92°C for 15 s and 60°C for 90 s (9700 Gene Amp PCR System; Applied Biosystems). Plates were then subjected to endpoint reading (7900 Real-Time PCR System; Applied Biosystems). The alleles were called automatically using the SDS software (Applied Biosystems), and a visual inspection of genotype clusters was performed. Genotyping quality was assessed by signal intensity plots and missing genotype frequencies; samples with poor clustering and missing fractions ≥5% per SNP were retyped. Parental genotypes were used to investigate Hardy-Weinberg equilibrium (HWE) and to check for Mendelian inconsistencies. Families with remaining inconsistencies were excluded.

The development of the genetic score model and the definition of the increased risk GS thresholds (that define the high-risk groups) were based on the exploratory sample with all affected children whereas, for the replication study using the second sample, the index cases were excluded.

A model that performs well only in the sample in which it was developed is not valid. To be valid, the results need to be reproduced in a separate independent population. A genetic score model, such as the one proposed in this paper, is generally built on the simple sum of deleterious alleles observed at each of the chosen genes. Thus, the reproducibility of the genetic score is conditioned by the reproducibility of the deleterious allele for each SNP included in the model. Markers that are more reproducible carry stronger and more stable information. The reproducibility of the SNPs was analyzed using a bootstrap resampling process and a reproducibility index (RI) was estimated similarly to Ma [49] as follows: (1) generation of a 'pseudosample' consisting of 480 families by randomly sampling the 480 families of the exploratory population with replacement; (2) estimation of the genetic relative risk associated with the deleterious allele of each SNP as defined in Table 1; (3) repetition 1,000 times of steps 1 and 2; (4) estimation for each SNP of the RIs indicating the proportion of 'pseudosamples' in which the deleterious allele maintains a risk greater than 1.00 in males, in females or in both males and females.
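The four bootstrap steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the per-family data layout (a pair of transmitted and untransmitted risk-allele counts) and the `relative_risk` stand-in estimator (a simple transmitted/untransmitted ratio) are assumptions made for the example, whereas the study estimates genetic relative risks with a case-pseudocontrol model.

```python
import random

def relative_risk(families):
    """Stand-in estimator (an assumption for this sketch): ratio of transmitted
    to untransmitted risk-allele counts. The paper instead uses a
    case-pseudocontrol model."""
    transmitted = sum(t for t, _ in families)
    untransmitted = sum(u for _, u in families)
    if untransmitted == 0:
        return float("inf") if transmitted > 0 else float("nan")
    return transmitted / untransmitted

def reproducibility_index(families, n_boot=1000, seed=0):
    """Proportion of bootstrap pseudosamples in which the risk allele keeps a
    relative risk greater than 1."""
    rng = random.Random(seed)
    n = len(families)
    hits = 0
    for _ in range(n_boot):
        # Step 1: resample families (not individuals) with replacement.
        pseudosample = [families[rng.randrange(n)] for _ in range(n)]
        # Step 2: re-estimate the risk in the pseudosample.
        if relative_risk(pseudosample) > 1.0:
            hits += 1
    # Steps 3 and 4: repeat n_boot times and report the proportion.
    return hits / n_boot

# Hypothetical data: (transmitted, untransmitted) counts for one SNP per family.
example = [(2, 1), (1, 1), (0, 1), (2, 0), (1, 2), (1, 0)] * 20
print(reproducibility_index(example))
```

Resampling whole families keeps siblings together, so within-family correlation is preserved in each pseudosample. Computing the index separately in males and females only requires passing the sex-specific counts.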

A high RI indicates that the effect of a deleterious allele of a given SNP is maintained across the bootstrap pseudosamples and that this SNP is a good candidate for the reproducibility of the genetic score. A stringent threshold of RI ≥ 0.80 in children with autism was set to select the best SNPs. Then, the RI in males and females with autism was checked separately to discard SNPs that lack stability in a particular sex. Since all variants have been associated with autism using AGRE family data, this internal validation process guards against an optimistic evaluation of their association, that is, an overestimation of the effect of risk alleles, and a potential deterioration of this effect in an independent sample. The sex-specific genetic scores (GSs) were then constructed as follows:

$$GS_{sex} = W_{all}\,RS_{all} + W_{sex}\,RS_{sex}$$

where sex = (male, female); RSall and RSsex are the risk scores built as the sum of deleterious alleles from genes with a high RI in males only (RSmale), in females only (RSfemale) or in both sexes (RSall); and Wall, Wmale, and Wfemale are the integer values of the corresponding genetic relative risks (GRR) associated with the corresponding risk scores (RSall, RSmale and RSfemale, respectively). These weights were calculated following Lin et al. [50] who showed that a weighted genetic score provided more predictive value than an unweighted genetic score.
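As an illustration of the formula, the sketch below computes both sex-specific scores for one individual. The SNP labels and the genotype are hypothetical placeholders; the weights of 1, 1 and 2 for RSall, RSmale and RSfemale are the ones the paper reports after estimating the genetic relative risks.

```python
def risk_score(genotype, snps):
    """Sum of risk-allele counts (0, 1 or 2 per SNP) over the given SNPs."""
    return sum(genotype[snp] for snp in snps)

def genetic_score(genotype, sex, snps_all, snps_by_sex, weights):
    """GS_sex = W_all * RS_all + W_sex * RS_sex."""
    rs_all = risk_score(genotype, snps_all)
    rs_sex = risk_score(genotype, snps_by_sex[sex])
    return weights["all"] * rs_all + weights[sex] * rs_sex

# Hypothetical SNP labels: three shared SNPs, three male-specific, two female-specific.
SNPS_ALL = ["a1", "a2", "a3"]
SNPS_BY_SEX = {"male": ["m1", "m2", "m3"], "female": ["f1", "f2"]}
# Integer weights as reported in the paper.
WEIGHTS = {"all": 1, "male": 1, "female": 2}

# Hypothetical genotype: number of risk alleles carried at each SNP.
genotype = {"a1": 2, "a2": 1, "a3": 0, "m1": 1, "m2": 2, "m3": 0, "f1": 1, "f2": 2}

print(genetic_score(genotype, "male", SNPS_ALL, SNPS_BY_SEX, WEIGHTS))
print(genetic_score(genotype, "female", SNPS_ALL, SNPS_BY_SEX, WEIGHTS))
```

With three shared SNPs and a weight of 2 on two female-specific SNPs, the female score can reach at most 6 + 2 × 4 = 14, and the male score at most 6 + 6 = 12.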

Because the exploratory sample did not include unaffected children, all genetic relative risks were estimated as described in Carayol et al. [24] using the case-pseudocontrol approach proposed by Cordell and Clayton [51] and implemented in the DGCgenetics R package. Sensitivity and specificity values of the GSs were estimated in the exploratory and the replication samples as in Carayol et al. [24]. Areas under the receiver operating characteristic curves (AUCs) were estimated in the exploratory sample and tested against the AUC = 0.5 null hypothesis to validate the discriminative power of the GSs. However, AUCs are not an informative measure of the clinical utility of the genetic score (here, the high-risk classification of siblings of children with autism). Cutoff values were therefore chosen to define a high-risk group in the exploratory sample and the odds ratios were estimated. These high-risk thresholds (one for males and one for females) were selected to give a false positive rate lower than 20% (that is, specificity higher than 80%). External validation of the clinical utility of the high-risk GS group was then conducted in the replication sample. Positive predictive values in siblings of children with autism were estimated from the sensitivity, specificity and the sibling recurrence risk estimates in males and females. Since no data were available in the literature, we estimated the sibling recurrence risk at 0.16 in males and 0.04 in females assuming an overall 0.10 sibling recurrence risk [3] and a 4:1 male to female sex ratio [2].
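The arithmetic behind the sex-specific recurrence risks and the positive predictive value can be sketched as follows. Two assumptions are made here that the text does not state explicitly: male and female siblings are taken to be equally numerous when splitting the overall risk, and the PPV is computed with Bayes' rule using the recurrence risk as the prior. The sensitivity and specificity in the usage example are illustrative values, not results from the study.

```python
def sex_specific_recurrence(overall_risk, male_to_female_ratio):
    """Split an overall sibling recurrence risk into male and female risks,
    assuming equal numbers of male and female siblings."""
    female = 2 * overall_risk / (male_to_female_ratio + 1)
    male = male_to_female_ratio * female
    return male, female

def positive_predictive_value(sensitivity, specificity, prior_risk):
    """Bayes' rule: probability of being affected given a high-risk score."""
    true_pos = sensitivity * prior_risk
    false_pos = (1 - specificity) * (1 - prior_risk)
    return true_pos / (true_pos + false_pos)

# Overall 0.10 recurrence risk and a 4:1 sex ratio, as assumed in the text.
male_risk, female_risk = sex_specific_recurrence(0.10, 4)
print(round(male_risk, 2), round(female_risk, 2))

# Illustrative sensitivity and specificity (hypothetical values).
print(round(positive_predictive_value(0.30, 0.85, male_risk), 3))
```

Because the prior risk in females is four times lower, the same sensitivity and specificity yield a much smaller PPV in females than in males.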


None of the SNPs exhibited a departure from HWE and allele frequencies were similar between samples (Table 1). Table 2 lists the RI of each SNP based on the bootstrap analysis using the exploratory sample. Eight markers reached the stringent 80% RI threshold. SNPs rs2292813 (SLC25A12) and rs2235076 (GRIK2) were excluded because of their low reproducibility (RI = 52% and 36%, respectively). Among the eight remaining SNPs, two displayed a low RI in males but an RI of 100% in females: rs12410279 (MARK1, RImale = 47%) and rs5918 (ITGB3, RImale = 65%). Conversely, three SNPs displayed a low RI in females and an RI greater than 95% in males: rs227855 (ATP2B2, RIfemale = 59%), rs6872664 (PITX1, RIfemale = 30%) and rs10951154 (HOXA1, RIfemale = 20%).

Table 2 Reproducibility indexes (RIs) in children with autism, in males and in females

The three separate risk scores were then constructed based on the sum of deleterious alleles in their corresponding SNPs. These included rs7794745, rs1861972 and rs7766973 for RSall, rs12410279 and rs5918 for RSfemale, and rs2278556, rs6872664 and rs10951154 for RSmale. The GRRs associated with a one-point increase in the RS were estimated to be 1.23 for RSall (P = 2.3 × 10⁻⁵; 95% confidence interval (CI) 1.12 to 1.36), 1.25 for RSmale (P = 5.8 × 10⁻⁴; 95% CI 1.10 to 1.41) and 2.29 for RSfemale (P = 1.7 × 10⁻⁶; 95% CI 1.57 to 3.34). The overall P value of the three tested scores was 3.1 × 10⁻⁹ with corresponding weights of 1.00, 1.00 and 2.00 for RSall, RSmale and RSfemale, respectively. The two genetic scores (GSs) were then constructed. GSmale ranged between 3 and 12 with a GRR associated with a one-point increase in the score of 1.23 (P = 2.2 × 10⁻⁶; 95% CI 1.13 to 1.34) and GSfemale ranged between 4 and 14 with a GRR of 1.41 (P = 1.9 × 10⁻⁵; 95% CI 1.21 to 1.65) for a highly significant global test with P = 8.4 × 10⁻¹⁰. Table 3 displays the sensitivity and specificity values for the GS in males and females. To define the high-risk group, GS values were selected in males and females so as to keep the false positive rate below 20% while making the sensitivity as high as possible. A genetic score threshold of nine points for males was associated with a moderate 0.24 sensitivity (95% CI 0.19 to 0.28) and a 0.86 specificity (95% CI 0.82 to 0.90), which limits the false positive rate to 0.14 and led to a 0.23 positive predictive value (PPV). For females, a genetic score threshold of 12 was associated with a similar specificity of 0.86 (95% CI 0.80 to 0.92) but a higher sensitivity of 0.37 (95% CI 0.29 to 0.44) and a PPV of 0.09. These two GS values were chosen as thresholds to define the group of children with a high risk of autism. AUCs were estimated to be 0.59 and 0.66 in males and females, respectively. Both are significantly different from the 0.5 null hypothesis (P = 2 × 10⁻⁸ and 1.5 × 10⁻⁷), indicating a predictive ability of the GSs.
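The threshold rule described here (specificity above 80%, sensitivity as high as possible) can be sketched as a simple scan over candidate cutoffs. The score lists are hypothetical, and treating a score at or above the cutoff as high risk is an assumption of this sketch.

```python
def sens_spec(case_scores, control_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called high risk."""
    sens = sum(s >= threshold for s in case_scores) / len(case_scores)
    spec = sum(s < threshold for s in control_scores) / len(control_scores)
    return sens, spec

def pick_threshold(case_scores, control_scores, min_specificity=0.80):
    """Lowest cutoff whose specificity exceeds min_specificity.

    Sensitivity can only fall as the cutoff rises, so the first qualifying
    cutoff in ascending order is also the most sensitive one."""
    for threshold in sorted(set(case_scores) | set(control_scores)):
        sens, spec = sens_spec(case_scores, control_scores, threshold)
        if spec > min_specificity:
            return threshold, sens, spec
    return None

# Hypothetical genetic scores for affected children and for (pseudo)controls.
cases = [9, 10, 8, 7, 11, 9, 6, 12]
controls = [5, 6, 7, 8, 6, 5, 7, 9, 4, 6]
print(pick_threshold(cases, controls))
```

The function returns `None` when no cutoff reaches the required specificity, which can happen when case and control score distributions overlap almost entirely.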

Table 3 Genetic score (GS) sensitivities and specificities with their 95% CIs by sex estimated in the exploratory sample

In the replication sample (Table 4), sensitivity and specificity associated with the high-risk group GS threshold (GSmale = 9) were slightly higher in males (but not significantly different, as can be seen from the overlapping 95% CIs) with a 0.26 (95% CI 0.18 to 0.35) sensitivity and 0.87 (95% CI 0.76 to 0.98) specificity. The PPV reached 0.28 for a 0.16 sibling recurrence risk. In females, differences were observed for the sensitivity, estimated at 0.28 (95% CI 0.12 to 0.44) instead of 0.37, and for the specificity, estimated at 0.76 (95% CI 0.64 to 0.89) instead of 0.86, but the differences were not significant (overlapping confidence intervals). In females, variances for sensitivity and specificity values were larger in the replication sample than in the exploratory sample because of the smaller number of females in the replication sample. As a consequence, the PPV (estimated at 5%) was very small and close to the 4% sibling recurrence risk.

Table 4 Sensitivity and specificity estimates in the exploratory and replication samples with their corresponding 95% CIs for the high-risk group

Extending the analysis to a broader definition of autism and including or excluding the index cases as was performed with the replication study did not change the characteristics of the genetic score or the associated significance levels.


Our results demonstrate that the sex difference in autism may have an important influence on the genetic score characteristics, and therefore, on the risk assessment. Taking sex and reproducibility of the SNPs into account led to two GSs with different characteristics that allowed the identification of a subgroup of siblings of children with autism with a high risk of autism in males. The genetic score model with four genes [24] was also tested on this large sample of families and its association was clearly weaker (P = 7 × 10⁻⁴ in males and females as a whole) than those of the sex-specific GSs (P = 2.2 × 10⁻⁶ and 1.9 × 10⁻⁵ for males and females, respectively). The risk for males with a high GS to develop autism was 28%, almost three times higher than the reported 10% sibling recurrence risk. In females, the 10% recurrence risk seems overestimated and we estimate this value at 4% considering a 4.5:1 male to female sex ratio.

The GS model has been developed through the use of affected children and the pseudocontrol approach [52, 53]. This was confirmed by analyzing unaffected siblings of children with autism. The pseudocontrols approach has been validated for the estimation of diagnostic accuracy using only affected children compared to full population-based data [54]. We cannot exclude an over-representation of deleterious alleles in unaffected siblings compared to pseudocontrols, which are genetically the opposite of affected children, nor the effect of population controls that may lower the risk ratio between affected and unaffected siblings and consequently affect the discriminative ability of the GS models. This does not seem to occur for males since the high-risk class replicates its predictive accuracy but would need further investigation for females.

Reproducibility of effects is a key criterion for including a marker in a predictive model, since it conditions the reproducibility of the model outside the study sample, which is of primary importance for validating such a model. Given the replication of the performance of the risk assessment model in males in an independent sample and the ability to find support for female-specific variants despite the relatively small number of samples, the proposed approach can be used for developing stable and reproducible models. SLC25A12, associated and replicated in different studies [55-58], did not reach the reproducibility threshold, whereas JARID2, which reached a suggestive significance threshold in a single GWAS [48], seems of more interest. Some markers were reproducible (high RI) in a specific sex only but did not show any statistically significant interaction with sex, nor were they reported as being sex specific in the literature. The SNP rs7794745 located within CNTNAP2 has a high RI in both sexes whereas a previous association with autism has been reported preferentially in males [10, 11]. Due to the low number of females analyzed, these studies lack power to observe any association in females [11]. Another SNP, rs5918 located within ITGB3, has been shown to be associated with autism in both sexes but with a different risk effect [46], which could explain the difference of reproducibility observed in males and females. The stability is not necessarily linked to the sex specificity of the SNP or to the strength of previous association results. This may be explained in part by a study of Jakobsdottir et al. [59], which showed that a highly significant association of genes with a disease does not guarantee an effective discrimination between cases and controls.

Several limitations of the study may be identified. The moderate number of females with autism in the replication sample, a consequence of the marked sex ratio in autism, led to a lack of power for the replication of the high-risk group characteristics. Sibling recurrence risks of males and females were not estimated or reported from real data but calculated assuming a sibling recurrence risk of 10% [3] and the widely observed 4.5:1 male to female sex ratio. Reported PPVs are intuitive estimates that quantify the increase in the risk for an individual (a sibling of a child with autism) who has a genetic score that falls in the high-risk class. Accurate PPVs could be estimated by using observed and reported data. The selection of the genes and the SNPs included in the genetic scores is open to discussion. The methodology used to select the common variants and the internal validation approach performed in this study strongly support the implication of these SNPs in autism as well as their discriminative ability. The addition of other SNPs from the same genetic region would have led to a much more complicated model because of the linkage disequilibrium (LD) between these SNPs as well as the haplotypes resulting from the different combinations of alleles. Finally, other approaches may be used to select genes to enter in a genetic score. Genes may be selected using statistically significant results from GWAS [60, 61] or a complementary approach as in convergent functional genomics (CFG) [62, 63], when no or few association results reach significance, as is frequently the case in complex disease and particularly in autism.

The recent paper of Lu and Cantor [64] together with the present results highlights the importance of sex in genetic studies of autism. They showed that using sex as a risk factor in GWAS of multiplex autism families increased the power of the study and identified one new gene implicated in calcium channel defect. Stone et al. [28] also suggested that sex is an important factor in the genetics of autism and could be used to decrease heterogeneity in genetic studies.


The results of this study confirm previous results [24] that predictive models are of major interest in autism and may help to identify siblings of children with autism at high risk of disease. The choice of genes to enter in the model must be made with caution, since association and replication of a particular SNP in different studies are not sufficient justification to enter a SNP in a genetic score. Sex is also an important factor that needs to be included in autism risk evaluation.