Whole exome sequencing in extended families with autism spectrum disorder implicates four candidate genes

  • Original Investigation
  • Human Genetics

Abstract

Autism spectrum disorders (ASDs) are a group of neurodevelopmental disorders, characterized by impairment in communication and social interactions, and by repetitive behaviors. ASDs are highly heritable, and estimates of the number of risk loci range from hundreds to >1000. We considered 7 extended families (size 12–47 individuals), each with ≥3 individuals affected by ASD. All individuals were genotyped with dense SNP panels. A small subset of each family was typed with whole exome sequence (WES). We used a 3-step approach for variant identification. First, we used family-specific parametric linkage analysis of the SNP data to identify regions of interest. Second, we filtered variants in these regions based on frequency and function, obtaining exactly 200 candidates. Third, we compared two approaches to narrowing this list further. We used information from the SNP data to impute exome variant dosages into those without WES. We regressed affected status on variant allele dosage, using pedigree-based kinship matrices to account for relationships. The p value for the test of the null hypothesis that variant allele dosage is unrelated to phenotype was used to indicate strength of evidence supporting the variant. A cutoff of p = 0.05 gave 28 variants. As an alternative third filter, we required Mendelian inheritance in those with WES, resulting in 70 variants. The imputation- and association-based approach was effective. We identified four strong candidate genes for ASD (SEZ6L, HISPPD1, FEZF1, SAMD11), all of which have been previously implicated in other studies, or have a strong biological argument for their relevance.

Abbreviations

ASD: Autism spectrum disorder
WES: Whole exome sequencing
ADOS: Autism Diagnostic Observation Schedule
ADI-R: Autism Diagnostic Interview – Revised
BPASS: Broader Phenotype Autism Symptom Scale
BAP: Broader autism phenotype
OE: Illumina HumanOmniExpress
HCE: Illumina Human Core Exome
1KGP-EUR: 1000 Genomes Project Europeans
IV: Inheritance vector
MCMC: Markov chain Monte Carlo

Acknowledgments

Research reported in this publication was supported by funding from the National Institute of Mental Health, and the National Institute on Aging, under award numbers R01MH092367, R01MH094293, R01MH094400, and R00AG040184 from the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We would like to thank those with ASD and their families, because without their participation this research would not be possible.

Author information

Corresponding author

Correspondence to Ellen M. Wijsman.

Ethics declarations

Ethical approval

“All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.”

Appendix

Marker selection for linkage analysis

We used the marker subpanels program of the PBAP suite (Nato et al. 2013), with the following parameters: minimum intermarker distance 0.5 cM, marker completion threshold 80%, minor allele frequency >20%, maximum linkage disequilibrium (r²) between markers 0.04. We used allele frequencies and linkage disequilibrium estimates from the 1KGP-EUR population, and maps based on the sex-averaged Rutgers map (Matise et al. 2007). Map locations were converted from those based on the Kosambi map function in the Rutgers map to those based on the Haldane map function. This was necessary because of the implicit assumptions in the multipoint analysis imposed by use of the Lander–Green algorithm (Lander and Green 1987).

Linkage analysis

Our approach to linkage analysis in these families was to use sampled inheritance vectors (IVs) as a basis for analysis. The set of IVs at a particular genomic position represents possible paths of descent of the chromosomes at that position through the pedigree. The sampled IVs are drawn from the posterior distribution of IVs, conditional on marker information, family structure and genetic map, at each marker location along each chromosome. For the smallest pedigree, we sampled IVs from the exact posterior distribution, and for the larger pedigrees we used Markov chain Monte Carlo (MCMC) sampling. Using the sampled IVs as a basis for analysis enabled us to perform chromosome-wide multipoint pedigree-based linkage analysis, even in our larger families. The same samples of IVs were also used for genotype imputation from the sequence data.
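To make the notion of an IV concrete, the following minimal sketch traces founder chromosomes through a pedigree for one IV at one position. The data layout is hypothetical and is not the MORGAN file format: the pedigree is a list of (individual, father, mother) tuples with parents listed before children, and the IV stores one indicator per meiosis, where 0 means the parent transmitted its own paternal copy and 1 its maternal copy (an assumed convention).

```python
def descend_founder_alleles(pedigree, iv):
    """Assign founder chromosome labels to every individual for one IV.

    pedigree: list of (ind, father, mother); founders have None for both parents.
    iv: dict mapping (ind, 'pat') or (ind, 'mat') to 0 or 1 (assumed coding).
    Returns a dict ind -> (paternal label, maternal label).
    """
    alleles = {}
    next_label = 0
    for ind, father, mother in pedigree:
        if father is None and mother is None:
            # Each founder carries two distinct founder chromosomes.
            alleles[ind] = (next_label, next_label + 1)
            next_label += 2
        else:
            paternal = alleles[father][iv[(ind, "pat")]]
            maternal = alleles[mother][iv[(ind, "mat")]]
            alleles[ind] = (paternal, maternal)
    return alleles

# Two siblings who share the father's paternal chromosome but differ
# in which maternal chromosome they inherited.
pedigree = [("F", None, None), ("M", None, None), ("S1", "F", "M"), ("S2", "F", "M")]
iv = {("S1", "pat"): 0, ("S1", "mat"): 1, ("S2", "pat"): 0, ("S2", "mat"): 0}
labels = descend_founder_alleles(pedigree, iv)
shared = set(labels["S1"]) & set(labels["S2"])
```

Individuals that carry the same founder label at a position share that chromosome segment identical by descent under the given IV, which is the information that both linkage analysis and imputation draw on.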

Linkage analysis from the IVs followed three steps. First, we sampled IVs for each chromosome and family combination using the program gl_auto from the MORGAN (MORGAN: A package for Markov chain Monte Carlo in genetic analysis (version 3.1.1) http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml 2012) package, saving 1,000 IV samples for analysis. For MCMC sampling, 50,000 scans were performed with each run using sequential imputation for setup, and the LMM sampler with 50% L-sampler. We used allele frequencies based on each dataset, except in the families typed with CE alone, where we used the 1KGP-EUR frequencies. These families are generally well genotyped, and therefore we do not expect the results to be sensitive to the source of allele frequencies (Huang et al. 2004). Second, we identified equivalence classes among the sampled IVs at each marker, using the program IBDgraph (IBDgraph 2.0: another C-library add-on for MORGAN 3 http://www.stat.washington.edu/thompson/Genepi/pangaea.shtml 2010). Identifying equivalence classes allows computations to be performed on one representative of each class, rather than on all 1,000 samples. Finally, we performed linkage analyses using FASTLINK (O’Connell and Weeks 1995) for one representative of each equivalence class, and calculated likelihoods by a weighted average over equivalence classes, where the weights are the sampled probabilities of the classes. Since these analyses were done, the MORGAN program gl_lods has been released (MORGAN: A package for Markov chain Monte Carlo in genetic analysis (version 3.2) http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml 2013), which carries out the analyses directly from the output of gl_auto. In the future, it will not be necessary to use FASTLINK for this type of analysis.
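The weighted average over equivalence classes can be sketched as follows. This is a simplification under stated assumptions: IVs are treated as equivalent only when they are identical (IBDgraph uses a broader notion of equivalence), `trait_likelihood` is a hypothetical stand-in for the per-class likelihood that FASTLINK would supply, and the LOD score is taken as the log10 ratio of the averaged likelihood to the likelihood under no linkage.

```python
import math
from collections import Counter

def class_weights(sampled_ivs):
    """Map each distinct sampled IV to its relative frequency among the samples."""
    counts = Counter(sampled_ivs)
    total = len(sampled_ivs)
    return {iv: count / total for iv, count in counts.items()}

def averaged_lod(sampled_ivs, trait_likelihood, null_likelihood):
    """LOD score from one likelihood evaluation per class of sampled IVs.

    trait_likelihood: hypothetical function returning the trait likelihood given an IV.
    null_likelihood: trait likelihood when the trait locus is unlinked.
    """
    weights = class_weights(sampled_ivs)
    averaged = sum(w * trait_likelihood(iv) for iv, w in weights.items())
    return math.log10(averaged / null_likelihood)

# Toy example: four samples fall into two classes with weights 0.75 and 0.25.
samples = [(0, 1), (0, 1), (0, 1), (1, 1)]
toy_likelihood = {(0, 1): 0.2, (1, 1): 0.4}
lod = averaged_lod(samples, toy_likelihood.get, null_likelihood=0.025)
```

Evaluating the likelihood once per class gives the same average as evaluating it for every sample, so the saving grows with the number of repeated IVs among the 1,000 samples.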

Imputation of exome variant dosages

GIGI uses the IVs generated by gl_auto for the entire family based on the SNP genotypes, in addition to exome variant genotypes in available individuals, to calculate the expected dosage of the variant allele in each person in the family. The frequencies of the alternate alleles were taken from the 1KGP-EUR population, unless the allele was absent from that dataset, in which case it was set to 0.01. Sex-averaged map positions were converted to positions based on the Haldane map function, as described above. In cases where variants did not appear in the Rutgers map, map position was interpolated based on physical position.
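The input conventions described above can be sketched in a few lines. This is not GIGI code; the function names and data layouts are hypothetical. The sketch shows the allele frequency fallback of 0.01, linear interpolation of a genetic map position from physical position (clamping outside the mapped range is an assumption made here), and how an expected dosage follows from genotype probabilities.

```python
import bisect

def alt_allele_frequency(variant_id, reference_freqs, default=0.01):
    """Reference-panel frequency, or the default when the allele is absent."""
    return reference_freqs.get(variant_id, default)

def interpolate_cm(pos_bp, map_bp, map_cm):
    """Linearly interpolate a genetic position (cM) from a physical position (bp).

    map_bp and map_cm are parallel lists sorted by physical position.
    Positions outside the mapped range are clamped to the nearest end (assumption).
    """
    if pos_bp <= map_bp[0]:
        return map_cm[0]
    if pos_bp >= map_bp[-1]:
        return map_cm[-1]
    i = bisect.bisect_right(map_bp, pos_bp)
    x0, x1 = map_bp[i - 1], map_bp[i]
    y0, y1 = map_cm[i - 1], map_cm[i]
    return y0 + (y1 - y0) * (pos_bp - x0) / (x1 - x0)

def expected_dosage(genotype_probs):
    """Expected number of variant alleles from P(0 copies), P(1 copy), P(2 copies)."""
    p0, p1, p2 = genotype_probs
    return p1 + 2.0 * p2
```

For example, an individual who is equally likely to carry zero or one copy of the variant has an expected dosage of 0.5, which is the kind of ambiguous value that direct genotyping can later resolve.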

Example of imputation and association results

Table 6 shows the results of imputation and subsequent association tests for family AU071, as an example. AU071 is interesting because we imputed variant dosages in multiple unaffected individuals, and there are examples of the imputed dosage being very clear, as well as examples where the dosage was more ambiguous. Variant allele dosages are italicized if they are based on imputation, and not if they are directly observed. The table lists the 18 variants that pass the filters based on linkage analysis, frequency and function, and also have a Mendelian segregation pattern. There is an additional variant where the observed segregation pattern was not obviously Mendelian (pattern = 'no'), due to missing genotypes because of low read depth in affected individuals. For this variant (chr 1 pos 877,523), no copies of the alternate allele are imputed in unaffected individuals, but the status of two affected individuals (A3 and A4) remains ambiguous (imputed dosage = 0.5). Sanger sequencing clarified that both A3 and A4 carry a single copy of the variant, making this variant a very good candidate for ASD in this family. Even without taking into account the Sanger results, this is the only variant where the p value is ≤0.05, so the imputation- and association-based results represent a substantially reduced set of variants relative to those based on requiring a Mendelian segregation pattern in affected individuals with WES only. Online Resource 3 shows detailed results similar to Table 6 for each of the seven families studied.
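The association test behind these p values regresses affected status on variant allele dosage while accounting for relatedness through a kinship matrix. The following is a minimal sketch of one way such a test could be set up, not the model actually fitted here. It assumes a linear model with covariance proportional to h2 · 2Φ + (1 − h2) · I, where Φ is the kinship matrix, a fixed rather than estimated h2, generalized least squares, and a normal approximation for the Wald p value. All names are hypothetical, and plain Python is used only to keep the sketch dependency-free.

```python
import math

def solve(a, b):
    """Solve a x = b (b may have several columns) by Gaussian elimination."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, len(m[r])):
                m[r][c] -= factor * m[col][c]
    k = len(b[0])
    x = [[0.0] * k for _ in range(n)]
    for r in range(n - 1, -1, -1):
        for j in range(k):
            s = m[r][n + j] - sum(m[r][c] * x[c][j] for c in range(r + 1, n))
            x[r][j] = s / m[r][r]
    return x

def dosage_association(status, dosage, kinship, h2=0.5):
    """GLS slope, standard error and two-sided p value for dosage (h2 is assumed)."""
    n = len(status)
    # Covariance up to a scale factor: h2 * 2 * kinship + (1 - h2) * identity.
    v = [[h2 * 2.0 * kinship[i][j] + ((1.0 - h2) if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # One solve gives V^-1 applied to the intercept, dosage and phenotype columns.
    w = solve(v, [[1.0, dosage[i], float(status[i])] for i in range(n)])
    a11 = sum(row[0] for row in w)                    # 1' V^-1 1
    a12 = sum(row[1] for row in w)                    # 1' V^-1 d
    a22 = sum(dosage[i] * w[i][1] for i in range(n))  # d' V^-1 d
    b1 = sum(row[2] for row in w)                     # 1' V^-1 y
    b2 = sum(dosage[i] * w[i][2] for i in range(n))   # d' V^-1 y
    det = a11 * a22 - a12 * a12
    intercept = (a22 * b1 - a12 * b2) / det
    slope = (a11 * b2 - a12 * b1) / det
    resid = [status[i] - intercept - slope * dosage[i] for i in range(n)]
    vinv_resid = solve(v, [[r] for r in resid])
    sigma2 = sum(resid[i] * vinv_resid[i][0] for i in range(n)) / (n - 2)
    se = math.sqrt(sigma2 * a11 / det)
    p_value = math.erfc(abs(slope / se) / math.sqrt(2.0))
    return slope, se, p_value
```

With unrelated individuals (kinship 0.5 on the diagonal and 0 elsewhere) the covariance reduces to the identity and the estimate coincides with ordinary least squares. In small families a t distribution, or a model with estimated variance components, would be a more careful choice than the normal approximation used in this sketch.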

Table 6 Imputation and association testing in AU071

Cite this article

Chapman, N.H., Nato, A.Q., Bernier, R. et al. Whole exome sequencing in extended families with autism spectrum disorder implicates four candidate genes. Hum Genet 134, 1055–1068 (2015). https://doi.org/10.1007/s00439-015-1585-y

  • DOI: https://doi.org/10.1007/s00439-015-1585-y
