High-depth whole genome sequencing of an Ashkenazi Jewish reference panel: enhancing sensitivity, accuracy, and imputation

Abstract

While increasingly large reference panels for genome-wide imputation have recently been made available, the degree to which imputation accuracy can be enhanced by population-specific reference panels remains an open question. Here, we sequenced at full depth (≥ 30×), across two platforms (Illumina X Ten and Complete Genomics, Inc.), a moderately large (n = 738) cohort of samples drawn from the Ashkenazi Jewish population. We developed a series of quality control (QC) steps to optimize sensitivity, specificity, and comprehensiveness of variant calls in the reference panel, and then tested the accuracy of imputation against target cohorts drawn from the same population. QC thresholds for the Illumina X Ten platform were identified that permitted highly accurate calling of single nucleotide variants across 94% of the genome. QC procedures also identified numerous regions that are poorly mapped using current reference or alternate assemblies. After stringent QC, the population-specific reference panel produced more accurate and comprehensive imputation results relative to publicly available, large cosmopolitan reference panels, especially in the range of rare variants that may be most critical to further progress in mapping of complex phenotypes. The population-specific reference panel also permitted enhanced filtering of clinically irrelevant variants from personal genomes.
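
The abstract does not state which metric was used to score imputation accuracy. As an illustration only, one widely used choice is the squared Pearson correlation between true genotypes (coded as 0/1/2 alternate-allele counts) and imputed dosages, averaged within minor allele frequency (MAF) bins so that rare variants can be examined separately. The sketch below assumes that metric; the function names, the input layout, and the bin edges are hypothetical and are not taken from the paper.

```python
from statistics import mean


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true allele counts and imputed dosages.

    Returns None when either vector has zero variance (r2 is undefined).
    """
    if len(true_genotypes) != len(imputed_dosages):
        raise ValueError("genotype and dosage vectors must have equal length")
    mt, mi = mean(true_genotypes), mean(imputed_dosages)
    cov = sum((t - mt) * (d - mi) for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mt) ** 2 for t in true_genotypes)
    var_i = sum((d - mi) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_i == 0:
        return None
    return cov * cov / (var_t * var_i)


def minor_allele_frequency(genotypes):
    """MAF from diploid allele counts (0/1/2 per sample)."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def mean_r2_by_maf_bin(variants, bin_edges=(0.0, 0.01, 0.05, 0.5)):
    """Average r2 per MAF bin.

    `variants` is a list of (true_genotypes, imputed_dosages) pairs.
    Bins are half-open intervals (lo, hi]; the edges here are illustrative.
    Bins that receive no scorable variant are omitted from the result.
    """
    bins = {}
    for true_gt, dosages in variants:
        r2 = dosage_r2(true_gt, dosages)
        if r2 is None:
            continue  # monomorphic in truth or in imputation; skip
        maf = minor_allele_frequency(true_gt)
        for lo, hi in zip(bin_edges, bin_edges[1:]):
            if lo < maf <= hi:
                bins.setdefault((lo, hi), []).append(r2)
                break
    return {interval: mean(values) for interval, values in bins.items()}
```

Under this assumed metric, comparing two reference panels amounts to calling `mean_r2_by_maf_bin` once per panel on the same target genotypes and comparing the per-bin averages, with the lowest-frequency bin being the one most relevant to rare variants.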

Acknowledgements

The authors are extremely grateful to Soren Germer, Ph.D. and his team at the New York Genome Center for performing the Illumina sequencing. We acknowledge financial support from the Human Frontier Science Program (SC); NIH research grants AG042188 (GA), DK62429, DK062422, DK092235 (JHC), NS050487, NS060113 (LNC), AG021654, AG027734 (NB), MH089964, MH095458, MH084098 (TL), and CA121852 (computational infrastructure, IPe’er); NSF research grants 08929882 and 0845677 (IPe’er); Rachel and Lewis Rudin Foundation (HE); Northwell Health Foundation (TL); Brain & Behavior Foundation (TL); US-Israel Binational Science Foundation (TL, AD); LUNGevity Foundation (ZHG); New York Crohn’s Disease Foundation (IPeter); Edwin & Caroline Levy and Joseph & Carol Reich (SB); the Parkinson’s Disease Foundation (LNC); the Sharon Levine Corzine Cancer Research Fund (KO); and the Andrew Sabin Family Research Fund (KO).

Author information

Contributions

TL and IP led the analysis and the writing of the manuscript. JY, CP, and SC conducted the primary analyses. TL led the funding of the study. TL, AD, GA, DB, NB, and LNC provided samples and conducted lab work. TL, IP, NB, SB, AD, JHC, LNC, ZHG, VJ, RK, SL, KO, HO, LJO, IP, and GA initiated and designed the study, and provided funding.

Corresponding authors

Correspondence to Todd Lencz or Itsik Pe’er.

Ethics declarations

Conflict of interest

The authors declare no competing financial interests.

Accession codes

Whole genome sequence data have been deposited at the European Genome-phenome Archive (EGA, http://www.ebi.ac.uk/ega/), which is hosted by the EBI, under accession code EGAS00001000664. Genotype data for target samples are available at the database of Genotypes and Phenotypes (dbGaP, https://www.ncbi.nlm.nih.gov/gap), under accession number phs000448.v1.p1.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOC 738 KB)

Supplementary material 2 (XLSX 27 KB)

About this article

Cite this article

Lencz, T., Yu, J., Palmer, C. et al. High-depth whole genome sequencing of an Ashkenazi Jewish reference panel: enhancing sensitivity, accuracy, and imputation. Hum Genet 137, 343–355 (2018). https://doi.org/10.1007/s00439-018-1886-z
