Human Genetics, Volume 137, Issue 4, pp 343–355

High-depth whole genome sequencing of an Ashkenazi Jewish reference panel: enhancing sensitivity, accuracy, and imputation

  • Todd Lencz
  • Jin Yu
  • Cameron Palmer
  • Shai Carmi
  • Danny Ben-Avraham
  • Nir Barzilai
  • Susan Bressman
  • Ariel Darvasi
  • Judy H. Cho
  • Lorraine N. Clark
  • Zeynep H. Gümüş
  • Vijai Joseph
  • Robert Klein
  • Steven Lipkin
  • Kenneth Offit
  • Harry Ostrer
  • Laurie J. Ozelius
  • Inga Peter
  • Gil Atzmon
  • Itsik Pe’er
Original Investigation

Abstract

While increasingly large reference panels for genome-wide imputation have been recently made available, the degree to which imputation accuracy can be enhanced by population-specific reference panels remains an open question. Here, we sequenced at full-depth (≥ 30×), across two platforms (Illumina X Ten and Complete Genomics, Inc.), a moderately large (n = 738) cohort of samples drawn from the Ashkenazi Jewish population. We developed a series of quality control steps to optimize sensitivity, specificity, and comprehensiveness of variant calls in the reference panel, and then tested the accuracy of imputation against target cohorts drawn from the same population. Quality control (QC) thresholds for the Illumina X Ten platform were identified that permitted highly accurate calling of single nucleotide variants across 94% of the genome. QC procedures also identified numerous regions that are poorly mapped using current reference or alternate assemblies. After stringent QC, the population-specific reference panel produced more accurate and comprehensive imputation results relative to publicly available, large cosmopolitan reference panels, especially in the range of rare variants that may be most critical to further progress in mapping of complex phenotypes. The population-specific reference panel also permitted enhanced filtering of clinically irrelevant variants from personal genomes.

Notes

Acknowledgements

The authors are extremely grateful to Soren Germer, Ph.D. and his team at the New York Genome Center for performing the Illumina sequencing. We acknowledge financial support from the Human Frontier Science Program (SC); NIH research grants AG042188 (GA), DK62429, DK062422, DK092235 (JHC), NS050487, NS060113 (LNC), AG021654, AG027734 (NB), MH089964, MH095458, MH084098 (TL), and CA121852 (computational infrastructure, IPe’er); NSF research grants 08929882 and 0845677 (IPe’er); Rachel and Lewis Rudin Foundation (HE); Northwell Health Foundation (TL); Brain & Behavior Foundation (TL); US-Israel Binational Science Foundation (TL, AD); LUNGevity Foundation (ZHG); New York Crohn’s Disease Foundation (IPeter); Edwin & Caroline Levy and Joseph & Carol Reich (SB); the Parkinson’s Disease Foundation (LNC); the Sharon Levine Corzine Cancer Research Fund (KO); and the Andrew Sabin Family Research Fund (KO).

Author contributions

TL and IP led the analysis and the writing of the manuscript. JY, CP, and SC conducted the primary analyses. TL led the funding of the study. TL, AD, GA, DB, NB, and LNC provided samples and conducted lab work. TL, IP, NB, SB, AD, JHC, LNC, ZHG, VJ, RK, SL, KO, HO, LJO, IP, and GA initiated and designed the study, and provided funding.

Compliance with ethical standards

Conflict of interest

The authors declare no competing financial interests.

Accession codes

Whole genome sequence data have been deposited at the European Genome-phenome Archive (EGA, http://www.ebi.ac.uk/ega/), which is hosted by the EBI, under accession code EGAS00001000664. Genotype data for target samples are available at the database of Genotypes and Phenotypes (dbGaP, https://www.ncbi.nlm.nih.gov/gap), under accession number phs000448.v1.p1.

Supplementary material

439_2018_1886_MOESM1_ESM.doc (738 kb)
Supplementary material 1 (DOC 738 KB)
439_2018_1886_MOESM2_ESM.xlsx (28 kb)
Supplementary material 2 (XLSX 27 KB)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Todd Lencz (1, 2, 3)
  • Jin Yu (2, 3)
  • Cameron Palmer (4)
  • Shai Carmi (5)
  • Danny Ben-Avraham (6, 7)
  • Nir Barzilai (6, 7)
  • Susan Bressman (8)
  • Ariel Darvasi (9)
  • Judy H. Cho (10, 11)
  • Lorraine N. Clark (12, 13)
  • Zeynep H. Gümüş (11, 14)
  • Vijai Joseph (15)
  • Robert Klein (11, 14)
  • Steven Lipkin (16)
  • Kenneth Offit (15, 17)
  • Harry Ostrer (6, 18)
  • Laurie J. Ozelius (19)
  • Inga Peter (11, 14)
  • Gil Atzmon (6, 7, 20)
  • Itsik Pe’er (4, 21)
  1. Departments of Psychiatry and Molecular Medicine, Hofstra Northwell School of Medicine, Hempstead, USA
  2. Division of Research, Department of Psychiatry, The Zucker Hillside Hospital Division of Northwell Health, Glen Oaks, USA
  3. Center for Psychiatric Neuroscience, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, USA
  4. Department of Computer Science, Columbia University, New York, USA
  5. Faculty of Medicine, Braun School of Public Health, Hebrew University of Jerusalem, Jerusalem, Israel
  6. Department of Genetics, Albert Einstein College of Medicine, Bronx, USA
  7. Department of Medicine, Albert Einstein College of Medicine, Bronx, USA
  8. Department of Neurology, Beth Israel Medical Center, New York, USA
  9. Department of Genetics, The Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
  10. Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, USA
  11. Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
  12. Department of Pathology and Cell Biology, Columbia University Medical Center, New York, USA
  13. Taub Institute for Research of Alzheimer’s Disease and the Aging Brain, Columbia University Medical Center, New York, USA
  14. Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, USA
  15. Clinical Genetics Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, USA
  16. Departments of Medicine, Genetic Medicine and Surgery, Weill Cornell Medical College, New York, USA
  17. Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, USA
  18. Department of Pathology, Albert Einstein College of Medicine, Bronx, USA
  19. Department of Neurology, Massachusetts General Hospital, Boston, USA
  20. Department of Human Biology, Haifa University, Haifa, Israel
  21. Center for Computational Biology and Bioinformatics, Columbia University, New York, USA