Abstract
Genetic methods can complement epidemiological surveys and clinical registries in determining the prevalence of monogenic autosomal recessive diseases. Several large population-based genetic databases, such as the NHLBI GO Exome Sequencing Project, are now publicly available. By assuming Hardy–Weinberg equilibrium, the frequency of individuals homozygous in the general population for a particular pathogenic allele can be directly calculated from a sample of chromosomes in which some harbor the pathogenic allele. Further assuming that the penetrance of the pathogenic allele(s) is known, the prevalence of recessive phenotypes can be determined. Such work can inform public health efforts for rare recessive diseases. A Bayesian estimation procedure has yet to be applied to the problem of estimating disease prevalence from large population-based genetic data. A Bayesian framework is developed to derive the posterior probability density of the prevalence of monogenic, autosomal recessive phenotypes. Explicit equations are presented for the credible intervals of these disease prevalence estimates. A primary impediment to performing accurate disease prevalence calculations is the determination of truly pathogenic alleles. This issue is discussed, but in many instances remains a significant barrier to investigations solely reliant on statistical interrogation; functional studies can provide important information for solidifying evidence of variant pathogenicity. We also discuss several challenges to these efforts, including the population structure in the sample of chromosomes, the treatment of allelic heterogeneity, and reduced penetrance of pathogenic variants. To illustrate the application of these methods, we utilized recently published genetic data collected on a large sample from the Schmiedeleut Hutterites. We estimate prevalence and calculate 95% credible intervals for 13 autosomal recessive diseases using these data.
In addition, the Bayesian estimation procedure is applied to data from a central European study of hereditary fructose intolerance. The methods described herein show a viable path to robustly estimating both the expected prevalence of autosomal recessive phenotypes and corresponding credible intervals using population-based genetic databases that have recently become available. As these genetic databases increase in number and size with the advent of cost-effective next-generation sequencing, we anticipate that these methods and approaches may be helpful in recessive disease prevalence calculations, potentially impacting public health management, health economic analyses, and treatment of rare diseases.
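The calculation outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' derivation: it assumes a conjugate Beta prior on the pathogenic allele frequency q (uniform by default), a binomial sample of k pathogenic alleles among n chromosomes, Hardy–Weinberg proportions, a single pathogenic allele, and a known penetrance, so that prevalence is penetrance × q². The function name, parameter names, and the example counts are hypothetical. The credible interval is approximated by Monte Carlo draws from the posterior rather than by closed-form equations.

```python
import random


def prevalence_posterior(k, n, penetrance=1.0, prior_a=1.0, prior_b=1.0,
                         draws=50_000, level=0.95, seed=0):
    """Posterior mean and equal-tailed credible interval for prevalence.

    k          -- number of sampled chromosomes carrying the pathogenic allele
    n          -- total number of sampled chromosomes
    penetrance -- assumed known probability that a homozygote is affected
    prior_a/b  -- Beta prior parameters for the allele frequency q (assumption)
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= penetrance <= 1.0:
        raise ValueError("penetrance must lie in [0, 1]")

    # Conjugate update: q | data ~ Beta(prior_a + k, prior_b + n - k).
    a = prior_a + k
    b = prior_b + (n - k)

    # Second moment of a Beta(a, b) variable gives E[q^2] in closed form.
    mean_q2 = a * (a + 1.0) / ((a + b) * (a + b + 1.0))

    # Monte Carlo draws of prevalence = penetrance * q^2 under Hardy-Weinberg.
    rng = random.Random(seed)
    samples = sorted(penetrance * rng.betavariate(a, b) ** 2
                     for _ in range(draws))

    # Equal-tailed interval from the empirical quantiles of the draws.
    tail = (1.0 - level) / 2.0
    lo_idx = max(0, int(tail * draws))
    hi_idx = min(draws - 1, int((1.0 - tail) * draws) - 1)
    return penetrance * mean_q2, samples[lo_idx], samples[hi_idx]


# Hypothetical example: 20 pathogenic alleles observed among 2000 chromosomes.
mean, lower, upper = prevalence_posterior(k=20, n=2000)
print(f"posterior mean prevalence: {mean:.2e}")
print(f"95% credible interval: ({lower:.2e}, {upper:.2e})")
```

Because q² is monotone in q, the same interval could instead be obtained by squaring Beta quantiles of q. Population structure, allelic heterogeneity, and uncertainty in penetrance, all of which the abstract flags as challenges, are deliberately ignored in this sketch.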
Cite this article
Schrodi, S.J., DeBarber, A., He, M. et al. Prevalence estimation for monogenic autosomal recessive diseases using population-based genetic data. Hum Genet 134, 659–669 (2015). https://doi.org/10.1007/s00439-015-1551-8
DOI: https://doi.org/10.1007/s00439-015-1551-8