Coevolution of Mathematics, Statistics, and Genetics

  • Reference work entry
Handbook of the Mathematics of the Arts and Sciences

Abstract

Genetics is the science of studying heredity. Heredity is the process of transmitting genetic materials from parents to offspring. In genetic studies, hypotheses derived from biological theories and mathematical models are tested with the data from experiments or observations of genetic phenomena using statistical methodologies. Throughout the history of genetics, mathematics and statistics have been extensively used for genetic studies, and genetics, in turn, has influenced many fields of mathematics and statistics. In this chapter, we describe some of the most important mathematical models and statistical methods in the history of genetics. We especially focus on three periods: (1) the early days, when the basic concepts in genetics were established, such as genes, evolution, and inheritance, and mathematical models of such genetic mechanisms were laid out; (2) the period of studying family data from twins or large pedigrees in the mid- to late twentieth century; and (3) the present period of exploring big genetic data by complex modeling and machine learning. We show that various probabilistic models, differential equations, and graph and network theories have been applied to the analysis of genetic data. We also illustrate how statistical issues involved with model fitting, estimation, and hypothesis testing have been raised and resolved in the context of genetic studies, contributing to the field of statistics as well as that of genetics. In the discussion, we suggest some promising mathematical and statistical methods to be applied in future genetic studies.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grants NRF-2015R1A1A3A04001269 and NRF-2018R1A2B6008016.

Corresponding author

Correspondence to Yun Joo Yoo.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this entry

Yoo, Y.J. (2021). Coevolution of Mathematics, Statistics, and Genetics. In: Sriraman, B. (eds) Handbook of the Mathematics of the Arts and Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-57072-3_28
