Using Statistics to Shed Light on the Dynamics of the Human Genome: A Review

  • Francesca Chiaromonte
  • Kateryna D. Makova
Part of the Contributions to Statistics book series (CONTRIB.STAT.)


In this article we review a number of recent studies in which information derived from genomic alignments, together with data on the composition, location and biochemical features of nuclear DNA, is used to investigate salient properties and determinants of change (mutations) in the human genome. The studies under review, all conducted by an interdisciplinary group of investigators at The Pennsylvania State University, required a range of statistical techniques, from regression to multivariate analysis to the modeling of latent structures.


Keywords: Repeat Number, Fragile Site, Mutagenic Process, Genomic Landscape, Microsatellite Mutability



We wish to thank G. Ananda, A. Fungtammasan, Y.D. Kelkar, E.M. Kvikstad, P. Kuruppumullage Don and S. Tyekucheva, the brilliant and hard-working graduate students who took the lead and collaborated with each other in the studies reviewed in this article. We also wish to thank our collaborators in the Center for Medical Genomics of The Pennsylvania State University, in particular K. Eckert, whose group performed experimental work critical for our studies of microsatellites and common fragile sites. Finally, we are indebted to a reviewer of this manuscript who offered useful and interesting comments on our work. Our research over the years has been supported by various sources; particularly important for the studies reviewed here were awards from the NSF (DBI 0965596) and the NIH (General Medical Sciences R01 GM087472-01).



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Francesca Chiaromonte (1, 2, 3)
  • Kateryna D. Makova (2, 3, 4)

  1. Department of Statistics, The Pennsylvania State University, University Park, USA
  2. Center for Medical Genomics, The Pennsylvania State University, University Park, USA
  3. Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, USA
  4. Department of Biology, The Pennsylvania State University, University Park, USA
